Building an interactive coffee map with AWS Lambda and DynamoDB

The idea of a map showing the countries I’ve tried coffee beans from has been haunting me for a while now, and this week I finally found some time to dedicate to it. I’ve also always wanted a convenient tool for documenting all the beans I’ve tried. Those two ideas gave birth to a new concept: a coffee map. It consists of a password-protected input page on my website where I can enter data about beans, a database to store that data, and another page on my website that displays an up-to-date, generated world map with markers for the countries I’ve tried beans from.

Concerning server hosting costs, an AWS Lambda function seemed like a very fitting solution. My website is hosted in AWS S3, so having the project on a single cloud provider kept things simple. For the same reasons I decided on DynamoDB as a database. With the whole NoSQL vs Relational DB conundrum I was not positive DynamoDB would fit my future use cases, but I’ve decided to not worry about it for now. I planned to save the minimum data needed: bean name, country of origin, and brew method. I created a table “coffee_beans_by_country” and here is what it looks like in the end:

Mmm, yum, I just tried a delicious coffee in a local cafe. This is exactly why I want an easy, accessible, on-the-go way to save bean entries. I’ve decided to have an input form on my website for this purpose:

The input form is hidden under a URL that is not linked anywhere on the website, but the security issue still bothered me. Making this page password-protected turned out not to be a trivial task. Initially, I set up the input page HTML to be served by the Lambda function itself and to save input data to my newly created DynamoDB table. When I tried to add basic HTTP authentication to this setup, though, it turned out AWS Lambda renames its headers, making it impossible to just send the needed headers to the client. After doing some research, I realized it would be much easier to create the form separately as a regular static website page and leave just the saving to the Lambda function. Then, since I am already distributing my website through S3 plus CloudFront, I can use Lambda@Edge to create a small authentication function that is triggered by a viewer request to my specific URL. Here is the tutorial I found most helpful. My Lambda function looks very similar to the one there:

In my case I didn’t want to restrict access to the whole website, though, just to a specific URL. I needed to create a cache behavior associated with my URL in CloudFront and then use this cache behavior when configuring the CloudFront trigger for the authentication Lambda function.

So now, whenever someone gets hold of my secret URL for saving new coffee beans with a vicious plan to compromise my coffee map data, they are forced to authenticate themselves first! Like so:

In the end the Lambda function itself looks something like this:

	import datetime
	from uuid import uuid4

	import boto3
	from flask import request, Response


	def save_map_data():
	    name = request.form['name']
	    country_of_origin = request.form['country_of_origin']

	    # Both flags stay True unless the form names a single brew method.
	    filter_brew = True
	    espresso_brew = True

	    if request.form['brew_method'] == 'filter_brew':
	        filter_brew = True
	        espresso_brew = False
	    elif request.form['brew_method'] == 'espresso_brew':
	        filter_brew = False
	        espresso_brew = True

	    # A timestamp prefix keeps ids roughly sortable; the UUID keeps them unique.
	    coffee_beans_id = datetime.datetime.now().strftime(
	        '%Y-%m-%d-%H-%M-%S-') + str(uuid4())
	    dynamodb = boto3.client('dynamodb')
	    dynamodb.put_item(
	        TableName='coffee_beans_by_country',
	        Item={
	            "coffee_beans_id": {"S": coffee_beans_id},
	            "country_of_origin": {"S": country_of_origin},
	            "name": {"S": name},
	            "espresso_brew": {"BOOL": espresso_brew},
	            "filter_brew": {"BOOL": filter_brew},
	        })
	    return Response(response=f"Inserted new coffee entry with id {coffee_beans_id}", status=200)

Having figured out how to input and save data to the DB securely, I moved on to creating the map on a separate, publicly accessible page of my site. I decided to use the plotly library for the visualization since I was already familiar with its world map option. I created a dedicated ../coffee/map URL on my website and started a new Lambda function that would retrieve the needed data from DynamoDB, place it in a pandas dataframe, group it as needed and create a plot. The only trick I needed during this step was turning the country code I save to the DB back into a country name for better readability. For that I wrote a small helper function that uses the pycountry library to get the country name for the corresponding code:

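	import pycountry

	# Despite its name, this helper returns the country *name* for an ISO alpha-3 code.
	def _get_country_code(country_code):
	    return pycountry.countries.get(alpha_3=country_code).name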

It was my first experience with DynamoDB, so to keep it simple I wrote logic to:

- retrieve all items from the table with the .scan() method, paginating through the results in a loop to work around the 1 MB request limit
- create a pandas dataframe with the data
- group by country of origin to have markers sized according to how many times I’ve tried coffee from a certain country
- create a plot with plotly

Here is what it looks like:

	import json

	import boto3
	import pandas as pd
	import plotly
	import plotly.express as px
	from flask import render_template

	# `config` is assumed to be a botocore Config object defined elsewhere in the app.
	dynamodb = boto3.resource('dynamodb', config=config)
	table = dynamodb.Table('coffee_beans_by_country')
	response = table.scan()
	items = response['Items']
	# Each scan call returns at most 1 MB, so keep going while a continuation key is present.
	while 'LastEvaluatedKey' in response:
	    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
	    items.extend(response['Items'])
	data = pd.DataFrame(items)
	data = data.groupby("country_of_origin", as_index=False).agg(
	    count=("name", "size"), beans=("name", list))
	data['country_name'] = data['country_of_origin'].apply(_get_country_code)
	fig = px.scatter_geo(data, locations="country_of_origin", color="country_name", size="count",
	                     hover_data=['beans'],
	                     projection="miller")
	graphJSON = json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder)
	html_page_with_map = render_template('plotly_layout.html', graphJSON=graphJSON)
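The pagination step is the easiest one to get subtly wrong, so here is a minimal sketch of it pulled out into a helper that can be checked without touching AWS. The names `scan_all` and `FakeTable` are hypothetical, and the two sample beans are made-up data; the only thing assumed about the real table object is that `.scan()` returns `Items` plus an optional `LastEvaluatedKey`.

```python
def scan_all(table, **scan_kwargs):
    """Collect every item from a table by following LastEvaluatedKey."""
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Resume the next scan where the previous page stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Stand-in for a boto3 Table that serves fixed pages, for a local check."""

    def __init__(self, pages):
        self._pages = pages

    def scan(self, ExclusiveStartKey=None, **_):
        index = 0 if ExclusiveStartKey is None else ExclusiveStartKey["page"]
        response = {"Items": self._pages[index]}
        if index + 1 < len(self._pages):
            response["LastEvaluatedKey"] = {"page": index + 1}
        return response


# Made-up sample data split across two pages.
pages = [[{"name": "Sample bean A", "country_of_origin": "ETH"}],
         [{"name": "Sample bean B", "country_of_origin": "COL"}]]
all_items = scan_all(FakeTable(pages))
print(len(all_items))
```

With a real table, `scan_all(dynamodb.Table('coffee_beans_by_country'))` would return the full list, which can then be passed to `pd.DataFrame` in one go.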

And the map:

After everything was set up and working, I ran into a problem: the map was loading very slowly. My guess is that whenever the cache is invalidated, it takes quite a while for the Lambda function to spin up and serve the HTML. I decided instead to save the rendered HTML directly to the S3 bucket every time I add new bean data, so the file is ready whenever someone wants to access it. To keep the map up to date, I wrapped the first function (which saves the entry to the DB) and the second function (which reads from the DB and renders the map) into one general Lambda function that gets triggered, via a simple JavaScript AJAX call, whenever I hit “Save” in my input form for a new bean entry. This comes at the cost of a slightly longer save time (since saving now also re-renders the map), but at least I am sure no website visitors have to wait for the map page to load.

	s3 = boto3.resource('s3')
	# Named index_file rather than object to avoid shadowing the Python builtin.
	index_file = s3.Object('mariashears.com', 'coffee/map/index.html')
	index_file.put(Body=html_page_with_map, ContentType='text/html')
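To make the order of operations in the combined function explicit, here is a small sketch. The function name `save_and_publish` and its three callable parameters are hypothetical stand-ins for the DynamoDB write, the plotly rendering and the S3 upload; the fakes below exist only so the flow can be run locally.

```python
def save_and_publish(form, save_entry, render_map, upload_html):
    """Save one bean entry, then rebuild and upload the map page."""
    entry_id = save_entry(form)   # 1. write the new bean to the database
    html = render_map()           # 2. re-read all beans and render the map HTML
    upload_html(html)             # 3. overwrite the static page in the bucket
    return entry_id


# Tiny in-memory demonstration with fake collaborators.
db, bucket = [], {}


def fake_save(form):
    db.append(form)
    return f"entry-{len(db)}"


def fake_render():
    return "<html>" + ", ".join(row["country_of_origin"] for row in db) + "</html>"


def fake_upload(html):
    bucket["coffee/map/index.html"] = html


new_id = save_and_publish({"name": "Sample bean", "country_of_origin": "BRA"},
                          fake_save, fake_render, fake_upload)
```

Saving first and rendering second means the freshly added bean is already in the table when the map is rebuilt, at the price of the slower save described above.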

Almost there, but not quite: this setup didn’t fully work. It turns out CloudFront doesn’t return default root objects from subfolders, so client requests only get rewritten to index.html at the root of the bucket. The official AWS solution is to use a Lambda@Edge function again, one that runs on the CloudFront edge nodes, looks for these patterns and requests the appropriate object key from the S3 origin, which of course costs money. This didn’t seem worth paying for, so I started looking for possible workarounds and found this post. I added the following lines to my script, adjusted my deployment script, and it started working as expected:

	folder_file = s3.Object('mariashears.com', 'coffee/map')
	folder_file.put(Body=html_page_with_map, ContentType='text/html')
	slash_file = s3.Object('mariashears.com', 'coffee/map/')
	slash_file.put(Body=html_page_with_map, ContentType='text/html')
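If more pages ever need the same treatment, the workaround could be generalized along these lines. This is only a sketch: `page_keys` and `publish_page` are hypothetical helpers, and `bucket` is assumed to be a boto3 `Bucket` resource (for example `boto3.resource('s3').Bucket('mariashears.com')`).

```python
def page_keys(path):
    """Return the S3 keys to write so a page resolves with and without a trailing slash."""
    base = path.strip("/")
    return [base, base + "/", base + "/index.html"]


def publish_page(bucket, path, html):
    """Write the same HTML under every key variant of the page path."""
    for key in page_keys(path):
        bucket.put_object(Key=key, Body=html, ContentType="text/html")
```

For `coffee/map` this writes exactly the three objects used above: `coffee/map`, `coffee/map/` and `coffee/map/index.html`.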

There is definitely room for improvement with this project. I think next I’ll enhance the visualization and the DB eventually to save more parameters, but for now I am very happy with the start.