10 things I learned while deploying my first Python function to AWS Lambda

I spent a few days on and off trying to deploy a Flask REST service to AWS Lambda, just to experience what the cool kids were talking about. These are some of the things I learned along the way:

 

Zappa is the easiest packager/deployer for python (as of December 2018)

Zappa provides good-quality feedback on the packaging/deployment process. It’s compatible with all the popular Python REST frameworks. Configure it minimally, deploy, and you are done. It packages your whole environment along with your own modules, so you will face few “module not found” errors.

Zappa provides a “tail” function that allows you to debug the errors in your deployment directly from the command line.
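To give an idea of how small the configuration is, here is a rough sketch that builds the contents of a zappa_settings.json file. The keys are the ones Zappa's own init step normally writes, but every value here (stage name, "app.app", region, bucket) is a placeholder I made up for illustration, not something from my project.

```python
import json


def build_zappa_settings(stage, app_function, aws_region, s3_bucket, runtime="python3.6"):
    """Return a dict shaped like a minimal zappa_settings.json (values are placeholders)."""
    return {
        stage: {
            # "module.variable" path to the WSGI app object, e.g. the Flask app
            "app_function": app_function,
            "aws_region": aws_region,
            "runtime": runtime,
            # bucket Zappa uses to stage the zipped package
            "s3_bucket": s3_bucket,
        }
    }


settings = build_zappa_settings("dev", "app.app", "us-east-1", "my-zappa-deployments")
print(json.dumps(settings, indent=4))
```

With a file like that saved in the project root, "zappa deploy dev" ships the package and "zappa tail dev" streams the logs.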

 

Zappa doesn’t work properly with Anaconda (as of December 2018)

I use Anaconda for environment management. Zappa is geared towards venv users. There are “hacky” ways of making it work. Setting the VIRTUAL_ENV environment variable, according to https://github.com/Miserlou/Zappa/issues/167, got me a long way. But I had a third-party module that kept failing. I spent 6 hours of a weekend that I’ll never get back on this.

 

Anaconda’s environment management is very different than venv

Files are stored elsewhere (conda info --envs is your friend). I’ve used Anaconda since forever, so I didn’t know that venv stores its files locally within your project. Ugh! (Am I getting this wrong? Please tell me so.) By default, Anaconda stores your environment, and the packages installed in it, outside of your project folder. Naturally, this wreaks havoc with anything that expects an environment folder within your project.
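A quick way to see the difference from inside Python is to check where the active environment lives relative to the project. This is only a sketch: the helper name env_is_inside_project is mine, and it assumes that conda exports CONDA_PREFIX for an activated environment (falling back to sys.prefix otherwise).

```python
import os
import sys
from pathlib import Path


def env_is_inside_project(env_dir, project_dir):
    """Return True if the environment directory is the project directory or lives under it."""
    env = Path(env_dir).resolve()
    project = Path(project_dir).resolve()
    return env == project or project in env.parents


# sys.prefix is the root of the running interpreter's environment;
# CONDA_PREFIX is set by conda when an environment is activated.
active_env = os.environ.get("CONDA_PREFIX", sys.prefix)
print("environment:", active_env)
print("inside project:", env_is_inside_project(active_env, Path.cwd()))
```

A venv created inside the project would typically print True here, while a default conda environment would print False.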

 

Chalice is almost as good as Zappa

Chalice is the native tool from AWS for doing this kind of thing. It’s not a packager/deployer like Zappa but a whole framework, so you have to refactor your code from whatever framework you are using. Fortunately for me, the syntax is almost the same as Flask’s.

Chalice relies on your requirements.txt as a guide to package the dependencies of your lambda functions. Good if you have a messy environment.

Debugging the deployment is brutal.

 

Chalice doesn’t automatically package your own modules

According to the documentation, you have to put all your modules in a magical “chalicelib” directory. Even after doing that, I still had import problems (importing a local module that was itself importing another local module). I solved it by spelling out the location with a relative import: “from . import mymodule”.
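Here is a small self-contained sketch of the import situation described above. It builds a throwaway package in a temporary folder, so it runs without Chalice installed; the file names helpers.py and mymodule.py are hypothetical stand-ins for my real modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway project containing a chalicelib package.
project = Path(tempfile.mkdtemp())
package = project / "chalicelib"
package.mkdir()
(package / "__init__.py").write_text("")
(package / "helpers.py").write_text("def greet():\n    return 'hello'\n")
# mymodule reaches its sibling through an explicit relative import.
(package / "mymodule.py").write_text(
    "from . import helpers\n\n"
    "def run():\n"
    "    return helpers.greet()\n"
)

# app.py would sit next to chalicelib/, so the project root has to be importable.
sys.path.insert(0, str(project))
importlib.invalidate_caches()
mymodule = importlib.import_module("chalicelib.mymodule")
print(mymodule.run())  # hello
```

The relative import keeps working no matter what the working directory is, which is probably why it fixed my deployment.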

 

AWS Lambda packages are restricted to 50MB

Messy environment? Too many requirements? You are out of luck. AWS Lambda packages are limited to 50MB, zipped.

I actually had a huge 28MB library (zipped!) that I had to strategically trim down (hi, technical debt!) in order to fit it into my package.
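Before deploying, it can save time to measure how big the zipped package will be. The sketch below zips a folder in memory and compares the result with the 50MB figure; the function names are mine, and the real deployment zip may compress a little differently.

```python
import io
import tempfile
import zipfile
from pathlib import Path

LAMBDA_ZIP_LIMIT_BYTES = 50 * 1024 * 1024  # the 50MB zipped limit


def zipped_size(directory):
    """Zip every file under `directory` in memory and return the archive size in bytes."""
    root = Path(directory)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                archive.write(path, str(path.relative_to(root)))
    return len(buffer.getvalue())


def fits_in_lambda(directory):
    """Return True if the zipped directory is within the package size limit."""
    return zipped_size(directory) <= LAMBDA_ZIP_LIMIT_BYTES


# Tiny demo on a temporary folder; point it at your package folder instead.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "handler.py").write_text("def handler(event, context):\n    return 'ok'\n")
print(zipped_size(demo_dir), "bytes, fits:", fits_in_lambda(demo_dir))
```

Since everything is zipped in memory, this is only meant for packages of a few hundred megabytes at most.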

This is probably a sign that I shouldn’t be uploading a whole Python class, but rather the individual methods. I know, it’s Lambda functions, not Lambda classes with everything and the kitchen sink.

 

You can upload a class with a bunch of methods and a REST api and enjoy the benefits of serverless

Not the best pattern, not the most efficient solution, but hey, for small stuff it works. You get a million requests per month for free.

 

You can decrease the response time if you increase the memory allocation of your function

I was getting a 500ms response time: good, but an order of magnitude slower than what I was getting on my laptop. That was until I read this text below the “Memory” slider in the Lambda console:

“Your function is allocated CPU proportional to the memory configured.”

I moved the slider to 1024MB and the response time went down to 125ms!
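More memory does not necessarily mean a bigger bill, because Lambda charges for compute in GB-seconds (memory multiplied by duration). A back-of-the-envelope sketch follows; the 256MB starting point is an assumption for illustration (I didn't note the original setting), and it ignores how the billed duration gets rounded up.

```python
def gb_seconds(memory_mb, duration_ms):
    """Compute the GB-seconds consumed by one invocation."""
    return (memory_mb / 1024) * (duration_ms / 1000)


before = gb_seconds(256, 500)    # assumed original setting, 500ms response
after = gb_seconds(1024, 125)    # slider at 1024MB, 125ms response
print(before, after)  # 0.125 0.125
```

Under those assumptions, four times the memory for a quarter of the duration comes out the same per request.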

 

Lambda functions are a good solution for APIs

I’ve been using Docker since 2016 for packaging small Python APIs, but deploying them and managing the Dockerfiles is kind of a pain. If you manage to refactor your application down to single-purpose methods, AWS Lambda offers an impressively easy way to deploy pay-per-use, load-balanced functions on a secure server whose hardware and OS stack you don’t have to manage. This setup works best for functions that run sporadically and don’t need an always-on server.
Come to think of it, few functions are running *all* the time, and if the inputs are the same, you can leverage the CloudFront cache. This cost calculator might help you estimate total cost once you deploy.

 

Links that saved me (besides the docs)

The Right Way™ to do Serverless in Python

Building Serverless Python Apps Using AWS Chalice

The fear and frustration of migrating a simple web app to serverless

 

Uploading and downloading documents from Amazon S3 using bash

You need to upload a file to S3 and cannot install new packages in the server, nor the s3 client tools. You only have bash, openssl and sed. Go.

I found and adapted a script by Viktor Szakats that creates all the proper headers expected by the latest AWS API. I had lots of trouble with whitespace and empty lines, so be mindful of that when modifying these scripts:

s3-upload-aws4.sh

s3-download-aws4.sh

Run them like this:

. s3-upload-aws4.sh "myfile.ext" "my_bucket_name" "folder_name_in_bucket" "region" "REDUCED_REDUNDANCY" "MY_AWS_ACCESS_KEY" "MY_AWS_SECRET_KEY"
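The “aws4” in the script names suggests they use AWS Signature Version 4. Assuming so, the core of what the scripts compute with openssl is a chain of HMAC-SHA256 operations that derives a signing key from the secret key. Here is the same chain sketched in Python for readability; it is not a translation of the actual scripts, and the key, date and region values are dummies.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """Return the raw HMAC-SHA256 digest of a text message under a bytes key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="s3"):
    """Derive the Signature Version 4 signing key; date_stamp is YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key, string_to_sign):
    """Return the hex signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


key = derive_signing_key("MY_AWS_SECRET_KEY", "20181231", "us-east-1")
print(len(key), "byte signing key")
```

The string to sign is built from the request's canonical form, and that is exactly where stray whitespace or an extra empty line changes the result and makes AWS reject the request.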

 

Recommender system for finding subject matter experts using the Enron email corpus

This is a little project to create a recommender system to find mentors inside an organization, using Natural Language Processing. It started as an excuse to build a data visualization I had in mind: an interactive word cloud that… did something. When I started, I didn’t know anything about Topic Modeling, Topic Extraction, or Natural Language Processing; and fell head first into a rabbit hole.

TL;DR:

Topic extraction is deep and potentially rewarding. Sanitize properly. spaCy and Gensim are your friends. Search YouTube for knowledge. This is related to “Topic Extraction from Scientific Literature for Competency Management” and “The Author-Topic Model for Authors and Documents”. Get the code of this project at https://github.com/danielpradilla/enron-playground

Getting IP location information with Angular 7

Using Angular Maps Components and a new service called ipapi, you will be able to quickly put together something that will allow you to get IP information from a client and put it on a map.

Angular Maps Components is really great, and the setup with ipapi is a no-brainer (they have a free tier for 30,000 requests or under). It literally took me more time to wait for the Angular project to be set up than to implement the whole thing!

The code is on GitHub: https://github.com/danielpradilla/angular-ipapi