An easier way to build AWS Lambda deployment packages — with Docker instead of EC2 (updated for 2020)

Aneesh Karve
Published in Quilt
4 min read, Sep 5, 2018


Our team writes AWS Lambda functions in Python 3.6. As a result, we need to build Lambda deployment packages that contain custom Python modules. The AWS-recommended way of doing this is tedious, requiring developers to log in to an EC2 machine, install dependencies, zip site-packages, etc.

Needless to say, building deployment packages in your own development environment is risky. Your environment is probably different from the Lambda environment in meaningful ways. Python wheels with compiled binaries, for example, may work on Mac but fail on Linux.
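
To see whether this risk applies to a given set of dependencies, one option is to scan an installed package directory for compiled binaries. The following is a minimal sketch, not part of the quiltdata/lambda repository; the function name find_compiled_extensions and the suffix list are assumptions made for illustration.

```python
import os

# Suffixes that usually indicate platform-specific compiled code.
# This list is an assumption for illustration and is not exhaustive.
COMPILED_SUFFIXES = (".so", ".pyd", ".dylib")


def find_compiled_extensions(root):
    """Return sorted paths (relative to `root`) that look like compiled binaries."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # ".so." also catches versioned shared libraries such as libz.so.1
            if name.endswith(COMPILED_SUFFIXES) or ".so." in name:
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)
```

If the returned list is non-empty for a directory installed on a Mac, those files were built for macOS and are the ones likely to fail once unpacked on Lambda's Linux hosts.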

Fortunately, AWS publishes an amazonlinux image that is nearly identical to the AMI that Lambda functions use. In this article, we’ll use the amazonlinux image to script the creation of Lambda deployment packages.

The goal is a faster edit-build-test cycle for Lambda functions.

Source code

The quiltdata/lambda repository contains this article’s source code, plus detailed comments. An image is available on Docker Hub, under the same name.

Which image do Lambda tasks use?

Python 3.6 Lambda tasks currently use a variant of the amazonlinux:2018.03 AMI, as documented by AWS. That AMI is publicly available, but downloading it for use in a Docker container isn’t straightforward. Instead, we’re going to rely on a similar image, amazonlinux:2018.03, published by AWS on Docker Hub.

What dependencies are in the image?

We can use the AWS console to create a test Lambda function as follows:

import subprocess


def lambda_handler(event, context):
    # Print the runtime's pip version, installed packages, and OS release
    print(subprocess.check_output("pip --version", shell=True).decode())
    print(subprocess.check_output("pip freeze", shell=True).decode())
    print(subprocess.check_output("cat /etc/system-release", shell=True).decode())
    return {
        "status": 200
    }

Be sure to select the runtime you wish to emulate, then toss an empty test event ({}) at your function, and Lambda will cough up its AMI release, pip version, and installed packages.
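
Once we have Lambda's pip freeze output, we can compare it with the output of pip freeze inside the base image to decide what belongs in requirements.txt. The sketch below shows one way to do that comparison; the helper names parse_freeze and missing_or_different are hypothetical, and only plain name==version lines are handled.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into a {name: version} dict.

    Only simple `name==version` lines are handled; comments, editable
    installs, and URL requirements are skipped.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def missing_or_different(lambda_freeze, image_freeze):
    """Return requirement lines for packages that Lambda has but the image
    either lacks or has at a different version."""
    wanted = parse_freeze(lambda_freeze)
    have = parse_freeze(image_freeze)
    return sorted(
        "{}=={}".format(name, version)
        for name, version in wanted.items()
        if have.get(name) != version
    )
```

Passing the two captured outputs to missing_or_different yields lines that can be pasted into requirements.txt.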

That leads us to the following Dockerfile:

FROM amazonlinux:2018.03

# Need to set "ulimit -n" to a small value to stop yum from hanging:
# https://bugzilla.redhat.com/show_bug.cgi?id=1715254#c1
RUN ulimit -n 1024 && yum -y update && yum -y install \
    git \
    gcc \
    python36 \
    python36-pip \
    python36-devel \
    zip \
    && yum clean all

COPY requirements.txt quilt/requirements.txt

RUN python3 -m pip install pip==18.1

# Requirements copied from lambda Python 3.6, but not in base image
# (Plus Cython which is a build-time requirement for numpy)
RUN python3 -m pip install -r quilt/requirements.txt

# Make it possible to build numpy:
# https://github.com/numpy/numpy/issues/14147
ENV CFLAGS=-std=c99

We’ve added Cython as a build-time dependency since some of our packages depend on numpy, and numpy requires Cython to build.

Build the container

docker build -t quiltdata/lambda:build .

Alternatively, docker pull quiltdata/lambda:build.

Build the deployment package

Now we’ll write a Bash script, package.sh, to create our deployment package:

#!/bin/bash
mkdir tmp411
git clone "${GIT_REPO}"
python3 -m pip install REPO/DIR -t tmp411
rm -f /io/lambda.zip
cp -r /io/* tmp411
cd tmp411
zip -r /io/lambda.zip *

All that remains is to run package.sh inside our container:

docker run --rm -v $(pwd)/create_table:/io -t \
    -e GIT_REPO quiltdata/lambda \
    bash /io/package.sh

We use the volume /io as the parent directory for our Lambda handler, the parent directory for package.sh, and as the destination for the deployment package, lambda.zip. --rm removes the container once it exits, so its disk state is not kept on the host.
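
Because package.sh copies the handler into tmp411 and zips from inside that directory, the handler should end up at the root of lambda.zip, next to the installed modules. A quick way to confirm this before uploading is sketched below. The function names are hypothetical, and the handler filename you pass in is whatever your own project uses.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the set of top-level file and directory names in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return {name.split("/", 1)[0] for name in archive.namelist() if name}


def has_at_root(zip_path, filename):
    """True if `filename` is stored at the archive root, not in a subfolder."""
    with zipfile.ZipFile(zip_path) as archive:
        return filename in archive.namelist()
```

For example, has_at_root("create_table/lambda.zip", "my_handler.py") should be true for a handler file named my_handler.py (a made-up name), and top_level_entries gives a quick overview of which packages were bundled.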

How to reduce package size

There are a few ways to reduce the size of our deployment package:

  1. Delete *.py files (and retain *.pyc files), see ralienpp/simplipy
  2. Profile our handler and retain only those files that it uses, see Slimming down lambda deployment zips
  3. Download additional dependencies “just in time” from S3, see Large applications on AWS Lambda
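
As a rough illustration of option 1, the standard-library compileall module can write a .pyc next to each .py file, after which the sources can be deleted. This is a sketch under assumptions, not the ralienpp/simplipy implementation; the function name strip_sources is hypothetical.

```python
import compileall
import os


def strip_sources(package_dir):
    """Compile every .py under `package_dir` to a sibling .pyc, then delete the .py.

    legacy=True writes foo.pyc next to foo.py instead of into __pycache__,
    which is the layout Python needs to import a module that has no source file.
    Returns the number of source files removed.
    """
    ok = compileall.compile_dir(package_dir, quiet=1, legacy=True, force=True)
    if not ok:
        raise RuntimeError("some files failed to compile; sources left in place")
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(package_dir):
        for name in filenames:
            if name.endswith(".py"):
                os.remove(os.path.join(dirpath, name))
                removed += 1
    return removed
```

Compiled .pyc files are tied to the Python version that produced them, so a step like this would need to run inside the build container, with the same Python version as the target runtime.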

For Quilt’s purposes, I chose not to do #1 since it only saves 3 MB (out of 51 MB). I haven’t tried #2 or #3, and probably won’t until and unless our deployment package surpasses 100 MB.
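
To judge whether any of these options is worth the effort, it helps to know where the bytes in lambda.zip actually go. The following sketch reports the compressed and uncompressed size of an archive and its largest top-level entries; the function name size_report and its top parameter are made up for this example.

```python
import zipfile
from collections import Counter


def size_report(zip_path, top=5):
    """Return (compressed_bytes, uncompressed_bytes, largest) for a zip archive.

    `largest` lists the `top` biggest top-level entries by uncompressed size.
    """
    per_entry = Counter()
    compressed = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            compressed += info.compress_size
            # Group files by their first path component, e.g. "numpy"
            per_entry[info.filename.split("/", 1)[0]] += info.file_size
    uncompressed = sum(per_entry.values())
    return compressed, uncompressed, per_entry.most_common(top)
```

Running it before and after a size-reduction step gives a direct measure of what that step saved.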

Alternative approaches

The Reddit thread “How can I add third party python dependencies to a Lambda function?” mentions this Dockerfile from aws-samples. The aws-samples approach is workable, but it is less than ideal since it rebuilds the container for each deployment.

Easier, still needs work

Do you have ideas for further optimizations? Contributions are welcome on GitHub.

Acknowledgements

Thanks to Rob Newman and Dima Ryazanov for feedback on this article.


Data, visualization, machine learning, and abstract algebra. CTO and co-founder @QuiltData. Recent talks https://goo.gl/U9VYr5.