Building AWS Lambda Layers with Docker & Amazon Linux 2

A Practical Example with GhostScript for PDF Processing

Ali Saif
AxOps Academy


Integrating tooling such as GhostScript into serverless applications on AWS opens up a wide array of solution architectures for advanced PDF processing use-cases.

This post is inspired by our day-to-day work at AxOps, where we regularly solve engineering problems on AWS that require us to build our own solutions when the necessary tooling isn’t supported out-of-the-box.

This post is, therefore, a quick, no-fluff tutorial for developers interested in using GhostScript for performing PDF processing in AWS Lambda Functions with Python runtimes, a problem we had to solve by building our own Lambda Layer from scratch.

Note: A quick Google Search leads to plenty of publicly available Lambda Layers for GhostScript. In order to comply with our clients’ security and compliance requirements, we simply couldn’t use them — hence the need to build our own!

A working knowledge of AWS, Docker, Python, and Linux is assumed. However, if you need assistance, here’s a post to get you up to speed on Setting up Amazon Linux 2 Docker Containers on macOS.

Let’s get into it 🛠

Step 1: Pull the Amazon Linux 2 Image

The foundation of our setup starts with pulling the Amazon Linux 2 image using Docker:

docker pull amazonlinux:2

Step 2: Start a Docker Container

Launch a Docker container, mounting a local directory for convenient access to files on your local machine.

For example, on macOS, the following command makes /Users/{your_username}/Desktop/mydir on your local filesystem accessible from within the Docker container via /mnt. The --platform linux/amd64 flag runs an x86_64 container, so the resulting binary is built for x86_64 even on an Apple silicon Mac:

docker run --platform linux/amd64 -v /Users/{your_username}/Desktop/mydir:/mnt -it amazonlinux:2 bash

Step 3: Configure Yum Cache Directory (optional)

If you need to use a different directory as your yum cache, then inside the Docker container, either edit the /etc/yum.conf file directly to set cachedir to a different directory, or use the sed command as shown below:

sed -i 's|^cachedir=.*|cachedir=/mnt/cache|g' /etc/yum.conf

Step 4: Install Build Tools

You’ll need to install some essential build tools, including wget, tar, gzip, zip, GCC, and Make.

yum install -y wget tar gzip zip gcc make

Step 5: Download and Extract GhostScript

Fetch the source code for GhostScript v9.54.0 (the version we used) from the official releases repository and extract it via the commands below:

wget https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs9540/ghostpdl-9.54.0.tar.gz
tar -xzf ghostpdl-9.54.0.tar.gz

Step 6: Build GhostScript from Source

Compile GhostScript, a process which might take a while 🕐

cd ghostpdl-9.54.0
./configure
make

Step 7: Verify and Prepare the GhostScript Binary

Once built, verify the GhostScript binary is present and set executable permissions.

ls -l bin/gs
chmod 755 bin/gs
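
As an optional sanity check, it can help to confirm that the binary was built for the architecture your Lambda function will run on. The snippet below is a minimal sketch, not part of the original build steps: it reads the ELF header of a file and reports the machine type. The function name elf_architecture is hypothetical, and only the two machine codes relevant to Lambda are mapped.

```python
import struct

# e_machine values from the ELF specification (only the two Lambda uses)
ELF_MACHINES = {0x3E: "x86_64", 0xB7: "arm64"}


def elf_architecture(path):
    """Return 'x86_64', 'arm64', 'unknown', or None if the file is not ELF."""
    with open(path, "rb") as f:
        header = f.read(20)
    # Every ELF file starts with the 4-byte magic number 0x7F 'E' 'L' 'F'
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown")
```

If the container was started with --platform linux/amd64, calling elf_architecture("bin/gs") from the ghostpdl-9.54.0 directory should report x86_64. Note that amazonlinux:2 does not necessarily ship with Python 3, so you may prefer to run this on your local machine against the mounted directory.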

Step 8: Prepare the Lambda Layer

Create a directory for the AWS Lambda layer (in our case, this is /mnt/medium.com/aws-lambda-ghostscript) and copy the bin and lib directories into it from the GhostScript directory ghostpdl-9.54.0 that you’re currently in. The -p flag creates any missing parent directories.

mkdir -p /mnt/medium.com/aws-lambda-ghostscript/
cp -r bin lib /mnt/medium.com/aws-lambda-ghostscript/

Step 9: Create a Zip Package for AWS Lambda

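Now we can package our directory’s contents into a zip file. On Linux, zip records Unix file permissions by default, and the -y flag stores symbolic links as links rather than copying the files they point to: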

cd /mnt/medium.com/aws-lambda-ghostscript/
zip -r -y ghostscript_lambda_layer.zip .
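
Before uploading, you may want to confirm that the archive has the layout Lambda expects. The following is a minimal sketch under two assumptions: that the binary should sit at bin/gs at the top level of the archive, and that its executable bits survived zipping. The helper name check_layer_zip is hypothetical.

```python
import zipfile


def check_layer_zip(zip_path, binary="bin/gs"):
    """Return a list of problems found in the layer archive (empty if none)."""
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        if binary not in zf.namelist():
            problems.append(f"{binary} is not at the top level of the archive")
        else:
            info = zf.getinfo(binary)
            # For archives created on Unix, the upper 16 bits of
            # external_attr hold the file's mode bits
            mode = (info.external_attr >> 16) & 0o777
            if not mode & 0o111:
                problems.append(f"{binary} is not executable (mode {oct(mode)})")
    return problems
```

An empty list from check_layer_zip("ghostscript_lambda_layer.zip") suggests the binary will be extracted to /opt/bin/gs with its executable permission intact.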

Step 10: Exit Docker and Deploy the Lambda Layer

After exiting the Docker container, you’ll find the zip file in the mounted directory on your local machine. Upload the zipped layer to AWS Lambda, selecting “Custom Runtime Amazon Linux 2” under Runtime, along with any additional runtimes you need to support, e.g. Python 3.7 as in our case.
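
If you would rather script the upload than use the console, the sketch below shows one way to do it with boto3’s publish_layer_version call. Treat it as an illustration under assumptions: the layer name, description, and both helper function names are hypothetical, boto3 and AWS credentials must already be set up, and the runtime identifiers simply mirror the choices described above. Very large archives may need to be uploaded via S3 instead of directly.

```python
def build_publish_layer_args(zip_bytes, layer_name="ghostscript"):
    """Build the keyword arguments for Lambda's publish_layer_version call."""
    return {
        "LayerName": layer_name,  # hypothetical name, pick your own
        "Description": "GhostScript 9.54.0 built on Amazon Linux 2",
        "Content": {"ZipFile": zip_bytes},
        # provided.al2 is the identifier for the custom Amazon Linux 2 runtime
        "CompatibleRuntimes": ["provided.al2", "python3.7"],
        "CompatibleArchitectures": ["x86_64"],
    }


def publish_layer(zip_path, layer_name="ghostscript"):
    """Upload the zip as a new layer version and return its ARN."""
    import boto3  # third-party; imported here so the helper above has no dependencies

    with open(zip_path, "rb") as f:
        args = build_publish_layer_args(f.read(), layer_name)
    response = boto3.client("lambda").publish_layer_version(**args)
    return response["LayerVersionArn"]
```

Keeping the argument builder separate from the network call makes it easy to inspect what will be sent before publishing anything.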

Lambda extracts layer contents under /opt, so our binary should end up at /opt/bin/gs. To confirm that our function code can access it there, we’ll write a small test function.

Step 11: Testing GhostScript in AWS Lambda

Once deployed as a Lambda layer, you can test the integration with the following Python Lambda function:

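import json
import subprocess

def lambda_handler(event, context):
    # Ask the layer's binary for its version. Running gs with no arguments
    # would start its interactive prompt instead of returning right away.
    gs_command = ['/opt/bin/gs', '--version']
    output = subprocess.check_output(gs_command)

    return {
        'statusCode': 200,
        'body': json.dumps(output.decode('utf8'))
    }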
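
Once the test function returns successfully, the same pattern extends to actual PDF work. The sketch below is one illustrative possibility rather than the setup we used: it builds a GhostScript command line that rewrites a PDF through the pdfwrite device with one of the standard -dPDFSETTINGS presets. The function names and file paths are hypothetical. Inside Lambda, both files would need to live under /tmp, since that is the writable part of the filesystem.

```python
import subprocess

GS_BINARY = "/opt/bin/gs"  # where the layer's bin/gs lands inside Lambda


def build_compress_command(input_pdf, output_pdf, quality="ebook"):
    """Return the gs argument list for rewriting a PDF at a given preset."""
    return [
        GS_BINARY,
        "-sDEVICE=pdfwrite",          # write the result as a PDF
        f"-dPDFSETTINGS=/{quality}",  # presets include screen, ebook, printer, prepress
        "-dNOPAUSE",                  # do not pause between pages
        "-dBATCH",                    # exit when processing finishes
        "-dQUIET",                    # suppress the startup banner
        f"-sOutputFile={output_pdf}",
        input_pdf,
    ]


def compress_pdf(input_pdf, output_pdf, quality="ebook"):
    """Run gs and raise CalledProcessError if it exits with an error."""
    command = build_compress_command(input_pdf, output_pdf, quality)
    subprocess.run(command, check=True, capture_output=True)
    return output_pdf
```

Passing the command as a list avoids shell=True, which matters as soon as file names come from event data rather than constants.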

Conclusion

This guide provides a robust and straightforward method for AWS solutions architects and developers to integrate GhostScript into AWS Lambda. The process enables sophisticated PDF processing tasks not directly supported in AWS Lambda.

👉 If you found this post useful, please consider hitting the 👏 button 🙏
