Running Elasticsearch Curator as an AWS Lambda function

AWS Lambda functions can be used to run the Elasticsearch Curator CLI in a serverless way. This is convenient because you don’t need yet another server to run your Curator jobs, especially if you use a hosted Elasticsearch service and have no machines directly available.

AWS Lambda

AWS Lambda is a serverless compute service that runs event-triggered units of code called functions. Lambda functions can be written in languages including Java, Node.js, C# and Python, and common use cases include processing records as they arrive in Kinesis streams or objects as they are added to S3 buckets. Another useful approach is to move old-fashioned cron jobs to the cloud by combining Lambda functions with scheduled CloudWatch Events rules.

Since the price of running Lambda functions is based on execution time, this is much cheaper than keeping an EC2 virtual machine running 24/7. Lambda usage is measured in GB-seconds, calculated as memory (GB) × execution time (seconds): a function that runs for 0.5 seconds with 128 MB (0.125 GB) of memory uses 0.0625 GB-seconds. Even better, the first 400,000 GB-seconds each month are free. A single execution is capped by a maximum timeout (300 seconds when this was written, since raised to 15 minutes), so Lambda functions are not suitable for long-running tasks.
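As a rough sketch of the cron-job pattern, the Python function below wires an existing Lambda function to a scheduled CloudWatch Events rule using the boto3 `events` and `lambda` clients (`put_rule`, `add_permission`, `put_targets`). The function name `schedule_function`, the rule name, the schedule and the ARN are hypothetical placeholders, and the clients are passed in so the wiring can be exercised without AWS access.

```python
def schedule_function(events, lambda_client, function_arn, rule_name, schedule):
    """Trigger function_arn on a schedule via a CloudWatch Events rule.

    events / lambda_client are boto3 clients (or test doubles); all names
    used in the example below are hypothetical.
    """
    # 1. Create (or update) a rule that fires on the schedule expression,
    #    e.g. "cron(0 1 * * ? *)" for 01:00 UTC every day.
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow CloudWatch Events to invoke the function, for this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Point the rule at the function.
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}])
    return rule_arn


# Example (needs boto3 and AWS credentials):
#   import boto3
#   schedule_function(boto3.client("events"), boto3.client("lambda"),
#                     "arn:aws:lambda:...:function:curator",
#                     "curator-nightly", "cron(0 1 * * ? *)")
```

`put_rule` updates an existing rule with the same name, but `add_permission` raises a conflict error if the statement ID already exists, so a real deployment script would need to handle re-runs.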

Elasticsearch Curator

Elasticsearch Curator helps you manage your Elasticsearch indices and snapshots and is a handy tool for various maintenance tasks. It is a Python library that can be used either directly through its API or through its CLI, with tasks specified in action files. The Curator documentation contains several example action files for common tasks. Note that a single file can specify several actions, which are executed in order.
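To make the structure concrete, here is a hypothetical two-action file expressed as the Python data that `yaml.safe_load` would return for the equivalent `actions.yml`: the first action deletes `logstash-` indices older than 30 days and the second force-merges those older than 2 days. The option and filter names follow the Curator action-file documentation, while the index prefix and retention periods are assumptions. The hypothetical helper `execution_order` lists the actions in ascending key order, which is how the CLI walks the numbered entries.

```python
# Hypothetical actions.yml, shown as the structure yaml.safe_load() returns.
ACTIONS = {
    "actions": {
        1: {
            "action": "delete_indices",
            "description": "Delete logstash- indices older than 30 days",
            "options": {"ignore_empty_list": True, "disable_action": False},
            "filters": [
                {"filtertype": "pattern", "kind": "prefix", "value": "logstash-"},
                {
                    "filtertype": "age",
                    "source": "name",  # derive the age from the index name
                    "direction": "older",
                    "timestring": "%Y.%m.%d",
                    "unit": "days",
                    "unit_count": 30,
                },
            ],
        },
        2: {
            "action": "forcemerge",
            "description": "Force-merge logstash- indices older than 2 days",
            "options": {"max_num_segments": 1, "ignore_empty_list": True},
            "filters": [
                {"filtertype": "pattern", "kind": "prefix", "value": "logstash-"},
                {
                    "filtertype": "age",
                    "source": "name",
                    "direction": "older",
                    "timestring": "%Y.%m.%d",
                    "unit": "days",
                    "unit_count": 2,
                },
            ],
        },
    }
}


def execution_order(action_file):
    """Return action names in ascending key order, i.e. the order they run."""
    actions = action_file["actions"]
    return [actions[key]["action"] for key in sorted(actions)]
```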

Running the Curator CLI as a Lambda function

Using the Curator Python API in your Lambda function is pretty straightforward, but you may want to move existing Curator jobs that use the CLI with actions specified in action files. To do this, you can simply call the Curator CLI from within your Lambda function:

run("curator.yml", "actions.yml", dry_run=os.environ.get("DRY_RUN", "true").lower() != "false")
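A slightly fuller handler might look like the sketch below. It assumes Curator 5.x, whose `curator.cli` module provides the `run(config, action_file, dry_run)` function called above, and that `curator.yml` and the action file sit at the root of the deployment package. Environment variables are always strings, so passing `os.environ.get('DRY_RUN', True)` straight through would treat `DRY_RUN=False` as truthy; the hypothetical `env_flag` helper parses the value explicitly. `ACTION_FILE` is also a hypothetical variable, letting one package back several functions that each run a different bundled action file.

```python
import os


def env_flag(name, default=True):
    """Read a boolean from an environment variable (values are strings)."""
    value = os.environ.get(name)
    if value is None or not value.strip():
        return default
    # Only explicit false-like values disable the flag.
    return value.strip().lower() not in ("false", "0", "no", "off")


def handler(event, context):
    # Imported here so the module loads without Curator installed
    # (e.g. in tests); assumes Curator 5.x's curator.cli.run.
    from curator.cli import run

    # Hypothetical ACTION_FILE variable selects the bundled action file.
    action_file = os.environ.get("ACTION_FILE", "actions.yml")
    # Dry run stays on unless DRY_RUN is explicitly set to a false value.
    run("curator.yml", action_file, dry_run=env_flag("DRY_RUN"))
```

Defaulting to a dry run means a freshly deployed function only logs what it would do until you deliberately set `DRY_RUN=false`.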

Packaging the action file together with the Lambda function when you deploy it lets you set up separate functions for different Curator jobs. To simplify this further, I have created a small lambda-curator package on GitHub with a full implementation, including packaging and deployment of the function.
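Packaging can be as simple as zipping a build directory. The sketch below uses only the standard library and assumes a hypothetical layout in which `src_dir` already contains the handler module, `curator.yml`, the action files and Curator's dependencies (for example installed with `pip install elasticsearch-curator -t src_dir`). The function name `build_package` is illustrative, not part of lambda-curator.

```python
import os
import zipfile


def build_package(src_dir, zip_path):
    """Zip every file under src_dir, storing paths relative to src_dir.

    zip_path should live outside src_dir, otherwise the archive would
    try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Files must sit at the archive root so relative paths
                # like "curator.yml" resolve inside the function.
                zf.write(full_path, os.path.relpath(full_path, src_dir))
    return zip_path
```

The resulting archive can then be uploaded, for example with `aws lambda update-function-code --function-name <name> --zip-file fileb://function.zip`.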

Integration with AWS Elasticsearch

If you run your Elasticsearch cluster on AWS Elasticsearch, you get another nice feature out of the box. When aws_sign_request is set in curator.yml, Curator signs its requests with the credentials of your Lambda function’s IAM role, so the requests are authenticated against your cluster through IAM, provided the domain’s access policy grants that role access.
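If you would rather not hard-code the cluster endpoint in the bundled `curator.yml`, one option is to write the file at invocation time to `/tmp`, the writable directory in Lambda. The sketch below renders a minimal client configuration with `aws_sign_request` enabled; `host` would be your domain endpoint and `region` its AWS region, for example taken from a hypothetical `ES_HOST` variable and the `AWS_REGION` variable Lambda sets. The keys follow the Curator 5.x client configuration documentation, but treat the exact set as an assumption to check against your Curator version.

```python
def write_curator_config(path, host, region):
    """Write a minimal curator.yml for an AWS Elasticsearch domain.

    aws_sign_request asks Curator to sign requests with the AWS credentials
    in its environment (inside Lambda: the execution role); aws_region is
    set alongside it.
    """
    config = (
        "client:\n"
        "  hosts:\n"
        f"    - {host}\n"
        "  port: 443\n"
        "  use_ssl: True\n"
        "  aws_sign_request: True\n"
        f"  aws_region: {region}\n"
        "logging:\n"
        "  loglevel: INFO\n"
    )
    with open(path, "w") as fh:
        fh.write(config)
    return path
```

The handler would then pass the returned path, such as `/tmp/curator.yml`, as the first argument to `run` instead of the bundled file.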