Amazon Elasticsearch Service revised

Andreas Wittig – 22 May 2017

AWS first! is one of our consulting principles. Using a managed service provided by AWS is usually offering the most bang for the buck. But there are pitfalls and downsides hidden behind shiny marketing promises of these managed services as well.

Amazon Elasticsearch Service is offering the popular open-source search and analytics engine as a service. You can benefit from many advantages when using the service as described on AWS’s marketing page:

Easy to Use
Highly Available and Secure
Easily Scalable

But that’s only one part of the story. There are downsides when using the Amazon Elasticsearch Service as well. I will highlight common challenges when using the managed service in real-world scenarios as well as illustrate workarounds in this article.

VPC

Placing a database into a VPC is possible with the Relational Database Service (SQL database) and ElastiCache (in-memory database). The Amazon Elasticsearch Service does not support the Virtual Private Cloud (VPC). Accessing the service is only possible with a connection via the Internet.

Elasticsearch running outside VPC

All traffic from EC2 instances running in a private subnet to the Elasticsearch database running outside the VPC flows through a NAT gateway. The additional traffic is causing costs and consuming a limited resource.

Using a service outside the VPC is a no go because of compliance requirements in some scenarios as well.

Snapshot and Restore

Being able to restore your data is an important aspect of a managed service. Amazon Elasticsearch Service is creating snapshots of your data daily. Unfortunately, you are not able to restore a snapshot yourself. Only the AWS support team can restore a snapshot. Therefore, you need to create a support ticket if you want to restore your data. Wich increases the RTO (recovery time objective) and reduces your control over the restore process.

There is a workaround for this problem. Elasticsearch has built-in support for snapshots stored on S3. As illustrated in the following figure you can use AWS Lambda to trigger snapshots based on a schedule as well as cleaning up snapshots automatically.

Triggering Elasticsearch snapshots with Lambda

IAM

The Identity and Access Management (IAM) service is used to authenticate and authorize requests to the Amazon Elasticsearch Service. Almost every part of AWS offers integrations with IAM. As you can reuse IAM users, groups, and roles it is very easy to control access to your data stored within your Elasticsearch database.

But there some downsides caused by IAM as well. It is not possible to use the standard security mechanism provided by X-Pack (formerly Shield) with Amazon Elasticsearch Service.

A lot of tools from the Elasticsearch ecosystem do not support authentication with IAM out of the box. If you want to use logstash to ingest log data into your Elasticsearch database, an additional plugin is needed to handle the IAM-based authentication.

Amazon Elasticsearch Service offers Kibana out-of-the-box. But it is not possible to access Kibana when using IAM for authentication because your browser is not able to use IAM-based authentication. Using an IAM proxy as shown in the following figure allows you to work around this limitation.

IAM Proxy for Elasticsearch

Encryption

AWS is offering encryption integrated with KMS (Key Management Service) for almost all services storing data (RDS, EBS, S3, …). In contrast, the Amazon Elasticsearch Service does not provide server-side encryption for data-at-rest. Storing data unencrypted could be a no go if you must encrypt all data because of regulations.

Instance Types

The Amazon Elasticsearch Service does not support all instance types. For example, I3 instances would perfectly fit for a distributed database such as Elasticsearch but are not an option when using the Amazon Elasticsearch Service.

Limited Operations

Some Elasticsearch operations are not available when using the Amazon Elasticsearch Service. For example, it is not possible to open and close indices. A process that is required to make changes to analyzers.

Summary

I’d recommend using a managed service provided by AWS whenever possible. AWS is doing a great job at offering services fulfilling the needs of the majority of their customers. The Amazon Elasticsearch Service is taking away the burden of managing a distributed system. But if you are planning to use Amazon Elasticsearch Service you should consider the downsides I have experienced in real-world projects and documented in this article as well.

I’m planning to add a template to our collection of CloudFormation templates including workarounds for accessing Kibana (IAM Proxy) as well as Elasticsearch snapshots triggered by Lambda. Please let me know, if you’d be interested in such a template.

Thanks to Moritz Onken for providing his feedback regarding the Amazon Elasticsearch Service.

Andreas Wittig

I’ve been building on AWS since 2012 together with my brother Michael. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV,HyperEnv for GitHub Actions, and marbot.

Here are the contact options for feedback and questions.