Messaging on AWS

Michael Wittig – 01 Jul 2020 (updated 27 Nov 2020)

Previously, I compared all database options offered by AWS for you. In this post, I compare the available messaging options. The goal of messaging on AWS is to decouple the producers of messages from consumers.

The messaging pattern allows us to process the messages asynchronously. This has several advantages. You can roll out a new version of consumers of messages while the producers can continue to send new messages at full speed. You can also scale the consumers independently from the producers. You get a buffer in your system that can absorb spikes without overloading it.

In this blog post, I introduce all the messaging options that AWS offers. Afterward, I end with a comparison table of the options.

This is a cross-post from the Cloudcraft blog.

Amazon SQS Standard

Amazon Simple Queue Service (SQS) is a fully managed service. There is zero operational overhead, and you pay per message.

SQS comes in two flavors: Standard queues and FIFO queues. I’ll focus on standard queues in this section. The next section covers FIFO queues.

SQS standard queues are the most convenient option you can dream of. You can send a nearly unlimited number of messages and read them at any rate you wish. A message is stored in the queue for up to 14 days. Once you read and delete the message from the queue, it disappears. Usually, one message is received by one consumer only.
The following figure shows a typical SQS scenario. I’ve created the cloud diagram with Cloudcraft.

SQS Standard

Two limitations might surprise you. Firstly, SQS does not preserve the order of messages. It tries to, but there are no guarantees about message ordering. There is nothing you can do to fix that, so if you rely on order, SQS standard queues may not be for you.

Limitation number two: A message may be delivered more than once (at least once delivery). There are multiple reasons for that. A producer wants to send a message but receives an error (e.g., a network timeout). There is no way for the sender to know if the message was delivered or not. If the message is sent again to retry, the message could be created twice.

Consumers can also fail. If you read a message from SQS, the consumer has a certain amount of time to delete the message from the queue to acknowledge that it was processed successfully. If that acknowledgment comes too late, or not at all, because of a network error, the message will become available for a consumer again.
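The redelivery behavior on the consumer side can be sketched with a toy in-memory queue. This is a simplified model and not the SQS API: the class `InMemoryQueue`, its methods, and the explicit `now` clock are assumptions made up for illustration.

```python
class InMemoryQueue:
    """Toy stand-in for a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}         # message id -> body
        self.invisible_until = {}  # message id -> time it becomes visible again
        self.next_id = 0

    def send(self, body):
        self.next_id += 1
        self.messages[self.next_id] = body
        self.invisible_until[self.next_id] = 0
        return self.next_id

    def receive(self, now):
        """Return (id, body) of one visible message and hide it, or None."""
        for message_id, body in self.messages.items():
            if self.invisible_until[message_id] <= now:
                self.invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """Acknowledge that the message was processed successfully."""
        self.messages.pop(message_id, None)
        self.invisible_until.pop(message_id, None)


queue = InMemoryQueue(visibility_timeout=30)
queue.send("order-1")

first = queue.receive(now=0)    # a consumer receives the message
hidden = queue.receive(now=10)  # still hidden from other consumers
again = queue.receive(now=31)   # never deleted in time, so it is delivered again
queue.delete(again[0])          # this time the consumer acknowledges in time
empty = queue.receive(now=100)  # the message is gone for good
```

The second delivery at `now=31` is exactly the duplicate that a consumer has to be prepared for.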

Last but not least, the SQS service itself can also cause unwanted redelivery of a message. The best way to deal with at-least-once semantics is an idempotent consumer: one that ends up with the same result no matter how often it processes the same message.
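A minimal sketch of an idempotent consumer, assuming every message carries an ID that the producer assigns. The class `IdempotentConsumer`, the `credit` handler, and the in-memory set are illustrative assumptions; a real system would most likely keep the processed IDs in a durable store.

```python
class IdempotentConsumer:
    """Wraps a handler so that each message ID is processed only once."""

    def __init__(self, handler):
        self.handler = handler
        self.processed_ids = set()  # assumption: in-memory only, for illustration

    def handle(self, message_id, body):
        """Process the message once; return False if it was a duplicate."""
        if message_id in self.processed_ids:
            return False
        self.handler(body)
        self.processed_ids.add(message_id)
        return True


balances = {"alice": 0}


def credit(body):
    balances[body["account"]] += body["amount"]


consumer = IdempotentConsumer(credit)
deliveries = [
    ("msg-1", {"account": "alice", "amount": 10}),
    ("msg-1", {"account": "alice", "amount": 10}),  # redelivered duplicate
    ("msg-2", {"account": "alice", "amount": 5}),
]
results = [consumer.handle(message_id, body) for message_id, body in deliveries]
```

The ID should come from the producer (for example, an order number). If a producer retries a send, the queue sees two separate messages, so an ID generated by the queue would not reveal the duplicate.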

Amazon SQS FIFO

As mentioned in the previous section, FIFO queues guarantee order. They also provide a way to ensure that a message is not created twice if a producer retries sending it. The downside is: FIFO queues can only handle 300 messages per second, or 3000 messages per second if you send messages in batches of ten.

SQS FIFO

To consume messages in order, you have to use a single consumer. If you only need ordering within a subset of messages, you can define so-called message groups. The order is then only guaranteed within a group (e.g., the customer id could form a group).
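What "order within a group" means can be shown with a few lines of plain Python. This is only an illustration of the guarantee, not of the SQS API; the function `split_by_group` and the sample messages are made up.

```python
from collections import defaultdict


def split_by_group(messages):
    """Group (group_id, body) pairs by group, keeping arrival order per group."""
    groups = defaultdict(list)
    for group_id, body in messages:
        groups[group_id].append(body)
    return dict(groups)


messages = [
    ("customer-1", "created"),
    ("customer-2", "created"),
    ("customer-1", "paid"),
    ("customer-2", "cancelled"),
    ("customer-1", "shipped"),
]
per_customer = split_by_group(messages)
```

Each customer's events stay in the order they were sent, while nothing is promised about how the events of `customer-1` interleave with those of `customer-2`.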

Pro tip: Only use FIFO queues if you can ensure that no more than 300 messages are produced per second, even during rare traffic spikes.
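A quick back-of-the-envelope check for that tip, using the limits quoted in this post. I assume here that the 300 per second applies to send calls and that a batch holds at most ten messages; the function names are made up.

```python
FIFO_CALLS_PER_SECOND = 300  # limit quoted in this post
MAX_BATCH_SIZE = 10          # batch size quoted in this post


def fifo_max_throughput(batch_size=1):
    """Messages per second a FIFO queue can take at the given batch size."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and 10")
    return FIFO_CALLS_PER_SECOND * batch_size


def fits_fifo(peak_messages_per_second, batch_size=1):
    """True if the expected peak stays within the FIFO limit."""
    return peak_messages_per_second <= fifo_max_throughput(batch_size)
```

For example, a peak of 450 messages per second does not fit with single sends (`fits_fifo(450)` is false) but does fit once producers batch at least two messages per call.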

Amazon SNS Standard

Amazon SNS is a fully managed, publish/subscribe system. You send a message to a topic, and all subscribers to that topic will receive a copy of the message. SNS is similar to SQS: zero operational effort but no order guarantee and at least once delivery of messages.

SNS

SNS can be used to implement message fanout. Keep in mind that if a topic has zero subscribers, the message is sent into a black hole and disappears.
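The fanout behavior, including the black hole, can be sketched with a toy topic. This is a simplified model and not the SNS API; the class `Topic` and the inbox lists are assumptions for illustration.

```python
class Topic:
    """Toy publish/subscribe topic (not the SNS API)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        """Deliver a copy to every subscriber; return the number of deliveries."""
        for callback in self.subscribers:
            callback(message)
        return len(self.subscribers)


topic = Topic()
lost = topic.publish("nobody is listening")  # zero subscribers: the message is gone

billing_inbox, shipping_inbox = [], []
topic.subscribe(billing_inbox.append)
topic.subscribe(shipping_inbox.append)
delivered = topic.publish("order placed")    # both subscribers get a copy
```

Note that the first message is not stored anywhere: subscribers that join later never see it.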

SNS uses soft limits to throttle message producers. The default limits depend on the region. For example, in eu-west-1, the default limit is 9000 msg/sec. The hard limit is not disclosed.

Amazon SNS FIFO

An SNS FIFO topic guarantees order. SNS FIFO also provides a way to ensure that a message is not created twice if a producer retries in case of failures. Therefore, exactly-once delivery is possible.

The downsides are:

  • FIFO topics can only handle 300 messages per second, or 3000 messages per second if you send messages in batches of ten, and no more than 10 MB per second.
  • The only possible subscriber type for a FIFO topic is an SQS FIFO queue.

Amazon EventBridge

Amazon EventBridge (formerly CloudWatch Events) is a fully managed, publish/subscribe system. The publisher sends an event to an event bus. If you want to subscribe to events, you create a rule in an event bus. If a published event matches a rule, the event is routed to up to five targets. More than 15 target types are supported (including SQS, SNS, Lambda). EventBridge guarantees are similar to SNS and SQS: zero operational effort but no order guarantee and at-least-once delivery of messages.

EventBridge can be used to implement message fanout. Keep in mind that if an event does not match any rule, it disappears unnoticed. You can optionally archive all events delivered to an event bus. Archived events can be replayed at any time.
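Rule-based routing can be sketched as follows. This is a heavily simplified matcher that only supports exact-value lists and nested objects, not the full EventBridge pattern language; the functions `matches` and `route`, the rule layout, and all target names are assumptions for illustration.

```python
def matches(pattern, event):
    """Simplified matching: every pattern key must exist in the event, and the
    event value must be one of the listed values (or match a nested pattern)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(rules, event):
    """Return the targets of every rule whose pattern matches the event."""
    targets = []
    for rule in rules:
        if matches(rule["pattern"], event):
            targets.extend(rule["targets"])
    return targets


rules = [
    {
        "pattern": {"source": ["shop.orders"], "detail": {"status": ["paid", "shipped"]}},
        "targets": ["billing-queue", "audit-lambda"],  # hypothetical target names
    },
    {"pattern": {"source": ["shop.returns"]}, "targets": ["returns-queue"]},
]

event = {
    "source": "shop.orders",
    "detail-type": "OrderUpdated",
    "detail": {"status": "paid", "orderId": "42"},
}
unmatched = {"source": "shop.newsletter", "detail": {}}

routed = route(rules, event)       # fanout to both targets of the first rule
dropped = route(rules, unmatched)  # no rule matches: the event disappears
```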

EventBridge uses soft limits to throttle message producers. The default limits depend on the region. For example, in eu-west-1, the default limit is 10,000 msg/sec. The hard limit is not disclosed.

Amazon Kinesis Data Streams

Amazon Kinesis provides capabilities related to real-time data. I’ll focus primarily on data streams in this section.

If you send a message to a stream, it is appended to the end of the stream (similar to a queue). The difference comes when you read the messages. A message does not disappear from the stream when you read it. Instead, the consumer reads through a stream and keeps track of its position in the stream. You can, at any time, start reading from the beginning of the stream. The only limitation: Kinesis drops the data when it gets older than 7 days.

Kinesis guarantees to keep messages in order (FIFO). However, message ordering is only consistent within a shard. A shard is capable of handling up to 1 MB/s or 1000 messages/sec. You have to add/remove shards as needed and you pay for shards per hour.
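Reading from a stream without removing anything can be sketched like this. It is a toy model of a single shard and not the Kinesis API; the class `Stream` and the consumer names are made up.

```python
class Stream:
    """Toy append-only stream: reading never removes records."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # position of the new record

    def read(self, position, limit=10):
        """Return (records, next_position) starting at the given position."""
        batch = self.records[position:position + limit]
        return batch, position + len(batch)


stream = Stream()
for record in ["a", "b", "c"]:
    stream.append(record)

# Each consumer keeps track of its own position.
analytics_batch, analytics_position = stream.read(position=0, limit=2)
audit_batch, audit_position = stream.read(position=0)
rest, analytics_position = stream.read(analytics_position)

# Any consumer can start over from the beginning at any time.
replay, _ = stream.read(position=0)
```

Because the position lives with the consumer, a consumer that crashes before saving its position will read some records again after a restart.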

Kinesis Data Stream

There is no way for a producer to avoid the resend problem mentioned in the SQS standard queue section. If a producer has to retry sending a message, it could end up twice in the stream. Keep in mind that the consumer has to keep track of the position in the stream while reading, which can also lead to reading a message twice. I use Kinesis data streams in scenarios where SQS standard queues are not an option because I have to rely on some order within a subset of my data (e.g., customer id mapped to shards).
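How a customer id ends up on a shard can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit number and assigns it to the shard that owns that part of the hash range. The sketch assumes that the shards split the hash range evenly, which is not necessarily the case after resharding; the function `shard_for` is made up.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


events = [
    ("customer-1", "created"),
    ("customer-2", "created"),
    ("customer-1", "paid"),
]
shards = {}
for customer_id, event in events:
    shard = shard_for(customer_id, shard_count=4)
    shards.setdefault(shard, []).append((customer_id, event))
```

All records with the same customer id land on the same shard, which is why their relative order is preserved.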

Amazon MSK

Amazon Managed Streaming for Apache Kafka (MSK) offers Apache Kafka as a Service. You get a managed cluster and can start working with Kafka without the operational complexity. Kafka works similarly to Kinesis data streams.

The main benefits:

  • Kafka is open source, and you can use it outside of AWS.
  • Kafka topics can store your data forever if you wish.
  • Kafka can scale horizontally by adding brokers. Topics are divided into partitions, and you will find order only within a partition (the same as with Kinesis shards).

Amazon MQ

Amazon MQ offers Apache ActiveMQ as a Service. MQ is somewhat similar to how RDS deploys databases. Two instances are running in two availability zones. One of them is active and used by producers and consumers while all data is replicated to a standby broker. If the active broker fails, the producers and consumers reconnect to the standby broker.
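The reconnect behavior on the client side can be sketched as a loop over the known brokers. In practice, a client library usually handles this for you; the function `send_with_failover`, the exception `BrokerUnavailable`, and the fake brokers below are assumptions for illustration only.

```python
class BrokerUnavailable(Exception):
    """Raised by a (fake) broker connection that cannot be reached."""


def send_with_failover(brokers, message):
    """Try each broker in order; return the name of the one that accepted."""
    for name, send in brokers:
        try:
            send(message)
            return name
        except BrokerUnavailable:
            continue  # try the next broker in the list
    raise BrokerUnavailable("no broker reachable")


def unreachable(message):
    raise BrokerUnavailable("active broker is down")


standby_inbox = []
brokers = [("active", unreachable), ("standby", standby_inbox.append)]
accepted_by = send_with_failover(brokers, "hello")
```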

MQ

The problem with this architecture is obvious: The throughput of the system is limited to what a single broker (and storage) can provide. In this case, the storage layer is backed by Amazon EFS, which limits the throughput to 80 messages per second. You can choose other storage layers, but you will risk losing messages. Luckily, you can operate ActiveMQ in a network of brokers mode to increase the throughput. The most significant benefit is that ActiveMQ supports a wide range of protocols such as JMS, AMQP, MQTT, STOMP, OpenWire.

AWS IoT Core

AWS IoT Core is mostly used for IoT workloads, where a scalable MQTT broker is the foundation of the architecture. The order of messages is not guaranteed. The MQTT QoS levels at most once (QoS 0) and at least once (QoS 1) are supported. The nice benefit of IoT Core is that you not only get access to an MQTT broker, you can also define rules inside the system to react to messages. For example, you can trigger a Lambda function if a message is published on a topic.
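Reacting to messages on a topic boils down to matching topic names against topic filters. The following sketch implements the standard MQTT wildcards (`+` for exactly one level, `#` for all remaining levels) in simplified form. It is not the IoT Core rules engine; the functions `topic_matches` and `dispatch` and the rule list are made up.

```python
def topic_matches(topic_filter, topic):
    """Simplified MQTT topic filter matching with '+' and '#' wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: matches everything from here on
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    return len(filter_levels) == len(topic_levels)


def dispatch(rules, topic, payload):
    """Call the action of every rule whose filter matches the topic."""
    return [
        action(payload)
        for topic_filter, action in rules
        if topic_matches(topic_filter, topic)
    ]


alerts = []
rules = [
    ("sensors/+/temperature", alerts.append),  # stand-in for invoking a function
]
dispatch(rules, "sensors/kitchen/temperature", {"celsius": 21.5})
dispatch(rules, "sensors/kitchen/humidity", {"percent": 40})  # no rule matches
```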

Summary

There are many options with a broad range of capabilities. I compiled the following table to help you make your selection.

| | Amazon SQS Standard | Amazon SQS FIFO | Amazon SNS Standard | Amazon SNS FIFO | EventBridge | Amazon Kinesis Data Streams | Amazon MSK | Amazon MQ | AWS IoT Core |
|---|---|---|---|---|---|---|---|---|---|
| Scaling | nearly unlimited | 3000 msg/sec (batch write) | not disclosed (soft limits apply) | 3000 msg/sec (batch write) or 10 MB per second | not disclosed (soft limits apply) | 1 MB or 1000 msg/sec per shard; up to 500 shards; you need to manually add/remove shards | 30 brokers per cluster; you need to add/remove brokers and reassign partitions manually | 80 msg/sec; can be increased with a network of brokers | not disclosed |
| Max. message size | 256 KB | 256 KB | 256 KB | 256 KB | 256 KB | 1 MB | configurable (default 1 MB) | limited by disk space | 128 KB |
| Persistence | up to 14 days | up to 14 days | no | no | archiving is possible | up to 7 days | forever (up to 16384 GiB per broker) | forever (up to 200 GB) | up to 1 hour |
| Replication | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ (optional) | Multi-AZ (optional) | Multi-AZ |
| Order guarantee | no | yes | no | yes | no | within a shard | within a partition | yes | no |
| Delivery guarantee | at least once | exactly-once possible | at least once | exactly-once possible | at least once | at least once | up to the consumer | exactly once; supports distributed (XA) transactions | at least once / at most once |
| Pricing | per message | per message | per message | per message | per message | per shard hour | per broker hour + provisioned storage | per broker hour + used storage | per message + connection duration |
| Protocols | AWS Rest API | AWS Rest API | AWS Rest API | AWS Rest API | AWS Rest API | AWS Rest API | Kafka protocol | JMS, AMQP, MQTT, STOMP, OpenWire | MQTT, AWS Rest API |
| AWS Integrations | Lambda | Lambda | Lambda, SQS, webhook | SQS FIFO | Lambda, SQS, SNS, and many more | Lambda | n/a | n/a | Lambda, SQS, SNS, and many more |
| License | AWS only | AWS only | AWS only | AWS only | AWS only | AWS only | open source (Apache Kafka) | open source (Apache ActiveMQ) | AWS only |
| Encryption at rest | yes | yes | yes | yes | yes | yes | yes | yes | no |
| Encryption in transit | yes | yes | yes | yes | yes | yes | yes | yes | yes |

Michael Wittig

I’m an independent consultant, technical writer, and programming founder. All these activities have to do with AWS. I’m writing this blog and all other projects together with my brother Andreas.

In 2009, we joined the same company as software developers. Three years later, we were looking for a way to deploy our software—an online banking platform—in an agile way. We got excited about the possibilities in the cloud and the DevOps movement. It’s no wonder we ended up migrating the whole infrastructure of Tullius Walden Bank to AWS. This was a first in the finance industry, at least in Germany! Since 2015, we have accelerated the cloud journeys of startups, mid-sized companies, and enterprises. We have penned books like Amazon Web Services in Action and Rapid Docker on AWS, we regularly update our blog, and we are contributing to the Open Source community. Besides running a 2-headed consultancy, we are entrepreneurs building Software-as-a-Service products.
