Messaging on AWS

Michael Wittig, updated 22 Feb 2021

Previously, I compared all the database options offered by AWS. In this post, I compare the available messaging options. The goal of messaging is to decouple the producers of messages from their consumers.

The messaging pattern allows us to process the messages asynchronously. This has several advantages. You can roll out a new version of consumers of messages while the producers can continue to send new messages at full speed. You can also scale the consumers independently from the producers. You get a buffer in your system that can absorb spikes without overloading it.



In this blog post, I introduce all the messaging options that AWS offers and end with a comparison table of the options.

This is a cross-post from the Cloudcraft blog.

Amazon SQS Standard

Amazon Simple Queue Service (SQS) is a fully managed service. There is zero operational overhead, and you pay per message.

SQS comes in two flavors: Standard queues and FIFO queues. I’ll focus on standard queues in this section. The next section covers FIFO queues.

SQS standard queues are the most convenient option you can dream of. You can send a nearly unlimited number of messages and read them at any rate you wish. A message is stored in the queue for up to 14 days (the retention period is configurable; the default is 4 days). Once you read and delete a message, it disappears from the queue. Usually, each message is received by only one consumer.

The following figure shows a typical SQS scenario. I created the cloud diagram with Cloudcraft.

SQS Standard

Two limitations might surprise you. Firstly, SQS does not preserve the order of messages. It tries to, but there are no guarantees about message ordering. There is nothing you can do about it, so if you rely on order, SQS standard queues may not be for you.

Limitation number two: a message may be delivered more than once (at-least-once delivery). There are multiple reasons for that. Suppose a producer sends a message but receives an error (e.g., a network timeout). The producer has no way to know whether the message was delivered or not. If it retries, the message could be created twice.

Consumers can also fail. After a consumer receives a message from SQS, it has a limited amount of time (the visibility timeout) to delete the message from the queue, which acknowledges that the message was processed successfully. If the deletion comes too late, or not at all (for example, because of a network error), the message becomes visible to consumers again.
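To make the visibility timeout concrete, here is a toy in-memory model of the receive/delete cycle. It is a simulation written for illustration, not the AWS SDK: the class and method names are hypothetical, plain message IDs stand in for the receipt handles that real SQS uses for deletion, and time is passed in explicitly.

```python
class StandardQueueSim:
    """Toy model of SQS receive/delete with a visibility timeout (not the AWS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}         # message ID -> body
        self.invisible_until = {}  # message ID -> time at which it becomes visible again
        self._next_id = 0

    def send(self, body):
        msg_id = self._next_id
        self._next_id += 1
        self.messages[msg_id] = body
        return msg_id

    def receive(self, now):
        """Return the first visible (ID, body) pair and hide it, or None."""
        for msg_id, body in self.messages.items():
            if self.invisible_until.get(msg_id, 0) <= now:
                self.invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Acknowledge successful processing by removing the message for good."""
        self.messages.pop(msg_id, None)
        self.invisible_until.pop(msg_id, None)


queue = StandardQueueSim(visibility_timeout=30)
msg_id = queue.send("resize-image-42")
first = queue.receive(now=0)    # a consumer receives the message ...
hidden = queue.receive(now=10)  # ... which is now invisible to other consumers
# The consumer crashes and never deletes the message.
again = queue.receive(now=31)   # after the timeout, the message is delivered again
queue.delete(msg_id)            # this time, the consumer acknowledges it
print(first, hidden, again)     # (0, 'resize-image-42') None (0, 'resize-image-42')
```

The redelivery at `now=31` is exactly the at-least-once behaviour described above: the queue cannot distinguish a crashed consumer from a slow one.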

Last but not least, the SQS service itself can also cause a message to be delivered again. The best way to deal with at-least-once semantics is an idempotent consumer: processing the same message twice has the same effect as processing it once.
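A minimal sketch of an idempotent consumer follows. It assumes the producer puts a unique business ID into every message (the hypothetical `id` field); the SQS-assigned message ID does not help with producer retries, because a resent message gets a new one. The in-memory set stands in for a durable store; in production, the duplicate check and the side effect should be committed atomically, for example with a conditional write to a database.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Applies each message at most once, keyed by a producer-assigned ID."""

    processed_ids: set = field(default_factory=set)  # stand-in for a durable store
    balance: int = 0  # example side effect: running total of deposits

    def handle(self, message):
        """Process a message; return False if it is a duplicate."""
        if message["id"] in self.processed_ids:
            return False  # duplicate: skip the side effect, but still delete it from the queue
        self.balance += message["amount"]
        self.processed_ids.add(message["id"])
        return True


# At-least-once delivery: the deposit "d-1" arrives twice.
deliveries = [
    {"id": "d-1", "amount": 10},
    {"id": "d-2", "amount": 5},
    {"id": "d-1", "amount": 10},
]
consumer = IdempotentConsumer()
results = [consumer.handle(m) for m in deliveries]
print(results, consumer.balance)  # [True, True, False] 15
```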

Amazon SQS FIFO

Unlike standard queues, FIFO queues guarantee order. They also provide a way to ensure that a message is not created twice if a producer retries sending it. The downside: FIFO queues can only handle 300 messages per second, or 3000 messages per second if you send messages in batches of ten.
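The deduplication behaviour can be modelled as follows. This is a simplified simulation, not the SQS API: it assumes the five-minute deduplication interval that SQS FIFO queues use and mimics content-based deduplication by hashing the message body with SHA-256 when no explicit deduplication ID is given. The class name is hypothetical.

```python
import hashlib

DEDUP_INTERVAL_SECONDS = 5 * 60  # SQS FIFO deduplication interval


class FifoDedupSim:
    """Simplified model of SQS FIFO deduplication (not the AWS API)."""

    def __init__(self):
        self.messages = []  # enqueued bodies, in send order
        self._seen = {}     # deduplication ID -> time it was first accepted

    def send(self, body, now, dedup_id=None):
        """Enqueue `body` unless the same dedup ID was seen within the interval."""
        if dedup_id is None:
            # Content-based deduplication: derive the ID from the message body.
            dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first_seen = self._seen.get(dedup_id)
        if first_seen is not None and now - first_seen < DEDUP_INTERVAL_SECONDS:
            return False  # the send "succeeds", but no second copy is enqueued
        self._seen[dedup_id] = now
        self.messages.append(body)
        return True


queue = FifoDedupSim()
print(queue.send("order-1", now=0))    # True: first send
print(queue.send("order-1", now=10))   # False: the producer's retry is dropped
print(queue.send("order-1", now=400))  # True: the interval has passed
print(queue.messages)                  # ['order-1', 'order-1']
```

Note the last call: deduplication only protects retries that happen within the interval, so a producer that retries much later can still create a duplicate.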

SQS FIFO

To consume all messages in one global order, you have to process them with a single consumer. If you only need order within a subset of messages, you can define so-called message groups: order is then only guaranteed within a group (e.g., each customer ID could form a group), and different groups can be processed in parallel.
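Here is a sketch of how message groups allow parallelism without breaking per-group order. It is a simplified model, not the SQS API: the class is hypothetical, a group is locked while its head message is in flight, and visibility timeouts are ignored.

```python
from collections import deque


class MessageGroupQueueSim:
    """Simplified model of SQS FIFO message groups (not the AWS API)."""

    def __init__(self):
        self.groups = {}        # group ID -> deque of pending bodies, in send order
        self.in_flight = set()  # groups whose head message is currently being processed

    def send(self, group_id, body):
        self.groups.setdefault(group_id, deque()).append(body)

    def receive(self):
        """Return (group ID, body) for the head of the first unlocked group, or None."""
        for group_id, pending in self.groups.items():
            if pending and group_id not in self.in_flight:
                self.in_flight.add(group_id)  # lock the group until the message is deleted
                return group_id, pending[0]
        return None

    def delete(self, group_id):
        """Acknowledge the group's head message and unlock the group."""
        self.groups[group_id].popleft()
        self.in_flight.discard(group_id)


queue = MessageGroupQueueSim()
queue.send("customer-a", "a1")
queue.send("customer-a", "a2")
queue.send("customer-b", "b1")

worker_1 = queue.receive()  # ('customer-a', 'a1')
worker_2 = queue.receive()  # ('customer-b', 'b1'): group a is locked, b runs in parallel
idle = queue.receive()      # None: both groups have a message in flight
queue.delete("customer-a")
next_a = queue.receive()    # ('customer-a', 'a2'): a2 only after a1 was acknowledged
```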

Pro tip: Only use FIFO queues if you can ensure that no more than 300 messages are produced per second, even during rare traffic spikes.

Amazon SNS Standard

Amazon SNS is a fully managed publish/subscribe system. You send a message to a topic, and all subscribers to that topic receive a copy of the message. SNS is similar to SQS: zero operational effort, but no order guarantee and at-least-once delivery of messages.

SNS

SNS can be used to implement message fanout. Keep in mind that if a topic has zero subscribers, the message is sent into a black hole and disappears.
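The fanout semantics, including the "black hole" case, fit in a few lines. This is an in-memory stand-in with hypothetical names, not the SNS API; plain Python lists play the role of subscribed SQS queues.

```python
class TopicSim:
    """In-memory stand-in for an SNS standard topic (not the AWS API)."""

    def __init__(self):
        self.subscribers = []  # callables that receive a copy of every message

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, message):
        """Deliver a copy to every subscriber; return how many copies were sent."""
        for handler in self.subscribers:
            handler(message)
        return len(self.subscribers)  # 0 means the message is simply gone


topic = TopicSim()
lost = topic.publish("order-created")  # no subscribers yet: nobody will ever see it
billing_queue, shipping_queue = [], []
topic.subscribe(billing_queue.append)
topic.subscribe(shipping_queue.append)
delivered = topic.publish("order-paid")
print(lost, delivered, billing_queue, shipping_queue)  # 0 2 ['order-paid'] ['order-paid']
```

Subscribing an SQS queue per consumer, as the lists do here, is a common way to give each consumer its own buffer behind a topic.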

SNS uses soft limits to throttle message producers. The default limits depend on the region. For example, in eu-west-1, the default limit is 9000 msg/sec. The hard limit is not disclosed.

Amazon SNS FIFO

SNS FIFO topics guarantee order. SNS FIFO also provides a way to ensure that a message is not created twice if a producer retries after a failure. Therefore, exactly-once delivery is possible.

The downsides are:

  • FIFO topics can only handle 300 messages per second, or 3000 messages per second if you send messages in batches of ten, but no more than 10 MB per second.
  • The only possible subscriber type for a FIFO topic is an SQS FIFO queue.

Amazon EventBridge

Amazon EventBridge (formerly CloudWatch Events) is a fully managed publish/subscribe system. The publisher sends an event to an event bus. To subscribe to events, you create a rule on the event bus. If a published event matches a rule, the event is routed to up to five targets. More than 15 target types are supported (including SQS, SNS, and Lambda). EventBridge's guarantees are similar to those of SNS and SQS: zero operational effort, but no order guarantee and at-least-once delivery of messages.
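EventBridge rules select events with JSON event patterns. The function below is a deliberately simplified, hypothetical matcher that only supports exact-value matching: a pattern field lists the allowed values, nested objects are matched recursively, and an array in the event matches if any element is allowed. Real event patterns also support prefix, numeric, exists, and other filters; the rule names and event fields are made up for illustration.

```python
def matches(pattern, event):
    """Simplified event-pattern matching: exact values only (no prefix, numeric, exists, ...)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event field must be an object that matches recursively.
            if not isinstance(value, dict) or not matches(expected, value):
                return False
        else:
            # A pattern leaf lists the allowed values; an event array matches
            # if any of its elements is allowed.
            candidates = value if isinstance(value, list) else [value]
            if not any(candidate in expected for candidate in candidates):
                return False
    return True


# Hypothetical rules on one event bus; each would route to its own targets.
rules = {
    "paid-orders": {"source": ["shop"], "detail-type": ["OrderPaid"]},
    "selected-amounts": {"source": ["shop"], "detail": {"amount": [100, 500]}},
}

paid = {"source": "shop", "detail-type": "OrderPaid", "detail": {"amount": 100}}
contact = {"source": "crm", "detail-type": "ContactCreated", "detail": {}}

print([name for name, p in rules.items() if matches(p, paid)])     # ['paid-orders', 'selected-amounts']
print([name for name, p in rules.items() if matches(p, contact)])  # []: the event is silently dropped
```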

EventBridge can be used to implement message fanout. Keep in mind that if an event does not match any rule, it disappears unnoticed. Optionally, you can archive all events delivered to an event bus and replay archived events at any time.

EventBridge uses soft limits to throttle message producers. The default limits depend on the region. For example, in eu-west-1, the default limit is 10,000 msg/sec. The hard limit is not disclosed.

Amazon Kinesis Data Streams

Amazon Kinesis provides capabilities related to real-time data. I’ll focus primarily on data streams in this section.

If you send a message to a stream, it is appended to the end of the stream (similar to a queue). The difference shows when you read the messages: a message does not disappear from the stream when you read it. Instead, each consumer reads through the stream and keeps track of its own position. At any time, you can start reading from the beginning of the stream again. The only limitation: Kinesis drops data once it is older than the stream's retention period (24 hours by default, configurable up to 365 days). Kinesis guarantees to keep messages in order (FIFO), but only within a shard. A shard can ingest up to 1 MB/s or 1000 messages/sec. You have to add and remove shards as needed, and you pay per shard hour.
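The key difference from a queue, reads that do not remove anything, can be sketched with an append-only list. This is a simplified model of one shard with a hypothetical class name: positions are list indices rather than Kinesis sequence numbers and shard iterators, and retention is not modelled.

```python
class ShardSim:
    """Append-only log modelling a single Kinesis shard (simplified, not the AWS API)."""

    def __init__(self):
        self.records = []

    def put(self, data):
        """Append a record and return its position in the shard."""
        self.records.append(data)
        return len(self.records) - 1

    def read(self, position, limit=10):
        """Return up to `limit` records starting at `position`, plus the next position."""
        batch = self.records[position:position + limit]
        return batch, position + len(batch)


shard = ShardSim()
for event in ["e1", "e2", "e3"]:
    shard.put(event)

batch_a, next_a = shard.read(0, limit=2)  # consumer A tracks its own position
batch_b, next_b = shard.read(0)           # consumer B reads independently
replay, _ = shard.read(0)                 # reading never removes records
print(batch_a, next_a, batch_b, replay)   # ['e1', 'e2'] 2 ['e1', 'e2', 'e3'] ['e1', 'e2', 'e3']
```

Because the position lives with the consumer, a consumer that crashes before saving `next_a` will read `e1` and `e2` again after a restart.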

Kinesis Data Stream

There is no way for a producer to avoid the resend problem mentioned in the SQS standard queue section: if a producer has to retry sending a message, the message could end up in the stream twice. Keep in mind that the consumer has to keep track of its position in the stream while reading, which can also lead to reading a message twice (for example, when a consumer restarts from the last position it saved). I use Kinesis data streams in scenarios where SQS standard queues are not an option because I have to rely on order within a subset of my data (e.g., all messages of one customer, using the customer ID as the partition key that maps messages to shards).
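Kinesis maps each partition key to a 128-bit integer with MD5, and each shard owns a range of that hash key space. The sketch below assumes the shards split the range evenly (as they do for a freshly created stream); after shards are split or merged, the ranges can be uneven and this simple formula no longer applies. The function name is hypothetical.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # partition keys are mapped to 128-bit integers via MD5


def shard_for(partition_key, shard_count):
    """Index of the shard owning the key, assuming shards split the hash key range evenly."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_KEY_SPACE


# Records with the same partition key always land on the same shard,
# so the order of one customer's events is preserved.
events = [("customer-42", "created"), ("customer-7", "created"), ("customer-42", "paid")]
shards = {}
for key, payload in events:
    shards.setdefault(shard_for(key, 4), []).append((key, payload))
```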

Amazon MSK

Amazon Managed Streaming for Apache Kafka (MSK) offers Apache Kafka as a service. You get a managed cluster and can start working with Kafka without the operational complexity. Kafka works in a similar way to Kinesis data streams.

The main benefits:

  • Kafka is open source, and you can use it outside of AWS.
  • Kafka topics can store your data forever if you wish.
  • Kafka can scale horizontally by adding brokers. Topics are divided into partitions, and you will find order only within a partition (the same as with Kinesis shards).

Amazon MQ

Amazon MQ offers Apache ActiveMQ as a service. It deploys brokers in a similar way to how RDS deploys databases: two instances run in two availability zones. One of them is active and used by producers and consumers, while all data is replicated to a standby broker. If the active broker fails, the producers and consumers reconnect to the standby broker.
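On the client side, ActiveMQ clients typically get the reconnect behaviour from the failover transport (a `failover:(...)` connection URI listing both brokers). The following sketch only imitates that idea in plain Python with hypothetical classes; it does not model the replicated storage that lets the standby see messages the active broker already accepted.

```python
class BrokerSim:
    """Stand-in for one broker instance; only the active one accepts messages."""

    def __init__(self, name, active):
        self.name = name
        self.active = active
        self.received = []

    def send(self, message):
        if not self.active:
            raise ConnectionError(f"{self.name} is not accepting connections")
        self.received.append(message)


class FailoverProducer:
    """Tries the known brokers in turn, in the spirit of ActiveMQ's failover transport."""

    def __init__(self, brokers):
        self.brokers = brokers
        self.current = 0  # index of the broker we are connected to

    def send(self, message):
        for _ in range(len(self.brokers)):
            broker = self.brokers[self.current]
            try:
                broker.send(message)
                return broker.name
            except ConnectionError:
                # Reconnect to the next broker in the list and retry.
                self.current = (self.current + 1) % len(self.brokers)
        raise ConnectionError("no broker available")


primary = BrokerSim("broker-az-a", active=True)
standby = BrokerSim("broker-az-b", active=False)
producer = FailoverProducer([primary, standby])

print(producer.send("m1"))  # broker-az-a
primary.active = False      # the active broker fails ...
standby.active = True       # ... and the standby takes over
print(producer.send("m2"))  # broker-az-b
```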

MQ

The problem with this architecture is obvious: the throughput of the system is limited to what a single broker (and its storage) can provide. In this case, the storage layer is backed by Amazon EFS, which limits the throughput to 80 messages per second. You can choose other storage layers, but you risk losing messages. Luckily, you can operate ActiveMQ in a network of brokers mode to increase the throughput. The most significant benefit of Amazon MQ is that ActiveMQ supports a wide range of protocols: JMS, AMQP, MQTT, STOMP, and OpenWire.

AWS IoT Core

AWS IoT Core is mostly used for IoT workloads, where a scalable MQTT broker is the foundation of the architecture. The order of messages is not guaranteed. The MQTT QoS levels 0 (at most once) and 1 (at least once) are supported. A nice benefit of IoT Core is that you get more than an MQTT broker: you can also define rules inside the system that react to messages. For example, you can trigger a Lambda function whenever a message is published on a topic.
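IoT Core rules select messages with an SQL statement whose FROM clause is an MQTT topic filter, such as `SELECT * FROM 'sensors/+/temperature'`. The sketch below implements the standard MQTT 3.1.1 wildcard rules ('+' matches exactly one level, '#' matches the parent level and everything below it) to show which topics such a filter covers. It illustrates the matching semantics only; it is not IoT Core's implementation and does not validate malformed filters.

```python
def topic_matches(topic_filter, topic):
    """MQTT 3.1.1 topic-filter matching with '+' and '#' wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Wildcards at the first level must not match topics starting with '$' (e.g. $aws/...).
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: matches the rest, including nothing
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level does not match
    return len(filter_levels) == len(topic_levels)


print(topic_matches("sensors/+/temperature", "sensors/kitchen/temperature"))  # True
print(topic_matches("sensors/+/temperature", "sensors/kitchen/humidity"))     # False
print(topic_matches("sensors/#", "sensors"))                                  # True
```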

Summary

There are many options with a broad range of capabilities. I compiled the following table to help you make your selection.

| | Amazon SQS Standard | Amazon SQS FIFO | Amazon SNS Standard | Amazon SNS FIFO | Amazon EventBridge (formerly CloudWatch Events) | Amazon Kinesis Data Streams | Amazon MSK | Amazon MQ | AWS IoT Core |
|---|---|---|---|---|---|---|---|---|---|
| Scaling | nearly unlimited | 3000 msg/sec (batch write) | not disclosed (default soft limit depends on region; e.g., 9000 msg/sec in eu-west-1) | 3000 msg/sec (batch write) or 10 MB per second | not disclosed (default soft limit depends on region; e.g., 10,000 msg/sec in eu-west-1) | 1 MB/sec or 1000 msg/sec per shard; up to 500 shards; you need to add/remove shards manually | 30 brokers per cluster; you need to add/remove brokers and reassign partitions manually | 80 msg/sec; can be increased with a network of brokers | not disclosed |
| Max. message size | 256 KB | 256 KB | 256 KB | 256 KB | 256 KB | 1 MB | configurable (default 1 MB) | limited by disk space | 128 KB |
| Persistence | up to 14 days | up to 14 days | no | no | archiving is possible | up to 365 days | forever (up to 16384 GiB per broker) | forever (up to 200 GB) | up to 1 hour |
| Replication | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ (optional) | Multi-AZ (optional) | Multi-AZ |
| Order guarantee | no | yes | no | yes | no | within a shard | within a partition | yes | no |
| Delivery guarantee | at least once | exactly-once possible | at least once | exactly-once possible | at least once | at least once | up to the consumer | exactly once; supports distributed (XA) transactions | at least once / at most once |
| Pricing | per message | per message | per message | per message | per message | per shard hour | per broker hour + provisioned storage | per broker hour + used storage | per message + connection duration |
| Protocols | AWS REST API | AWS REST API | AWS REST API | AWS REST API | AWS REST API | AWS REST API | Kafka protocol | JMS, AMQP, MQTT, STOMP, OpenWire | MQTT, AWS REST API |
| AWS integrations | Lambda | Lambda | Lambda, SQS, webhook | SQS FIFO | Lambda, SQS, SNS, and many more | Lambda | Lambda | Lambda | Lambda, SQS, SNS, and many more |
| License | AWS only | AWS only | AWS only | AWS only | AWS only | AWS only | open source (Apache Kafka) | open source (Apache ActiveMQ) | AWS only |
| Encryption at rest | yes | yes | yes | yes | yes | yes | yes | yes | no |
| Encryption in transit | yes | yes | yes | yes | yes | yes | yes | yes | yes |

Michael Wittig


I’ve been building on AWS since 2012 together with my brother Andreas. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV, HyperEnv for GitHub Actions, and marbot.
