Unboxing Amazon Timestream

Michael Wittig – 14 Oct 2020

My first job after graduation in 2011 was all about time-series data. My first task was to connect an exchange data feed with our on-premises time-series database (we used kdb+ by KX Systems). Whenever the exchange matches a buyer and a seller, a trade is published on its data feed. A trade contains the following information: time, financial instrument, price, and size of the deal. For example, at 8:10 am, 50 Amazon shares traded for $3,373.23. Trades are naturally ordered by time. But collecting data is not where the story ends.

Time-series data

My next step was to analyze the data, either for humans or for computers that buy and sell financial instruments. These are some of the questions I had to answer:

  • What is the latest price of the Amazon share on the New York Stock Exchange?
  • What is the best price of the Amazon share across all marketplaces at the moment?
  • Show me today’s Amazon share prices at a granularity of 1 minute
  • Show me the last three years of Amazon share prices at a daily granularity
  • Is the Amazon share cheap or expensive compared to a basket of financial instruments in the last month?
  • How strongly does the Amazon share price correlate with the Apple share price over the last 300 days?

One thing that all of those questions have in common is that they operate on a time series. In SQL terms, think of WHERE time>'2020-10-01T00:00:00.000', ORDER BY time, or GROUP BY time.minute. A time-series database is optimized for such queries.
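To make the pattern concrete, here is a minimal Python sketch of the "filter by time, order by time, group by minute" idea applied to a handful of trades. The trade values and the function name `last_price_per_minute` are made up for illustration; a time-series database does the same thing, just optimized for much larger volumes.

```python
from datetime import datetime

# Hypothetical trades: (time, symbol, price, size). Values are made up.
trades = [
    (datetime(2020, 10, 1, 8, 10, 5), "AMZN", 3373.23, 50),
    (datetime(2020, 10, 1, 8, 11, 2), "AMZN", 3372.90, 10),
    (datetime(2020, 10, 1, 8, 10, 40), "AMZN", 3373.50, 20),
    (datetime(2020, 9, 30, 15, 59, 59), "AMZN", 3360.00, 5),
]


def last_price_per_minute(trades, since):
    """Return {minute: last traded price} for trades strictly after `since`."""
    # WHERE time > since ... ORDER BY time
    recent = sorted((t for t in trades if t[0] > since), key=lambda t: t[0])
    buckets = {}
    for time, _symbol, price, _size in recent:
        # GROUP BY minute: truncate the timestamp to the full minute
        minute = time.replace(second=0, microsecond=0)
        # Trades arrive in time order, so later trades overwrite earlier ones
        buckets[minute] = price
    return buckets


result = last_price_per_minute(trades, since=datetime(2020, 10, 1))
for minute, price in result.items():
    print(minute.isoformat(), price)
```

The trade from Sep 30 is filtered out, and the two trades at 8:10 collapse into a single row carrying the later price.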

Introduction

So far, ingesting, storing, and analyzing time-series data on AWS was more challenging than it could be. Luckily, AWS released Amazon Timestream to all of us on Sep 30, 2020.

Amazon Timestream is a fully managed time-series database with zero operational overhead. Timestream auto-scales both the ingestion and the query processing layer. You pay for ingesting data, storing data, and analyzing data independently. Timestream keeps your latest data (up to one year) in memory and stores it in a durable way. Afterward, Timestream rolls the data to magnetic disks where you can keep it for up to 200 years. SSD support is coming soon. Keep in mind that time-series databases perform many sequential reads, and magnetic disks are very good at that.

“Unboxing xyz” is a new series on cloudonaut that helps you to get used to new services on AWS. We demonstrate the service based on a use case. We also cover the basic concepts and the core features of the service.
I’m looking forward to writing an in-depth review of Timestream soon!

Concepts

So let’s take a closer look at Timestream. You start by creating a database, which only needs a name. Within the database, you can create tables. A table has a name, and you configure how long data is kept in memory and on disk.
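As a rough sketch of what those table settings look like in code, the following Python helper builds the arguments for creating a table with both retention periods. The database and table names (`stocks`, `trades`) and the helper `table_settings` are hypothetical. The upper bounds are the "one year" and "200 years" limits mentioned earlier in this post, converted to hours and days by me.

```python
# Upper bounds based on the limits mentioned in this post. The conversion to
# hours and days is an assumption made for this sketch.
MAX_MEMORY_HOURS = 8766        # roughly one year (365.25 days * 24 hours)
MAX_MAGNETIC_DAYS = 200 * 365  # roughly 200 years


def table_settings(database, table, memory_hours, magnetic_days):
    """Build keyword arguments describing a table and its retention periods."""
    if not 1 <= memory_hours <= MAX_MEMORY_HOURS:
        raise ValueError("memory retention must be between 1 hour and about one year")
    if not 1 <= magnetic_days <= MAX_MAGNETIC_DAYS:
        raise ValueError("magnetic retention must be between 1 day and about 200 years")
    return {
        "DatabaseName": database,
        "TableName": table,
        "RetentionProperties": {
            "MemoryStoreRetentionPeriodInHours": memory_hours,
            "MagneticStoreRetentionPeriodInDays": magnetic_days,
        },
    }


# Keep one day in memory and three years on disk (example values)
settings = table_settings("stocks", "trades", memory_hours=24, magnetic_days=3 * 365)
print(settings)
```

With boto3 installed and AWS credentials configured, the resulting dictionary should be usable as `boto3.client("timestream-write").create_table(**settings)` once the database exists. I kept the AWS call out of the sketch so that it runs without an AWS account.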

Within a table, data is organized in time series. A time series is identified by the dimensions that you define during data ingestion. To store stock prices, I used one dimension, symbol, with values such as AMZN, AAPL, and MSFT. Each row in the time series is called a record. A record has a timestamp and one or many measures. To store stock prices, I used two measures: size and price. Attention: It is impossible to store two records with the same timestamp in a time series!
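The concepts above can be modeled in a few lines of Python. This is a conceptual, in-memory sketch only, not Timestream's API: the classes `Table` and `TimeSeries` and the sample prices are made up for illustration. It shows how dimensions identify a time series and why a second record with the same timestamp in the same series is rejected.

```python
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """One time series, identified by its dimensions (conceptual model only)."""
    dimensions: dict
    records: dict = field(default_factory=dict)  # timestamp -> measures

    def add(self, timestamp, **measures):
        # A time series cannot hold two records with the same timestamp
        if timestamp in self.records:
            raise ValueError(f"duplicate timestamp {timestamp} in series {self.dimensions}")
        self.records[timestamp] = measures


class Table:
    """A table groups records into time series by their dimensions."""

    def __init__(self):
        self.series = {}

    def write(self, dimensions, timestamp, **measures):
        # The set of dimension name/value pairs is the identity of the series
        key = frozenset(dimensions.items())
        if key not in self.series:
            self.series[key] = TimeSeries(dict(dimensions))
        self.series[key].add(timestamp, **measures)
        return self.series[key]


table = Table()
amzn = table.write({"symbol": "AMZN"}, "2020-10-01T08:10:00.000", price=3373.23, size=50)
# Same timestamp, but a different series: this is fine
table.write({"symbol": "AAPL"}, "2020-10-01T08:10:00.000", price=116.79, size=100)

try:
    # Same series and same timestamp: rejected
    table.write({"symbol": "AMZN"}, "2020-10-01T08:10:00.000", price=3373.30, size=10)
except ValueError as error:
    print("rejected:", error)
```

If your source can emit several events with an identical timestamp, one option is to add a further dimension (for example, a marketplace) so that the events land in different time series.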

There are a bunch of integrations available to help you with the data ingestion and analytics part. If you want to write queries from scratch, check out the SQL-based Timestream Query Language Reference. If you are interested in the details, I created a video for our ambitious cloudonaut plus subscribers, where I demonstrate how I ingest and analyze stock prices with Timestream.
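To give an idea of what a hand-written query might look like, here is a small Python sketch that builds a query for the "prices at a granularity of 1 minute" question from the beginning of this post. The database and table names and the function `minute_prices_query` are hypothetical. The query also assumes that each measure is stored as its own row with `measure_name` and `measure_value::double` columns. Treat it as a starting point and verify it against the query language reference.

```python
def minute_prices_query(database, table, symbol):
    """Build a query for the average price per minute over the last day."""
    # Plain string formatting keeps the sketch short.
    # Do not pass untrusted input into a query like this.
    return (
        f"SELECT bin(time, 1m) AS minute, avg(measure_value::double) AS avg_price\n"
        f'FROM "{database}"."{table}"\n'
        f"WHERE symbol = '{symbol}' AND measure_name = 'price' AND time > ago(1d)\n"
        f"GROUP BY bin(time, 1m)\n"
        f"ORDER BY minute"
    )


query = minute_prices_query("stocks", "trades", "AMZN")
print(query)
```

With boto3, a query string like this should be runnable via `boto3.client("timestream-query").query(QueryString=query)`. `bin` groups timestamps into fixed intervals, and `ago` expresses a point in time relative to now.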

Summary

I’m impressed by the simplicity of Timestream. The time-series database I worked with before was much harder to operate and use. I enjoy the following features a lot:

  • I can run a single query over in-memory and on-disk data. It just works. Timestream takes care of merging the two data sources.
  • Timestream infers the data schema based on the data that I ingest.
  • Timestream scales ingestion, storage, and data processing independently.
  • You only pay for what you use. There are no hourly fees.
  • CloudFormation fully supports databases and tables.

I can only imagine the effort it takes to build such a system. In early 2015, I started a SaaS business with Andreas: TimeSeries.Guru. Our offering might sound familiar: a time-series database as a service on AWS. It was based on the kdb+ technology. It was a lot of work, and the technology was expensive. We closed down the service after one year and focused on our consulting business. But it was a lot of fun!

PS: Andreas and I produce exclusive video content on the latest AWS topics. Check out cloudonaut plus and stay up-to-date.

Michael Wittig

I’m an independent consultant, technical writer, and programming founder. All these activities have to do with AWS. I’m writing this blog and all other projects together with my brother Andreas.

In 2009, we joined the same company as software developers. Three years later, we were looking for a way to deploy our software—an online banking platform—in an agile way. We got excited about the possibilities in the cloud and the DevOps movement. It’s no wonder we ended up migrating the whole infrastructure of Tullius Walden Bank to AWS. This was a first in the finance industry, at least in Germany! Since 2015, we have accelerated the cloud journeys of startups, mid-sized companies, and enterprises. We have penned books like Amazon Web Services in Action and Rapid Docker on AWS, we regularly update our blog, and we are contributing to the Open Source community. Besides running a 2-headed consultancy, we are entrepreneurs building Software-as-a-Service products.

We are available for projects.

You can contact me via Email, Twitter, and LinkedIn.
