Unboxing Amazon Timestream
My first job after graduation in 2011 was all about time-series data. My first task was to connect an exchange data feed with our on-premises time-series database (we used kdb+ by KX Systems). Whenever the exchange matches a buyer and a seller, a trade is published on its data feed. A trade contains the following information: time, financial instrument, price, and size of the deal. For example: at 8:10 am, 50 Amazon shares traded for $3,373.23. Trades are naturally ordered by time. But collecting data is not where the story ends.
My next step was to analyze the data, either for humans or for computers that buy and sell financial instruments. A couple of questions I had to answer:
- What is the latest price of the Amazon share on the New York Stock Exchange?
- What is the best price of the Amazon share across all marketplaces at the moment?
- Show me today’s Amazon share prices at a granularity of 1 minute
- Show me the last three years of Amazon share prices at a daily granularity
- Is the Amazon share cheap or expensive compared to a basket of financial instruments in the last month?
- How strongly does the Amazon share price correlate with the Apple share price over the last 300 days?
One thing that all of those questions have in common is that they operate on a time series. In SQL terms, think of ORDER BY time, or GROUP BY time.minute. A time-series database is optimized for such queries.
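To make the idea of ordering and grouping by time concrete, here is a minimal Python sketch that reduces a handful of trades to the last price per minute. The sample trades and the function name last_price_per_minute are made up for illustration; a time-series database does the same kind of work, only at a much larger scale.

```python
from datetime import datetime

# Hypothetical sample trades: (time, symbol, price, size)
trades = [
    (datetime(2020, 9, 30, 8, 10, 40), "AMZN", 3373.50, 20),
    (datetime(2020, 9, 30, 8, 10, 5), "AMZN", 3373.23, 50),
    (datetime(2020, 9, 30, 8, 11, 2), "AMZN", 3372.90, 10),
]


def last_price_per_minute(trades):
    """Return {minute: last traded price in that minute}."""
    bars = {}
    # ORDER BY time: process the trades in chronological order.
    for time, _symbol, price, _size in sorted(trades, key=lambda t: t[0]):
        # GROUP BY minute: truncate the timestamp to the full minute.
        minute = time.replace(second=0, microsecond=0)
        # A later trade in the same minute overwrites the earlier one.
        bars[minute] = price
    return bars


bars = last_price_per_minute(trades)
for minute, price in bars.items():
    print(minute.isoformat(), price)
```

The sort stands in for ORDER BY time and the truncation to the full minute stands in for GROUP BY time.minute.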
Until now, ingesting, storing, and analyzing time-series data on AWS was more challenging than it needed to be. Luckily, AWS released Amazon Timestream to all of us on Sep 30, 2020.
Amazon Timestream is a fully managed time-series database with zero operational overhead. Timestream auto-scales both the ingestion and the query processing layer. You pay for ingesting data, storing data, and analyzing data independently. Timestream keeps your latest data (up to one year) in memory and stores it in a durable way. Afterward, Timestream rolls the data to magnetic disks where you can keep it for up to 200 years. SSD support is coming soon. Keep in mind that time-series databases perform many sequential reads, and magnetic disks are very good at that.
“Unboxing xyz” is a new series on cloudonaut that helps you to get used to new services on AWS. We demonstrate the service based on a use case. We also cover the basic concepts and the core features of the service.
I’m looking forward to writing an in-depth review of Timestream soon!
So let’s look a bit closer at Timestream. You start by creating a database by specifying a name. Within the database, you can create tables. A table has a name, and you configure how long data is kept in memory and on disk.
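A rough sketch of these two steps with boto3 could look like the following. The database name stocks, the table name trades, and the retention values are assumptions chosen for illustration, and the actual calls need boto3 plus AWS credentials, so they are wrapped in a function that is not invoked here.

```python
def table_request(database, table, memory_hours, magnetic_days):
    """Build the keyword arguments for the timestream-write create_table call."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "RetentionProperties": {
            # How long data stays in the memory store.
            "MemoryStoreRetentionPeriodInHours": memory_hours,
            # How long data stays in the magnetic store afterward.
            "MagneticStoreRetentionPeriodInDays": magnetic_days,
        },
    }


def create_database_and_table(request):
    """Create the database and the table. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so table_request works without boto3

    client = boto3.client("timestream-write")
    client.create_database(DatabaseName=request["DatabaseName"])
    client.create_table(**request)


# Hypothetical names and retention periods.
request = table_request("stocks", "trades", memory_hours=24, magnetic_days=365)
print(request)
```

Calling create_database_and_table(request) would then create both resources in the region configured for your AWS session.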
Within a table, data is organized in time series. A time series is identified by the dimensions that you define during data ingestion. To store stock prices, I used one dimension, symbol, with values such as MSFT. Each row in the time series is called a record. A record has a timestamp and one or many measures. To store stock prices, I used two measures: size and price. Attention: It is impossible to store two records with the same timestamp in a time series!
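To illustrate the data model, the sketch below builds the records for a single trade in the shape that the boto3 write_records call of the timestream-write client accepts. It assumes the single-measure record shape, where each measure of a trade (price and size) becomes its own record with the same dimension and timestamp. The database and table names, the price, and the timestamp are made up for illustration.

```python
def trade_records(symbol, time_ms, price, size):
    """Return one record per measure (price, size) for a single trade."""
    common = {
        # The dimension identifies the time series.
        "Dimensions": [{"Name": "symbol", "Value": symbol}],
        # Timestamps and measure values are passed as strings.
        "Time": str(time_ms),
        "TimeUnit": "MILLISECONDS",
    }
    return [
        {**common, "MeasureName": "price",
         "MeasureValue": str(price), "MeasureValueType": "DOUBLE"},
        {**common, "MeasureName": "size",
         "MeasureValue": str(size), "MeasureValueType": "BIGINT"},
    ]


def write_trade(records, database="stocks", table="trades"):
    """Send the records to Timestream. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so trade_records works without boto3

    client = boto3.client("timestream-write")
    client.write_records(DatabaseName=database, TableName=table, Records=records)


# Hypothetical trade: 50 MSFT shares, timestamp in milliseconds since the epoch.
records = trade_records("MSFT", 1601453400000, 210.38, 50)
print(records)
```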
There are a bunch of integrations available to help you with the data ingestion and analytics part. If you want to write queries from scratch, check out the SQL-based Timestream Query Language Reference. If you are interested in the details, I created a video where I demonstrate how I ingest and analyze stock prices with Timestream.
Watch Michael analyzing data from the stock market. The video explains how to ingest data into a Timestream table and shows how to query the data.
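As a rough idea of what a hand-written query can look like, the following sketch builds a query for the average price per minute over the last 24 hours and runs it with the boto3 timestream-query client. It assumes a hypothetical database stocks with a table trades, a symbol dimension, and a price measure stored as a double in single-measure records. It is not the query from the video.

```python
def minute_price_query(database, table, symbol):
    """Build a query for the average price per minute over the last 24 hours.

    The arguments are interpolated directly into the query string,
    so only pass trusted values.
    """
    return (
        "SELECT bin(time, 1m) AS minute, "
        "avg(measure_value::double) AS avg_price "
        f'FROM "{database}"."{table}" '
        f"WHERE symbol = '{symbol}' "
        "AND measure_name = 'price' "
        "AND time >= ago(1d) "
        "GROUP BY bin(time, 1m) "
        "ORDER BY minute"
    )


def run_query(query_string):
    """Run the query and return the raw rows. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the query builder works without boto3

    client = boto3.client("timestream-query")
    # Large results are paginated via NextToken; this sketch reads one page only.
    return client.query(QueryString=query_string)["Rows"]


query = minute_price_query("stocks", "trades", "MSFT")
print(query)
```

bin(time, 1m) truncates each timestamp to the full minute, which is the Timestream counterpart of grouping by minute, and ago(1d) restricts the scan to the most recent day.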
I’m impressed by the simplicity of Timestream. The time-series database I worked with before was much harder to operate and use. I enjoy the following features a lot:
- I can run a single query over in-memory and on-disk data. It just works. Timestream takes care of merging the two data sources.
- Timestream infers the data schema based on the data that I ingest.
- Timestream scales ingestion, storage, and data processing independently.
- You only pay for what you use. There are no hourly fees.
- CloudFormation fully supports databases and tables.
I can only imagine the effort it takes to build such a system. In early 2015, I started a SaaS business with Andreas: TimeSeries.Guru. Our offering might sound familiar: a time-series database as a service on AWS. It was based on the kdb+ technology. It was a lot of work, and the technology was expensive. We closed down the service after one year and focused on our consulting business. But it was a lot of fun!