Scaling a traditional relational database is hard because its transactional guarantees (atomicity, consistency, isolation, and durability, also known as ACID) require communication between all nodes of the database. The more nodes you add, the slower your database gets, because more and more nodes must coordinate transactions with each other. One way to tackle this has been to use databases that don’t adhere to these guarantees. We call them NoSQL databases.
Amazon provides a NoSQL database service called DynamoDB. Unlike the Relational Database Service (RDS), which offers several common RDBMS engines such as MySQL, Oracle Database, Microsoft SQL Server, and PostgreSQL, DynamoDB is a fully managed, proprietary, closed-source key-value store. DynamoDB is highly available and highly durable. You can scale from one item to billions, and from one request per second to tens of thousands of requests per second.
With that said, this article looks at how to use DynamoDB: both how to administer it like any other service and how to program your applications to use it.
You don’t need to worry about installation, updates, servers, storage or backups.
- DynamoDB is not software you can download. Instead, it is a NoSQL database as a service. You therefore can’t install DynamoDB the way you install MySQL or MongoDB. This also means that you don’t have to worry about updating your database; the software is maintained by AWS.
- DynamoDB runs on a fleet of servers operated by AWS. AWS takes care of the operating system and all security-related questions. From a security perspective, your job is to grant the right IAM permissions to the users of your DynamoDB tables.
- DynamoDB replicates your data between multiple servers and across multiple data centers. So there is no need for a backup from a durability point of view - the backup is already inside the database.
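The IAM permissions mentioned in the list above can be sketched as a policy document. This is a minimal illustration, not a recommended policy: the helper name `table_policy`, the region, the account id, and the table name are placeholders, and the chosen actions only cover read access.

```python
import json


def table_policy(region, account_id, table_name, actions):
    """Build an IAM policy document that allows `actions` on a single table."""
    # ARN layout of a DynamoDB table: arn:aws:dynamodb:<region>:<account>:table/<name>
    resource = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"dynamodb:{action}" for action in actions],
                "Resource": resource,
            }
        ],
    }


# Placeholder region, account id, and table name; read-only access.
policy = table_policy("us-east-1", "123456789012", "app-entity", ["GetItem", "Query"])
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document you would attach to an IAM user or role. Listing individual actions and a single table keeps the grant narrow.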
DynamoDB is made for developers!
DynamoDB is a key-value store that organizes your data in tables. Each table contains items (values) that are identified by keys. A table can also maintain secondary indexes for data lookup besides the primary key. You will now have a look at these basic building blocks of DynamoDB.
A DynamoDB table has a name and organizes a collection of items. An item is a collection of attributes. An attribute is a name-value pair. The attribute value can be scalar (number, string, binary, boolean), multi-valued (number set, string set, binary set), or a JSON document (object, array). Items in a table are not required to have the same attributes; there is no enforced schema.
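To make the attribute types concrete, here is a small sketch that encodes plain Python values in the typed notation used by DynamoDB's low-level API (`S`, `N`, `BOOL`, `B`, `SS`, `NS`, `BS`, `L`, `M`). The helper `to_attribute_value` and the two sample items are made up for illustration; floats are left out on purpose to avoid rounding surprises.

```python
from decimal import Decimal


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, Decimal)) and not isinstance(value, bool)


def to_attribute_value(value):
    """Encode a Python value as a typed DynamoDB attribute value (illustrative)."""
    if isinstance(value, bool):
        return {"BOOL": value}
    if _is_number(value):
        return {"N": str(value)}  # numbers are transmitted as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": value}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets must not be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(_is_number(v) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
        raise TypeError("a set must hold only strings, only numbers, or only bytes")
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Two items for the same table with different attributes: no schema is enforced.
user_a = {"id": "u1", "name": "Alice", "tags": {"admin", "beta"}}
user_b = {"id": "u2", "logins": 3, "address": {"city": "Berlin", "zip": "10115"}}

encoded = [
    {name: to_attribute_value(value) for name, value in item.items()}
    for item in (user_a, user_b)
]
```

Both encoded items could live in the same table as long as each one carries the table's key attribute (`id` in this sketch).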
You can create a table with the Management Console, CloudFormation, SDKs, or the CLI. The following example shows how you create a table with the CLI:
    1: $ aws dynamodb create-table --table-name app-entity \
    2:     --attribute-definitions AttributeName=id,AttributeType=S \
    3:     --key-schema AttributeName=id,KeyType=HASH \
    4:     --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
In line 1 you define the name of the table. Line 2 defines the attributes that are used in the key schema in line 3. Line 4 defines the provisioned throughput, which you can change later.
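The same table can also be created from application code. The sketch below assumes Python with the boto3 SDK, which the article does not prescribe; the helper names are illustrative. The request parameters mirror the four parts of the CLI call above, and a second helper shows how the provisioned throughput could be changed afterwards.

```python
def create_table_params(table_name, hash_key, read_units=5, write_units=5):
    """Build the request for a table with a single string hash key."""
    return {
        "TableName": table_name,
        # Only attributes used in the key schema are declared up front.
        "AttributeDefinitions": [{"AttributeName": hash_key, "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


def throughput_update_params(table_name, read_units, write_units):
    """Build the request that changes the provisioned throughput of a table."""
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


def create_table(params):
    # Imported lazily so the builders above work without the SDK installed.
    import boto3

    return boto3.client("dynamodb").create_table(**params)


def update_throughput(params):
    import boto3

    return boto3.client("dynamodb").update_table(**params)


params = create_table_params("app-entity", "id")
```

Calling `create_table(params)` or `update_throughput(...)` needs AWS credentials and a region to be configured; building the parameter dictionaries does not.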
==HINT If you plan to run multiple applications that use DynamoDB, it’s good practice to prefix your tables with the name of your application.==
A primary key is unique within a table and identifies an item. You need the primary key to look up an item. The primary key is either a hash key alone or a hash key combined with a range key.
A hash key uses a single attribute of an item to create a hash index. If you want to look up an item based on its hash key, you need to know the exact hash key. A user table could use the user’s email as a hash primary key. A user then can be retrieved if you know the hash key (email, in this case).
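A lookup by hash key could look like the following sketch. It assumes Python with boto3 and a hypothetical table `app-user` whose hash key is a string attribute named `email`; the function names are made up for this example.

```python
def get_user_params(email, table_name="app-user"):
    """Build a GetItem request; the exact hash key value must be known."""
    return {
        "TableName": table_name,
        "Key": {"email": {"S": email}},
    }


def get_user(email):
    # Imported lazily so the request builder works without the SDK installed.
    import boto3

    response = boto3.client("dynamodb").get_item(**get_user_params(email))
    # The response has no "Item" entry when nothing matches the key.
    return response.get("Item")


params = get_user_params("alice@example.com")
```

There is no way to ask for "emails starting with a" through this call; a hash key lookup is always an exact match.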
A hash and range key uses two attributes of an item to create a more powerful index. The first attribute is the hash part of the key, and the second part is the range. To look up an item, you need to know the exact hash part of the key, but you don’t need to know the range part. The range part is sorted within the hash. This allows you to query the range part of the key from a certain starting point. A message table can use a hash and range as its primary key; the hash is the email of the user, and the range is a timestamp. You can now look up all messages of a user that are newer than a specific timestamp.
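The message example can be sketched as a Query request. The assumptions here are not from the article: Python with boto3, a hypothetical table `app-message`, a string hash key `email`, and a numeric range key `timestamp` holding epoch seconds. Attribute names are aliased through `ExpressionAttributeNames` so they cannot clash with DynamoDB's reserved words.

```python
def messages_since_params(email, since_epoch, table_name="app-message"):
    """Build a Query request: exact match on the hash, comparison on the range."""
    return {
        "TableName": table_name,
        # The hash part only supports equality; the range part supports comparisons.
        "KeyConditionExpression": "#e = :email AND #ts > :since",
        "ExpressionAttributeNames": {"#e": "email", "#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":email": {"S": email},
            ":since": {"N": str(since_epoch)},
        },
    }


def messages_since(email, since_epoch):
    # Imported lazily so the request builder works without the SDK installed.
    import boto3

    response = boto3.client("dynamodb").query(**messages_since_params(email, since_epoch))
    # Items of one hash key come back ordered by the range key.
    return response["Items"]


params = messages_since_params("alice@example.com", 1700000000)
```

For large result sets the response is paginated, so a complete implementation would keep querying while the response contains a `LastEvaluatedKey`.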
Keep in mind that comparing DynamoDB and RDS is like comparing apples and oranges. The only thing they have in common is that both are called a database.
|How can I ...||DynamoDB||RDS|
|create a table||Management Console, SDK, or CLI||SQL|
|insert, update, or delete data||SDK||SQL|
|query data||If you query the primary key: SDK. Querying non-key attributes isn’t possible, but you can add a secondary index or scan the entire table.||SQL|
|increase storage||No action needed. DynamoDB grows with your items.||Provision more storage.|
|increase performance||Horizontal, by increasing capacity. DynamoDB will add more servers under the hood.||Vertical, by increasing instance size; or horizontal, by adding read replicas. But there is an upper limit.|
|install the database on my machine||DynamoDB is not available for download. You can only use it as a service.||Download MySQL, Oracle Database, Microsoft SQL Server, or PostgreSQL, and install it on your machine.|
|hire an expert||Search for special DynamoDB skills.||Search for general SQL skills or special skills, depending on the database engine.|
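The "query data" row names two ways to find items by a non-key attribute. The sketch below contrasts them, assuming Python with boto3; the table name, the index name `city-index`, and the attribute `city` are hypothetical. A scan reads the whole table and filters afterwards, while a query on a secondary index only reads the matching items.

```python
def scan_params(table_name, attribute, value):
    """Build a Scan request that filters on a non-key string attribute."""
    return {
        "TableName": table_name,
        "FilterExpression": "#a = :v",
        "ExpressionAttributeNames": {"#a": attribute},
        "ExpressionAttributeValues": {":v": {"S": value}},
    }


def index_query_params(table_name, index_name, attribute, value):
    """Build a Query request against a secondary index keyed on `attribute`."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#a = :v",
        "ExpressionAttributeNames": {"#a": attribute},
        "ExpressionAttributeValues": {":v": {"S": value}},
    }


def scan_all(params):
    # Imported lazily so the request builders work without the SDK installed.
    import boto3

    client = boto3.client("dynamodb")
    items, start_key = [], None
    while True:
        extra = {"ExclusiveStartKey": start_key} if start_key else {}
        page = client.scan(**params, **extra)
        items.extend(page.get("Items", []))
        # A scan is paginated; continue until no further page is announced.
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items


scan_request = scan_params("app-user", "city", "Berlin")
index_request = index_query_params("app-user", "city-index", "city", "Berlin")
```

The two requests look almost identical, but the scan consumes read capacity for every item in the table, so the index is usually the better choice for lookups you run often.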