Introducing the Object Store: S3
Traditionally, data was managed as files in a hierarchy of folders, and the file was the representation of the data. In an object store, data is stored as objects. Each object consists of a globally unique identifier, some metadata, and the data itself, as figure 1 illustrates. An object’s globally unique identifier is also known as its key. Because the identifier is globally unique, the object can be addressed from different devices and machines in a distributed system.
The separation of metadata and data allows clients to work only with the metadata for managing and querying data. You only have to load the data if you really need it. Metadata is also used to store access-control information and for other management tasks.
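To make the split between key, metadata, and data concrete, here is a minimal in-memory sketch. It is not the S3 API: the names `StoredObject`, `ToyObjectStore`, `head`, and `find` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: a key, some metadata, and the data itself."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, not the S3 API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Record the size as metadata so it can be queried without the data.
        metadata["size"] = len(data)
        self._objects[key] = StoredObject(key, data, metadata)

    def head(self, key):
        # Return a copy of the metadata only; the data is not read.
        return dict(self._objects[key].metadata)

    def get(self, key):
        # Load the data itself, only when it is really needed.
        return self._objects[key].data

    def find(self, **criteria):
        # Query by metadata alone, for example find(content_type="image/png").
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


store = ToyObjectStore()
store.put("images/cat.png", b"\x89PNG...", content_type="image/png")
store.put("docs/report.txt", b"quarterly numbers", content_type="text/plain")

print(store.find(content_type="image/png"))   # ['images/cat.png']
print(store.head("docs/report.txt")["size"])  # 17
```

Both `find` and `head` answer their questions from metadata alone; only `get` touches the data.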
The Amazon S3 object store is one of the oldest services on AWS. S3 is short for Simple Storage Service. It’s a typical web service that lets you store and retrieve data in an object store via an API reachable over HTTPS.
The service offers unlimited storage space and stores your data in a highly available and durable way. You can store any kind of data, such as images, documents, and binaries, as long as the size of a single object doesn’t exceed 5 TB. You pay for every GB you store in S3, and you also incur minor costs for every request and for transferred data. As figure 2 shows, you can access S3 over HTTPS via the Management Console, the command-line interface (CLI), SDKs, and third-party tools to upload and download objects.
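The three cost components (storage, requests, and transferred data) can be added up in a rough estimate. The sketch below is an assumption-laden simplification: the function name `estimate_monthly_cost` is hypothetical, and the rates are placeholders, not real AWS prices. Actual pricing depends on region, request type, and usage tier.

```python
def estimate_monthly_cost(stored_gb, requests, transferred_gb,
                          price_per_gb, price_per_1000_requests,
                          price_per_transferred_gb):
    """Sum the three cost components: storage, requests, and data transfer."""
    storage = stored_gb * price_per_gb
    request_cost = requests / 1000 * price_per_1000_requests
    transfer = transferred_gb * price_per_transferred_gb
    return storage + request_cost + transfer


# Placeholder rates for illustration only, NOT real AWS prices.
cost = estimate_monthly_cost(
    stored_gb=100, requests=50_000, transferred_gb=10,
    price_per_gb=0.02, price_per_1000_requests=0.005,
    price_per_transferred_gb=0.10,
)
print(f"{cost:.2f}")  # 3.25
```

With these placeholder rates, storage is the largest component (2.00), followed by transfer (1.00) and requests (0.25).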
S3 uses buckets to group objects. A bucket is a container for objects with a globally unique name. By unique we really mean unique: you have to choose a bucket name that isn’t used by any other AWS customer in any region, so we advise you to prefix your bucket names with your domain name (such as com.mydomain.*) or your company name. Figure 3 shows the concept.
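A small helper can build prefixed bucket names and check them before you try to create a bucket. This is a sketch that checks only a subset of the S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no adjacent dots; not formatted like an IP address). The function names are hypothetical. A name that passes is well formed, but it may still be taken by another customer; that only shows up when you create the bucket.

```python
import re

# Subset of the S3 naming rules: 3-63 characters; lowercase letters, digits,
# dots, and hyphens; must start and end with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_ADDRESS = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name):
    """Check a name against the subset of rules described above."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    if ".." in name:                 # adjacent dots are not allowed
        return False
    if _IP_ADDRESS.fullmatch(name):  # must not look like an IP address
        return False
    return True


def prefixed_bucket_name(prefix, suffix):
    """Build a name such as 'com.mydomain.backups' and validate it."""
    name = f"{prefix}.{suffix}".lower()
    if not is_valid_bucket_name(name):
        raise ValueError(f"not a valid bucket name: {name!r}")
    return name


print(prefixed_bucket_name("com.mydomain", "backups"))  # com.mydomain.backups
print(is_valid_bucket_name("My_Bucket"))                # False
```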
Typical use cases are as follows:
- Backing up and restoring files with S3 and the help of the AWS CLI
- Archiving objects with Amazon Glacier to save money compared to Amazon S3
- Integrating Amazon S3 into applications with the help of the AWS SDKs to store and fetch objects such as images
- Hosting static web content that can be viewed by anyone with the help of S3
- Building data pipelines with S3 Event Notifications and Lambda