A neglected serverless data store: Cloud Directory

Michael Wittig – 18 Sep 2018

Lately, I’ve been having a lot of fun with Amazon Cloud Directory. Three months ago, Cloud Directory was all new to me. Today, I am convinced that Cloud Directory is a neglected Serverless data store that deserves much more attention. Let me explain what Cloud Directory is and why it is an excellent choice in Serverless architectures.

A short introduction to Cloud Directory

Cloud Directory is a fully managed, hierarchical data store on AWS. Let’s talk about hierarchical data first.

Introducing hierarchical data and Cloud Directory terminology

In this post, I model a Slack-like chat system using a hierarchical data model. There are two teams. Team widdix uses the general channel to discuss work-related topics and the news channel to discuss AWS announcements. The AWS team is busy planning re:Invent in the reinvent channel and uses the general channel to discuss other topics. The following figure shows the chat system data hierarchy.

Cloud Directory Hierarchy

In Cloud Directory, a Directory always comes with an implicit single Root Node object. When you create a Node object, you have to specify the parent (Root Node object or Node object). Every Node object has exactly one parent, and a Child Link is automatically created for you. Therefore, every Node object has [0...N] children.

Accessing a Node

You can access a Node object in the Directory with a Selector, which is either an Object Identifier generated by Cloud Directory or a Path that you control. The Selector is a string whose first character determines how it is interpreted.

  • Selectors that start with a $ point to an Object Identifier, e.g., $287d. The value is generated by Cloud Directory when you create an object.
  • Selectors that start with a / point to a Path, e.g., /aws/general. The path is a concatenation of the Child Links’ Link Names.
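The two selector forms above can be told apart with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it covers just the two forms described here, so Cloud Directory may accept selectors that this code rejects. The {"Selector": ...} wrapper is the object reference shape used by the AWS SDK.

```python
def classify_selector(selector: str) -> str:
    """Return the selector kind based on its first character."""
    if selector.startswith("$"):
        return "object_identifier"
    if selector.startswith("/"):
        return "path"
    # Anything else is outside the scope of this sketch.
    raise ValueError(f"unsupported selector: {selector!r}")


def object_reference(selector: str) -> dict:
    """Wrap a selector in the object reference shape used by the SDK."""
    classify_selector(selector)  # raises for forms this sketch does not cover
    return {"Selector": selector}
```

For example, object_reference("/aws/general") yields a reference by Path, while object_reference("$287d") yields a reference by Object Identifier.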

Let’s look at the Path more closely. As you already learned, when you create a Node, a Child Link is automatically created for you that points to the parent. The Child Link has a Link Name that you have to supply, and it must be unique for all nodes sharing the same parent. The following figure shows the Link Names and Object Identifiers of the chat system.

Cloud Directory Paths and Ids

The path /widdix/general uniquely identifies the general channel of team widdix while the path /aws/general uniquely identifies the general channel of team AWS. The Link Name general can be used multiple times in the Directory but not for the same parent.
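To make the Path rules concrete, here is a toy in-memory model of the chat hierarchy. It is not the Cloud Directory API; the Node class and the resolve function are hypothetical names that only mimic the two rules described above: Link Names must be unique per parent, and a Path is resolved by following Link Names from the root.

```python
class Node:
    """Toy in-memory node used for illustration only."""

    def __init__(self, parent=None, link_name=None):
        self.parent = parent
        self.children = {}  # link name -> child Node
        if parent is not None:
            if link_name in parent.children:
                raise ValueError(
                    f"link name {link_name!r} already used under this parent"
                )
            parent.children[link_name] = self


def resolve(root, path):
    """Follow the link names in a path such as '/widdix/general'."""
    node = root
    for link_name in filter(None, path.split("/")):
        node = node.children[link_name]
    return node


# Rebuild the hierarchy of the chat system.
root = Node()
widdix = Node(root, "widdix")
aws = Node(root, "aws")
widdix_general = Node(widdix, "general")
widdix_news = Node(widdix, "news")
aws_general = Node(aws, "general")
aws_reinvent = Node(aws, "reinvent")
```

Resolving /widdix/general and /aws/general returns two different nodes, while creating a second general node below widdix raises an error.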

For each Node, you can get:

Storing data based on a schema

So far, I have not talked about the ability to store data besides the hierarchy. This changes now. A Facet defines a collection of Attributes. Compared to the relational database world, you can think of a Facet as a table and an Attribute as a column for now. An Attribute is defined by:

  • type (STRING, NUMBER, BINARY, BOOLEAN, DATETIME)
  • required or optional
  • mutable or immutable
  • default value
  • validation rules
    • String length
    • Binary length
    • String from set
    • Number comparison
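As a rough sketch, an Attribute definition with these properties could be expressed in the shape that the create_facet call of the AWS SDK for Python (boto3) accepts in its Attributes parameter. The helper name string_attribute, the rule label "length", and the attribute name are assumptions made for illustration. The min/max parameter keys follow my reading of the AWS documentation, so verify them before relying on this.

```python
def string_attribute(name, *, required=True, immutable=False, min_len=1, max_len=64):
    """Build one STRING attribute definition for a facet."""
    return {
        "Name": name,
        "AttributeDefinition": {
            "Type": "STRING",
            "IsImmutable": immutable,
            "Rules": {
                # "length" is an arbitrary label for the rule chosen here.
                "length": {
                    "Type": "STRING_LENGTH",
                    "Parameters": {"min": str(min_len), "max": str(max_len)},
                }
            },
        },
        "RequiredBehavior": "REQUIRED_ALWAYS" if required else "NOT_REQUIRED",
    }


# Hypothetical "name" attribute: required, immutable, at most 32 characters.
channel_name = string_attribute("name", immutable=True, max_len=32)
```

The resulting dictionary could be passed as one element of the Attributes list when creating a Facet in a Development Schema.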

A Node can be associated with one or many Facets. The following figure shows the Facets of the chat system (each Node is associated only with a single Facet).

Cloud Directory Facets
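Creating a Node with a Facet and an Attribute could look roughly like the following sketch. It assumes that client is a Cloud Directory client from the AWS SDK for Python, for example boto3.client("clouddirectory"), and that the schema contains a facet called team with a name attribute. Both names are hypothetical, as is the helper create_team.

```python
def create_team(client, directory_arn, applied_schema_arn, team_name):
    """Create a team node below the root and return its Object Identifier."""
    response = client.create_object(
        DirectoryArn=directory_arn,
        # "team" is a hypothetical facet name.
        SchemaFacets=[{"SchemaArn": applied_schema_arn, "FacetName": "team"}],
        ObjectAttributeList=[
            {
                "Key": {
                    "SchemaArn": applied_schema_arn,
                    "FacetName": "team",
                    "Name": "name",  # hypothetical attribute name
                },
                "Value": {"StringValue": team_name},
            }
        ],
        ParentReference={"Selector": "/"},  # the Root Node
        LinkName=team_name,  # becomes the path segment, e.g. /widdix
    )
    return response["ObjectIdentifier"]
```

Passing the client as a parameter keeps the function easy to test with a stub and leaves credentials and region configuration to the caller.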

A Schema is a collection of Facets. Schemas are a bit tricky. A newly created Schema is called a Development Schema, and you can modify it (create facets, change facets, delete facets). When you are happy with the Development Schema, you can publish it, and a versioned Published Schema is created for you. The Published Schema cannot be changed. It can only be upgraded in a backward-compatible way (add a facet, add an optional attribute to an existing facet). When you create a Directory, you have to select a Published Schema, which is then copied and becomes an Applied Schema. The Applied Schema can also be upgraded in a backward-compatible way. The following table summarizes the different types of Schemas.

Schema type | Mutable | Versioned | Backward-compatible upgrade
Development | yes | no | -
Published | no | yes | yes
Applied | no | yes | yes
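The schema lifecycle can be sketched as a sequence of three SDK calls. The sketch assumes that client is a Cloud Directory client from the AWS SDK for Python (boto3). The function name bootstrap_directory and the returned dictionary keys are made up for illustration, and creating the facets is left out.

```python
def bootstrap_directory(client, name, version="1"):
    """Development Schema -> Published Schema -> Directory with Applied Schema."""
    # 1. A new schema starts as a mutable Development Schema.
    dev_arn = client.create_schema(Name=name)["SchemaArn"]
    # Facets would be added to the Development Schema at this point.

    # 2. Publishing creates an immutable, versioned Published Schema.
    published_arn = client.publish_schema(
        DevelopmentSchemaArn=dev_arn, Version=version
    )["PublishedSchemaArn"]

    # 3. Creating the Directory copies the Published Schema into an Applied Schema.
    directory = client.create_directory(Name=name, SchemaArn=published_arn)
    return {
        "directory_arn": directory["DirectoryArn"],
        "applied_schema_arn": directory["AppliedSchemaArn"],
    }
```

The Applied Schema ARN returned in the last step is the one to reference when you create objects in the Directory.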

You can now store a hierarchy together with attributes. Let’s look at one more feature: Linking nodes.

Besides the Child Link, Cloud Directory supports a Typed Link that goes from one Node to another Node. The cool thing about the Typed Link is that you can attach Attributes to it as well. These Attributes are defined in a Typed Link Facet, and some (or all) of them are used as the identity of the Typed Link.

You cannot use a Path to navigate along Typed Links.

The following figure shows how a Typed Link can be used in the chat system to share channels between teams. The AWS team now has access to the news channel of the widdix team, and it appears as widdix-news in the AWS team.

Cloud Directory Typed Links
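Sharing the news channel could be sketched with the attach_typed_link call of the AWS SDK for Python (boto3). The Typed Link Facet name shared_channel and its identity attribute alias are hypothetical, as is the helper share_channel. The client is expected to be a Cloud Directory client.

```python
def share_channel(client, directory_arn, applied_schema_arn,
                  team_path, channel_path, alias):
    """Attach a typed link from a team node to another team's channel."""
    return client.attach_typed_link(
        DirectoryArn=directory_arn,
        SourceObjectReference={"Selector": team_path},
        TargetObjectReference={"Selector": channel_path},
        TypedLinkFacet={
            "SchemaArn": applied_schema_arn,
            "TypedLinkName": "shared_channel",  # hypothetical facet name
        },
        Attributes=[
            # Hypothetical identity attribute holding the displayed name.
            {"AttributeName": "alias", "Value": {"StringValue": alias}},
        ],
    )
```

For the example in the figure, the call would be share_channel(client, directory_arn, applied_schema_arn, "/aws", "/widdix/news", "widdix-news").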

For each Node, you can get:

That is the end of the short introduction to Cloud Directory. Let’s look at how Cloud Directory fits into the Serverless space next.

How does Cloud Directory fit into the Serverless space?

Cloud Directory is truly Serverless. There is zero configuration, and the service:

  • Requires no management of machines or software
  • Charges per request and for used storage
  • Scales automatically
  • Is fault tolerant and highly available

Cloud Directory can be used from Lambda with the AWS SDK.
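A Lambda function that lists the channels of a team might look like the following sketch. It assumes a Cloud Directory client from the AWS SDK for Python (boto3) and a hypothetical event shape with a team key. The names list_channels and make_handler are made up for illustration.

```python
def list_channels(client, directory_arn, team_path):
    """Return {link name: object identifier} for all children of a team node."""
    channels = {}
    kwargs = {
        "DirectoryArn": directory_arn,
        "ObjectReference": {"Selector": team_path},
    }
    while True:
        page = client.list_object_children(**kwargs)
        channels.update(page.get("Children", {}))
        token = page.get("NextToken")
        if not token:
            return channels
        kwargs["NextToken"] = token  # fetch the next page


def make_handler(client, directory_arn):
    """Build a Lambda handler bound to one directory."""
    def handler(event, context):
        # event["team"] is a hypothetical input, e.g. {"team": "widdix"}.
        return list_channels(client, directory_arn, "/" + event["team"])
    return handler
```

In a real function, the client would be created once outside the handler so that it is reused across invocations.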

Additional features

Besides the features you learned about so far, Cloud Directory also has support for:

  • Transactions
  • Multiple consistency levels
  • Policies with efficient retrieval
  • Leaf nodes (can have [0..N] parents)
  • Indexes

Missing features: No backup/restore capability and no visual explorer of your directory.

Summary

If you use Cognito or Organizations, your data is already stored in Cloud Directory under the covers. Cloud Directory is a serverless hierarchical data store on AWS. It is a niche service that deserves more attention because it is easy to operate (zero configuration) and comes with transactional support.

Thanks to Thorsten Höger for reviewing this article.

Michael Wittig


I’ve been building on AWS since 2012 together with my brother Andreas. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV, HyperEnv for GitHub Actions, and marbot.

