Engaging your users with AWS Step Functions

Michael Wittig – 13 Oct 2017

Imagine a new user signs up for your service. You send an automated welcome message to your new user explaining how the service works. But what if your user struggles with the first steps? You want to send a second email with additional information. To abstract this a little bit, the following steps are needed:

Steps to engage your users

  1. Send a welcome message to the new user.
  2. Wait some time.
  3. Check if the user completed the initial steps.
    a. If yes, done.
    b. If no, continue.
  4. Send a message with additional information to the new user.
  5. Wait some time.
  6. Check if the user completed the initial steps.
    a. If yes, done.
    b. If no, continue.
  7. Send a message to the new user offering a Chime call.

This is nothing more than a state machine. It has a start (a new user signed up), an end (the user completed the initial steps, or the last message was sent), and a few state transitions in between. With AWS Step Functions, you can implement such a state machine. To do so, you have to translate the steps into the right format and implement the business logic. I will use AWS Lambda to implement the business logic in this post. Let’s get started.

Anatomy of a state machine in AWS Step Functions

A state machine in AWS Step Functions can take input data in JSON and consists of states:

  • There is one start state that receives the input when the state machine starts.
  • Each state is either an end state or points to a next state.
  • There is at least one end state.
  • Each state has a specific type.
  • By default, a state passes its input on as its output. Some state types change this (a Task state, for example, outputs the result of its Lambda function).
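The structural rules above can be checked mechanically. The following is a minimal sketch in Node.js, assuming a definition shaped like the JSON documents later in this post; `validateStateMachine` is a hypothetical helper, not part of AWS Step Functions, and it checks only the rules listed here, not everything the service validates.

```javascript
'use strict';

// Hypothetical helper: checks a definition against the structural rules above.
// A simplified sketch, not the validation AWS Step Functions performs.
function validateStateMachine(definition) {
  const errors = [];
  const states = definition.States || {};

  // The single start state must exist.
  if (!(definition.StartAt in states)) {
    errors.push(`StartAt points to unknown state "${definition.StartAt}"`);
  }

  let endStates = 0;
  for (const [name, state] of Object.entries(states)) {
    // Collect every state this state can transition to.
    const targets = [];
    if (state.Next) targets.push(state.Next);
    if (state.Type === 'Choice') {
      for (const rule of state.Choices || []) targets.push(rule.Next);
      if (state.Default) targets.push(state.Default);
    }
    for (const target of targets) {
      if (!(target in states)) {
        errors.push(`State "${name}" points to unknown state "${target}"`);
      }
    }
    // Succeed and Fail end an execution; other types can set "End": true.
    const isEnd = state.Type === 'Succeed' || state.Type === 'Fail' || state.End === true;
    if (isEnd) {
      endStates += 1;
    } else if (targets.length === 0) {
      errors.push(`State "${name}" is neither an end state nor points to a next state`);
    }
  }

  // There must be at least one end state.
  if (endStates === 0) errors.push('No end state found');
  return errors;
}

const example = {
  StartAt: 'Wait1',
  States: { Wait1: { Type: 'Wait', Seconds: 3, Next: 'Done' }, Done: { Type: 'Succeed' } },
};
console.log(validateStateMachine(example)); // []
```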

In this example, four different state types are used, but there are more (for example, Pass, Fail, and Parallel). The four state types used here are:

Type    | Description
Task    | Calls a Lambda function. The input of the state becomes the event of the Lambda function. By default, the output of the Lambda function becomes the output of the state. If the Lambda function fails, the call can be retried.
Wait    | Waits for a specific amount of time in seconds. You are not billed for the waiting time.
Choice  | The states so far have exactly one next state. But sometimes you need to make a choice (e.g., if the user completed the initial steps, then ..., else ...). Depending on a condition, a Choice state can lead to one of several next states.
Succeed | Indicates the successful end of a state machine.

Now, you have to map the engagement steps to states.

Example state machine in AWS Step Functions

The start state is SendMessage1.

Id                  | Type    | Description                                                  | Next
SendMessage1        | Task    | Send a welcome message to the new user.                      | Wait1
Wait1               | Wait    | Wait some time.                                              | FetchActivityCount1
FetchActivityCount1 | Task    | Fetch the number of activities the new user performed.       | CheckActivityCount1
CheckActivityCount1 | Choice  | Did the user complete the initial steps?                     | If yes, Done; else SendMessage2
SendMessage2        | Task    | Send a message with additional information to the new user.  | Wait2
Wait2               | Wait    | Wait some time.                                              | FetchActivityCount2
FetchActivityCount2 | Task    | Fetch the number of activities the new user performed.       | CheckActivityCount2
CheckActivityCount2 | Choice  | Did the user complete the initial steps?                     | If yes, Done; else SendMessage3
SendMessage3        | Task    | Send a message to the new user offering a Chime call.        | Done
Done                | Succeed | Done.                                                        | -
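Because every row except the Choice rows has exactly one next state, the table can be expressed as plain data and walked by a tiny interpreter. The sketch below is purely illustrative: it models only the Task, Wait, Choice, and Succeed behavior used here, the hypothetical `activitiesSeen` array stands in for what the FetchActivityCount tasks would return, and none of the names are Step Functions APIs.

```javascript
'use strict';

// The table above as data; only the fields needed for the walk are kept.
const states = {
  SendMessage1:        { type: 'Task',   next: 'Wait1' },
  Wait1:               { type: 'Wait',   next: 'FetchActivityCount1' },
  FetchActivityCount1: { type: 'Task',   next: 'CheckActivityCount1', fetch: true },
  CheckActivityCount1: { type: 'Choice', ifNoActivity: 'SendMessage2', otherwise: 'Done' },
  SendMessage2:        { type: 'Task',   next: 'Wait2' },
  Wait2:               { type: 'Wait',   next: 'FetchActivityCount2' },
  FetchActivityCount2: { type: 'Task',   next: 'CheckActivityCount2', fetch: true },
  CheckActivityCount2: { type: 'Choice', ifNoActivity: 'SendMessage3', otherwise: 'Done' },
  SendMessage3:        { type: 'Task',   next: 'Done' },
  Done:                { type: 'Succeed' },
};

// Hypothetical interpreter: returns the names of the visited states.
// `activitiesSeen` lists the activity count each fetch task would return, in order.
function walk(activitiesSeen) {
  const path = [];
  const counts = [...activitiesSeen];
  let current = 'SendMessage1';
  let activities = 0;
  while (current) {
    const state = states[current];
    path.push(current);
    if (state.type === 'Succeed') break;
    // Fetch tasks do the work; their result becomes the next state's input.
    if (state.fetch) activities = counts.shift();
    // Choice states only decide; they do no work of their own.
    current = state.type === 'Choice'
      ? (activities === 0 ? state.ifNoActivity : state.otherwise)
      : state.next;
  }
  return path;
}

console.log(walk([1]).join(' -> '));
// SendMessage1 -> Wait1 -> FetchActivityCount1 -> CheckActivityCount1 -> Done
```

Calling `walk([0, 0])` instead visits all ten states and ends with SendMessage3 and Done.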

Now, the state machine is defined. Are you surprised by the states FetchActivityCount1 and CheckActivityCount1? The step "Check if the user completed the initial steps" was translated into two states:

  • Task FetchActivityCount1: Fetch the number of activities the new user performed.
  • Choice CheckActivityCount1: Did the user complete the initial steps?

The reason is that a state can either do something (like getting the number of activities performed by the user from the database) or make a choice. It cannot do both. The Lambda function cannot make that choice for you either: only the state machine can choose the next state, based on the state's input data.

Now, the business logic (states of type Task) needs to be implemented.

Implementing tasks

A task can either call a Lambda function or an activity. If your business logic cannot be implemented with Lambda, you can fall back to activities. I will not cover activities in this example.

Send welcome message

I provide a dummy implementation in Node.js that fails 30% of the time to demonstrate how retries work.

'use strict';
module.exports.handler = (event, context, cb) => {
  console.log(JSON.stringify(event));
  if (Math.random() < 0.3) { // fail 30% of the time
    cb(new Error('error happened'));
  } else {
    cb(null, {});
  }
};

Fetch number of activities

I provide a dummy implementation in Node.js that fails 30% of the time and, in about 50% of the successful calls, reports that the user did not complete any activities.

'use strict';
module.exports.handler = (event, context, cb) => {
  console.log(JSON.stringify(event));
  if (Math.random() < 0.3) {
    cb(new Error('error happened')); // fail 30% of the time
  } else {
    cb(null, {activities: Date.now() % 2}); // return zero 50% of the time
  }
};

So far, the state machine is not defined in a machine-readable format. You will change this in the next section.

Translate the state machine to JSON

State machines are defined in a JSON document like this:

{
  "Comment": "AWS Step Functions Example",
  "StartAt": "SendMessage1",
  "Version": "1.0",
  "States": {

  }
}

The StartAt property defines the first state in the state machine. Let’s see how states are defined.

The first state is SendMessage1 of type Task:

"SendMessage1": {
  "Type": "Task",
  "Resource": "<Lambda ARN>",
  "Retry": [{
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 2,
    "MaxAttempts": 16,
    "BackoffRate": 2
  }],
  "Next": "Wait1"
}
  • The Resource property contains the ARN of the Lambda function (e.g., arn:aws:lambda:$region:$account-id:function:$function-name).
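  • The Retry property defines what happens if the Lambda function returns an error (States.TaskFailed):
    • The first retry is performed after IntervalSeconds seconds.
    • Each subsequent retry waits BackoffRate times longer than the previous one, so retry n happens IntervalSeconds*BackoffRate^(n-1) seconds after the preceding attempt failed.
    • At most MaxAttempts retries are performed.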
  • The Next property points to the next state.
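To see what this Retry configuration means in practice, the following sketch lists the wait before each retry, assuming the behavior described above (the interval is multiplied by BackoffRate after every attempt). `retryDelays` is a hypothetical helper, and it ignores any Retry fields not shown in the example.

```javascript
'use strict';

// Hypothetical helper: delay in seconds before each retry.
// Retry n (starting at 1) waits IntervalSeconds * BackoffRate^(n - 1).
function retryDelays({ IntervalSeconds, MaxAttempts, BackoffRate }) {
  const delays = [];
  for (let n = 1; n <= MaxAttempts; n++) {
    delays.push(IntervalSeconds * BackoffRate ** (n - 1));
  }
  return delays;
}

const delays = retryDelays({ IntervalSeconds: 2, MaxAttempts: 16, BackoffRate: 2 });
console.log(delays.slice(0, 5)); // [ 2, 4, 8, 16, 32 ]
const total = delays.reduce((sum, d) => sum + d, 0);
console.log(`${total} s ≈ ${(total / 3600).toFixed(1)} h`); // 131070 s ≈ 36.4 h
```

With these settings, the sixteenth retry starts more than 18 hours after the fifteenth failure, so a persistently failing task keeps retrying for about a day and a half. A smaller MaxAttempts may suit a welcome message better.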

Now, the message is sent, so we have to wait.

"Wait1": {
  "Type": "Wait",
  "Seconds": 3,
  "Next": "FetchActivityCount1"
}

After that, we have to get the number of activities the user did (e.g., query a database).

"FetchActivityCount1": {
  "Type": "Task",
  "Resource": "<Lambda ARN>",
  "Next": "CheckActivityCount1"
}

After we have the information, it’s time to make a decision:

"CheckActivityCount1": {
  "Type": "Choice",
  "Choices": [{
    "Variable": "$.activities",
    "NumericEquals": 0,
    "Next": "SendMessage2"
  }],
  "Default": "Done"
}
  • The Choices property defines an array of rules. Each rule:
    • Selects a property from the input using a JSONPath expression in Variable.
    • Compares it using an operator such as NumericEquals (others include NumericGreaterThan, StringEquals, and BooleanEquals).
    • Defines the next state in Next.
  • The Default property defines the next state if none of the rules in Choices matches.
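The rule evaluation just described can be sketched in a few lines. This is a heavily simplified, hypothetical evaluator, not the Step Functions implementation: it supports only NumericEquals and JSONPath expressions of the form `$.a.b` (no arrays, filters, or other operators). Like Step Functions, it tries the rules in order, uses the first match, and falls back to Default.

```javascript
'use strict';

// Hypothetical: resolve a simple JSONPath like "$.a.b" against the input.
function readPath(input, path) {
  if (!path.startsWith('$.')) throw new Error(`unsupported path: ${path}`);
  return path.slice(2).split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), input);
}

// Hypothetical: pick the next state of a Choice state for a given input.
function evaluateChoice(state, input) {
  for (const rule of state.Choices) {
    // Rules are tried in order; the first match wins.
    if ('NumericEquals' in rule && readPath(input, rule.Variable) === rule.NumericEquals) {
      return rule.Next;
    }
  }
  if (state.Default) return state.Default;
  // Without a Default, Step Functions fails the execution with this error.
  throw new Error('States.NoChoiceMatched');
}

const checkActivityCount1 = {
  Type: 'Choice',
  Choices: [{ Variable: '$.activities', NumericEquals: 0, Next: 'SendMessage2' }],
  Default: 'Done',
};

console.log(evaluateChoice(checkActivityCount1, { activities: 0 })); // SendMessage2
console.log(evaluateChoice(checkActivityCount1, { activities: 1 })); // Done
```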

Finally, the last state is reached.

"Done": {
  "Type": "Succeed"
}
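The remaining states repeat the same pattern, so the complete definition can also be generated. The sketch below is one way to do that, under assumptions that are not from this post: `buildDefinition` is hypothetical, all SendMessage states share one Lambda ARN, all FetchActivityCount states share another, and every Task state gets the Retry policy shown for SendMessage1 (the FetchActivityCount1 excerpt above has none).

```javascript
'use strict';

// Hypothetical generator for the complete state machine definition.
function buildDefinition({ sendMessageArn, fetchActivityCountArn, waitSeconds }) {
  // Assumption: the same Retry policy as SendMessage1 on every Task state.
  const retry = [{ ErrorEquals: ['States.TaskFailed'], IntervalSeconds: 2, MaxAttempts: 16, BackoffRate: 2 }];
  const task = (resource, next) => ({ Type: 'Task', Resource: resource, Retry: retry, Next: next });
  const states = {};

  for (let round = 1; round <= 2; round++) {
    states[`SendMessage${round}`] = task(sendMessageArn, `Wait${round}`);
    states[`Wait${round}`] = { Type: 'Wait', Seconds: waitSeconds, Next: `FetchActivityCount${round}` };
    states[`FetchActivityCount${round}`] = task(fetchActivityCountArn, `CheckActivityCount${round}`);
    states[`CheckActivityCount${round}`] = {
      Type: 'Choice',
      // No activities yet: continue with the next message.
      Choices: [{ Variable: '$.activities', NumericEquals: 0, Next: `SendMessage${round + 1}` }],
      Default: 'Done',
    };
  }
  states.SendMessage3 = task(sendMessageArn, 'Done');
  states.Done = { Type: 'Succeed' };

  return { Comment: 'AWS Step Functions Example', StartAt: 'SendMessage1', Version: '1.0', States: states };
}

// Placeholder ARNs for illustration only.
const definition = buildDefinition({
  sendMessageArn: 'arn:aws:lambda:us-east-1:111111111111:function:send-message',
  fetchActivityCountArn: 'arn:aws:lambda:us-east-1:111111111111:function:fetch-activity-count',
  waitSeconds: 3,
});
console.log(JSON.stringify(definition, null, 2));
```

The 3-second wait matches the demo. A real onboarding flow would more likely wait a day or more (86400 seconds per day).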

Now it’s time to wire everything together with CloudFormation.

CloudFormation template

The following is only an excerpt of the full CloudFormation template.

---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'AWS Step Functions Example'
Resources:
  # Step functions state machine
  StateMachineOnboardingInstall:
    Type: 'AWS::StepFunctions::StateMachine'
    Properties:
      DefinitionString: !Sub |
        {
          "Comment": "AWS Step Functions Example",
          "StartAt": "SendMessage1",
          "Version": "1.0",
          "States": {
            "SendMessage1": {
              "Type": "Task",
              "Resource": "${FunctionSendMessage.Arn}",
              "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 16,
                "BackoffRate": 2
              }],
              "Next": "Wait1"
            },
            # [...]
            "Done": {
              "Type": "Succeed"
            }
          }
        }
      RoleArn: !GetAtt 'RoleOnboardingInstall.Arn'

  # Lambda functions
  FunctionSendMessage:
    Type: 'AWS::Lambda::Function'
    Properties:
      # [...]
  FunctionFetchActivityCount:
    Type: 'AWS::Lambda::Function'
    Properties:
      # [...]

  # IAM roles
  RoleOnboardingInstall:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: !Sub 'states.${AWS::Region}.amazonaws.com'
          Action: 'sts:AssumeRole'
      Policies:
      - PolicyName: lambda
        PolicyDocument:
          Statement:
          - Effect: Allow
            Action: 'lambda:InvokeFunction'
            Resource:
            - !GetAtt 'FunctionSendMessage.Arn'
            - !GetAtt 'FunctionFetchActivityCount.Arn'
  RoleSendMessage:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: 'lambda.amazonaws.com'
          Action: 'sts:AssumeRole'
      ManagedPolicyArns:
      - 'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
  RoleFetchActivityCount:
    Type: 'AWS::IAM::Role'
    Properties:
      # [...]

AWS Step Functions State Machine

Installation

Download the source code and create a stack:

aws cloudformation create-stack --stack-name example --template-body file://template.yml --capabilities CAPABILITY_IAM
aws cloudformation wait stack-create-complete --stack-name example

After a few minutes, CloudFormation has created the Lambda functions, IAM roles, and the state machine for you.

Creating an execution

To create a state machine execution:

  1. Visit the Step Functions Management Console.
  2. Click on the only state machine.
  3. Press the New execution button.
  4. Supply an Execution id (e.g., 1).
  5. Press the Start Execution button.

Depending on chance, you will take one of several paths through the state machine (keep in mind that the Lambda functions fail 30% of the time and report zero or one activity by chance). Therefore, your execution graph will likely look slightly different from mine.

One thing that I want to highlight is the retry mechanism for Task states. Below the Visual Workflow, you can see a full log of the execution. Mine looked like this:

AWS Step Functions Execution Log

  • In line 11, the Lambda function was executed for the first time, but it failed at 8:22:43 (line 12).
  • In line 14, the Lambda function was executed again at 8:22:45 (exactly 2 seconds later, as defined by IntervalSeconds in the Retry property!).
  • Line 15 shows that this time, the Lambda function executed without an error.

Keep in mind that your log will look different. But you will likely see events of type LambdaFunctionFailed. If not, start a few more executions and inspect them.

Clean up

Don’t forget to clean up the CloudFormation stack:

aws cloudformation delete-stack --stack-name example
aws cloudformation wait stack-delete-complete --stack-name example

Summary

In this post, you learned that:

  • You can implement state machines with AWS Step Functions.
  • Each state can do different things depending on the Type of the state.
  • A Lambda function can be called from a state of type Task and can be retried in the case of a failure.
  • The Choice state type can select the next state based on input data.

Michael Wittig

I’ve been building on AWS since 2012 together with my brother Andreas. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV, HyperEnv for GitHub Actions, and marbot.
