AWS Step Functions: How to Orchestrate Workflows Waiting for 3rd Parties

Andreas Wittig – 22 Oct 2025

My go-to service for automating workflows in serverless applications is AWS Step Functions. Recently, I was working on an enhancement for HyperEnv, our solution to deploy self-hosted GitHub runners on AWS with ease. The challenge was to track the status of GitHub jobs with the serverless backend of HyperEnv. In the following, I will share the architecture that you can use to orchestrate workflows that need to wait for 3rd parties.

How to Orchestrate Workflows Waiting for 3rd Parties: Tracking GitHub Jobs with AWS Step Functions

The challenge

GitHub webhooks enable applications to subscribe to events, like the workflow_job event informing about jobs getting created, starting to run, or completing. To ensure HyperEnv provides a self-hosted runner for every job, even in case of failures, HyperEnv needs to keep track of every GitHub job. Doing so allows HyperEnv to retry launching a self-hosted runner in case a job does not start running within 5 minutes, for example.

The challenge: keep track of the state of a GitHub job by using webhook events

The solution

The following diagram illustrates the solution. Let me walk you through the architecture diagram from left to right.

The solution: API Gateway, Lambda, Step Functions, DynamoDB

First, an API Gateway receives the GitHub webhook events via HTTP and invokes the Lambda function Webhook.

Next, the Lambda function Webhook starts the execution of the state machine Job State Machine.

The following screenshot shows the state machine in more detail.

  • Start is where the state machine starts.
  • Queued is the initial state of a GitHub job.
  • InProgress indicates that the GitHub job is running.
  • Completed means that the GitHub job finished.
  • End is the end of the state machine.

Insights into Job State Machine

Then, the state machine Job State Machine invokes the Lambda function Job State by using the Wait for Callback integration (see Wait for a Callback with Task Token).

The Lambda function Job State creates an item in the DynamoDB table Job State and persists the task token needed to resolve the callback.

When GitHub then sends a webhook event with a status update, the process continues.

The API Gateway receives the event. Then, the Lambda function Webhook queries the DynamoDB table Job State and fetches the task token. With the task token, the Lambda function Webhook sends a task success signal to the state machine Job State Machine, which resolves the wait of the Queued task and continues with the next task, InProgress.

In the next section, I will share implementation details of the architecture.

Deep Dive: Step Functions and Lambda with Wait for Callback

The following listing shows an excerpt of the state machine’s definition.

The Lambda function is called with Wait for Callback (see arn:aws:states:::lambda:invoke.waitForTaskToken).

The task token is needed to send a success signal as soon as the state is done. Therefore, $states.context.Task.Token is added to the payload of the Lambda function invocation.

{
  "Comment": "Job State Machine",
  "QueryLanguage": "JSONata",
  "StartAt": "Queued",
  "States": {
    "Queued": {
      "Arguments": {
        "FunctionName": "arn:aws:lambda:eu-west-1:xxxxxxxxxxxx:function:JobStateFunctionFunction",
        "Payload": "{% $merge([$states.context.Execution.Input, {\"task\": \"Queued\", \"taskToken\": $states.context.Task.Token}]) %}" // <= Adds the task token to the payload when invoking the Lambda function
      },
      "Next": "InProgress",
      "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken", // <= Calls Lambda function in Wait for Callback mode
      "TimeoutSeconds": 300,
      "Type": "Task"
    },
    "InProgress": {
      "Arguments": {
        "FunctionName": "arn:aws:lambda:eu-west-1:xxxxxxxxxxxx:function:JobStateFunctionFunction",
        "Payload": "{% $merge([$states.context.Execution.Input, {\"task\": \"InProgress\", \"taskToken\": $states.context.Task.Token}]) %}" // <= Adds the task token to the payload when invoking the Lambda function
      },
      "Next": "Completed",
      "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken", // <= Calls Lambda function in Wait for Callback mode
      "TimeoutSeconds": 21600,
      "Type": "Task"
    },
    ...
  },
  "TimeoutSeconds": 3600
}
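
To make the Payload expression more concrete: JSONata's $merge combines an array of objects into a single object, and keys from later objects override earlier ones. The following sketch mimics that in plain JavaScript to show the event that the Lambda function Job State would receive for the Queued task. The helper name buildPayload, the input values, and the token string are made up for illustration.

```javascript
// Plain JavaScript stand-in for the JSONata expression
// $merge([$states.context.Execution.Input, {"task": ..., "taskToken": ...}]).
// Later objects override earlier ones, mirroring $merge.
function buildPayload(executionInput, task, taskToken) {
  return Object.assign({}, executionInput, { task, taskToken });
}

// Hypothetical execution input, shaped like the input passed by the Webhook function.
const executionInput = {
  organizationName: 'example-org',
  jobLabels: ['self-hosted'],
  runId: 1001,
  jobId: 2002,
  installationId: 3003
};

// EXAMPLE-TASK-TOKEN is a placeholder; real task tokens are generated by Step Functions.
const payload = buildPayload(executionInput, 'Queued', 'EXAMPLE-TASK-TOKEN');
console.log(JSON.stringify(payload, null, 2));
```

The Lambda function therefore sees the original execution input plus the two extra properties task and taskToken.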

Next, let’s take a look at a possible implementation of the Lambda function Job State.

Depending on the event.task property, the function updates an item in the DynamoDB table.

The expire_at attribute is used to define a time-to-live for the DynamoDB item to ensure the stored state gets deleted from the table automatically.

The task token is stored in the item’s attribute task_token_queued or task_token_in_progress.

import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb';
const ddbClient = new DynamoDBClient();
// Assumption: the table name is passed to the function via an environment variable.
const JOB_STATE_TABLE_NAME = process.env.JOB_STATE_TABLE_NAME;

export async function handler(event) {
  switch (event.task) {
    case 'Queued': {
      // Create (or update) the item and store the task token of the Queued task.
      await ddbClient.send(new UpdateItemCommand({
        ExpressionAttributeValues: {
          ':state': {
            S: 'queued'
          },
          ':task_token': {
            S: event.taskToken
          },
          ':expire_at': {
            N: Math.floor((new Date().getTime() + 1 * 24 * 60 * 60 * 1000) / 1000).toString(), // expire in 24 hours
          },
          ':created_at': {
            N: Math.floor(new Date().getTime() / 1000).toString()
          },
        },
        ExpressionAttributeNames: {
          '#s': 'state',
        },
        Key: {
          run_job_id: {
            S: `run-${event.runId}-job-${event.jobId}`
          }
        },
        ReturnValues: 'ALL_NEW',
        TableName: JOB_STATE_TABLE_NAME,
        UpdateExpression: 'SET #s = :state, task_token_queued = :task_token, expire_at = if_not_exists(expire_at, :expire_at), created_at = if_not_exists(created_at, :created_at)'
      }));
      break;
    }
    case 'InProgress': {
      // Update the state and store the task token of the InProgress task.
      await ddbClient.send(new UpdateItemCommand({
        ExpressionAttributeValues: {
          ':state': {
            S: 'in_progress'
          },
          ':task_token': {
            S: event.taskToken
          }
        },
        ExpressionAttributeNames: {
          '#s': 'state'
        },
        Key: {
          run_job_id: {
            S: `run-${event.runId}-job-${event.jobId}`
          }
        },
        ReturnValues: 'ALL_NEW',
        TableName: JOB_STATE_TABLE_NAME,
        UpdateExpression: 'SET #s = :state, task_token_in_progress = :task_token'
      }));
      break;
    }
    case 'Completed':
      // ...
      break;
  }
}
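
The key format and the TTL calculation are easy to get wrong, because DynamoDB's time-to-live feature expects a Number attribute holding a Unix timestamp in seconds, not milliseconds. As a small sketch, both could be factored into helpers like the following. The names jobKey and expireAtSeconds are hypothetical and not part of HyperEnv.

```javascript
// Builds the key value of the job's item, using the same format as the handler.
function jobKey(runId, jobId) {
  return `run-${runId}-job-${jobId}`;
}

// Returns a Unix timestamp in seconds that lies ttlHours after nowMs (milliseconds).
// DynamoDB TTL expects epoch seconds stored in a Number attribute.
function expireAtSeconds(nowMs, ttlHours) {
  return Math.floor((nowMs + ttlHours * 60 * 60 * 1000) / 1000);
}

// Fixed point in time to keep the example deterministic: 2025-10-22T00:00:00Z.
const now = Date.UTC(2025, 9, 22, 0, 0, 0); // month index is zero-based
console.log(jobKey(1001, 2002)); // run-1001-job-2002
console.log(expireAtSeconds(now, 24) - Math.floor(now / 1000)); // 86400
```

Passing the current time as an argument instead of calling new Date() inside the helper keeps the calculation testable.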

Finally, let’s take a look at the implementation of the Lambda function Webhook.

The following snippet shows how the Lambda function Webhook starts the execution of the state machine Job State Machine for each new GitHub job.

if (body.action === 'queued') {
  await sfnClient.send(new StartExecutionCommand({
    stateMachineArn: JOB_STATE_MACHINE_ARN,
    input: JSON.stringify({
      organizationName: body.repository.owner.login,
      jobLabels: body.workflow_job.labels,
      runId: body.workflow_job.run_id,
      jobId: body.workflow_job.id,
      installationId: body.installation.id,
    }),
    name: `run-${body.workflow_job.run_id}-job-${body.workflow_job.id}`
  }));
  return {
    statusCode: 200,
    body: `created state machine execution run-${body.workflow_job.run_id}-job-${body.workflow_job.id}`
  };
}
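
Because the execution name is derived from the run and job IDs, a webhook event that is delivered more than once (for example, after a manual redelivery) tries to reuse an existing name. Step Functions can reject such a request with an ExecutionAlreadyExists error. The following sketch shows one way to treat that case as a success. The function name startJobExecution is hypothetical, and the client and command factory are injected so that the example runs without AWS access. Answering a duplicate with status code 200 is an assumption, not necessarily what HyperEnv does.

```javascript
// sfnClient and buildCommand are injected to keep the sketch runnable offline.
// With the AWS SDK v3, buildCommand would be (params) => new StartExecutionCommand(params).
async function startJobExecution(sfnClient, buildCommand, stateMachineArn, input) {
  const name = `run-${input.runId}-job-${input.jobId}`;
  try {
    await sfnClient.send(buildCommand({
      stateMachineArn,
      input: JSON.stringify(input),
      name
    }));
    return { statusCode: 200, body: `created state machine execution ${name}` };
  } catch (error) {
    // A reused execution name is reported as ExecutionAlreadyExists.
    if (error.name === 'ExecutionAlreadyExists') {
      return { statusCode: 200, body: `state machine execution ${name} already exists` };
    }
    throw error; // anything else is unexpected and should surface
  }
}

// Demo with a fake client that always reports a duplicate.
const duplicateClient = {
  send: async () => {
    const error = new Error('execution already exists');
    error.name = 'ExecutionAlreadyExists';
    throw error;
  }
};
startJobExecution(duplicateClient, (params) => params, 'arn:example', { runId: 1001, jobId: 2002 })
  .then((response) => console.log(response.body));
```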

Next, when GitHub sends a webhook indicating that the job changed its status from queued to in_progress, the Lambda function executes the following code.

  1. Fetch the state of the GitHub job from the DynamoDB table Job State.
  2. Use the task token task_token_queued to send a task success notification to the state machine.

if (body.action === 'in_progress') {
  const queryResult = await ddbClient.send(new QueryCommand({
    ExpressionAttributeValues: {
      ':run_job_id': {
        S: `run-${body.workflow_job.run_id}-job-${body.workflow_job.id}`
      }
    },
    KeyConditionExpression: 'run_job_id = :run_job_id',
    TableName: JOB_STATE_TABLE_NAME
  }));
  await sfnClient.send(new SendTaskSuccessCommand({
    taskToken: queryResult.Items[0].task_token_queued.S,
    output: JSON.stringify({})
  }));
  return {
    statusCode: 200,
    body: 'state updated'
  };
}
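
The snippet above assumes that the item and its task token exist and that the Queued task is still waiting. That might not hold, for example, if the in_progress event arrives before the item was written, or after the Queued task hit its timeout. The following sketch adds guards for both cases. The function names and the status codes 404 and 409 are assumptions for illustration. The error names are those that SendTaskSuccess can return for a token that can no longer be resolved. The function that sends the task success signal is injected so the example runs without AWS access.

```javascript
// Extracts the task token stored for the Queued task, or undefined if there is none.
function queuedTaskToken(queryResult) {
  const item = (queryResult.Items ?? [])[0];
  return item?.task_token_queued?.S;
}

// Errors indicating that the token can no longer be resolved, e.g. after a timeout.
const STALE_TOKEN_ERRORS = new Set(['TaskTimedOut', 'TaskDoesNotExist', 'InvalidToken']);

// With the AWS SDK v3, sendTaskSuccess would be
// (params) => sfnClient.send(new SendTaskSuccessCommand(params)).
async function resolveQueuedTask(queryResult, sendTaskSuccess) {
  const taskToken = queuedTaskToken(queryResult);
  if (taskToken === undefined) {
    return { statusCode: 404, body: 'no task token stored for this job' };
  }
  try {
    await sendTaskSuccess({ taskToken, output: JSON.stringify({}) });
    return { statusCode: 200, body: 'state updated' };
  } catch (error) {
    if (STALE_TOKEN_ERRORS.has(error.name)) {
      return { statusCode: 409, body: 'task is no longer waiting' };
    }
    throw error;
  }
}

// Demo: an empty query result leads to the 404 branch.
resolveQueuedTask({ Items: [] }, async () => ({}))
  .then((response) => console.log(response.statusCode, response.body));
```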

The process is similar for the transition from in_progress to completed.
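
As a rough sketch of that similarity, the choice of the token attribute could be driven by a lookup from the webhook action to the attribute that holds the token of the waiting task. The attribute names task_token_queued and task_token_in_progress come from the Lambda function Job State. That a completed event resolves the token stored in task_token_in_progress is my assumption based on the state machine, and the helper name taskTokenForAction is hypothetical.

```javascript
// Maps a workflow_job action to the item attribute that holds the token of the
// task waiting for that action. The "completed" mapping is an assumption.
const TOKEN_ATTRIBUTE_BY_ACTION = new Map([
  ['in_progress', 'task_token_queued'],
  ['completed', 'task_token_in_progress']
]);

// Returns the task token to resolve for the given action, or undefined if the
// action does not resolve a task or the item does not contain the token.
function taskTokenForAction(action, item) {
  const attribute = TOKEN_ATTRIBUTE_BY_ACTION.get(action);
  if (attribute === undefined || item === undefined) {
    return undefined;
  }
  return item[attribute]?.S;
}

// Hypothetical item as returned by DynamoDB, with placeholder tokens.
const item = {
  run_job_id: { S: 'run-1001-job-2002' },
  task_token_queued: { S: 'TOKEN-QUEUED' },
  task_token_in_progress: { S: 'TOKEN-IN-PROGRESS' }
};
console.log(taskTokenForAction('in_progress', item)); // TOKEN-QUEUED
console.log(taskTokenForAction('completed', item));   // TOKEN-IN-PROGRESS
```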

Summary & Feedback

To track the progress of a workflow depending on 3rd parties, the following patterns are helpful when using AWS Step Functions.

  • Use Wait for Callback mode to invoke Lambda functions. Store the task token in DynamoDB. Optionally call the 3rd party. Then wait for a response from the 3rd party.
  • As AWS Step Functions does not provide a simple way to get information about the current task (aka state), store information about the current state in DynamoDB.

I hope following along with my solution helps you to come up with a suitable approach for your use case. In case you found another approach, please share it with me. I’m happy to learn from you!

Andreas Wittig

I’ve been building on AWS since 2012 together with my brother Michael. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV, attachmentAV, HyperEnv, and marbot.