Step Functions pitfall: The execution reached the maximum number of history events (25000)

Michael Wittig – 31 Aug 2022

AWS Step Functions is an execution environment for finite state machines. Recently, I ran into the error “The execution reached the maximum number of history events (25000).” while listing all objects in an S3 bucket page by page. This blog post explains why the error happens and how to avoid it.

Introducing Step Functions

In Step Functions, a state machine is a collection of states. A state does some work (e.g., invoke a Lambda function, terminate the state machine, if/else statement, and many more) and points to the next state. When executing a state machine, you can hand over data used as the input into the first state. Each state gets input data and outputs data as well. The following figure shows four states:

State machine to page through all objects in an S3 bucket

  • Run: Invokes a Lambda function to call the S3 API to list objects starting at NextKeyMarker from the input. Outputs the new NextKeyMarker as well as the IsTruncated response of the S3 API. Next state: CheckEndOfPage.
  • CheckEndOfPage: Checks if IsTruncated is set to true. Outputs the input. If yes, transition to Wait. If no, transition to Done.
  • Wait: Waits for a couple of seconds. Outputs the input. Transition to Run.
  • Done: All objects are fetched. Successfully terminates the state machine. Outputs the input.
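The four states above can be sketched as an Amazon States Language definition, built here as a Python dictionary and serialized to JSON. This is a minimal sketch, not the original definition: the Lambda ARN is a hypothetical placeholder, and the 5-second wait is an assumed value for "a couple of seconds".

```python
import json

# Hypothetical placeholder; replace with the ARN of your own Lambda function.
RUN_FUNCTION_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:list-objects"

STATE_MACHINE = {
    "StartAt": "Run",
    "States": {
        # Invokes the Lambda function that fetches one page of objects.
        "Run": {
            "Type": "Task",
            "Resource": RUN_FUNCTION_ARN,
            "Next": "CheckEndOfPage",
        },
        # Loops back via Wait while the listing is truncated, otherwise finishes.
        "CheckEndOfPage": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.IsTruncated", "BooleanEquals": True, "Next": "Wait"}
            ],
            "Default": "Done",
        },
        # Assumed wait time of 5 seconds.
        "Wait": {"Type": "Wait", "Seconds": 5, "Next": "Run"},
        "Done": {"Type": "Succeed"},
    },
}


def to_asl_json(definition):
    """Serialize the definition to the JSON document Step Functions expects."""
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    print(to_asl_json(STATE_MACHINE))
```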

The pitfall

The number of state transitions allowed in a single execution of the state machine is limited, but the limit is not defined in terms of transitions. Instead, Step Functions limits the number of history events per execution to 25,000. Wait, what are history events?
The official developer guide does not mention them. But you can see them in the UI, as the screenshot shows.

AWS Step Functions UI: history events

The execution listed the objects of an S3 bucket containing only a handful of objects, so only one page was available. Still, 11 history events were emitted: the Run state emits 5 history events, CheckEndOfPage emits 2, Done emits 2, and the execution itself emits one event at the start and one at the end. Wait was never reached, but it would emit 2 history events. So each additional page of S3 objects costs 9 history events (5 for Run, 2 for CheckEndOfPage, 2 for Wait). Dividing the maximum number of history events by 9 gives a maximum of 2,777 pages (25,000 / 9, rounded down) that I can fetch. The S3 API returns no more than 1,000 objects per API call, so my state machine can list buckets with no more than about 2.7 million objects. My bucket was, of course, larger than that. The execution failed with: “The execution reached the maximum number of history events (25000).”
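The back-of-the-envelope calculation above can be written down as a short Python sketch. The constants come from the numbers in this post; the function names are only illustrative, and the small fixed overhead for the start and end of the execution is ignored, as in the text.

```python
MAX_HISTORY_EVENTS = 25_000   # Step Functions limit per execution
EVENTS_PER_PAGE = 5 + 2 + 2   # Run + CheckEndOfPage + Wait
OBJECTS_PER_PAGE = 1_000      # maximum objects returned per S3 list call


def max_pages(events_per_page=EVENTS_PER_PAGE, limit=MAX_HISTORY_EVENTS):
    """Number of full loop iterations that fit into the history event limit."""
    return limit // events_per_page


def max_objects(objects_per_page=OBJECTS_PER_PAGE):
    """Upper bound on objects a single execution can list."""
    return max_pages() * objects_per_page


if __name__ == "__main__":
    print(max_pages())    # 2777
    print(max_objects())  # 2777000
```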

Solving the issue

The 25,000 history events limit will not go away. We have to work around it. I suggest first understanding the relationship between the state types in your state machine and the number of history events emitted. I was stunned to learn that invoking a Lambda function emits 5 history events. I was looking for a table that tells me how many history events I can expect for a given state type. Unfortunately, such a table does not exist, so I created my own (please reach out to me to fill the ? gaps if your state machine uses these state types):


State Type   Task Type     Expected number of history events
Choice       -             2
Fail         -             ?
Map          -             ?
Parallel     -             ?
Pass         -             ?
Succeed      -             2
Task         servicename   ?
Task         activity      ?
Task         function      5
Wait         -             2

Remember that one history event is emitted at the beginning of the execution and one at the end.
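To estimate the history events of a concrete execution, the known rows of the table can be turned into a small lookup. The following is a sketch under the numbers measured in this post; the key names such as `Task:function` are made up for illustration, and state types with unknown counts are deliberately rejected instead of guessed.

```python
# Known rows from the table above; "?" rows are intentionally left out.
EVENTS_PER_STATE = {
    "Choice": 2,
    "Succeed": 2,
    "Task:function": 5,
    "Wait": 2,
}

# One history event at the beginning of the execution and one at the end.
EXECUTION_OVERHEAD = 2


def estimate_history_events(visited_states):
    """Sum the expected history events for the states an execution visits."""
    total = EXECUTION_OVERHEAD
    for state_type in visited_states:
        if state_type not in EVENTS_PER_STATE:
            raise KeyError(f"No measured event count for state type: {state_type}")
        total += EVENTS_PER_STATE[state_type]
    return total


if __name__ == "__main__":
    # Single page: Run -> CheckEndOfPage -> Done
    single_page = ["Task:function", "Choice", "Succeed"]
    print(estimate_history_events(single_page))  # 11
```

The single-page result matches the 11 history events observed in the screenshot.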

Now that we understand how many history events we emit, we can choose between two strategies to solve the issue.

  1. Get more work done in a state.
  2. Start a new state machine execution before we reach the limit.

Get more work done in a state.

This was the strategy I implemented to list more S3 objects. Instead of making a single S3 API call to fetch 1,000 objects, I now call the S3 API 100 times and fetch up to 100,000 objects in one Lambda function execution. I also optimized my Lambda function with a neat trick to cut the execution time in half.
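A minimal sketch of this batching idea follows. It is not my actual Lambda function: `run_batch`, `fetch_page`, and `make_fake_fetch_page` are hypothetical names, and the real S3 call is replaced by an injected callable so the loop can be tried without AWS access. The speed-up trick mentioned above is not shown.

```python
MAX_PAGES_PER_INVOCATION = 100  # pages fetched per Run state, as described above


def run_batch(event, fetch_page, max_pages=MAX_PAGES_PER_INVOCATION):
    """Fetch up to max_pages pages in one invocation.

    fetch_page(marker) is assumed to return a dict with the keys
    "Keys", "NextKeyMarker", and "IsTruncated".
    """
    marker = event.get("NextKeyMarker")
    is_truncated = True
    pages = 0
    objects = 0
    while is_truncated and pages < max_pages:
        page = fetch_page(marker)
        objects += len(page["Keys"])
        marker = page.get("NextKeyMarker")
        is_truncated = page["IsTruncated"]
        pages += 1
    # The output has the same shape the CheckEndOfPage state expects.
    return {
        "NextKeyMarker": marker,
        "IsTruncated": is_truncated,
        "PagesFetched": pages,
        "ObjectsSeen": objects,
    }


def make_fake_fetch_page(total_pages, page_size=1000):
    """Stand-in for the S3 API: a bucket with total_pages pages of keys."""
    def fetch_page(marker):
        index = 0 if marker is None else int(marker)
        is_last = index + 1 >= total_pages
        return {
            "Keys": [f"object-{index}-{i}" for i in range(page_size)],
            "NextKeyMarker": None if is_last else str(index + 1),
            "IsTruncated": not is_last,
        }
    return fetch_page


if __name__ == "__main__":
    fetch = make_fake_fetch_page(total_pages=250, page_size=10)
    state = run_batch({}, fetch)
    print(state["PagesFetched"], state["IsTruncated"])
```

With 100 pages per Run state, the same 9 history events cover up to 100,000 objects instead of 1,000, so the ceiling rises by roughly a factor of 100.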

Downsides:

  1. The Lambda timeout limit must be taken into account. In my case, listing 100,000 objects takes under 1 minute and is well below the 15-minute limit.
  2. Step Functions is great at retrying a state if something goes wrong. E.g., imagine one S3 API call fails with a 500 error. The state machine can be configured to retry the Lambda function execution. The more work you do in your state, the more work is retried. In my case, that’s okay because I only read a lot of data. If you perform expensive writes, that might be different.

Start a new state machine execution before we reach the limit.

This strategy is recommended by AWS and can be implemented like this.

Downsides:

  1. You can create an expensive infinite loop.
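One possible shape of this strategy, including a guard against the infinite loop, is sketched below. Everything here is an assumption for illustration: the `Generation` field, the safety margin, and the cap on follow-up executions are made-up names and values, and `sfn_client` is expected to behave like a boto3 Step Functions client with its `start_execution(stateMachineArn=..., input=...)` call.

```python
import json

MAX_HISTORY_EVENTS = 25_000
EVENTS_PER_ITERATION = 9   # Run + CheckEndOfPage + Wait, as measured above
SAFETY_MARGIN = 1_000      # assumed headroom for retries and extra states
MAX_GENERATIONS = 50       # assumed cap to avoid an expensive infinite loop


def iterations_per_execution(limit=MAX_HISTORY_EVENTS, margin=SAFETY_MARGIN):
    """Loop iterations one execution may run before handing over."""
    return (limit - margin) // EVENTS_PER_ITERATION


def should_continue_as_new(iteration):
    """True once the current execution has used up its iteration budget."""
    return iteration >= iterations_per_execution()


def continue_as_new(state, sfn_client, state_machine_arn):
    """Start a follow-up execution that resumes from the current marker."""
    generation = state.get("Generation", 0) + 1
    if generation > MAX_GENERATIONS:
        raise RuntimeError("Too many follow-up executions; refusing to continue.")
    next_input = {
        "NextKeyMarker": state.get("NextKeyMarker"),
        "Generation": generation,
        "Iteration": 0,
    }
    sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(next_input),
    )
    return next_input
```

A Lambda function placed in the loop could call `should_continue_as_new` with a counter it keeps in the state and, when the budget is used up, call `continue_as_new` and let the current execution end successfully.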

I hope this article will help you avoid the pitfall.
