Download YouTube videos with AWS Lambda and store them on S3
Recently, I was faced with the challenge of downloading videos from YouTube and storing them on S3 using AWS Lambda.
Sounds easy? Remember that Lambda comes with a few limitations:
- 512 MB of disk space available at /tmp
- 3008 MB of memory
- 15 minutes maximum execution time
While working on a solution, I encountered multiple problems:
- Download the video from YouTube to /tmp and then upload it to S3: Does not work with videos larger than 512 MB.
- Download the video from YouTube into memory and then upload it to S3: Does not work with videos larger than ~3 GB.
- Download the video from YouTube and stream it to S3 while downloading: Works for all videos that can be processed within 15 minutes. I have not found a video that took longer than a few minutes to process.
Let’s look at how I finally solved the problem with a streaming approach in Node.js. I use the youtube-dl library to get easy access to YouTube videos.
First, we create a PassThrough stream in Node.js. A pass-through stream is a duplex stream where you can write on one side and read on the other side.
const stream = require('stream');
const passThrough = new stream.PassThrough();
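To make the "write on one side, read on the other" idea concrete, here is a minimal standalone sketch that uses only the Node.js `stream` module. The variable names (`demo`, `received`, `text`) are illustrative and not part of the original function; in the Lambda function the writer would be the download and the reader would be the S3 upload.

```javascript
const stream = require('stream');

// One side of the PassThrough is writable, the other side is readable.
const demo = new stream.PassThrough();

// Producer side: write two chunks, then signal that no more data will follow.
demo.write('first chunk, ');
demo.end('second chunk');

// Consumer side: drain whatever is currently buffered.
// read() returns null once the internal buffer is empty.
const received = [];
let chunk;
while ((chunk = demo.read()) !== null) {
  received.push(chunk);
}

// The bytes come out unchanged and in the order they were written.
const text = Buffer.concat(received).toString();
console.log(text);
```

The data is never transformed, only handed from the writer to the reader, which is why a PassThrough works well as glue between a download and an upload.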
Next, we need to write data to the stream. This is done by the youtube-dl library.
const youtubedl = require('youtube-dl');
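A rough sketch of how the download could be connected to the pass-through stream is shown below. It assumes that the function exported by the youtube-dl package is called as `youtubedl(url, cliArgs, options)` and returns a readable stream, as described in that package's README. The helper name `startDownload`, the format selector, and the `event.videoUrl` field in the usage comment are assumptions made for illustration. The library function is passed in as an argument so the sketch itself has no hard dependency on the package.

```javascript
// Hypothetical helper: `youtubedl` is the function exported by the youtube-dl
// package, `target` is the writable side of the pass-through stream.
function startDownload(youtubedl, videoUrl, target) {
  // Assumed call shape: youtubedl(url, cliArgs, options) returns a readable stream.
  // The format selector asks youtube-dl for the best available MP4 variant.
  const download = youtubedl(videoUrl, ['--format=best[ext=mp4]'], {});

  // pipe() does not forward errors, so hand a failed download to the target
  // explicitly. Whoever reads from the target then sees the error as well.
  download.once('error', (err) => target.destroy(err));

  // Everything the download emits is written into the target stream.
  download.pipe(target);
  return download;
}

// Possible usage inside the Lambda handler (names are assumptions):
// startDownload(require('youtube-dl'), event.videoUrl, passThrough);
```

Forwarding the error matters because otherwise a failed download would leave the upload waiting for data until the function times out.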
And finally, we need to upload the stream to S3. We use the Multipart Upload feature of S3, which allows us to upload a big file in smaller chunks. This way, we only have to buffer a small chunk (64 MB in this case) in memory and not the whole file.
const AWS = require('aws-sdk');
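The upload settings can be sketched as a plain options object for `AWS.S3.ManagedUpload` from version 2 of the aws-sdk, which accepts `params` and `partSize`. The helper name `buildUploadOptions`, the object key, and the `BUCKET_NAME` environment variable in the usage comment are assumptions for illustration. The sketch also works out the size ceiling that follows from S3's limit of 10,000 parts per multipart upload.

```javascript
const MB = 1024 * 1024;
const PART_SIZE = 64 * MB; // size of each multipart chunk buffered in memory

// Hypothetical helper: builds the options for `new AWS.S3.ManagedUpload(options)`.
// `body` is the readable side of the pass-through stream.
function buildUploadOptions(bucket, key, body) {
  return {
    params: { Bucket: bucket, Key: key, Body: body },
    partSize: PART_SIZE,
  };
}

// S3 accepts at most 10,000 parts per multipart upload, so the part size
// determines the largest object that can be streamed this way.
const maxObjectBytes = 10000 * PART_SIZE;
console.log(`Largest uploadable object: ${maxObjectBytes / (1024 * MB)} GiB`);

// Possible usage inside the Lambda handler (names are assumptions):
// const AWS = require('aws-sdk');
// const options = buildUploadOptions(process.env.BUCKET_NAME, 'video.mp4', passThrough);
// await new AWS.S3.ManagedUpload(options).promise();
```

With 64 MB parts the ceiling is far above the size of any video that could be downloaded within the 15-minute execution limit, so the part count is unlikely to become the bottleneck.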
That’s it. Now you can download YouTube videos of any size with Lambda and upload them to S3. I recommend running the code in a “big” Lambda function with 3008 MB of memory for better network performance.
You can find the full source code on GitHub including a SAM template to provision the AWS resources. Have fun!