Analyzing CloudTrail with Athena
Which IAM users have been active within your AWS account within the last 30 days? Are all of the 999 IAM roles still in use, or can you remove some of them to clean up your infrastructure? Is it safe to remove the action s3:GetObject
from the IAM policy, or will something on the EC2 instance i-abcdefg
break? Is your boss still using the root account to access your AWS account, even if you have told him to lock away the credentials?
In this post, you learn to answer these and other security and compliance related questions with the help of CloudTrail and Athena as shown in this post. As shown in the following figure CloudTrail is recording every - or almost every - request to the AWS API. For example, if an engineer launches a new EC2 instance, deletes an S3 bucket, or changes the Security Group of an RDS instance, CloudTrail records this. CloudTrail can push the recorded events to CloudWatch Logs and S3 buckets, as well. Even better, you can use a single S3 bucket to collect CloudTrail events from multiple AWS accounts. However, how to make sense out of that data? Athena is a query service allowing you to query JSON files stored on S3 easily.
I show you the necessary steps to query CloudTrail events with the help of Athena in the following. Before you start, make sure you have created a trail that is sending log files to S3. We offer a CloudFormation template to setup CloudTrail as well.
Creating a table and partitioning data
First, open Athena in the Management Console. Next, double check if you have switched to the region of the S3 bucket containing the CloudTrail logs to avoid unnecessary data transfer costs. Afterward, execute the following query to create a table. Make sure you are replacing <BUCKET>
with the name of your S3 bucket. The query derives from the AWS documentation with some advancements as discussed later.
CREATE EXTERNAL TABLE cloudtrail_logs ( |
In general, Athena needs to process all data stored in your S3 bucket to be able to answer your queries. It is essential to reduce the processed data per query to keep costs and response times low.
CloudTrail uses the following key aka. directory structure to store log files in your S3 bucket.
|-- AWSLogs |
That is why the table you have created is partitioned by the following attributes:
account
: the AWS account ID (e.g.,111111111111
)region
: the region (e.g.,us-east-1
)year
: the year (e.g.,2018
)
To be able to query data from your table, you need to add partitions. Do so, by executing the following query for each partition you want to add. Replace <BUCKET>
with the name of your S3 bucket, <ACCOUNT>
with the AWS account ID, <REGION>
with the region, and <YEAR>
with the year of the partition.
ALTER TABLE cloudtrail_logs |
Repeat the process of adding partitions to your table until the data for all accounts, regions, and years is accessible through the table. Have a look at Partitioning CloudTrail Logs in Athena if you are looking for a way to automate that process.
If the amount of data stored in your S3 bucket is below 1 GB you could also create a table without partitions. To do so remove the line PARTITIONED BY (account string, region string, year string)
from the CREATE EXTERNAL TABLE
statement.
Querying CloudTrail logs
Finally, you are ready to analyze the data. The following queries demonstrate how you can query CloudTrail events. Replace <ACCOUNT>
with an AWS account ID you and <YEAR>
with the current year.
Which IAM roles have been active within the last 7 days?
Execute the following query to get a list of IAM roles (ARN) that have been sending requests to the AWS API within the last 7 days. Replace <ACCOUNT>
with an AWS account ID you and <YEAR>
with the current year.
SELECT DISTINCT useridentity.sessioncontext.sessionissuer.arn |
Which IAM users have been active within the last 30 days?
A similar query allows you to find out which IAM users have been active in an account within the last 30 days. Replace <ACCOUNT>
with an AWS account ID you and <YEAR>
with the current year.
SELECT DISTINCT useridentity.arn |
Was the root user accessing the account within the last 7 days?
You should not use the root user for day to day work due to increase security. If the following query results with a 0
you are fine. If the result is larger than zero, you should find out who is using your root user credentials. Replace <ACCOUNT>
with an AWS account ID you and <YEAR>
with the current year.
SELECT DISTINCT COUNT(*) FROM cloudtrail_logs |
Which services and actions is an IAM role accessing?
Following the least privilege principle when creating IAM roles and attaching policies is another crucial security best practice on AWS. However, how do you know if you can remove an action from the policy attached to a role? That’s simple. The following query returns the services and actions accessed by an IAM role. Replace <ACCOUNT>
with an AWS account ID you, <YEAR>
with the current year, and
SELECT DISTINCT eventsource, eventname FROM cloudtrail_logs |
Which IAM users have logged into the Management Console within the last 30 days?
Do you want to find out which users have logged into the Management Console within the last 30 days? Here is your query. Replace <ACCOUNT>
with an AWS account ID you and <YEAR>
with the current year.
SELECT DISTINCT useridentity.arn FROM cloudtrail_logs |
In conclusion, querying CloudTrail events with Athena is a mighty way to answer security and compliance related questions. You should have the Athena table set up and be familiar with the possible queries to be able to answer such questions quickly.
Are there any other questions you have answered with the help of CloudTrail events queried by Athena? I’m curious about the examples you send me.
Further reading
- Article Cleaning up an S3 bucket with the help of Athena
- Article AWS Security Primer
- Article Event Driven Security Automation on AWS
- Tag security
- Tag cloudtrail
- Tag athena