Analyzing CloudTrail with Athena

Andreas Wittig – 27 Jul 2018

Which IAM users have been active within your AWS account within the last 30 days? Are all of the 999 IAM roles still in use, or can you remove some of them to clean up your infrastructure? Is it safe to remove the action s3:GetObject from the IAM policy, or will something on the EC2 instance i-abcdefg break? Is your boss still using the root account to access your AWS account, even if you have told him to lock away the credentials?

In this post, you learn to answer these and other security and compliance related questions with the help of CloudTrail and Athena as shown in this post. As shown in the following figure CloudTrail is recording every - or almost every - request to the AWS API. For example, if an engineer launches a new EC2 instance, deletes an S3 bucket, or changes the Security Group of an RDS instance, CloudTrail records this. CloudTrail can push the recorded events to CloudWatch Logs and S3 buckets, as well. Even better, you can use a single S3 bucket to collect CloudTrail events from multiple AWS accounts. However, how to make sense out of that data? Athena is a query service allowing you to query JSON files stored on S3 easily.

Analyzing CloudTrail with Athena

I show you the necessary steps to query CloudTrail events with the help of Athena in the following. Before you start, make sure you have created a trail that is sending log files to S3. We offer a CloudFormation template to setup CloudTrail as well.

Creating a table and partitioning data

First, open Athena in the Management Console. Next, double check if you have switched to the region of the S3 bucket containing the CloudTrail logs to avoid unnecessary data transfer costs. Afterward, execute the following query to create a table. Make sure you are replacing <BUCKET> with the name of your S3 bucket. The query derives from the AWS documentation with some advancements as discussed later.

CREATE EXTERNAL TABLE cloudtrail_logs (
eventversion STRING,
useridentity STRUCT<
type:STRING,
principalid:STRING,
arn:STRING,
accountid:STRING,
invokedby:STRING,
accesskeyid:STRING,
userName:STRING,
sessioncontext:STRUCT<
attributes:STRUCT<
mfaauthenticated:STRING,
creationdate:STRING>,
sessionissuer:STRUCT<
type:STRING,
principalId:STRING,
arn:STRING,
accountId:STRING,
userName:STRING>>>,
eventtime STRING,
eventsource STRING,
eventname STRING,
awsregion STRING,
sourceipaddress STRING,
useragent STRING,
errorcode STRING,
errormessage STRING,
requestparameters STRING,
responseelements STRING,
additionaleventdata STRING,
requestid STRING,
eventid STRING,
resources ARRAY<STRUCT<
ARN:STRING,
accountId:STRING,
type:STRING>>,
eventtype STRING,
apiversion STRING,
readonly STRING,
recipientaccountid STRING,
serviceeventdetails STRING,
sharedeventid STRING,
vpcendpointid STRING
)
PARTITIONED BY (account string, region string, year string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<BUCKET>/AWSLogs/';

In general, Athena needs to process all data stored in your S3 bucket to be able to answer your queries. It is essential to reduce the processed data per query to keep costs and response times low.

CloudTrail uses the following key aka. directory structure to store log files in your S3 bucket.

|-- AWSLogs
|-- 111111111111
|-- CloudTrail
|-- us-east-1
|-- 2018
|-- 06
|-- 01
|-- 111111111111_CloudTrail_....json.gz
|-- 02
|-- ...
|-- 07
|-- ...
|-- us-east-2
|-- ...
|-- 222222222222
|-- ...

That is why the table you have created is partitioned by the following attributes:

  • account: the AWS account ID (e.g., 111111111111)
  • region: the region (e.g., us-east-1)
  • year: the year (e.g., 2018)

To be able to query data from your table, you need to add partitions. Do so, by executing the following query for each partition you want to add. Replace <BUCKET> with the name of your S3 bucket, <ACCOUNT> with the AWS account ID, <REGION> with the region, and <YEAR> with the year of the partition.

ALTER TABLE cloudtrail_logs 
ADD PARTITION (account='<ACCOUNT>', region='<REGION>', year='<YEAR>')
LOCATION 's3://<BUCKET>/AWSLogs/<ACCOUNT>/CloudTrail/<REGION>/<YEAR>/';

Repeat the process of adding partitions to your table until the data for all accounts, regions, and years is accessible through the table. Have a look at Partitioning CloudTrail Logs in Athena if you are looking for a way to automate that process.

If the amount of data stored in your S3 bucket is below 1 GB you could also create a table without partitions. To do so remove the line PARTITIONED BY (account string, region string, year string) from the CREATE EXTERNAL TABLE statement.

Querying CloudTrail logs

Finally, you are ready to analyze the data. The following queries demonstrate how you can query CloudTrail events. Replace <ACCOUNT> with an AWS account ID you and <YEAR> with the current year.

Which IAM roles have been active within the last 7 days?

Execute the following query to get a list of IAM roles (ARN) that have been sending requests to the AWS API within the last 7 days. Replace <ACCOUNT> with an AWS account ID you and <YEAR> with the current year.

SELECT DISTINCT useridentity.sessioncontext.sessionissuer.arn 
FROM cloudtrail_logs
WHERE year = '<YEAR>' AND account = '<ACCOUNT>'
AND useridentity.type = 'AssumedRole'
AND from_iso8601_timestamp(eventtime) > date_add('day', -7, now());

Which IAM users have been active within the last 30 days?

A similar query allows you to find out which IAM users have been active in an account within the last 30 days. Replace <ACCOUNT> with an AWS account ID you and <YEAR> with the current year.

SELECT DISTINCT useridentity.arn 
FROM cloudtrail_logs
WHERE year = '<YEAR>' AND account = '<ACCOUNT>'
AND useridentity.type = 'IAMUser'
AND useridentity.arn IS NOT NULL
AND from_iso8601_timestamp(eventtime) > date_add('day', -30, now());

Was the root user accessing the account within the last 7 days?

You should not use the root user for day to day work due to increase security. If the following query results with a 0 you are fine. If the result is larger than zero, you should find out who is using your root user credentials. Replace <ACCOUNT> with an AWS account ID you and <YEAR> with the current year.

SELECT DISTINCT COUNT(*) FROM cloudtrail_logs 
WHERE year = '<YEAR>'
AND account = '<ACCOUNT>'
AND useridentity.type = 'Root'
AND useridentity.invokedby != 'support.amazonaws.com'
AND from_iso8601_timestamp(eventtime) > date_add('day', -7, now());

Which services and actions is an IAM role accessing?

Following the least privilege principle when creating IAM roles and attaching policies is another crucial security best practice on AWS. However, how do you know if you can remove an action from the policy attached to a role? That’s simple. The following query returns the services and actions accessed by an IAM role. Replace <ACCOUNT> with an AWS account ID you, <YEAR> with the current year, and with the ARN of the IAM role.

SELECT DISTINCT eventsource, eventname FROM cloudtrail_logs 
WHERE year = '<YEAR>'
AND account = '<ACCOUNT>'
AND (useridentity.sessioncontext.sessionissuer.arn = '<ROLE_ARN>')
ORDER BY eventsource, eventname;

Which IAM users have logged into the Management Console within the last 30 days?

Do you want to find out which users have logged into the Management Console within the last 30 days? Here is your query. Replace <ACCOUNT> with an AWS account ID you and <YEAR> with the current year.

SELECT DISTINCT useridentity.arn FROM cloudtrail_logs 
WHERE year = '<YEAR>'
AND account = '<ACCOUNT>'
AND eventsource = 'signin.amazonaws.com'
AND eventname = 'ConsoleLogin'
AND responseelements NOT LIKE '%ConsoleLogin%Failure%'
AND from_iso8601_timestamp(eventtime) > date_add('day', -30, now());

In conclusion, querying CloudTrail events with Athena is a mighty way to answer security and compliance related questions. You should have the Athena table set up and be familiar with the possible queries to be able to answer such questions quickly.

Are there any other questions you have answered with the help of CloudTrail events queried by Athena? I’m curious about the examples you send me.

Andreas Wittig

Andreas Wittig

I’ve been building on AWS since 2012 together with my brother Michael. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV,HyperEnv for GitHub Actions, and marbot.

Here are the contact options for feedback and questions.