Using Athena and Lambda to get daily notifications about your CloudFront website requests
The problem
After deploying this website I wanted to track the number of daily visitors. While CloudFront provides default distribution metrics such as the total number of requests and error percentages, my focus was on obtaining the daily count of unique visitors.
CloudFront logs include numerous fields which offer insights into things such as the source IP, HTTP method, protocol version or response times. These logs are available in two formats: standard logs, which are delivered multiple times per hour, and real-time logs. Since my requirement was to analyze the daily number of requests without real-time constraints, I opted for standard logs for my query.
The solution
The architecture used for this solution is presented below:

Let’s split it into smaller, sequential steps:
- Prior to enabling the ‘standard logs’ feature on the CloudFront distribution, you need to create an S3 bucket which will serve as the destination for the CloudFront logs. You should enable ACLs on the bucket and set the object ownership to something other than ‘bucket owner enforced’.
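If you prefer to script this step, a minimal boto3 sketch might look like the following (the bucket name is a hypothetical placeholder, and create_bucket targets us-east-1 unless you pass a CreateBucketConfiguration):
import boto3

s3 = boto3.client('s3')
bucket = 'my-cloudfront-logs-bucket'  # hypothetical name, replace with your own

# Create the destination bucket (in us-east-1 by default)
s3.create_bucket(Bucket=bucket)

# Enable ACLs by setting object ownership to anything other than
# 'BucketOwnerEnforced'; 'BucketOwnerPreferred' keeps ACLs usable
s3.put_bucket_ownership_controls(
    Bucket=bucket,
    OwnershipControls={'Rules': [{'ObjectOwnership': 'BucketOwnerPreferred'}]}
)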
- After you create the destination S3 bucket, go to the ‘Telemetry/Logs’ menu of your CloudFront distribution and enable ‘standard logs’, referencing the destination bucket in the S3 bucket section.
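The console toggle is the simplest route, but for reference the same change can be sketched with boto3, assuming a hypothetical distribution ID and bucket domain; update_distribution requires the full distribution config together with its current ETag:
import boto3

cf = boto3.client('cloudfront')
dist_id = 'YOUR_DISTRIBUTION_ID'  # hypothetical placeholder

# Fetch the current config and its ETag, required for the update call
config = cf.get_distribution_config(Id=dist_id)
dist_config = config['DistributionConfig']

# Enable standard logs; all four keys of 'Logging' are required
dist_config['Logging'] = {
    'Enabled': True,
    'IncludeCookies': False,
    'Bucket': 'my-cloudfront-logs-bucket.s3.amazonaws.com',
    'Prefix': ''
}

cf.update_distribution(
    Id=dist_id,
    IfMatch=config['ETag'],
    DistributionConfig=dist_config
)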
- The logs will start populating the S3 bucket within an hour or so of enabling the feature; from my experience, individual log files tend to arrive 5-10 minutes after a request is received. They are compressed in GZIP format, but that poses no issue when reading the files with Athena, since GZIP-compressed files are supported out of the box.
- Before running any queries against the CloudFront logs, you need to create a table. You can use the statement below to create a table containing all the fields documented for CloudFront ‘standard logs’. Remember to replace ‘DESTINATION-BUCKET’ with the name of your destination bucket.
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_standard_logs (
    `date` DATE,
    time STRING,
    x_edge_location STRING,
    sc_bytes BIGINT,
    c_ip STRING,
    cs_method STRING,
    cs_host STRING,
    cs_uri_stem STRING,
    sc_status INT,
    cs_referrer STRING,
    cs_user_agent STRING,
    cs_uri_query STRING,
    cs_cookie STRING,
    x_edge_result_type STRING,
    x_edge_request_id STRING,
    x_host_header STRING,
    cs_protocol STRING,
    cs_bytes BIGINT,
    time_taken FLOAT,
    x_forwarded_for STRING,
    ssl_protocol STRING,
    ssl_cipher STRING,
    x_edge_response_result_type STRING,
    cs_protocol_version STRING,
    fle_status STRING,
    fle_encrypted_fields INT,
    c_port INT,
    time_to_first_byte FLOAT,
    x_edge_detailed_result_type STRING,
    sc_content_type STRING,
    sc_content_len BIGINT,
    sc_range_start BIGINT,
    sc_range_end BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://DESTINATION-BUCKET/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )
- With the table now created, you can run queries against it. Before you run your first Athena query, you will also need to create a destination bucket for your Athena query results. Later, you will run the query below as part of a Lambda function to count today's unique visitors; for now, you can run it manually in the Athena console to test it:
SELECT COUNT(DISTINCT c_ip) AS unique_viewers FROM cloudfront_standard_logs
WHERE cs_method='GET' AND cs_uri_stem='/' AND date=current_date AND sc_status=200;
- As part of this architecture you will receive notifications via SNS. Before you create your Lambda function, create a standard SNS topic. After the topic is created, you will need to create an SNS subscription with your email address to receive notifications. Remember to verify your email and confirm your subscription after creating it.
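If you would rather script this step too, a sketch along these lines should work (the topic name and email address are hypothetical placeholders):
import boto3

sns = boto3.client('sns')

# Create a standard SNS topic; the response contains the topic ARN
topic = sns.create_topic(Name='daily-visitors')  # hypothetical name

# Subscribe your email address; SNS sends a confirmation email you must accept
sns.subscribe(
    TopicArn=topic['TopicArn'],
    Protocol='email',
    Endpoint='you@example.com'  # replace with your address
)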
- With the Athena table, Athena query destination bucket and SNS topic now in place, you can create your Lambda function. Since the Lambda function will query Athena and publish a message to SNS, it needs the right permissions. You will need to create a Lambda execution role and grant it permissions to read/write objects in the CloudFront log destination S3 bucket and the Athena query result bucket, run Athena queries, get Athena query results, write CloudWatch logs and publish messages to an SNS topic. To speed things up, you can attach the policies below to your role; see the sketch after this list for a scripted version (in case you deploy a production-grade solution, you need to adhere to the principle of least privilege, as the policies listed below are overly permissive):
- AmazonAthenaFullAccess
- AmazonSNSFullAccess
- CloudWatchLogsFullAccess
- AmazonS3FullAccess (Athena reads the log bucket and writes results with the caller's S3 permissions, which the Athena policy alone only grants for specific bucket names)
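As an illustration, the execution role could be created with boto3 roughly as follows (the role name is a hypothetical placeholder; the trust policy lets Lambda assume the role):
import json
import boto3

iam = boto3.client('iam')

# Trust policy allowing the Lambda service to assume this role
trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'lambda.amazonaws.com'},
        'Action': 'sts:AssumeRole'
    }]
}

iam.create_role(
    RoleName='daily-visitors-lambda-role',  # hypothetical name
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)

# Attach the (overly permissive) managed policies from the list above
for policy in ('AmazonAthenaFullAccess', 'AmazonSNSFullAccess',
               'CloudWatchLogsFullAccess', 'AmazonS3FullAccess'):
    iam.attach_role_policy(
        RoleName='daily-visitors-lambda-role',
        PolicyArn=f'arn:aws:iam::aws:policy/{policy}'
    )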
- Create your Lambda function and assign it the execution role from the previous step. Feel free to use the code below to run the Athena query and publish the result to the SNS topic you created earlier:
import boto3
import time

def lambda_handler(event, context):
    # Initialize the Athena and SNS clients
    client_athena = boto3.client('athena')
    client_sns = boto3.client('sns')

    # Start the query execution
    response = client_athena.start_query_execution(
        QueryString="SELECT COUNT(DISTINCT c_ip) AS unique_viewers FROM cloudfront_standard_logs WHERE cs_method='GET' AND cs_uri_stem='/' AND date=current_date AND sc_status=200",
        QueryExecutionContext={
            'Database': 'default'
        },
        ResultConfiguration={
            'OutputLocation': 's3://YOUR_ATHENA_OUTPUT_BUCKET_NAME/'
        }
    )
    query_id = response['QueryExecutionId']

    # Poll until the query finishes instead of sleeping for a fixed interval
    while True:
        status = client_athena.get_query_execution(QueryExecutionId=query_id)
        state = status['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            break
        time.sleep(1)
    if state != 'SUCCEEDED':
        raise RuntimeError(f'Athena query ended in state {state}')

    # Retrieve the query results; row 0 is the header, row 1 holds the count
    results = client_athena.get_query_results(QueryExecutionId=query_id)
    unique_viewers = results['ResultSet']['Rows'][1]['Data'][0]['VarCharValue']

    # Publish the daily count to the SNS topic
    client_sns.publish(
        TopicArn='YOUR_SNS_TOPIC_ARN',
        Message=f"Your website had {unique_viewers} unique viewers today"
    )
- With the Lambda function in place, you need a way to trigger it once a day. You can create an EventBridge schedule with the Lambda function as a target. If you do it via the AWS console, EventBridge will automatically create an execution role which allows it to invoke your Lambda function. You will also need to define your cron expression (e.g. to trigger your Lambda function every day at midnight, you can use 0 0 * * ? *).
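For completeness, here is a sketch using the EventBridge Scheduler API, with placeholder values for the schedule name, Lambda ARN and the scheduler execution role ARN (the role must allow scheduler.amazonaws.com to invoke the function):
import boto3

scheduler = boto3.client('scheduler')

scheduler.create_schedule(
    Name='daily-visitors-report',  # hypothetical name
    ScheduleExpression='cron(0 0 * * ? *)',  # every day at midnight UTC
    FlexibleTimeWindow={'Mode': 'OFF'},
    Target={
        'Arn': 'YOUR_LAMBDA_FUNCTION_ARN',
        'RoleArn': 'YOUR_SCHEDULER_ROLE_ARN'
    }
)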
If you completed all the steps above, congratulations! You should now have a working solution.
N.B. If you keep ‘standard logs’ enabled, each request will be logged by CloudFront and a file will be stored in S3. The same applies to the Athena queries that you run from your Lambda function: each query result will be saved as a CSV in the Athena query result S3 bucket. You should set a lifecycle policy on both buckets so that logs and query results are purged regularly. This way you will avoid unnecessary storage costs.
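A minimal sketch of such a lifecycle rule, assuming a hypothetical 7-day retention period:
import boto3

s3 = boto3.client('s3')

# Expire all objects in the bucket after 7 days
s3.put_bucket_lifecycle_configuration(
    Bucket='my-cloudfront-logs-bucket',  # repeat for the Athena results bucket
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'purge-old-objects',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},  # empty prefix matches every object
            'Expiration': {'Days': 7}
        }]
    }
)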