
Amazon CloudWatch is AWS's monitoring and observability service. In this project, CloudWatch monitors the entire inference pipeline, from Lambda through the SageMaker endpoint to DynamoDB, to ensure performance, detect errors early, and optimize costs.
Lambda logs are written to the log group /aws/lambda/ml-inference-lambda, where you can inspect each invocation's recorded latency (latencyMs). For the SageMaker side, open /aws/sagemaker/Endpoints/ and select the corresponding endpoint.
Correlating Lambda and SageMaker logs makes it faster to diagnose the root cause when inference fails.
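Log inspection can also be scripted with CloudWatch Logs Insights. Below is a minimal sketch that queries the Lambda log group mentioned above for slow invocations; the 500 ms threshold and the one-hour window are illustrative choices, not values from the original setup:

```python
import time

# Log group name used in this project; adjust to your deployment.
LOG_GROUP = '/aws/lambda/ml-inference-lambda'

def build_latency_query(threshold_ms=500):
    """Logs Insights query: invocations whose latencyMs exceeds a threshold."""
    return (
        'fields @timestamp, @message '
        f'| filter latencyMs > {threshold_ms} '
        '| sort @timestamp desc '
        '| limit 20'
    )

def run_query(threshold_ms=500):
    import boto3  # client created only when actually querying AWS
    logs = boto3.client('logs')
    now = int(time.time())
    query = logs.start_query(
        logGroupName=LOG_GROUP,
        startTime=now - 3600,  # last hour
        endTime=now,
        queryString=build_latency_query(threshold_ms),
    )
    # In real use, poll until the query status is 'Complete'.
    return logs.get_query_results(queryId=query['queryId'])
```

This assumes the Lambda function logs a structured field named latencyMs, as the log group description above suggests.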
For more detailed monitoring (e.g., inferences per minute, average latency), you can send Custom Metrics from Lambda to CloudWatch.
Update the Lambda function as follows:
```python
import boto3
import time

cloudwatch = boto3.client('cloudwatch')

def publish_metrics(latency_ms, success=True):
    cloudwatch.put_metric_data(
        Namespace='InferencePipeline',
        MetricData=[
            {
                'MetricName': 'LatencyMs',
                'Value': latency_ms,
                'Unit': 'Milliseconds'
            },
            {
                'MetricName': 'SuccessCount' if success else 'ErrorCount',
                'Value': 1,
                'Unit': 'Count'
            }
        ]
    )

# Call this after each inference, e.g. inside the handler:
#   start = time.time()
#   ... invoke the SageMaker endpoint ...
#   latency_ms = (time.time() - start) * 1000
#   publish_metrics(latency_ms, success=True)
```
A CloudWatch dashboard lets you monitor performance in real time and supports model and resource optimization.
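Dashboards can be created programmatically as well as in the console. The sketch below builds a dashboard body graphing the custom LatencyMs metric published above; the dashboard name and region are placeholder assumptions:

```python
import json

def build_dashboard_body(region='us-east-1'):
    """Dashboard JSON with a single widget graphing average inference latency."""
    return json.dumps({
        'widgets': [{
            'type': 'metric',
            'x': 0, 'y': 0, 'width': 12, 'height': 6,
            'properties': {
                'title': 'Inference latency',
                'region': region,
                'metrics': [['InferencePipeline', 'LatencyMs']],
                'stat': 'Average',
                'period': 60,
            },
        }]
    })

def create_dashboard():
    import boto3  # client created only when actually calling AWS
    boto3.client('cloudwatch').put_dashboard(
        DashboardName='InferencePipeline',  # assumed name
        DashboardBody=build_dashboard_body(),
    )
```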
To receive alerts when the system has problems:
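One option is a CloudWatch alarm on the ErrorCount metric published above, wired to an SNS topic. A minimal sketch, in which the alarm name and SNS topic ARN are placeholders rather than values from the original setup:

```python
def build_error_alarm(sns_topic_arn):
    """Alarm definition: fire when ErrorCount sums to 1 or more over 5 minutes."""
    return {
        'AlarmName': 'InferencePipeline-Errors',  # assumed name
        'Namespace': 'InferencePipeline',
        'MetricName': 'ErrorCount',
        'Statistic': 'Sum',
        'Period': 300,
        'EvaluationPeriods': 1,
        'Threshold': 1,
        'ComparisonOperator': 'GreaterThanOrEqualToThreshold',
        'TreatMissingData': 'notBreaching',  # no errors published means healthy
        'AlarmActions': [sns_topic_arn],
    }

def create_alarm(sns_topic_arn):
    import boto3  # client created only when actually calling AWS
    boto3.client('cloudwatch').put_metric_alarm(**build_error_alarm(sns_topic_arn))
```

Treating missing data as not breaching matters here, since the Lambda only publishes ErrorCount when a failure occurs.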

If you don’t see metrics, check the Lambda execution role’s IAM permissions (cloudwatch:PutMetricData), make sure Lambda publishes metrics after each inference, and remember that CloudWatch timestamps are in UTC when reading dashboard data.
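A quick way to confirm the metrics are arriving is to query them back with get_metric_statistics. A minimal sketch, assuming the InferencePipeline namespace from the code above; the window and period are illustrative:

```python
from datetime import datetime, timedelta, timezone

def build_stats_request(minutes=60):
    """Request: average LatencyMs over the last `minutes`, in 5-minute buckets."""
    now = datetime.now(timezone.utc)
    return {
        'Namespace': 'InferencePipeline',
        'MetricName': 'LatencyMs',
        'StartTime': now - timedelta(minutes=minutes),
        'EndTime': now,
        'Period': 300,
        'Statistics': ['Average'],
    }

def fetch_latency_stats(minutes=60):
    import boto3  # client created only when actually calling AWS
    resp = boto3.client('cloudwatch').get_metric_statistics(**build_stats_request(minutes))
    # An empty Datapoints list usually means the metric never arrived
    # (check IAM) or the time window is wrong; timestamps are UTC.
    return sorted(resp['Datapoints'], key=lambda d: d['Timestamp'])
```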