
In this section, we will deploy the trained model from step 5 to an Amazon SageMaker endpoint, allowing inference calls via API or a Lambda function. This is an important step in turning your ML model into a service that can be used in the real world.
After training and registering the model (step 5), we will create a SageMaker Model based on that output.
Create a SageMaker Model with the following settings:
- Model name: ml-blog-model
- IAM role: SageMakerExecutionRole
- Container image: 382416733822.dkr.ecr.ap-southeast-1.amazonaws.com/xgboost:latest (or the image you used when training)
- Model artifact: s3://ml-pipeline-bucket/model/xgboost-model.tar.gz
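The same Model can be created programmatically with the CreateModel API. A minimal boto3 sketch, assuming the account ID, role name, and paths shown above (substitute your own values):

```python
# CreateModel parameters matching the console fields above.
# The role ARN is illustrative -- use your own account ID and role name.
create_model_params = {
    "ModelName": "ml-blog-model",
    "ExecutionRoleArn": "arn:aws:iam::382416733822:role/SageMakerExecutionRole",
    "PrimaryContainer": {
        "Image": "382416733822.dkr.ecr.ap-southeast-1.amazonaws.com/xgboost:latest",
        "ModelDataUrl": "s3://ml-pipeline-bucket/model/xgboost-model.tar.gz",
    },
}

def create_model():
    import boto3  # requires AWS credentials and network access
    sm = boto3.client("sagemaker", region_name="ap-southeast-1")
    return sm.create_model(**create_model_params)
```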

📌 Note: The container image and artifact path must match the previously created training job.
Next, create an endpoint configuration:
- Endpoint config name: ml-blog-endpoint-config
- Model: ml-blog-model
- Instance type: ml.m5.large (or ml.t2.medium if you want to save costs)
- Initial instance count: 1
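The configuration above corresponds to the CreateEndpointConfig API. A sketch using boto3, assuming a single production variant (the variant name "AllTraffic" is the console default, not something set earlier in this post):

```python
# CreateEndpointConfig parameters matching the settings above.
endpoint_config_params = {
    "EndpointConfigName": "ml-blog-endpoint-config",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",  # console default variant name
            "ModelName": "ml-blog-model",
            "InstanceType": "ml.m5.large",  # or "ml.t2.medium" to save costs
            "InitialInstanceCount": 1,
        }
    ],
}

def create_endpoint_config():
    import boto3  # requires AWS credentials and network access
    sm = boto3.client("sagemaker", region_name="ap-southeast-1")
    return sm.create_endpoint_config(**endpoint_config_params)
```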
Finally, create the endpoint:
- Endpoint name: ml-blog-endpoint
- Endpoint config: ml-blog-endpoint-config

📸 Example deployment interface:
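The endpoint creation step can also be scripted; a sketch with boto3, including a waiter that blocks until the endpoint reaches InService (typically a few minutes):

```python
# CreateEndpoint parameters matching the settings above.
endpoint_params = {
    "EndpointName": "ml-blog-endpoint",
    "EndpointConfigName": "ml-blog-endpoint-config",
}

def create_endpoint():
    import boto3  # requires AWS credentials and network access
    sm = boto3.client("sagemaker", region_name="ap-southeast-1")
    sm.create_endpoint(**endpoint_params)
    # Block until the endpoint is InService before sending traffic.
    waiter = sm.get_waiter("endpoint_in_service")
    waiter.wait(EndpointName=endpoint_params["EndpointName"])
```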

Make sure the execution role has the required permissions (e.g., AmazonSageMakerFullAccess, AmazonS3ReadOnlyAccess). Once the endpoint is in the InService state, test inference with the following Python code:
import boto3
import json

runtime = boto3.client('sagemaker-runtime')

payload = {
    "features": [0.56, 0.32, 0.78, 0.12]  # example input data
}

response = runtime.invoke_endpoint(
    EndpointName='ml-blog-endpoint',
    ContentType='application/json',
    Body=json.dumps(payload)
)

result = json.loads(response['Body'].read().decode())
print("📊 Predicted result:", result)
Go to CloudWatch → Logs to view the inference logs. Monitor metrics such as Invocations, ModelLatency, and Invocation4XXErrors/Invocation5XXErrors.
This helps evaluate model performance in production environments.
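These endpoint metrics live in the AWS/SageMaker CloudWatch namespace and can be pulled programmatically. A sketch that fetches average ModelLatency for the endpoint (the variant name "AllTraffic" is the console default; ModelLatency is reported in microseconds):

```python
import datetime

ENDPOINT_NAME = "ml-blog-endpoint"

def fetch_model_latency(hours=1):
    """Average ModelLatency over the last `hours` hours, in 5-minute buckets."""
    import boto3  # requires AWS credentials and network access
    cw = boto3.client("cloudwatch", region_name="ap-southeast-1")
    now = datetime.datetime.now(datetime.timezone.utc)
    return cw.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="ModelLatency",  # reported in microseconds
        Dimensions=[
            {"Name": "EndpointName", "Value": ENDPOINT_NAME},
            {"Name": "VariantName", "Value": "AllTraffic"},
        ],
        StartTime=now - datetime.timedelta(hours=hours),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
```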
📌 You can enable Auto Scaling for the endpoint by using Application Auto Scaling to automatically scale up/down the number of instances based on inference traffic.
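Registering the endpoint variant with Application Auto Scaling can be done with boto3 as well. A sketch with illustrative capacity limits and target value (1–3 instances, targeting 70 invocations per instance; tune these to your traffic):

```python
# Resource ID format for a SageMaker endpoint variant.
RESOURCE_ID = "endpoint/ml-blog-endpoint/variant/AllTraffic"

def enable_autoscaling():
    import boto3  # requires AWS credentials and network access
    aas = boto3.client("application-autoscaling", region_name="ap-southeast-1")
    # Register the variant's instance count as a scalable target.
    aas.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=RESOURCE_ID,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=3,
    )
    # Scale based on invocations per instance (example target value).
    aas.put_scaling_policy(
        PolicyName="ml-blog-endpoint-scaling",
        ServiceNamespace="sagemaker",
        ResourceId=RESOURCE_ID,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
```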
You have successfully deployed a SageMaker Endpoint from the trained model.