In this section, we will create and configure a SageMaker Training Job to train a machine learning model using data preprocessed by Lambda and stored in S3.
- Create a Training Job on SageMaker using data in S3.
- Configure training parameters such as the container, instance type, and S3 input/output locations.
- Monitor training progress and verify the results.
Go to AWS Management Console → find and open Amazon SageMaker.
In the left navigation bar, select Training jobs → Create training job.

Training job name: ml-pipeline-training-job
IAM Role: Select a role that has access to S3 and SageMaker (e.g., SageMakerExecutionRole).
Algorithm source:
Select Your own algorithm container if you have a custom script.
Or select Built-in algorithm (e.g., XGBoost) for quick testing.

💡 If this is your first time experimenting, we recommend choosing the XGBoost built-in container to simplify the training process.
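If you go with the built-in XGBoost container, you will also need to supply a few algorithm hyperparameters. A minimal sketch for a binary-classification demo; the specific values below are illustrative assumptions, not values prescribed by this workshop:

```python
# Minimal hyperparameter set for SageMaker's built-in XGBoost algorithm.
# The values are illustrative assumptions for a binary-classification demo.
hyperparameters = {
    "objective": "binary:logistic",  # binary classification
    "num_round": "100",              # number of boosting rounds (required)
    "max_depth": "5",
    "eta": "0.2",
}

# SageMaker expects every hyperparameter value to be passed as a string.
assert all(isinstance(v, str) for v in hyperparameters.values())
```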
In the Input data configuration section:
Add a channel named train for the training data.
Add a new channel named validation for the validation data.

📁 The S3 structure should be as follows:
ml-pipeline-bucket/
└─ processed/
   ├─ train/
   │  └─ train.csv
   └─ validation/
      └─ val.csv
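The two channels above map directly onto the InputDataConfig list of a SageMaker training request. A sketch of that shape, using the bucket and prefixes shown in the tree (the helper name `make_channel` is ours, not an AWS API):

```python
BUCKET = "ml-pipeline-bucket"

def make_channel(name: str, prefix: str) -> dict:
    """Build one InputDataConfig channel for a CSV dataset under an S3 prefix."""
    return {
        "ChannelName": name,
        "ContentType": "text/csv",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{BUCKET}/{prefix}",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }

# One channel per folder shown in the S3 structure above.
input_data_config = [
    make_channel("train", "processed/train/"),
    make_channel("validation", "processed/validation/"),
]
```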

Instance type: ml.m5.large (or choose a GPU instance if the model requires one)
Instance count: 1
Additional storage volume: 10 GB
Maximum runtime: 3600 seconds (limits training to 1 hour)

⚠️ Choose an instance size that fits your budget. A general-purpose instance like ml.m5.large is inexpensive and powerful enough for demos.
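In the training request these settings live in the ResourceConfig and StoppingCondition fields. A sketch mirroring the values above:

```python
# Resource settings matching the values used in this walkthrough.
resource_config = {
    "InstanceType": "ml.m5.large",
    "InstanceCount": 1,
    "VolumeSizeInGB": 10,
}

# Hard cap on billed training time: 3600 seconds = 1 hour.
stopping_condition = {"MaxRuntimeInSeconds": 3600}
```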
In the Output data configuration section:
Set the S3 output path (e.g., s3://ml-pipeline-bucket/model/). SageMaker will save the trained model file (e.g., model.tar.gz) here.

Check all the configurations again.
Click Create training job to start the training process.
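The console steps above can also be submitted programmatically. A sketch of the full request body, assuming placeholder values for the role ARN and the XGBoost image URI (both must be replaced for your account and region); the actual boto3 call is left commented out because it requires AWS credentials:

```python
# Assumed placeholders: replace with your role ARN and the XGBoost
# container image URI for your region before running against AWS.
ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
IMAGE_URI = "<xgboost-image-uri-for-your-region>"

def channel(name: str, s3_uri: str) -> dict:
    """One InputDataConfig channel pointing at a CSV prefix in S3."""
    return {
        "ChannelName": name,
        "ContentType": "text/csv",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": s3_uri,
            "S3DataDistributionType": "FullyReplicated",
        }},
    }

request = {
    "TrainingJobName": "ml-pipeline-training-job",
    "RoleArn": ROLE_ARN,
    "AlgorithmSpecification": {
        "TrainingImage": IMAGE_URI,
        "TrainingInputMode": "File",
    },
    "InputDataConfig": [
        channel("train", "s3://ml-pipeline-bucket/processed/train/"),
        channel("validation", "s3://ml-pipeline-bucket/processed/validation/"),
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://ml-pipeline-bucket/model/"},
    "ResourceConfig": {
        "InstanceType": "ml.m5.large",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

# To actually start the job (requires AWS credentials):
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```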
📊 Interface when the job is running:

In the list of Training jobs, select the job you just created.
Check the status: InProgress → Completed.
View detailed logs in CloudWatch Logs to monitor the training process.
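The status check can be scripted as well. A minimal polling loop, written against an injected `fetch_status` function so the loop logic runs without AWS credentials; with boto3 you would pass a function wrapping `describe_training_job` (see the comment):

```python
import time
from typing import Callable

def wait_for_job(fetch_status: Callable[[], str], poll_seconds: float = 0.0) -> str:
    """Poll until the training job leaves the InProgress state.

    fetch_status should return one of SageMaker's TrainingJobStatus values:
    InProgress, Completed, Failed, Stopping, or Stopped.
    """
    while True:
        status = fetch_status()
        if status != "InProgress":
            return status
        time.sleep(poll_seconds)

# Against AWS you would use something like:
# fetch = lambda: boto3.client("sagemaker").describe_training_job(
#     TrainingJobName="ml-pipeline-training-job")["TrainingJobStatus"]

# Simulated run: the job reports InProgress twice, then Completed.
statuses = iter(["InProgress", "InProgress", "Completed"])
final = wait_for_job(lambda: next(statuses))
```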
Once completed, the model file will be saved at: s3://ml-pipeline-bucket/model/model.tar.gz
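After the job completes you typically download and unpack that artifact. A standard-library sketch, demonstrated here on a locally created archive standing in for the S3 download (with AWS you would first fetch the file, e.g. via `boto3.client("s3").download_file(...)`):

```python
import tarfile
import tempfile
from pathlib import Path

def extract_model(archive_path: str, dest_dir: str) -> list:
    """Unpack model.tar.gz and return the extracted file names, sorted."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(p.name for p in Path(dest_dir).rglob("*") if p.is_file())

# Demo: build a stand-in archive locally, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "xgboost-model"
    model_file.write_bytes(b"demo")
    archive = Path(tmp) / "model.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(model_file, arcname="xgboost-model")
    extracted = extract_model(str(archive), str(Path(tmp) / "out"))
```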
Make sure the data in train/ and validation/ has a valid structure and format.
You have successfully created a SageMaker Training Job and trained the model using preprocessed data from Lambda.