In this section, you will validate the training results to confirm that the model trained successfully and is of sufficient quality before it is deployed to production. This step matters in the ML pipeline because it lets you evaluate model quality, inspect the outputs, and decide whether to deploy or to re-tune the data and hyperparameters.
Go to AWS Management Console → Amazon SageMaker → Training jobs to see the list of training jobs created in step 5.2 – Create Training Job.
- Completed – the job finished successfully.
- Failed – training failed (check the logs).
- InProgress – training is still running.
If the status is Failed, check the CloudWatch logs to determine the cause (e.g., S3 permission error, training script code error, or missing data).
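The status check above can also be scripted. The sketch below, assuming boto3 is installed and AWS credentials are configured (the job name `ml-pipeline-training-job` is hypothetical), looks up a training job's status and maps it to the next step described above:

```python
def next_action(status: str) -> str:
    """Map a SageMaker training-job status to the next step in this guide."""
    return {
        "Completed": "validate the model artifact",
        "Failed": "inspect the CloudWatch logs",
        "InProgress": "wait for the job to finish",
    }.get(status, "check the job in the console")


def fetch_status(job_name: str) -> str:
    """Look up the current status via the SageMaker API.

    Requires boto3 and AWS credentials; DescribeTrainingJob also returns a
    FailureReason field when the job failed.
    """
    import boto3

    sm = boto3.client("sagemaker")
    return sm.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
```

Calling `next_action(fetch_status("ml-pipeline-training-job"))` then tells you whether to proceed to validation or go straight to the logs.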
After successful training, SageMaker will save the model to S3 at the path you specified: s3://ml-pipeline-bucket/model/model.tar.gz
model.tar.gz contains the serialized model (e.g., pickle, joblib, or framework format like TensorFlow SavedModel).
You can download this file to your computer and check the structure to ensure the content is valid.
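After downloading the artifact (for example with `aws s3 cp s3://ml-pipeline-bucket/model/model.tar.gz .`), you can inspect its contents without extracting it. A minimal sketch using only the standard library:

```python
import tarfile


def list_model_artifact(path: str) -> list[str]:
    """Return the file names inside a SageMaker model.tar.gz archive.

    Useful as a quick sanity check that the training script actually
    saved a model file before you try to load it.
    """
    with tarfile.open(path, "r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]
```

If the returned list is empty, or contains no recognizable model file, the training script likely did not write its output to `/opt/ml/model/`.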

Go to CloudWatch Logs → Log groups → find the log group corresponding to the training job.
View the log output from the training script; you should see lines like:
```
[INFO] Training accuracy: 0.912
[INFO] Validation accuracy: 0.905
[INFO] Model saved to /opt/ml/model/
```
📌 If you write metrics to stdout or save to /opt/ml/output/metrics.json, SageMaker will automatically collect them to display in the interface.
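Inside the training script, both collection paths can be served at once. A minimal sketch (the `output_dir` default matches the SageMaker container path; override it when running locally):

```python
import json
import os


def save_metrics(metrics: dict, output_dir: str = "/opt/ml/output") -> str:
    """Write metrics to metrics.json and echo them to stdout.

    Printing each metric in a stable "[INFO] name: value" format lets
    SageMaker metric definitions (regexes applied to the job logs) pick
    the values up as well.
    """
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "metrics.json")
    with open(path, "w") as f:
        json.dump(metrics, f)
    for name, value in metrics.items():
        print(f"[INFO] {name}: {value}")
    return path
```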

Metrics are the main measure to decide whether the model can be put into production or not.
Some common metrics:
| Metric | Significance |
|---|---|
| Accuracy | Ratio of correct predictions to total samples. |
| Precision | Accuracy when the model predicts Positive. |
| Recall | Ability to detect all Positive cases. |
| F1-Score | Harmonic mean of Precision and Recall. |
| AUC-ROC | Ability to discriminate between classes. |
You should set a minimum threshold for these metrics (e.g. Accuracy > 0.90) to automatically approve or reject the model before registration.
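Such a threshold gate is simple to automate. A minimal sketch (the metric names and threshold values are illustrative, mirroring the evaluation table below):

```python
def passes_thresholds(metrics: dict, thresholds: dict) -> bool:
    """Approve the model only if every metric clears its minimum threshold.

    A metric missing from `metrics` counts as a failure.
    """
    return all(
        metrics.get(name, 0.0) > minimum for name, minimum in thresholds.items()
    )


thresholds = {"accuracy": 0.90, "precision": 0.85, "recall": 0.85, "f1": 0.85}
metrics = {"accuracy": 0.912, "precision": 0.908, "recall": 0.901, "f1": 0.905}
print("approve" if passes_thresholds(metrics, thresholds) else "retrain")
```

In a pipeline, the `retrain` branch would route back to data preparation or hyperparameter tuning instead of model registration.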
```python
import tarfile

import joblib

# Extract the archive first: joblib cannot read model.tar.gz directly.
with tarfile.open("model.tar.gz", "r:gz") as tar:
    tar.extractall("model")

# The file name inside the archive depends on your training script.
model = joblib.load("model/model.joblib")
print(model.predict([[5.1, 3.5, 1.4, 0.2]]))
```
If you trained with TensorFlow or PyTorch, load the extracted model with `tf.keras.models.load_model()` or `torch.load()` instead.
If the model cannot be loaded, the training script may not have saved the model in the correct format.
Based on the validation results, you can make the following decisions:
✅ If the model meets the requirements → Continue registering to the Model Registry or deploying.
⚠️ If the model is not good enough → Adjust data, hyperparameters or model → retrain.
📊 Example of evaluation results:
| Metric | Value | Required threshold | Conclusion |
|---|---|---|---|
| Accuracy | 0.912 | >0.90 | ✅ Passed |
| Precision | 0.908 | >0.85 | ✅ Passed |
| Recall | 0.901 | >0.85 | ✅ Passed |
| F1-Score | 0.905 | >0.85 | ✅ Passed |