
Assignment 4

Optimizing Transformer Translation with Ray Tune & Optuna

Overview

Objective

Tune a custom PyTorch Transformer for English → Hindi translation so it can match or beat the notebook baseline BLEU in fewer epochs.

What Changed

  • Notebook training loop refactored into a reusable script
  • Ray Tune search with Optuna-guided trial selection
  • ASHA early stopping for cheaper sweeps
  • Auto-generated summary, markdown report, and PDF export

Core Stack

PyTorch · Ray Tune · Optuna · ASHA · NLTK BLEU

Submission Layout

The code, report, and docs live in the GitHub repository, while the large `.pth` checkpoints are hosted on Hugging Face Hub to avoid GitHub's file-size limit.
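Fetching the hosted checkpoints can be done with `huggingface_hub`. A minimal sketch; the repo id below is a hypothetical placeholder, not the actual repository name:

```python
from huggingface_hub import hf_hub_download

def fetch_checkpoint(filename="b23cs1075_ass_4_best_model.pth"):
    """Download a checkpoint file from the Hub.

    NOTE: "user/assignment4-models" is a hypothetical repo id; substitute
    the actual Hugging Face repository linked from the submission.
    """
    return hf_hub_download(repo_id="user/assignment4-models", filename=filename)
```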

Baseline

  • Training Time: 52.32 min (100 epochs recorded in `en_to_hi.ipynb`)
  • Final Loss: 0.0974 (notebook checkpoint average loss)
  • BLEU Score: 52.47 (NLTK corpus BLEU from the notebook)
  • Baseline Epochs: 100 (hardcoded notebook run)
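The baseline BLEU above is NLTK's corpus BLEU. A minimal sketch of that metric on toy sentences (not the notebook's data); an identical reference/hypothesis pair scores 100:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Toy tokenized pairs standing in for the real Hindi outputs.
references = [[["यह", "एक", "परीक्षण", "वाक्य", "है"]]]  # list of reference lists per hypothesis
hypotheses = [["यह", "एक", "परीक्षण", "वाक्य", "है"]]    # model outputs
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score * 100:.2f}")  # prints BLEU: 100.00 for an exact match
```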

Final Results

  • Best Sweep BLEU: 76.98 (best Ray Tune trial after 20 tuning epochs)
  • Best Final BLEU: 84.72 (best model after final retraining)
  • Epochs To Beat Baseline: 6 (surpassed the 52.47 notebook BLEU by epoch 6)
  • Final Training Time: 14.22 min (20-epoch best-model run)

Best Configuration

| Hyperparameter | Best Value |
| --- | --- |
| Learning Rate | 8.8844e-05 |
| Batch Size | 16 |
| Attention Heads | 4 |
| Feedforward Dimension | 1536 |
| Dropout | 0.1902 |
| Number of Layers | 4 |
| Weight Decay | 1.1495e-06 |

Completed Sweep Summary

| Item | Value |
| --- | --- |
| Sweep Size | 20 trials × 20 epochs |
| Scheduler | ASHAScheduler on epoch-level BLEU |
| Search Algorithm | OptunaSearch |
| Best Trial Loss | 0.5256 |
| Final Best-Model Loss | 0.5256 |
| Efficiency Goal | Met by epoch 6 |

Tuning Setup

Search Space

| Hyperparameter | Range |
| --- | --- |
| Learning Rate | loguniform(1e-5, 1e-3) |
| Batch Size | {16, 32, 64} |
| Attention Heads | {4, 8} |
| Feedforward Dimension | {1024, 1536, 2048} |
| Dropout | uniform(0.10, 0.40) |
| Number of Layers | {4, 6} |
| Weight Decay | loguniform(1e-6, 1e-3) |
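In Ray Tune's API, the table above corresponds to a `param_space` roughly like the following (a sketch; the real script's key names may differ):

```python
from ray import tune

param_space = {
    "lr": tune.loguniform(1e-5, 1e-3),          # learning rate
    "batch_size": tune.choice([16, 32, 64]),
    "num_heads": tune.choice([4, 8]),           # attention heads
    "ff_dim": tune.choice([1024, 1536, 2048]),  # feedforward dimension
    "dropout": tune.uniform(0.10, 0.40),
    "num_layers": tune.choice([4, 6]),
    "weight_decay": tune.loguniform(1e-6, 1e-3),
}
```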

Ray Tune Function

`train_tune(config)` builds the model, loader, optimizer, and criterion from the sampled config and reports epoch metrics to Ray.

Search Strategy

`OptunaSearch` proposes configurations while `ASHAScheduler` prunes bad trials early to save GPU time.

Metrics

Each epoch reports training loss and BLEU, and the best config is re-trained into a standalone model artifact.
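Extracting the best trial and persisting the retrained model could be sketched as follows; `results` is the object returned by `tuner.fit()` and `model` the retrained best-config model (both assumed here):

```python
import json
import torch

def export_best(results, model, path="b23cs1075_ass_4_best_model.pth"):
    """Pull the best trial from a Ray Tune ResultGrid and persist artifacts."""
    best = results.get_best_result(metric="bleu", mode="max")
    with open("best_config.json", "w") as f:
        json.dump(best.config, f, indent=2)   # save the winning hyperparameters
    torch.save(model.state_dict(), path)      # save the retrained model weights
    return best.config
```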

Compatibility Note

The Ray version pinned in the repository environment expects metrics to be reported via `tune.report(...)` inside Tune trials, so the script uses that call throughout for compatibility.

Implementation Notes

The pipeline was smoke-tested first and then scaled to the final 20-trial sweep used for submission.

| Step | Outcome |
| --- | --- |
| Baseline training script | Notebook metrics captured for the report baseline |
| Ray workers reading dataset | Validated with absolute dataset paths |
| Metric reporting and best-trial extraction | `tune.report(...)` feeds ASHA and Optuna each epoch |
| Summary/report generation | Markdown, JSON, and PDF artifacts exported |

Deliverables

Code

`b23cs1075_ass_4_tuned_en_to_hi.py` contains baseline, tuning, final training, recovery, and report generation.

Repository root

Report

`b23cs1075_ass_4_report.md` is regenerated from the latest summary JSON, and `b23cs1075_ass_4_report.pdf` is exported for submission.

Repository root

Model Artifacts

`b23cs1075_ass_4_best_model.pth` and `transformer_translation_final.pth` are hosted on Hugging Face Hub for download.

HF model repository

Metrics + Configs

`artifacts/assignment4/` stores baseline metrics, best config, final metrics, tuning summary, and the combined submission summary.

Artifacts directory

Run Commands

```shell
uv pip install --python ../ops_venv/bin/python -r assignment4_requirements.txt
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action baseline
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action tune --num-samples 20 --tune-epochs 20 --cpus-per-trial 4 --gpus-per-trial 1
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action final --final-epochs 20 --target-bleu 0.5247
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action report
```