Overview
Objective
Tune a custom PyTorch Transformer for English → Hindi translation so it can match or beat the notebook baseline BLEU in fewer epochs.
What Changed
- Notebook training loop refactored into a reusable script
- Ray Tune search with Optuna-guided trial selection
- ASHA early stopping for cheaper sweeps
- Auto-generated summary, markdown report, and PDF export
Core Stack
Submission Layout
The code, report, and docs live in the GitHub repository, while the large `.pth` checkpoints are hosted on Hugging Face Hub to avoid GitHub's file-size limit.
Baseline
Final Results
Best Configuration
| Hyperparameter | Best Value |
|---|---|
| Learning Rate | 8.8844e-05 |
| Batch Size | 16 |
| Attention Heads | 4 |
| Feedforward Dimension | 1536 |
| Dropout | 0.1902 |
| Number of Layers | 4 |
| Weight Decay | 1.1495e-06 |
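The winning settings map directly onto the model and optimizer constructors. A minimal sketch of rebuilding the model from this configuration, assuming the custom Transformer wraps `torch.nn.Transformer` and using a hypothetical `d_model` of 512 (the embedding size is not part of the search space):

```python
import torch
import torch.nn as nn

# Best configuration from the sweep (values from the table above)
best = {
    "lr": 8.8844e-05,
    "batch_size": 16,
    "nhead": 4,
    "dim_feedforward": 1536,
    "dropout": 0.1902,
    "num_layers": 4,
    "weight_decay": 1.1495e-06,
}

# d_model = 512 is an assumption here; it must be divisible by nhead
model = nn.Transformer(
    d_model=512,
    nhead=best["nhead"],
    num_encoder_layers=best["num_layers"],
    num_decoder_layers=best["num_layers"],
    dim_feedforward=best["dim_feedforward"],
    dropout=best["dropout"],
    batch_first=True,
)
optimizer = torch.optim.Adam(
    model.parameters(), lr=best["lr"], weight_decay=best["weight_decay"]
)
```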
Completed Sweep Summary
| Item | Value |
|---|---|
| Sweep Size | 20 trials × 20 epochs |
| Scheduler | ASHAScheduler on epoch-level BLEU |
| Search Algorithm | OptunaSearch |
| Best Trial BLEU | 0.5256 |
| Final Best-Model BLEU | 0.5256 |
| Efficiency Goal | Met by epoch 6 |
Tuning Setup
Search Space
| Hyperparameter | Range |
|---|---|
| Learning Rate | loguniform(1e-5, 1e-3) |
| Batch Size | {16, 32, 64} |
| Attention Heads | {4, 8} |
| Feedforward Dimension | {1024, 1536, 2048} |
| Dropout | uniform(0.10, 0.40) |
| Number of Layers | {4, 6} |
| Weight Decay | loguniform(1e-6, 1e-3) |
Ray Tune Function
`train_tune(config)` builds the model, loader, optimizer, and criterion from the sampled config and reports epoch metrics to Ray.
Search Strategy
`OptunaSearch` proposes configurations while `ASHAScheduler` prunes bad trials early to save GPU time.
Metrics
Each epoch reports training loss and BLEU, and the best config is re-trained into a standalone model artifact.
Compatibility Note
The Ray version pinned in the repository environment expects metrics to be reported with `tune.report(...)` inside Tune trials, so the script uses that call for compatibility.
Implementation Notes
The pipeline was smoke-tested first and then scaled to the final 20-trial sweep used for submission.
| Step | Outcome |
|---|---|
| Baseline training script | Notebook metrics captured for the report baseline |
| Ray workers reading dataset | Validated with absolute dataset paths |
| Metric reporting and best-trial extraction | `tune.report(...)` feeds ASHA and Optuna each epoch |
| Summary/report generation | Markdown, JSON, and PDF artifacts exported |
Deliverables
Code
`b23cs1075_ass_4_tuned_en_to_hi.py` (repository root) contains baseline, tuning, final training, recovery, and report generation.
Report
`b23cs1075_ass_4_report.md` (repository root) is regenerated from the latest summary JSON, and `b23cs1075_ass_4_report.pdf` is exported for submission.
Model Artifacts
`b23cs1075_ass_4_best_model.pth` and `transformer_translation_final.pth` are hosted in the Hugging Face model repository for download.
Metrics + Configs
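Since the checkpoints live on the Hub rather than in the repo, they can be fetched programmatically. A sketch using `huggingface_hub` (the repo id argument is a placeholder; substitute the actual model repository):

```python
import torch
from huggingface_hub import hf_hub_download

def fetch_checkpoint(repo_id: str, filename: str = "b23cs1075_ass_4_best_model.pth"):
    """Download a checkpoint file from the Hub and load it onto CPU."""
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    return torch.load(path, map_location="cpu")

# state = fetch_checkpoint("<user>/<model-repo>")
```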
`artifacts/assignment4/` stores baseline metrics, best config, final metrics, tuning summary, and the combined submission summary.
Run Commands
```shell
uv pip install --python ../ops_venv/bin/python -r assignment4_requirements.txt
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action baseline
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action tune --num-samples 20 --tune-epochs 20 --cpus-per-trial 4 --gpus-per-trial 1
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action final --final-epochs 20 --target-bleu 0.5247
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action report
```