
Assignment 4

Optimizing Transformer Translation with Ray Tune & Optuna

Overview

Objective

Tune a custom PyTorch Transformer for English → Hindi translation so it can match or beat the notebook baseline BLEU in fewer epochs.

What Changed

  • Notebook training loop refactored into a reusable script
  • Ray Tune search with Optuna-guided trial selection
  • ASHA early stopping for cheaper sweeps
  • Auto-generated summary, markdown report, and PDF export

Core Stack

PyTorch · Ray Tune · Optuna · ASHA · NLTK BLEU

Submission Layout

The code, report, and docs live in the GitHub repository, while the large `.pth` checkpoints are hosted on Hugging Face Hub to avoid GitHub's file-size limit.
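Fetching the hosted checkpoints can be done with `huggingface_hub`. A minimal sketch; the repo id below is a hypothetical placeholder, not the actual repository name:

```python
from huggingface_hub import hf_hub_download

def fetch_checkpoint(filename="b23cs1075_ass_4_best_model.pth"):
    """Download a checkpoint file from the Hub.

    NOTE: "user/assignment4-models" is a hypothetical repo id; substitute
    the actual Hugging Face repository linked from the submission.
    """
    return hf_hub_download(repo_id="user/assignment4-models", filename=filename)
```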

Baseline

  • Training Time: 52.32 min (100 epochs recorded in `en_to_hi.ipynb`)
  • Final Loss: 0.0974 (notebook checkpoint average loss)
  • BLEU Score: 52.47 (NLTK corpus BLEU from the notebook)
  • Baseline Epochs: 100 (hardcoded notebook run)
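The baseline BLEU above is NLTK's corpus BLEU. A minimal sketch of that metric on toy sentences (not the notebook's data); an identical reference/hypothesis pair scores 100:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Toy tokenized pairs standing in for the real Hindi outputs.
references = [[["यह", "एक", "परीक्षण", "वाक्य", "है"]]]  # list of reference lists per hypothesis
hypotheses = [["यह", "एक", "परीक्षण", "वाक्य", "है"]]    # model outputs
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score * 100:.2f}")  # prints BLEU: 100.00 for an exact match
```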

Final Results

  • Best Sweep BLEU: 76.98 (best Ray Tune trial after 20 tuning epochs)
  • Best Final BLEU: 84.72 (best model after final retraining)
  • Epochs To Beat Baseline: 6 (surpassed the 52.47 notebook BLEU by epoch 6)
  • Final Training Time: 14.22 min (20-epoch best-model run)

Best Configuration

| Hyperparameter | Best Value |
| --- | --- |
| Learning Rate | 8.8844e-05 |
| Batch Size | 16 |
| Attention Heads | 4 |
| Feedforward Dimension | 1536 |
| Dropout | 0.1902 |
| Number of Layers | 4 |
| Weight Decay | 1.1495e-06 |

Completed Sweep Summary

| Item | Value |
| --- | --- |
| Sweep Size | 20 trials × 20 epochs |
| Scheduler | ASHAScheduler on epoch-level BLEU |
| Search Algorithm | OptunaSearch |
| Best Trial Loss | 0.5256 |
| Final Best-Model Loss | 0.5256 |
| Efficiency Goal | Met by epoch 6 |

Tuning Setup

Search Space

| Hyperparameter | Range |
| --- | --- |
| Learning Rate | loguniform(1e-5, 1e-3) |
| Batch Size | {16, 32, 64} |
| Attention Heads | {4, 8} |
| Feedforward Dimension | {1024, 1536, 2048} |
| Dropout | uniform(0.10, 0.40) |
| Number of Layers | {4, 6} |
| Weight Decay | loguniform(1e-6, 1e-3) |
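In Ray Tune's API, the table above corresponds to a `param_space` roughly like the following (a sketch; the real script's key names may differ):

```python
from ray import tune

param_space = {
    "lr": tune.loguniform(1e-5, 1e-3),          # learning rate
    "batch_size": tune.choice([16, 32, 64]),
    "num_heads": tune.choice([4, 8]),           # attention heads
    "ff_dim": tune.choice([1024, 1536, 2048]),  # feedforward dimension
    "dropout": tune.uniform(0.10, 0.40),
    "num_layers": tune.choice([4, 6]),
    "weight_decay": tune.loguniform(1e-6, 1e-3),
}
```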

Ray Tune Function

`train_tune(config)` builds the model, loader, optimizer, and criterion from the sampled config and reports epoch metrics to Ray.

Search Strategy

`OptunaSearch` proposes configurations while `ASHAScheduler` prunes bad trials early to save GPU time.

Metrics

Each epoch reports training loss and BLEU, and the best config is re-trained into a standalone model artifact.
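Extracting the best trial and persisting the retrained model could be sketched as follows; `results` is the object returned by `tuner.fit()` and `model` the retrained best-config model (both assumed here):

```python
import json
import torch

def export_best(results, model, path="b23cs1075_ass_4_best_model.pth"):
    """Pull the best trial from a Ray Tune ResultGrid and persist artifacts."""
    best = results.get_best_result(metric="bleu", mode="max")
    with open("best_config.json", "w") as f:
        json.dump(best.config, f, indent=2)   # save the winning hyperparameters
    torch.save(model.state_dict(), path)      # save the retrained model weights
    return best.config
```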

Compatibility Note

The Ray version pinned in the repository environment expects metrics to be reported via `tune.report(...)` inside Tune trials, so the script uses that call throughout for compatibility.

Implementation Notes

The pipeline was smoke-tested first and then scaled to the final 20-trial sweep used for submission.

| Step | Outcome |
| --- | --- |
| Baseline training script | Notebook metrics captured for the report baseline |
| Ray workers reading dataset | Validated with absolute dataset paths |
| Metric reporting and best-trial extraction | `tune.report(...)` feeds ASHA and Optuna each epoch |
| Summary/report generation | Markdown, JSON, and PDF artifacts exported |

Deliverables

Code

`b23cs1075_ass_4_tuned_en_to_hi.py` contains baseline, tuning, final training, recovery, and report generation.

Repository root

Report

`b23cs1075_ass_4_report.md` is regenerated from the latest summary JSON, and `b23cs1075_ass_4_report.pdf` is exported for submission.

Repository root

Model Artifacts

`b23cs1075_ass_4_best_model.pth` and `transformer_translation_final.pth` are hosted on Hugging Face Hub for download.

HF model repository

Metrics + Configs

`artifacts/assignment4/` stores baseline metrics, best config, final metrics, tuning summary, and the combined submission summary.

Artifacts directory

Run Commands

```shell
uv pip install --python ../ops_venv/bin/python -r assignment4_requirements.txt
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action baseline
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action tune --num-samples 20 --tune-epochs 20 --cpus-per-trial 4 --gpus-per-trial 1
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action final --final-epochs 20 --target-bleu 0.5247
../ops_venv/bin/python b23cs1075_ass_4_tuned_en_to_hi.py --action report
```