# Overview

## Objective

Train ResNet-18, ResNet-50, and SVM classifiers with various hyperparameters on the MNIST and FashionMNIST datasets.
## Key Tasks

- Q1(a): Deep learning classification with ResNet
- Q1(b): SVM classification with poly and rbf kernels
- Q2: CPU vs. GPU performance analysis
## Technologies

- PyTorch
- CUDA
- Scikit-learn
- AMP (Automatic Mixed Precision)
## Key Results

| Metric | Value | Configuration |
|---|---|---|
| Best MNIST accuracy (ResNet-18) | 99.20% | Batch 16, SGD, LR=0.001 |
| Best FashionMNIST accuracy (ResNet-50) | 92.60% | Batch 16, Adam, LR=0.0001 |
| Best SVM accuracy (MNIST) | 97.61% | poly kernel, C=10.0 |
| GPU speedup vs. CPU training | up to 30x | see Q2 |
## Q1(a) Deep Learning Results - MNIST
| Batch Size | Optimizer | Learning Rate | ResNet-18 (%) | ResNet-50 (%) |
|---|---|---|---|---|
| 16 | SGD | 0.001 | 99.20 | 99.11 |
| 16 | SGD | 0.0001 | 97.70 | 97.27 |
| 16 | Adam | 0.001 | 98.19 | 98.46 |
| 16 | Adam | 0.0001 | 99.15 | 98.24 |
| 32 | SGD | 0.001 | 98.95 | 98.67 |
| 32 | SGD | 0.0001 | 96.59 | 94.21 |
| 32 | Adam | 0.001 | 98.96 | 98.75 |
| 32 | Adam | 0.0001 | 98.71 | 98.02 |
* Results with pin_memory=True, Epochs=5
## Q1(a) Deep Learning Results - FashionMNIST
| Batch Size | Optimizer | Learning Rate | ResNet-18 (%) | ResNet-50 (%) |
|---|---|---|---|---|
| 16 | SGD | 0.001 | 92.06 | 91.97 |
| 16 | SGD | 0.0001 | 90.41 | 89.63 |
| 16 | Adam | 0.001 | 92.11 | 89.11 |
| 16 | Adam | 0.0001 | 92.43 | 92.60 |
| 32 | SGD | 0.001 | 90.59 | 89.71 |
| 32 | SGD | 0.0001 | 88.86 | 85.04 |
| 32 | Adam | 0.001 | 91.12 | 50.81 |
| 32 | Adam | 0.0001 | 92.06 | 92.43 |
* Results with pin_memory=False, Epochs=10
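The `pin_memory` and batch-size settings noted under both tables enter through the `DataLoader`. A minimal sketch (using an in-memory `TensorDataset` as a stand-in for the torchvision datasets; `make_loader` is a hypothetical helper):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(images, labels, batch_size=16, pin_memory=False,
                shuffle=True):
    # pin_memory=True stages batches in page-locked host memory, so
    # .to("cuda", non_blocking=True) copies can overlap with compute;
    # it only helps when training on a GPU.
    ds = TensorDataset(images, labels)
    return DataLoader(ds, batch_size=batch_size, shuffle=shuffle,
                      pin_memory=pin_memory)
```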
## Q1(b) SVM Classification Results
| Dataset | Kernel | C | Accuracy (%) | Train Time (ms) |
|---|---|---|---|---|
| MNIST | poly | 10.0 | 97.61 | 286,093 |
| MNIST | rbf | 10.0 | 96.99 | 368,092 |
| FashionMNIST | rbf | 10.0 | 89.89 | 479,740 |
| FashionMNIST | poly | 10.0 | 89.84 | 367,438 |
## Q2 CPU vs GPU Performance (FashionMNIST)
| Compute | Model | Accuracy (%) | Train Time (ms) | FLOPs |
|---|---|---|---|---|
| GPU | ResNet-18 | 87.24 | 62,192 | 1.824G |
| CPU | ResNet-18 | 87.40 | 1,149,299 | 1.824G |
| GPU | ResNet-50 | 85.69 | 137,549 | 4.132G |
| CPU | ResNet-50 | 83.47 | 4,234,740 | 4.132G |
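One way the per-device train times above could be measured, including the AMP path listed under Technologies. This is a sketch of a hypothetical `timed_step` helper: on CUDA it synchronizes around the timed region so the measurement covers completed kernels rather than asynchronous launches, and `GradScaler` becomes a no-op when AMP is disabled.

```python
import time

import torch


def timed_step(model, images, labels, optimizer, criterion, device,
               use_amp=False):
    model = model.to(device)
    images, labels = images.to(device), labels.to(device)
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # pass-through if off
    if device.type == "cuda":
        torch.cuda.synchronize()  # drain pending kernels before timing
    start = time.perf_counter()
    optimizer.zero_grad()
    # autocast runs eligible ops in reduced precision when AMP is on
    with torch.autocast(device_type=device.type, enabled=use_amp):
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for async GPU work to finish
    return (time.perf_counter() - start) * 1000.0  # milliseconds
```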
## Key Insights

### Optimizer Choice
SGD with LR=0.001 converged quickly and gave the best MNIST accuracy; Adam with a lower LR (0.0001) trained more stably and gave the best FashionMNIST accuracy.

### Batch Size Impact
The smaller batch size (16) consistently yielded better accuracy than 32. Larger batches train faster per epoch but may require learning-rate retuning.

### GPU Acceleration
The GPU provided an 18-30x speedup over the CPU, with the larger ResNet-50 benefiting more from GPU parallelization than ResNet-18.