## Overview

### Objective

Train a CNN on CIFAR-10 with a custom dataloader, and visualize gradient flow and weight updates using Weights & Biases (W&B).
### Key Tasks

- Custom CIFAR-10 DataLoader
- SimpleCNN model (~500K params)
- FLOPs counting with ptflops
- Gradient flow visualization
- Weight update tracking
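A minimal sketch of the custom CIFAR-10 `Dataset`/`DataLoader` pair. The class name, the uint8 HWC layout, and the dummy arrays standing in for the real CIFAR-10 batches are assumptions, not the project's actual code:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class CIFAR10Dataset(Dataset):
    """Wraps raw image/label arrays; real data would come from the CIFAR-10 batch files."""
    def __init__(self, images, labels):
        self.images = images  # uint8, N x 32 x 32 x 3
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # HWC uint8 -> CHW float in [0, 1]
        img = torch.from_numpy(self.images[idx]).permute(2, 0, 1).float() / 255.0
        return img, int(self.labels[idx])

# dummy data standing in for the real dataset
images = np.random.randint(0, 256, (100, 32, 32, 3), dtype=np.uint8)
labels = np.random.randint(0, 10, (100,))
loader = DataLoader(CIFAR10Dataset(images, labels), batch_size=128, shuffle=True)
x, y = next(iter(loader))
```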
### Technologies

- PyTorch
- W&B
- ptflops
- CUDA
## Model & Results

- **Model Architecture:** SimpleCNN (3 Conv Blocks + 2 FC Layers)
- **Parameters:** ~500K, lightweight for fast training
- **Epochs:** 30, with cosine annealing
- **Logging:** W&B, all visualizations tracked
### SimpleCNN Architecture

| Layer | Type | Output Shape | Parameters |
|---|---|---|---|
| Input | - | 3 × 32 × 32 | - |
| Conv Block 1 | Conv2d + BN + ReLU + MaxPool | 32 × 16 × 16 | ~1K |
| Conv Block 2 | Conv2d + BN + ReLU + MaxPool | 64 × 8 × 8 | ~18K |
| Conv Block 3 | Conv2d + BN + ReLU + MaxPool | 128 × 4 × 4 | ~74K |
| FC1 | Linear + ReLU + Dropout | 256 | ~524K |
| FC2 (Output) | Linear | 10 | ~2.5K |
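The table above can be sketched as a PyTorch module. The block helper, layer names, and the dropout rate of 0.5 are assumptions not stated in the table:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """3 conv blocks (Conv2d + BN + ReLU + MaxPool) followed by 2 FC layers."""
    def __init__(self, num_classes=10):
        super().__init__()

        def block(in_c, out_c):
            return nn.Sequential(
                nn.Conv2d(in_c, out_c, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_c),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # halves spatial size: 32 -> 16 -> 8 -> 4
            )

        self.features = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Sequential(
            nn.Flatten(),                   # 128 x 4 x 4 -> 2048
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),                # assumed rate
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```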
### Training Configuration

| Parameter | Value |
|---|---|
| Dataset | CIFAR-10 |
| Train/Val/Test Split | 40K / 10K / 10K |
| Batch Size | 128 |
| Optimizer | Adam |
| Learning Rate | 0.001 |
| Weight Decay | 1e-4 |
| LR Scheduler | Cosine Annealing |
| Epochs | 30 |
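Under the settings above, the optimizer and scheduler wiring might look like the following. The stand-in linear model and the placeholder epoch body are assumptions; the real loop iterates over batches and computes a loss:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(10, 10)  # stand-in for the actual SimpleCNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=30)  # one cosine cycle over 30 epochs

for epoch in range(30):
    # ... forward, loss.backward(), optimizer.step() per batch ...
    optimizer.step()   # placeholder so the optimizer/scheduler ordering is valid
    scheduler.step()   # decay the learning rate once per epoch
```

With `T_max=30` and the default `eta_min=0`, the learning rate follows a half cosine from 1e-3 down to 0 over the 30 epochs.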
## Visualizations

### Gradient Flow

Bar charts showing the maximum and average gradient magnitude per layer, used to detect vanishing or exploding gradients.
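One way to collect those per-layer magnitudes, sketched below. The function name and the tiny stand-in model are assumptions; the resulting dict is what would feed the W&B bar charts:

```python
import torch

def gradient_flow_stats(model):
    """Max and mean absolute gradient per weight tensor; call after loss.backward()."""
    stats = {}
    for name, p in model.named_parameters():
        if p.grad is not None and "weight" in name:
            g = p.grad.detach().abs()
            stats[name] = {"max": g.max().item(), "mean": g.mean().item()}
    return stats

# usage on a tiny stand-in model
model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU(), torch.nn.Linear(4, 2))
loss = model(torch.randn(16, 8)).sum()
loss.backward()
stats = gradient_flow_stats(model)
```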
### Weight Histograms

Distribution of weights in each layer, logged every 5 epochs to W&B for analysis.
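A dependency-free sketch of the per-layer histograms using NumPy. In the actual project the same tensors would presumably be logged as `wandb.Histogram` objects every 5 epochs; the function name here is an assumption:

```python
import numpy as np
import torch

def weight_histograms(model, bins=64):
    """NumPy histogram per weight tensor; with W&B you would log
    wandb.Histogram(w) under the same keys instead of raw counts."""
    hists = {}
    for name, p in model.named_parameters():
        if "weight" in name:
            w = p.detach().cpu().numpy().ravel()
            hists[name] = np.histogram(w, bins=bins)  # (counts, bin edges)
    return hists

model = torch.nn.Linear(32, 16)
hists = weight_histograms(model)
```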
### Weight Updates

Tracks how much the weights change between epochs, revealing each layer's learning dynamics.
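A minimal sketch of that tracking: snapshot the parameters at the end of one epoch, then measure the relative change after the next. Function names and the faked "epoch" of updates are assumptions:

```python
import torch

def snapshot(model):
    """Copy of all parameter tensors, taken at the end of an epoch."""
    return {n: p.detach().clone() for n, p in model.named_parameters()}

def update_magnitudes(model, prev):
    """Relative L2 change of each parameter tensor since the snapshot."""
    return {
        n: ((p.detach() - prev[n]).norm() / (prev[n].norm() + 1e-12)).item()
        for n, p in model.named_parameters()
    }

# usage: snapshot, fake one "epoch" of updates, measure the change
model = torch.nn.Linear(4, 3)
prev = snapshot(model)
with torch.no_grad():
    model.weight.add_(0.01)  # stands in for a real epoch of optimizer steps
updates = update_magnitudes(model, prev)
```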
### Training Curves

Loss and accuracy curves for training and validation, plus the learning rate schedule.
## Key Findings

- Gradient Flow: Gradients flow properly through all layers without vanishing/exploding issues
- BatchNorm Effect: Helps maintain stable gradient magnitudes across layers
- Weight Updates: FC layers show larger updates than conv layers
- Learning Dynamics: Cosine annealing provides smooth convergence
- Data Augmentation: RandomCrop and HorizontalFlip improve generalization
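The augmentations in the last finding correspond to `transforms.RandomCrop(32, padding=4)` and `transforms.RandomHorizontalFlip()` in torchvision; a dependency-free torch-only sketch of the same operations (the function name is an assumption):

```python
import torch
import torch.nn.functional as F

def augment(img: torch.Tensor) -> torch.Tensor:
    """RandomCrop(32, padding=4) + RandomHorizontalFlip for a C x 32 x 32 image."""
    padded = F.pad(img, (4, 4, 4, 4))        # pad 4 px each side -> C x 40 x 40
    top = int(torch.randint(0, 9, (1,)))     # random 32x32 window inside the pad
    left = int(torch.randint(0, 9, (1,)))
    out = padded[:, top:top + 32, left:left + 32]
    if torch.rand(1).item() < 0.5:           # flip horizontally with probability 0.5
        out = out.flip(-1)
    return out
```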