
Assignment 2

CNN Training on CIFAR-10 with Gradient & Weight Visualization

Overview

Objective

Train a CNN on CIFAR-10 with a custom dataloader, and visualize gradient flow and weight updates using Weights & Biases (W&B).

Key Tasks

  • Custom CIFAR-10 DataLoader
  • SimpleCNN model (~500K params)
  • FLOPs counting with ptflops (see the sketch after this list)
  • Gradient flow visualization
  • Weight update tracking
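
As a sketch of the FLOPs-counting step, ptflops' `get_model_complexity_info` can report complexity and parameter counts for a 3 × 32 × 32 CIFAR-10 input. The model class name `SimpleCNN` is assumed from the tables below, and note that ptflops reports multiply-accumulate operations (MACs) rather than raw FLOPs.

```python
# Sketch of complexity counting with ptflops (model class name is assumed).
from ptflops import get_model_complexity_info

model = SimpleCNN()  # assumed model class, sketched later on this page
macs, params = get_model_complexity_info(
    model,
    (3, 32, 32),                 # CIFAR-10 input resolution
    as_strings=True,             # human-readable output, e.g. "0.04 GMac"
    print_per_layer_stat=True,   # per-layer breakdown printed to stdout
)
print(f"Computational complexity: {macs}")
print(f"Parameters: {params}")
```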

Technologies

PyTorch, W&B, ptflops, CUDA

Model & Results

  • Model Architecture: SimpleCNN (3 Conv Blocks + 2 FC Layers)
  • Parameters: ~500K (lightweight for fast training)
  • Epochs: 30 (with cosine annealing)
  • Logging: W&B (all visualizations tracked)

SimpleCNN Architecture

| Layer        | Type                         | Output Shape | Parameters |
|--------------|------------------------------|--------------|------------|
| Input        | -                            | 3 × 32 × 32  | -          |
| Conv Block 1 | Conv2d + BN + ReLU + MaxPool | 32 × 16 × 16 | ~1K        |
| Conv Block 2 | Conv2d + BN + ReLU + MaxPool | 64 × 8 × 8   | ~18K       |
| Conv Block 3 | Conv2d + BN + ReLU + MaxPool | 128 × 4 × 4  | ~74K       |
| FC1          | Linear + ReLU + Dropout      | 256          | ~524K      |
| FC2 (Output) | Linear                       | 10           | ~2.5K      |
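
The exact implementation is not reproduced here, so the following is a minimal sketch consistent with the table above; the 3 × 3 kernels, padding of 1, and 0.5 dropout rate are assumptions.

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Sketch of a lightweight CNN matching the layer table above."""
    def __init__(self, num_classes=10):
        super().__init__()

        def block(in_ch, out_ch):
            # Conv2d + BatchNorm + ReLU + MaxPool, halving the spatial size
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(
            block(3, 32),    # -> 32 x 16 x 16
            block(32, 64),   # -> 64 x 8 x 8
            block(64, 128),  # -> 128 x 4 x 4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256),  # FC1, ~524K parameters
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),              # dropout rate is an assumption
            nn.Linear(256, num_classes),  # FC2, ~2.5K parameters
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```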

Training Configuration

| Parameter            | Value            |
|----------------------|------------------|
| Dataset              | CIFAR-10         |
| Train/Val/Test Split | 40K / 10K / 10K  |
| Batch Size           | 128              |
| Optimizer            | Adam             |
| Learning Rate        | 0.001            |
| Weight Decay         | 1e-4             |
| LR Scheduler         | Cosine Annealing |
| Epochs               | 30               |
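
A minimal sketch of this setup, assuming torchvision's CIFAR-10 dataset and a random 40K/10K split of the official 50K training images; variable names and the plain `ToTensor` transform are illustrative.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # augmentation sketch shown under Key Findings
full_train = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=transform)

# 40K train / 10K val split of the official training set
train_set, val_set = random_split(full_train, [40_000, 10_000])

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=2)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=2)

model = SimpleCNN().cuda()  # SimpleCNN as sketched above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)
```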

Visualizations

Gradient Flow

Bar charts of the maximum and average gradient magnitude per layer, which help detect vanishing or exploding gradients.
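
A sketch of how these per-layer statistics could be collected right after `loss.backward()` and logged to W&B; the metric names are illustrative.

```python
import wandb

def log_gradient_flow(model, step):
    """Log max/mean absolute gradient per parameter tensor to W&B."""
    stats = {}
    for name, p in model.named_parameters():
        if p.grad is not None:
            grad = p.grad.detach().abs()
            stats[f"grad_mean/{name}"] = grad.mean().item()
            stats[f"grad_max/{name}"] = grad.max().item()
    wandb.log(stats, step=step)
```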

Weight Histograms

Distribution of weights in each layer, logged every 5 epochs to W&B for analysis.
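
One way to log these histograms, assuming an active `wandb.init(...)` run and the 5-epoch interval stated above.

```python
import wandb

def log_weight_histograms(model, epoch):
    """Log a histogram of each layer's weights to W&B every 5 epochs."""
    if epoch % 5 != 0:
        return
    hists = {
        f"weights/{name}": wandb.Histogram(p.detach().cpu().numpy())
        for name, p in model.named_parameters()
    }
    wandb.log(hists, step=epoch)
```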

Weight Updates

Tracking how much the weights change between epochs, which reveals the learning dynamics of each layer.
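
A sketch of this tracking, keeping a snapshot of the previous epoch's weights and logging a per-layer change metric; the relative-norm formulation is an assumption.

```python
import copy
import wandb

prev_weights = copy.deepcopy(model.state_dict())  # snapshot before the first epoch

def log_weight_updates(model, prev_weights, epoch):
    """Log the relative change of each parameter tensor since the last epoch."""
    updates = {}
    for name, p in model.state_dict().items():
        if p.dtype.is_floating_point:
            delta = (p - prev_weights[name]).norm()
            updates[f"update/{name}"] = (delta / (prev_weights[name].norm() + 1e-12)).item()
    wandb.log(updates, step=epoch)
    return copy.deepcopy(model.state_dict())  # becomes prev_weights for the next epoch
```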

Training Curves

Loss and accuracy curves for training and validation, plus learning rate schedule.
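
The per-epoch curves can be logged in a single call; `train_loss`, `train_acc`, `val_loss`, and `val_acc` are placeholders for metrics computed during that epoch.

```python
wandb.log({
    "train/loss": train_loss,
    "train/acc": train_acc,
    "val/loss": val_loss,
    "val/acc": val_acc,
    "lr": scheduler.get_last_lr()[0],  # cosine-annealed learning rate
}, step=epoch)
```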

Key Findings

  • Gradient Flow: Gradients flow properly through all layers without vanishing/exploding issues
  • BatchNorm Effect: Helps maintain stable gradient magnitudes across layers
  • Weight Updates: FC layers show larger updates than conv layers
  • Learning Dynamics: Cosine annealing provides smooth convergence
  • Data Augmentation: RandomCrop and HorizontalFlip improve generalization (see the transform sketch below)
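
For reference, a typical CIFAR-10 training-split augmentation pipeline matching that finding; the normalization statistics are the commonly used CIFAR-10 values, not values taken from this assignment.

```python
from torchvision import transforms

# RandomCrop with padding + HorizontalFlip, applied to the training split only
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
```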