Learning Rate Schedulers
StepLR
Decays learning rate by a factor (gamma) every stepSize epochs. lr = lr₀ · γ^⌊epoch/stepSize⌋. Simple and predictable.
MultiStepLR
Decays learning rate by gamma at each specified milestone epoch. Useful when you know the epochs where learning rate should drop.
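A minimal sketch of MultiStepLR usage. The constructor shape mirrors StepLR in the combined example further below; the option names (milestones, gamma) are assumptions, so check the deepbox API for the exact signature.

```ts
import { Adam, MultiStepLR } from "deepbox/optim";
import { Linear } from "deepbox/nn";

const optimizer = new Adam(new Linear(4, 1).parameters(), { lr: 0.01 });

// Assumed options: drop the lr by 10x at epoch 30 and again at epoch 80.
const scheduler = new MultiStepLR(optimizer, { milestones: [30, 80], gamma: 0.1 });

for (let epoch = 0; epoch < 100; epoch++) {
  // ... train ...
  scheduler.step(); // lr: 0.01, then 0.001 after epoch 30, then 0.0001 after epoch 80
}
```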
ExponentialLR
Decays learning rate by gamma every epoch: lr = lr₀ · γ^epoch. Smooth exponential decay.
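A sketch of exponential decay, assuming ExponentialLR is exported from deepbox/optim and takes a gamma option like the other schedulers (an assumption, not a confirmed signature).

```ts
import { Adam, ExponentialLR } from "deepbox/optim";
import { Linear } from "deepbox/nn";

const optimizer = new Adam(new Linear(4, 1).parameters(), { lr: 0.1 });

// Assumed option: gamma, the per-epoch decay factor.
const scheduler = new ExponentialLR(optimizer, { gamma: 0.9 });

for (let epoch = 0; epoch < 5; epoch++) {
  // ... train ...
  scheduler.step(); // lr after successive epochs: 0.09, 0.081, 0.0729, ...
}
```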
CosineAnnealingLR
Anneals learning rate along a cosine curve from lr₀ down to etaMin over tMax epochs: lr = etaMin + (lr₀ - etaMin) · (1 + cos(π · epoch / tMax)) / 2. Decay is gentle at the start and end of the schedule and steepest in the middle, with no abrupt drops. Popular in modern training recipes.
ReduceLROnPlateau
Reduces learning rate when a metric (e.g., validation loss) stops improving for patience epochs. Adaptive — reacts to actual training progress.
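A focused sketch of metric-driven stepping with ReduceLROnPlateau, using the same constructor options as the combined example further below; the validation loss computation is elided and stubbed with a placeholder.

```ts
import { Adam, ReduceLROnPlateau } from "deepbox/optim";
import { Linear } from "deepbox/nn";

const optimizer = new Adam(new Linear(4, 1).parameters(), { lr: 0.01 });

// Halve the lr if validation loss has not improved for 10 epochs.
const scheduler = new ReduceLROnPlateau(optimizer, { patience: 10, factor: 0.5 });

for (let epoch = 0; epoch < 100; epoch++) {
  // ... train ...
  const valLoss = 0; // placeholder: compute validation loss here
  scheduler.step(valLoss); // pass the monitored metric every epoch
}
```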
LinearLR
Linearly increases or decreases the learning rate over a given number of epochs. Commonly used for learning rate warmup.
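A warmup sketch with LinearLR. The option names (startFactor, endFactor, totalIters) are modeled on common scheduler APIs and are assumptions, not confirmed deepbox parameters.

```ts
import { Adam, LinearLR } from "deepbox/optim";
import { Linear } from "deepbox/nn";

const optimizer = new Adam(new Linear(4, 1).parameters(), { lr: 0.01 });

// Assumed options: scale the lr linearly from 10% to 100% of the base lr
// (0.001 -> 0.01) over the first 5 epochs, then hold it at the base lr.
const scheduler = new LinearLR(optimizer, { startFactor: 0.1, endFactor: 1.0, totalIters: 5 });

for (let epoch = 0; epoch < 100; epoch++) {
  // ... train ...
  scheduler.step();
}
```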
OneCycleLR
1-cycle policy: ramps lr from low → max → low over the course of training, with cosine annealing. Can achieve super-convergence (very fast training at unusually high learning rates).
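A sketch of the 1-cycle policy stepped per batch. Both the SGD constructor options and the scheduler options (maxLr, totalSteps) follow common conventions and are assumptions about the deepbox API.

```ts
import { SGD, OneCycleLR } from "deepbox/optim";
import { Linear } from "deepbox/nn";

// Assumed SGD options: lr and momentum.
const optimizer = new SGD(new Linear(4, 1).parameters(), { lr: 0.01, momentum: 0.9 });

// Assumed options: ramp the lr up to maxLr, then anneal it back down,
// over totalSteps optimizer steps (batches, not epochs).
const scheduler = new OneCycleLR(optimizer, { maxLr: 0.1, totalSteps: 1000 });

for (let step = 0; step < 1000; step++) {
  // ... one training batch ...
  scheduler.step(); // step once per batch, not once per epoch
}
```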
WarmupLR
Learning rate warmup scheduler. Gradually increases learning rate from a small value to the base lr over a specified number of steps. Helps prevent instability early in training.
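A sketch assuming WarmupLR takes the number of warmup steps as an option; the name warmupSteps is a guess and should be checked against the deepbox API.

```ts
import { Adam, WarmupLR } from "deepbox/optim";
import { Linear } from "deepbox/nn";

const optimizer = new Adam(new Linear(4, 1).parameters(), { lr: 0.01 });

// Assumed option: ramp the lr from a small value up to the base lr (0.01)
// over the first 500 optimizer steps.
const scheduler = new WarmupLR(optimizer, { warmupSteps: 500 });

for (let step = 0; step < 10000; step++) {
  // ... one training batch ...
  scheduler.step(); // call once per step during and after warmup
}
```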
```ts
import { Adam, StepLR, CosineAnnealingLR, ReduceLROnPlateau } from "deepbox/optim";
import { Sequential, Linear, ReLU } from "deepbox/nn";

const model = new Sequential(new Linear(4, 16), new ReLU(), new Linear(16, 1));
const optimizer = new Adam(model.parameters(), { lr: 0.01 });

// StepLR: decay by 0.1 every 30 epochs
const stepLR = new StepLR(optimizer, { stepSize: 30, gamma: 0.1 });

// Cosine annealing: smooth decay over 100 epochs
const cosine = new CosineAnnealingLR(optimizer, { tMax: 100, etaMin: 1e-6 });

// Reduce on plateau: lower lr when validation loss stalls
const plateau = new ReduceLROnPlateau(optimizer, { patience: 10, factor: 0.5 });

// Training loop
for (let epoch = 0; epoch < 100; epoch++) {
  // ... train ...
  stepLR.step(); // Update lr after each epoch
  // plateau.step(valLoss); // Or pass the monitored metric
}
```

When to Use
- StepLR — Simple decay at known intervals. Good baseline.
- CosineAnnealingLR — Smooth decay. Popular default in modern training.
- ReduceLROnPlateau — When you want the schedule to adapt to actual progress.
- OneCycleLR — Fast training with super-convergence. Best with SGD.
- LinearLR — Learning rate warmup at the start of training.