deepbox/optim

Learning Rate Schedulers

Adjust the learning rate during training to improve convergence. Call scheduler.step() after each epoch (or step, depending on the scheduler).

StepLR

Decays learning rate by a factor (gamma) every stepSize epochs. lr = lr₀ · γ^⌊epoch/stepSize⌋. Simple and predictable.

MultiStepLR

Decays learning rate by gamma at each specified milestone epoch. Useful when you know the epochs where learning rate should drop.
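
A minimal sketch of MultiStepLR usage, following the constructor pattern from schedulers.ts below; the milestones and gamma option names are assumptions, not confirmed deepbox API:

import { Adam, MultiStepLR } from "deepbox/optim";
import { Sequential, Linear, ReLU } from "deepbox/nn";

const model = new Sequential(new Linear(4, 16), new ReLU(), new Linear(16, 1));
const optimizer = new Adam(model.parameters(), { lr: 0.01 });

// Assumed options: multiply lr by 0.1 at epochs 30 and 80
const multiStep = new MultiStepLR(optimizer, { milestones: [30, 80], gamma: 0.1 });

for (let epoch = 0; epoch < 100; epoch++) {
  // ... train ...
  multiStep.step(); // lr: 0.01 → 0.001 at epoch 30 → 0.0001 at epoch 80
}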

ExponentialLR

Decays learning rate by gamma every epoch: lr = lr₀ · γ^epoch. Smooth exponential decay.
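
A short sketch reusing the model and optimizer from the MultiStepLR sketch above; the gamma option name is assumed to match StepLR:

import { ExponentialLR } from "deepbox/optim";

// Multiply lr by 0.95 after every epoch (gamma option name assumed)
const expLR = new ExponentialLR(optimizer, { gamma: 0.95 });

for (let epoch = 0; epoch < 100; epoch++) {
  // ... train ...
  expLR.step(); // after epoch 10: lr ≈ 0.01 · 0.95^10 ≈ 0.006
}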

CosineAnnealingLR

Anneals the learning rate along a cosine curve from lr₀ down to etaMin over T_max epochs: lr_t = etaMin + (lr₀ − etaMin) · (1 + cos(π · t / T_max)) / 2. Decay is slow at the start, fastest mid-schedule, and flattens as it approaches etaMin. Popular in modern training recipes.

ReduceLROnPlateau

Reduces learning rate when a metric (e.g., validation loss) stops improving for patience epochs. Adaptive — reacts to actual training progress.
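
This is the one scheduler whose step() takes the monitored metric; a short sketch reusing the optimizer from the MultiStepLR sketch above, where evaluate() and valData are hypothetical placeholders:

import { ReduceLROnPlateau } from "deepbox/optim";

// Halve the lr if validation loss has not improved for 10 epochs
const plateau = new ReduceLROnPlateau(optimizer, { patience: 10, factor: 0.5 });

for (let epoch = 0; epoch < 100; epoch++) {
  // ... train ...
  const valLoss = evaluate(model, valData); // hypothetical validation helper
  plateau.step(valLoss);                    // pass the monitored metric
}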

LinearLR

Linearly increases or decreases the learning rate over a given number of epochs. Used for learning rate warmup.
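
A warmup sketch reusing the optimizer from above; the startFactor, endFactor, and totalIters option names are assumptions modeled on the other schedulers:

import { LinearLR } from "deepbox/optim";

// Ramp lr linearly from 10% of the base lr to 100% over the first 5 epochs
const linearWarmup = new LinearLR(optimizer, { startFactor: 0.1, endFactor: 1.0, totalIters: 5 });

for (let epoch = 0; epoch < 100; epoch++) {
  // ... train ...
  linearWarmup.step(); // lr: 0.001 → 0.01 over epochs 0–4, then held (assumed behavior)
}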

OneCycleLR

1-cycle policy: ramps the learning rate from a low value up to a maximum and back down over the course of training, with cosine annealing. Can achieve super-convergence (very fast training at high learning rates).
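
A per-step sketch reusing the optimizer from above; the maxLr and totalSteps option names, and stepping once per batch, are assumptions based on the 1-cycle policy rather than confirmed deepbox API:

import { OneCycleLR } from "deepbox/optim";

// One cycle spread over 100 epochs × 50 batches = 5000 optimizer steps
const oneCycle = new OneCycleLR(optimizer, { maxLr: 0.1, totalSteps: 100 * 50 });

for (let epoch = 0; epoch < 100; epoch++) {
  for (let batch = 0; batch < 50; batch++) {
    // ... train on one batch ...
    oneCycle.step(); // step once per batch, not once per epoch
  }
}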

WarmupLR

Learning rate warmup scheduler. Gradually increases learning rate from a small value to the base lr over a specified number of steps. Prevents early training instability.
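
A per-step sketch reusing the optimizer from above; the warmupSteps option name is an assumption consistent with the description:

import { WarmupLR } from "deepbox/optim";

// Ramp lr from a small value up to the base lr over the first 500 steps
const warmupLR = new WarmupLR(optimizer, { warmupSteps: 500 });

for (let epoch = 0; epoch < 100; epoch++) {
  for (let batch = 0; batch < 50; batch++) {
    // ... train on one batch ...
    warmupLR.step(); // call once per training step
  }
}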

schedulers.ts
import { Adam, StepLR, CosineAnnealingLR, ReduceLROnPlateau } from "deepbox/optim";
import { Sequential, Linear, ReLU } from "deepbox/nn";

const model = new Sequential(new Linear(4, 16), new ReLU(), new Linear(16, 1));
const optimizer = new Adam(model.parameters(), { lr: 0.01 });

// StepLR: decay by 0.1 every 30 epochs
const stepLR = new StepLR(optimizer, { stepSize: 30, gamma: 0.1 });

// Cosine annealing: smooth decay over 100 epochs
const cosine = new CosineAnnealingLR(optimizer, { tMax: 100, etaMin: 1e-6 });

// Reduce on plateau: lower lr when validation loss stalls
const plateau = new ReduceLROnPlateau(optimizer, { patience: 10, factor: 0.5 });

// Training loop
for (let epoch = 0; epoch < 100; epoch++) {
  // ... train ...
  stepLR.step();             // Update lr after each epoch
  // plateau.step(valLoss);  // Or pass the monitored metric
}

When to Use

  • StepLR — Simple decay at known intervals. Good baseline.
  • CosineAnnealingLR — Smooth decay. Popular default in modern training.
  • ReduceLROnPlateau — When you want the schedule to adapt to actual progress.
  • OneCycleLR — Fast training with super-convergence. Best with SGD.
  • LinearLR — Learning rate warmup at the start of training.