Learning Rate Schedule Visualizer

Schedule Type

Configuration

Warmup Steps 100

Total Steps 5000

Max LR 3e-4

Min LR (decay floor) 3e-5

Schedule Plot

Show comparison overlay (compare all schedule types): Off

Summary statistics: Peak LR, Final LR, Average LR, Warmup %
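The four summary values can be derived from any schedule once it is expanded into one LR value per step. The following is a minimal sketch, not the page's own code: the helper name `schedule_stats` and the toy schedule are illustrative assumptions.

```python
from typing import Sequence


def schedule_stats(lrs: Sequence[float], warmup_steps: int) -> dict:
    """Summarize a schedule given one LR value per training step.

    Hypothetical helper: mirrors the Peak LR, Final LR, Average LR,
    and Warmup % readouts, but is not taken from the tool itself.
    """
    if not lrs:
        raise ValueError("lrs must contain at least one step")
    total_steps = len(lrs)
    return {
        "peak_lr": max(lrs),
        "final_lr": lrs[-1],
        "average_lr": sum(lrs) / total_steps,
        # Share of the training budget spent in warmup, capped at 100%.
        "warmup_pct": 100.0 * min(warmup_steps, total_steps) / total_steps,
    }


# Toy schedule: warm up over 2 steps to 3e-4, then hold for 2 steps.
toy = [1.5e-4, 3e-4, 3e-4, 3e-4]
stats = schedule_stats(toy, warmup_steps=2)
```

With the configuration shown above (100 warmup steps out of 5000 total), the warmup share works out to 2%.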

Loss Curve Estimate

An estimate of how different LR schedules affect training loss. The lighter line shows the raw, noisy loss; the bold line shows the smoothed trend. Bad schedules cause instability, slow convergence, or an early plateau.
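The page does not say how the bold trend line is computed. One common choice is an exponential moving average (EMA) of the raw loss, sketched below under that assumption. The function name `ema_smooth`, the `alpha` value, and the synthetic loss data are all illustrative.

```python
import random


def ema_smooth(values, alpha=0.1):
    """Exponential moving average; a smaller alpha gives a smoother line."""
    smoothed = []
    running = None
    for v in values:
        # Seed the average with the first value, then blend in each new one.
        running = v if running is None else alpha * v + (1 - alpha) * running
        smoothed.append(running)
    return smoothed


# Synthetic noisy loss: a decaying trend plus Gaussian noise (illustrative only).
rng = random.Random(0)
raw_loss = [2.0 * 0.999 ** step + rng.gauss(0.0, 0.05) for step in range(1000)]
trend = ema_smooth(raw_loss, alpha=0.05)
```

Plotting `raw_loss` in a light color and `trend` in bold reproduces the two-line style described above.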

Bad Schedules Gallery

Click any card to load its configuration and see why it fails.

Constant, Too High

Constant LR at 3e-3, 10x the Max LR of 3e-4 used in the configuration above. Loss often diverges to NaN within the first 100 steps: the optimizer overshoots and never recovers.

Diverges

Constant, Too Low

Constant LR at 1e-5. Training is stable but painfully slow. Needs 5-10x more steps to reach the same loss as a good schedule.

Slow

No Warmup

Cosine decay with 0 warmup steps. The first few updates apply noisy early gradients at the full LR, which causes instability at the start of training.

Unstable

Warmup Too Long

50% of steps spent in warmup. Most of the training budget is wasted at sub-optimal LRs when the model could be learning faster.

Wasted compute

Well-Tuned

100 warmup steps + cosine decay to 10% of max. Fast early progress, smooth convergence. A common default for transformer training.

Recommended

Aggressive Decay

Decays to 0 in very few steps. With the LR near zero so early, updates become too small to make progress, and the loss plateaus at a poor value.

Too fast decay

Export: get_lr()

Copy this Python function directly into your training script.
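The exported function itself is not included in this text. As a stand-in, here is a minimal sketch of what such a function might look like for the Well-Tuned setup, assuming linear warmup followed by cosine decay. The defaults come from the configuration values listed above; the exact warmup formula and the behavior past the last step are assumptions.

```python
import math


def get_lr(step, warmup_steps=100, total_steps=5000, max_lr=3e-4, min_lr=3e-5):
    """Linear warmup to max_lr, then cosine decay to min_lr (a sketch)."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup; (step + 1) gives step 0 a small nonzero LR (assumption).
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Past the end of the schedule, hold at the decay floor (assumption).
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine
```

In a PyTorch-style training loop, one way to use this is to set `param_group["lr"] = get_lr(step)` for each entry in `optimizer.param_groups` before every optimizer step.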