Training Batch Shape Debugger

See exactly how your dataset gets sliced into training batches. Compute samples, tokens, and memory for any configuration.

Dataset Configuration

Dataset size 1.00M

Sequence length (seq_len) 128

Batch size 32

Training steps (N) 1000

Sliding window (stride=1): instead of non-overlapping chunks, slide by 1 token for maximum samples per epoch.

Batch Statistics

7,812

Samples per Epoch

sequences in the dataset

244

Batches per Epoch

at current batch size

4.1M

Tokens after N steps

32 × 128 × 1000 steps

0.03 MB

Input/target tensor memory

FP32, one batch

Tokens per batch: 4,096

Input shape: (32, 128)

Target shape: (32, 128)

Epochs after N steps: 4.1

Chunking Visualizer

See how a sample text gets sliced into input and target chunks with the one-position offset. Each cell is one token ID.

Input tokens (x)

Target tokens (y)

Both (overlap)

How to read this: Each row is one sample. The blue cells are the model's input (x), the green cells are the targets (y). Notice how y is x shifted left by one position: the model predicts the next token at every position simultaneously. Purple cells appear where a token serves as both input (for predicting the next position) and target (predicted from the previous position).
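
The one-position offset can be sketched in a few lines of Python. This is a minimal illustration, not the visualizer's actual code: `make_chunks` is a hypothetical helper, and it assumes each sample needs seq_len + 1 tokens so that the last input position has a target. Under that assumption the sample counts can come out one lower than the page's formulas.

```python
def make_chunks(token_ids, seq_len, stride=None):
    """Slice token IDs into (x, y) pairs where y is x shifted by one position.

    stride=None means non-overlapping chunks (stride == seq_len).
    Hypothetical helper for illustration only.
    """
    if stride is None:
        stride = seq_len
    samples = []
    # A sample needs seq_len + 1 tokens: seq_len inputs plus one extra
    # token that serves as the target for the last input position.
    for start in range(0, len(token_ids) - seq_len, stride):
        x = token_ids[start : start + seq_len]
        y = token_ids[start + 1 : start + seq_len + 1]
        samples.append((x, y))
    return samples


tokens = list(range(10, 20))  # ten stand-in token IDs: 10..19
for x, y in make_chunks(tokens, seq_len=4):
    print("x:", x, "y:", y)
# x: [10, 11, 12, 13] y: [11, 12, 13, 14]
# x: [14, 15, 16, 17] y: [15, 16, 17, 18]
```

In the first row, tokens 11, 12, and 13 appear in both x and y, which corresponds to the purple "both" cells.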

Formulas & Details

Non-overlapping chunks:
samples_per_epoch = ⌊dataset_size ÷ seq_len⌋
batches_per_epoch = ⌊samples_per_epoch ÷ batch_size⌋
(rounded down: leftover tokens and the final partial batch are not counted)

Sliding window (stride=1):
samples_per_epoch = dataset_size − seq_len + 1
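
As a quick check, the sample and batch counts can be reproduced in Python. The function names below are hypothetical, and the sketch assumes integer (floor) division, which matches the 7,812 and 244 shown in Batch Statistics.

```python
def samples_per_epoch(dataset_size, seq_len, sliding_window=False):
    """Number of samples per epoch for the two chunking modes."""
    if sliding_window:
        # stride=1: one window starting at every valid position
        return dataset_size - seq_len + 1
    # Non-overlapping: leftover tokens at the end are dropped
    return dataset_size // seq_len


def batches_per_epoch(n_samples, batch_size):
    # Assumes the final partial batch is dropped
    return n_samples // batch_size


n = samples_per_epoch(1_000_000, 128)
print(n, batches_per_epoch(n, 32))  # 7812 244
print(samples_per_epoch(1_000_000, 128, sliding_window=True))  # 999873
```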

Memory per batch (FP32):
bytes = 2 × batch_size × seq_len × 4 bytes
(2 tensors: inputs + targets)
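
The memory formula is simple enough to evaluate directly. The sketch below uses a hypothetical `batch_memory_bytes` helper; the `bytes_per_element` parameter is an added assumption so that other element sizes (for example 8 bytes, if token IDs are held as 64-bit integers) can be tried.

```python
def batch_memory_bytes(batch_size, seq_len, bytes_per_element=4, n_tensors=2):
    """Bytes needed for the input and target tensors of one batch."""
    return n_tensors * batch_size * seq_len * bytes_per_element


b = batch_memory_bytes(32, 128)
print(b, "bytes =", b / 1024, "KiB")  # 32768 bytes = 32.0 KiB
```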

Tokens seen after N steps:
tokens = batch_size × seq_len × N
epochs = tokens ÷ dataset_size
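
The same arithmetic in Python, using the default configuration (batch size 32, seq_len 128, 1000 steps, 1.00M-token dataset). The function names are hypothetical.

```python
def tokens_seen(batch_size, seq_len, n_steps):
    """Total tokens processed after n_steps training steps."""
    return batch_size * seq_len * n_steps


def epochs_seen(batch_size, seq_len, n_steps, dataset_size):
    """Passes over the dataset, measured in tokens."""
    return tokens_seen(batch_size, seq_len, n_steps) / dataset_size


print(tokens_seen(32, 128, 1000))                        # 4096000
print(round(epochs_seen(32, 128, 1000, 1_000_000), 1))   # 4.1
```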

Key insight: With non-overlapping chunks, each token appears as an input in exactly one sample per epoch. With stride=1, each token appears in up to seq_len different samples per epoch (fewer for tokens within seq_len − 1 positions of either end of the dataset). This gives the model more opportunities to learn from each token, at the cost of highly correlated samples.
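
To see the difference concretely, the sketch below counts how many samples each token position lands in as an input, for a toy dataset of 12 tokens with seq_len 4. `input_coverage` is a hypothetical helper, and the window count follows the formulas above.

```python
from collections import Counter


def input_coverage(dataset_size, seq_len, stride):
    """Count how many samples contain each token position as an input."""
    counts = Counter()
    for start in range(0, dataset_size - seq_len + 1, stride):
        for pos in range(start, start + seq_len):
            counts[pos] += 1
    return counts


non_overlapping = input_coverage(12, 4, stride=4)
sliding = input_coverage(12, 4, stride=1)
print([non_overlapping[i] for i in range(12)])
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print([sliding[i] for i in range(12)])
# [1, 2, 3, 4, 4, 4, 4, 4, 4, 3, 2, 1]
```

Interior tokens reach the full count of seq_len under stride=1, while tokens near either end of the dataset are covered by fewer windows.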