deepbox/datasets
DataLoader
An iterable batch loader for training loops. It wraps a feature matrix X and target vector y, then yields fixed-size batches on each iteration. When shuffle is enabled, shuffling is re-applied at the start of every iteration (every epoch), so the model sees different batch compositions each epoch. Implements the JavaScript iterable protocol, so it can be consumed directly with for...of.
DataLoader
Iterable data loader implementing Symbol.iterator. On each iteration, optionally shuffles row indices, then slices X and y into consecutive batches of batchSize rows. The last batch may be smaller unless dropLast is true. Shuffling uses Fisher-Yates and respects the global random seed if set.
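The behavior described above (Symbol.iterator, optional Fisher-Yates shuffle, consecutive slices, dropLast) can be sketched with a minimal self-contained loader. This is an illustration only, not the deepbox source: it uses plain arrays instead of Tensors and Math.random instead of the library's seeded RNG.

```typescript
// Minimal sketch of an iterable batch loader (illustrative; not the deepbox implementation).
class MiniLoader<Row, Target> implements Iterable<[Row[], Target[]]> {
  constructor(
    private X: Row[],
    private y: Target[],
    private opts: { batchSize?: number; shuffle?: boolean; dropLast?: boolean } = {}
  ) {}

  *[Symbol.iterator](): Iterator<[Row[], Target[]]> {
    const { batchSize = 32, shuffle = false, dropLast = false } = this.opts;
    const idx = this.X.map((_, i) => i);
    if (shuffle) {
      // Fisher-Yates: walk backwards, swapping each slot with a random earlier one
      for (let i = idx.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [idx[i], idx[j]] = [idx[j], idx[i]];
      }
    }
    for (let start = 0; start < idx.length; start += batchSize) {
      const batch = idx.slice(start, start + batchSize);
      if (dropLast && batch.length < batchSize) break; // discard short final batch
      yield [batch.map((i) => this.X[i]), batch.map((i) => this.y[i])];
    }
  }
}
```

Because shuffling happens inside [Symbol.iterator], every new for...of loop re-draws the index permutation, which is what gives the loader its per-epoch reshuffling.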
DataLoaderOptions
```typescript
type DataLoaderOptions = {
  batchSize?: number; // Samples per batch (default: 32)
  shuffle?: boolean;  // Re-shuffle indices each iteration (default: false)
  dropLast?: boolean; // Drop final batch if < batchSize (default: false)
};
```
Constructor
- new DataLoader(X: Tensor, y: Tensor, opts?: DataLoaderOptions)
- X — Feature matrix of shape [nSamples, ...featureDims]
- y — Target tensor of shape [nSamples] or [nSamples, nTargets]
- batchSize — Number of samples per yielded batch (default: 32). Total batches = ceil(nSamples / batchSize), or floor(nSamples / batchSize) when dropLast is true.
- shuffle — If true, Fisher-Yates shuffles row indices at the start of every for...of iteration. Essential for stochastic gradient descent.
- dropLast — If true, the final batch is discarded when it contains fewer than batchSize samples. Useful when batch normalization requires consistent batch sizes.
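The batch-count arithmetic implied by batchSize and dropLast can be checked directly in plain TypeScript, independent of deepbox:

```typescript
// Batch counts for 5 samples with batchSize 2
const nSamples = 5;
const batchSize = 2;
const kept = Math.ceil(nSamples / batchSize);     // default: 3 batches (last has 1 sample)
const dropped = Math.floor(nSamples / batchSize); // dropLast: true -> 2 batches
console.log(kept, dropped); // 3 2
```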
dataloader.ts
```typescript
import { DataLoader } from "deepbox/datasets";
import { tensor } from "deepbox/ndarray";

const X = tensor([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]);
const y = tensor([0, 1, 0, 1, 0]);

const loader = new DataLoader(X, y, {
  batchSize: 2,
  shuffle: true,
});

// Iterate over batches (iterable protocol)
for (const [batchX, batchY] of loader) {
  console.log(batchX.shape); // [2, 2] (last batch: [1, 2])
  console.log(batchY.shape); // [2] (last batch: [1])
}

console.log(`Total batches: ${[...loader].length}`); // 3 (ceil(5/2))
```
Tips
- Always set shuffle: true for training loaders so batch composition is not tied to the dataset's storage order, which can bias gradient updates
- Use dropLast: true when using BatchNorm1d — it requires consistent batch sizes
- Create separate DataLoaders for training (shuffle: true) and evaluation (shuffle: false)
- The loader is re-iterable — each for...of loop reshuffles if shuffle is enabled
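Re-iterability follows from the iterable protocol itself: each for...of loop or spread calls [Symbol.iterator]() again, producing a fresh pass (and, for the loader, a fresh shuffle). A minimal illustration, unrelated to deepbox:

```typescript
// Each spread/for...of invokes [Symbol.iterator]() anew, so the object is re-iterable
const reIterable = {
  *[Symbol.iterator]() {
    yield* [1, 2, 3];
  },
};
const firstPass = [...reIterable];
const secondPass = [...reIterable];
console.log(firstPass, secondPass); // both [1, 2, 3]
```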