deepbox/datasets
DataLoader
An iterable batch loader for training loops. It wraps a feature matrix X and target vector y, then yields fixed-size batches on each iteration. When shuffle is enabled, shuffling is re-applied at the start of every iteration (every epoch), so the model sees different batch compositions each epoch. Implements the JavaScript iterable protocol, so it can be consumed directly with for...of.
DataLoader
Iterable data loader implementing Symbol.iterator. On each iteration, optionally shuffles row indices, then slices X and y into consecutive batches of batchSize rows. The last batch may be smaller unless dropLast is true. Shuffling uses Fisher-Yates and respects the global random seed if set.
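The behavior described above (Symbol.iterator, optional Fisher-Yates shuffle, consecutive slices, dropLast) can be sketched with a minimal self-contained loader. This is an illustration only, not the deepbox source: it uses plain arrays instead of Tensors and Math.random instead of the library's seeded RNG.

```typescript
// Minimal sketch of an iterable batch loader (illustrative; not the deepbox implementation).
class MiniLoader<Row, Target> implements Iterable<[Row[], Target[]]> {
  constructor(
    private X: Row[],
    private y: Target[],
    private opts: { batchSize?: number; shuffle?: boolean; dropLast?: boolean } = {}
  ) {}

  *[Symbol.iterator](): Iterator<[Row[], Target[]]> {
    const { batchSize = 32, shuffle = false, dropLast = false } = this.opts;
    const idx = this.X.map((_, i) => i);
    if (shuffle) {
      // Fisher-Yates: walk backwards, swapping each slot with a random earlier one
      for (let i = idx.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [idx[i], idx[j]] = [idx[j], idx[i]];
      }
    }
    for (let start = 0; start < idx.length; start += batchSize) {
      const batch = idx.slice(start, start + batchSize);
      if (dropLast && batch.length < batchSize) break; // discard short final batch
      yield [batch.map((i) => this.X[i]), batch.map((i) => this.y[i])];
    }
  }
}
```

Because shuffling happens inside [Symbol.iterator], every new for...of loop re-draws the index permutation, which is what gives the loader its per-epoch reshuffling.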
DataLoaderOptions
```typescript
type DataLoaderOptions = {
  batchSize?: number; // Samples per batch (default: 32)
  shuffle?: boolean;  // Re-shuffle indices each iteration (default: false)
  dropLast?: boolean; // Drop final batch if < batchSize (default: false)
};
```
Constructor
- new DataLoader(X: Tensor, y: Tensor, opts?: DataLoaderOptions)
- X — Feature matrix of shape [nSamples, ...featureDims]
- y — Target tensor of shape [nSamples] or [nSamples, nTargets]
- batchSize — Number of samples per yielded batch (default: 32). Total batches = ceil(nSamples / batchSize), or floor(nSamples / batchSize) when dropLast is true.
- shuffle — If true, Fisher-Yates shuffles row indices at the start of every for...of iteration. Essential for stochastic gradient descent.
- dropLast — If true, the final batch is discarded when it contains fewer than batchSize samples. Useful when batch normalization requires consistent batch sizes.
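The batch-count arithmetic implied by batchSize and dropLast can be checked directly in plain TypeScript, independent of deepbox:

```typescript
// Batch counts for 5 samples with batchSize 2
const nSamples = 5;
const batchSize = 2;
const kept = Math.ceil(nSamples / batchSize);     // default: 3 batches (last has 1 sample)
const dropped = Math.floor(nSamples / batchSize); // dropLast: true -> 2 batches
console.log(kept, dropped); // 3 2
```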
dataloader.ts
```typescript
import { DataLoader } from "deepbox/datasets";
import { tensor } from "deepbox/ndarray";

const X = tensor([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]);
const y = tensor([0, 1, 0, 1, 0]);

const loader = new DataLoader(X, y, {
  batchSize: 2,
  shuffle: true,
});

// Iterate over batches (iterable protocol)
for (const [batchX, batchY] of loader) {
  console.log(batchX.shape); // [2, 2] (last batch: [1, 2])
  console.log(batchY.shape); // [2] (last batch: [1])
}

console.log(`Total batches: ${[...loader].length}`); // 3 (ceil(5/2))
```
Tips
- Always set shuffle: true for training loaders so batch composition is not tied to the dataset's storage order, which can bias gradient updates
- Use dropLast: true when using BatchNorm1d — it requires consistent batch sizes
- Create separate DataLoaders for training (shuffle: true) and evaluation (shuffle: false)
- The loader is re-iterable — each for...of loop reshuffles if shuffle is enabled
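Re-iterability follows from the iterable protocol itself: each for...of loop or spread calls [Symbol.iterator]() again, producing a fresh pass (and, for the loader, a fresh shuffle). A minimal illustration, unrelated to deepbox:

```typescript
// Each spread/for...of invokes [Symbol.iterator]() anew, so the object is re-iterable
const reIterable = {
  *[Symbol.iterator]() {
    yield* [1, 2, 3];
  },
};
const firstPass = [...reIterable];
const secondPass = [...reIterable];
console.log(firstPass, secondPass); // both [1, 2, 3]
```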