Data Splitting
trainTestSplit splits arrays into random train and test subsets. Returns [XTrain, XTest, yTrain, yTest].
- opts.testSize: number - Fraction of data for testing (default: 0.25)
- opts.randomState: number - Seed for reproducibility
- opts.stratify: Tensor - If provided, ensures each split has the same class distribution
- opts.shuffle: boolean - Whether to shuffle before splitting (default: true)

KFold
K-Fold cross-validation iterator. Splits data into k consecutive folds. Each fold is used once as test set while the remaining k−1 folds form the training set.
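The fold arithmetic can be sketched in plain TypeScript, independent of deepbox. `kFoldIndices` is a hypothetical helper, not part of the library; it assumes the common convention that the first n % k folds receive one extra sample when n is not divisible by k.

```typescript
// Sketch of k-fold index generation (no shuffling): the first n % k folds
// get one extra sample, so every sample lands in exactly one test fold.
function kFoldIndices(
  n: number,
  k: number
): { trainIndex: number[]; testIndex: number[] }[] {
  const indices = Array.from({ length: n }, (_, i) => i);
  const foldSizes = Array.from({ length: k }, (_, i) =>
    Math.floor(n / k) + (i < n % k ? 1 : 0)
  );
  const splits: { trainIndex: number[]; testIndex: number[] }[] = [];
  let start = 0;
  for (const size of foldSizes) {
    const testIndex = indices.slice(start, start + size);
    const trainIndex = [
      ...indices.slice(0, start),
      ...indices.slice(start + size),
    ];
    splits.push({ trainIndex, testIndex });
    start += size;
  }
  return splits;
}
```

For example, `kFoldIndices(10, 3)` produces test folds of sizes 4, 3, and 3.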
StratifiedKFold
Stratified K-Fold that preserves class distribution in each fold. Ensures each fold has approximately the same percentage of each class as the complete set.
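One simple way to achieve this is to deal each class's samples round-robin across folds; this is only an illustrative sketch of the idea, and `stratifiedFolds` is a hypothetical helper, not the library's implementation.

```typescript
// Stratified assignment sketch: within each class, samples are dealt
// round-robin across folds, keeping each fold's class ratios close to
// the ratios of the full dataset.
function stratifiedFolds(y: number[], k: number): number[][] {
  const folds: number[][] = Array.from({ length: k }, () => []);
  const byClass = new Map<number, number[]>();
  y.forEach((label, i) => {
    if (!byClass.has(label)) byClass.set(label, []);
    byClass.get(label)!.push(i);
  });
  let next = 0;
  for (const members of byClass.values()) {
    for (const i of members) {
      folds[next % k].push(i);
      next++;
    }
  }
  return folds; // folds[f] is the test index set of split f
}
```

With `y = [0, 0, 0, 1, 1, 1]` and k = 3, each fold ends up with exactly one sample of each class.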
LeaveOneOut
Leave-One-Out cross-validation. Each sample is used once as the test set. Equivalent to KFold(n) where n is the number of samples. Computationally expensive.
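The index pattern is easy to see in a plain TypeScript sketch (`leaveOneOut` here is a hypothetical helper for illustration, not the deepbox class):

```typescript
// Leave-One-Out: sample i alone forms the test set of split i,
// giving n splits for n samples.
function leaveOneOut(
  n: number
): { trainIndex: number[]; testIndex: number[] }[] {
  return Array.from({ length: n }, (_, i) => ({
    testIndex: [i],
    trainIndex: Array.from({ length: n }, (_, j) => j).filter((j) => j !== i),
  }));
}
```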
LeavePOut
Leave-P-Out cross-validation. All possible subsets of p samples are used as the test set. Generalizes LeaveOneOut.
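A minimal sketch of the enumeration, independent of deepbox (`leavePOut` is a hypothetical helper), makes the combinatorial growth explicit:

```typescript
// Leave-P-Out: every p-element subset of indices becomes a test set,
// so the number of splits is C(n, p) and grows combinatorially.
function leavePOut(
  n: number,
  p: number
): { trainIndex: number[]; testIndex: number[] }[] {
  const all = Array.from({ length: n }, (_, i) => i);
  const combos: number[][] = [];
  const pick = (start: number, chosen: number[]): void => {
    if (chosen.length === p) {
      combos.push([...chosen]);
      return;
    }
    for (let i = start; i < n; i++) pick(i + 1, [...chosen, i]);
  };
  pick(0, []);
  return combos.map((testIndex) => ({
    testIndex,
    trainIndex: all.filter((i) => !testIndex.includes(i)),
  }));
}
```

With n = 4 and p = 2 this yields C(4, 2) = 6 splits; with p = 1 it reduces to Leave-One-Out.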
GroupKFold
K-Fold variant that ensures the same group is not in both training and test sets. Useful when samples from the same group (e.g., same patient) should not be split.
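One common way to build such folds is to assign whole groups to folds, largest groups first, always into the currently smallest fold. This is a conceptual sketch only; `groupKFoldIndices` is a hypothetical helper, not deepbox's implementation.

```typescript
// Group-aware folds: each group is assigned wholly to one fold, so no
// group ever appears on both sides of a split.
function groupKFoldIndices(
  groups: number[],
  k: number
): { trainIndex: number[]; testIndex: number[] }[] {
  const byGroup = new Map<number, number[]>();
  groups.forEach((g, i) => {
    if (!byGroup.has(g)) byGroup.set(g, []);
    byGroup.get(g)!.push(i);
  });
  const folds: number[][] = Array.from({ length: k }, () => []);
  // Largest groups first, each into the currently smallest fold,
  // to keep fold sizes roughly balanced.
  const sorted = [...byGroup.values()].sort((a, b) => b.length - a.length);
  for (const members of sorted) {
    const smallest = folds.reduce((m, f) => (f.length < m.length ? f : m));
    smallest.push(...members);
  }
  return folds.map((testIndex) => ({
    testIndex,
    trainIndex: groups.map((_, i) => i).filter((i) => !testIndex.includes(i)),
  }));
}
```

For example, with groups `[0, 0, 1, 1, 2, 2]` and k = 3, each fold's test set contains exactly one group's two samples.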
```ts
import { trainTestSplit, KFold, StratifiedKFold } from "deepbox/preprocess";
import { tensor } from "deepbox/ndarray";

const X = tensor([
  [1, 2], [3, 4], [5, 6], [7, 8], [9, 10],
  [11, 12], [13, 14], [15, 16], [17, 18], [19, 20],
]);
const y = tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]);

// Simple train/test split
const [XTrain, XTest, yTrain, yTest] = trainTestSplit(X, y, {
  testSize: 0.2,
  randomState: 42,
});

// K-Fold cross-validation
const kf = new KFold({ nSplits: 5, shuffle: true, randomState: 42 });
for (const { trainIndex, testIndex } of kf.split(X)) {
  // trainIndex and testIndex are arrays of indices
}

// Stratified K-Fold (preserves class distribution)
const skf = new StratifiedKFold({ nSplits: 3 });
for (const { trainIndex, testIndex } of skf.split(X, y)) {
  // Each fold has same class ratio as the full dataset
}
```