Synthetic Data Generators
makeBlobs
Generate isotropic Gaussian blobs for clustering and classification. Each cluster is sampled from N(μₖ, σ²I), where μₖ is the cluster center and σ is clusterStd. Centers are placed randomly unless explicitly provided. The defaults produce well-separated clusters, ideal for testing KMeans.
- opts.nSamples: number - Total number of points, divided equally among clusters (default: 100)
- opts.nFeatures: number - Dimensionality of each point (default: 2)
- opts.centers: number | number[][] - Number of clusters, or explicit center coordinates (default: 3)
- opts.clusterStd: number - Standard deviation of each cluster (default: 1.0). Lower values give tighter clusters.
- opts.randomState: number - Seed for reproducible generation

makeCircles
Generate two concentric circles: an inner ring (class 0) and an outer ring (class 1). Points are distributed uniformly around each circle, with optional Gaussian noise. Linear classifiers cannot separate these classes; use a kernel SVM, decision trees, or neural networks instead. The factor parameter controls how close the two circles are.
- opts.nSamples: number - Total number of points, split equally between the inner and outer circles (default: 100)
- opts.noise: number - Standard deviation of Gaussian noise added to each point (default: 0)
- opts.factor: number - Ratio of the inner circle's radius to the outer circle's, in (0, 1); 0.5 means the inner circle has half the radius (default: 0.8)
- opts.randomState: number - Seed for reproducibility

makeMoons
Generate two interleaving half-circle (crescent/moon) shapes. Class 0 is the upper moon; class 1 is the lower moon, shifted right and down. With low noise the two moons interlock but do not overlap. Another classic non-linear binary classification test, widely used to demonstrate decision boundaries.
- opts.nSamples: number - Total points, split equally between the two moons (default: 100)
- opts.noise: number - Standard deviation of Gaussian noise (default: 0)
- opts.randomState: number - Seed for reproducibility

makeClassification
Generate a random n-class classification problem with fine-grained control over difficulty. Creates nInformative genuinely useful features and nRedundant linear combinations of them, then fills the remaining columns with noise. A fraction flipY of labels is flipped at random to inject label noise. This is the most flexible generator for stress-testing classifiers.
- opts.nSamples: number - Number of samples (default: 100)
- opts.nFeatures: number - Total number of features (default: 20)
- opts.nInformative: number - Number of informative features (default: 2)
- opts.nRedundant: number - Number of redundant (linear-combination) features (default: 2)
- opts.nClasses: number - Number of classes (default: 2)
- opts.flipY: number - Fraction of labels to flip at random (default: 0.01)
- opts.randomState: number - Seed for reproducibility

makeRegression
Generate a random linear regression problem: y = Xw + noise. Features are drawn from N(0, 1), and the true coefficient vector w is returned so you can verify that your model recovers it. Only nInformative features have non-zero coefficients; the rest are noise columns.
- opts.nSamples: number - Number of samples (default: 100)
- opts.nFeatures: number - Total number of features (default: 10)
- opts.nInformative: number - Features with non-zero coefficients (default: 10)
- opts.noise: number - Standard deviation of Gaussian noise added to y (default: 0)
- opts.randomState: number - Seed for reproducibility

makeGaussianQuantiles
Generate an isotropic Gaussian cloud and partition the samples into classes by quantiles of their distance from the origin (for an isotropic Gaussian, the Mahalanobis distance reduces to the Euclidean distance). This creates concentric, roughly spherical class boundaries. With 2 classes the result resembles makeCircles, but in arbitrary dimensions; with more classes you get nested shells.
- opts.nSamples: number - Number of samples (default: 100)
- opts.nFeatures: number - Number of features (default: 2)
- opts.nClasses: number - Number of quantile-based classes (default: 3)
- opts.randomState: number - Seed for reproducibility

makeBlobs
x ~ N(μₖ, σ²I)
Where:
- μₖ = Center of cluster k
- σ = clusterStd parameter
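The sampling model above is simple enough to sketch without any library. The following is a minimal, dependency-free illustration (not the deepbox implementation; randn and sampleBlobs are hypothetical names): each point is its cluster's center plus N(0, σ²) noise in every dimension, with the normal draws produced by the Box–Muller transform.

```typescript
// Standard normal sample via the Box–Muller transform.
function randn(): number {
  const u = 1 - Math.random(); // shift to (0, 1] to avoid log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Sample nPerCluster points around each center with spread clusterStd.
// Labels follow the order of the centers array.
function sampleBlobs(
  centers: number[][],
  nPerCluster: number,
  clusterStd: number
): { X: number[][]; y: number[] } {
  const X: number[][] = [];
  const y: number[] = [];
  centers.forEach((center, k) => {
    for (let i = 0; i < nPerCluster; i++) {
      X.push(center.map((c) => c + clusterStd * randn()));
      y.push(k);
    }
  });
  return { X, y };
}
```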
makeRegression
y = Xw + ε
Where:
- w = True coefficient vector (returned as coef)
- ε = Gaussian noise
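As a self-contained sketch of this generative model (not the deepbox implementation; makeLinearTarget is a hypothetical helper), the target is just a dot product of each row with the true coefficients, plus scaled noise:

```typescript
// Hypothetical sketch of y = Xw + ε: dot each row of X with the true
// coefficient vector w, then add Gaussian noise scaled by noiseStd.
function makeLinearTarget(
  X: number[][],
  w: number[],
  noiseStd: number,
  randn: () => number
): number[] {
  return X.map(
    (row) => row.reduce((sum, x, j) => sum + x * w[j], 0) + noiseStd * randn()
  );
}
```

With noiseStd set to 0 the targets equal Xw exactly, which is what makes coefficient recovery verifiable.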
makeGaussianQuantiles
class(x) = quantile bin of ‖x‖₂ over nClasses equal-frequency bins
Where:
- ‖x‖₂ = Euclidean distance from origin
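The quantile partition can be sketched in a few lines (not the deepbox implementation; quantileLabels is a hypothetical name): rank each point by its distance from the origin, then cut the ranking into nClasses equal-frequency bins.

```typescript
// Hypothetical sketch: label each point by which equal-frequency distance
// quantile it falls into. Smaller distances get lower class indices.
function quantileLabels(X: number[][], nClasses: number): number[] {
  const dist = X.map((row) => Math.hypot(...row)); // ‖x‖₂ per point
  const sorted = [...dist].sort((a, b) => a - b);
  const binSize = Math.ceil(dist.length / nClasses);
  return dist.map((d) => {
    const rank = sorted.indexOf(d); // position in the sorted distances
    return Math.min(Math.floor(rank / binSize), nClasses - 1);
  });
}
```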
```typescript
import { makeBlobs, makeCircles, makeMoons, makeRegression, makeClassification } from "deepbox/datasets";

// ── Gaussian blobs for clustering ──
const [X, y] = makeBlobs({
  nSamples: 300,
  centers: 3,
  clusterStd: 0.5,
  randomState: 42,
});
console.log(X.shape); // [300, 2]
console.log(y.shape); // [300] — cluster labels

// ── Concentric circles (non-linear binary) ──
const [circlesX, circlesY] = makeCircles({ nSamples: 200, noise: 0.05, factor: 0.5 });

// ── Interleaving moons (non-linear binary) ──
const [moonsX, moonsY] = makeMoons({ nSamples: 200, noise: 0.1 });

// ── Regression ──
const [regX, regY] = makeRegression({ nSamples: 100, nFeatures: 5, noise: 0.1, randomState: 42 });
console.log(regX.shape); // [100, 5]

// ── Complex classification with noise ──
const [clsX, clsY] = makeClassification({
  nSamples: 500,
  nFeatures: 20,
  nInformative: 5,
  nRedundant: 3,
  nClasses: 4,
  flipY: 0.05,
  randomState: 42,
});
```

Choosing a Generator
- makeBlobs — Clustering (KMeans, DBSCAN), Gaussian mixture testing, simple multiclass classification
- makeCircles — Non-linear binary classification benchmarks (kernel SVM, neural nets vs linear models)
- makeMoons — Non-linear binary classification with interleaving structure (decision boundary visualization)
- makeClassification — Stress-testing classifiers with controlled difficulty, feature redundancy, and label noise
- makeRegression — Verifying regression models recover known coefficients; testing regularization behavior
- makeGaussianQuantiles — Non-linear multiclass in arbitrary dimensions with concentric spherical boundaries
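The makeRegression use case (verifying that a model recovers the known coefficients) is easy to demonstrate end to end. As a self-contained illustration with no deepbox dependency (olsSlope is a hypothetical helper), a one-feature ordinary-least-squares fit recovers a noiseless coefficient exactly:

```typescript
// Hypothetical one-feature OLS without intercept: ŵ = Σxᵢyᵢ / Σxᵢ².
function olsSlope(x: number[], y: number[]): number {
  let xy = 0;
  let xx = 0;
  for (let i = 0; i < x.length; i++) {
    xy += x[i] * y[i];
    xx += x[i] * x[i];
  }
  return xy / xx;
}

// With targets generated as y = 2.5 * x and no noise, the fit
// returns the true coefficient 2.5 exactly.
const x = [1, 2, 3];
const trueW = 2.5;
const y = x.map((v) => trueW * v);
console.log(olsSlope(x, y)); // 2.5
```

With noise > 0, the recovered coefficient is only approximate, and the gap shrinks as nSamples grows.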