Example 22
beginner
Datasets
Data Loading

Built-in Datasets

Deepbox ships with 24 real-world datasets and 6 synthetic data generators, so you can start experimenting without downloading external data. The built-in datasets include classic ML benchmarks (Iris, Digits, Breast Cancer, Diabetes, Linnerud) and domain-specific datasets (Housing, Student Performance, Weather, Crop Yield, Customer Segments, Energy Efficiency, and more). The synthetic generators (makeBlobs, makeCircles, makeMoons, makeClassification, makeRegression, makeGaussianQuantiles) create datasets with known properties, which is useful for testing algorithms. This example loads six of the built-in datasets, prints their shapes (plus feature and class names for some of them), and generates four synthetic datasets with a chosen sample count and noise level.

Deepbox Modules Used

deepbox/datasets

What You Will Learn

  • 24 built-in datasets ready for immediate use — no downloads needed
  • Classic benchmarks: Iris (classification), Digits (image), Diabetes (regression)
  • Synthetic generators let you control difficulty, noise, and structure
  • makeBlobs for clustering, makeCircles/makeMoons for non-linear classification
  • makeRegression generates data with known true coefficients for verification
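
To make the roles of `factor` and `noise` more concrete, here is a minimal, self-contained sketch of how a concentric-rings generator could work. It is not Deepbox's implementation of makeCircles: the names `sketchCircles` and `mulberry32` are hypothetical helpers introduced for illustration, and the sketch assumes that `factor` is the ratio of the inner radius to the outer radius and that `noise` is the standard deviation of Gaussian jitter added to each coordinate.

```typescript
// Illustrative sketch only: NOT Deepbox's makeCircles.
// sketchCircles and mulberry32 are hypothetical names introduced here.

type Rings = { X: Array<[number, number]>; y: number[] };

// Small seeded PRNG (mulberry32) so the sketch is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function sketchCircles(nSamples: number, factor: number, noise: number, seed: number): Rings {
  const rand = mulberry32(seed);
  // Standard normal sample via the Box-Muller transform.
  const gauss = (): number =>
    Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());

  const X: Array<[number, number]> = [];
  const y: number[] = [];
  for (let i = 0; i < nSamples; i++) {
    const label = i % 2; // alternate between outer ring (0) and inner ring (1)
    const radius = label === 0 ? 1 : factor; // assumed meaning of `factor`
    const angle = 2 * Math.PI * rand();
    const px = radius * Math.cos(angle) + noise * gauss();
    const py = radius * Math.sin(angle) + noise * gauss();
    X.push([px, py]);
    y.push(label);
  }
  return { X, y };
}

// Usage: with noise = 0, every point lies exactly on one of the two rings.
const rings = sketchCircles(200, 0.5, 0, 42);
const innerCount = rings.y.filter((label) => label === 1).length;
console.log("points:", rings.X.length, "inner ring:", innerCount);
```

Under these assumptions, raising `noise` blurs the gap between the rings and moving `factor` toward 1 shrinks it, and both changes make the two classes harder to separate.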

Source Code

22-datasets/index.ts
import {
  loadIris, loadDigits, loadBreastCancer, loadDiabetes,
  loadHousingMini, loadStudentPerformance,
  makeBlobs, makeCircles, makeMoons, makeRegression
} from "deepbox/datasets";

console.log("=== Built-in Datasets ===\n");

// Classic datasets
const iris = loadIris();
console.log("Iris:", iris.data.shape, "features:", iris.featureNames);
console.log("  Classes:", iris.targetNames);

const digits = loadDigits();
console.log("\nDigits:", digits.data.shape, "images:", digits.images.shape);

const cancer = loadBreastCancer();
console.log("Breast Cancer:", cancer.data.shape, "→", cancer.targetNames);

const diabetes = loadDiabetes();
console.log("Diabetes:", diabetes.data.shape, "(regression)");

const housing = loadHousingMini();
console.log("Housing:", housing.data.shape);

const students = loadStudentPerformance();
console.log("Students:", students.data.shape);

// Synthetic generators
console.log("\n--- Synthetic Data ---");
const blobs = makeBlobs({ nSamples: 300, centers: 3, randomState: 42 });
console.log("makeBlobs:", blobs.X.shape, "→", 3, "clusters");

const circles = makeCircles({ nSamples: 200, noise: 0.05, factor: 0.5 });
console.log("makeCircles:", circles.X.shape, "→ concentric rings");

const moons = makeMoons({ nSamples: 200, noise: 0.1 });
console.log("makeMoons:", moons.X.shape, "→ interleaving crescents");

const reg = makeRegression({ nSamples: 100, nFeatures: 5, noise: 0.1 });
console.log("makeRegression:", reg.X.shape, "→", reg.y.shape);

Console Output

$ npx tsx 22-datasets/index.ts
=== Built-in Datasets ===

Iris: [150, 4] features: ["sepal length", "sepal width", "petal length", "petal width"]
  Classes: ["setosa", "versicolor", "virginica"]

Digits: [1797, 64] images: [1797, 8, 8]
Breast Cancer: [569, 30] → ["malignant", "benign"]
Diabetes: [442, 10] (regression)
Housing: [100, 6]
Students: [100, 5]

--- Synthetic Data ---
makeBlobs: [300, 2] → 3 clusters
makeCircles: [200, 2] → concentric rings
makeMoons: [200, 2] → interleaving crescents
makeRegression: [100, 5] → [100]
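
The idea of using known true coefficients for verification can be sketched without Deepbox at all. The block below is an assumption-laden illustration, not Deepbox's makeRegression: `makeRng`, `sketchRegression`, and `solveLeastSquares` are hypothetical helpers. It generates data as y = X·w + noise for a chosen w, then recovers w with ordinary least squares (normal equations solved by Gauss-Jordan elimination) so the estimate can be compared against the truth.

```typescript
// Illustrative sketch only: NOT Deepbox's makeRegression.
// makeRng, sketchRegression and solveLeastSquares are hypothetical helpers.

// Small seeded PRNG (mulberry32) so results are reproducible.
function makeRng(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generate y = X·coef + noise * N(0, 1) with standard-normal features.
function sketchRegression(nSamples: number, coef: number[], noise: number, seed: number) {
  const rand = makeRng(seed);
  const gauss = (): number =>
    Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());
  const X: number[][] = [];
  const y: number[] = [];
  for (let i = 0; i < nSamples; i++) {
    const row = coef.map(() => gauss());
    const signal = row.reduce((sum, x, j) => sum + x * coef[j], 0);
    X.push(row);
    y.push(signal + noise * gauss());
  }
  return { X, y, coef };
}

// Ordinary least squares via the normal equations (XᵀX) w = Xᵀy,
// solved with Gauss-Jordan elimination and partial pivoting.
function solveLeastSquares(X: number[][], y: number[]): number[] {
  const p = X[0].length;
  // Augmented matrix [XᵀX | Xᵀy]
  const A: number[][] = Array.from({ length: p }, () => new Array<number>(p + 1).fill(0));
  X.forEach((row, i) => {
    for (let a = 0; a < p; a++) {
      for (let b = 0; b < p; b++) A[a][b] += row[a] * row[b];
      A[a][p] += row[a] * y[i];
    }
  });
  for (let col = 0; col < p; col++) {
    let pivot = col;
    for (let r = col + 1; r < p; r++) {
      if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
    }
    [A[col], A[pivot]] = [A[pivot], A[col]];
    for (let r = 0; r < p; r++) {
      if (r === col) continue;
      const f = A[r][col] / A[col][col];
      for (let c = col; c <= p; c++) A[r][c] -= f * A[col][c];
    }
  }
  return A.map((row, i) => row[p] / row[i]);
}

// Usage: the estimate should land close to the coefficients we chose.
const trueCoef = [2, -1, 0.5, 0, 3];
const data = sketchRegression(100, trueCoef, 0.1, 42);
const estimated = solveLeastSquares(data.X, data.y);
console.log("true:     ", trueCoef);
console.log("estimated:", estimated.map((v) => Number(v.toFixed(2))));
```

The same check applies to any estimator: if a model cannot recover coefficients from low-noise data generated this way, the problem lies in the model or the training loop rather than in the data.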