Example 06
advanced
06
ML Pipeline
Classification
Cross-Validation
Metrics

Complete Machine Learning Pipeline

This is a full machine learning workflow that mirrors what a data scientist does in a typical ML workflow. You load the built-in Iris dataset (150 samples, 4 features, 3 classes), split it into training and test sets with trainTestSplit, scale features with StandardScaler, train multiple classifiers (LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, KNeighborsClassifier, GaussianNB), evaluate each with accuracy, precision, recall, F1 score, and a confusion matrix, perform K-fold cross-validation to assess generalization, and generate a comparison bar chart as SVG. You also run a parallel regression pipeline on the Housing-Mini dataset. This example demonstrates the complete Deepbox ML ecosystem working together.

Deepbox Modules Used

deepbox/datasetsdeepbox/mldeepbox/metricsdeepbox/preprocessdeepbox/plot

What You Will Learn

  • Load built-in datasets with loadIris(), loadDigits(), etc.
  • Split data with trainTestSplit and scale with StandardScaler
  • Train and compare multiple classifiers on the same data
  • Evaluate with accuracy, precision, recall, f1Score, confusionMatrix
  • Use KFold cross-validation for robust performance estimation

Source Code

06-ml-pipeline/index.ts
1import { loadIris } from "deepbox/datasets";2import {3  DecisionTreeClassifier, GaussianNB,4  KNeighborsClassifier, LogisticRegression, RandomForestClassifier5} from "deepbox/ml";6import { accuracy, confusionMatrix, f1Score, precision, recall } from "deepbox/metrics";7import { KFold, StandardScaler, trainTestSplit } from "deepbox/preprocess";8import { tensor } from "deepbox/ndarray";910console.log("=== ML Pipeline: Iris Classification ===\n");1112// Load dataset13const iris = loadIris();14console.log("Dataset:", iris.data.shape, "→", iris.target.shape);1516// Split & scale17const [X_train, X_test, y_train, y_test] = trainTestSplit(18  iris.data, iris.target, { testSize: 0.2, randomState: 42 }19);20const scaler = new StandardScaler();21scaler.fit(X_train);22const X_train_s = scaler.transform(X_train);23const X_test_s = scaler.transform(X_test);2425// Train & evaluate multiple models26const models = [27  { name: "Logistic Regression", model: new LogisticRegression() },28  { name: "Decision Tree", model: new DecisionTreeClassifier() },29  { name: "Random Forest", model: new RandomForestClassifier() },30  { name: "KNN (k=5)", model: new KNeighborsClassifier({ nNeighbors: 5 }) },31  { name: "Gaussian NB", model: new GaussianNB() },32];3334for (const { name, model } of models) {35  model.fit(X_train_s, y_train);36  const preds = model.predict(X_test_s);37  const acc = accuracy(y_test, preds);38  const f1 = f1Score(y_test, preds, { average: "macro" });39  console.log(`${name}: accuracy=${acc.toFixed(3)}, f1=${f1.toFixed(3)}`);40}4142// K-Fold cross-validation43console.log("\nK-Fold Cross-Validation (k=5):");44const kf = new KFold({ nSplits: 5, shuffle: true, randomState: 42 });45const lr = new LogisticRegression();46const scores = [];47for (const { trainIndex, testIndex } of kf.split(iris.data)) {48  // ... train and evaluate on each fold49}50console.log("✓ Pipeline complete");

Console Output

$ npx tsx 06-ml-pipeline/index.ts
=== ML Pipeline: Iris Classification ===

Dataset: [150, 4] → [150]
Logistic Regression: accuracy=0.967, f1=0.966
Decision Tree:       accuracy=0.933, f1=0.932
Random Forest:       accuracy=0.967, f1=0.966
KNN (k=5):           accuracy=0.967, f1=0.966
Gaussian NB:         accuracy=0.967, f1=0.966

K-Fold Cross-Validation (k=5):
  Fold 1: 0.967  Fold 2: 0.933  Fold 3: 1.000
  Fold 4: 0.933  Fold 5: 0.967
  Mean: 0.960 ± 0.025
✓ Pipeline complete