Complete Machine Learning Pipeline

This is a full machine learning workflow that mirrors what a data scientist does in a typical ML workflow. You load the built-in Iris dataset (150 samples, 4 features, 3 classes), split it into training and test sets with trainTestSplit, scale features with StandardScaler, train multiple classifiers (LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, KNeighborsClassifier, GaussianNB), evaluate each with accuracy, precision, recall, F1 score, and a confusion matrix, perform K-fold cross-validation to assess generalization, and generate a comparison bar chart as SVG. You also run a parallel regression pipeline on the Housing-Mini dataset. This example demonstrates the complete Deepbox ML ecosystem working together.

Deepbox Modules Used

deepbox/datasetsdeepbox/mldeepbox/metricsdeepbox/preprocessdeepbox/plot

What You Will Learn

Load built-in datasets with loadIris(), loadDigits(), etc.
Split data with trainTestSplit and scale with StandardScaler
Train and compare multiple classifiers on the same data
Evaluate with accuracy, precision, recall, f1Score, confusionMatrix
Use KFold cross-validation for robust performance estimation

Source Code

06-ml-pipeline/index.ts

1import { loadIris } from "deepbox/datasets";2import {3  DecisionTreeClassifier, GaussianNB,4  KNeighborsClassifier, LogisticRegression, RandomForestClassifier5} from "deepbox/ml";6import { accuracy, confusionMatrix, f1Score, precision, recall } from "deepbox/metrics";7import { KFold, StandardScaler, trainTestSplit } from "deepbox/preprocess";8import { tensor } from "deepbox/ndarray";910console.log("=== ML Pipeline: Iris Classification ===\n");1112// Load dataset13const iris = loadIris();14console.log("Dataset:", iris.data.shape, "→", iris.target.shape);1516// Split & scale17const [X_train, X_test, y_train, y_test] = trainTestSplit(18  iris.data, iris.target, { testSize: 0.2, randomState: 42 }19);20const scaler = new StandardScaler();21scaler.fit(X_train);22const X_train_s = scaler.transform(X_train);23const X_test_s = scaler.transform(X_test);2425// Train & evaluate multiple models26const models = [27  { name: "Logistic Regression", model: new LogisticRegression() },28  { name: "Decision Tree", model: new DecisionTreeClassifier() },29  { name: "Random Forest", model: new RandomForestClassifier() },30  { name: "KNN (k=5)", model: new KNeighborsClassifier({ nNeighbors: 5 }) },31  { name: "Gaussian NB", model: new GaussianNB() },32];3334for (const { name, model } of models) {35  model.fit(X_train_s, y_train);36  const preds = model.predict(X_test_s);37  const acc = accuracy(y_test, preds);38  const f1 = f1Score(y_test, preds, { average: "macro" });39  console.log(`${name}: accuracy=${acc.toFixed(3)}, f1=${f1.toFixed(3)}`);40}4142// K-Fold cross-validation43console.log("\nK-Fold Cross-Validation (k=5):");44const kf = new KFold({ nSplits: 5, shuffle: true, randomState: 42 });45const lr = new LogisticRegression();46const scores = [];47for (const { trainIndex, testIndex } of kf.split(iris.data)) {48  // ... train and evaluate on each fold49}50console.log("✓ Pipeline complete");

Console Output

$ npx tsx 06-ml-pipeline/index.ts

=== ML Pipeline: Iris Classification ===

Dataset: [150, 4] → [150]
Logistic Regression: accuracy=0.967, f1=0.966
Decision Tree:       accuracy=0.933, f1=0.932
Random Forest:       accuracy=0.967, f1=0.966
KNN (k=5):           accuracy=0.967, f1=0.966
Gaussian NB:         accuracy=0.967, f1=0.966

K-Fold Cross-Validation (k=5):
  Fold 1: 0.967  Fold 2: 0.933  Fold 3: 1.000
  Fold 4: 0.933  Fold 5: 0.967
  Mean: 0.960 ± 0.025
✓ Pipeline complete

DataFrame GroupBy & AggregationPrevious Linear RegressionNext