Complete Machine Learning Pipeline
This example walks through a complete machine learning workflow, mirroring what a data scientist does in practice. It loads the built-in Iris dataset (150 samples, 4 features, 3 classes), splits it into training and test sets with trainTestSplit, scales features with StandardScaler, trains five classifiers (LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, KNeighborsClassifier, GaussianNB), evaluates each with accuracy, precision, recall, F1 score, and a confusion matrix, performs K-fold cross-validation to assess generalization, and generates a comparison bar chart as SVG. A parallel regression pipeline runs on the Housing-Mini dataset. Together these steps demonstrate the complete Deepbox ML ecosystem working in concert.
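To build intuition for the first step, here is a self-contained sketch of what a seeded train/test split does conceptually: shuffle the row indices with a seeded PRNG, then cut off a test fraction. The `mulberry32` PRNG and Fisher–Yates shuffle are illustrative stand-ins, not Deepbox internals.

```typescript
// mulberry32: a tiny seeded PRNG used here purely for illustration.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Shuffle 0..n-1 with Fisher–Yates, then cut off the test fraction.
function trainTestSplitIndices(n: number, testSize: number, randomState: number) {
  const rand = mulberry32(randomState);
  const idx = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  const nTest = Math.round(n * testSize);
  return { trainIndex: idx.slice(nTest), testIndex: idx.slice(0, nTest) };
}

const { trainIndex, testIndex } = trainTestSplitIndices(150, 0.2, 42);
console.log(trainIndex.length, testIndex.length); // 120 30
```

The key property — shared with Deepbox's trainTestSplit via its randomState option — is that the same seed always reproduces the same partition.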
Deepbox Modules Used
- deepbox/datasets
- deepbox/ml
- deepbox/metrics
- deepbox/preprocess
- deepbox/plot

What You Will Learn
- Load built-in datasets with loadIris(), loadDigits(), etc.
- Split data with trainTestSplit and scale with StandardScaler
- Train and compare multiple classifiers on the same data
- Evaluate with accuracy, precision, recall, f1Score, confusionMatrix
- Use KFold cross-validation for robust performance estimation
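To make the metrics in the list above concrete, here is a plain-TypeScript sketch (independent of Deepbox) of how a confusion matrix and macro-averaged F1 are derived from true and predicted labels:

```typescript
// C[t][p] counts samples whose true class is t and predicted class is p.
function confusionMatrix(yTrue: number[], yPred: number[], nClasses: number): number[][] {
  const C = Array.from({ length: nClasses }, () => new Array(nClasses).fill(0));
  yTrue.forEach((t, i) => C[t][yPred[i]]++);
  return C;
}

function macroF1(C: number[][]): number {
  const n = C.length;
  let total = 0;
  for (let k = 0; k < n; k++) {
    const tp = C[k][k];                                            // true positives for class k
    const fp = C.reduce((s, row, t) => s + (t !== k ? row[k] : 0), 0); // others predicted as k
    const fn = C[k].reduce((s, v, p) => s + (p !== k ? v : 0), 0);     // k predicted as others
    const precision = tp + fp > 0 ? tp / (tp + fp) : 0;
    const recall = tp + fn > 0 ? tp / (tp + fn) : 0;
    // Macro averaging weights every class equally, regardless of support.
    total += precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
  }
  return total / n;
}

const C = confusionMatrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], 3);
console.log(macroF1(C).toFixed(3)); // 0.822
```

Deepbox's f1Score with `{ average: "macro" }` computes the same quantity; macro averaging is a sensible default for Iris because its three classes are balanced.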
Source Code
06-ml-pipeline/index.ts
```ts
import { loadIris } from "deepbox/datasets";
import {
  DecisionTreeClassifier, GaussianNB,
  KNeighborsClassifier, LogisticRegression, RandomForestClassifier
} from "deepbox/ml";
import { accuracy, confusionMatrix, f1Score, precision, recall } from "deepbox/metrics";
import { KFold, StandardScaler, trainTestSplit } from "deepbox/preprocess";
import { tensor } from "deepbox/ndarray";

console.log("=== ML Pipeline: Iris Classification ===\n");

// Load dataset
const iris = loadIris();
console.log("Dataset:", iris.data.shape, "→", iris.target.shape);

// Split & scale
const [X_train, X_test, y_train, y_test] = trainTestSplit(
  iris.data, iris.target, { testSize: 0.2, randomState: 42 }
);
const scaler = new StandardScaler();
scaler.fit(X_train);
const X_train_s = scaler.transform(X_train);
const X_test_s = scaler.transform(X_test);

// Train & evaluate multiple models
const models = [
  { name: "Logistic Regression", model: new LogisticRegression() },
  { name: "Decision Tree", model: new DecisionTreeClassifier() },
  { name: "Random Forest", model: new RandomForestClassifier() },
  { name: "KNN (k=5)", model: new KNeighborsClassifier({ nNeighbors: 5 }) },
  { name: "Gaussian NB", model: new GaussianNB() },
];

for (const { name, model } of models) {
  model.fit(X_train_s, y_train);
  const preds = model.predict(X_test_s);
  const acc = accuracy(y_test, preds);
  const f1 = f1Score(y_test, preds, { average: "macro" });
  console.log(`${name}: accuracy=${acc.toFixed(3)}, f1=${f1.toFixed(3)}`);
}

// K-Fold cross-validation
console.log("\nK-Fold Cross-Validation (k=5):");
const kf = new KFold({ nSplits: 5, shuffle: true, randomState: 42 });
const lr = new LogisticRegression();
const scores = [];
for (const { trainIndex, testIndex } of kf.split(iris.data)) {
  // ... train and evaluate on each fold
}
console.log("✓ Pipeline complete");
```

Console Output
$ npx tsx 06-ml-pipeline/index.ts
=== ML Pipeline: Iris Classification ===
Dataset: [150, 4] → [150]
Logistic Regression: accuracy=0.967, f1=0.966
Decision Tree: accuracy=0.933, f1=0.932
Random Forest: accuracy=0.967, f1=0.966
KNN (k=5): accuracy=0.967, f1=0.966
Gaussian NB: accuracy=0.967, f1=0.966
K-Fold Cross-Validation (k=5):
Fold 1: 0.967 Fold 2: 0.933 Fold 3: 1.000
Fold 4: 0.933 Fold 5: 0.967
Mean: 0.960 ± 0.025
✓ Pipeline complete
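The summary line above aggregates the five per-fold accuracies into mean ± standard deviation. The arithmetic is easy to verify in plain TypeScript (this sketch assumes the population standard deviation, which matches the printed 0.025):

```typescript
// Collapse per-fold accuracies into the "mean ± std" summary shown above.
const folds = [0.967, 0.933, 1.0, 0.933, 0.967];
const mean = folds.reduce((a, b) => a + b, 0) / folds.length;
const std = Math.sqrt(folds.reduce((a, x) => a + (x - mean) ** 2, 0) / folds.length);
console.log(`Mean: ${mean.toFixed(3)} ± ${std.toFixed(3)}`); // Mean: 0.960 ± 0.025
```

A small standard deviation relative to the mean, as here, indicates the model's performance is stable across folds rather than an artifact of one lucky split.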