Example 24 (intermediate)

Tags: Metrics, Evaluation, Classification, Regression

Model Evaluation Metrics

Choosing the right evaluation metric is as important as choosing the right model. This example covers three categories:

  • Classification metrics: accuracy, precision, recall, F1 score, confusion matrix, ROC AUC, classification report, log loss, Matthews correlation, Cohen's kappa, balanced accuracy, Jaccard score
  • Regression metrics: MSE, RMSE, MAE, MAPE, R², adjusted R², explained variance, max error, median absolute error
  • Clustering metrics: silhouette score, Calinski-Harabasz index, Davies-Bouldin index, adjusted Rand index, normalized mutual info, completeness, homogeneity, V-measure, Fowlkes-Mallows

Each metric is computed on sample data with an explanation of what it measures and when to use it.

Deepbox Modules Used

  • deepbox/ndarray
  • deepbox/metrics

What You Will Learn

  • Accuracy is misleading for imbalanced data — use F1 or balanced accuracy
  • Precision is the fraction of positive predictions that are correct (it penalizes false positives); recall is the fraction of actual positives the model finds (it penalizes false negatives)
  • MSE penalizes large errors quadratically; MAE treats all errors equally
  • R² close to 1.0 means the model explains most variance in the data
  • Silhouette score evaluates clustering quality without ground truth labels

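The first bullet is easy to demonstrate without deepbox. Below is a minimal hand-rolled sketch (plain TypeScript, not the deepbox implementation) showing how accuracy looks fine on imbalanced data while F1 exposes a useless model:

```typescript
// Hand-computed accuracy and F1, directly from the definitions.
function accuracy(yTrue: number[], yPred: number[]): number {
  let hits = 0;
  for (let i = 0; i < yTrue.length; i++) if (yTrue[i] === yPred[i]) hits++;
  return hits / yTrue.length;
}

function f1(yTrue: number[], yPred: number[]): number {
  let tp = 0, fp = 0, fn = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yPred[i] === 1 && yTrue[i] === 1) tp++;
    else if (yPred[i] === 1 && yTrue[i] === 0) fp++;
    else if (yPred[i] === 0 && yTrue[i] === 1) fn++;
  }
  if (tp === 0) return 0; // no true positives: precision/recall undefined, F1 is 0
  const precision = tp / (tp + fp); // correct share of positive predictions
  const recall = tp / (tp + fn);    // share of actual positives found
  return (2 * precision * recall) / (precision + recall);
}

// 9 negatives, 1 positive; the "model" always predicts the majority class.
const yTrue = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1];
const yPred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0];

console.log(accuracy(yTrue, yPred)); // 0.9, looks good
console.log(f1(yTrue, yPred));       // 0, reveals the model never finds a positive
```

On the balanced data used in the example below, the same functions agree with deepbox's reported 0.750 accuracy and 0.800 F1.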
Source Code

24-metrics/index.ts
import { tensor } from "deepbox/ndarray";
import {
  accuracy, precision, recall, f1Score, confusionMatrix,
  mse, rmse, mae, r2Score
} from "deepbox/metrics";

// Classification metrics
const yTrue = tensor([0, 1, 1, 0, 1, 0, 1, 1]);
const yPred = tensor([0, 1, 0, 0, 1, 1, 1, 1]);

console.log("=== Classification ===");
console.log("Accuracy: ", accuracy(yTrue, yPred).toFixed(3));
console.log("Precision:", precision(yTrue, yPred).toFixed(3));
console.log("Recall:   ", recall(yTrue, yPred).toFixed(3));
console.log("F1 Score: ", f1Score(yTrue, yPred).toFixed(3));
console.log("Confusion Matrix:");
console.log(confusionMatrix(yTrue, yPred).toString());

// Regression metrics
const yTrueReg = tensor([3.0, -0.5, 2.0, 7.0]);
const yPredReg = tensor([2.5, 0.0, 2.1, 7.8]);

console.log("\n=== Regression ===");
console.log("MSE: ", mse(yTrueReg, yPredReg).toFixed(4));
console.log("RMSE:", rmse(yTrueReg, yPredReg).toFixed(4));
console.log("MAE: ", mae(yTrueReg, yPredReg).toFixed(4));
console.log("R²:  ", r2Score(yTrueReg, yPredReg).toFixed(4));
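
The regression values can be checked by hand. This dependency-free sketch applies the textbook definitions to the same data as the example above:

```typescript
// MSE, RMSE, MAE and R² computed directly from their definitions.
const yTrue = [3.0, -0.5, 2.0, 7.0];
const yPred = [2.5, 0.0, 2.1, 7.8];
const n = yTrue.length;

const errors = yTrue.map((y, i) => y - yPred[i]);
const mse = errors.reduce((s, e) => s + e * e, 0) / n;       // squares large errors
const rmse = Math.sqrt(mse);                                 // same units as the target
const mae = errors.reduce((s, e) => s + Math.abs(e), 0) / n; // weights all errors equally

const mean = yTrue.reduce((s, y) => s + y, 0) / n;
const ssTot = yTrue.reduce((s, y) => s + (y - mean) ** 2, 0); // total variance
const ssRes = errors.reduce((s, e) => s + e * e, 0);          // unexplained variance
const r2 = 1 - ssRes / ssTot;                                 // share of variance explained

console.log(mse.toFixed(4));  // 0.2875
console.log(rmse.toFixed(4)); // 0.5362
console.log(mae.toFixed(4));  // 0.4750
console.log(r2.toFixed(4));   // 0.9606
```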

Console Output

$ npx tsx 24-metrics/index.ts
=== Classification ===
Accuracy:  0.750
Precision: 0.800
Recall:    0.800
F1 Score:  0.800
Confusion Matrix:
[[2, 1],
 [1, 4]]

=== Regression ===
MSE:  0.2875
RMSE: 0.5362
MAE:  0.4750
R²:   0.9606
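
The silhouette score from the learn list can also be sketched from its definition. This is a hand-rolled illustration (not deepbox's silhouetteScore), assuming every cluster has at least two points:

```typescript
// Silhouette score for 2-D points with Euclidean distance.
// For each point: a = mean distance to its own cluster (excluding itself),
// b = lowest mean distance to any other cluster, s = (b - a) / max(a, b).
// The score is the mean of s over all points; near 1 means tight,
// well-separated clusters, near 0 means overlapping ones.
type Point = [number, number];

function dist(p: Point, q: Point): number {
  return Math.hypot(p[0] - q[0], p[1] - q[1]);
}

function silhouette(points: Point[], labels: number[]): number {
  const clusters = Array.from(new Set(labels));
  let total = 0;
  for (let i = 0; i < points.length; i++) {
    const meanDistTo = (c: number): number => {
      const members = points.filter((_, j) => labels[j] === c && j !== i);
      return members.reduce((s, p) => s + dist(points[i], p), 0) / members.length;
    };
    const a = meanDistTo(labels[i]);
    const b = Math.min(...clusters.filter((c) => c !== labels[i]).map(meanDistTo));
    total += (b - a) / Math.max(a, b);
  }
  return total / points.length;
}

// Two tight, well-separated blobs: the score comes out close to 1.
const pts: Point[] = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]];
const labels = [0, 0, 0, 1, 1, 1];
console.log(silhouette(pts, labels).toFixed(3));
```

No ground-truth labels are needed, which is why silhouette is the go-to check when true cluster assignments are unavailable.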