Example 24
Tags: Metrics, Evaluation, Classification, Regression
Model Evaluation Metrics
Choosing the right evaluation metric is as important as choosing the right model. This example covers three categories:
- Classification metrics: accuracy, precision, recall, F1 score, confusion matrix, ROC AUC, classification report, log loss, Matthews correlation, Cohen's kappa, balanced accuracy, Jaccard score
- Regression metrics: MSE, RMSE, MAE, MAPE, R², adjusted R², explained variance, max error, median absolute error
- Clustering metrics: silhouette score, Calinski-Harabasz index, Davies-Bouldin index, adjusted Rand index, normalized mutual information, completeness, homogeneity, V-measure, Fowlkes-Mallows
Each metric is computed on sample data, with an explanation of what it measures and when to use it.
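Most of the classification metrics listed above reduce to arithmetic on four confusion counts (true/false positives and negatives). The following is a minimal sketch in plain TypeScript, not the Deepbox API: the `Counts` type, the `ratio` and `summarize` helpers, the "return 0 on an empty denominator" convention, and the 95/5 class split are all assumptions made for illustration. It shows how a classifier that always predicts the majority class can score high accuracy while finding no positives at all.

```typescript
// Illustrative helpers only; these names are hypothetical and not part of deepbox/metrics.
interface Counts { tp: number; fp: number; fn: number; tn: number }

// Assumed convention: a ratio with an empty denominator is reported as 0.
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

function summarize(c: Counts) {
  const accuracy = ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn);
  const precision = ratio(c.tp, c.tp + c.fp);   // share of predicted positives that are correct
  const recall = ratio(c.tp, c.tp + c.fn);      // share of actual positives that are found
  const specificity = ratio(c.tn, c.tn + c.fp); // share of actual negatives that are found
  const f1 = ratio(2 * precision * recall, precision + recall);
  const balancedAccuracy = (recall + specificity) / 2;
  return { accuracy, precision, recall, f1, balancedAccuracy };
}

// Hypothetical imbalanced data: 95 negatives, 5 positives,
// and a classifier that always predicts the negative class.
const alwaysNegative = summarize({ tp: 0, fp: 0, fn: 5, tn: 95 });

console.log(alwaysNegative.accuracy);         // 0.95
console.log(alwaysNegative.recall);           // 0
console.log(alwaysNegative.f1);               // 0
console.log(alwaysNegative.balancedAccuracy); // 0.5
```

Accuracy rewards the 95 easy negatives, while recall, F1, and balanced accuracy expose that no positive was ever detected.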
Deepbox Modules Used
- deepbox/ndarray
- deepbox/metrics
What You Will Learn
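- Accuracy is misleading for imbalanced data; prefer F1 or balanced accuracy
- Precision is the share of predicted positives that are correct (it drops as false positives rise); recall is the share of actual positives that are found (it drops as false negatives rise)
- MSE penalizes large errors quadratically; MAE weights every error in proportion to its size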
- R² close to 1.0 means the model explains most variance in the data
- Silhouette score evaluates clustering quality without ground truth labels
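The MSE, MAE, and R² points above can be checked by hand. The sketch below is plain TypeScript using the standard textbook definitions; the `mean` and `regressionSummary` helpers and the two prediction sets are hypothetical and are not taken from Deepbox. Both prediction sets have the same total absolute error, but one concentrates it in a single large miss.

```typescript
// Illustrative helpers only; these names are hypothetical and not part of deepbox/metrics.
const mean = (xs: number[]): number => xs.reduce((s, x) => s + x, 0) / xs.length;

function regressionSummary(yTrue: number[], yPred: number[]) {
  const residuals = yTrue.map((y, i) => y - yPred[i]);
  const mse = mean(residuals.map((r) => r * r));       // each error is squared
  const mae = mean(residuals.map((r) => Math.abs(r))); // each error counts linearly
  const yBar = mean(yTrue);
  const ssRes = residuals.reduce((s, r) => s + r * r, 0);        // unexplained variation
  const ssTot = yTrue.reduce((s, y) => s + (y - yBar) ** 2, 0);  // total variation
  const r2 = 1 - ssRes / ssTot; // not meaningful when yTrue is constant (ssTot = 0)
  return { mse, rmse: Math.sqrt(mse), mae, r2 };
}

const actuals = [1, 2, 3, 4];
const spreadOut = regressionSummary(actuals, [2, 3, 4, 5]);  // four errors of size 1
const oneOutlier = regressionSummary(actuals, [1, 2, 3, 8]); // one error of size 4

console.log(spreadOut.mae, oneOutlier.mae); // 1 1
console.log(spreadOut.mse, oneOutlier.mse); // 1 4
console.log(oneOutlier.r2 < 0);             // true
```

MAE cannot tell the two prediction sets apart, while MSE quadruples for the single large miss. The negative R² in the second case means the predictions do worse than always predicting the mean of the targets.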
Source Code
24-metrics/index.ts
import { tensor } from "deepbox/ndarray";
import {
  accuracy, precision, recall, f1Score, confusionMatrix,
  mse, rmse, mae, r2Score,
  silhouetteScore
} from "deepbox/metrics";

// Classification metrics
const yTrue = tensor([0, 1, 1, 0, 1, 0, 1, 1]);
const yPred = tensor([0, 1, 0, 0, 1, 1, 1, 1]);

console.log("=== Classification ===");
console.log("Accuracy: ", accuracy(yTrue, yPred).toFixed(3));
console.log("Precision:", precision(yTrue, yPred).toFixed(3));
console.log("Recall: ", recall(yTrue, yPred).toFixed(3));
console.log("F1 Score: ", f1Score(yTrue, yPred).toFixed(3));
console.log("Confusion Matrix:");
console.log(confusionMatrix(yTrue, yPred).toString());

// Regression metrics
const yTrueReg = tensor([3.0, -0.5, 2.0, 7.0]);
const yPredReg = tensor([2.5, 0.0, 2.1, 7.8]);

console.log("\n=== Regression ===");
console.log("MSE: ", mse(yTrueReg, yPredReg).toFixed(4));
console.log("RMSE:", rmse(yTrueReg, yPredReg).toFixed(4));
console.log("MAE: ", mae(yTrueReg, yPredReg).toFixed(4));
console.log("R²: ", r2Score(yTrueReg, yPredReg).toFixed(4));

Console Output
$ npx tsx 24-metrics/index.ts
=== Classification ===
Accuracy: 0.750
Precision: 0.800
Recall: 0.800
F1 Score: 0.800
Confusion Matrix:
[[2, 1],
[1, 4]]
=== Regression ===
MSE: 0.2875
RMSE: 0.5362
MAE: 0.4750
R²: 0.9606
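The classification figures above can be reproduced without the library by counting label pairs. This is a minimal sketch in plain TypeScript, not Deepbox's implementation. It assumes a matrix layout where rows index the true class and columns the predicted class; the variable names are illustrative.

```typescript
// Same labels as the example, as plain arrays instead of tensors.
const labelsTrue = [0, 1, 1, 0, 1, 0, 1, 1];
const labelsPred = [0, 1, 0, 0, 1, 1, 1, 1];

// Assumed layout: matrix[trueClass][predictedClass].
const matrix = [[0, 0], [0, 0]];
labelsTrue.forEach((t, i) => { matrix[t][labelsPred[i]] += 1; });

// Under that layout the cells are [[TN, FP], [FN, TP]].
const [[tn, fp], [fn, tp]] = matrix;

const accuracyByHand = (tp + tn) / labelsTrue.length;
const precisionByHand = tp / (tp + fp);
const recallByHand = tp / (tp + fn);
const f1ByHand = (2 * tp) / (2 * tp + fp + fn); // equivalent to the harmonic mean of precision and recall

console.log(JSON.stringify(matrix)); // [[2,1],[1,4]]
console.log(accuracyByHand);         // 0.75
console.log(precisionByHand);        // 0.8
console.log(recallByHand);           // 0.8
console.log(f1ByHand);               // 0.8
```

Because this data has exactly one false positive and one false negative, the matrix is symmetric, so the printed output alone does not reveal whether the library puts true classes on rows or on columns. A dataset with unequal FP and FN counts would settle that.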