Clustering
KMeans
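Partitions n observations into k clusters by iterating two steps: (1) assign each point to the nearest centroid, and (2) recompute each centroid as the mean of its assigned points. The algorithm converges when assignments stop changing, or stops when maxIter is reached. Uses k-means++ initialization (smart seeding) to reduce the risk of poor local minima. Inertia (the total within-cluster sum of squares) never increases from one iteration to the next. It also keeps shrinking as k grows, so it cannot be used to pick k directly; use the elbow method (the k beyond which inertia improves only marginally) to choose k.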
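The two-step iteration described above can be sketched in plain TypeScript without any library. This is an illustrative sketch, not deepbox's implementation: the helper names (`sqDist`, `assign`, `update`, `lloyd`) are hypothetical, and the initial centroids are passed in by hand rather than chosen by k-means++.

```typescript
type Point = number[];

// Squared Euclidean distance between two points of equal dimension.
function sqDist(a: Point, b: Point): number {
  let s = 0;
  for (let d = 0; d < a.length; d++) s += (a[d] - b[d]) ** 2;
  return s;
}

// Step 1: index of the nearest centroid for each point.
function assign(points: Point[], centroids: Point[]): number[] {
  return points.map((p) => {
    let best = 0;
    for (let k = 1; k < centroids.length; k++) {
      if (sqDist(p, centroids[k]) < sqDist(p, centroids[best])) best = k;
    }
    return best;
  });
}

// Step 2: each centroid becomes the mean of its assigned points.
// An empty cluster keeps its previous centroid (one simple policy among several).
function update(points: Point[], labels: number[], centroids: Point[]): Point[] {
  return centroids.map((old, k) => {
    const members = points.filter((_, i) => labels[i] === k);
    if (members.length === 0) return old;
    return old.map((_, d) => members.reduce((s, p) => s + p[d], 0) / members.length);
  });
}

// Alternate the two steps until assignments stop changing or maxIter is hit.
function lloyd(points: Point[], init: Point[], maxIter = 300): { labels: number[]; centroids: Point[] } {
  let centroids = init.map((c) => [...c]);
  let labels = assign(points, centroids);
  for (let iter = 0; iter < maxIter; iter++) {
    centroids = update(points, labels, centroids);
    const next = assign(points, centroids);
    const changed = next.some((l, i) => l !== labels[i]);
    labels = next;
    if (!changed) break; // assignments are stable: converged
  }
  return { labels, centroids };
}

const X: Point[] = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]];
// Hypothetical seeding: the first and last points serve as initial centroids.
const result = lloyd(X, [X[0], X[5]]);
console.log(result.labels); // [0, 0, 1, 1, 0, 1]
console.log(result.centroids);
```

With this seeding the loop stops after a single centroid update, because the second assignment pass matches the first. A different seeding could swap the cluster IDs or, on harder data, settle in a different local minimum.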
DBSCAN
Density-Based Spatial Clustering of Applications with Noise. A point is a core point if at least minSamples neighbors lie within radius eps. Core points that are within eps of each other belong to the same cluster, so clusters grow by chaining neighboring core points. Non-core points within eps of a core point are border points and join that core point's cluster. All other points are noise (label −1). Key advantage: discovers arbitrarily-shaped clusters and automatically detects outliers. Does NOT require specifying the number of clusters.
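The core, border, and noise rules above can be sketched as a small brute-force routine. This is an illustration under stated assumptions, not deepbox's implementation: the names `regionQuery` and `dbscan` are hypothetical, a point is assumed to count as its own neighbor, "within eps" is taken as distance <= eps, and the sample data is made up.

```typescript
type Point = number[];

const NOISE = -1;
const UNVISITED = -2;

function dist(a: Point, b: Point): number {
  return Math.sqrt(a.reduce((s, v, d) => s + (v - b[d]) ** 2, 0));
}

// Indices of all points within eps of points[i], including i itself.
function regionQuery(points: Point[], i: number, eps: number): number[] {
  const out: number[] = [];
  for (let j = 0; j < points.length; j++) {
    if (dist(points[i], points[j]) <= eps) out.push(j);
  }
  return out;
}

function dbscan(points: Point[], eps: number, minSamples: number): number[] {
  const labels: number[] = points.map(() => UNVISITED);
  let cluster = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== UNVISITED) continue;
    const neighbors = regionQuery(points, i, eps);
    if (neighbors.length < minSamples) {
      labels[i] = NOISE; // may later be relabeled as a border point
      continue;
    }
    labels[i] = cluster; // i is a core point: start a new cluster
    const frontier = [...neighbors];
    while (frontier.length > 0) {
      const j = frontier.pop()!;
      if (labels[j] === NOISE) labels[j] = cluster; // noise reached from a core point becomes border
      if (labels[j] !== UNVISITED) continue;
      labels[j] = cluster;
      const jn = regionQuery(points, j, eps);
      if (jn.length >= minSamples) frontier.push(...jn); // j is core too: keep expanding
    }
    cluster++;
  }
  return labels;
}

// Hypothetical data: two dense groups, one border point ([2, 0]), one outlier ([5, 5]).
const sample: Point[] = [[0, 0], [0, 1], [1, 0], [2, 0], [10, 10], [10, 11], [11, 10], [5, 5]];
const dbLabels = dbscan(sample, 1.5, 3);
console.log(dbLabels); // [0, 0, 0, 0, 1, 1, 1, -1]
```

Here `[2, 0]` has only two points in its neighborhood, so it is not core, but it lies within eps of the core point `[1, 0]` and is absorbed as a border point. The number of clusters (2) falls out of the data rather than being supplied.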
KMeans Objective (Inertia)
$$J = \sum_{k} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2$$
Where:
- μₖ = Centroid of cluster k
- Cₖ = Set of points assigned to cluster k
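As a worked example of the inertia definition, the following sketch sums squared distances from each point to its assigned centroid. The function name `inertia` and the 1-D data are hypothetical and chosen so the arithmetic is easy to check by hand.

```typescript
// Inertia: sum over all points of the squared distance to the assigned centroid.
function inertia(points: number[][], labels: number[], centroids: number[][]): number {
  let total = 0;
  points.forEach((p, i) => {
    const mu = centroids[labels[i]];
    for (let d = 0; d < p.length; d++) total += (p[d] - mu[d]) ** 2;
  });
  return total;
}

// Hypothetical 1-D data with two obvious groups.
const pts = [[0], [2], [9], [11]];

// k = 2, centroids at 1 and 10: 1 + 1 + 1 + 1
const twoClusters = inertia(pts, [0, 0, 1, 1], [[1], [10]]);
console.log(twoClusters); // 4

// k = 1, centroid at the overall mean 5.5: 30.25 + 12.25 + 12.25 + 30.25
const oneCluster = inertia(pts, [0, 0, 0, 0], [[5.5]]);
console.log(oneCluster); // 85
```

The drop from 85 to 4 when moving from one cluster to two is the kind of sharp improvement the elbow method looks for; adding further clusters to this data could only shave off the remaining 4.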
KMeans Centroid Update
$$\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x$$
Where:
- |Cₖ| = Number of points in cluster k
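The update rule uses the mean because, for a fixed set of members, the mean is the point that minimizes the cluster's sum of squared distances. The sketch below checks this numerically on three points; the helper names `centroid` and `sse` are hypothetical.

```typescript
// Per-dimension mean of a non-empty set of points.
function centroid(members: number[][]): number[] {
  const sum = members[0].map(() => 0);
  for (const p of members) {
    for (let d = 0; d < sum.length; d++) sum[d] += p[d];
  }
  return sum.map((s) => s / members.length);
}

// Sum of squared distances from each member to a candidate center.
function sse(members: number[][], center: number[]): number {
  let total = 0;
  for (const p of members) {
    for (let d = 0; d < p.length; d++) total += (p[d] - center[d]) ** 2;
  }
  return total;
}

const members = [[5, 8], [8, 8], [9, 11]];
const mu = centroid(members);
console.log(mu); // approximately [7.33, 9]

// The mean beats another plausible center, here the member [8, 8].
console.log(sse(members, mu) < sse(members, [8, 8])); // true
```

Since each update step can only lower (or keep) the sum of squares for the current assignment, and each assignment step can only lower (or keep) it for the current centroids, inertia never increases across iterations.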
DBSCAN Core Point
$$|N_\varepsilon(x)| \geq \text{minSamples}$$
Where:
- Nε(x) = Points within radius ε of x
- minSamples = Minimum neighborhood density
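One detail the core-point condition leaves open is whether Nε(x) includes x itself. Some implementations, such as scikit-learn's DBSCAN, count the point itself; whether deepbox does is not stated here, so the sketch below makes the choice an explicit flag. The names `neighborhoodSize` and `isCore` and the data are hypothetical.

```typescript
// Number of points within eps of points[i]. The includeSelf flag controls
// whether points[i] counts toward its own neighborhood.
function neighborhoodSize(points: number[][], i: number, eps: number, includeSelf: boolean): number {
  let count = 0;
  points.forEach((q, j) => {
    if (j === i && !includeSelf) return;
    const d = Math.sqrt(q.reduce((s, v, dim) => s + (v - points[i][dim]) ** 2, 0));
    if (d <= eps) count++;
  });
  return count;
}

function isCore(points: number[][], i: number, eps: number, minSamples: number, includeSelf = true): boolean {
  return neighborhoodSize(points, i, eps, includeSelf) >= minSamples;
}

const grid = [[0, 0], [1, 0], [5, 5]]; // hypothetical data
console.log(isCore(grid, 0, 1.5, 2, true));  // true: the neighborhood holds [0, 0] and [1, 0]
console.log(isCore(grid, 0, 1.5, 2, false)); // false: only [1, 0] counts
console.log(isCore(grid, 2, 1.5, 2, true));  // false: [5, 5] has no neighbors besides itself
```

The same eps and minSamples can therefore produce different clusterings under the two conventions, which is worth checking when porting parameter values between libraries.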
Constructor Parameters
- KMeans: nClusters (k), maxIter (default: 300), nInit (number of restarts, default: 10), tol (convergence tolerance), randomState
- DBSCAN: eps (neighborhood radius), minSamples (minimum points for a core point, default: 5)
- Properties after fit(): .labels (cluster assignments), .clusterCenters (KMeans only), .inertia (KMeans only: within-cluster sum of squares), .nClusters (DBSCAN: auto-detected)
```typescript
import { KMeans, DBSCAN } from "deepbox/ml";
import { tensor } from "deepbox/ndarray";
import { silhouetteScore } from "deepbox/metrics";

const X = tensor([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]);

// ── KMeans: specify number of clusters ──
const km = new KMeans({ nClusters: 2, maxIter: 300 });
km.fit(X);
console.log(km.labels); // Cluster assignments, e.g. [0, 0, 1, 1, 0, 1] (cluster IDs may be swapped)
console.log(km.clusterCenters); // Centroid positions
console.log(km.inertia); // Within-cluster sum of squares
km.predict(tensor([[3, 3]])); // Assign new point to nearest cluster

// Evaluate with silhouette score
console.log(silhouetteScore(X, km.labels)); // Higher = better separation

// ── DBSCAN: density-based (auto-discovers cluster count) ──
const db = new DBSCAN({ eps: 3, minSamples: 2 });
db.fit(X);
console.log(db.labels); // -1 = noise, 0+ = cluster ID
console.log(db.nClusters); // Number of clusters found
```
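To make "higher = better separation" concrete, here is a library-free sketch of the mean silhouette coefficient. It is not deepbox's `silhouetteScore`; the function name `silhouette` is hypothetical, and it assumes labels run from 0 to k−1 with at least two non-empty clusters (so DBSCAN's −1 noise label would need to be filtered out first).

```typescript
function euclid(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((s, v, d) => s + (v - b[d]) ** 2, 0));
}

// Mean over all points of (b - a) / max(a, b), where a is the mean distance to
// the other points in the same cluster and b is the smallest mean distance to
// the points of any other cluster.
function silhouette(points: number[][], labels: number[]): number {
  const k = Math.max(...labels) + 1;
  let total = 0;
  points.forEach((p, i) => {
    const sums: number[] = [];
    const counts: number[] = [];
    for (let c = 0; c < k; c++) { sums.push(0); counts.push(0); }
    points.forEach((q, j) => {
      if (j === i) return; // a point is not compared with itself
      sums[labels[j]] += euclid(p, q);
      counts[labels[j]] += 1;
    });
    const own = labels[i];
    if (counts[own] === 0) return; // singleton cluster contributes 0
    const a = sums[own] / counts[own];
    let b = Infinity;
    for (let c = 0; c < k; c++) {
      if (c !== own && counts[c] > 0) b = Math.min(b, sums[c] / counts[c]);
    }
    total += (b - a) / Math.max(a, b);
  });
  return total / points.length;
}

// Hypothetical 1-D data: a sensible labeling versus a scrambled one.
const line = [[0], [2], [9], [11]];
const good = silhouette(line, [0, 0, 1, 1]);
const bad = silhouette(line, [0, 1, 0, 1]);
console.log(good); // close to 0.775
console.log(bad);  // negative: points sit nearer to the other cluster
```

Scores lie in [−1, 1]. Values near 1 mean points are much closer to their own cluster than to the next nearest one, while negative values suggest points have been assigned to the wrong cluster.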