deepbox/ml

Clustering

Unsupervised algorithms that discover natural groupings in unlabeled data. KMeans partitions data into k convex clusters by minimizing within-cluster variance. DBSCAN finds arbitrarily-shaped clusters based on density and automatically identifies outliers. Neither algorithm requires target labels.

KMeans

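Partitions n observations into k clusters by iterating two steps: (1) assign each point to its nearest centroid, and (2) recompute each centroid as the mean of its assigned points. The algorithm converges when assignments stop changing; maxIter caps the number of iterations. It uses k-means++ initialization (smart seeding) to reduce the risk of poor local minima. Inertia (the total within-cluster sum of squares) never increases from one iteration to the next. It also keeps shrinking as k grows, so it cannot pick k on its own: use the elbow method, choosing the k beyond which further increases yield only small reductions in inertia.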
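The two-step loop can be sketched in plain TypeScript without any library. This is a minimal illustration, not the deepbox implementation: lloydSketch and its helpers are hypothetical names, the starting centroids are passed in by hand instead of coming from k-means++, and there is no tol check or nInit restart logic.

```typescript
type Point = number[];

// Squared Euclidean distance between two points of equal dimension.
function squaredDistance(a: Point, b: Point): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) {
    const diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// Index of the centroid closest to p (ties go to the lower index).
function nearestCentroid(p: Point, centroids: Point[]): number {
  let best = 0;
  for (let k = 1; k < centroids.length; k++) {
    if (squaredDistance(p, centroids[k]) < squaredDistance(p, centroids[best])) {
      best = k;
    }
  }
  return best;
}

// Hypothetical helper: runs the assign/update loop from the given starting centroids.
function lloydSketch(
  points: Point[],
  initialCentroids: Point[],
  maxIter = 300,
): { labels: number[]; centroids: Point[] } {
  let centroids = initialCentroids.map((c) => [...c]);
  let labels: number[] = new Array(points.length).fill(-1);

  for (let iter = 0; iter < maxIter; iter++) {
    // Step 1: assign each point to its nearest centroid.
    const newLabels = points.map((p) => nearestCentroid(p, centroids));
    const unchanged = newLabels.every((label, i) => label === labels[i]);
    labels = newLabels;
    if (unchanged) break; // assignments stopped changing

    // Step 2: recompute each centroid as the mean of its assigned points.
    centroids = centroids.map((old, k) => {
      const members = points.filter((_, i) => labels[i] === k);
      if (members.length === 0) return old; // leave an empty cluster's centroid in place
      return old.map((_, d) => members.reduce((s, p) => s + p[d], 0) / members.length);
    });
  }
  return { labels, centroids };
}

const X: Point[] = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]];
const result = lloydSketch(X, [X[0], X[2]]); // seed with the first and third points
console.log(result.labels); // → [0, 0, 1, 1, 0, 1]
console.log(result.centroids);
```

With these hand-picked seeds the loop settles after one update. Different seeds can converge to a different partition, which is the motivation for smarter seeding and multiple restarts.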

DBSCAN

Density-Based Spatial Clustering of Applications with Noise. A point is a core point if at least minSamples neighbors lie within radius eps. Core points within eps of each other belong to the same cluster, so a cluster grows through chains of neighboring core points. Non-core points within eps of a core point are border points and join that core point's cluster. All other points are noise (label −1). Key advantages: DBSCAN discovers arbitrarily shaped clusters, detects outliers automatically, and does not require specifying the number of clusters.
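The cluster-growing procedure can be sketched as a brute-force scan in plain TypeScript. This is an illustration under stated assumptions, not the deepbox implementation: dbscanSketch and regionQuery are hypothetical names, a point is counted as its own neighbor, "within eps" is taken as distance ≤ eps, and neighbors are found by an O(n²) search.

```typescript
const NOISE = -1;
const UNVISITED = -2;

// Indices of all points within radius eps of points[i] (the point itself included).
function regionQuery(points: number[][], i: number, eps: number): number[] {
  const neighbors: number[] = [];
  for (let j = 0; j < points.length; j++) {
    let sq = 0;
    for (let d = 0; d < points[i].length; d++) {
      const diff = points[i][d] - points[j][d];
      sq += diff * diff;
    }
    if (sq <= eps * eps) neighbors.push(j);
  }
  return neighbors;
}

// Hypothetical helper: returns one label per point, -1 for noise.
function dbscanSketch(points: number[][], eps: number, minSamples: number): number[] {
  const labels: number[] = new Array(points.length).fill(UNVISITED);
  let cluster = 0;

  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== UNVISITED) continue;
    const neighbors = regionQuery(points, i, eps);
    if (neighbors.length < minSamples) {
      labels[i] = NOISE; // may be relabelled as a border point later
      continue;
    }
    // points[i] is a core point: start a new cluster and grow it.
    labels[i] = cluster;
    const queue = [...neighbors];
    while (queue.length > 0) {
      const j = queue.shift() as number;
      if (labels[j] === NOISE) labels[j] = cluster; // border point reached from a core point
      if (labels[j] !== UNVISITED) continue;
      labels[j] = cluster;
      const jNeighbors = regionQuery(points, j, eps);
      if (jNeighbors.length >= minSamples) queue.push(...jNeighbors); // j is a core point too
    }
    cluster++;
  }
  return labels;
}

const data = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]];
const dbLabels = dbscanSketch(data, 3, 2);
console.log(dbLabels); // → [0, 0, 1, 1, 0, -1]
```

Under these assumptions the last point, [9, 11], has no other point within distance 3 and ends up as noise. A library that counts neighbors differently could label boundary cases differently.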

KMeans Objective (Inertia)

J = Σₖ Σᵢ∈Cₖ ‖xᵢ − μₖ‖²

Where:

  • xᵢ = Data point i
  • μₖ = Centroid of cluster k
  • Cₖ = Set of points assigned to cluster k
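As a worked instance of the objective, the sketch below evaluates J for a fixed assignment. The helper name inertiaOf and the variable names are hypothetical; the six points are the ones used in the clustering.ts example, split into two groups with each centroid set to its group mean.

```typescript
// Hypothetical helper: J = sum over points of the squared distance to the assigned centroid.
function inertiaOf(points: number[][], labels: number[], centroids: number[][]): number {
  let total = 0;
  points.forEach((p, i) => {
    const mu = centroids[labels[i]];
    for (let d = 0; d < p.length; d++) {
      total += (p[d] - mu[d]) ** 2;
    }
  });
  return total;
}

const pts = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]];
const assigned = [0, 0, 1, 1, 0, 1];
const mus = [[3.5 / 3, 4.4 / 3], [22 / 3, 9]]; // mean of each group
const J = inertiaOf(pts, assigned, mus);
console.log(J.toFixed(2)); // → "15.98"
```

The tight lower-left group contributes about 1.31 and the spread-out upper-right group about 14.67, so most of the inertia comes from the looser cluster.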

KMeans Centroid Update

μₖ = (1/|Cₖ|) Σᵢ∈Cₖ xᵢ

Where:

  • |Cₖ| = Number of points in cluster k
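The update rule is the minimizer of the inertia objective when the assignments Cₖ are held fixed. A short derivation, using the same symbols as the objective: set the gradient of J with respect to μₖ to zero and solve.

```latex
\frac{\partial J}{\partial \mu_k}
  = -2 \sum_{i \in C_k} (x_i - \mu_k) = 0
\quad\Longrightarrow\quad
\sum_{i \in C_k} x_i = |C_k| \, \mu_k
\quad\Longrightarrow\quad
\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i
```

For example, if Cₖ holds the points (1, 2), (1.5, 1.8) and (1, 0.6), then μₖ = (3.5/3, 4.4/3) ≈ (1.167, 1.467). Because each update step minimizes J for the current assignments, and each assignment step minimizes J for the current centroids, J cannot increase between iterations.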

DBSCAN Core Point

|Nε(x)| ≥ minSamples

Where:

  • Nε(x) = Points within radius ε of x
  • minSamples = Minimum number of points in Nε(x) for x to count as a core point
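To make the core-point condition concrete, the sketch below sorts every point into core, border, or noise. It is an illustration only: classifyPoints and PointKind are hypothetical names, Nε(x) is assumed to include x itself, and the five collinear points are made up for the example.

```typescript
type PointKind = "core" | "border" | "noise";

// Hypothetical helper: classifies every point, counting a point as its own neighbor.
function classifyPoints(points: number[][], eps: number, minSamples: number): PointKind[] {
  const within = (a: number[], b: number[]): boolean =>
    a.reduce((s, v, d) => s + (v - b[d]) ** 2, 0) <= eps * eps;

  // N_eps(x) for every x, stored as lists of point indices.
  const neighborhoods = points.map((p) =>
    points.map((_, j) => j).filter((j) => within(p, points[j])),
  );
  // Core condition: |N_eps(x)| >= minSamples.
  const isCore = neighborhoods.map((n) => n.length >= minSamples);

  return neighborhoods.map((n, i): PointKind => {
    if (isCore[i]) return "core";
    // Non-core points are border points only if a core point lies within eps.
    return n.some((j) => isCore[j]) ? "border" : "noise";
  });
}

const chain = [[0, 0], [1, 0], [2, 0], [3.5, 0], [10, 0]];
const kinds = classifyPoints(chain, 1.5, 3);
console.log(kinds); // → border, core, core, border, noise
```

The two middle points each have three points in their neighborhood and are core. The points at 0 and 3.5 have only two but sit within eps of a core point, so they are border points. The point at 10 is isolated and is noise.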

Constructor Parameters

  • KMeans: nClusters (k), maxIter (default: 300), nInit (number of restarts, default: 10), tol (convergence tolerance), randomState
  • DBSCAN: eps (neighborhood radius), minSamples (minimum points for a core point, default: 5)
  • Properties after fit(): .labels (cluster assignments), .clusterCenters (KMeans only), .inertia (KMeans only), .nClusters (DBSCAN: auto-detected)
clustering.ts
import { KMeans, DBSCAN } from "deepbox/ml";
import { tensor } from "deepbox/ndarray";
import { silhouetteScore } from "deepbox/metrics";

const X = tensor([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]);

// ── KMeans: specify number of clusters ──
const km = new KMeans({ nClusters: 2, maxIter: 300 });
km.fit(X);
console.log(km.labels);         // Cluster assignments, e.g. [0, 0, 1, 1, 0, 1] (cluster IDs may be swapped)
console.log(km.clusterCenters); // Centroid positions
console.log(km.inertia);        // Within-cluster sum of squares
km.predict(tensor([[3, 3]]));   // Assign new point to nearest cluster

// Evaluate with silhouette score
console.log(silhouetteScore(X, km.labels)); // Higher = better separation

// ── DBSCAN: density-based (auto-discovers cluster count) ──
const db = new DBSCAN({ eps: 3, minSamples: 2 });
db.fit(X);
console.log(db.labels);    // -1 = noise, 0+ = cluster ID
console.log(db.nClusters); // Number of clusters found