deepbox/ml
Manifold Learning
Non-linear dimensionality reduction that preserves local neighborhood structure. Unlike PCA, a linear projection onto the directions of maximum global variance, manifold methods model the low-dimensional surface (manifold) on which high-dimensional data is assumed to lie. They are used primarily to visualize complex datasets in 2D or 3D.
TSNE
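t-Distributed Stochastic Neighbor Embedding. Converts pairwise Euclidean distances into conditional probabilities that represent similarity, symmetrizes them into joint probabilities, then minimizes the Kullback–Leibler (KL) divergence between the high-dimensional and low-dimensional probability distributions. The heavy-tailed Student-t distribution (one degree of freedom) used in the low-dimensional space alleviates the crowding problem: moderately dissimilar points can be placed far apart in the embedding without a large penalty. Perplexity controls the effective number of neighbors and typically lies between 5 and 50. Results are not deterministic unless a seed (randomState) is set.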
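The perplexity calibration described above can be sketched as follows. This is a minimal illustration, not deepbox's internal code: the function name `conditionalProbabilities` and its arguments are hypothetical. For one point i it binary-searches the Gaussian precision β = 1/(2σᵢ²) until the entropy H (in nats) of the conditional distribution satisfies e^H ≈ perplexity, which is equivalent to 2^H when H is measured in bits.

```typescript
// Hypothetical helper (not part of deepbox): calibrate the conditional
// probabilities p(j|i) for one point i so that exp(entropy) ≈ perplexity.
function conditionalProbabilities(
  sqDists: number[], // squared Euclidean distances from point i to every point
  i: number, // index of point i; p(i|i) is fixed at 0
  perplexity: number,
  tol = 1e-5,
  maxIter = 100,
): number[] {
  const targetEntropy = Math.log(perplexity); // entropy target in nats
  // Shift by the smallest neighbor distance so exp() cannot underflow to all
  // zeros; the shift cancels out when the row is normalized.
  let minDist = Infinity;
  for (let j = 0; j < sqDists.length; j++) if (j !== i) minDist = Math.min(minDist, sqDists[j]);

  let beta = 1; // Gaussian precision, beta = 1 / (2 * sigma_i^2)
  let lo = 0;
  let hi = Infinity;
  let probs: number[] = [];

  for (let iter = 0; iter < maxIter; iter++) {
    // Gaussian kernel centered at point i, then normalized to sum to 1
    probs = sqDists.map((d, j) => (j === i ? 0 : Math.exp(-(d - minDist) * beta)));
    const sum = probs.reduce((a, b) => a + b, 0);
    probs = probs.map((v) => v / sum);

    // Shannon entropy of the row, in nats
    let entropy = 0;
    for (const p of probs) if (p > 0) entropy -= p * Math.log(p);

    const diff = entropy - targetEntropy;
    if (Math.abs(diff) < tol) break;
    if (diff > 0) {
      // Too many effective neighbors: sharpen the kernel (increase beta)
      lo = beta;
      beta = hi === Infinity ? beta * 2 : (beta + hi) / 2;
    } else {
      // Too few effective neighbors: widen the kernel (decrease beta)
      hi = beta;
      beta = (beta + lo) / 2;
    }
  }
  return probs;
}
```

Running this for every row and symmetrizing with pᵢⱼ = (p(j|i) + p(i|j)) / 2n yields the joint matrix P that appears in the t-SNE objective.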
t-SNE Objective
C = KL(P ‖ Q) = Σᵢ Σⱼ pᵢⱼ log(pᵢⱼ / qᵢⱼ)
Where:
- pᵢⱼ = High-dimensional pairwise similarity (Gaussian)
- qᵢⱼ = Low-dimensional pairwise similarity (Student-t)
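The cost can be evaluated directly from dense matrices. This sketch assumes P and Q are n × n arrays whose off-diagonal entries each sum to 1 (diagonal entries are ignored, since pᵢᵢ = qᵢᵢ = 0); the name `klDivergence` is illustrative, not a deepbox API.

```typescript
// Hypothetical helper: t-SNE cost C = KL(P || Q) over all pairs i != j.
function klDivergence(P: number[][], Q: number[][], eps = 1e-12): number {
  let cost = 0;
  for (let i = 0; i < P.length; i++) {
    for (let j = 0; j < P.length; j++) {
      // Skip the diagonal; terms with p_ij = 0 contribute 0 by convention
      if (i === j || P[i][j] === 0) continue;
      // eps guards against log(p / 0) if some q_ij underflows
      cost += P[i][j] * Math.log(P[i][j] / Math.max(Q[i][j], eps));
    }
  }
  return cost;
}
```

Because each term is weighted by pᵢⱼ, pairs that are close in the original space (large pᵢⱼ) but far apart in the embedding (small qᵢⱼ) dominate the cost, while the reverse mismatch costs little. This asymmetry is why t-SNE favors local structure over global distances.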
Low-Dimensional Similarity
qᵢⱼ = (1 + ‖yᵢ − yⱼ‖²)⁻¹ / Σₖ≠ₗ (1 + ‖yₖ − yₗ‖²)⁻¹
Where:
- yᵢ = Low-dimensional embedding of point i
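A direct O(n²) computation of qᵢⱼ from an embedding stored as an array of points is sketched below. `lowDimAffinities` is a hypothetical helper, shown only to make the kernel and the normalization over all ordered pairs k ≠ l concrete.

```typescript
// Hypothetical helper: Student-t (one degree of freedom) similarities q_ij
// for an embedding Y of n points, each with d coordinates.
function lowDimAffinities(Y: number[][]): number[][] {
  const n = Y.length;
  const num: number[][] = Array.from({ length: n }, () => new Array(n).fill(0));
  let total = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      if (i === j) continue; // q_ii is defined as 0
      let sq = 0;
      for (let k = 0; k < Y[i].length; k++) {
        const diff = Y[i][k] - Y[j][k];
        sq += diff * diff;
      }
      num[i][j] = 1 / (1 + sq); // (1 + ||y_i - y_j||^2)^-1
      total += num[i][j];
    }
  }
  // Normalize by the sum over all ordered pairs k != l
  return num.map((row) => row.map((v) => v / total));
}
```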
Constructor Parameters
- nComponents: number — Target dimensionality, usually 2 or 3 (default: 2)
- perplexity: number — Effective number of neighbors. Typical range: 5–50. Larger values → more global structure preserved (default: 30)
- learningRate: number — Step size for gradient descent (default: 200)
- nIter: number — Maximum number of optimization iterations (default: 1000)
- randomState: number — Seed for reproducibility
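To show where learningRate and nIter enter the optimization, the following sketch runs plain gradient descent using the standard t-SNE gradient ∂C/∂yᵢ = 4 Σⱼ (pᵢⱼ − qᵢⱼ)(yᵢ − yⱼ)(1 + ‖yᵢ − yⱼ‖²)⁻¹. It is a simplification under stated assumptions: P is precomputed, the initial embedding Y0 is supplied by the caller (a full implementation would typically draw it randomly, which is where a seed such as randomState matters), and the momentum, early exaggeration, and adaptive gains used by practical t-SNE implementations are omitted. `gradientStep` and `optimize` are hypothetical names, not deepbox APIs.

```typescript
// Hypothetical sketch: one plain gradient-descent step on the t-SNE cost.
function gradientStep(Y: number[][], P: number[][], learningRate: number): number[][] {
  const n = Y.length;
  const dims = Y[0].length;
  // Student-t kernel w_ij = (1 + ||y_i - y_j||^2)^-1 and its normalizer Z
  const W: number[][] = Array.from({ length: n }, () => new Array(n).fill(0));
  let Z = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      if (i === j) continue;
      let sq = 0;
      for (let d = 0; d < dims; d++) sq += (Y[i][d] - Y[j][d]) ** 2;
      W[i][j] = 1 / (1 + sq);
      Z += W[i][j];
    }
  }
  // dC/dy_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j), with q_ij = w_ij / Z
  return Y.map((yi, i) => {
    const grad: number[] = new Array(dims).fill(0);
    for (let j = 0; j < n; j++) {
      if (i === j) continue;
      const coeff = 4 * (P[i][j] - W[i][j] / Z) * W[i][j];
      for (let d = 0; d < dims; d++) grad[d] += coeff * (yi[d] - Y[j][d]);
    }
    // Move against the gradient, scaled by the learning rate
    return yi.map((v, d) => v - learningRate * grad[d]);
  });
}

// Hypothetical driver: nIter fixed steps of size learningRate (no early stopping)
function optimize(Y0: number[][], P: number[][], learningRate = 200, nIter = 1000): number[][] {
  let Y = Y0;
  for (let t = 0; t < nIter; t++) Y = gradientStep(Y, P, learningRate);
  return Y;
}
```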
tsne.ts
import { TSNE } from "deepbox/ml";
import { loadIris } from "deepbox/datasets";
import { figure, scatter, saveFig } from "deepbox/plot";

const { data, target } = loadIris();

// Reduce 4D iris features to 2D for visualization
const tsne = new TSNE({ nComponents: 2, perplexity: 30, randomState: 42 });
tsne.fit(data);
const X2d = tsne.transform(data); // shape: [150, 2]

console.log(X2d.shape); // [150, 2]
console.log(target.shape); // [150]