deepbox/preprocess

Encoders

Convert categorical data into numerical representations suitable for machine learning models.

LabelEncoder

Encode target labels as integers from 0 to n−1, where n is the number of distinct classes. Each unique string or value maps to a unique integer. Use it to encode target variables (y), not features.
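
The mapping itself can be sketched without the library. The snippet below is a minimal illustration, not deepbox's implementation: the helper names (`fitClasses`, `encodeLabels`, `decodeLabels`) are hypothetical, and it assumes classes are numbered in first-seen order, whereas a real encoder may sort them instead.

```typescript
// Library-independent sketch of label encoding (hypothetical helpers).
// Assumption: classes are numbered in the order they first appear.
function fitClasses(labels: string[]): string[] {
  // A Set keeps insertion order, so this yields the unique labels as first seen.
  return Array.from(new Set(labels));
}

function encodeLabels(classes: string[], labels: string[]): number[] {
  const index = new Map<string, number>();
  classes.forEach((c, i) => index.set(c, i));
  return labels.map((label) => {
    const code = index.get(label);
    if (code === undefined) throw new Error(`Unseen label: ${label}`);
    return code;
  });
}

function decodeLabels(classes: string[], codes: number[]): string[] {
  return codes.map((code) => {
    const label = classes[code];
    if (label === undefined) throw new Error(`Unknown code: ${code}`);
    return label;
  });
}

const animalClasses = fitClasses(["cat", "dog", "fish", "dog"]); // ["cat", "dog", "fish"]
const animalCodes = encodeLabels(animalClasses, ["dog", "cat", "fish"]); // [1, 0, 2]
const animalLabels = decodeLabels(animalClasses, animalCodes); // ["dog", "cat", "fish"]
```

Because the integers carry no meaning beyond identity, this encoding suits targets; feeding it to a model as a feature would imply an ordering that does not exist.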

OneHotEncoder

Encode categorical features as one-hot binary vectors. Each category becomes a separate binary column. Use for nominal features with no inherent ordering.
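
As a rough illustration of the idea, the sketch below builds one-hot rows for a single feature column. It does not use deepbox: `fitCategories` and `oneHot` are hypothetical helpers, and the column order is assumed to follow first appearance in the fitted data.

```typescript
// Library-independent sketch of one-hot encoding for one column (hypothetical helpers).
// Assumption: output columns follow the order in which categories first appear.
function fitCategories(values: string[]): string[] {
  return Array.from(new Set(values));
}

function oneHot(categories: string[], values: string[]): number[][] {
  return values.map((value) => {
    const position = categories.indexOf(value);
    if (position === -1) throw new Error(`Unseen category: ${value}`);
    // One column per category; only the matching column is set to 1.
    return categories.map((_, j) => (j === position ? 1 : 0));
  });
}

const colorCategories = fitCategories(["red", "green", "blue"]); // ["red", "green", "blue"]
const colorRows = oneHot(colorCategories, ["red", "blue"]); // [[1, 0, 0], [0, 0, 1]]
```

Each row sums to 1, so no category is numerically "larger" than another, which is why this encoding fits nominal features.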

OrdinalEncoder

Encode categorical features as integers preserving the given order. Similar to LabelEncoder but for feature columns where order matters (e.g., low < medium < high).
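
The difference from label encoding is that the caller supplies the order. The following is a minimal sketch under that assumption; `ordinalEncode` is a hypothetical helper, not the deepbox API, and how deepbox accepts the category order is not shown in this page.

```typescript
// Library-independent sketch of ordinal encoding (hypothetical helper).
// The caller passes the categories from lowest to highest rank.
function ordinalEncode(order: string[], values: string[]): number[] {
  const rank = new Map<string, number>();
  order.forEach((category, i) => rank.set(category, i));
  return values.map((value) => {
    const r = rank.get(value);
    if (r === undefined) throw new Error(`Unseen category: ${value}`);
    return r;
  });
}

const levelOrder = ["low", "medium", "high"];
const levelCodes = ordinalEncode(levelOrder, ["high", "low", "medium", "low"]); // [2, 0, 1, 0]
```

Comparisons on the encoded values now agree with the intended ranking: the code for "low" is smaller than the code for "medium", which is smaller than the code for "high".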

LabelBinarizer

Binarize labels in a one-vs-all fashion. Transforms multi-class labels into a binary indicator format: each label becomes a row with a 1 in the column for its class and 0 elsewhere.

MultiLabelBinarizer

Binarize multi-label data, where each sample can belong to multiple classes simultaneously. Transforms arrays of label sets into a binary indicator matrix in which column j is 1 if the sample has label j and 0 otherwise. The inverse transform recovers the original label sets.
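
The indicator matrix and its inverse can be sketched as follows. This is an illustration only, not deepbox's implementation: the helper names are hypothetical, and columns are assumed to be the sorted unique labels, so recovered sets come back in column order rather than in their original order.

```typescript
// Library-independent sketch of multi-label binarization (hypothetical helpers).
// Assumption: columns are the unique labels in sorted order.
function fitLabelColumns(samples: string[][]): string[] {
  const seen = new Set<string>();
  for (const labels of samples) {
    for (const label of labels) seen.add(label);
  }
  return Array.from(seen).sort();
}

function toIndicator(columns: string[], samples: string[][]): number[][] {
  // Entry [i][j] is 1 when sample i carries the label of column j.
  return samples.map((labels) =>
    columns.map((column) => (labels.indexOf(column) !== -1 ? 1 : 0))
  );
}

function fromIndicator(columns: string[], rows: number[][]): string[][] {
  // Keep the labels whose column is set; an all-zero row yields an empty set.
  return rows.map((row) => columns.filter((_, j) => row[j] === 1));
}

const tagSamples: string[][] = [["sci-fi", "action"], ["drama"], ["action", "drama"], []];
const tagColumns = fitLabelColumns(tagSamples); // ["action", "drama", "sci-fi"]
const tagMatrix = toIndicator(tagColumns, tagSamples);
// [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
const tagSets = fromIndicator(tagColumns, tagMatrix);
// [["action", "sci-fi"], ["drama"], ["action", "drama"], []]
```

Unlike one-hot rows, a row here may contain several 1s or none at all, which is what distinguishes multi-label from multi-class encoding.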

encoders.ts
import { LabelEncoder, OneHotEncoder, OrdinalEncoder, LabelBinarizer } from "deepbox/preprocess";
import { tensor } from "deepbox/ndarray";

// LabelEncoder: strings → integers
const le = new LabelEncoder();
le.fit(["cat", "dog", "fish"]);
le.transform(["dog", "cat", "fish"]); // [1, 0, 2]
le.inverseTransform([1, 0, 2]);       // ["dog", "cat", "fish"]

// OneHotEncoder: categories → binary vectors
const ohe = new OneHotEncoder();
ohe.fit([["red"], ["green"], ["blue"]]);
ohe.transform([["red"], ["blue"]]); // [[1,0,0], [0,0,1]]

// LabelBinarizer: labels → one-vs-all indicator rows
const lb = new LabelBinarizer();
lb.fit([0, 1, 2]);
lb.transform([1, 2]); // [[0,1,0], [0,0,1]]