Preprocessing — Encoders
Machine learning models require numeric input, but real-world data often contains categorical variables such as 'red', 'green', 'blue' or 'small', 'medium', 'large'. Deepbox provides five encoders for converting them: LabelEncoder maps each category to an integer (for example red→0, green→1, blue→2), OneHotEncoder creates one binary column per category, OrdinalEncoder preserves ordering for ordinal data, MultiLabelBinarizer handles multiple labels per sample, and LabelBinarizer converts a single column of categories into a binary matrix. The source code below exercises LabelEncoder, OneHotEncoder, and OrdinalEncoder. Each encoder supports .fit(), .transform(), and .inverseTransform() for round-trip conversion.
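To make the difference between label encoding and one-hot encoding concrete, here is a minimal plain-TypeScript sketch of the two ideas. It does not use Deepbox and is not a description of how Deepbox implements them. The helper names (fitCategories, labelEncode, labelDecode, oneHotEncode) are hypothetical, and the sketch assumes categories are numbered in order of first appearance during fitting.

```typescript
// Plain-TypeScript sketch of label and one-hot encoding.
// Not Deepbox's implementation; all helper names here are hypothetical.
// Assumption: categories are numbered in order of first appearance.

/** "Fit": collect the distinct categories, keeping first-appearance order. */
function fitCategories(values: string[]): string[] {
  return Array.from(new Set(values));
}

/** Label encoding: replace each category with its index in `categories`. */
function labelEncode(values: string[], categories: string[]): number[] {
  return values.map((v) => {
    const idx = categories.indexOf(v);
    if (idx === -1) throw new Error(`Unseen category: ${v}`);
    return idx;
  });
}

/** Inverse of labelEncode: look each integer code up in `categories`. */
function labelDecode(codes: number[], categories: string[]): string[] {
  return codes.map((c) => {
    const label = categories[c];
    if (label === undefined) throw new Error(`Unknown code: ${c}`);
    return label;
  });
}

/** One-hot encoding: one row per sample, one 0/1 column per category. */
function oneHotEncode(values: string[], categories: string[]): number[][] {
  return labelEncode(values, categories).map((idx) =>
    categories.map((_, col) => (col === idx ? 1 : 0)),
  );
}

const colors = fitCategories(["red", "green", "blue", "red"]);
const codes = labelEncode(["red", "blue", "green"], colors);
const roundTrip = labelDecode(codes, colors);
const oneHot = oneHotEncode(["red", "blue", "green"], colors);

console.log(colors);    // values: red, green, blue
console.log(codes);     // values: 0, 2, 1
console.log(roundTrip); // values: red, blue, green
console.log(oneHot);    // rows: [1,0,0], [0,0,1], [0,1,0]
```

The integer codes imply an order (blue > green > red here) that the colors do not actually have, while the one-hot rows treat every category as equally distant from the others.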
Deepbox Modules Used
- deepbox/ndarray
- deepbox/preprocess
What You Will Learn
- LabelEncoder maps categories to integers — use for tree-based models
- OneHotEncoder creates binary columns — required for linear models and neural nets
- OrdinalEncoder preserves ordering for ordinal features (small < medium < large)
- All encoders support .inverseTransform() for decoding predictions back to labels
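The last two points can be illustrated with a small plain-TypeScript sketch: an ordinal mapping whose integer codes follow a stated order, and a decode step that turns integer predictions back into labels. This is an illustration only, not Deepbox code. The names sizeOrder, ordinalEncode, and ordinalDecode are hypothetical, and the order small < medium < large is supplied explicitly rather than inferred from the data.

```typescript
// Sketch of ordinal encoding with an explicit category order.
// Hypothetical helpers for illustration; not part of Deepbox.

// The order is stated explicitly: small < medium < large.
const sizeOrder = ["small", "medium", "large"] as const;
type Size = (typeof sizeOrder)[number];

/** Encode each size as its rank in `sizeOrder`. */
function ordinalEncode(values: Size[]): number[] {
  return values.map((v) => sizeOrder.indexOf(v));
}

/** Decode ranks (for example, model predictions) back to size labels. */
function ordinalDecode(ranks: number[]): Size[] {
  return ranks.map((r) => {
    const label = sizeOrder[r];
    if (label === undefined) throw new Error(`Unknown rank: ${r}`);
    return label;
  });
}

const sizeRanks = ordinalEncode(["large", "small", "medium"]);
const predictedSizes = ordinalDecode([1, 2]);

console.log(sizeRanks);      // values: 2, 0, 1
console.log(predictedSizes); // values: medium, large
```

Because the ranks follow the stated order, numeric comparisons on the encoded values (0 < 1 < 2) agree with the ordering of the original sizes.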
Source Code
17-preprocessing-encoders/index.ts
import { tensor } from "deepbox/ndarray";
import { LabelEncoder, OneHotEncoder, OrdinalEncoder, LabelBinarizer } from "deepbox/preprocess";

console.log("=== Preprocessing Encoders ===\n");

// LabelEncoder: categories → integers
const le = new LabelEncoder();
le.fit(tensor(["cat", "dog", "bird", "cat", "dog"]));
const encoded = le.transform(tensor(["cat", "bird", "dog"]));
console.log("LabelEncoder:", encoded.toString()); // [0, 1, 2]
const decoded = le.inverseTransform(encoded);
console.log("Inverse:", decoded.toString()); // ["cat", "bird", "dog"]

// OneHotEncoder: categories → binary columns
const ohe = new OneHotEncoder();
ohe.fit(tensor(["red", "green", "blue", "red"]));
const onehot = ohe.transform(tensor(["red", "blue", "green"]));
console.log("\nOneHotEncoder:");
console.log(onehot.toString()); // [[1,0,0], [0,0,1], [0,1,0]]

// OrdinalEncoder: preserves ordering
const oe = new OrdinalEncoder();
oe.fit(tensor(["small", "medium", "large"]));
console.log("\nOrdinalEncoder:", oe.transform(tensor(["large", "small"])).toString());
Console Output
$ npx tsx 17-preprocessing-encoders/index.ts
=== Preprocessing Encoders ===

LabelEncoder: [0, 1, 2]
Inverse: ["cat", "bird", "dog"]

OneHotEncoder:
[[1, 0, 0],
 [0, 0, 1],
 [0, 1, 0]]

OrdinalEncoder: [2, 0]
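The listing above does not exercise MultiLabelBinarizer. As a rough illustration of what "multiple labels per sample" means, the following plain-TypeScript sketch builds a 0/1 matrix with one column per label and then reverses it. It is not Deepbox's implementation. The helper names (fitLabels, binarize, inverseBinarize) and the sample tags are hypothetical, and the sketch assumes columns follow the sorted order of the labels seen during fitting.

```typescript
// Sketch of multi-label binarization.
// Not Deepbox's implementation; helper names and data are hypothetical.
// Assumption: columns follow the sorted order of all labels seen in "fit".

/** "Fit": collect every distinct label across all samples, sorted. */
function fitLabels(samples: string[][]): string[] {
  const seen = new Set<string>();
  for (const sample of samples) {
    for (const label of sample) seen.add(label);
  }
  return Array.from(seen).sort();
}

/** Transform: one row per sample, 1 where the sample carries that label. */
function binarize(samples: string[][], classes: string[]): number[][] {
  return samples.map((sample) =>
    classes.map((cls) => (sample.includes(cls) ? 1 : 0)),
  );
}

/** Inverse: recover each sample's label list from its 0/1 row. */
function inverseBinarize(rows: number[][], classes: string[]): string[][] {
  return rows.map((row) => classes.filter((_, col) => row[col] === 1));
}

const tagSamples: string[][] = [
  ["action", "comedy"],
  ["drama"],
  ["comedy", "drama"],
  [], // a sample with no labels becomes an all-zero row
];

const tagClasses = fitLabels(tagSamples);
const tagMatrix = binarize(tagSamples, tagClasses);
const tagsBack = inverseBinarize(tagMatrix, tagClasses);

console.log(tagClasses); // values: action, comedy, drama
console.log(tagMatrix);  // rows: [1,1,0], [0,0,1], [0,1,1], [0,0,0]
console.log(tagsBack);   // same label sets as tagSamples
```

Unlike one-hot rows, a row here may contain several 1s or none at all, which is why a separate encoder is needed for multi-label targets.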