deepbox/dataframe

DataFrame

A tabular data structure with labeled columns. Each column stores homogeneous data (numbers, strings, or booleans), and all columns share the same row index. It supports chainable operations for selection, filtering, sorting, grouping, aggregation, joining, CSV I/O, and conversion to Tensors for numerical computing. DataFrames are immutable: every transforming method returns a new DataFrame and leaves the original unchanged.

new DataFrame

new DataFrame(data: Record<string, Array<number | string | boolean>>)

Create a DataFrame from a column-oriented object: each key becomes a column name, and each value is the array of that column's values. All arrays must have the same length. Columns appear in the object's key insertion order (JavaScript itself places integer-like keys such as "2" first, in numeric order).

Parameters:

data: Record<string, Array<number | string | boolean>> - Column name → values mapping. All arrays must be the same length.

DataFrame.fromCsvString

DataFrame.fromCsvString(csvString: string, opts?: { delimiter?: string; header?: boolean }): DataFrame

Parse a CSV string into a DataFrame. Columns whose values are numeric are detected automatically and converted from strings to numbers. delimiter sets the field separator; if header is false, columns are named 'col_0', 'col_1', and so on.
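To make the parsing rules concrete, here is a rough sketch of how header handling and numeric detection might work. The `parseCsv` helper is hypothetical, not deepbox code, and it assumes several things the reference does not state: `header` defaults to `true`, `delimiter` defaults to `","`, quoted fields are not supported, and a column counts as numeric only if every value parses as a finite number.

```typescript
// Hypothetical sketch of fromCsvString's documented rules; not deepbox code.
// Assumptions: header defaults to true, delimiter to ",", no quoted fields.
type CsvColumns = Record<string, Array<number | string>>;

function parseCsv(
  csv: string,
  opts: { delimiter?: string; header?: boolean } = {}
): CsvColumns {
  const delimiter = opts.delimiter ?? ",";
  const header = opts.header ?? true;
  const rows = csv
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "") // ignore blank lines
    .map((line) => line.split(delimiter));
  const [first, ...rest] = rows;
  if (first === undefined) return {};

  // With a header, the first row names the columns; otherwise col_0, col_1, ...
  const names = header ? first : first.map((_, i) => `col_${i}`);
  const body = header ? rest : rows;

  const out: CsvColumns = {};
  names.forEach((colName, j) => {
    const raw = body.map((r) => r[j] ?? "");
    // Numeric only if every cell is non-empty and parses as a finite number.
    const numeric =
      raw.length > 0 &&
      raw.every((v) => v.trim() !== "" && Number.isFinite(Number(v)));
    out[colName] = numeric ? raw.map((v) => Number(v)) : raw;
  });
  return out;
}

// "age" becomes number[]; "name" stays string[].
console.log(parseCsv("name,age\nAlice,25\nBob,30"));
// Without a header, columns are named col_0 and col_1.
console.log(parseCsv("1;2\n3;4", { delimiter: ";", header: false }));
```

A production parser would also have to handle quoted fields that contain the delimiter, which is why a plain `split` is only suitable for illustration.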

Properties

.columns: string[] — Ordered array of column names
.shape: [number, number] — [nRows, nColumns] tuple
.index: (string | number)[] — Row labels (defaults to 0, 1, 2, ...)
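The constructor rules and these three properties fit together as follows. The `inspectColumns` helper below is hypothetical, written for illustration and not part of deepbox: it rejects ragged input and derives the values that `.columns`, `.shape`, and the default `.index` would hold for the same column-oriented object.

```typescript
// Hypothetical helper (not part of deepbox) applying the documented
// constructor rules and deriving .columns, .shape and the default .index.
type Cell = number | string | boolean;

interface FrameMeta {
  columns: string[];
  shape: [number, number];
  index: number[];
}

function inspectColumns(data: Record<string, Cell[]>): FrameMeta {
  const entries = Object.entries(data); // key insertion order
  const nRows = entries[0]?.[1].length ?? 0;
  for (const [col, values] of entries) {
    if (values.length !== nRows) {
      throw new Error(
        `Column "${col}" has ${values.length} values, expected ${nRows}`
      );
    }
  }
  return {
    columns: entries.map(([col]) => col),
    shape: [nRows, entries.length],
    index: Array.from({ length: nRows }, (_, i) => i), // default 0, 1, 2, ...
  };
}

const meta = inspectColumns({
  name: ["Alice", "Bob", "Charlie"],
  age: [25, 30, 35],
});
console.log(meta.shape);   // [3, 2]
console.log(meta.columns); // ['name', 'age']
console.log(meta.index);   // [0, 1, 2]
```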

Function	Description	Example
.select(columns: string[])	Select columns by name → new DataFrame with only those columns	`df.select(['name', 'salary'])`
.drop(columns: string[])	Remove columns by name → new DataFrame without those columns	`df.drop(['id'])`
.filter(fn: (row) => boolean)	Keep rows where predicate returns true	`df.filter(r => r.age > 25)`
.head(n?: number)	First n rows (default: 5)	`df.head(10)`
.tail(n?: number)	Last n rows (default: 5)	`df.tail(3)`
.iloc(index: number)	Single row by integer position → row object	`df.iloc(0)`
.loc(row: string \| number)	Single row by index label → row object	`df.loc('row1')`
.slice(start: number, end: number)	Row range [start, end)	`df.slice(10, 20)`
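`iloc` and `loc` only behave differently once the index is something other than `0, 1, 2, ...`. The `MiniFrame` class below is a hypothetical stand-in (not deepbox's implementation) that stores columns plus an index of labels, so position-based and label-based lookup can be compared side by side; its `slice` follows the documented half-open `[start, end)` range.

```typescript
// Hypothetical stand-in for a DataFrame with a custom index; not deepbox code.
type Cell = number | string | boolean;
type Row = Record<string, Cell>;

class MiniFrame {
  readonly data: Record<string, Cell[]>;
  readonly index: Array<string | number>;

  constructor(data: Record<string, Cell[]>, index: Array<string | number>) {
    this.data = data;
    this.index = index;
  }

  // Position-based lookup: the i-th stored row, whatever its label.
  iloc(i: number): Row {
    if (!Number.isInteger(i) || i < 0 || i >= this.index.length) {
      throw new RangeError(`Position ${i} is out of range`);
    }
    const row: Row = {};
    for (const [col, values] of Object.entries(this.data)) {
      row[col] = values[i] as Cell; // i was range-checked above
    }
    return row;
  }

  // Label-based lookup: find the label's position, then reuse iloc.
  loc(label: string | number): Row {
    const pos = this.index.indexOf(label);
    if (pos === -1) throw new Error(`Label ${String(label)} not found`);
    return this.iloc(pos);
  }

  // Half-open row range [start, end); labels travel with their rows.
  slice(start: number, end: number): MiniFrame {
    const data: Record<string, Cell[]> = {};
    for (const [col, values] of Object.entries(this.data)) {
      data[col] = values.slice(start, end);
    }
    return new MiniFrame(data, this.index.slice(start, end));
  }
}

const staff = new MiniFrame(
  { name: ["Alice", "Bob", "Charlie"], age: [25, 30, 35] },
  [10, 20, 30]
);
console.log(staff.iloc(0));           // position 0: Alice's row
console.log(staff.loc(20));           // label 20: Bob's row
console.log(staff.slice(1, 3).index); // [20, 30]
```

With this index, `staff.loc(0)` throws because no row carries the label 0, even though `staff.iloc(0)` succeeds.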

Function	Description	Example
.sort(by: string \| string[], ascending?: boolean)	Sort by one or more columns. ascending defaults to true.	`df.sort('salary', false)`
.sample(n: number, random_state?: number)	Random sample of n rows without replacement	`df.sample(50)`
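Sorting by several columns means comparing on the first key and consulting later keys only to break ties. The hypothetical `sortRows` function sketches that rule on plain row objects, not on deepbox DataFrames. It assumes numbers compare numerically and everything else compares as strings, and, like the documented signature, it applies a single `ascending` flag to every key.

```typescript
// Hypothetical sketch of multi-column sort semantics; not deepbox code.
type Cell = number | string | boolean;
type Row = Record<string, Cell>;

// Numbers compare numerically; anything else compares as a string.
function compareCells(a: Cell | undefined, b: Cell | undefined): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  const sa = String(a);
  const sb = String(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

function sortRows(rows: Row[], by: string | string[], ascending = true): Row[] {
  const keys = Array.isArray(by) ? by : [by];
  const sign = ascending ? 1 : -1;
  // Sort a copy so the input is left untouched.
  return [...rows].sort((r1, r2) => {
    for (const key of keys) {
      const c = compareCells(r1[key], r2[key]);
      if (c !== 0) return sign * c; // first differing key decides
    }
    return 0; // tie on every key
  });
}

const people: Row[] = [
  { name: "Alice", salary: 50000, department: "IT" },
  { name: "Bob", salary: 60000, department: "HR" },
  { name: "Charlie", salary: 75000, department: "IT" },
  { name: "David", salary: 55000, department: "HR" },
];
console.log(sortRows(people, "salary", false).map((r) => r["name"]));
// Charlie, Bob, David, Alice
console.log(sortRows(people, ["department", "salary"]).map((r) => r["name"]));
// David, Bob (HR), then Alice, Charlie (IT)
```

Because `Array.prototype.sort` is stable in ES2019 and later, rows that tie on every key keep their original relative order in this sketch.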

Function	Description	Example
.rename(mapper: Record<string, string> \| ((name: string) => string), axis?: 0 \| 1)	Rename columns (axis=1, default) or index labels (axis=0)	`df.rename({ name: 'full_name' })`
.apply(fn: (series: Series) => Series, axis?: 0 \| 1)	Apply function to each column (axis=0) or row (axis=1)	`df.apply(s => s.map(x => Number(x) * 2))`
.fillna(value: unknown)	Replace null/undefined/NaN values	`df.fillna(0)`
.dropna()	Remove rows containing any null/undefined/NaN	`df.dropna()`
.replace(toReplace: unknown \| unknown[], value: unknown)	Replace all occurrences of toReplace with value across all columns	`df.replace('NA', 'unknown')`
.clip(lower?: number, upper?: number)	Clip numeric values to [lower, upper] range	`df.clip(0, 100)`
.isnull()	Boolean DataFrame: true where values are null/undefined/NaN	`df.isnull()`
.duplicated(subset?: string[], keep?: 'first' \| 'last' \| false)	Boolean Series marking duplicate rows	`df.duplicated(['name'])`
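The `keep` argument of `duplicated` is the least obvious entry in this table. The sketch below assumes it means what it does in pandas: `'first'` flags every occurrence except the first, `'last'` every occurrence except the last, and `false` flags all rows in a duplicate group. `markDuplicates` is a hypothetical illustration operating on row objects, not the library's code; rows are compared by JSON-encoding the `subset` columns (all columns when `subset` is omitted).

```typescript
// Hypothetical sketch of duplicated(subset, keep), assuming pandas-style
// semantics for keep; not deepbox code.
type Cell = number | string | boolean | null;
type Row = Record<string, Cell>;

function markDuplicates(
  rows: Row[],
  subset?: string[],
  keep: "first" | "last" | false = "first"
): boolean[] {
  const firstRow = rows[0];
  const cols = subset ?? (firstRow ? Object.keys(firstRow) : []);
  // Encode the compared columns as a string so equal rows share a key.
  const keys = rows.map((r) => JSON.stringify(cols.map((c) => r[c] ?? null)));

  const totals = new Map<string, number>();
  for (const k of keys) totals.set(k, (totals.get(k) ?? 0) + 1);

  const seen = new Map<string, number>();
  return keys.map((k) => {
    const total = totals.get(k) ?? 0;
    const nth = (seen.get(k) ?? 0) + 1; // 1-based occurrence number
    seen.set(k, nth);
    if (total === 1) return false; // unique rows are never flagged
    if (keep === "first") return nth > 1;
    if (keep === "last") return nth < total;
    return true; // keep === false: flag the whole duplicate group
  });
}

const rows: Row[] = [
  { name: "Alice", dept: "IT" },
  { name: "Bob", dept: "HR" },
  { name: "Alice", dept: "HR" },
  { name: "Alice", dept: "IT" },
];
console.log(markDuplicates(rows));                   // [false, false, false, true]
console.log(markDuplicates(rows, ["name"]));         // [false, false, true, true]
console.log(markDuplicates(rows, ["name"], "last")); // [true, false, true, false]
console.log(markDuplicates(rows, ["name"], false));  // [true, false, true, true]
```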

Function	Description	Example
.describe()	Summary statistics (count, mean, std, min, 25%, 50%, 75%, max) for all numeric columns → DataFrame	`df.describe()`
.quantile(q: number)	Quantile value for each numeric column → Series	`df.quantile(0.5)`
.cov()	Pairwise covariance matrix of numeric columns → DataFrame	`df.cov()`
.corr()	Pairwise Pearson correlation matrix of all numeric columns → DataFrame	`df.corr()`
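The reference does not say how `quantile` (or the 25%, 50%, and 75% rows of `describe`) interpolates between observations. A common choice, and the default in NumPy and pandas, is linear interpolation between the two nearest ranks. The hypothetical `quantile` function below sketches that method; whether deepbox uses the same one is an assumption.

```typescript
// Hypothetical sketch: quantile by linear interpolation between the two
// nearest ranks (NumPy/pandas default). deepbox's exact method is assumed.
function quantile(values: number[], q: number): number {
  if (q < 0 || q > 1) throw new RangeError("q must be in [0, 1]");
  // filter() returns a copy, so sorting does not mutate the caller's array.
  const sorted = values.filter((v) => !Number.isNaN(v)).sort((a, b) => a - b);
  if (sorted.length === 0) return NaN;
  const pos = q * (sorted.length - 1); // fractional rank
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

const ages = [25, 30, 35, 28, 22]; // sorted: 22, 25, 28, 30, 35
console.log(quantile(ages, 0.25), quantile(ages, 0.5), quantile(ages, 0.75)); // 25 28 30
console.log(quantile([1, 2, 3, 4], 0.5)); // 2.5 (halfway between 2 and 3)
```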

Function	Description	Example
.groupBy(col: string)	Group rows by distinct values of a column → GroupedDataFrame	`df.groupBy('department')`
.groupBy(col).agg(spec)	Aggregate each group. spec: { column: 'mean'\|'sum'\|'count'\|'min'\|'max'\|'std' } → DataFrame	`df.groupBy('dept').agg({ salary: 'mean' })`
.groupBy(col).apply(fn)	Apply a function to each group's DataFrame → combined DataFrame	`df.groupBy('dept').apply(g => g.head(1))`
.groupBy(col).sum() / .mean() / .min() / .max() / .std() / .count()	Shorthand aggregation methods on grouped data → DataFrame	`df.groupBy('department').mean()`
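Grouped aggregation follows the split-apply-combine pattern: partition rows by key, reduce each partition, then collect one result per group. The hypothetical `groupMean` function sketches what `groupBy('department').agg({ salary: 'mean' })` computes for the salary data in the `dataframe.ts` example, using running sums and counts. It emits groups in order of first appearance; the order deepbox uses is not specified in this reference.

```typescript
// Hypothetical split-apply-combine sketch of
// groupBy('department').agg({ salary: 'mean' }); not deepbox code.
function groupMean(keys: string[], values: number[]): Map<string, number> {
  if (keys.length !== values.length) throw new Error("length mismatch");
  // Split: accumulate a running sum and count per distinct key.
  const acc = new Map<string, { sum: number; count: number }>();
  keys.forEach((k, i) => {
    const entry = acc.get(k) ?? { sum: 0, count: 0 };
    entry.sum += values[i];
    entry.count += 1;
    acc.set(k, entry);
  });
  // Apply + combine: one mean per group, in first-appearance order.
  const means = new Map<string, number>();
  acc.forEach(({ sum, count }, k) => means.set(k, sum / count));
  return means;
}

const department = ["IT", "HR", "IT", "HR", "IT"];
const salary = [50000, 60000, 75000, 55000, 48000];
console.log(groupMean(department, salary));
// IT: 173000 / 3 ≈ 57666.67, HR: 115000 / 2 = 57500
```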

Function	Description	Example
.merge(other, opts)	SQL-style join. opts: { on: string, how: 'inner'\|'left'\|'right'\|'outer' }	`df.merge(other, { on: 'id', how: 'left' })`
.concat(other: DataFrame)	Vertically stack two DataFrames (must have same columns)	`df.concat(df2)`
.join(other, on: string)	Join on a matching column (shorthand for inner merge)	`df.join(lookup, 'dept_id')`
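A left merge can be sketched as a hash join: index the right table by key, then probe it once per left row. The hypothetical `leftJoin` function below works on row objects rather than DataFrames. Left rows without a match are padded with `null` for the right-hand columns, a left row with several matches is repeated once per match, and `null` keys never match, as in SQL. Column-name clashes are not handled (right-hand values overwrite left-hand ones), whereas the library may rename or reject them.

```typescript
// Hypothetical hash-join sketch of merge(other, { on, how: 'left' }) on row
// objects; not deepbox code. Name clashes are not handled: right values win.
type Cell = number | string | boolean | null;
type Row = Record<string, Cell>;

function leftJoin(left: Row[], right: Row[], on: string): Row[] {
  // Build phase: bucket right rows by join key (null keys never match).
  const byKey = new Map<Cell, Row[]>();
  for (const r of right) {
    const k = r[on];
    if (k === undefined || k === null) continue;
    const bucket = byKey.get(k);
    if (bucket) bucket.push(r);
    else byKey.set(k, [r]);
  }
  const firstRight = right[0];
  const rightCols = firstRight
    ? Object.keys(firstRight).filter((c) => c !== on)
    : [];

  // Probe phase: one output row per match, or a null-padded row if none.
  const out: Row[] = [];
  for (const l of left) {
    const k = l[on];
    const matches = k === undefined || k === null ? undefined : byKey.get(k);
    if (matches) {
      for (const m of matches) out.push({ ...l, ...m });
    } else {
      const padded: Row = { ...l };
      for (const c of rightCols) padded[c] = null;
      out.push(padded);
    }
  }
  return out;
}

const employees: Row[] = [
  { id: 1, name: "Alice", dept_id: 10 },
  { id: 2, name: "Bob", dept_id: 20 },
  { id: 3, name: "Eve", dept_id: 99 },
];
const depts: Row[] = [
  { dept_id: 10, dept: "IT" },
  { dept_id: 20, dept: "HR" },
];
console.log(leftJoin(employees, depts, "dept_id"));
// Alice → IT, Bob → HR, Eve → dept: null (no dept_id 99 on the right)
```

Skipping the null-padding branch turns this into an inner join, which is the behaviour the table attributes to the `.join` shorthand.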

Function	Description	Example
.toCsvString(opts?)	Export as a CSV string with header row → string	`fs.writeFileSync('out.csv', df.toCsvString())`
.toCsv(path: string, opts?)	Write DataFrame to a CSV file (async, Node.js)	`await df.toCsv('output.csv')`
.toArray()	Convert to array of row objects → Array<Record<string, any>>	`df.toArray().forEach(row => ...)`
.toTensor()	Convert all numeric columns to a 2D Tensor → Tensor [nRows, nCols]	`df.select(['age', 'salary']).toTensor()`
.toString()	Pretty-printed table string for console output	`console.log(df.toString())`
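Converting to a Tensor means turning separate column arrays into a single row-major buffer. The hypothetical `columnsToRowMajor` function sketches that step with a plain `Float64Array` standing in for a Tensor; deepbox's actual Tensor type and memory layout are assumptions here. The sketch throws on non-numeric values, which is one reason to select numeric columns before calling `toTensor()`, as the table's example does.

```typescript
// Hypothetical sketch of the column-to-row-major step behind toTensor();
// a Float64Array stands in for deepbox's Tensor, whose layout is assumed.
interface DenseMatrix {
  data: Float64Array;
  shape: [number, number];
}

function columnsToRowMajor(columns: Record<string, unknown[]>): DenseMatrix {
  const entries = Object.entries(columns);
  const nCols = entries.length;
  const nRows = entries[0]?.[1].length ?? 0;
  const data = new Float64Array(nRows * nCols);
  entries.forEach(([colName, values], j) => {
    if (values.length !== nRows) {
      throw new Error(`Column "${colName}" has ${values.length} rows, expected ${nRows}`);
    }
    values.forEach((v, i) => {
      if (typeof v !== "number") {
        throw new TypeError(`Non-numeric value in column "${colName}"`);
      }
      data[i * nCols + j] = v; // element (i, j) lives at i * nCols + j
    });
  });
  return { data, shape: [nRows, nCols] };
}

const matrix = columnsToRowMajor({
  age: [25, 30, 35],
  salary: [50000, 60000, 75000],
});
console.log(matrix.shape);            // [3, 2]
console.log(Array.from(matrix.data)); // [25, 50000, 30, 60000, 35, 75000]
```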

dataframe.ts

```typescript
import { DataFrame } from "deepbox/dataframe";

const df = new DataFrame({
  name: ["Alice", "Bob", "Charlie", "David", "Eve"],
  age: [25, 30, 35, 28, 22],
  salary: [50000, 60000, 75000, 55000, 48000],
  department: ["IT", "HR", "IT", "HR", "IT"],
});

console.log(df.shape);    // [5, 4]
console.log(df.columns);  // ['name', 'age', 'salary', 'department']

// ── Selection & filtering ──
df.select(["name", "salary"]);      // 2-column DataFrame
df.filter((row) => row.age > 25);   // 3 rows matching
df.head(3);                         // First 3 rows

// ── Sorting ──
df.sort("salary", false);           // Highest salary first
df.sort(["department", "salary"]);  // Multi-column sort

// ── GroupBy aggregation ──
const byDept = df.groupBy("department").agg({
  salary: "mean",
  age: "max",
});
console.log(byDept.toString());

// ── Statistics ──
console.log(df.describe().toString()); // count, mean, std, min, 25%, 50%, 75%, max
df.corr();                             // Correlation matrix of numeric columns

// ── I/O ──
const csv = df.toCsvString();
const restored = DataFrame.fromCsvString(csv);
const t = df.select(["age", "salary"]).toTensor(); // shape: [5, 2]
```
