deepbox/dataframe
DataFrame
A tabular data structure with labeled columns. Each column stores values of a single type (numbers, strings, or booleans), and all columns share the same row index. It supports chainable operations: selection, filtering, sorting, grouping, aggregation, joining, CSV I/O, and conversion to Tensors for numerical computing. DataFrames are immutable: every transforming operation returns a new DataFrame and leaves the original unchanged.
new DataFrame
new DataFrame(data: Record<string, Array<number | string | boolean>>)
Create a DataFrame from a column-oriented object. Each key becomes a column name, and each value is the array of data for that column. All arrays must have the same length. Column order follows the object's key insertion order.
Parameters:
data: Record<string, Array<number | string | boolean>> - Column name → values mapping. All arrays must be the same length.
DataFrame.fromCsvString
DataFrame.fromCsvString(csvString: string, opts?: { delimiter?: string; header?: boolean }): DataFrame
Parse a CSV string into a DataFrame. Auto-detects numeric columns and converts them from strings. If header is false, columns are named 'col_0', 'col_1', etc.
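
The parsing rules described above (numeric auto-detection and generated `col_N` names) can be sketched in plain TypeScript. This is only an illustration and not deepbox's implementation: `parseCsvColumns` is a hypothetical helper, the defaults for `delimiter` and `header` are assumptions, and quoted fields are not handled.

```typescript
type Cell = number | string;

// Hypothetical helper for illustration; not part of deepbox.
// Handles simple CSV only: no quoted fields, no escaped delimiters.
function parseCsvColumns(
  csv: string,
  opts: { delimiter?: string; header?: boolean } = {}
): Record<string, Cell[]> {
  const delimiter = opts.delimiter ?? ","; // assumed default
  const header = opts.header ?? true; // assumed default
  const rows = csv
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => line.split(delimiter));
  const first = rows[0];
  if (first === undefined) return {};

  // Without a header row, generate names col_0, col_1, ...
  const names = header ? first : first.map((_, i) => `col_${i}`);
  const body = header ? rows.slice(1) : rows;

  const columns: Record<string, Cell[]> = {};
  names.forEach((name, i) => {
    const raw = body.map((row) => row[i] ?? "");
    // Treat a column as numeric only if every cell parses to a finite number.
    const numeric =
      raw.length > 0 &&
      raw.every((v) => v.trim() !== "" && Number.isFinite(Number(v)));
    columns[name] = numeric ? raw.map(Number) : raw;
  });
  return columns;
}

const withHeader = parseCsvColumns("name,age\nAlice,25\nBob,30");
// { name: ["Alice", "Bob"], age: [25, 30] }
const noHeader = parseCsvColumns("1;x\n2;y", { delimiter: ";", header: false });
// { col_0: [1, 2], col_1: ["x", "y"] }
```

Checking every cell before converting keeps a column such as `["1", "n/a"]` as strings instead of producing a mix of numbers and `NaN`.
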
Properties
- .columns: string[] — Ordered array of column names
- .shape: [number, number] — [nRows, nColumns] tuple
- .index: (string | number)[] — Row labels (defaults to 0, 1, 2, ...)
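
To make the constructor rule and the three properties concrete, here is a minimal column-oriented container in plain TypeScript. `MiniFrame` is a hypothetical stand-in written for this page; it only mirrors the documented behavior (equal-length columns, `[nRows, nColumns]` shape, default integer index) and says nothing about how deepbox stores data internally.

```typescript
type Column = Array<number | string | boolean>;

// Hypothetical stand-in for illustration only; not deepbox's DataFrame class.
class MiniFrame {
  readonly columns: string[];
  readonly shape: [number, number];
  readonly index: number[];

  constructor(data: Record<string, Column>) {
    // Object.keys keeps insertion order for non-numeric keys.
    this.columns = Object.keys(data);
    const lengths = this.columns.map((name) => data[name].length);
    const nRows = lengths.length > 0 ? lengths[0] : 0;
    if (lengths.some((n) => n !== nRows)) {
      throw new Error("All columns must have the same length");
    }
    this.shape = [nRows, this.columns.length];
    // Default row labels: 0, 1, 2, ...
    this.index = Array.from({ length: nRows }, (_, i) => i);
  }
}

const frame = new MiniFrame({ name: ["Alice", "Bob"], age: [25, 30] });
console.log(frame.columns); // ["name", "age"]
console.log(frame.shape); // [2, 2]
console.log(frame.index); // [0, 1]
```
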
| Function | Description | Example |
|---|---|---|
| .select(columns: string[]) | Select columns by name → new DataFrame with only those columns | df.select(['name', 'salary']) |
| .drop(columns: string[]) | Remove columns by name → new DataFrame without those columns | df.drop(['id']) |
| .filter(fn: (row) => boolean) | Keep rows where predicate returns true | df.filter(r => r.age > 25) |
| .head(n?: number) | First n rows (default: 5) | df.head(10) |
| .tail(n?: number) | Last n rows (default: 5) | df.tail(3) |
| .iloc(index: number) | Single row by integer position → row object | df.iloc(0) |
| .loc(row: string \| number) | Single row by index label → row object | df.loc('row1') |
| .slice(start: number, end: number) | Row range [start, end) | df.slice(10, 20) |
| Function | Description | Example |
|---|---|---|
| .sort(by: string \| string[], ascending?: boolean) | Sort by one or more columns. ascending defaults to true. | df.sort('salary', false) |
| .sample(n: number, random_state?: number) | Random sample of n rows without replacement | df.sample(50) |
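
A multi-column sort compares rows on the first column and uses later columns only to break ties. The sketch below shows that comparison on plain row objects. `sortRows` is a hypothetical helper, not deepbox code, and how deepbox orders ties or mixed-type columns is not stated in the table above.

```typescript
type Row = Record<string, number | string | boolean>;

// Hypothetical helper for illustration; returns a sorted copy of the rows.
function sortRows(rows: Row[], by: string | string[], ascending = true): Row[] {
  const keys = Array.isArray(by) ? by : [by];
  const direction = ascending ? 1 : -1;
  // Copy first so the input array is left untouched.
  return rows.slice().sort((a, b) => {
    for (const key of keys) {
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
      // Equal on this key: fall through to the next one.
    }
    return 0;
  });
}

const staff: Row[] = [
  { name: "Alice", salary: 50000, department: "IT" },
  { name: "Bob", salary: 60000, department: "HR" },
  { name: "Charlie", salary: 75000, department: "IT" },
  { name: "David", salary: 55000, department: "HR" },
  { name: "Eve", salary: 48000, department: "IT" },
];

const bySalaryDesc = sortRows(staff, "salary", false).map((r) => r.name);
// ["Charlie", "Bob", "David", "Alice", "Eve"]
const byDeptThenSalary = sortRows(staff, ["department", "salary"]).map((r) => r.name);
// ["David", "Bob", "Eve", "Alice", "Charlie"]
```
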
| Function | Description | Example |
|---|---|---|
| .rename(mapper: Record<string, string> \| ((name: string) => string), axis?: 0 \| 1) | Rename columns (axis=1, default) or index labels (axis=0) | df.rename({ name: 'full_name' }) |
| .apply(fn: (series: Series) => Series, axis?: 0 \| 1) | Apply function to each column (axis=0) or row (axis=1) | df.apply(s => s.map(x => Number(x) * 2)) |
| .fillna(value: unknown) | Replace null/undefined/NaN values | df.fillna(0) |
| .dropna() | Remove rows containing any null/undefined/NaN | df.dropna() |
| .replace(toReplace: unknown \| unknown[], value: unknown) | Replace all occurrences of toReplace with value across all columns | df.replace('NA', 'unknown') |
| .clip(lower?: number, upper?: number) | Clip numeric values to [lower, upper] range | df.clip(0, 100) |
| .isnull() | Boolean DataFrame: true where values are null/undefined/NaN | df.isnull() |
| .duplicated(subset?: string[], keep?: 'first' \| 'last' \| false) | Boolean Series marking duplicate rows | df.duplicated(['name']) |
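
The missing-value methods in this table all rely on the same test: a value counts as missing when it is `null`, `undefined`, or `NaN`. The following sketch applies that test to plain row objects. The helper names are hypothetical and the code is an illustration of the semantics, not deepbox's implementation.

```typescript
type Value = number | string | boolean | null | undefined;
type Row = Record<string, Value>;

// A value is missing when it is null, undefined, or NaN.
function isMissing(v: Value): boolean {
  return v === null || v === undefined || (typeof v === "number" && Number.isNaN(v));
}

// Hypothetical counterpart of fillna: replace every missing cell with `value`.
function fillMissing(rows: Row[], value: Value): Row[] {
  return rows.map((row) => {
    const out: Row = {};
    for (const key of Object.keys(row)) {
      out[key] = isMissing(row[key]) ? value : row[key];
    }
    return out;
  });
}

// Hypothetical counterpart of dropna: keep only rows with no missing cell.
function dropMissing(rows: Row[]): Row[] {
  return rows.filter((row) => Object.keys(row).every((k) => !isMissing(row[k])));
}

const raw: Row[] = [
  { score: 1, tag: "x" },
  { score: NaN, tag: null },
  { score: 3, tag: undefined },
];

const filled = fillMissing(raw, 0); // second row becomes { score: 0, tag: 0 }
const complete = dropMissing(raw); // only the first row remains
```

Both helpers build new arrays and new row objects, matching the immutable style described at the top of this page.
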
| Function | Description | Example |
|---|---|---|
| .describe() | Summary statistics (count, mean, std, min, 25%, 50%, 75%, max) for all numeric columns → DataFrame | df.describe() |
| .quantile(q: number) | Quantile value for each numeric column → Series | df.quantile(0.5) |
| .cov() | Pairwise covariance matrix of numeric columns → DataFrame | df.cov() |
| .corr() | Pairwise Pearson correlation matrix of all numeric columns → DataFrame | df.corr() |
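
For reference, the two statistics behind `.quantile()` and `.corr()` can be computed for a single column (or pair of columns) as follows. This is a sketch under assumptions: the quantile uses linear interpolation between neighboring sorted values, which is a common convention but is not confirmed for deepbox, and both function names are hypothetical.

```typescript
// Quantile with linear interpolation between the two nearest sorted values
// (assumed convention; deepbox's interpolation rule is not documented here).
function quantile(values: number[], q: number): number {
  if (values.length === 0 || q < 0 || q > 1) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Pearson correlation of two equally long numeric arrays.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

const ages = [25, 30, 35, 28, 22];
console.log(quantile(ages, 0.5)); // 28
console.log(quantile([1, 2, 3, 4], 0.5)); // 2.5
console.log(pearson([1, 2, 3], [2, 4, 6])); // 1
```
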
| Function | Description | Example |
|---|---|---|
| .groupBy(col: string) | Group rows by distinct values of a column → GroupedDataFrame | df.groupBy('department') |
| .groupBy(col).agg(spec) | Aggregate each group. spec: { column: 'mean'\|'sum'\|'count'\|'min'\|'max'\|'std' } → DataFrame | df.groupBy('dept').agg({ salary: 'mean' }) |
| .groupBy(col).apply(fn) | Apply a function to each group's DataFrame → combined DataFrame | df.groupBy('dept').apply(g => g.head(1)) |
| .groupBy(col).sum() / .mean() / .min() / .max() / .std() / .count() | Shorthand aggregation methods on grouped data → DataFrame | df.groupBy('department').mean() |
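
Group-by aggregation works in two steps: split the rows into buckets by key, then reduce each requested column within every bucket. The sketch below does this on plain row objects. `groupAgg` is a hypothetical helper, `'std'` is left out for brevity, and the output order (first appearance of each key) is an assumption rather than documented deepbox behavior.

```typescript
type Row = Record<string, number | string | boolean>;
type AggName = "mean" | "sum" | "count" | "min" | "max";

const aggregators: Record<AggName, (xs: number[]) => number> = {
  sum: (xs) => xs.reduce((s, v) => s + v, 0),
  mean: (xs) => xs.reduce((s, v) => s + v, 0) / xs.length,
  count: (xs) => xs.length,
  min: (xs) => Math.min(...xs),
  max: (xs) => Math.max(...xs),
};

// Hypothetical helper for illustration; not deepbox's groupBy implementation.
function groupAgg(rows: Row[], by: string, spec: Record<string, AggName>): Row[] {
  // Step 1: bucket rows by the value of the grouping column.
  const groups = new Map<number | string | boolean, Row[]>();
  for (const row of rows) {
    const bucket = groups.get(row[by]);
    if (bucket) bucket.push(row);
    else groups.set(row[by], [row]);
  }
  // Step 2: reduce each requested column inside every bucket.
  const out: Row[] = [];
  groups.forEach((members, key) => {
    const result: Row = { [by]: key };
    for (const col of Object.keys(spec)) {
      const values = members.map((r) => Number(r[col]));
      result[col] = aggregators[spec[col]](values);
    }
    out.push(result);
  });
  return out;
}

const employees: Row[] = [
  { name: "Alice", age: 25, salary: 50000, department: "IT" },
  { name: "Bob", age: 30, salary: 60000, department: "HR" },
  { name: "Charlie", age: 35, salary: 75000, department: "IT" },
  { name: "David", age: 28, salary: 55000, department: "HR" },
  { name: "Eve", age: 22, salary: 48000, department: "IT" },
];

const summary = groupAgg(employees, "department", { salary: "mean", age: "max" });
// One row per department: IT (max age 35) and HR (mean salary 57500, max age 30)
```
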
| Function | Description | Example |
|---|---|---|
| .merge(other, opts) | SQL-style join. opts: { on: string, how: 'inner'\|'left'\|'right'\|'outer' } | df.merge(other, { on: 'id', how: 'left' }) |
| .concat(other: DataFrame) | Vertically stack two DataFrames (must have same columns) | df.concat(df2) |
| .join(other, on: string) | Join on a matching column (shorthand for inner merge) | df.join(lookup, 'dept_id') |
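
The difference between an inner and a left join is easiest to see on a small example. The sketch below implements both on plain row objects with a hash index on the join column. `mergeRows` is a hypothetical helper, filling unmatched columns with `null` is an assumption, and `'right'` and `'outer'` are omitted because they follow the same pattern with the sides swapped or combined.

```typescript
type Value = number | string | boolean | null;
type Row = Record<string, Value>;

// Hypothetical helper for illustration; not deepbox's merge implementation.
function mergeRows(
  left: Row[],
  right: Row[],
  on: string,
  how: "inner" | "left" = "inner"
): Row[] {
  // Index the right side by key so each left row finds its matches directly.
  const index = new Map<Value, Row[]>();
  for (const r of right) {
    const bucket = index.get(r[on]);
    if (bucket) bucket.push(r);
    else index.set(r[on], [r]);
  }
  const rightCols = right.length > 0 ? Object.keys(right[0]).filter((c) => c !== on) : [];

  const out: Row[] = [];
  for (const l of left) {
    const matches = index.get(l[on]);
    if (matches) {
      // One output row per matching pair; right values win on name clashes.
      for (const m of matches) out.push({ ...l, ...m });
    } else if (how === "left") {
      // Keep the unmatched left row and fill the right-hand columns with null.
      const filler: Row = {};
      for (const c of rightCols) filler[c] = null;
      out.push({ ...l, ...filler });
    }
  }
  return out;
}

const people: Row[] = [
  { id: 1, name: "Alice" },
  { id: 2, name: "Bob" },
  { id: 3, name: "Eve" },
];
const depts: Row[] = [
  { id: 1, dept: "IT" },
  { id: 2, dept: "HR" },
];

const inner = mergeRows(people, depts, "id"); // 2 rows: Alice and Bob
const leftJoin = mergeRows(people, depts, "id", "left"); // 3 rows; Eve has dept: null
```
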
| Function | Description | Example |
|---|---|---|
| .toCsvString(opts?) | Export as a CSV string with header row → string | fs.writeFileSync('out.csv', df.toCsvString()) |
| .toCsv(path: string, opts?) | Write DataFrame to a CSV file (async, Node.js) | await df.toCsv('output.csv') |
| .toArray() | Convert to array of row objects → Array<Record<string, any>> | df.toArray().forEach(row => ...) |
| .toTensor() | Convert all numeric columns to a 2D Tensor → Tensor [nRows, nCols] | df.select(['age', 'salary']).toTensor() |
| .toString() | Pretty-printed table string for console output | console.log(df.toString()) |
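
The conversions in this table are all views of the same column data: row objects, CSV text, or a numeric matrix. A minimal sketch of each, using plain objects instead of a DataFrame, is shown below. The helper names are hypothetical, the CSV writer does no quoting or escaping, and the nested-array matrix stands in for a Tensor.

```typescript
type Cell = number | string | boolean;
type Columns = Record<string, Cell[]>;

// Column-oriented data to an array of row objects (compare .toArray()).
function toRows(columns: Columns): Array<Record<string, Cell>> {
  const names = Object.keys(columns);
  const nRows = names.length > 0 ? columns[names[0]].length : 0;
  const rows: Array<Record<string, Cell>> = [];
  for (let i = 0; i < nRows; i++) {
    const row: Record<string, Cell> = {};
    for (const name of names) row[name] = columns[name][i];
    rows.push(row);
  }
  return rows;
}

// Header row plus one line per row (compare .toCsvString()); no quoting.
function toCsvText(columns: Columns, delimiter = ","): string {
  const names = Object.keys(columns);
  const lines = [names.join(delimiter)];
  for (const row of toRows(columns)) {
    lines.push(names.map((n) => String(row[n])).join(delimiter));
  }
  return lines.join("\n");
}

// Numeric columns only, as an [nRows][nCols] nested array (compare .toTensor()).
function toMatrix(columns: Columns): number[][] {
  const numericNames = Object.keys(columns).filter((n) =>
    columns[n].every((v) => typeof v === "number")
  );
  return toRows(columns).map((row) => numericNames.map((n) => row[n] as number));
}

const table: Columns = {
  name: ["Alice", "Bob"],
  age: [25, 30],
  salary: [50000, 60000],
};

console.log(toCsvText(table)); // "name,age,salary\nAlice,25,50000\nBob,30,60000"
console.log(toMatrix(table)); // [[25, 50000], [30, 60000]]
```
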
dataframe.ts
```typescript
import { DataFrame } from "deepbox/dataframe";

const df = new DataFrame({
  name: ["Alice", "Bob", "Charlie", "David", "Eve"],
  age: [25, 30, 35, 28, 22],
  salary: [50000, 60000, 75000, 55000, 48000],
  department: ["IT", "HR", "IT", "HR", "IT"],
});

console.log(df.shape); // [5, 4]
console.log(df.columns); // ['name', 'age', 'salary', 'department']

// ── Selection & filtering ──
df.select(["name", "salary"]); // 2-column DataFrame
df.filter((row) => row.age > 25); // 3 rows matching
df.head(3); // First 3 rows

// ── Sorting ──
df.sort("salary", false); // Highest salary first
df.sort(["department", "salary"]); // Multi-column sort

// ── GroupBy aggregation ──
const byDept = df.groupBy("department").agg({
  salary: "mean",
  age: "max",
});
console.log(byDept.toString());

// ── Statistics ──
console.log(df.describe().toString()); // count, mean, std, min, 25%, 50%, 75%, max
df.corr(); // Correlation matrix of numeric columns

// ── I/O ──
const csv = df.toCsvString();
const restored = DataFrame.fromCsvString(csv);
const t = df.select(["age", "salary"]).toTensor(); // shape: [5, 2]
```