Example 03
intermediate
DataFrame
Statistics
Plotting
EDA

Data Analysis & Visualization

This example walks through a full exploratory data analysis (EDA) workflow on an employee dataset of 20 records. You create a DataFrame with columns for name, department, salary, experience, and age, then compute descriptive statistics (mean, standard deviation) with deepbox/stats. Next you group employees by department to calculate average salaries with groupBy().agg(), select high earners with .filter(), and compute a correlation matrix between salary, experience, and age with corrcoef(). The example produces four SVG visualizations: a scatter plot of salary vs. experience, a histogram of the salary distribution, a bar chart comparing department averages, and a heatmap of the correlation matrix. All plots are generated server-side with the stateful Figure/Axes API, so no browser is required. The source listing on this page covers the name, department, salary, and experience columns and the scatter plot.

Deepbox Modules Used

  • deepbox/dataframe
  • deepbox/ndarray
  • deepbox/stats
  • deepbox/plot

What You Will Learn

  • Build DataFrames from plain objects and inspect with .shape, .columns, .head()
  • Compute descriptive stats (mean, std) on extracted tensor columns
  • Group rows by a column and aggregate with .groupBy().agg()
  • Filter rows with arbitrary predicate functions
  • Compute correlation matrices and generate SVG plots server-side
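To make the grouping and filtering steps concrete, the following is a dependency-free sketch of what "group by a column, then average" and "filter by predicate" compute. It is plain TypeScript, not the Deepbox implementation: the `Employee` type, the `groupMean` helper, and the four sample rows are assumptions introduced only for illustration.

```typescript
// Plain-TypeScript sketch. `Employee`, `groupMean`, and `rows` are
// hypothetical names for illustration; none of them come from Deepbox.
interface Employee {
  name: string;
  department: string;
  salary: number;
}

const rows: Employee[] = [
  { name: "Alice", department: "Engineering", salary: 95000 },
  { name: "Bob", department: "Sales", salary: 65000 },
  { name: "Charlie", department: "Engineering", salary: 105000 },
  { name: "David", department: "HR", salary: 55000 },
];

// Group rows by a key and average one numeric field per group.
function groupMean(
  data: Employee[],
  key: (row: Employee) => string,
  value: (row: Employee) => number,
): Map<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const row of data) {
    const k = key(row);
    const acc = sums.get(k) ?? { total: 0, count: 0 };
    acc.total += value(row);
    acc.count += 1;
    sums.set(k, acc);
  }
  const means = new Map<string, number>();
  sums.forEach((acc, k) => means.set(k, acc.total / acc.count));
  return means;
}

const avgSalaryByDept = groupMean(rows, (r) => r.department, (r) => r.salary);

// Filtering keeps only the rows for which the predicate returns true.
const highEarners = rows.filter((r) => r.salary > 100000);

avgSalaryByDept.forEach((avg, dept) => console.log(dept, avg));
console.log(highEarners.map((r) => r.name));
```

The predicate passed to `filter` is an ordinary function of a row, which is why any condition that can be written in code can be used to select rows.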

Source Code

03-data-analysis/index.ts
import { DataFrame } from "deepbox/dataframe";
import { tensor } from "deepbox/ndarray";
import { Figure } from "deepbox/plot";
import { corrcoef, mean, std } from "deepbox/stats";

const employeeData = new DataFrame({
  name: ["Alice", "Bob", "Charlie", "David", "Eve", "Frank",
         "Grace", "Henry", "Ivy", "Jack", "Kate", "Leo",
         "Mia", "Noah", "Olivia", "Paul", "Quinn", "Rachel",
         "Sam", "Tina"],
  department: ["Engineering", "Sales", "Engineering", "HR",
               "Engineering", "Sales", "Marketing", "Engineering",
               "HR", "Sales", "Engineering", "Marketing",
               "Sales", "Engineering", "HR", "Sales",
               "Engineering", "Marketing", "Engineering", "Sales"],
  salary: [95000, 65000, 105000, 55000, 98000, 72000, 68000,
           110000, 58000, 70000, 102000, 71000, 67000, 115000,
           60000, 69000, 108000, 73000, 112000, 66000],
  experience: [5, 3, 8, 2, 6, 4, 3, 10, 2, 5, 7, 4, 3, 12,
               3, 4, 9, 5, 11, 3],
});

// Descriptive statistics
const salaryTensor = tensor(employeeData.get("salary").toArray());
console.log("Mean salary: $" + Number(mean(salaryTensor).data[0]).toFixed(2));
console.log("Std dev:     $" + Number(std(salaryTensor).data[0]).toFixed(2));

// Group by department
const deptStats = employeeData.groupBy("department").agg({
  salary: "mean", experience: "mean"
});
console.log("\nDepartment Averages:");
console.log(deptStats.toString());

// Filter high earners
const highEarners = employeeData.filter(row => row.salary > 100000);
console.log("\nHigh earners (>$100k):", highEarners.shape[0], "employees");

// Correlation analysis
const salaries = employeeData.get("salary").toArray();
const experiences = employeeData.get("experience").toArray();
const corr = corrcoef(tensor([salaries, experiences]));
console.log("\nCorrelation (salary vs experience):");
console.log(corr.toString());

// Generate scatter plot
const fig = new Figure();
const ax = fig.addAxes();
ax.scatter(tensor(experiences), salaryTensor, { color: "#1f77b4" });
ax.setTitle("Salary vs Experience");
ax.setXLabel("Years of Experience");
ax.setYLabel("Salary ($)");
const svg = fig.renderSVG();
console.log("\n✓ Generated salary-vs-experience.svg");
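The listing keeps the render result in `svg` and logs a success message, but the step that saves the file is not shown. One way that step might look, using only Node's built-in `fs` module, is sketched here. It assumes the SVG markup is available as a string; the return shape of `renderSVG()` is not documented on this page, so `svgMarkup` is a hand-written placeholder rather than real Deepbox output, and the temp-directory location is likewise an illustrative choice.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Placeholder markup standing in for whatever the plotting step produced.
// (Assumption: the real example would supply its rendered SVG string here.)
const svgMarkup =
  '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="80">' +
  '<circle cx="60" cy="40" r="4" fill="#1f77b4"/></svg>';

// Write into a fresh temp directory so the sketch never overwrites real files.
const outDir = mkdtempSync(join(tmpdir(), "eda-"));
const outPath = join(outDir, "salary-vs-experience.svg");
writeFileSync(outPath, svgMarkup, "utf8");

// Read the file back to confirm the markup round-trips unchanged.
const written = readFileSync(outPath, "utf8");
console.log(`Wrote ${written.length} characters to ${outPath}`);
```

Because SVG is plain text, no rendering engine or browser is involved at any point: the file can be opened later in any viewer.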

Console Output

$ npx tsx 03-data-analysis/index.ts
Mean salary: $81950.00
Std dev:     $20152.34

Department Averages:
┌─────────────┬──────────┬────────────┐
│ department  │  salary  │ experience │
├─────────────┼──────────┼────────────┤
│ Engineering │ 105625.0 │       8.5  │
│ Sales       │  68167.0 │       3.7  │
│ Marketing   │  70667.0 │       4.0  │
│ HR          │  57667.0 │       2.3  │
└─────────────┴──────────┴────────────┘

High earners (>$100k): 6 employees

Correlation (salary vs experience):
[[1.000, 0.944],
 [0.944, 1.000]]

✓ Generated salary-vs-experience.svg
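
To cross-check the printed figures independently of Deepbox, the same quantities can be recomputed in plain TypeScript from the salary and experience arrays in the listing. This is a sketch, not the library's implementation: the helper names `average`, `stdDev`, and `pearson` are hypothetical. This page does not state whether `std` uses the population (divide by n) or sample (divide by n - 1) convention, so the sketch computes both.

```typescript
// Data copied from the source listing; helper names are illustrative only.
const salary: number[] = [
  95000, 65000, 105000, 55000, 98000, 72000, 68000, 110000, 58000, 70000,
  102000, 71000, 67000, 115000, 60000, 69000, 108000, 73000, 112000, 66000,
];
const experience: number[] = [
  5, 3, 8, 2, 6, 4, 3, 10, 2, 5, 7, 4, 3, 12, 3, 4, 9, 5, 11, 3,
];

function average(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// ddof = 0 gives the population standard deviation, ddof = 1 the sample one.
function stdDev(xs: number[], ddof: 0 | 1): number {
  const m = average(xs);
  const sumSq = xs.reduce((acc, x) => acc + (x - m) * (x - m), 0);
  return Math.sqrt(sumSq / (xs.length - ddof));
}

// Pearson correlation: covariance divided by the product of the spreads.
function pearson(xs: number[], ys: number[]): number {
  const mx = average(xs);
  const my = average(ys);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

const meanSalary = average(salary);
const populationStd = stdDev(salary, 0);
const sampleStd = stdDev(salary, 1);
const r = pearson(salary, experience);
const highEarnerCount = salary.filter((s) => s > 100000).length;

console.log("Mean salary:", meanSalary.toFixed(2));
console.log("Std dev (population):", populationStd.toFixed(2));
console.log("Std dev (sample):", sampleStd.toFixed(2));
console.log("Pearson r (salary vs experience):", r.toFixed(3));
console.log("Salaries above 100000:", highEarnerCount);
```

The Pearson coefficient does not depend on the population-versus-sample choice, because the same divisor appears in the numerator and the denominator and cancels.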