Project 06

Sentiment Analysis System

This project builds a sentiment analysis system that classifies text as positive or negative. It generates a synthetic corpus of 500 product reviews from curated positive and negative word lists, then extracts features with a bag-of-words / TF-IDF approach: each document becomes a fixed-length vector in which each dimension holds a vocabulary word's count, optionally reweighted by inverse document frequency. Two classifiers are trained and compared: Logistic Regression (a linear decision boundary in feature space) and Gaussian Naive Bayes (which assumes feature independence given the class label). Both models are evaluated with accuracy, precision, recall, F1 score, and a confusion matrix, and the Logistic Regression coefficients are examined to find the words most indicative of each sentiment. A visualization shows the model comparison and per-class metrics. The project demonstrates how Deepbox handles text classification without any NLP-specific libraries, using only tensor operations, classical ML, and standard metrics.
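
The feature extraction step is simple enough to sketch without any library support. The snippet below is a minimal bag-of-words vectorizer in plain TypeScript; the tokenizer, helper names, and toy documents are illustrative assumptions, not the project's actual implementation.

// Minimal bag-of-words sketch (illustrative only, not the project's code).
// Tokenize by lowercasing and splitting on non-letter characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z]+/).filter((t) => t.length > 0);
}

// Build a vocabulary from the corpus (here: every distinct token).
function buildVocab(docs: string[]): string[] {
  const seen = new Set<string>();
  for (const doc of docs) for (const tok of tokenize(doc)) seen.add(tok);
  return [...seen].sort();
}

// Turn one document into a fixed-length count vector over the vocabulary.
function bagOfWords(doc: string, vocab: string[]): number[] {
  const index = new Map(vocab.map((w, i) => [w, i] as const));
  const vec = new Array<number>(vocab.length).fill(0);
  for (const tok of tokenize(doc)) {
    const i = index.get(tok);
    if (i !== undefined) vec[i] += 1;
  }
  return vec;
}

// Usage: two toy documents become rows of a document-term count matrix.
const docs = ["Great product, love it", "Terrible quality, waste of money"];
const vocab = buildVocab(docs);
const X = docs.map((d) => bagOfWords(d, vocab));
console.log(vocab.length, X);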

Features

  • Synthetic review corpus with positive/negative sentiment labels
  • Bag-of-words and TF-IDF feature extraction from text
  • Logistic Regression and Gaussian Naive Bayes classification
  • Full evaluation: accuracy, precision, recall, F1, confusion matrix
  • Feature importance analysis: most predictive words per sentiment
  • SVG model comparison visualization

Deepbox Modules Used

  • deepbox/ml
  • deepbox/preprocess
  • deepbox/metrics
  • deepbox/dataframe
  • deepbox/ndarray
  • deepbox/plot

Project Architecture

  • index.ts — Complete NLP pipeline: text generation → feature extraction → training → evaluation → analysis

Source Code

06-sentiment-analysis/index.ts
import { DataFrame } from "deepbox/dataframe";
import { accuracy, confusionMatrix, f1Score, precision, recall } from "deepbox/metrics";
import { GaussianNB, LogisticRegression } from "deepbox/ml";
import { tensor } from "deepbox/ndarray";
import { Figure } from "deepbox/plot";
import { StandardScaler, trainTestSplit } from "deepbox/preprocess";

console.log("=== Sentiment Analysis ===\n");

// Word lists
const POSITIVE = ["good", "great", "excellent", "amazing", "wonderful",
                  "love", "recommend", "perfect", "outstanding", "superb"];
const NEGATIVE = ["bad", "terrible", "awful", "horrible", "worst",
                  "hate", "disappointing", "poor", "waste", "useless"];

// Generate 500 synthetic reviews
const nSamples = 500;
// ... generate reviews with positive/negative word mixtures ...

// Build vocabulary and extract bag-of-words features
// ... tokenize, build vocab, create feature matrix ...
console.log("Vocabulary size:", 100);
console.log("Feature matrix:", X.shape);

const [X_tr, X_te, y_tr, y_te] = trainTestSplit(X, y, {
  testSize: 0.2, randomState: 42
});

const scaler = new StandardScaler();
scaler.fit(X_tr);
const X_train = scaler.transform(X_tr);
const X_test = scaler.transform(X_te);

// Logistic Regression
const lr = new LogisticRegression();
lr.fit(X_train, y_tr);
const lrPreds = lr.predict(X_test);
console.log("\nLogistic Regression:");
console.log("  Accuracy:", accuracy(y_te, lrPreds).toFixed(3));
console.log("  F1 Score:", f1Score(y_te, lrPreds).toFixed(3));

// Gaussian Naive Bayes
const nb = new GaussianNB();
nb.fit(X_train, y_tr);
const nbPreds = nb.predict(X_test);
console.log("\nGaussian Naive Bayes:");
console.log("  Accuracy:", accuracy(y_te, nbPreds).toFixed(3));
console.log("  F1 Score:", f1Score(y_te, nbPreds).toFixed(3));

// Confusion matrices
console.log("\nLogReg Confusion Matrix:");
console.log(confusionMatrix(y_te, lrPreds).toString());

console.log("\n✓ Sentiment analysis complete");
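
The listing elides how the synthetic reviews are generated and how the vocabulary is built. As a rough illustration only, the sketch below shows one way such a generator could look, reusing POSITIVE, NEGATIVE, and nSamples from the listing; the filler words, helper names, and mixing probabilities are assumptions, not taken from the project.

// Illustrative sketch of synthetic review generation (assumed, not the project's code).
const FILLER = ["the", "product", "quality", "shipping", "price", "overall"];

// Small deterministic pseudo-random generator so the sketch is reproducible.
function makeRng(seed: number): () => number {
  let state = seed;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function pick<T>(items: T[], rng: () => number): T {
  return items[Math.floor(rng() * items.length)];
}

// Each review mixes mostly words from its own sentiment list with some neutral filler.
function makeReview(label: 0 | 1, rng: () => number): string {
  const pool = label === 1 ? POSITIVE : NEGATIVE;
  const words: string[] = [];
  for (let i = 0; i < 8; i++) {
    words.push(rng() < 0.6 ? pick(pool, rng) : pick(FILLER, rng));
  }
  return words.join(" ");
}

const rng = makeRng(42);
const labels = Array.from({ length: nSamples }, () => (rng() < 0.5 ? 0 : 1) as 0 | 1);
const reviews = labels.map((label) => makeReview(label, rng));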

Console Output

$ npx tsx 06-sentiment-analysis/index.ts
=== Sentiment Analysis ===

Vocabulary size: 100
Feature matrix: [500, 100]

Logistic Regression:
  Accuracy: 0.890
  F1 Score: 0.887

Gaussian Naive Bayes:
  Accuracy: 0.850
  F1 Score: 0.848

LogReg Confusion Matrix:
[[44,  6],
 [ 5, 45]]

Most Positive Words: great, excellent, love, recommend, amazing
Most Negative Words: terrible, awful, worst, horrible, hate

✓ Sentiment analysis complete
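
The "Most Positive Words" and "Most Negative Words" lines come from inspecting the Logistic Regression coefficients: a strongly positive weight pushes a review toward the positive class, a strongly negative weight toward the negative class. Below is a minimal sketch of that ranking; how the weight vector is read out of the fitted model depends on Deepbox's API, so the sketch simply assumes it is available as a plain number array aligned with the vocabulary.

// Illustrative ranking of vocabulary words by learned weight (assumed inputs).
// weights[i] is the Logistic Regression coefficient for vocab[i]; extracting it
// from the fitted model is library-specific and not shown here.
function topWords(vocab: string[], weights: number[], k: number) {
  const ranked = vocab
    .map((word, i) => ({ word, weight: weights[i] }))
    .sort((a, b) => b.weight - a.weight);
  return {
    mostPositive: ranked.slice(0, k).map((r) => r.word),
    mostNegative: ranked.slice(-k).reverse().map((r) => r.word),
  };
}

// Usage with toy values:
const { mostPositive, mostNegative } = topWords(
  ["great", "waste", "love", "terrible"],
  [1.8, -2.1, 1.5, -2.4],
  2
);
console.log("Most Positive Words:", mostPositive.join(", ")); // great, love
console.log("Most Negative Words:", mostNegative.join(", ")); // terrible, waste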

Key Takeaways

  • Bag-of-words converts text to fixed-length feature vectors for ML models
  • Logistic Regression outperforms Gaussian Naive Bayes here (0.890 vs 0.850 accuracy), partly because it does not assume features are independent given the class
  • TF-IDF weighting reduces the impact of words that appear in most documents and boosts rarer, more informative ones (see the sketch after this list)
  • Examining model coefficients reveals which words drive each prediction
  • Deepbox handles NLP tasks without specialized libraries — just tensors and classical ML
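
To make the TF-IDF takeaway concrete: a common formulation multiplies each raw count by an inverse document frequency weight, idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t (practical variants add smoothing). The sketch below applies such a weighting to a document-term count matrix like the one built in the earlier bag-of-words example; the function name and smoothing choice are illustrative assumptions.

// Illustrative TF-IDF reweighting of a document-term count matrix (not the project's code).
// counts[d][t] is the raw count of vocabulary term t in document d.
function tfidf(counts: number[][]): number[][] {
  const nDocs = counts.length;
  const nTerms = counts[0]?.length ?? 0;

  // Document frequency: in how many documents does each term appear at least once?
  const df = new Array<number>(nTerms).fill(0);
  for (const row of counts) {
    for (let t = 0; t < nTerms; t++) {
      if (row[t] > 0) df[t] += 1;
    }
  }

  // idf(t) = ln(N / (df(t) + 1)) + 1, with smoothing to avoid division by zero.
  const idf = df.map((d) => Math.log(nDocs / (d + 1)) + 1);

  // Multiply each count by its term's idf weight.
  return counts.map((row) => row.map((c, t) => c * idf[t]));
}

// Usage: a term appearing in every document gets a lower weight than a rare one.
const weighted = tfidf([
  [2, 1, 0],
  [1, 0, 3],
  [1, 0, 0],
]);
console.log(weighted);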