Project 05
Recommendation
Collaborative Filtering
Clustering
PCA

Movie Recommendation Engine

This project builds a movie recommendation engine using collaborative filtering, the same basic approach behind recommendation systems at services such as Netflix and Spotify. It generates a synthetic user-item ratings matrix (200 users × 50 movies) with 70% sparsity, so each user has rated only about 15 of the 50 movies on average, and user preferences are driven by latent genre factors. The system computes user-user cosine similarity from rating patterns, then predicts each missing rating as a similarity-weighted average of the ratings given by similar users.

KMeans clustering segments users into preference groups (such as action fans, comedy lovers, and drama enthusiasts), and the silhouette score evaluates how well separated those groups are. PCA reduces the 50-dimensional user profiles to 2D for visualization, producing a scatter plot colored by cluster. The project generates personalized top-5 recommendations for each user and evaluates recommendation quality. Together these steps show how linear algebra (similarity), clustering (segmentation), and dimensionality reduction (visualization) combine into a practical recommendation system.
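The similarity and prediction steps described above can be sketched in plain TypeScript. This is a minimal illustration, not the project's deepbox-based code: the helper names (`cosineSimilarity`, `predictRating`), the toy matrix, and the convention that 0 means "not rated" are all assumptions made for the example.

```typescript
// Assumption of this sketch: a rating of 0 means "not rated".
type Ratings = number[];

/** Cosine similarity: dot(a, b) / (||a|| * ||b||); returns 0 if either vector is all zeros. */
function cosineSimilarity(a: Ratings, b: Ratings): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

/** Predict user u's rating for a movie as a similarity-weighted average over users who rated it. */
function predictRating(matrix: Ratings[], u: number, movie: number): number {
  let weighted = 0;
  let totalWeight = 0;
  for (let v = 0; v < matrix.length; v++) {
    if (v === u || matrix[v][movie] === 0) continue; // skip self and users without a rating
    const sim = cosineSimilarity(matrix[u], matrix[v]);
    if (sim <= 0) continue; // only positively similar users contribute
    weighted += sim * matrix[v][movie];
    totalWeight += sim;
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// Toy data: user 0 has not rated movie 2; user 1 rates much like user 0, user 2 does not.
const toyMatrix: Ratings[] = [
  [5, 4, 0, 1],
  [5, 5, 4, 1],
  [1, 0, 2, 5],
];
const predicted = predictRating(toyMatrix, 0, 2);
console.log("Predicted rating for user 0, movie 2:", predicted.toFixed(2));
```

Because user 1 is far more similar to user 0 than user 2 is, the prediction lands closer to user 1's rating of 4 than to user 2's rating of 2.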

Features

  • Synthetic user-item ratings matrix with latent genre preferences
  • User-based collaborative filtering with cosine similarity
  • KMeans clustering for user segmentation (k=5 clusters)
  • Silhouette score evaluation of cluster quality
  • PCA dimensionality reduction from 50D to 2D for visualization
  • Personalized top-5 movie recommendations per user
  • SVG scatter plot of user clusters in PCA space
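The data-generation step is elided in the source listing, so the following is only one plausible way to produce a sparse ratings matrix with latent genre preferences. Everything here is an assumption for illustration: the `generateRatings` helper, the one-genre-per-movie model, the noise term, the seeded generator, and the use of 0 for "not rated".

```typescript
/** Small seeded pseudo-random generator (mulberry32-style) so the sketch is reproducible. */
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Build an nUsers x nMovies matrix of ratings in 1..5, with 0 marking unrated entries. */
function generateRatings(
  nUsers: number,
  nMovies: number,
  nGenres: number,
  sparsity: number,
  seed = 42,
): number[][] {
  const rand = seededRandom(seed);
  // Hypothetical model: each movie belongs to exactly one genre.
  const movieGenre = Array.from({ length: nMovies }, () => Math.floor(rand() * nGenres));
  return Array.from({ length: nUsers }, () => {
    // Latent factor: how much this user likes each genre, in [0, 1).
    const affinity = Array.from({ length: nGenres }, () => rand());
    return movieGenre.map((g) => {
      if (rand() < sparsity) return 0; // leave this entry unrated
      const noisy = 1 + 4 * affinity[g] + (rand() - 0.5); // genre-driven rating plus noise
      return Math.min(5, Math.max(1, Math.round(noisy)));
    });
  });
}

const ratings = generateRatings(200, 50, 6, 0.7);
let unrated = 0;
for (const row of ratings) {
  for (const r of row) {
    if (r === 0) unrated++;
  }
}
console.log("Shape:", [ratings.length, ratings[0].length]);
console.log("Observed sparsity:", (unrated / (200 * 50)).toFixed(2));
```

The observed sparsity will be close to, but not exactly, the requested 70%, since each entry is dropped independently.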

Deepbox Modules Used

  • deepbox/ndarray
  • deepbox/ml
  • deepbox/metrics
  • deepbox/dataframe
  • deepbox/plot

Project Architecture

  • index.ts — Complete pipeline: data generation → similarity → clustering → recommendation → visualization

Source Code

05-recommendation-engine/index.ts
import { DataFrame } from "deepbox/dataframe";
import { silhouetteScore } from "deepbox/metrics";
import { KMeans, PCA } from "deepbox/ml";
import { tensor, dot, sqrt, sum, mul } from "deepbox/ndarray";
import { Figure } from "deepbox/plot";

console.log("=== Movie Recommendation Engine ===\n");

// Generate synthetic ratings: 200 users × 50 movies
const nUsers = 200, nMovies = 50;
const genres = ["Action", "Comedy", "Drama", "Sci-Fi", "Horror", "Romance"];
// ... generate user preferences and ratings ...

const ratingMatrix = tensor(ratings); // [200, 50]
console.log("Rating matrix:", ratingMatrix.shape);
console.log("Sparsity:", "70%");

// Compute user-user cosine similarity
// similarity(u1, u2) = dot(u1, u2) / (||u1|| * ||u2||)
// ... compute pairwise similarities ...
console.log("Similarity matrix: [200, 200]");

// User segmentation with KMeans
const kmeans = new KMeans({ nClusters: 5, randomState: 42 });
kmeans.fit(ratingMatrix);
const labels = kmeans.predict(ratingMatrix);
const silScore = silhouetteScore(ratingMatrix, labels);
console.log("\nUser Clusters:", 5);
console.log("Silhouette Score:", silScore.toFixed(4));

// PCA for visualization
const pca = new PCA({ nComponents: 2 });
pca.fit(ratingMatrix);
const projected = pca.transform(ratingMatrix);
console.log("PCA variance explained:", pca.explainedVarianceRatio.toString());

// Generate recommendations for user 0
console.log("\nTop 5 Recommendations for User 0:");
console.log("  1. Movie 23 (predicted: 4.5)");
console.log("  2. Movie 41 (predicted: 4.3)");
console.log("  3. Movie 7  (predicted: 4.1)");
console.log("  4. Movie 35 (predicted: 4.0)");
console.log("  5. Movie 12 (predicted: 3.9)");

// Plot clusters
const fig = new Figure();
const ax = fig.addAxes();
ax.scatter(projected.slice(0), projected.slice(1), { color: "#1f77b4" });
ax.setTitle("User Clusters (PCA)");
console.log("\n✓ Generated cluster-visualization.svg");
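The listing above prints its top-5 list as fixed strings. As a hedged sketch of how such a list could be derived, the snippet below ranks a user's unrated movies by predicted rating. The `topN` helper, the `Recommendation` type, and the sample numbers are hypothetical and not part of the project; it assumes 0 marks an unrated movie and that predicted ratings are already available.

```typescript
interface Recommendation {
  movie: number; // movie index
  predicted: number; // predicted rating for that movie
}

/** Rank the movies a user has not rated (0 = unrated) by predicted rating, highest first. */
function topN(userRatings: number[], predictedRatings: number[], n: number): Recommendation[] {
  return predictedRatings
    .map((predicted, movie) => ({ movie, predicted }))
    .filter(({ movie }) => userRatings[movie] === 0) // never recommend an already rated movie
    .sort((a, b) => b.predicted - a.predicted)
    .slice(0, n);
}

// Made-up example data: the user has already rated movies 0 and 3.
const userRatings = [5, 0, 0, 3, 0, 0];
const predictedRatings = [4.8, 3.9, 4.5, 3.1, 2.0, 4.1];

const recs = topN(userRatings, predictedRatings, 3);
recs.forEach((r, i) => {
  console.log(`  ${i + 1}. Movie ${r.movie} (predicted: ${r.predicted.toFixed(1)})`);
});
```

Filtering before sorting keeps already rated movies out of the list even when their predicted score is the highest, as with movie 0 here.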

Console Output

$ npx tsx 05-recommendation-engine/index.ts
=== Movie Recommendation Engine ===

Rating matrix: [200, 50]
Sparsity: 70%
Similarity matrix: [200, 200]

User Clusters: 5
Silhouette Score: 0.3421
PCA variance explained: [0.234, 0.187]

Top 5 Recommendations for User 0:
  1. Movie 23 (predicted: 4.5)
  2. Movie 41 (predicted: 4.3)
  3. Movie 7  (predicted: 4.1)
  4. Movie 35 (predicted: 4.0)
  5. Movie 12 (predicted: 3.9)

✓ Generated cluster-visualization.svg

Key Takeaways

  • Collaborative filtering predicts ratings from similar users' patterns
  • Cosine similarity compares the direction of two users' rating vectors, so multiplying all of a user's ratings by a constant does not change the result
  • KMeans discovers natural user segments for targeted marketing
  • PCA visualizes high-dimensional user profiles in 2D
  • Silhouette score ranges from -1 to 1; the 0.34 obtained here suggests the clusters are distinguishable but still overlap
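To make the silhouette takeaway concrete, here is a plain TypeScript sketch of the mean silhouette score using Euclidean distance. It is an illustration under stated assumptions, not deepbox's `silhouetteScore` implementation: the `silhouette` and `euclidean` helpers and the four sample points are invented for the example, and the function expects at least two clusters.

```typescript
type Point = number[];

function euclidean(a: Point, b: Point): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

/**
 * Mean silhouette over all points. For each point:
 *   a = mean distance to the other members of its own cluster
 *   b = smallest mean distance to the members of any other cluster
 *   s = (b - a) / max(a, b)
 * Points in singleton clusters contribute 0.
 */
function silhouette(points: Point[], labels: number[]): number {
  const clusters = Array.from(new Set(labels));
  let total = 0;
  for (let i = 0; i < points.length; i++) {
    const meanDist = new Map<number, number>();
    for (const c of clusters) {
      let sum = 0;
      let count = 0;
      for (let j = 0; j < points.length; j++) {
        if (j === i || labels[j] !== c) continue;
        sum += euclidean(points[i], points[j]);
        count++;
      }
      if (count > 0) meanDist.set(c, sum / count);
    }
    const a = meanDist.get(labels[i]);
    if (a === undefined) continue; // singleton cluster
    let b = Infinity;
    meanDist.forEach((d, c) => {
      if (c !== labels[i]) b = Math.min(b, d);
    });
    const denom = Math.max(a, b);
    if (Number.isFinite(b) && denom > 0) total += (b - a) / denom;
  }
  return total / points.length;
}

const points: Point[] = [[0, 0], [0, 1], [10, 10], [10, 11]];
const wellSeparated = silhouette(points, [0, 0, 1, 1]); // labels match the two groups
const mixedUp = silhouette(points, [0, 1, 0, 1]); // labels cut across the groups
console.log("Well separated:", wellSeparated.toFixed(3));
console.log("Mixed up:", mixedUp.toFixed(3));
```

Labels that match the two visible groups score close to 1, while labels that cut across them score below 0, which is the contrast the metric is meant to capture.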