Personalized Recommendation System
1. Approach
This project builds a Personalized Recommendation System using the same movie ratings survey data from Assignment 3A. Unlike the Global Baseline Estimate, which produced a non-personalized prediction by combining global and bias statistics, this system tailors recommendations to each individual critic based on how that critic rated similar movies.
Algorithm: Item-to-Item Collaborative Filtering
I chose Item-to-Item Collaborative Filtering, implemented with the KNNWithMeans algorithm from the scikit-surprise library, run in item-based mode.
The core idea is that if two movies tend to receive similar ratings across critics, they are considered “neighbors.” To predict how a critic would rate an unseen movie, the model looks at how that critic rated the most similar movies they have seen, and adjusts for each movie’s average rating.
The prediction formula is:
\[\hat{r}_{u,i} = \bar{r}_i + \frac{\sum_{j \in N(i)} \text{sim}(i,j) \cdot (r_{u,j} - \bar{r}_j)}{\sum_{j \in N(i)} |\text{sim}(i,j)|}\]
Where:
- \(\hat{r}_{u,i}\) is the predicted rating for critic \(u\) on movie \(i\)
- \(\bar{r}_i\) is the mean rating of movie \(i\)
- \(N(i)\) is the set of neighbor movies (most similar to \(i\)) that critic \(u\) has rated
- \(\text{sim}(i,j)\) is the cosine similarity between movies \(i\) and \(j\)
- \(r_{u,j}\) is the critic’s actual rating for neighbor movie \(j\)
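To make the formula concrete, here is a minimal pure-Python sketch of the same calculation. It is an illustration only, not the scikit-surprise implementation: the toy ratings, the function names (`item_means`, `cosine_sim`, `predict`), and the choice to compute plain cosine similarity over critics who rated both movies are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical toy ratings (critic -> {movie: rating}); NOT the survey data.
ratings = {
    "A": {"M1": 5, "M2": 4, "M3": 1},
    "B": {"M1": 4, "M2": 5, "M3": 2},
    "C": {"M1": 1, "M2": 2, "M3": 5},
    "D": {"M1": 5, "M3": 1},  # critic D has not rated M2
}

def item_means(ratings):
    """Mean rating of each movie (the r-bar_i terms in the formula)."""
    by_movie = {}
    for user_ratings in ratings.values():
        for movie, r in user_ratings.items():
            by_movie.setdefault(movie, []).append(r)
    return {movie: sum(rs) / len(rs) for movie, rs in by_movie.items()}

def cosine_sim(ratings, i, j):
    """Cosine similarity between movies i and j over critics who rated both.

    Assumption: raw ratings are used here; a library may center or
    shrink the vectors differently.
    """
    common = [(u[i], u[j]) for u in ratings.values() if i in u and j in u]
    if not common:
        return 0.0
    dot = sum(a * b for a, b in common)
    norm_i = sqrt(sum(a * a for a, _ in common))
    norm_j = sqrt(sum(b * b for _, b in common))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def predict(ratings, user, item, k=2):
    """Predicted rating: item mean plus weighted, mean-centered neighbor ratings."""
    means = item_means(ratings)
    # N(i): the k movies most similar to `item` among those the critic rated.
    sims = [(cosine_sim(ratings, item, j), j) for j in ratings[user] if j != item]
    neighbors = sorted(sims, reverse=True)[:k]
    numerator = sum(s * (ratings[user][j] - means[j]) for s, j in neighbors)
    denominator = sum(abs(s) for s, _ in neighbors)
    if denominator == 0:
        return means[item]  # no usable neighbors: fall back to the movie mean
    return means[item] + numerator / denominator

prediction = predict(ratings, "D", "M2")
```

In this toy data, critic D rated M1 above its average and M3 below its average; because M2 is more similar to M1 than to M3, the prediction for M2 lands somewhat above M2's mean rating.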
My Approach Follows a 5-Step Pipeline
Ingestion and Normalization: The wide-format Excel matrix is loaded and pivoted to long (tidy) format, with one row per critic–movie–rating triplet, matching the structure used in Assignment 3A.
Data Sanitization: Non-numeric entries (such as ? and blanks) are coerced to NA and removed, producing 61 clean ratings across 16 critics and 6 movies.
Item Similarity Computation: Cosine similarity is calculated between all pairs of movies using their mean-centered rating vectors. This produces a 6×6 item similarity matrix where higher values indicate movies that tend to be rated alike.
Personalized Prediction: For each missing critic–movie pair, KNNWithMeans identifies the \(k\) most similar movies the critic has already rated and computes a weighted average of their mean-centered scores, adjusted back to the original scale.
Output and Evaluation: The recommender outputs:
- A Top-3 recommendation list per critic (highest predicted ratings among unseen movies)
- A full predicted rating matrix for all 16 critics × 6 movies
- Performance metrics (RMSE and MAE) from 5-fold cross-validation and from an 80/20 hold-out test split
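The reshaping, sanitization, and Top-3 steps of the pipeline above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions rather than the project code: the sample rows, the `critic` column name, and the helper names (`to_long`, `sanitize`, `top_n`) are hypothetical, and invalid entries are simply dropped instead of being represented as NA first.

```python
# Hypothetical wide-format rows, standing in for the spreadsheet contents.
wide = [
    {"critic": "A", "M1": 5, "M2": "?", "M3": 3},
    {"critic": "B", "M1": "", "M2": 4, "M3": None},
]

def to_long(wide_rows, id_col="critic"):
    """Pivot wide rows into (critic, movie, raw_value) triplets."""
    return [
        (row[id_col], movie, value)
        for row in wide_rows
        for movie, value in row.items()
        if movie != id_col
    ]

def sanitize(triplets):
    """Keep only triplets whose value parses as a number."""
    clean = []
    for critic, movie, value in triplets:
        try:
            clean.append((critic, movie, float(value)))
        except (TypeError, ValueError):
            continue  # "?", blanks, and None are dropped
    return clean

def top_n(predictions, n=3):
    """Rank each critic's unseen movies by predicted rating, highest first.

    `predictions` maps critic -> {unseen movie: predicted rating}.
    """
    return {
        critic: sorted(preds, key=preds.get, reverse=True)[:n]
        for critic, preds in predictions.items()
    }

long_rows = to_long(wide)
clean_rows = sanitize(long_rows)

# Made-up predicted ratings, used only to show the ranking step.
example_predictions = {"A": {"M2": 3.9, "M4": 4.5, "M5": 2.0, "M6": 4.1}}
recommendations = top_n(example_predictions, n=3)
```

Note that `float()` also accepts strings such as "nan", so a stricter check would be needed if such values could appear in the raw data.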