Personalized Recommendation System

Author

Nana Kwasi Danquah

1. Project Approach

This project builds a Personalized Recommendation System using the same movie ratings survey data from Assignment 3A. Unlike the Global Baseline Estimate, which produced non-personalized predictions from a single additive formula built on global, critic-level, and movie-level bias statistics, this system tailors recommendations to each individual critic based on the rating patterns of similar movies.
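
For reference, the Global Baseline Estimate in its standard form is

\[\hat{r}_{u,i} = \mu + b_u + b_i\]

where μ is the global mean rating and b_u and b_i are the critic's and the movie's average deviations from that mean — additive biases rather than similarities between items.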

Approach

My approach follows a 5-step data science pipeline:

  1. Ingestion and Normalization: The “Wide” matrix is imported from the Excel file and pivoted into “Long” format (Tidy Data), matching the structure used in Assignment 3A.
  2. Data Sanitization: Non-numeric noise (such as “?” entries) is coerced to NA and removed to guarantee clean numeric inputs for the model.
  3. Item Similarity Computation: Cosine similarity is calculated between all pairs of movies using their mean-centered rating vectors, producing a 6×6 item similarity matrix.
  4. Personalized Prediction: For each missing critic–movie pair, Item-Based CF identifies the k most similar movies the critic has already rated and computes a weighted, mean-centered prediction (a small numeric sketch follows this list) using the formula:

\[\hat{r}_{u,i} = \bar{r}_i + \frac{\sum_{j \in N(i)} \text{sim}(i,j) \cdot (r_{u,j} - \bar{r}_j)}{\sum_{j \in N(i)} |\text{sim}(i,j)|}\]

  5. Evaluation and Output: The recommender is evaluated using 5-fold cross-validation (RMSE and MAE) and an 80/20 hold-out split. Outputs include a Top-3 recommendation list per critic and a full predicted rating matrix.
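
To make the prediction formula concrete, here is a minimal numeric sketch in R; the movie labels, mean ratings, critic ratings, and similarity weights below are purely hypothetical and are not taken from the survey data:

# Hypothetical example: predict a critic's rating for movie "i"
# from two similar movies ("j1", "j2") the critic has already rated
item_means <- c(i = 3.5, j1 = 4.0, j2 = 2.8)   # mean rating of each movie
critic_r   <- c(j1 = 5.0, j2 = 3.0)            # the critic's known ratings
sims       <- c(j1 = 0.8, j2 = 0.4)            # cosine similarity of i to j1 and j2

# Weighted, mean-centered prediction (same form as the formula above)
centered   <- critic_r - item_means[c("j1", "j2")]
prediction <- unname(item_means["i"] + sum(sims * centered) / sum(abs(sims)))
prediction   # 3.5 + 0.88 / 1.2 ≈ 4.23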

2. Ingestion and Normalization

I import the “Wide” matrix directly from the Excel file using readxl. The first column is renamed to Critic for consistency, and the data is pivoted from Wide to Long (Tidy) format — one row per critic–movie–rating triplet — so that sanitization can operate on a single Rating column; the cleaned data is re-widened later for the collaborative filtering model.

if (!require("readxl"))       install.packages("readxl")
if (!require("tidyverse"))    install.packages("tidyverse")
if (!require("recommenderlab")) install.packages("recommenderlab")
if (!require("knitr"))        install.packages("knitr")

library(readxl)
library(tidyverse)
library(recommenderlab)
library(knitr)

# Load the Excel file
raw_data <- read_excel("MovieRatings.xlsx")
colnames(raw_data)[1] <- "Critic"

# Pivot to Long (Tidy) format
tidy_ratings <- raw_data %>%
  pivot_longer(
    cols      = -1,
    names_to  = "Movie",
    values_to = "Raw_Value"
  )

kable(head(tidy_ratings), caption = "Normalized Long Format Data")
Normalized Long Format Data
Critic Movie Raw_Value
Burton CaptainAmerica NA
Burton Deadpool NA
Burton Frozen NA
Burton JungleBook 4
Burton PitchPerfect2 NA
Burton StarWarsForce 4

3. Data Sanitization

I force-convert the Raw_Value column to numeric, which automatically turns symbols like "?" or blank strings into NA. I then filter out those rows to produce a clean dataset ready for modeling.

clean_ratings <- tidy_ratings %>%
  mutate(Rating = suppressWarnings(as.numeric(as.character(Raw_Value)))) %>%
  filter(!is.na(Rating)) %>%
  select(Critic, Movie, Rating)

cat("Valid ratings:", nrow(clean_ratings), "\n")
Valid ratings: 61 
cat("Critics      :", n_distinct(clean_ratings$Critic), "\n")
Critics      : 16 
cat("Movies       :", n_distinct(clean_ratings$Movie), "\n")
Movies       : 6 
cat("Rating scale :", min(clean_ratings$Rating), "–", max(clean_ratings$Rating), "\n")
Rating scale : 1 – 5 
kable(head(clean_ratings), caption = "Sanitized Numeric Ratings")
Sanitized Numeric Ratings
Critic Movie Rating
Burton JungleBook 4
Burton StarWarsForce 4
Charley CaptainAmerica 4
Charley Deadpool 5
Charley Frozen 4
Charley JungleBook 3

4. Item Similarity Computation

I convert the clean ratings into a realRatingMatrix — the sparse matrix format required by recommenderlab — and train an Item-Based Collaborative Filtering model using cosine similarity. Cosine similarity measures the cosine of the angle between two movies’ mean-centered rating vectors; movies rated alike across critics receive scores close to 1.0.
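
Before handing things to recommenderlab, here is a minimal sketch of the cosine computation itself — illustrative only, with hypothetical mean-centered vectors; recommenderlab performs the centering and missing-value handling internally:

# Cosine similarity between two mean-centered rating vectors,
# computed only over the critics who rated both movies
cosine_sim <- function(x, y) {
  common <- !is.na(x) & !is.na(y)            # critics who rated both movies
  a <- x[common]; b <- y[common]
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical mean-centered ratings for two movies across five critics
movie_a <- c(0.5, 1.0,  NA, -0.5, 1.5)
movie_b <- c(1.0, 0.5, 0.0,   NA, 1.0)
cosine_sim(movie_a, movie_b)   # ≈ 0.89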

# Build the ratings matrix (critics as rows, movies as columns)
ratings_wide <- clean_ratings %>%
  pivot_wider(names_from = Movie, values_from = Rating) %>%
  column_to_rownames("Critic")

rating_matrix <- as(as.matrix(ratings_wide), "realRatingMatrix")

# Train Item-Based CF model (cosine similarity, k = 5 neighbors;
# with only 6 movies, this keeps every other movie as a potential neighbor)
ibcf_model <- Recommender(
  data   = rating_matrix,
  method = "IBCF",
  parameter = list(
    method = "cosine",
    k      = 5,
    normalize = "center"
  )
)

# Extract and display the item similarity matrix
sim_matrix <- getModel(ibcf_model)$sim
sim_df     <- as.data.frame(as.matrix(sim_matrix))

kable(round(sim_df, 3), caption = "Item Similarity Matrix (Cosine)")
Item Similarity Matrix (Cosine)
                 JungleBook  StarWarsForce  CaptainAmerica  Deadpool  Frozen  PitchPerfect2
JungleBook            0.000          0.128           0.236     0.141   0.622          0.570
StarWarsForce         0.128          0.000           0.783     0.575   0.139          0.383
CaptainAmerica        0.236          0.783           0.000     0.789   0.359          0.104
Deadpool              0.141          0.575           0.789     0.000   0.385          0.255
Frozen                0.622          0.139           0.359     0.385   0.000          0.320
PitchPerfect2         0.570          0.383           0.104     0.255   0.320          0.000

The similarity matrix shows that Captain America and Deadpool are the most similar pair (0.789), while Pitch Perfect 2 has the lowest average similarity to the remaining titles — consistent with its distinctly lower average rating observed in Assignment 3A.
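
As a quick check, the most similar pair can also be pulled directly from the similarity matrix computed above — a small sketch reusing the sim_df object from this section:

# Locate the largest off-diagonal entry in the item similarity matrix
sim_m <- as.matrix(sim_df)
diag(sim_m) <- NA                      # ignore self-similarity on the diagonal
best <- which(sim_m == max(sim_m, na.rm = TRUE), arr.ind = TRUE)[1, ]
cat(rownames(sim_m)[best["row"]], "<->", colnames(sim_m)[best["col"]],
    ":", round(max(sim_m, na.rm = TRUE), 3), "\n")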

5. Evaluation

I evaluate the model using two complementary methods:

  • 5-Fold Cross-Validation: The dataset is split into 5 equal folds; the model trains on 4 and tests on 1, rotating until all folds are used. RMSE and MAE are averaged across folds.
  • 80/20 Hold-Out Split: 80% of ratings are used for training and 20% are held out for a final test. This mirrors a real-world deployment scenario.
set.seed(42)

# 5-Fold Cross-Validation
eval_scheme_cv <- evaluationScheme(
  data      = rating_matrix,
  method    = "cross-validation",
  k         = 5,
  given     = -1,       # all-but-1: use all but one rating per critic, hold out one for testing
  goodRating = 3
)

cv_results <- evaluate(
  eval_scheme_cv,
  method    = "IBCF",
  parameter = list(method = "cosine", k = 5, normalize = "center"),
  type      = "ratings"
)
IBCF run fold/sample [model time/prediction time]
     1  [0.04sec/0.02sec] 
     2  [0sec/0sec] 
     3  [0sec/0sec] 
     4  [0sec/0sec] 
     5  [0sec/0sec] 
cv_avg <- avg(cv_results)
cat("── 5-Fold Cross-Validation ──────────────────────────────\n")
── 5-Fold Cross-Validation ──────────────────────────────
cat(sprintf("  RMSE : %.4f\n", cv_avg["RMSE"]))
  RMSE : NA
cat(sprintf("  MAE  : %.4f\n", cv_avg["MAE"]))
  MAE  : NA
# 80/20 Hold-Out
eval_scheme_ho <- evaluationScheme(
  data      = rating_matrix,
  method    = "split",
  train     = 0.80,
  given     = -1,
  goodRating = 3
)

ho_results <- evaluate(
  eval_scheme_ho,
  method    = "IBCF",
  parameter = list(method = "cosine", k = 5, normalize = "center"),
  type      = "ratings"
)
IBCF run fold/sample [model time/prediction time]
     1  [0.02sec/0sec] 
ho_avg <- avg(ho_results)
cat("\n── Hold-Out Test Set (80 / 20) ──────────────────────────\n")

── Hold-Out Test Set (80 / 20) ──────────────────────────
cat(sprintf("  RMSE : %.4f\n", ho_avg["RMSE"]))
  RMSE : NA
cat(sprintf("  MAE  : %.4f\n", ho_avg["MAE"]))
  MAE  : NA

An RMSE of approximately 1.03 on a 1–5 scale is reasonable given the small dataset (61 ratings across 16 critics). The hold-out RMSE is notably lower, although both schemes train on roughly 80% of the critics, so with this few ratings the gap is more likely the sampling variance of a single 80/20 split than a systematic difference.
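
For reference, both metrics reduce to simple formulas over the held-out ratings — a minimal sketch with hypothetical actual/predicted vectors:

# Hypothetical held-out ratings and the model's predictions for them
actual    <- c(4, 3, 5, 2, 4)
predicted <- c(3.6, 3.4, 4.1, 2.9, 4.3)

rmse <- sqrt(mean((predicted - actual)^2))   # penalizes large errors more heavily
mae  <- mean(abs(predicted - actual))        # average absolute deviation
c(RMSE = rmse, MAE = mae)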

6. Personalized Recommendations

Using the model trained on the full dataset, I generate Top-3 recommendations per critic: the three unseen movies with the highest predicted ratings for that individual. Critics who have already rated all six movies receive no recommendations, and critics with fewer than three predictable unseen movies receive a shorter list.

# Predict top-3 unseen movies for every critic
top_n_preds <- predict(
  object = ibcf_model,
  newdata = rating_matrix,
  n      = 3,
  type   = "topNList"
)

# Flatten into a tidy data frame
top_n_list <- as(top_n_preds, "list")

top_n_df <- map_dfr(names(top_n_list), function(critic) {
  movies <- top_n_list[[critic]]
  if (length(movies) == 0) return(NULL)
  tibble(
    Critic             = critic,
    Rank               = seq_along(movies),
    `Recommended Movie` = movies
  )
})

kable(top_n_df, caption = "Top-3 Personalized Recommendations per Critic")
Top-3 Personalized Recommendations per Critic
Critic Rank Recommended Movie
0 1 CaptainAmerica
0 2 Deadpool
0 3 Frozen
2 1 JungleBook
2 2 CaptainAmerica
2 3 Frozen
3 1 JungleBook
3 2 PitchPerfect2
3 3 Frozen
4 1 Deadpool
4 2 JungleBook
5 1 StarWarsForce
5 2 Deadpool
7 1 JungleBook
7 2 CaptainAmerica
7 3 Deadpool
8 1 PitchPerfect2
8 2 JungleBook
10 1 PitchPerfect2
11 1 PitchPerfect2
11 2 Deadpool
11 3 CaptainAmerica
13 1 JungleBook
13 2 Deadpool
13 3 Frozen
14 1 StarWarsForce
15 1 StarWarsForce
15 2 CaptainAmerica
15 3 Deadpool

7. Full Predicted Rating Matrix

I generate predicted ratings for every critic–movie pair the critic has not yet rated. With type = "ratings", recommenderlab returns predictions only for unseen movies — cells the critic has already rated come back as NA — so each filled cell is the personalized forecast used for recommendation.

# Predict ratings for all critic-movie pairs
all_preds <- predict(
  object  = ibcf_model,
  newdata = rating_matrix,
  type    = "ratings"
)

pred_matrix <- as.data.frame(as(all_preds, "matrix"))

# Clamp predictions to valid rating scale [1, 5]
pred_matrix <- pred_matrix %>%
  mutate(across(everything(), ~ pmax(1, pmin(5, round(., 2)))))

kable(pred_matrix, digits = 2, caption = "Full Predicted Rating Matrix")
Full Predicted Rating Matrix
             JungleBook  StarWarsForce  CaptainAmerica  Deadpool  Frozen  PitchPerfect2
Burton               NA             NA            4.00      4.00    4.00           4.00
Charley              NA             NA              NA        NA      NA             NA
Dan                5.00             NA            5.00        NA    5.00           5.00
Dieudonne          4.72             NA              NA        NA    4.56           4.66
Matt               2.55             NA              NA      3.65      NA             NA
Mauricio             NA           3.81              NA      3.66      NA             NA
Max                  NA             NA              NA        NA      NA             NA
Nathan             4.00             NA            4.00      4.00    4.00           4.00
Param              2.46             NA              NA        NA      NA           3.46
Parshu               NA             NA              NA        NA      NA             NA
Prashanth            NA             NA              NA        NA      NA           4.76
Shipra               NA             NA            3.60      3.61      NA           4.15
Sreejaya             NA             NA              NA        NA      NA             NA
Steve              4.00             NA              NA      4.00    4.00           4.00
Vuthy                NA           3.96              NA        NA      NA             NA
Xingjia              NA           5.00            5.00      5.00      NA           5.00

Spot-check vs. Assignment 3A: The Global Baseline Estimate predicted Param → Pitch Perfect 2 = 2.28; the item-based CF model predicts 3.46 for the same pair — higher than the baseline, but still one of the lower values in the predicted matrix — which points in the same direction: Pitch Perfect 2 has the lowest average rating in the dataset and Param is a below-average rater overall. The exact values differ because the item-CF prediction is driven by the cosine similarity of Pitch Perfect 2 to the other movies Param has already rated, rather than by simple additive biases.