week_11_personalized_recommender

Author

Brandon Chanderban

Published

April 23, 2026

Introduction/Approach

The objective of this assignment is to build a personalized recommendation system using the same movie ratings dataset employed in the Week 3A Global Baseline Estimate project. Whereas the prior assignment produced non-personalized recommendations based on overall average ratings and bias terms, the present assignment calls for a recommender that generates outputs tailored to the preferences of individual users.

To achieve this, the personalized recommendation algorithm that will likely be implemented is user-to-user collaborative filtering. This method identifies users with similar rating patterns and uses those similarities to estimate how a target user would rate movies they have not yet rated.

Data Preparation

As with the previous assignment, the movie ratings dataset provided by Professor Catlin will be used. The dataset is arranged in a wide format, where each row represents a user and each column represents a movie, with missing values indicating unrated items.

The data will first be imported into R and reshaped into a long format using functions such as pivot_longer(), producing variables such as user, movie, and rating. The long-format data will then be converted into a user-item matrix for collaborative filtering.
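The reshaping described above could look roughly like the sketch below. It uses a three-critic excerpt of the ratings rather than the full file, and the object names (wide_ratings, long_ratings, user_item, rating_matrix) are chosen only for this illustration.

```r
library(tidyr)

# Three critics and three movies taken from the ratings file (NA = not rated)
wide_ratings <- data.frame(
  Critic        = c("Burton", "Charley", "Dan"),
  StarWarsForce = c(4, 3, 5),
  Deadpool      = c(NA, 5, 5),
  Frozen        = c(NA, 4, NA)
)

# Wide to long: one row per (critic, movie) pair, dropping unrated pairs
long_ratings <- pivot_longer(
  wide_ratings,
  cols = -Critic,
  names_to = "movie",
  values_to = "rating",
  values_drop_na = TRUE
)
names(long_ratings)[1] <- "user"

# Long back to a user-item matrix: users as rows, movies as columns
user_item <- pivot_wider(long_ratings, names_from = movie, values_from = rating)
rating_matrix <- as.matrix(user_item[, -1])
rownames(rating_matrix) <- user_item$user

rating_matrix
```

Movies a critic did not rate come back as NA when the long data is widened again, which is the form a user-item matrix needs.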

Recommendation Method

The user-to-user collaborative filtering model will measure similarity between users based on the movies they have both rated. Similarity may be calculated using a metric such as Pearson correlation or cosine similarity.
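As a rough illustration of the two metrics named above, the sketch below computes each one in base R over the movies that two critics have both rated. The helper names pearson_sim and cosine_sim are made up for this example, and recommenderlab's internal calculation may differ in its details (for instance, in how it normalizes or rescales the scores).

```r
# Pearson correlation over the movies both users have rated
pearson_sim <- function(u, v) {
  both <- !is.na(u) & !is.na(v)
  if (sum(both) < 2) return(NA_real_)          # needs at least two shared movies
  if (sd(u[both]) == 0 || sd(v[both]) == 0) {
    return(NA_real_)                           # undefined when one user's ratings are constant
  }
  cor(u[both], v[both])
}

# Cosine similarity over the movies both users have rated
cosine_sim <- function(u, v) {
  both <- !is.na(u) & !is.na(v)
  if (!any(both)) return(NA_real_)
  sum(u[both] * v[both]) / (sqrt(sum(u[both]^2)) * sqrt(sum(v[both]^2)))
}

# Ratings from the dataset, in the order CaptainAmerica, Deadpool, Frozen,
# JungleBook, PitchPerfect2, StarWarsForce
charley_ratings <- c(4, 5, 4, 3, 2, 3)
max_ratings     <- c(4, 4, 4, 2, 2, 4)
param_ratings   <- c(4, 4, 1, NA, NA, 5)

pearson_sim(charley_ratings, max_ratings)    # six shared movies
pearson_sim(charley_ratings, param_ratings)  # only four shared movies
cosine_sim(charley_ratings, max_ratings)
```

Because each score uses only the co-rated movies, two critics who share very few movies yield an unstable similarity, or none at all.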

Once the similarity scores are determined, the most similar users will be identified, and their ratings will be used to estimate ratings for unseen movies for a target user. The recommender output will likely take the form of a top-N list of recommended movies for each user, based on the highest predicted ratings.
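The prediction and ranking steps can be sketched as a similarity-weighted average followed by a sort. This is a simplified version of the idea, not necessarily the exact weighting recommenderlab applies; predict_rating is a hypothetical helper and all numbers are invented for illustration.

```r
# Predict one rating as the similarity-weighted average of the neighbours' ratings
predict_rating <- function(neighbour_ratings, similarities) {
  ok <- !is.na(neighbour_ratings) & !is.na(similarities) & similarities > 0
  if (!any(ok)) return(NA_real_)   # no usable neighbour rated this movie
  sum(similarities[ok] * neighbour_ratings[ok]) / sum(similarities[ok])
}

# Hypothetical neighbours: their ratings for one unseen movie, and how similar
# each neighbour is to the target user
neighbour_ratings <- c(5, 4, NA, 3)
similarities      <- c(0.9, 0.6, 0.8, 0.3)
predicted <- predict_rating(neighbour_ratings, similarities)

# Top-N list: rank the unseen movies by predicted rating, highest first
predicted_ratings <- c(Frozen = 3.8, JungleBook = 4.4, PitchPerfect2 = 2.9)  # hypothetical
top_n <- names(sort(predicted_ratings, decreasing = TRUE))[1:2]
```

A neighbour who has not rated the movie (the NA above) contributes nothing, so a movie that no neighbour has rated receives no prediction and cannot appear in the top-N list.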

Evaluation Plan

To evaluate the recommender, a portion of the ratings data will likely be held out and treated as test data. The recommender’s predicted ratings may then be compared against the actual ratings using a metric such as RMSE or MAE.
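For reference, both error metrics are simple to compute by hand. The sketch below uses invented ratings, and rmse and mae are illustrative helper names rather than functions from any package.

```r
# Root mean squared error: penalizes large misses more heavily
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Mean absolute error: the average size of a miss, in rating points
mae <- function(actual, predicted) mean(abs(actual - predicted))

actual_ratings    <- c(4, 5, 3)      # hypothetical held-out ratings
predicted_ratings <- c(3.5, 4, 3)    # hypothetical model predictions

rmse(actual_ratings, predicted_ratings)
mae(actual_ratings, predicted_ratings)
```

RMSE is never smaller than MAE on the same errors, and the gap widens when a few predictions miss by a lot.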

Additionally, the resulting top-N recommendations may be reviewed to determine whether they appear reasonable and personalized.

Potential Challenges

One anticipated challenge is the sparsity of the ratings matrix, since users may not have rated many of the same movies. This could make the similarity calculations less stable. Another possible challenge is the relatively small size of the ratings dataset, which may limit the effectiveness of collaborative filtering.
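Both concerns can be quantified before any model is fitted. The sketch below, using a four-critic excerpt of the ratings, measures the share of empty cells and counts how many movies each pair of critics has in common. The object names are chosen for illustration.

```r
# Four critics and three movies taken from the ratings file (NA = not rated)
m <- matrix(
  c(NA,  4, 4,
     5,  3, 3,
     5, NA, 5,
    NA, NA, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Burton", "Charley", "Dan", "Nathan"),
    c("Deadpool", "JungleBook", "StarWarsForce")
  )
)

rated <- !is.na(m)

# Sparsity: share of user-movie cells with no rating
sparsity <- mean(!rated)

# co_rated[i, j] = number of movies rated by both critic i and critic j
co_rated <- tcrossprod(rated * 1)

sparsity
co_rated
```

Pairs with a co-rated count of one or zero cannot support a meaningful correlation, which is where the instability mentioned above comes from.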

Code Base/Body

The first step, as with most analytical tasks in R, is to load the required libraries. In this assignment, the tidyverse package is used for data preparation, while the recommenderlab package is used to construct the personalized recommender system.

Code
library(tidyverse)
library(recommenderlab)

Importing the Movie Ratings Data

Next, the movie ratings dataset is imported into the working environment. As with the previous Global Baseline Estimate assignment, the dataset originates from the critic movie ratings file provided by Professor Catlin.

Code
url <- "https://raw.githubusercontent.com/bkchanderban/CUNY_SPS/refs/heads/main/DATA607/DATA607/week_3A_assignment/MovieRatings.csv"

raw_ratings <- read.csv(url, stringsAsFactors = FALSE)

glimpse(raw_ratings)
Rows: 16
Columns: 7
$ Critic         <chr> "Burton", "Charley", "Dan", "Dieudonne", "Matt", "Mauri…
$ CaptainAmerica <int> NA, 4, NA, 5, 4, 4, 4, NA, 4, 4, 5, NA, 5, 4, 4, NA
$ Deadpool       <int> NA, 5, 5, 4, NA, NA, 4, NA, 4, 3, 5, NA, 5, NA, 5, NA
$ Frozen         <int> NA, 4, NA, NA, 2, 3, 4, NA, 1, 5, 5, 4, 5, NA, 3, 5
$ JungleBook     <int> 4, 3, NA, NA, NA, 3, 2, NA, NA, 5, 5, 5, 4, NA, 3, 5
$ PitchPerfect2  <int> NA, 2, NA, NA, 2, 4, 2, NA, NA, 2, NA, NA, 4, NA, 3, NA
$ StarWarsForce  <int> 4, 3, 5, 5, 5, NA, 4, 4, 5, 3, 4, 3, 5, 4, NA, NA

At this stage, the dataset remains in its original wide format, where each row represents a critic/user, each column represents a movie, and each cell contains the rating that the given user assigned to the corresponding movie. Missing values indicate movies that were not rated by that user.

Preparing the Ratings Matrix

For collaborative filtering, the data must be placed into a user-item matrix, where rows represent users and columns represent movies. Since the imported dataset is already arranged in essentially this structure, the long-format reshaping outlined in the approach is not needed here. The main preparation step is to separate the user names from the numeric movie ratings and convert the ratings portion into a matrix.

Code
# Move the critic names into row names so only the rating columns remain
ratings_matrix <- raw_ratings %>%
  column_to_rownames("Critic") %>%
  as.matrix()

# Store the ratings as doubles; unlike apply(), this keeps the row names
storage.mode(ratings_matrix) <- "double"

ratings_matrix
          CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2 StarWarsForce
Burton                NA       NA     NA          4            NA             4
Charley                4        5      4          3             2             3
Dan                   NA        5     NA         NA            NA             5
Dieudonne              5        4     NA         NA            NA             5
Matt                   4       NA      2         NA             2             5
Mauricio               4       NA      3          3             4            NA
Max                    4        4      4          2             2             4
Nathan                NA       NA     NA         NA            NA             4
Param                  4        4      1         NA            NA             5
Parshu                 4        3      5          5             2             3
Prashanth              5        5      5          5            NA             4
Shipra                NA       NA      4          5            NA             3
Sreejaya               5        5      5          4             4             5
Steve                  4       NA     NA         NA            NA             4
Vuthy                  4        5      3          3             3            NA
Xingjia               NA       NA      5          5            NA            NA

The resulting matrix contains users as rows, movies as columns, and ratings as the matrix values. This structure is suitable for conversion into a recommenderlab rating matrix object.

Converting to a Recommenderlab Object

The recommenderlab package requires the ratings data to be stored as a realRatingMatrix. This format is designed specifically for user-item rating data and allows recommender algorithms to be applied more directly.

Code
ratings_rrm <- as(ratings_matrix, "realRatingMatrix")

ratings_rrm
16 x 6 rating matrix of class 'realRatingMatrix' with 61 ratings.

At this stage, the movie ratings data has been converted into the required structure for building a personalized recommender system.

Building the User-Based Collaborative Filtering Model

The personalized recommendation method used in this assignment will be user-to-user collaborative filtering. This method identifies users with similar rating patterns and then uses those similarities to recommend items that the target user has not yet rated.

Code
ubcf_model <- Recommender(
  data = ratings_rrm,
  method = "UBCF",
  parameter = list(method = "Pearson", nn = 5)
)

ubcf_model
Recommender of type 'UBCF' for 'realRatingMatrix' 
learned using 16 users.

In this model, Pearson correlation was used to measure the similarity between users. The nn = 5 argument indicates that the model will consider up to five nearest neighbors when generating recommendations.

Generating Top-N Recommendations

Once the model has been built, it can be used to generate personalized recommendations. In this case, the recommender will output the top three recommended movies for each user.

Code
top_n_recommendations <- predict(
  ubcf_model,
  newdata = ratings_rrm,
  n = 3
)

top_n_recommendations
Recommendations as 'topNList' with n = 3 for 16 users. 

Initially, the recommendation output is stored in a recommenderlab-specific format. To make the results easier to inspect, the recommendations can be converted into a list.

Code
# Convert to list
recommendations_list <- as(top_n_recommendations, "list")

# Assign row names from original dataset to recommendation list
names(recommendations_list) <- rownames(ratings_matrix)

# Convert to a data frame for cleaner display
recommendations_df <- data.frame(
  Critic = names(recommendations_list),
  Recommended_Movies = sapply(recommendations_list, toString),
  row.names = NULL
)

recommendations_df
      Critic                      Recommended_Movies
1     Burton                                        
2    Charley                                        
3        Dan                                        
4  Dieudonne       JungleBook, Frozen, PitchPerfect2
5       Matt                    Deadpool, JungleBook
6   Mauricio                 StarWarsForce, Deadpool
7        Max                                        
8     Nathan                                        
9      Param               PitchPerfect2, JungleBook
10    Parshu                                        
11 Prashanth                           PitchPerfect2
12    Shipra CaptainAmerica, Deadpool, PitchPerfect2
13  Sreejaya                                        
14     Steve                                        
15     Vuthy                           StarWarsForce
16   Xingjia                                        

Evaluating the Model

To evaluate the recommender, a hold-out evaluation approach will be used. In this method, a portion of the known ratings will be withheld from the model and then predicted after training. The predicted ratings will then be compared against the withheld ratings using accuracy measures such as RMSE and MAE.
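The evaluation code that follows sets given = -1, which recommenderlab documents as an all-but-one scheme: for each test user, every rating except one is shown to the model and the remaining rating is withheld for scoring. A minimal base R sketch of that split for a single user is shown below; the variable names are illustrative, and the ratings are Charley's first four from the matrix.

```r
set.seed(1)

# One test user's known ratings
user_ratings <- c(CaptainAmerica = 4, Deadpool = 5, Frozen = 4, JungleBook = 3)

# Withhold exactly one rating at random; the rest stay visible to the model
held_out_movie <- sample(names(user_ratings), 1)
known_part     <- user_ratings[setdiff(names(user_ratings), held_out_movie)]
unknown_part   <- user_ratings[held_out_movie]

length(known_part)    # ratings the model sees for this user
length(unknown_part)  # ratings used to score the prediction
```

A user with only one rating would be left with nothing in the known part, so such users cannot take part in this kind of evaluation.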

Code
set.seed(6767)

evaluation_scheme <- evaluationScheme(
  ratings_rrm,
  method = "split",
  train = 0.8,
  given = -1,
  goodRating = 4
)

ubcf_eval <- Recommender(
  getData(evaluation_scheme, "train"),
  method = "UBCF",
  parameter = list(method = "Pearson", nn = 5)
)

ubcf_predictions <- predict(
  ubcf_eval,
  getData(evaluation_scheme, "known"),
  type = "ratings"
)

accuracy <- calcPredictionAccuracy(
  ubcf_predictions,
  getData(evaluation_scheme, "unknown")
)

accuracy
     RMSE       MSE       MAE 
0.7029084 0.4940802 0.4722222 

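Running the evaluation produced a warning (not reproduced in the output above) indicating that one user, user 8 in the matrix, did not have enough ratings for the hold-out evaluation. User 8 is Nathan, who rated only one movie, so withholding one rating under given = -1 leaves nothing for the model to work from. This is not unexpected, given the small and sparse nature of the dataset.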

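The model produced an RMSE of 0.703 and an MAE of 0.472, where the observed ratings range from 1 to 5. The MAE means that predicted ratings missed the held-out ratings by about 0.47 points on average; the RMSE is higher because it weights larger errors more heavily. The recommender therefore shows some predictive ability. However, the test split covers only about a fifth of the 16 users, and each test user contributes a single held-out rating, so these figures rest on a handful of predictions and should be interpreted with caution.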

Conclusion/Interpretation of Recommendation Output

The recommendations_df table presents the personalized movie recommendations generated through user-to-user collaborative filtering. The model recommends movies that each critic has not yet rated, based on the preferences of other users with similar rating patterns.

Seven of the sixteen critics received at least one recommendation. For example, Dieudonne was recommended JungleBook, Frozen, and PitchPerfect2, and Shipra was recommended CaptainAmerica, Deadpool, and PitchPerfect2. These results suggest that the model was able to identify comparable users and use their ratings to generate personalized suggestions.

However, nine critics received no recommendations. Four of them (Charley, Max, Parshu, and Sreejaya) had already rated all six movies, leaving no unseen items to recommend. The other five (Burton, Dan, Nathan, Steve, and Xingjia) had rated only one or two movies, so they likely did not share enough ratings with other users for the model to identify reliable similarities. This is expected given the small size and sparsity of the dataset.
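One way to check which explanation applies to each critic is to count rated and unrated movies per row. The sketch below re-enters the ratings matrix shown earlier so that it runs on its own; ratings_check, rated_count, and unrated_count are names chosen for this illustration.

```r
critics <- c("Burton", "Charley", "Dan", "Dieudonne", "Matt", "Mauricio", "Max",
             "Nathan", "Param", "Parshu", "Prashanth", "Shipra", "Sreejaya",
             "Steve", "Vuthy", "Xingjia")
movies <- c("CaptainAmerica", "Deadpool", "Frozen", "JungleBook",
            "PitchPerfect2", "StarWarsForce")

# Ratings copied from the matrix printed earlier (NA = not rated)
ratings_check <- matrix(
  c(NA, NA, NA,  4, NA,  4,   # Burton
     4,  5,  4,  3,  2,  3,   # Charley
    NA,  5, NA, NA, NA,  5,   # Dan
     5,  4, NA, NA, NA,  5,   # Dieudonne
     4, NA,  2, NA,  2,  5,   # Matt
     4, NA,  3,  3,  4, NA,   # Mauricio
     4,  4,  4,  2,  2,  4,   # Max
    NA, NA, NA, NA, NA,  4,   # Nathan
     4,  4,  1, NA, NA,  5,   # Param
     4,  3,  5,  5,  2,  3,   # Parshu
     5,  5,  5,  5, NA,  4,   # Prashanth
    NA, NA,  4,  5, NA,  3,   # Shipra
     5,  5,  5,  4,  4,  5,   # Sreejaya
     4, NA, NA, NA, NA,  4,   # Steve
     4,  5,  3,  3,  3, NA,   # Vuthy
    NA, NA,  5,  5, NA, NA),  # Xingjia
  ncol = 6, byrow = TRUE,
  dimnames = list(critics, movies)
)

rated_count   <- rowSums(!is.na(ratings_check))
unrated_count <- ncol(ratings_check) - rated_count

# Critics with nothing left to recommend
names(unrated_count)[unrated_count == 0]

# Critics with very few ratings to compare against other users
names(rated_count)[rated_count <= 2]
```

Together the two groups account for every critic with an empty recommendation list in recommendations_df.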

The evaluation results also support this cautious interpretation. While the RMSE and MAE values suggest that the recommender has some predictive ability, the warning generated during evaluation underscores the constraints imposed by users with few ratings and sparse overlap.

In conclusion, the personalized recommender system was able to generate meaningful recommendations for several users, while also highlighting the limitations of collaborative filtering when applied to a small and sparse dataset.

LLM Used