In a previous assignment, we were tasked with creating non-personalized recommendations using a Global Baseline Estimate. For that assignment, I used synthetic data generated by Google Gemini’s LLM to create a dataset of 75 unique users, each with a profile ID and a 1-5 rating for 6 films they may have seen.
I will use this survey data to create a personalized recommendation algorithm of my choice. My initial approach is User-Based Collaborative Filtering (UBCF) through the R package recommenderlab. I chose this package because it is reported to be easy to get started with, which matters given my inexperience and the timeline in which this deliverable must be created, tested, and submitted. I chose UBCF over Content-Based Filtering because the survey data does not contain item attributes such as genre or director; this ruled out a recommender based on item content.
Planned Implementation:
Since recommender algorithms use linear algebra across matrices and not long-format data, I will need to convert the data into a sparse matrix. The package I plan to use stores ratings in an object called a realRatingMatrix. To get the data into this format, I will need to isolate the ratings that were entered as ‘1’, ‘2’, ‘3’, ‘4’, or ‘5’. Movies that a user has not rated are the missing data; they are left out of the sparse matrix rather than passed in as explicit NA values.
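As a minimal sketch of that conversion, assuming recommenderlab is installed, the toy data frame below (hypothetical users, films, and ratings, not the survey itself) is coerced to a realRatingMatrix. Only rated user-movie pairs appear as rows, so the unrated ones never need to be written out.

```r
library(recommenderlab)

# Hypothetical toy survey in long format: one row per (user, movie, rating).
# These names and values are made up for illustration.
toy_ratings <- data.frame(
  user   = c("u1", "u1", "u2", "u2", "u3"),
  movie  = c("Film A", "Film B", "Film A", "Film C", "Film B"),
  rating = c(5, 3, 4, 2, 1)
)

# The coercion reads the columns as user, item, rating, in that order.
toy_matrix <- as(toy_ratings, "realRatingMatrix")

dim(toy_matrix)           # users x movies
nratings(toy_matrix)      # number of ratings actually stored
as(toy_matrix, "matrix")  # dense view: unrated cells are shown as NA
```

The dense view is only for inspection; the model functions work on the sparse object directly.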
The first steps will be the simplest to implement: importing the CSV, removing the NAs, subsetting the data frame to the needed columns, and then coercing it to a realRatingMatrix. The next step is a bit more difficult: I will train the UBCF model to identify other users with a similar rating history and then use their scores to estimate what the target user would think of movies they have not watched yet.
My recommender will output predicted ratings for all of the unrated movies. Although the assignment calls for a ranking of the top 5, there are only 6 movies in total, so I will instead have it predict the strength of the preference on the 5-point scale (e.g., 3.5/5 or 2/5).
To evaluate the model, I plan to use k-fold cross-validation (I still need to do more research on this, as it is new to me). This withholds a portion of the actual user ratings and compares them to the model’s predictions. The difference is then measured with the Root Mean Square Error (RMSE), which will allow me to objectively quantify the model’s accuracy.
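To make the idea behind UBCF concrete, here is a simplified by-hand sketch in base R. It is not how recommenderlab implements UBCF internally; the ratings, user names, and the choice of nn = 2 are all assumptions for illustration.

```r
# Hypothetical toy ratings: rows are users, columns are movies, NA = not rated.
ratings <- rbind(
  target = c(A = 5, B = 4, C = 1, D = NA),
  u1     = c(A = 5, B = 5, C = 2, D = 4),
  u2     = c(A = 4, B = 5, C = 1, D = 5),
  u3     = c(A = 1, B = 2, C = 5, D = 1)
)

# Pearson correlation over the movies that both users rated.
user_similarity <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  cor(a[both], b[both])
}

others <- setdiff(rownames(ratings), "target")
sims <- sapply(others, function(u) {
  user_similarity(ratings["target", ], ratings[u, ])
})

# Keep the nn most similar users (all of them rated movie D in this toy data).
nn <- 2
neighbors <- names(sort(sims, decreasing = TRUE))[seq_len(nn)]

# Similarity-weighted average of the neighbors' ratings for movie D.
predicted_D <- sum(sims[neighbors] * ratings[neighbors, "D"]) / sum(sims[neighbors])
predicted_D
```

In this toy data, u1 and u2 rate like the target user and u3 rates the opposite way, so only u1 and u2 contribute to the estimate for movie D.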
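A rough base R sketch of how k-fold cross-validation splits the users is shown below. The seed is arbitrary, and the values of 75 users and 5 folds are assumptions taken from this report's setup; recommenderlab's evaluationScheme handles this splitting itself.

```r
set.seed(42)  # arbitrary seed so the fold assignment is repeatable

n_users <- 75  # size of the survey
k <- 5         # number of folds

# Shuffle the labels 1..k so that each user lands in exactly one fold.
fold_id <- sample(rep(seq_len(k), length.out = n_users))

for (fold in seq_len(k)) {
  test_users  <- which(fold_id == fold)  # held out in this round
  train_users <- which(fold_id != fold)  # used to fit the model
  cat("fold", fold, ": train on", length(train_users),
      "users, test on", length(test_users), "users\n")
}
```

Each user is tested exactly once, and the error metrics from the k rounds can then be compared or averaged.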
Anticipated Challenges:
Some of the challenges I anticipate stem from the very small size of my dataset. Because of this, I will have to focus on rating prediction accuracy. There are also many missing ratings, which may hamper my attempts at collaborative filtering. Cold starts were a challenge in the original Global Baseline Estimate and may return in this project. Evaluation sensitivity may be a challenge as well: with so little data, the error metrics could shift noticeably depending on which ratings are withheld. Finally, due to my lack of experience with recommender systems, I will have to do a lot of research to learn how to complete the assignment successfully within the timeframe of the deliverable.
library(tidyverse)
library(recommenderlab)
Rows: 450 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): profile_name, movie_title
dbl (2): account_id, rating
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Coerce the long-format survey to a realRatingMatrix
# (recommenderlab reads the columns as user, item, rating, in that order)
rating_matrix <- as(as.data.frame(tidy_movie_survey), "realRatingMatrix")

# Unlike the base R tutorial I followed (for an example of how to do this manually),
# I changed given to 1 instead of 12 to prevent my algorithm from failing
# (each user has at most 6 ratings)
eval_scheme <- evaluationScheme(rating_matrix, method = "cross-validation", k = 5, given = 1, goodRating = 3)

# I decided to define both of my algorithms earlier in my codebase rather than one at a time
algorithms <- list(
  "User-Based CF" = list(name = "UBCF", param = list(method = "Cosine")),
  "Item-Based CF" = list(name = "IBCF", param = list(method = "Cosine"))
)

# Here I run the evaluations
model_results <- evaluate(eval_scheme, algorithms, type = "ratings")

# Here I extract the results from the S4 list and convert the confusion matrices
# to tibbles before binding them together using purrr.
# Note: [[1]] keeps the metrics from the first cross-validation fold only.
rmse_comparison <- map_dfr(
  model_results,
  ~ as_tibble(getConfusionMatrix(.x)[[1]]),
  .id = "Algorithm"
) %>%
  select(Algorithm, RMSE, MSE, MAE) %>%
  arrange(RMSE)  # See which algorithm performed better

# Print the tibble
print(rmse_comparison)
Here I visualize the RMSE for my item-based and user-based models. The closer the score is to 0, the better the model is performing. As we can see, my item-based model returned an RMSE of 2.03 and my user-based model returned an RMSE of 2.08. Because of this, I will adjust my models to try to bring the error down. However, it is important to understand that while a “good” RMSE is often between 0 and 1, we do not want to push the model to a score of exactly 0. That is often a sign of overfitting, which can lead to its future predictions failing often or being “way off base”. In our case, with RMSE values of about 2 on a 5-point scale, the predictions are typically off by about two rating points; for example, for a movie that a user rated 1, the model might predict roughly 3. RMSE measures only the size of the error, so on its own it does not show whether the model is over- or underestimating.
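To show what these metrics measure, here is a small base R sketch with made-up withheld ratings and predictions (not values from my models). It computes MSE, RMSE, and MAE by hand, plus a signed mean error, which is the quantity that would indicate a consistent over- or under-prediction.

```r
# Hypothetical withheld ratings and the predictions made for them.
actual    <- c(1, 5, 3, 4, 2)
predicted <- c(3, 3, 3, 2, 4)

errors <- predicted - actual

mse  <- mean(errors^2)    # Mean Squared Error
rmse <- sqrt(mse)         # Root Mean Square Error, in rating points
mae  <- mean(abs(errors)) # Mean Absolute Error

# Signed mean error: above 0 means over-prediction on average, below 0 under-prediction.
mean_error <- mean(errors)

c(MSE = mse, RMSE = rmse, MAE = mae, mean_error = mean_error)
```

In this toy example the over- and under-predictions cancel out in the signed mean error, while RMSE and MAE stay well above 0. A large RMSE by itself therefore says nothing about the direction of the errors.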
Part of the difficulty in getting an accurate model on the first try is how small my dataset is. I have only 75 users and a very limited catalog of only 6 movies. We must keep in mind that recommenderlab is geared towards much larger datasets. The UBCF default is nn = 25, which means the algorithm looks for the 25 “nearest neighbors” in a dataset that only has… 75 users. I likely need to cut the number of neighbors the model uses down to 5 or 10 to increase accuracy. I am experiencing the same difficulty with my IBCF model because of the very limited number of items I am giving it. To increase accuracy, I will change k = 30 (the default) to k = 2 or 3; I am leaning more towards 2. A third difficulty may be my use of Cosine similarity, which ignores user bias (some users rate everything high, others rate everything low); I will switch to the Pearson correlation instead.
To understand and demonstrate the difference each of these changes makes, I will create a list with the changes and pass it in as algorithms before I rerun my evaluation and then extract and plot the results. Since much of this code repeats with minute changes (mostly the method and the numbers), I had Gemini generate it to save time on my end.
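The difference between the two similarity measures can be sketched in base R with three made-up users. The vectors and the helper cosine_sim are hypothetical. recommenderlab can also normalize ratings before computing similarities, so the effect inside the package may be smaller than in this raw comparison.

```r
# Plain cosine similarity on raw ratings (hypothetical helper).
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

generous <- c(5, 4, 5, 4)  # rates everything high
harsh    <- c(2, 1, 2, 1)  # same taste pattern, shifted 3 points lower
opposite <- c(4, 5, 4, 5)  # high ratings, but the opposite taste pattern

# Cosine on raw ratings: both comparisons look almost identical.
cosine_sim(generous, harsh)
cosine_sim(generous, opposite)

# Pearson centers each user first: same taste gives 1, opposite taste gives -1.
cor(generous, harsh)
cor(generous, opposite)
```

Cosine on raw ratings treats the harsh user and the opposite-taste user as nearly equally similar to the generous user, while Pearson separates them.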
# Define a "Grid Search" of different algorithms and parameters
algorithms <- list(
  # User-Based CF tests (tuning 'nn' - number of neighbors)
  "UBCF (Cosine, nn=25)"  = list(name = "UBCF", param = list(method = "Cosine", nn = 25)),
  "UBCF (Pearson, nn=25)" = list(name = "UBCF", param = list(method = "Pearson", nn = 25)),
  "UBCF (Pearson, nn=10)" = list(name = "UBCF", param = list(method = "Pearson", nn = 10)),
  "UBCF (Pearson, nn=5)"  = list(name = "UBCF", param = list(method = "Pearson", nn = 5)),
  # Item-Based CF tests (tuning 'k' - number of items to compare)
  # Notice 'k' is strictly kept under your 6-movie limit
  "IBCF (Cosine, k=3)"  = list(name = "IBCF", param = list(method = "Cosine", k = 3)),
  "IBCF (Pearson, k=3)" = list(name = "IBCF", param = list(method = "Pearson", k = 3)),
  "IBCF (Pearson, k=2)" = list(name = "IBCF", param = list(method = "Pearson", k = 2))
)

# Rerun the evaluation
model_results <- evaluate(eval_scheme, algorithms, type = "ratings")
# Re-extract the metrics from the rerun (same extraction as before)
# so that the plot reflects the grid search results
rmse_comparison <- map_dfr(
  model_results,
  ~ as_tibble(getConfusionMatrix(.x)[[1]]),
  .id = "Algorithm"
) %>%
  select(Algorithm, RMSE, MSE, MAE) %>%
  arrange(RMSE)

# grouping column for clean color-coding
rmse_plot_data <- rmse_comparison %>%
  mutate(Model_Family = ifelse(str_detect(Algorithm, "UBCF"), "User-Based CF", "Item-Based CF"))

# Generate the optimized plot
ggplot(rmse_plot_data, aes(x = reorder(Algorithm, RMSE), y = RMSE, fill = Model_Family)) +
  geom_col(color = "black", alpha = 0.9) +
  geom_text(aes(label = round(RMSE, 3)), vjust = -0.5, fontface = "bold", size = 4) +
  scale_fill_manual(values = c("User-Based CF" = "#2c3e50", "Item-Based CF" = "#e74c3c")) +
  theme_minimal() +
  labs(
    title = "Hyperparameter Tuning: Algorithm RMSE Comparison",
    subtitle = "Evaluating similarity metrics and neighborhood constraints on a 6-item catalog.",
    x = "Algorithm & Parameters",
    y = "Root Mean Square Error (RMSE)",
    fill = "Algorithm Family"
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
    plot.title = element_text(face = "bold", size = 14),
    panel.grid.major.x = element_blank()  # Cleans up vertical background lines
  )
Looking at the visualization above, we notice that Pearson with 10 nearest neighbors gives the lowest UBCF RMSE, 1.658, while the item-based model using Cosine with k = 3 gives an RMSE of 1.943. What does this mean? By restricting the neighborhood size (nn) and k, the model was better able to filter out the users and items that contribute more “noise”, which reduced the overall error.
Tasks for the future
Future iterations of this assignment will include expanding the item and user base to provide better nearest-neighbor calculations. While I only had one cold-start problem in my original dataset, cold starts are more likely to occur with a larger item and user base and will need to be mitigated. I will also add information on genres, release dates, and other item attributes, which will allow me to use content-based filtering and transition to a hybrid recommender system similar to those seen at companies such as Netflix and Audible, albeit at a much smaller scale.
Citations
Google DeepMind. (2026). Gemini 3 Pro [Large language model]. https://gemini.google.com. Accessed April 26, 2026.