I compute pairwise similarity between users from their existing ratings, using user-user collaborative filtering. The work involves compiling the rating data and running similarity tests. Because this is a personalized recommendation system, it is heavily dependent on user data and input.
Code Base
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(recommenderlab)
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
Loading required package: arules
Attaching package: 'arules'
The following object is masked from 'package:dplyr':
recode
The following objects are masked from 'package:base':
abbreviate, write
Loading required package: proxy
Attaching package: 'proxy'
The following object is masked from 'package:Matrix':
as.matrix
The following objects are masked from 'package:stats':
as.dist, dist
The following object is masked from 'package:base':
as.matrix
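The code that produced the evaluation output below is not shown above. The following is a minimal sketch of the steps that output implies, assuming a ratings data frame named ratings_df with user, movie, and rating columns; every object name except eval_results is a placeholder, and the exact parameters used originally are not known.

# ratings_df is assumed to have three columns: user, movie, rating
ratings_matrix <- as(ratings_df, "realRatingMatrix")

# Pairwise cosine similarity between users on the existing ratings
user_sim <- similarity(ratings_matrix, method = "cosine", which = "users")

# 80/20 train/test split; the 'given' value used originally is not shown
scheme <- evaluationScheme(ratings_matrix, method = "split", train = 0.8, given = -1)

# User-based CF on mean-centered ratings, cosine similarity, 3 nearest neighbors
eval_results <- evaluate(scheme, method = "UBCF",
                         parameter = list(method = "Cosine", normalize = "center", nn = 3),
                         type = "ratings")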
UBCF run fold/sample [model time/prediction time]
1 [0.001sec/0.018sec]
cat("RMSE / MSE / MAE:\n")
RMSE / MSE / MAE:
print(avg(eval_results))
RMSE MSE MAE
[1,] 1.409958 1.987982 1.307616
Conclusion
This project built a User-Based Collaborative Filtering recommender using the recommenderlab package in R. The model computes cosine similarity between users on their mean-centered ratings, finds the 3 most similar neighbors, and predicts ratings for unseen movies, outputting the top 3 personalized recommendations per user. Most users received recommendations successfully. Some returned character(0), either because they had too few overlapping ratings with their neighbors or because they had already rated or watched every movie in the list, a common limitation with sparse data. Evaluation on an 80/20 train/test split produced an RMSE of 1.41 and an MAE of 1.31, meaning predictions were off by about 1.3 stars on average. This is higher than ideal but expected given only 16 users and a heavily sparse rating matrix; a larger dataset would substantially improve performance.
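For reference, the top-3 recommendation step described above could be reproduced along these lines. This is a sketch under the same assumptions as the earlier one (ratings_matrix, rec, and top3 are placeholder names, not the original code):

# Fit UBCF on the full rating matrix with the same settings
rec <- Recommender(ratings_matrix, method = "UBCF",
                   parameter = list(method = "Cosine", normalize = "center", nn = 3))

# Top 3 recommendations per user; users with nothing left to recommend
# come back as character(0) in the coerced list
top3 <- predict(rec, ratings_matrix, n = 3, type = "topNList")
as(top3, "list")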