For this assignment, I created a personalized recommendation system using user-to-user collaborative filtering on my data set from assignment 2A. To do this, I used an R package called recommenderlab because it provides a user-based collaborative filtering (UBCF) model. The system recommends to each user one movie they have not yet rated and are likely to enjoy. Afterwards, I evaluated the system using cross-validation and used the results to draw my final conclusions.
Reading in the data
To start off, I took the ratings data from my previous assignment and entered it as a tibble using tribble().
library(tidyverse)
The resulting ratings table (NA means the user has not rated that movie):
User     Oppenheimer  Wicked  Top Gun: Maverick  Zootopia 2  The Housemaid  Captain America: Brave New World
David    5            3       4                  4           NA             NA
Aaron    4            NA      5                  4           4              NA
Josh     4            4       3                  3           NA             NA
Cameron  NA           3       NA                 5           NA             4
June     NA           NA      3                  NA          5              3
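The code that built the tibble is not shown in this write-up. As a hedged sketch, the same ratings can be entered directly as a base R matrix, with users as rows and movies as columns. The name `ratings` is hypothetical, and this is not the author's original tribble() code.

```r
# Hypothetical base R version of the ratings table (NA = not rated).
# Assumption: the variable name `ratings` is illustrative, not the original code.
ratings <- matrix(
  c(5,  3,  4,  4,  NA, NA,
    4,  NA, 5,  4,  4,  NA,
    4,  4,  3,  3,  NA, NA,
    NA, 3,  NA, 5,  NA, 4,
    NA, NA, 3,  NA, 5,  3),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("David", "Aaron", "Josh", "Cameron", "June"),
    c("Oppenheimer", "Wicked", "Top Gun: Maverick", "Zootopia 2",
      "The Housemaid", "Captain America: Brave New World")
  )
)

# Count the observed (non-NA) ratings: 18 in total
print(sum(!is.na(ratings)))
```

With recommenderlab attached, `as(ratings, "realRatingMatrix")` should then yield a 5 x 6 realRatingMatrix with 18 ratings.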
Using recommenderlab
From there, I used recommenderlab, an existing R package for building and evaluating recommender systems, because it implements the user-based collaborative filtering (UBCF) algorithm I was interested in.
Before I could use recommenderlab, I had to convert my wide-format data into a realRatingMatrix. This is the package's sparse matrix class for numeric ratings, and it does not store unrated (NA) entries.
library(recommenderlab)
5 x 6 rating matrix of class 'realRatingMatrix' with 18 ratings.
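Next, I trained a UBCF recommender model with recommenderlab on the ratings from all 5 users. UBCF works by finding users with similar movie tastes. For each user (David, Aaron, and so on), it identifies the other users whose ratings most resemble theirs. It then predicts that user's missing ratings from those neighbors' ratings.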
Recommender of type 'UBCF' for 'realRatingMatrix'
learned using 5 users.
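To make the idea concrete, here is a simplified user-based collaborative filtering sketch in base R. It is an illustration under stated assumptions, not recommenderlab's implementation. It works in three steps:

1. Mean-center each user's ratings.
2. Measure cosine similarity between users over the movies both have rated.
3. Predict an unrated movie from positively similar users who rated it.

recommenderlab's UBCF differs in details (normalization, neighborhood size, handling of missing values), so its recommendations need not match these. All function and variable names here are hypothetical.

```r
# Simplified user-based collaborative filtering in base R.
# Assumption: an illustrative sketch, not recommenderlab's UBCF code.
ratings <- matrix(
  c(5,  3,  4,  4,  NA, NA,
    4,  NA, 5,  4,  4,  NA,
    4,  4,  3,  3,  NA, NA,
    NA, 3,  NA, 5,  NA, 4,
    NA, NA, 3,  NA, 5,  3),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("David", "Aaron", "Josh", "Cameron", "June"),
    c("Oppenheimer", "Wicked", "Top Gun: Maverick", "Zootopia 2",
      "The Housemaid", "Captain America: Brave New World")
  )
)

# 1. Mean-center each user's ratings so generous and harsh raters are comparable
user_means <- rowMeans(ratings, na.rm = TRUE)
centered <- sweep(ratings, 1, user_means)

# 2. Cosine similarity between two users over the movies both have rated
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  denom <- sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2))
  if (denom == 0) return(NA_real_)   # no information to compare on
  sum(a[both] * b[both]) / denom
}

users <- rownames(ratings)
sim <- matrix(NA_real_, length(users), length(users),
              dimnames = list(users, users))
for (i in users) for (j in users) sim[i, j] <- cosine_sim(centered[i, ], centered[j, ])

# 3. Predict a rating from positively similar users who rated the movie
predict_rating <- function(user, item) {
  neighbors <- setdiff(users, user)
  w <- sim[user, neighbors]
  r <- centered[neighbors, item]
  keep <- !is.na(w) & w > 0 & !is.na(r)
  if (!any(keep)) return(NA_real_)   # no usable neighbor for this movie
  user_means[[user]] + sum(w[keep] * r[keep]) / sum(w[keep])
}

# Top-1 recommendation: the unrated movie with the highest predicted rating
top1 <- sapply(users, function(user) {
  unseen <- colnames(ratings)[is.na(ratings[user, ])]
  preds <- sapply(unseen, function(item) predict_rating(user, item))
  if (all(is.na(preds))) NA_character_ else names(which.max(preds))
})
print(round(sim, 2))
print(top1)
```

In this simplified version, Aaron's centered ratings point away from every other user's, so he gets no recommendation at all. Some similarity entries are also NA because the overlap between users is tiny. Both effects are symptoms of the same sparsity that the write-up blames for weak results.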
Next, I used recommender_model to predict, for each user, the single unrated movie they would most likely enjoy (a top-1 list, n = 1).
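# Top-1 recommendation per user, chosen from movies that user has not rated
predictions <- predict(recommender_model, rating_matrix, n = 1)
# Convert the topNList into a plain list of movie titles, one entry per user
prediction_list <- as(predictions, "list")
names(prediction_list) <- rownames(rating_matrix)
print(prediction_list)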
$David
[1] "The Housemaid"
$Aaron
[1] "Wicked"
$Josh
[1] "The Housemaid"
$Cameron
[1] "The Housemaid"
$June
[1] "Oppenheimer"
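These are the recommendations produced by the UBCF model. The system does return a recommendation for every user, but I believe my data set is too small for it to work effectively. Three of the five users (David, Josh, and Cameron) are all recommended The Housemaid, which only two users rated (4 and 5). Those three are exactly the users who have not rated it. This suggests the model is mostly echoing the few high ratings available rather than finding users with genuinely similar tastes.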
Evaluating the Recommender System
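To evaluate the recommender system, I set up a 4-fold cross-validation scheme, because with only 18 ratings a single train/test split would leave very little data for testing. Cross-validation in recommenderlab splits the users into 4 folds. In each of the 4 runs, the model is trained on the users in 3 folds and tested on the users in the remaining fold, so every user is tested exactly once. With only 5 users, the folds cannot all be the same size. For each test user, given = 1 shows the model one of that user's ratings and withholds the rest, which the model then has to predict. goodRating = 4 treats ratings of 4 or higher as "good" when scoring top-N recommendations.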
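The splitting logic described above can be sketched in base R. This is an assumption-laden illustration, not recommenderlab's code. The seed, the random assignment, and names such as `fold` and `known` are hypothetical, and evaluationScheme() will produce its own split.

```r
# Illustrative sketch of 4-fold cross-validation over users with given = 1.
# Assumption: this mimics the idea behind evaluationScheme(), not its code.
ratings <- matrix(
  c(5,  3,  4,  4,  NA, NA,
    4,  NA, 5,  4,  4,  NA,
    4,  4,  3,  3,  NA, NA,
    NA, 3,  NA, 5,  NA, 4,
    NA, NA, 3,  NA, 5,  3),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("David", "Aaron", "Josh", "Cameron", "June"),
    c("Oppenheimer", "Wicked", "Top Gun: Maverick", "Zootopia 2",
      "The Housemaid", "Captain America: Brave New World")
  )
)

set.seed(1)                      # hypothetical seed, for a reproducible split
k <- 4
users <- rownames(ratings)
# Deal 5 users into 4 folds at random; one fold necessarily gets 2 users
fold <- setNames(sample(rep_len(seq_len(k), length(users))), users)

for (f in seq_len(k)) {
  train_users <- users[fold != f]    # the model would be fit on these users
  test_users  <- users[fold == f]    # these users are held out for testing
  cat(sprintf("Fold %d: train on %s\n", f, paste(train_users, collapse = ", ")))
  for (u in test_users) {
    rated   <- colnames(ratings)[!is.na(ratings[u, ])]
    known   <- sample(rated, 1)        # given = 1: one rating shown to the model
    unknown <- setdiff(rated, known)   # withheld ratings the model must predict
    cat(sprintf("  test %s | known: %s | withheld: %s\n",
                u, known, paste(unknown, collapse = ", ")))
  }
}
```

Every test user is reduced to a single known rating. That is very little to go on when looking for similar users.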
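# 4-fold cross-validation over users; each test user keeps 1 known rating,
# and ratings of 4 or higher count as "good"
eval_scheme <- evaluationScheme(data = rating_matrix, method = "cross-validation",
                                k = 4, given = 1, goodRating = 4)
print(eval_scheme)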
Evaluation scheme with 1 items given
Method: 'cross-validation' with 4 run(s).
Good ratings: >=4.000000
Data set: 5 x 6 rating matrix of class 'realRatingMatrix' with 18 ratings.
Next, I ran the evaluation with eval_scheme and calculated the error metrics for the predicted ratings.
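All of the resulting error metrics were NaN. The most likely cause is that the dataset is too small. With only one known rating per test user and little overlap between users, the model could not produce predictions for the withheld ratings, leaving nothing to compute the errors from.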
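recommenderlab's calcPredictionAccuracy() reports RMSE, MSE, and MAE for rating predictions. Its internals may differ from this, but the base R arithmetic below shows the likely mechanism behind the NaN values. The helper names `rmse` and `mae`, and the rule of dropping NA pairs, are illustrative assumptions.

```r
# Error metrics over withheld ratings that actually received a prediction.
# Assumption: hypothetical helpers, not recommenderlab's implementation.
rmse <- function(predicted, actual) {
  ok <- !is.na(predicted) & !is.na(actual)   # keep only comparable pairs
  sqrt(mean((predicted[ok] - actual[ok])^2))
}
mae <- function(predicted, actual) {
  ok <- !is.na(predicted) & !is.na(actual)
  mean(abs(predicted[ok] - actual[ok]))
}

# When some predictions exist, the metrics are ordinary numbers
print(rmse(c(4, 3), c(5, 3)))   # sqrt(((4 - 5)^2 + 0^2) / 2), about 0.707
print(mae(c(4, 3), c(5, 3)))    # (1 + 0) / 2 = 0.5

# When no withheld rating can be predicted, every prediction is NA, no pairs
# remain, and the mean over zero terms is 0/0, which R reports as NaN
print(rmse(c(NA_real_, NA_real_), c(5, 3)))   # NaN
print(mae(c(NA_real_, NA_real_), c(5, 3)))    # NaN
```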
Conclusion
The UBCF-based personalized recommender generated a recommendation for every user. However, the cross-validation error metrics were all NaN, which indicates that the system is not robust with this little data. To improve it, more data is essential: more users and more ratings per user would give the model enough overlap between users to generate meaningful predictions.