Using an item-to-item collaborative filtering model from the recommenderlab package, this assignment builds a personalized recommender system on the data generated by my Global Baseline Estimate assignment. The model generates top movie recommendations for each person who took the survey from that assignment.
Code Base
First, load the necessary packages and the data from the Global Baseline Estimate assignment.
library(recommenderlab)
library(tidyverse)
df <- read.csv("ghibli_ratings_augmented.csv")
The data then has to be converted from a data frame into a rating matrix, which is the format recommenderlab requires.
# Set the critic names as row names so that the matrix does not try to include them.
mat <- df %>%
  column_to_rownames("Critic") %>%
  as.matrix()
# Ratings of 0 set to NA
mat[mat == 0] <- NA
rating_matrix <- as(mat, "realRatingMatrix")
Train the item-based collaborative filtering model.
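The training step can be sketched roughly as follows. This is a minimal example under stated assumptions, not the exact code used for this assignment: the toy ratings, the object names (toy, ibcf_model, top_picks) and the choice of k = 5 are all made up for illustration.

```r
library(recommenderlab)

set.seed(42)
# Hypothetical toy data: 20 critics x 8 films, ratings from 1 to 5.
toy <- matrix(sample(as.numeric(1:5), 20 * 8, replace = TRUE), nrow = 20,
              dimnames = list(paste0("critic", 1:20), paste0("film", 1:8)))
# Blank out two films per critic so every critic has unrated films to recommend.
for (i in seq_len(nrow(toy))) {
  toy[i, c((i - 1) %% 8 + 1, (i + 2) %% 8 + 1)] <- NA
}
toy_ratings <- as(toy, "realRatingMatrix")

# Fit item-based collaborative filtering.
# k (assumed value) is the number of most similar items kept for each film.
ibcf_model <- Recommender(toy_ratings, method = "IBCF",
                          parameter = list(k = 5))

# Ask for the single top unrated film for every critic.
top_picks <- predict(ibcf_model, newdata = toy_ratings, n = 1)
top_list <- as(top_picks, "list")
head(top_list, 3)
```

With very few critics, some entries of top_list can come back empty, which matches the problem described later with the original five-critic data.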
The cross-validation results above show that:
About 32 percent of the recommended films were ones that the user actually rated well.
About 51 percent of the films that a user would like were identified by the model.
About 28 percent of the films that the user would not like were recommended.
Per user, the model averages about 0.31 correct recommendations and 0.64 incorrect ones, which is consistent with the precision above since 0.31 / (0.31 + 0.64) is roughly 0.33.
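The relationship between these figures can be checked with a few lines of base R. This is only a sketch: it assumes the two per-user figures are average counts of true positives (0.31) and false positives (0.64), and the second set of counts is purely hypothetical.

```r
# Standard confusion-matrix metrics for top-N recommendations.
precision <- function(tp, fp) tp / (tp + fp)            # share of recommendations that were good
recall <- function(tp, fn) tp / (tp + fn)               # share of good films that were found
false_positive_rate <- function(fp, tn) fp / (fp + tn)  # share of bad films that were recommended

# Assumption: 0.31 and 0.64 are average per-user counts of TP and FP.
precision(tp = 0.31, fp = 0.64)  # roughly 0.326, in line with the reported 32 percent

# Hypothetical counts, only to show how all three metrics are computed.
toy_counts <- c(TP = 2, FP = 3, FN = 2, TN = 8)
toy_precision <- precision(toy_counts[["TP"]], toy_counts[["FP"]])
toy_recall <- recall(toy_counts[["TP"]], toy_counts[["FN"]])
toy_fpr <- false_positive_rate(toy_counts[["FP"]], toy_counts[["TN"]])
```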
This means that the model is not accurate, most likely because the dataset is small and gave it too little material for precise analysis.
In general, the small size of the data created several issues when implementing the item-to-item collaborative filtering model for this assignment. Originally, the data had only five critics, and IBCF was not able to return any recommendations because of this. So an LLM was asked to generate more data, after which the model was able to return the top recommended movie for each user. The LLM was also consulted for other recommenderlab questions, such as how to turn a data frame into a rating matrix, typo debugging, how to turn the nested recommender_list results into a data frame, and how to interpret the output of the cross-validation. The LLM conversation that supplemented this assignment can be found here.
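As an illustration of the list-to-data-frame step mentioned above, one possible approach in base R is sketched here. The list recs is a hypothetical stand-in for the per-user recommendation list, and the critic and film names are invented for the example.

```r
# Hypothetical stand-in for a per-user recommendation list:
# one character vector of film titles per critic, possibly empty.
recs <- list(
  critic1 = c("Spirited Away"),
  critic2 = character(0),
  critic3 = c("Ponyo", "Porco Rosso")
)

# Repeat each critic name once per recommended film, then flatten the films.
recs_df <- data.frame(
  critic = rep(names(recs), lengths(recs)),
  rank = sequence(lengths(recs)),
  film = unlist(recs, use.names = FALSE),
  stringsAsFactors = FALSE
)
recs_df
```

Critics with no recommendations simply contribute no rows, so they would need to be joined back in separately if they should appear in the final table.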
Other sources that informed this assignment:
https://www.r-bloggers.com/2020/04/movie-recommendation-with-recommenderlab/
https://cran.r-project.org/web/packages/recommenderlab/recommenderlab.pdf
To verify and extend this work, one could compare multiple recommendation algorithms and use cross-validation again to see whether any of them gives more accurate results on the same small dataset.
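A comparison along those lines might look like the sketch below. It is not a result from this assignment: the toy ratings, the number of folds, the given and goodRating values, and the algorithm parameters are all assumptions chosen so that the example runs on a small matrix.

```r
library(recommenderlab)

set.seed(42)
# Hypothetical toy data: 20 critics x 8 films, each critic rates 6 films.
toy <- matrix(sample(as.numeric(1:5), 20 * 8, replace = TRUE), nrow = 20,
              dimnames = list(paste0("critic", 1:20), paste0("film", 1:8)))
for (i in seq_len(nrow(toy))) {
  toy[i, c((i - 1) %% 8 + 1, (i + 2) %% 8 + 1)] <- NA
}
toy_ratings <- as(toy, "realRatingMatrix")

# 4-fold cross-validation: 3 ratings per test user are shown to the model,
# the rest are held out; ratings of 4 or more count as "good" (assumption).
scheme <- evaluationScheme(toy_ratings, method = "cross-validation",
                           k = 4, given = 3, goodRating = 4)

# Candidate algorithms to compare; parameter values are illustrative only.
algorithms <- list(
  IBCF = list(name = "IBCF", param = list(k = 5)),
  UBCF = list(name = "UBCF", param = list(nn = 5)),
  POPULAR = list(name = "POPULAR", param = NULL),
  RANDOM = list(name = "RANDOM", param = NULL)
)

# Evaluate top-1 and top-3 lists, then average the confusion matrices over folds.
results <- evaluate(scheme, algorithms, type = "topNList", n = c(1, 3))
avg_metrics <- lapply(results, avg)
avg_metrics
```

Comparing the precision and recall columns across the entries of avg_metrics would show whether any alternative does better than IBCF on the same data.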