Assignment 11: Recommender Systems

Author

Emily El Mouaquite

Approach

This assignment builds a personalized recommender system for the data generated in my Global Baseline Estimate assignment, using item-to-item collaborative filtering as implemented in the recommenderlab package. The model generates a top movie recommendation for each person who took the survey in that assignment.

Code Base

Begin by loading the necessary packages and the data from the Global Baseline Estimate assignment.

library(recommenderlab)

Loading required package: Matrix

Loading required package: arules


Attaching package: 'arules'

The following objects are masked from 'package:base':

    abbreviate, write

Loading required package: proxy


Attaching package: 'proxy'

The following object is masked from 'package:Matrix':

    as.matrix

The following objects are masked from 'package:stats':

    as.dist, dist

The following object is masked from 'package:base':

    as.matrix

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::expand() masks Matrix::expand()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ tidyr::pack()   masks Matrix::pack()
✖ dplyr::recode() masks arules::recode()
✖ tidyr::unpack() masks Matrix::unpack()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

df <- read.csv("ghibli_ratings_augmented.csv")

recommenderlab works on a rating matrix rather than a data frame, so the data must be converted first.

#Set the critic names as rownames so that the matrix does not try to include them. 
mat <- df %>%
  column_to_rownames("Critic") %>%
  as.matrix()
#Ratings of 0 set to NA
mat[mat == 0] <- NA

rating_matrix <- as(mat, "realRatingMatrix")
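Before training, it can help to check how many ratings each critic actually has, since the cross-validation step later drops users with fewer ratings than `given`. The following is a minimal base-R sketch on a hypothetical toy data frame (`toy_df` and its film columns are made up for illustration); it mirrors the zero-to-NA conversion above without requiring recommenderlab.

```r
# Hypothetical toy data in the same shape as the survey data:
# one row per critic, one column per film, 0 meaning "not rated".
toy_df <- data.frame(
  Critic     = c("A", "B", "C"),
  Film.One   = c(5, 0, 3),
  Film.Two   = c(4, 0, 0),
  Film.Three = c(0, 2, 5),
  Film.Four  = c(1, 0, 4)
)

# Base-R equivalent of column_to_rownames() followed by as.matrix().
toy_mat <- as.matrix(toy_df[, -1])
rownames(toy_mat) <- toy_df$Critic

# Treat 0 as "missing", as in the assignment.
toy_mat[toy_mat == 0] <- NA

# Number of films each critic rated, and the share of empty cells overall.
ratings_per_critic <- rowSums(!is.na(toy_mat))
sparsity <- mean(is.na(toy_mat))

ratings_per_critic
sparsity
```

Applied to the real `mat`, the same two lines would show which critics have too few ratings to take part in the evaluation.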

Train the item-based collaborative filtering model.

recommender_model <- Recommender(
  rating_matrix,
  method = "IBCF", #item-based collaborative filtering
  parameter = list(k = 3)
)
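To make the idea behind item-based collaborative filtering concrete, here is a simplified base-R sketch on hypothetical toy data. It computes cosine similarity between film columns over co-rated critics, then predicts a missing rating as a similarity-weighted average of the critic's ratings on the `k` most similar films. This is an illustration of the general technique only; recommenderlab's own IBCF implementation differs in details such as normalization and how neighbors are stored, and the names `toy`, `cosine_sim`, and `predict_rating` are assumptions made for this example.

```r
# Hypothetical toy ratings: rows are critics, columns are films, NA = not rated.
toy <- matrix(
  c( 5, 4, NA,  1,
     4, 5,  2, NA,
     1, 2,  5,  4,
    NA, 1,  4,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("F1", "F2", "F3", "F4"))
)

# Cosine similarity between two film columns, using only critics who rated both.
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (sum(both) < 2) return(NA_real_)
  sum(x[both] * y[both]) / sqrt(sum(x[both]^2) * sum(y[both]^2))
}

# Film-by-film similarity matrix (diagonal left as NA).
films <- colnames(toy)
sim <- matrix(NA_real_, length(films), length(films),
              dimnames = list(films, films))
for (i in films) {
  for (j in films) {
    if (i != j) sim[i, j] <- cosine_sim(toy[, i], toy[, j])
  }
}

# Predict one critic's rating for one film from the k most similar
# films that the critic has already rated.
predict_rating <- function(user, item, k = 2) {
  rated <- names(which(!is.na(toy[user, ])))
  sims <- setNames(sim[item, rated], rated)
  sims <- head(sort(sims[!is.na(sims)], decreasing = TRUE), k)
  if (length(sims) == 0) return(NA_real_)
  sum(sims * toy[user, names(sims)]) / sum(sims)
}

pred <- predict_rating("A", "F3")
pred
```

With very few critics, many film pairs have little or no overlap, so similarities become unreliable or undefined, which is consistent with the empty recommendations seen for some users.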

Generate the top movie recommendation for each user.

top_movies <- 1

predictions <- predict(
  recommender_model,
  rating_matrix,
  n = top_movies
)

recommended_list <- as(predictions, "list")
names(recommended_list) <- df$Critic
recommended_list

$Ally
[1] "Ponyo"

$Barbara
[1] "Pom.Poko"

$Martin
[1] "Ponyo"

$Lucy
[1] "Pom.Poko"

$Amal
character(0)

$Sophie
[1] "Ponyo"

$Howl
[1] "Kiki.s.Delivery.Service"

$Calcifer
[1] "Howls.Moving.Castle"

$Nausicaa
[1] "Howls.Moving.Castle"

$Sheeta
[1] "Howls.Moving.Castle"

$Pazu
[1] "Howls.Moving.Castle"

$Satsuki
[1] "Spirited.Away"

$Mei
character(0)

$Totoro
[1] "Ponyo"

$Kiki
[1] "Princess.Mononoke"

$Jiji
character(0)

$Chihiro
[1] "Spirited.Away"

$Haku
[1] "Spirited.Away"

$San
[1] "Ponyo"

$Ashitaka
[1] "Ponyo"

$Ponyo2
character(0)

$Sosuke
[1] "Ponyo"

$Arrietty
[1] "Ponyo"

$Shawn
[1] "Princess.Mononoke"

$Marnie
[1] "Pom.Poko"

$Anna
character(0)

$Taeko
[1] "Spirited.Away"

$Seita
[1] "Howls.Moving.Castle"

$Setsuko
[1] "Kiki.s.Delivery.Service"

$Pazu2
[1] "Princess.Mononoke"

$Yubaba
[1] "Kiki.s.Delivery.Service"

$Lin
[1] "Spirited.Away"

$Zeniba
[1] "Kiki.s.Delivery.Service"

$Heen
[1] "Kiki.s.Delivery.Service"

$Kamaji
[1] "Pom.Poko"

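Turn the nested list result, recommended_list, back into a data frame. Critics with no recommendation (character(0)) are dropped by unnest(), so the result has 30 rows for the 35 critics.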

df_recommendations <- recommended_list %>%
  enframe(name = "Critic", value = "Film") %>%
  unnest(cols = Film)
df_recommendations

# A tibble: 30 × 2
   Critic   Film                   
   <chr>    <chr>                  
 1 Ally     Ponyo                  
 2 Barbara  Pom.Poko               
 3 Martin   Ponyo                  
 4 Lucy     Pom.Poko               
 5 Sophie   Ponyo                  
 6 Howl     Kiki.s.Delivery.Service
 7 Calcifer Howls.Moving.Castle    
 8 Nausicaa Howls.Moving.Castle    
 9 Sheeta   Howls.Moving.Castle    
10 Pazu     Howls.Moving.Castle    
# ℹ 20 more rows
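If the critics without a recommendation should stay visible in the table, one option is to fill their entry with NA instead of dropping the row. tidyr's `unnest()` has a `keep_empty` argument for this purpose; the sketch below shows the same idea in base R on a hypothetical list (`rec_list`) shaped like `recommended_list`.

```r
# Hypothetical list in the same shape as recommended_list:
# one character vector per critic, possibly empty.
rec_list <- list(
  Ally = "Ponyo",
  Amal = character(0),
  Howl = "Kiki.s.Delivery.Service"
)

# Take the first recommendation, or NA when the model returned none.
first_or_na <- function(x) if (length(x) > 0) x[1] else NA_character_
films <- vapply(rec_list, first_or_na, character(1))

rec_df <- data.frame(
  Critic = names(rec_list),
  Film = unname(films),
  stringsAsFactors = FALSE
)
rec_df
```

Keeping the NA rows makes it easier to see at a glance how many critics the model could not serve.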

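Validate the recommender results using cross-validation. Note that the evaluate() call below does not pass parameter = list(k = 3), so it evaluates IBCF with recommenderlab's default settings rather than the exact configuration trained above.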

set.seed(123)

cv <- evaluationScheme(
  rating_matrix,
  method = "cross-validation",
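  k = 5,          #number of folds
  given = 3,      #ratings per test user shown to the model; the rest are held out
  goodRating = 4  #ratings of 4 or higher count as a film the user liked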
)

Warning in .local(data, ...): Dropping these users from the evaluation since they have fewer rating than specified in given!
These users are 1, 35

results_cv <- evaluate(
  cv,
  method = "IBCF",
  n = c(1)
)

IBCF run fold/sample [model time/prediction time]
     1  [0.012sec/0.012sec] 
     2  [0sec/0.001sec] 
     3  [0.001sec/0.001sec] 
     4  [0.001sec/0.001sec] 
     5  [0.001sec/0.001sec]

avg(results_cv)

            TP        FP        FN       TN        N precision recall  TPR
[1,] 0.3111111 0.6444444 0.3333333 1.888889 3.177778 0.3238095   0.51 0.51
           FPR n
[1,] 0.2789418 1

Conclusion

The cross-validation results above indicate that:

About 32 percent of the recommended films were ones that the user actually rated well.
About 51 percent of the films that a user would like were identified by the model.
About 28 percent of films that the user would not like are being recommended.
TP and FP are average counts per test user rather than percentages: with one film requested per user, the model made about 0.31 correct and 0.64 incorrect recommendations per user on average.
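As a rough cross-check of these readings, the standard definitions of precision, recall, and false positive rate can be applied directly to the averaged counts printed by avg(results_cv). This is a sketch only: the ratios of averaged counts need not match the precision, recall, and FPR columns exactly, presumably because recommenderlab averages those ratios across users and folds rather than computing them from the averaged counts.

```r
# Averaged confusion counts copied from the avg(results_cv) output above.
tp <- 0.3111111
fp <- 0.6444444
fn <- 0.3333333
tn <- 1.888889

# Standard definitions applied to the averaged counts.
precision <- tp / (tp + fp)   # share of recommended films that were liked
recall    <- tp / (tp + fn)   # share of liked films that were recommended
fpr       <- fp / (fp + tn)   # share of not-liked films that were recommended

# The four counts should add up to the N column (held-out films per user).
n_total <- tp + fp + fn + tn

round(c(precision = precision, recall = recall, FPR = fpr, N = n_total), 3)
```

The values land close to, but not exactly on, the reported 0.32, 0.51, and 0.28, while the sum of the four counts reproduces the N column.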

These figures indicate that the model is not accurate. The most likely cause is the small dataset, which gave the model little material from which to estimate similarities between films.

The small size of the data caused several problems while implementing the item-to-item collaborative filtering model. Originally, the data had only five critics, and IBCF could not return any recommendations. An LLM was therefore asked to generate more data, after which the model was able to return a top recommended movie for most users. The LLM was also consulted on other recommenderlab questions: how to turn a data frame into a rating matrix, how to debug typos, how to convert the nested recommended_list result into a data frame, and how to interpret the cross-validation output. The LLM conversation that supplemented this assignment can be found here.

Other sources that informed this assignment:

https://www.r-bloggers.com/2020/04/movie-recommendation-with-recommenderlab/

https://cran.r-project.org/web/packages/recommenderlab/recommenderlab.pdf

To verify and extend this work, one might compare multiple recommendation algorithms and then use cross-validation again to see whether any of them give more accurate results on the same small dataset.
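A comparison along these lines could be sketched as follows. recommenderlab's evaluate() accepts a named list of algorithms and runs each one on the same evaluation scheme. The data here is a randomly generated, hypothetical rating matrix (`toy`), not the Ghibli survey, so the numbers it produces say nothing about the real dataset; in practice `rating_matrix` would take its place. The choice of algorithms and parameters is also just an assumption for illustration.

```r
library(recommenderlab)

set.seed(123)

# Hypothetical toy data: 40 users x 8 films, ratings 1-5, roughly 30% missing.
toy <- matrix(
  sample(c(NA, 1:5), 40 * 8, replace = TRUE, prob = c(0.3, rep(0.14, 5))),
  nrow = 40,
  dimnames = list(paste0("user", 1:40), paste0("film", 1:8))
)
toy_ratings <- as(toy, "realRatingMatrix")

# Keep only users with more ratings than `given`, so none are dropped later.
toy_ratings <- toy_ratings[rowCounts(toy_ratings) > 3, ]

scheme <- evaluationScheme(
  toy_ratings,
  method = "cross-validation",
  k = 5,
  given = 3,
  goodRating = 4
)

# Candidate algorithms, all evaluated on the same folds.
algorithms <- list(
  IBCF    = list(name = "IBCF",    param = list(k = 3)),
  UBCF    = list(name = "UBCF",    param = NULL),
  POPULAR = list(name = "POPULAR", param = NULL),
  RANDOM  = list(name = "RANDOM",  param = NULL)
)

results <- evaluate(scheme, algorithms, type = "topNList", n = 1,
                    progress = FALSE)

# Averaged confusion-matrix metrics per algorithm.
lapply(results, avg)
```

Including POPULAR and RANDOM gives simple baselines, which makes it easier to judge whether a collaborative filtering model is adding anything on such a small dataset.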