The goal of this assignment is to give you practice working with accuracy and other recommender system metrics.

Load Data & Goals

In this assignment we will attempt to build the best recommender model on the Jester dataset, which contains user ratings of a set of jokes.

library(ggplot2)
library(recommenderlab)
library(dplyr)
library(kableExtra)
data("Jester5k")

Explore Data

The exploratory visuals and summaries below show the dimensions of the dataset, the distribution of raw and centered ratings, how many jokes each user rated, and the overall rating density:

dim(Jester5k)
## [1] 5000  100
# Ratings
all_ratings <- getRatings(Jester5k)
ggplot() + geom_histogram(aes(all_ratings), binwidth = 1, col="white") + labs(title = "User Ratings", x="Ratings", y="Count") 

all_ratings_norm <- getRatings(normalize(Jester5k, "Center"))
ggplot() + geom_histogram(aes(all_ratings_norm), binwidth = 1, col="white") + labs(title = "User Ratings Normalized (Center)", x="Ratings", y="Count") 

summary(all_ratings_norm)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -18.2616  -3.0169   0.4904   0.0000   3.2279  17.4594
#Users
users <- rowCounts(Jester5k)
ggplot() + geom_histogram(aes(users), binwidth = 5, col="white") + labs(title = "Jokes Rated by Users", x="Jokes Rated", y="Number of Users")

#Density 
sum(users)/(dim(Jester5k)[1]*dim(Jester5k)[2])
## [1] 0.724212
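
The item side could be explored the same way; for example, the distribution of each joke's average rating (the object name joke_means and the binwidth are illustrative choices):

# Jokes: average rating received by each joke
joke_means <- colMeans(Jester5k)
ggplot() + geom_histogram(aes(joke_means), binwidth = 0.5, col="white") + labs(title = "Average Rating per Joke", x="Mean Rating", y="Number of Jokes")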

Build Models

We will simulate recommendations with six models, all of which use Center normalization except the random baseline:

# Every user in Jester5k has rated at least 36 jokes, so given = 33 ratings are
# kept per test user ("known") and the rest are withheld for evaluation ("unknown")
eval_jokes <- evaluationScheme(data=Jester5k, method="split", train=.80, 
                               given=min(rowCounts(Jester5k))-3, goodRating = 5)
num_jokes_to_recommend <- 100

recc_algos <- list(
  IBCF_C = list(name = "IBCF", param = list(method = "Cosine", normalize = "Center")),
  IBCF_P = list(name = "IBCF", param = list(method = "Pearson", normalize = "Center")),
  UBCF_C = list(name = "UBCF", param = list(method = "Cosine", normalize = "Center")),
  UBCF_P = list(name = "UBCF", param = list(method = "Pearson", normalize = "Center")),
  m_SVD = list(name = "SVD", param = list(normalize = "Center")),
  random = list(name = "RANDOM", param=NULL))

recommendation_results <- evaluate(eval_jokes, method = recc_algos, n = c(1,5, seq(10, num_jokes_to_recommend, 10)))
## IBCF run fold/sample [model time/prediction time]
##   1  [0.49sec/0.4sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.6sec/0.33sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0.04sec/8.89sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0.04sec/7.12sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.15sec/0.2sec] 
## RANDOM run fold/sample [model time/prediction time]
##   1  [0sec/0.47sec]

Model Evaluation

Looking at the Receiver Operating Characteristic (ROC) curves of all the models, we see that the UBCF and SVD models work fairly well and capture the most area under the ROC curve (AUC). In particular, the UBCF model with Pearson similarity captures the most AUC, though only by a narrow margin.

plot(recommendation_results, annotate = TRUE, legend = "topleft") 
title("ROC curve")

plot(recommendation_results, "prec/rec", annotate = TRUE, legend = "bottomright") 
title("Precision-recall")

Looking at the Precision-Recall chart, there is a more visible difference between the three strongest models (UBCF with Cosine or Pearson similarity, and SVD).

Precision is defined as the ratio of the instances where the model correctly predicted a good match (true positives) to the total number of matches recommended by the model, i.e. TP / (TP + FP).

Recall (Sensitivity) is defined as the ratio of the instances where the model correctly predicted a good match to all the matches the user would have liked, i.e. TP / (TP + FN).

By looking at the confusion matrix below, there is a tradeoff between Precision and Recall. In our case, a Type I error (recommending a joke the user will not like) is more problematic than a Type II error (failing to recommend a joke the user would have liked). Precision shows how much a recommendation can be trusted; if the model has low precision and its recommendations are not trusted, faith in the entire model is diminished. Therefore, in this scenario we will prioritize minimizing Type I errors and choose a model with a higher precision value.
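
One way to pull those confusion-matrix counts out of the evaluation results is shown below (assuming the model names set in recc_algos carry through to the result list); precision and recall are recomputed from TP, FP and FN so they line up with the definitions above.

# Averaged confusion matrix for UBCF with Pearson similarity
cm_ubcf_p <- avg(recommendation_results[["UBCF_P"]])
precision_manual <- cm_ubcf_p[, "TP"] / (cm_ubcf_p[, "TP"] + cm_ubcf_p[, "FP"])
recall_manual    <- cm_ubcf_p[, "TP"] / (cm_ubcf_p[, "TP"] + cm_ubcf_p[, "FN"])
round(cbind(cm_ubcf_p[, c("TP", "FP", "FN", "TN")], precision = precision_manual, recall = recall_manual), 4)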

Based on the Precision-Recall chart, the model of choice is UBCF with Pearson similarity (UBCF_P).
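
As a quick sanity check on the chosen model, the sketch below re-fits UBCF with Pearson similarity on the training split and produces top-N lists for the test users; the n = 5 cut-off and the three users displayed are arbitrary choices for illustration.

ubcf_p <- Recommender(getData(eval_jokes, "train"), method = "UBCF", parameter = list(method = "Pearson", normalize = "Center"))
top5 <- predict(ubcf_p, getData(eval_jokes, "known"), type = "topNList", n = 5)
as(top5, "list")[1:3]  # joke IDs recommended to the first three test users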

Serendipity, Novelty, or Diversity

It is common for recommender systems to recommend a very popular item to a large portion of its user base, or to keep recommending a specific type of item over and over; both can result in low user satisfaction over time. If we want to provide a more diverse set of recommendations, we can insert some randomness into our recommendations.

If we compute the RMSE of a HYBRID model which recommends random jokes 20% of the time and uses the UBCF model from the previous section the other 80% of the time, the difference in RMSE is quite small while giving users some diversity in their recommendations. The tradeoff is that Type I errors would likely increase.

hybrid_recc <- HybridRecommender(
  Recommender(getData(eval_jokes, "train"), method = "UBCF", param = list(method = "Pearson", normalize = "Center")),
  Recommender(getData(eval_jokes, "train"), method = "RANDOM"),
  weights = c(.8, .2))

UBCF <- Recommender(data = getData(eval_jokes, "train"), method = "UBCF", parameter = list(method = "Pearson", normalize = "Center", nn=40))

RMSE_UBCF <- calcPredictionAccuracy(predict(UBCF, getData(eval_jokes, "known"), type="ratings"), getData(eval_jokes, "unknown"))

RMSE_hybrid <- calcPredictionAccuracy(predict(hybrid_recc, getData(eval_jokes, "known"), type="ratings"), getData(eval_jokes, "unknown"))

df <- rbind(
  UBCF = RMSE_UBCF,
  HYBRID = RMSE_hybrid)

df
##            RMSE      MSE      MAE
## UBCF   4.407429 19.42543 3.474268
## HYBRID 4.511832 20.35663 3.558471
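
Since kableExtra is already loaded, the same comparison could also be rendered as a formatted table, for example:

df %>% kable(digits = 3, caption = "Prediction accuracy: UBCF vs. 80/20 UBCF/random hybrid") %>% kable_styling(full_width = FALSE)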