The goal of this assignment is to give you practice working with accuracy and other recommender system metrics.
1. As in your previous assignments, compare the accuracy of at least two recommender system algorithms against your offline data.
2. Implement support for at least one business or user experience goal such as increased serendipity, novelty, or diversity.
3. Compare and report on any change in accuracy before and after you’ve made the change in #2.
In this assignment we will attempt to find the best recommender model for the Jester dataset, which consists of ratings of a set of jokes.
library(ggplot2)
library(recommenderlab)
library(dplyr)
library(kableExtra)
data("Jester5k")
Exploratory visuals below show the dataset’s dimensions, the distribution of ratings (raw and Center normalized), how many jokes each user rated, and the overall density of the rating matrix:
dim(Jester5k)
## [1] 5000 100
# Ratings
all_ratings <- getRatings(Jester5k)
ggplot() + geom_histogram(aes(all_ratings), binwidth = 1, col="white") + labs(title = "User Ratings", x="Ratings", y="Count")
all_ratings_norm <- getRatings(normalize(Jester5k, "Center"))
ggplot() + geom_histogram(aes(all_ratings_norm), binwidth = 1, col="white") + labs(title = "User Ratings Normalized (Center)", x="Ratings", y="Count")
summary(all_ratings_norm)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -18.2616 -3.0169 0.4904 0.0000 3.2279 17.4594
#Users
users <- rowCounts(Jester5k)
ggplot() + geom_histogram(aes(users), binwidth = 5, col="white") + labs(title = "Jokes Rated by Users", x="Jokes Rated", y="Number of Users")
#Density
sum(users)/(dim(Jester5k)[1]*dim(Jester5k)[2])
## [1] 0.724212
We will simulate recommendations with six models, the first five Center normalized. The evaluation scheme splits the data 80/20 into train and test sets, reveals given = min(rowCounts(Jester5k)) - 3 ratings per test user, and treats ratings of 5 or above as good:
eval_jokes <- evaluationScheme(data=Jester5k, method="split", train=.80,
given=min(rowCounts(Jester5k))-3, goodRating = 5)
num_jokes_to_recommend <- 100
recc_algos <- list(
IBCF_C = list(name = "IBCF", param = list(method = "Cosine", normalize = "Center")),
IBCF_P = list(name = "IBCF", param = list(method = "Pearson", normalize = "Center")),
UBCF_C = list(name = "UBCF", param = list(method = "Cosine", normalize = "Center")),
UBCF_P = list(name = "UBCF", param = list(method = "Pearson", normalize = "Center")),
m_SVD = list(name = "SVD", param = list(normalize = "Center")),
random = list(name = "RANDOM", param=NULL))
recommendation_results <- evaluate(eval_jokes, method = recc_algos, n = c(1,5, seq(10, num_jokes_to_recommend, 10)))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.49sec/0.4sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.6sec/0.33sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.04sec/8.89sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.04sec/7.12sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.15sec/0.2sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/0.47sec]
Looking at the Receiver Operating Characteristic (ROC) curves of all the models, we see that the UBCF and SVD models work fairly well and capture the most area under the ROC curve (AUC). Specifically, the UBCF model with Pearson similarity barely captures the most AUC.
plot(recommendation_results, annotate = TRUE, legend = "topleft")
title("ROC curve")
plot(recommendation_results, "prec/rec", annotate = TRUE, legend = "bottomright")
title("Precision-recall")
Looking at the Precision-Recall chart, there is a more visible difference between the three strongest models (SVD and UBCF with either Cosine or Pearson similarity).
Precision is defined as the ratio of the instances the model correctly predicted as a good match to the total number of matches predicted by the model.
Recall (sensitivity) is defined as the ratio of the instances the model correctly predicted as a good match to all the matches the user would have liked.
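In terms of confusion-matrix counts, where TP, FP, and FN are the true positives, false positives, and false negatives, these are the standard definitions:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad\qquad \text{Recall} = \frac{TP}{TP + FN}$$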
By looking at the confusion-matrix counts below, we can see there is a trade-off between Precision and Recall. In our case, a Type I error (recommending a joke the user does not like) is more problematic than a Type II error (failing to recommend a joke the user would like). This is because Precision shows how much a recommendation can be trusted; if the model has low precision and the recommendations are not trusted, then faith in the entire model is diminished. Therefore, in this scenario we will prioritize minimizing Type I errors and choose a model with a higher precision value.
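These counts can be pulled straight from the evaluation results; a minimal sketch for the UBCF_P model (output omitted, assuming the standard column names recommenderlab reports):
# Averaged confusion matrix for UBCF with Pearson similarity, one row per n.
cm_ubcf_p <- avg(recommendation_results[["UBCF_P"]])
round(cm_ubcf_p[, c("TP", "FP", "FN", "TN", "precision", "recall")], 3)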
Based on the Precision-Recall chart, the model of choice is UBCF with Pearson similarity (UBCF_P).
It is common for recommender systems to recommend a very popular item to a large portion of its user base, or to keep recommending a specific type of item numerous times, both of which can result in low satisfaction over time. If we want to provide a more diverse set of recommendations, we can insert some randomness into our recommendations.
If we compute the RMSE of a HYBRID model that recommends random jokes 20% of the time and uses the UBCF model from the previous section 80% of the time, the difference in RMSE is quite small while giving users some diversity in their recommendations. The trade-off is that Type I errors would likely increase.
hybrid_recc <- HybridRecommender(
Recommender(getData(eval_jokes, "train"), method = "UBCF", param = list(method = "Pearson", normalize = "Center")),
Recommender(getData(eval_jokes, "train"), method = "Random"),
weights = c(.8, .2))
UBCF <- Recommender(data = getData(eval_jokes, "train"), method = "UBCF", parameter = list(method = "Pearson", normalize = "Center", nn=40))
RMSE_UBCF <- calcPredictionAccuracy(predict(UBCF, getData(eval_jokes, "known"), type="ratings"), getData(eval_jokes, "unknown"))
RMSE_hybrid <- calcPredictionAccuracy(predict(hybrid_recc, getData(eval_jokes, "known"), type="ratings"), getData(eval_jokes, "unknown"))
df <- rbind(
UBCF = RMSE_UBCF,
HYBRID = RMSE_hybrid)
df
## RMSE MSE MAE
## UBCF 4.407429 19.42543 3.474268
## HYBRID 4.511832 20.35663 3.558471
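To quantify how small that difference is, we can compute the relative increase in RMSE directly from the table above; with these numbers it works out to roughly 2.4%.
# Relative increase in RMSE when moving from the pure UBCF model to the
# 80/20 UBCF/RANDOM hybrid.
(df["HYBRID", "RMSE"] - df["UBCF", "RMSE"]) / df["UBCF", "RMSE"]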