The goal of this assignment is to give you practice working with accuracy and other recommender system metrics.
1. As in your previous assignments, compare the accuracy of at least two recommender system algorithms against your offline data.
2. Implement support for at least one business or user experience goal such as increased serendipity, novelty, or diversity.
3. Compare and report on any change in accuracy before and after you’ve made the change in #2.
In this assignment we will attempt to find the best recommender model for the Jester dataset, which consists of ratings of a set of jokes.
library(ggplot2)
library(recommenderlab)
library(dplyr)
library(kableExtra)
data("Jester5k")
Exploratory visuals below show the dataset’s dimensions, the distribution of ratings (raw and Center normalized), how many jokes each user rated, and the overall density of the rating matrix:
dim(Jester5k)
## [1] 5000 100
# Ratings
all_ratings <- getRatings(Jester5k)
ggplot() + geom_histogram(aes(all_ratings), binwidth = 1, col="white") + labs(title = "User Ratings", x="Ratings", y="Count")
all_ratings_norm <- getRatings(normalize(Jester5k, "Center"))
ggplot() + geom_histogram(aes(all_ratings_norm), binwidth = 1, col="white") + labs(title = "User Ratings Normalized (Center)", x="Ratings", y="Count")
summary(all_ratings_norm)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -18.2616 -3.0169 0.4904 0.0000 3.2279 17.4594
#Users
users <- rowCounts(Jester5k)
ggplot() + geom_histogram(aes(users), binwidth = 5, col="white") + labs(title = "Jokes Rated by Users", x="Jokes Rated", y="Number of Users")
#Density
sum(users)/(dim(Jester5k)[1]*dim(Jester5k)[2])
## [1] 0.724212
We will simulate recommendations with six models, the first five Center normalized. The evaluation scheme splits the data 80/20 into train and test sets, reveals given = min(rowCounts(Jester5k)) - 3 ratings per test user, and treats ratings of 5 or above as good:
eval_jokes <- evaluationScheme(data=Jester5k, method="split", train=.80,
given=min(rowCounts(Jester5k))-3, goodRating = 5)
num_jokes_to_recommend <- 100
recc_algos <- list(
IBCF_C = list(name = "IBCF", param = list(method = "Cosine", normalize = "Center")),
IBCF_P = list(name = "IBCF", param = list(method = "Pearson", normalize = "Center")),
UBCF_C = list(name = "UBCF", param = list(method = "Cosine", normalize = "Center")),
UBCF_P = list(name = "UBCF", param = list(method = "Pearson", normalize = "Center")),
m_SVD = list(name = "SVD", param = list(normalize = "Center")),
random = list(name = "RANDOM", param=NULL))
recommendation_results <- evaluate(eval_jokes, method = recc_algos, n = c(1,5, seq(10, num_jokes_to_recommend, 10)))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.49sec/0.4sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.6sec/0.33sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.04sec/8.89sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.04sec/7.12sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.15sec/0.2sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/0.47sec]
Looking at the Receiver Operating Characteristic (ROC) curves of all the models, we see that the UBCF and SVD models work fairly well and capture the most area under the ROC curve (AUC). Specifically, the UBCF model with Pearson similarity barely captures the most AUC.
plot(recommendation_results, annotate = TRUE, legend = "topleft")
title("ROC curve")
plot(recommendation_results, "prec/rec", annotate = TRUE, legend = "bottomright")
title("Precision-recall")
Looking at the Precision-Recall chart, there is a more visible difference between the three strongest models (SVD and UBCF with either Cosine or Pearson similarity).
Precision is defined as the ratio of the instances the model correctly predicted as a good match to the total number of matches predicted by the model.
Recall (sensitivity) is defined as the ratio of the instances the model correctly predicted as a good match to all the matches the user would have liked.
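In terms of confusion-matrix counts, where TP, FP, and FN are the true positives, false positives, and false negatives, these are the standard definitions:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad\qquad \text{Recall} = \frac{TP}{TP + FN}$$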
By looking at the confusion-matrix counts below, we can see there is a trade-off between Precision and Recall. In our case, a Type I error (recommending a joke the user does not like) is more problematic than a Type II error (failing to recommend a joke the user would like). This is because Precision shows how much a recommendation can be trusted; if the model has low precision and the recommendations are not trusted, then faith in the entire model is diminished. Therefore, in this scenario we will prioritize minimizing Type I errors and choose a model with a higher precision value.
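These counts can be pulled straight from the evaluation results; a minimal sketch for the UBCF_P model (output omitted, assuming the standard column names recommenderlab reports):
# Averaged confusion matrix for UBCF with Pearson similarity, one row per n.
cm_ubcf_p <- avg(recommendation_results[["UBCF_P"]])
round(cm_ubcf_p[, c("TP", "FP", "FN", "TN", "precision", "recall")], 3)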
Based on the Precision-Recall chart, the model of choice is UBCF with Pearson similarity (UBCF_P).
It is common for recommender systems to recommend a very popular item to a large portion of its user base, or to keep recommending a specific type of item numerous times, both of which can result in low satisfaction over time. If we want to provide a more diverse set of recommendations, we can insert some randomness into our recommendations.
If we compute the RMSE of a HYBRID model that recommends random jokes 20% of the time and uses the UBCF model from the previous section 80% of the time, the difference in RMSE is quite small while giving users some diversity in their recommendations. The trade-off is that Type I errors would likely increase.
hybrid_recc <- HybridRecommender(
Recommender(getData(eval_jokes, "train"), method = "UBCF", param = list(method = "Pearson", normalize = "Center")),
Recommender(getData(eval_jokes, "train"), method = "Random"),
weights = c(.8, .2))
UBCF <- Recommender(data = getData(eval_jokes, "train"), method = "UBCF", parameter = list(method = "Pearson", normalize = "Center", nn=40))
RMSE_UBCF <- calcPredictionAccuracy(predict(UBCF, getData(eval_jokes, "known"), type="ratings"), getData(eval_jokes, "unknown"))
RMSE_hybrid <- calcPredictionAccuracy(predict(hybrid_recc, getData(eval_jokes, "known"), type="ratings"), getData(eval_jokes, "unknown"))
df <- rbind(
UBCF = RMSE_UBCF,
HYBRID = RMSE_hybrid)
df
## RMSE MSE MAE
## UBCF 4.407429 19.42543 3.474268
## HYBRID 4.511832 20.35663 3.558471
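To quantify how small that difference is, we can compute the relative increase in RMSE directly from the table above; with these numbers it works out to roughly 2.4%.
# Relative increase in RMSE when moving from the pure UBCF model to the
# 80/20 UBCF/RANDOM hybrid.
(df["HYBRID", "RMSE"] - df["UBCF", "RMSE"]) / df["UBCF", "RMSE"]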