In another use case of recommendation systems in R, we will design a simple recommender system on the Jester5k data set, https://grouplens.org/datasets/jester/. The data set contains a sample of 5,000 users drawn from the anonymous ratings collected by the Jester Online Joke Recommender System between April 1999 and May 2003. User-based collaborative filtering approaches will be analyzed and compared using evaluation metrics, and a preferred option will be selected and optimized. A method for introducing variance, or “diversity”, will also be evaluated, and the Root Mean Square Error (RMSE) of each solution will be calculated and contrasted.
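RMSE is the headline accuracy metric throughout. For reference, a minimal hand-rolled version of what recommenderlab’s calcPredictionAccuracy() later reports would look like the following (illustrative only, not used in the analysis):
# Illustrative only: RMSE between predicted and observed ratings, ignoring missing entries
rmse <- function(predicted, actual) sqrt(mean((predicted - actual)^2, na.rm = TRUE))
rmse(c(1.2, -3.4, 0.5), c(1.0, -4.0, 1.5))  # approximately 0.683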
set.seed(222)
# Libraries
library(recommenderlab)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.1
# Data loading
data(Jester5k)
A summary of the data shows a 5000 by 100 matrix (users by items, respectively) with ratings ranging from approximately -9.95 to +9.90, constrained between -10 and +10. The stored ratings contain no missing values, and roughly 72% of the possible user-item cells are rated, which is relatively dense for a recommender data set. The ratings are approximately normally distributed with a median around 1; most ratings cluster near 0, with a preference for positive reviews over negative ones.
#size
dim(Jester5k)
## [1] 5000 100
#summary
jDF <- as(Jester5k, 'data.frame')
jDF$user <- as.numeric(jDF$user)
jDF$item <- as.numeric(jDF$item)
summary(jDF)
##       user           item            rating     
##  Min.   :   1   Min.   :  1.00   Min.   :-9.95  
##  1st Qu.:1263   1st Qu.: 21.00   1st Qu.:-3.06  
##  Median :2495   Median : 43.00   Median : 1.46  
##  Mean   :2503   Mean   : 43.79   Mean   : 0.85  
##  3rd Qu.:3749   3rd Qu.: 63.00   3rd Qu.: 5.10  
##  Max.   :5000   Max.   :100.00   Max.   : 9.90
# number of ratings
nratings(Jester5k)
## [1] 362106
vector_ratings <- as.vector(Jester5k@data)
head(as(Jester5k,"matrix")[,1:10])## j1 j2 j3 j4 j5 j6 j7 j8 j9 j10
## u2841 7.91 9.17 5.34 8.16 -8.74 7.14 8.88 -8.25 5.87 6.21
## u15547 -3.20 -3.50 -9.56 -8.74 -6.36 -3.30 0.78 2.18 -8.40 -8.79
## u15221 -1.70 1.21 1.55 2.77 5.58 3.06 2.72 -4.66 4.51 -3.06
## u15573 -7.38 -8.93 -3.88 -7.23 -4.90 4.13 2.57 3.83 4.37 3.16
## u21505 0.10 4.17 4.90 1.55 5.53 1.50 -3.79 1.94 3.59 4.81
## u15994 0.83 -4.90 0.68 -7.18 0.34 -4.32 -6.17 6.12 -5.58 5.44
# Check sparsity (measured here as the fill rate: the proportion of user-item cells that contain a rating)
sparsity <- function(ratings) {
  nratings(ratings) / (dim(ratings)[1] * dim(ratings)[2])
}
sparsity(Jester5k)
## [1] 0.724212
#Distribution
qplot(vector_ratings) + ggtitle("Distribution of the ratings")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#Average ratings
average_ratings <- colMeans(Jester5k)
qplot(average_ratings) + stat_bin(binwidth = 0.1) + ggtitle("Distribution of the average joke rating")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Constrain (clip) ratings to the valid [-10, 10] range
Jester5k@data@x[Jester5k@data@x[] < -10] <- -10
Jester5k@data@x[Jester5k@data@x[] > 10] <- 10
# worst joke
worst <- which.min(colMeans(Jester5k))
cat(JesterJokes[worst])
## How many teddybears does it take to change a lightbulb? It takes only one teddybear, but it takes a whole lot of lightbulbs.
# Keep only users with more than 80 ratings and jokes with more than 20 ratings
Jester5k_r <- Jester5k[rowCounts(Jester5k) > 80, colCounts(Jester5k) > 20]
e <- evaluationScheme(Jester5k_r, method = "split", train = 0.8, given = 30, goodRating = 3, k = 5)
e
## Evaluation scheme with 30 items given
## Method: 'split' with 5 run(s).
## Training set proportion: 0.800
## Good ratings: >=3.000000
## Data set: 1701 x 100 rating matrix of class 'realRatingMatrix' with 167062 ratings.
The model selection list compares user-based collaborative filtering using cosine and Pearson similarities across the three normalization options NULL, center, and Z-score. Precision-recall and ROC curves are visualized as the means to select the most effective method. The following model parameters were selected somewhat arbitrarily due to time and visual-analysis limitations.
models <- list(
UBCF_cos_null = list(name = "UBCF", param = list(method = "cosine", normalize = NULL)),
UBCF_prs_null = list(name = "UBCF", param = list(method = "pearson", normalize = NULL)),
UBCF_cos_center = list(name = "UBCF", param = list(method = "cosine", normalize = "center")),
UBCF_prs_center = list(name = "UBCF", param = list(method = "pearson", normalize = "center")),
UBCF_cos_z = list(name = "UBCF", param = list(method = "cosine", normalize = "Z-score")),
UBCF_prs_z = list(name = "UBCF", param = list(method = "pearson", normalize = "Z-score"))
)
eval_results <- suppressWarnings(evaluate(x = e, method = models, n = seq(10, 100, 10)))
plot(eval_results, "prec/rec", annotate = TRUE, main = "Precision-recall")
title("Precision-recall")
plot(eval_results, annotate = 1, legend = "topleft")
title("ROC curve")A user-user model is preferable as a baseline the Pearson similarity resulted in the highest ROC. The normalization was a negligible factor. Once again, the values need to be constrained between -10 and 10.
# Creation of the model - U(ser) B(ased) C(ollaborative) F(iltering)
Rec.model <- Recommender(getData(e, "train"), "UBCF", parameter = list(method = "pearson", normalize = "Z-score", nn=25))
#Making predictions
prediction_UJ <- predict(Rec.model, getData(e, "known"), type="ratings",n=10)
# set all predictions that fall outside the valid range to the boundary values
prediction_UJ@data@x[prediction_UJ@data@x[] < -10] <- -10
prediction_UJ@data@x[prediction_UJ@data@x[] > 10] <- 10
calcPredictionAccuracy(prediction_UJ, getData(e, "unknown"))
##      RMSE       MSE       MAE 
##  4.195269 17.600278  3.243706
rmse_ubcf <- calcPredictionAccuracy(prediction_UJ, getData(e, "unknown"))[1]
With the initial UBCF model and Z-score normalization we receive an RMSE of about 4.195. I would consider this value excessive; given more time, a more aggressive reduction of user and joke outliers ought to be applied. However, this RMSE may be more accurate than we would expect, and its variance could be helpful in generating “serendipity” for the user.
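As a sketch of what that outlier reduction might look like, one option is to drop users and jokes whose mean rating is extreme before re-running the evaluation scheme. The 5%/95% cut-offs below are arbitrary, untested assumptions, not tuned values:
# Illustrative only: trim users and jokes with extreme mean ratings (arbitrary 5% / 95% cut-offs)
avg_user <- rowMeans(Jester5k_r)
avg_joke <- colMeans(Jester5k_r)
keep_users <- avg_user > quantile(avg_user, 0.05) & avg_user < quantile(avg_user, 0.95)
keep_jokes <- avg_joke > quantile(avg_joke, 0.05) & avg_joke < quantile(avg_joke, 0.95)
Jester5k_trim <- Jester5k_r[keep_users, keep_jokes]
e_trim <- evaluationScheme(Jester5k_trim, method = "split", train = 0.8, given = 30, goodRating = 3, k = 5)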
Ideally, the top-n results would combine the output of multiple engines, for example the highest-scoring user-based selections paired with marginally randomized elements. In the following example, a second recommender engine is trained on a perturbed version of the original data as a complementary pairing to the original data set.
set.seed(223)
# Partially randomize the data: shift each rating by a random amount between -2 and +2
'%rand%' <- function(x, y) { x + sample(c(1, -1), length(y), replace = TRUE) * y }
Jester.random <- Jester5k
Jester.random@data@x <- sapply(Jester.random@data@x, function(x) { x %rand% runif(1, 0, 2) })
e.2 <- evaluationScheme(Jester.random, method = "split",train = 0.8, given = 30, goodRating = 3, k=5)
# Creation of the model - U(ser) B(ased) C(ollaborative) F(iltering)
Rec.model.2 <- Recommender(getData(e.2, "train"), "UBCF", parameter = list(method = "pearson", normalize = "Z-score", nn=25))
#Making predictions
# Target the original dataset!
prediction.2 <- predict(Rec.model.2, getData(e, "known"), type="ratings",n=10)
# set all predictions that fall outside the valid range to the boundary values
prediction.2@data@x[prediction.2@data@x[] < -10] <- -10
prediction.2@data@x[prediction.2@data@x[] > 10] <- 10
calcPredictionAccuracy(prediction.2, getData(e, "unknown"))
##      RMSE       MSE       MAE 
##  4.240971 17.985837  3.351313
rmse_ubcf.2 <- calcPredictionAccuracy(prediction.2, getData(e, "unknown"))[1]
We receive an RMSE of about 4.241, slightly higher than that of our original model.
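To actually serve the “complementary pairing”, rather than evaluating the two engines separately, one possibility is recommenderlab’s HybridRecommender(), which combines several trained recommenders using supplied weights. The sketch below is illustrative only: the 80/20 weighting is an arbitrary assumption, not a tuned value.
# Blend the two trained engines (weights are an arbitrary illustration)
Rec.hybrid <- HybridRecommender(
  Rec.model,    # UBCF trained on the original ratings
  Rec.model.2,  # UBCF trained on the perturbed ratings
  weights = c(0.8, 0.2)
)
# Top-10 jokes per user, mixing the "accurate" engine with the "diverse" one
pred.hybrid <- predict(Rec.hybrid, getData(e, "known"), n = 10)
as(pred.hybrid, "list")[1:3]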
In a real-world scenario, a recommender model needs to be measured for efficacy in an online/offline, hybrid environment. Let’s pretend it is 1995 and we are running a joke website. Users are presented with the option to “browse” or to view recommended top-n jokes, based upon known or unknown user behavior. A user may ignore the advice of the recommender and click on “browse”, or may give in entirely to the jokes engine and continuously click on the top suggestions. There may also be a random option. Each of these “clicks” can be measured and attributed to each user, for example as a “Click-Through Rate” (CTR), for additional predictive power. The CTR is the ratio of clicks on recommended items to the total number of recommended items shown to users during a session.
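As a toy illustration of that calculation (the click log below is entirely hypothetical):
# Hypothetical click log: one row per recommended joke shown in a session
clicks <- data.frame(
  user    = c("u1", "u1", "u2", "u2", "u2", "u3"),
  item    = c("j5", "j7", "j5", "j9", "j2", "j7"),
  clicked = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)
# CTR = clicks on recommended items / recommended items shown
sum(clicks$clicked) / nrow(clicks)
## [1] 0.5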
Given the relatively normal distribution, determining comparative offline versus online metrics would likely require an analysis of unpopular, average, and popular jokes, or perhaps further divisions based on “tags” or natural language processing (a rough popularity segmentation is sketched below). However, this is a jokes website from 1995, so I doubt things really need to go that far, although the potential for optimization is still there. There is only so much one can assume with an offline data model, and one should expect to uncover serious flaws while analyzing an online comparator. There should be a mutual development and feedback loop between offline and online in a development environment.
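A minimal sketch of that popularity segmentation, using arbitrary tertile cut-offs on the number of ratings per joke:
# Bucket jokes by how often they are rated (arbitrary tertile cut-offs)
joke_counts <- colCounts(Jester5k_r)
tiers <- cut(joke_counts, breaks = quantile(joke_counts, c(0, 1/3, 2/3, 1)),
             labels = c("unpopular", "average", "popular"), include.lowest = TRUE)
table(tiers)
# Average observed rating per popularity tier, as a first offline point of comparison
tapply(colMeans(Jester5k_r), tiers, mean)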