For this project I’ll use the Jester joke dataset, which I haven’t worked with before. It has the highest density of the example datasets I was able to find online.
Source: http://eigentaste.berkeley.edu/dataset/
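As a quick check on the density claim, the share of filled cells can be computed from the counts that `dim()` and `nratings()` report in the output further down (5000 users, 100 jokes, 362106 ratings). The variable names are only for illustration.

```r
# Counts as reported by dim(Jester5k) and nratings(Jester5k) in this write-up
n_users   <- 5000
n_jokes   <- 100
n_ratings <- 362106

# Density = observed ratings / all possible user-joke pairs
density <- n_ratings / (n_users * n_jokes)
round(density, 3)
```

That works out to roughly 72% of cells filled, in line with the mean of 72.4 ratings per user out of 100 jokes.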
# Load recommenderlab, which ships the Jester5k sample
library(recommenderlab)
# Import 5K x 100 realRatingMatrix of jokes
data(Jester5k)
jester.df <- as(Jester5k, "data.frame")
# Summary of ratings per user
summary(rowCounts(Jester5k))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 36.0 53.0 72.0 72.4 100.0 100.0
# Matrix size
dim(Jester5k)
## [1] 5000 100
# Number of ratings
nratings(Jester5k)
## [1] 362106
# Histogram of ratings - normalized to deal with bias
hist(getRatings(normalize(Jester5k, method="Z-score")),
     breaks=100,
     main = "Histogram of Ratings",
     xlab = "Rating",
     ylab = "Count")
The recommenderlab package will normalize the ratings so that the average for each user is 0, removing the bias from those users who give exceptionally high or low ratings.
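To make the normalization step concrete, here is a minimal base-R sketch of per-user centering and Z-scoring on a toy matrix. The ratings are made up for illustration, and recommenderlab’s own `normalize()` may differ in implementation detail (for example in how it computes the standard deviation).

```r
# Toy ratings (hypothetical values): rows are users, columns are jokes, NA = not rated
ratings <- matrix(c( 8,  6, NA,  9,
                    -7, NA, -9, -5,
                     1, -1,  0, NA),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("u", 1:3), paste0("j", 1:4)))

# Per-user mean and standard deviation over the jokes each user actually rated
user_mean <- rowMeans(ratings, na.rm = TRUE)
user_sd   <- apply(ratings, 1, sd, na.rm = TRUE)

# Centering removes each user's overall bias; dividing by the SD also equalizes spread
centered <- sweep(ratings, 1, user_mean, "-")
z_scored <- sweep(centered, 1, user_sd, "/")

# Every user's normalized ratings now average to (numerically) zero
rowMeans(z_scored, na.rm = TRUE)
```

After this step the generous rater `u1` and the harsh rater `u2` are on the same scale, so their ratings can be compared directly.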
evaluation_scheme <- list(
  "item-based" = list(name="IBCF",
                      parameter = list(k = 30, normalize="Z-score")),
  "user-based" = list(name="UBCF",
                      parameter = list(nn = 30, normalize="Z-score")),
  "SVD" = list(name="SVD"))
# Number of items to recommend to each user. Model performance will be based on this
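n_recommendations <- seq(10, 100, 10)
# eval_sets is the evaluation scheme (5 folds/samples, judging by the log); its construction is not shown here
results <- evaluate(x=eval_sets, method=evaluation_scheme, n=n_recommendations)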
## IBCF run fold/sample [model time/prediction time]
## 1 [0.6sec/0.27sec]
## 2 [0.53sec/0.26sec]
## 3 [0.41sec/0.42sec]
## 4 [0.42sec/0.29sec]
## 5 [0.72sec/0.57sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.16sec/7.49sec]
## 2 [0.14sec/7.38sec]
## 3 [0.14sec/5.94sec]
## 4 [0.14sec/5.32sec]
## 5 [0.12sec/5.78sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.13sec/0.17sec]
## 2 [0.17sec/0.19sec]
## 3 [0.14sec/0.38sec]
## 4 [0.13sec/0.22sec]
## 5 [0.15sec/0.17sec]
Judging by the AUC, it appears that the UBCF and SVD models perform best, and perform similarly. This is no surprise, as we’re working with a fairly dense dataset, so the SVD doesn’t affect the matrix all that much.
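The ROC plot behind that AUC comparison isn’t reproduced here. As a rough sketch of how such a number could be obtained, the area under an ROC curve can be approximated with the trapezoid rule over the (FPR, TPR) points, one per list length. The helper `trapezoid_auc()` and the sample points are hypothetical, and anchoring the curve at (0, 0) and (1, 1) is an assumption of this sketch.

```r
# Trapezoid-rule area under an ROC curve given as FPR/TPR points.
# Assumption: the curve is anchored at (0, 0) and (1, 1).
trapezoid_auc <- function(fpr, tpr) {
  ord <- order(fpr, tpr)
  x <- c(0, fpr[ord], 1)
  y <- c(0, tpr[ord], 1)
  # Sum of trapezoids: width times average height of adjacent points
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Hypothetical ROC points, for illustration only
fpr <- c(0.1, 0.3, 0.6)
tpr <- c(0.4, 0.7, 0.9)
auc_example <- trapezoid_auc(fpr, tpr)

# With recommenderlab loaded, the averaged confusion matrix should expose
# "FPR" and "TPR" columns (assumed here), e.g.:
# roc <- avg(results[["user-based"]])
# trapezoid_auc(roc[, "FPR"], roc[, "TPR"])
```

A model no better than chance lands near 0.5 on this measure, so the closer a curve’s area is to 1, the better the ranking.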
At this stage I’ll introduce a random model to add some novelty to the recommendations.
My expectation is that it will score worse, given the nature of the model and its reduced dependence on actual historical rating patterns, but users may prefer it to a more accurate yet tired model.
Alternatives for accomplishing this goal include adding more neighbors in the model training process to ensure some selections from the tail, or using recommenderlab’s popular model.
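Those two alternatives could be expressed in the same list format used for the other algorithms. This is only a sketch: the list name `novelty_candidates` is made up, and `nn = 100` is an arbitrary illustrative neighborhood size rather than a tuned value.

```r
# Assumed variations on the algorithm list; nn = 100 is an arbitrary illustrative value
novelty_candidates <- list(
  "user-based (wide)" = list(name = "UBCF",
                             parameter = list(nn = 100, normalize = "Z-score")),
  "popular"           = list(name = "POPULAR")
)

# With recommenderlab loaded and eval_sets defined as elsewhere in this write-up:
# results_novelty <- evaluate(x = eval_sets, method = novelty_candidates,
#                             n = seq(10, 100, 10))
```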
evaluation_scheme <- list(
  "item-based" = list(name="IBCF",
                      parameter = list(k = 30, normalize="Z-score")),
  "user-based" = list(name="UBCF",
                      parameter = list(nn = 30, normalize="Z-score")),
  "SVD" = list(name="SVD"),
  "Random" = list(name="RANDOM"))
# Number of items to recommend to each user. Model performance will be based on this
n_recommendations <- seq(10, 100, 10)
results <- evaluate(x=eval_sets, method=evaluation_scheme, n=n_recommendations)
## IBCF run fold/sample [model time/prediction time]
## 1 [0.48sec/0.3sec]
## 2 [0.52sec/0.23sec]
## 3 [0.48sec/0.22sec]
## 4 [0.42sec/0.21sec]
## 5 [0.38sec/0.23sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.12sec/5.55sec]
## 2 [0.16sec/5.48sec]
## 3 [0.13sec/5.57sec]
## 4 [0.13sec/5.73sec]
## 5 [0.11sec/5.64sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.13sec/0.19sec]
## 2 [0.12sec/0.4sec]
## 3 [0.12sec/0.21sec]
## 4 [0.14sec/0.19sec]
## 5 [0.14sec/0.19sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/0.23sec]
## 2 [0sec/0.41sec]
## 3 [0sec/0.17sec]
## 4 [0sec/0.21sec]
## 5 [0.02sec/0.18sec]
As expected, the random model performs very poorly in terms of accuracy.
Another approach to novelty/serendipity could be a weighted hybrid of the top-performing UBCF model with some randomness mixed in.
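One way such a blend might look, sketched in base R for a single user: rescale the model’s predicted scores to [0, 1], mix in uniform noise with a chosen weight, and take the top N. The function `blend_top_n()`, the scores, and the 0.3 weight are all hypothetical. recommenderlab also provides a `HybridRecommender()` for combining fitted models, which would likely be the more direct route in practice.

```r
# Blend model scores with random noise and return the top-n item names.
# random_weight = 0 reproduces the pure model ranking; 1 is fully random.
blend_top_n <- function(model_scores, n, random_weight = 0.2) {
  stopifnot(random_weight >= 0, random_weight <= 1)
  rng <- range(model_scores)
  # Rescale to [0, 1] so the noise and the scores are on a comparable scale
  scaled <- if (diff(rng) > 0) {
    (model_scores - rng[1]) / diff(rng)
  } else {
    rep(0.5, length(model_scores))
  }
  names(scaled) <- names(model_scores)
  noise   <- runif(length(model_scores))
  blended <- (1 - random_weight) * scaled + random_weight * noise
  names(sort(blended, decreasing = TRUE))[seq_len(min(n, length(blended)))]
}

# Hypothetical UBCF-style predicted ratings for one user's unseen jokes
ubcf_scores <- c(j1 = 4.1, j2 = 3.8, j3 = 1.2, j4 = 0.4, j5 = -2.0, j6 = -3.5)

set.seed(42)
blend_top_n(ubcf_scores, n = 3, random_weight = 0.3)
```

Raising `random_weight` trades accuracy for novelty, so it would be the natural knob to tune in an online test.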
It’s interesting to note, though not a surprise, that the model “training” time for the random model is effectively 0 seconds for each fold (0.02 seconds at most).
If accuracy were the entire goal of this model, then at this stage I would start tuning the UBCF model with different similarity measures or neighborhood sizes.
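A tuning pass along those lines might start from a small grid of UBCF configurations. The grid values below are illustrative assumptions, not recommendations; `method` and `nn` are the UBCF parameters for the similarity measure and the neighborhood size.

```r
# Illustrative grid: two similarity measures crossed with three neighborhood sizes
grid <- expand.grid(method = c("cosine", "pearson"),
                    nn = c(10, 30, 50),
                    stringsAsFactors = FALSE)

# One algorithm entry per grid row, in the list format evaluate() accepts
ubcf_grid <- lapply(seq_len(nrow(grid)), function(i) {
  list(name = "UBCF",
       parameter = list(method = grid$method[i],
                        nn = grid$nn[i],
                        normalize = "Z-score"))
})
names(ubcf_grid) <- paste0("UBCF_", grid$method, "_nn", grid$nn)

# With recommenderlab loaded and eval_sets defined as elsewhere in this write-up:
# results_tuned <- evaluate(x = eval_sets, method = ubcf_grid, n = seq(10, 100, 10))
```

Given that UBCF prediction took several seconds per fold above, a grid this size stays manageable, but it would grow quickly with more parameters.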
An online evaluation would allow further tuning by gauging how users actually rate the recommendations they are given, particularly the ones that UBCF wouldn’t otherwise have made.
With more time, I’d have attempted to use the full dataset in a Spark environment and confirmed that the generated recommendations are indeed unique.