Overview

This exercise builds a recommender system for demonstration purposes, using the recommenderlab library and the Jester5k dataset. Documentation for recommenderlab is available at https://cran.r-project.org/web/packages/recommenderlab/index.html

Exploring the Jester5k data set

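This data set contains a sample of 5000 users who have provided ratings for 100 items (jokes) on a scale of -10 to +10. In essence, Jester5k is a 5000 x 100 rating matrix with a total of 362106 ratings. The code below draws a random sample of 1000 users and plots a histogram of their normalized ratings.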

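library(recommenderlab)
data(Jester5k)
# Draw a random sample of 1000 users (call set.seed() first for reproducible results)
r <- sample(Jester5k, 1000)
# Histogram of the Z-score-normalized ratings
hist(getRatings(normalize(r, method="Z-score")), breaks=100)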
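To make the normalization step concrete, here is a minimal base-R sketch of per-user Z-score normalization on a toy matrix. The matrix and variable names are invented for illustration, and the sketch assumes normalization is applied per user (per row) over observed ratings only; it is not recommenderlab's internal code.

```r
# Toy rating matrix: rows are users, columns are jokes, NA marks a missing rating.
ratings <- matrix(
  c(  5, NA, -3,  8,
    -10,  2, NA,  4,
      0,  1,  2, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3"), c("j1", "j2", "j3", "j4"))
)

# Per-user mean and standard deviation over the observed ratings only.
user_mean <- rowMeans(ratings, na.rm = TRUE)
user_sd   <- apply(ratings, 1, sd, na.rm = TRUE)

# Subtract each user's mean, then divide by that user's standard deviation.
# sweep() applies the per-row statistic to every column of that row.
z <- sweep(sweep(ratings, 1, user_mean, "-"), 1, user_sd, "/")
```

After this step every user's observed ratings have mean 0 and standard deviation 1, which removes differences in how generously individual users rate.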

Building a basic recommender to output Top-N recommendations

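The following code segment builds a model with the POPULAR method from the first 100 users in the sample and issues three recommendations (n = 3) for the user in row 250, who was not part of the training data. Because r is a random sample, the user ID and the recommended items in the output will differ between runs.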

r_POPULAR <- Recommender(r[1:100], method="POPULAR")
recom <- predict(r_POPULAR, r[250], n=3)
as(recom, "list")
## $u13721
## [1] "j89" "j6"  "j12"
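The idea behind a popularity-based recommender can be sketched in a few lines of base R. This is a simplified illustration that assumes popularity means "number of users who rated the item"; recommenderlab's POPULAR method may rank items differently, and the helper `top_n_popular` and the toy data are hypothetical.

```r
# Toy training ratings (rows = users, columns = jokes); NA = not rated.
train <- cbind(
  j1 = c( 3,  5, -2,  7,  1),
  j2 = c( 4, -1,  6,  2, NA),
  j3 = c(NA,  8,  0, -4, NA),
  j4 = c(NA, NA,  9,  3, NA),
  j5 = c(NA, NA, NA,  5, NA)
)

# Popularity here is simply the number of users who rated each joke.
popularity <- colSums(!is.na(train))

# Hypothetical helper: the n most popular jokes the user has not rated yet.
top_n_popular <- function(user_ratings, popularity, n) {
  unseen <- names(user_ratings)[is.na(user_ratings)]
  ranked <- sort(popularity[unseen], decreasing = TRUE)
  head(names(ranked), n)
}

new_user <- c(j1 = 6, j2 = NA, j3 = -3, j4 = NA, j5 = NA)
recs <- top_n_popular(new_user, popularity, n = 2)
```

Every user receives the same ranking, minus the items they have already rated, which is why this method is a common baseline.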

Evaluating the recommender algorithms

In the following section, we use all 1000 users sampled from Jester5k (r[1:1000]) to evaluate recommender models built with the algorithms below:

RANDOM: This method produces random recommendations.
POPULAR: This method produces recommendations based on the popularity of items.
UBCF: This method produces recommendations based on user-based collaborative filtering.
IBCF: This method produces recommendations based on item-based collaborative filtering.
SVD: This method produces recommendations based on SVD approximation with column-mean imputation.
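As an illustration of the collaborative filtering idea, the following base-R sketch predicts one missing rating from the most similar users. It is a deliberately simplified version of user-based CF: it assumes cosine similarity over co-rated items and a similarity-weighted mean, the helpers `cosine_sim` and `predict_ubcf` are hypothetical, and recommenderlab's UBCF implementation differs in its details.

```r
# Toy ratings: rows = users, columns = jokes, NA = not rated.
ratings <- rbind(
  u1 = c(j1 =  8, j2 =  6, j3 = -4, j4 = NA),
  u2 = c(j1 =  7, j2 =  5, j3 = -5, j4 =  9),
  u3 = c(j1 = -6, j2 = -7, j3 =  8, j4 = -8),
  u4 = c(j1 =  9, j2 =  4, j3 = -3, j4 =  7)
)

# Cosine similarity computed over the items both users have rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Hypothetical helper: predict a user's rating for one item as the
# similarity-weighted mean over the nn most similar users who rated it.
# Assumes the selected neighbours have positive similarity.
predict_ubcf <- function(ratings, user, item, nn = 2) {
  candidates <- setdiff(rownames(ratings)[!is.na(ratings[, item])], user)
  sims <- sapply(candidates, function(v) cosine_sim(ratings[user, ], ratings[v, ]))
  neighbours <- names(sort(sims, decreasing = TRUE))[seq_len(min(nn, length(sims)))]
  weighted.mean(ratings[neighbours, item], w = sims[neighbours])
}

pred <- predict_ubcf(ratings, user = "u1", item = "j4", nn = 2)
```

Here u2 and u4 rate the first three jokes much like u1, so the prediction for j4 lands between their ratings of 9 and 7, while the dissimilar user u3 is ignored.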

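We use TopN values of 1, 3, 5, 10, 15, and 20 to investigate changes in accuracy. The evaluation scheme trains on 70% of the users (train=0.7), withholds 5 ratings from each test user (given=-5), and treats ratings of 5 or higher as good (goodRating=5).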

scheme <- evaluationScheme(r[1:1000], method="split", train=0.7, k=1, given=-5, goodRating=5)

algorithms <- list(
  "random items"      = list(name="RANDOM",  param=NULL),
  "popular items"     = list(name="POPULAR", param=NULL),
  "user-based CF"     = list(name="UBCF",    param=list(nn=50)),
  "item-based CF"     = list(name="IBCF",    param=list(k=50)),
  "SVD approximation" = list(name="SVD",     param=list(k=50))
)

results <- evaluate(scheme, algorithms, type="topNList", n=c(1, 3, 5, 10, 15, 20))
## RANDOM run fold/sample [model time/prediction time]
##   1  [0sec/0.03sec] 
## POPULAR run fold/sample [model time/prediction time]
##   1  [0.02sec/0.5sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.61sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.05sec/0.06sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.08sec/0.05sec]
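The split that the evaluation scheme performs can be sketched in base R on a toy matrix. This is only an approximation of the idea, assuming a 70/30 split of users, 5 withheld ratings per test user, and a good-rating threshold of 5; all variable names and the seed are chosen for illustration and this is not how recommenderlab stores its scheme internally.

```r
set.seed(42)  # arbitrary seed so the toy split is reproducible

# Toy data: 10 users x 8 jokes with ratings on the -10..+10 scale.
ratings <- matrix(round(runif(80, -10, 10), 2), nrow = 10,
                  dimnames = list(paste0("u", 1:10), paste0("j", 1:8)))

# 70/30 split of users into training and test sets.
train_idx <- sample(nrow(ratings), size = round(0.7 * nrow(ratings)))
train <- ratings[train_idx, , drop = FALSE]
test  <- ratings[-train_idx, , drop = FALSE]

# "All but 5": withhold 5 rated items per test user; the rest stay known.
withheld <- 5
known   <- test
unknown <- test
for (u in rownames(test)) {
  hide <- sample(which(!is.na(test[u, ])), withheld)
  known[u, hide]    <- NA   # hidden from the recommender
  unknown[u, -hide] <- NA   # kept aside to score the recommendations
}

# A withheld item counts as relevant if its rating is at least good_rating.
good_rating <- 5
relevant <- !is.na(unknown) & unknown >= good_rating
```

The recommender sees only `known` for each test user, and its top-N list is then scored against the `relevant` items.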

Plotting the Receiver Operating Characteristic (ROC) Curve

The following plot shows the ROC curve (true positive rate against false positive rate) for each of the algorithms above, with one point per TopN value. For the chosen data set, UBCF and POPULAR appear to offer the best area under the curve:

plot(results, annotate=c(1,3), legend="bottomright")
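To compare areas numerically instead of by eye, the area under a set of ROC points can be approximated with the trapezoidal rule. The sketch below uses made-up points purely for illustration; they are not taken from the evaluation above, and the variable names are assumptions.

```r
# Illustrative (made-up) ROC points, one per top-N list length; these are
# not results from the evaluation above.
fpr <- c(0.00, 0.05, 0.10, 0.20, 0.40)
tpr <- c(0.00, 0.20, 0.35, 0.55, 0.80)

# Trapezoidal rule: sum the areas of the trapezoids between adjacent points.
partial_auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
```

Because the curve stops at the largest TopN value, this is a partial area over the observed false positive range, so it should only be compared between algorithms evaluated over the same range.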

Plotting the Precision-Recall (P-R) Curve

The following plot shows precision against recall for each of the algorithms above. Again, the UBCF and POPULAR algorithms outperform the other methods.

plot(results, "prec/rec", annotate=3, legend="topleft")
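Precision and recall for a single top-N list can be computed as in this small base-R sketch. The recommendation list, the relevant items, and the helper `precision_recall_at` are made up for illustration; recommenderlab averages such values over all test users.

```r
# Ranked recommendations for one test user (made-up example).
recommended <- c("j7", "j2", "j9", "j4", "j1")
# Withheld jokes the user rated at or above the good-rating threshold.
relevant <- c("j2", "j4", "j8")

# Hypothetical helper: precision and recall of the first n recommendations.
precision_recall_at <- function(recommended, relevant, n) {
  hits <- sum(head(recommended, n) %in% relevant)
  c(n = n, precision = hits / n, recall = hits / length(relevant))
}

# One row per list length.
pr <- t(sapply(c(1, 3, 5), function(n) precision_recall_at(recommended, relevant, n)))
```

Longer lists can only keep or raise recall, since more relevant items can be found, while precision may fall as less relevant items are added, which is the trade-off the curve visualizes.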

Conclusion / Summary

The purpose of this exercise was to build a recommender system for demonstration purposes. We used the recommenderlab library along with the Jester5k dataset.

For the given data set, the UBCF and POPULAR methods within recommenderlab provided the best results. It is worth remembering that the choice of recommender algorithm depends on the context, such as the data set and the application.

In addition to the methods provided within the recommenderlab library, there are other supervised and unsupervised machine learning algorithms, as well as neural network-driven deep learning algorithms, that could be used to build recommendation systems.