Practice working with accuracy and other recommender system metrics. In this assignment you're asked to complete at least one (or, if you like, both) of the suggested tasks.
For this week's project, we use a subset of the Jester data from http://eigentaste.berkeley.edu/dataset/.
Jester was developed by Ken Goldberg and his group at UC Berkeley and contains around 6 million ratings of 150 jokes. Compared to the other datasets we have used, Jester is unique in two respects:

- It is unusually dense: with a density of about 30%, the average user has rated roughly 30% of all the jokes.
- Its ratings are continuous, ranging from -10 to 10.
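The code below assumes the following packages are loaded (not shown in the original listing): xlsx for read.xlsx2, data.table for fwrite/fread, recommenderlab for the models and evaluation tools, and knitr for the kable tables.
library(xlsx)
library(data.table)
library(recommenderlab)
library(knitr)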
jester <- read.xlsx2("data/jester-data-2.xls", sheetName = "jester-data-2-new",
                     header = FALSE, colClasses = "numeric",
                     stringsAsFactors = FALSE)
# Remove the first column (the per-user count of rated jokes, not a rating)
jester <- jester[ , -1]
# In Jester, 99 encodes a missing rating; recode it to NA
jester[jester == 99] <- NA
# Tabulate observed (FALSE) vs. missing (TRUE) entries
table(is.na(jester))
##
## FALSE TRUE
## 1708993 641007
total <- nrow(jester) * ncol(jester)
count.na <- sum(is.na(jester))
# Fraction of observed entries, i.e. the density (1 - sparsity)
density <- round((total - count.na) / total, 4)
density
## 0.7272
# Name the item columns item1..item100
names(jester)[1:100] <- paste("item", 1:100, sep="")
# Round-trip through data.table's fwrite/fread to coerce the character
# columns read from Excel into numeric
fwrite(jester, "jester")
jester <- fread("jester", colClasses = "numeric")
# Subset 5000 random users and convert to a realRatingMatrix
# (jester.sample/jester.matrix avoid masking base::sample and base::matrix)
jester.sample <- jester[sample.int(nrow(jester), 5000, replace = FALSE), ]
jester.matrix <- as.matrix(jester.sample)
real.rating.mat <- as(jester.matrix, "realRatingMatrix")
real.rating.mat
## 5000 x 100 rating matrix of class 'realRatingMatrix' with 362984 ratings.
This subset contains only users who have rated 36 or more of the 100 jokes, so the matrix is unusually dense. The NA table confirms this: the density is 0.7272 (72.7% of entries are observed), which corresponds to a sparsity of only about 0.27.
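As a quick sanity check (a small sketch using recommenderlab accessors), the sampled matrix should show about the same density: 362984 ratings over 5000 × 100 cells is roughly 0.726.
# Density of the sampled rating matrix
nratings(real.rating.mat) / (nrow(real.rating.mat) * ncol(real.rating.mat))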
# Distribution of all ratings
hist(getRatings(real.rating.mat), col="skyblue", main="Distribution of Jester Ratings", xlab="Rating")
# Distribution of per-joke average ratings (na.rm = TRUE keeps users with
# missing ratings instead of dropping them entirely, as na.omit would)
avg.rating <- colMeans(jester.sample, na.rm = TRUE)
hist(avg.rating, col="skyblue", main="Distribution of Average Jester Ratings", xlab="Average Rating")
The histogram of the Jester sample shows that most ratings are positive: the distribution is left-skewed but otherwise close to normal.
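Since the model comparison below includes centering and Z-score normalization, it can help to see how normalization reshapes this distribution (an optional sketch; normalize() works per user):
# Ratings after user-level Z-score normalization
hist(getRatings(normalize(real.rating.mat, method = "Z-score")),
     col = "skyblue", main = "Distribution of Z-score Normalized Ratings",
     xlab = "Normalized Rating")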
items.to.keep <- 15      # ratings per test user revealed to the recommender
rating.threshold <- 1    # minimum rating that counts as "good"
number.of.trials <- 3    # number of random train/test splits
eval.method <- evaluationScheme(real.rating.mat, method = "split", train = 0.75,
                                k = number.of.trials,
                                given = items.to.keep,
                                goodRating = rating.threshold)
# Evaluating a model consists of comparing its recommendations with the
# withheld ratings. Jester ratings range from -10 to 10, so we need to define
# what constitutes a good item; here any rating of at least 1 is treated as good.
eval.method
## Evaluation scheme with 15 items given
## Method: 'split' with 3 run(s).
## Training set proportion: 0.750
## Good ratings: >=1.000000
## Data set: 5000 x 100 rating matrix of class 'realRatingMatrix' with 362984 ratings.
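A random 75/25 split repeated three times is fast; as a sketch of an alternative, a k-fold cross-validation scheme would place every user in the test set exactly once (same given and goodRating arguments):
# Sketch: 5-fold cross-validation instead of repeated random splits
cv.scheme <- evaluationScheme(real.rating.mat, method = "cross-validation",
                              k = 5, given = items.to.keep,
                              goodRating = rating.threshold)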
Model selection compares cosine and Pearson similarity under three normalization options (NULL, center, Z-score). Precision-recall and ROC curves are plotted as the means of selecting the most effective configuration.
models <- list(
  UBCF.cos.null   = list(name = "UBCF", param = list(method = "cosine",  normalize = NULL)),
  UBCF.prs.null   = list(name = "UBCF", param = list(method = "pearson", normalize = NULL)),
  UBCF.cos.center = list(name = "UBCF", param = list(method = "cosine",  normalize = "center")),
  UBCF.prs.center = list(name = "UBCF", param = list(method = "pearson", normalize = "center")),
  UBCF.cos.z      = list(name = "UBCF", param = list(method = "cosine",  normalize = "Z-score")),
  UBCF.prs.z      = list(name = "UBCF", param = list(method = "pearson", normalize = "Z-score"))
)
eval.results <- suppressWarnings(evaluate(x = eval.method, method = models, n = seq(10, 100, 10)))
## UBCF run fold/sample [model time/prediction time]
## 1 [0.006sec/4.893sec]
## 2 [0.025sec/4.813sec]
## 3 [0.001sec/4.377sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.001sec/4.129sec]
## 2 [0sec/3.954sec]
## 3 [0.001sec/4.068sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.024sec/4.127sec]
## 2 [0.021sec/4.454sec]
## 3 [0.031sec/3.869sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.024sec/3.829sec]
## 2 [0.022sec/3.464sec]
## 3 [0.022sec/3.507sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.125sec/4.308sec]
## 2 [0.122sec/4.255sec]
## 3 [0.123sec/4.245sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.115sec/3.99sec]
## 2 [0.117sec/3.369sec]
## 3 [0.117sec/3.866sec]
plot(eval.results, "prec/rec", annotate = TRUE, main = "UBCF-Precision-recall")
plot(eval.results, annotate = 1, legend = "topleft")
title("UBCF-ROC curve")
models <- list(
  IBCF.cos.null   = list(name = "IBCF", param = list(method = "cosine",  normalize = NULL)),
  IBCF.prs.null   = list(name = "IBCF", param = list(method = "pearson", normalize = NULL)),
  IBCF.cos.center = list(name = "IBCF", param = list(method = "cosine",  normalize = "center")),
  IBCF.prs.center = list(name = "IBCF", param = list(method = "pearson", normalize = "center")),
  IBCF.cos.z      = list(name = "IBCF", param = list(method = "cosine",  normalize = "Z-score")),
  IBCF.prs.z      = list(name = "IBCF", param = list(method = "pearson", normalize = "Z-score"))
)
eval.results <- suppressWarnings(evaluate(x = eval.method, method = models, n = seq(10, 100, 10)))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.153sec/0.211sec]
## 2 [0.205sec/0.203sec]
## 3 [0.147sec/0.198sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.233sec/0.215sec]
## 2 [0.294sec/0.18sec]
## 3 [0.243sec/0.207sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.178sec/0.256sec]
## 2 [0.174sec/0.274sec]
## 3 [0.162sec/0.27sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.253sec/0.261sec]
## 2 [0.258sec/0.274sec]
## 3 [0.349sec/0.462sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.296sec/0.28sec]
## 2 [0.254sec/0.296sec]
## 3 [0.258sec/0.276sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.33sec/0.287sec]
## 2 [0.344sec/0.288sec]
## 3 [0.344sec/0.289sec]
plot(eval.results, "prec/rec", annotate = TRUE, main = "IBCF-Precision-recall")
plot(eval.results, annotate = 1, legend = "topleft")
title("IBCF-ROC curve")
# Note: POPULAR ignores the similarity `method` argument (see the "Available
# parameter" messages in the output below), so only the normalization setting
# actually differs across these six configurations.
models <- list(
  POPULAR.cos.null   = list(name = "POPULAR", param = list(method = "cosine",  normalize = NULL)),
  POPULAR.prs.null   = list(name = "POPULAR", param = list(method = "pearson", normalize = NULL)),
  POPULAR.cos.center = list(name = "POPULAR", param = list(method = "cosine",  normalize = "center")),
  POPULAR.prs.center = list(name = "POPULAR", param = list(method = "pearson", normalize = "center")),
  POPULAR.cos.z      = list(name = "POPULAR", param = list(method = "cosine",  normalize = "Z-score")),
  POPULAR.prs.z      = list(name = "POPULAR", param = list(method = "pearson", normalize = "Z-score"))
)
eval.results <- suppressWarnings(evaluate(x = eval.method, method = models, n = seq(10, 100, 10)))
(Because POPULAR does not accept a similarity `method`, recommenderlab prints its available parameters for every run; the message is shown once below and elided thereafter.)
## POPULAR run fold/sample [model time/prediction time]
## 1 Available parameter (with default values):
## normalize = center
## aggregationRatings = function (x, na.rm = FALSE, dims = 1, ...) standardGeneric("colMeans")
## aggregationPopularity = function (x, na.rm = FALSE, dims = 1, ...) standardGeneric("colSums")
## verbose = FALSE
## [0.008sec/2.123sec]
## 2 [0.031sec/2.094sec]
## 3 [0.006sec/2.04sec]
## POPULAR run fold/sample [model time/prediction time]
## 1 [0.008sec/2.138sec]
## 2 [0.006sec/2.178sec]
## 3 [0.007sec/1.964sec]
## POPULAR run fold/sample [model time/prediction time]
## 1 [0.039sec/1.865sec]
## 2 [0.023sec/1.781sec]
## 3 [0.024sec/1.761sec]
## POPULAR run fold/sample [model time/prediction time]
## 1 [0.027sec/1.769sec]
## 2 [0.024sec/1.755sec]
## 3 [0.028sec/1.786sec]
## POPULAR run fold/sample [model time/prediction time]
## 1 [0.107sec/1.935sec]
## 2 [0.105sec/1.785sec]
## 3 [0.105sec/1.959sec]
## POPULAR run fold/sample [model time/prediction time]
## 1 [0.133sec/2.162sec]
## 2 [0.121sec/2.077sec]
## 3 [0.118sec/2.057sec]
plot(eval.results, "prec/rec", annotate = TRUE, main = "POPULAR-Precision-recall")
plot(eval.results, annotate = 1, legend = "topleft")
title("Popular-ROC curve")
models <- list(
  RANDOM.cos.null   = list(name = "RANDOM", param = list(method = "cosine",  normalize = NULL)),
  RANDOM.prs.null   = list(name = "RANDOM", param = list(method = "pearson", normalize = NULL)),
  RANDOM.cos.center = list(name = "RANDOM", param = list(method = "cosine",  normalize = "center")),
  RANDOM.prs.center = list(name = "RANDOM", param = list(method = "pearson", normalize = "center")),
  RANDOM.cos.z      = list(name = "RANDOM", param = list(method = "cosine",  normalize = "Z-score")),
  RANDOM.prs.z      = list(name = "RANDOM", param = list(method = "pearson", normalize = "Z-score"))
)
eval.results <- suppressWarnings(evaluate(x = eval.method, method = models, n = seq(10, 100, 10)))
## RANDOM run fold/sample [model time/prediction time]
## 1 [0.003sec/0.252sec]
## 2 [0.022sec/0.229sec]
## 3 [0.002sec/0.23sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0.003sec/0.232sec]
## 2 [0.003sec/0.232sec]
## 3 [0.003sec/0.232sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0.003sec/0.455sec]
## 2 [0.002sec/0.238sec]
## 3 [0.003sec/0.226sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0.002sec/0.235sec]
## 2 [0.002sec/0.239sec]
## 3 [0.002sec/0.223sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0.003sec/0.225sec]
## 2 [0.003sec/0.218sec]
## 3 [0.002sec/0.199sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0.002sec/0.229sec]
## 2 [0.002sec/0.233sec]
## 3 [0.002sec/0.223sec]
plot(eval.results, "prec/rec", annotate = TRUE, main = "RANDOM-Precision-recall")
plot(eval.results, annotate = 1, legend = "topleft")
title("Random-ROC curve")
# User-based CF (default parameters)
ubcf <- Recommender(getData(eval.method, "train"), "UBCF")
# Item-based CF (default parameters)
ibcf <- Recommender(getData(eval.method, "train"), "IBCF")
# Non-personalized baselines (popularity, and random as a novelty/serendipity reference)
popular <- Recommender(getData(eval.method, "train"), "POPULAR")
random <- Recommender(getData(eval.method, "train"), "RANDOM")
# Tuned variants, using the configurations suggested by the curves above
# User-based CF: Pearson similarity, Z-score normalization
ubcf.after <- Recommender(getData(eval.method, "train"), "UBCF",
                          parameter = list(method = "pearson", normalize = "Z-score"))
# Item-based CF: cosine similarity, Z-score normalization
ibcf.after <- Recommender(getData(eval.method, "train"), "IBCF",
                          parameter = list(method = "cosine", normalize = "Z-score"))
# POPULAR ignores `method`; only the normalization setting takes effect
# (hence the parameter message printed below)
popular.after <- Recommender(getData(eval.method, "train"), "POPULAR",
                             parameter = list(method = "pearson", normalize = "Z-score"))
## Available parameter (with default values):
## normalize = center
## aggregationRatings = function (x, na.rm = FALSE, dims = 1, ...) standardGeneric("colMeans")
## aggregationPopularity = function (x, na.rm = FALSE, dims = 1, ...) standardGeneric("colSums")
## verbose = FALSE
random.after <- Recommender(getData(eval.method, "train"), "RANDOM",
                            parameter = list(method = "pearson", normalize = "Z-score"))
ubcf.pred <- predict(ubcf, getData(eval.method, "known"), type="ratings")
ibcf.pred <- predict(ibcf, getData(eval.method, "known"), type="ratings")
pop.pred <- predict(popular, getData(eval.method, "known"), type="ratings")
rand.pred <- predict(random, getData(eval.method, "known"), type="ratings")
ubcf.pred.after <- predict(ubcf.after, getData(eval.method, "known"), type="ratings")
ibcf.pred.after <- predict(ibcf.after, getData(eval.method, "known"), type="ratings")
pop.pred.after <- predict(popular.after, getData(eval.method, "known"), type="ratings")
rand.pred.after <- predict(random.after, getData(eval.method, "known"), type="ratings")
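Besides predicted ratings, the same trained models can return top-N lists directly; as a quick sketch, the top five recommended jokes for the first user in the test set:
top.5 <- predict(ubcf, getData(eval.method, "known")[1], n = 5)
as(top.5, "list")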
error <- rbind(
ubcf = calcPredictionAccuracy(ubcf.pred, getData(eval.method, "unknown")),
ubcf.after = calcPredictionAccuracy(ubcf.pred.after, getData(eval.method, "unknown")),
ibcf = calcPredictionAccuracy(ibcf.pred, getData(eval.method, "unknown")),
ibcf.after = calcPredictionAccuracy(ibcf.pred.after, getData(eval.method, "unknown")),
rand = calcPredictionAccuracy(rand.pred, getData(eval.method, "unknown")),
rand.after = calcPredictionAccuracy(rand.pred.after, getData(eval.method, "unknown")),
pop = calcPredictionAccuracy(pop.pred, getData(eval.method, "unknown")),
pop.after = calcPredictionAccuracy(pop.pred.after, getData(eval.method, "unknown"))
)
kable(error)
| model | RMSE | MSE | MAE |
|---|---|---|---|
| ubcf | 4.544521 | 20.65267 | 3.561756 |
| ubcf.after | 4.511355 | 20.35233 | 3.505248 |
| ibcf | 5.180793 | 26.84062 | 4.038888 |
| ibcf.after | 5.167722 | 26.70536 | 4.033240 |
| rand | 6.269908 | 39.31175 | 4.832408 |
| rand.after | 6.291395 | 39.58165 | 4.845250 |
| pop | 4.458940 | 19.88214 | 3.494865 |
| pop.after | 4.455285 | 19.84957 | 3.460197 |
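For reference, the three error metrics returned by calcPredictionAccuracy are computed over the set $K$ of withheld (user, item) pairs, where $\hat{r}_{ui}$ is the predicted and $r_{ui}$ the actual rating:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|K|}\sum_{(u,i) \in K}(\hat{r}_{ui} - r_{ui})^2}, \qquad \mathrm{MSE} = \mathrm{RMSE}^2, \qquad \mathrm{MAE} = \frac{1}{|K|}\sum_{(u,i) \in K}\left|\hat{r}_{ui} - r_{ui}\right|$$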
In this project, we implemented four recommender algorithms using the recommenderlab package: user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), a popularity baseline (POPULAR), and a random baseline (RANDOM).
The final error table compares each algorithm before and after tuning its parameters. The results suggest that the most suitable model is POPULAR with Z-score normalization (the similarity method passed to POPULAR is ignored, as the "Available parameter" messages above show), followed by the UBCF model using Pearson correlation as the distance function with Z-score normalization.
Precision is the fraction of recommended items that are actually relevant (here, withheld ratings at or above the good-rating threshold). Recall is the fraction of relevant items that appear among the recommendations. The two trade off against each other: as the recommendation list grows, recall rises while precision typically falls.
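In terms of the confusion-matrix counts reported by recommenderlab (TP, FP, FN, TN):

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$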
The ROC curve shows the relationship between the TP rate and the FP rate. For maximum accuracy, it is worth experimenting with techniques such as normalization. If we instead wanted to maximize serendipity and novelty at the expense of accuracy, the RANDOM model would be the natural choice, since POPULAR by construction recommends the most widely rated items.
To improve the algorithms further, tuning the numeric parameters (for example, a hyperparameter search over the neighborhood size) to achieve the highest recall or precision would be a natural next step; a sketch follows. It is important to evaluate candidate techniques under several evaluation methods, and careful parameter optimization is essential to refine the system. Depending on the objective of the business, the best parameter settings will vary.
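A minimal sketch of such a search (the candidate values below are hypothetical, not part of the original analysis): vary UBCF's neighborhood size nn on the existing split and compare held-out RMSE.
# Grid search over the UBCF neighborhood size (hypothetical candidate values)
nn.values <- c(5, 10, 25, 50, 100)
rmse.by.nn <- sapply(nn.values, function(nn) {
  rec <- Recommender(getData(eval.method, "train"), "UBCF",
                     parameter = list(method = "pearson",
                                      normalize = "Z-score", nn = nn))
  pred <- predict(rec, getData(eval.method, "known"), type = "ratings")
  calcPredictionAccuracy(pred, getData(eval.method, "unknown"))["RMSE"]
})
names(rmse.by.nn) <- nn.values
rmse.by.nn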
This dataset is offline, and the recommender system was evaluated on offline accuracy: predictions are scored against a pre-existing record of ratings, without the recommender ever being used by live users.
If it were deployed online, accuracy should instead be measured in the present, as the system predicts or recommends items while users interact with it (for example, through A/B testing).
The right metric for a recommender system varies with the business objective: it could be a high click-through rate reflecting interest in items, or actual purchases that generate revenue. Which recommender to implement therefore depends heavily on that objective metric, and whichever system performs best against it should be selected.