In another use case of recommendation systems in R, we will design a simple recommender system on the Jester5k data set, https://grouplens.org/datasets/jester/. The data set contains a sample of 5,000 users drawn from the anonymous ratings collected by the Jester Online Joke Recommender System between April 1999 and May 2003. User-based collaborative filtering approaches will be analyzed and compared using evaluation metrics, and a preferred option will be selected and optimized. A method for introducing variance, or “diversity”, will also be evaluated, and the Root Mean Square Error (RMSE) of each solution will be calculated and contrasted.
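RMSE is the headline accuracy metric throughout. For reference, a minimal hand-rolled version of what recommenderlab’s calcPredictionAccuracy() later reports would look like the following (illustrative only, not used in the analysis):
# Illustrative only: RMSE between predicted and observed ratings, ignoring missing entries
rmse <- function(predicted, actual) sqrt(mean((predicted - actual)^2, na.rm = TRUE))
rmse(c(1.2, -3.4, 0.5), c(1.0, -4.0, 1.5))  # approximately 0.683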
set.seed(222)
# Libraries
library(recommenderlab)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.1
# Data loading
data(Jester5k)
A summary of the data shows a 5000 by 100 matrix (users by items, respectively) with ratings ranging from approximately -9.95 to +9.90, constrained between -10 and +10. The stored ratings contain no missing values, and roughly 72% of the possible user-item cells are rated, which is relatively dense for a recommender data set. The ratings are approximately normally distributed with a median around 1; most ratings cluster near 0, with a preference for positive reviews over negative ones.
#size
dim(Jester5k)
## [1] 5000 100
#summary
jDF <- as(Jester5k, 'data.frame')
jDF$user <- as.numeric(jDF$user)
jDF$item <- as.numeric(jDF$item)
summary(jDF)
##       user           item            rating     
##  Min.   :   1   Min.   :  1.00   Min.   :-9.95  
##  1st Qu.:1263   1st Qu.: 21.00   1st Qu.:-3.06  
##  Median :2495   Median : 43.00   Median : 1.46  
##  Mean   :2503   Mean   : 43.79   Mean   : 0.85  
##  3rd Qu.:3749   3rd Qu.: 63.00   3rd Qu.: 5.10  
##  Max.   :5000   Max.   :100.00   Max.   : 9.90
# number of ratings
nratings(Jester5k)
## [1] 362106
vector_ratings <- as.vector(Jester5k@data)
head(as(Jester5k,"matrix")[,1:10])## j1 j2 j3 j4 j5 j6 j7 j8 j9 j10
## u2841 7.91 9.17 5.34 8.16 -8.74 7.14 8.88 -8.25 5.87 6.21
## u15547 -3.20 -3.50 -9.56 -8.74 -6.36 -3.30 0.78 2.18 -8.40 -8.79
## u15221 -1.70 1.21 1.55 2.77 5.58 3.06 2.72 -4.66 4.51 -3.06
## u15573 -7.38 -8.93 -3.88 -7.23 -4.90 4.13 2.57 3.83 4.37 3.16
## u21505 0.10 4.17 4.90 1.55 5.53 1.50 -3.79 1.94 3.59 4.81
## u15994 0.83 -4.90 0.68 -7.18 0.34 -4.32 -6.17 6.12 -5.58 5.44
# Check sparsity (measured here as the fill rate: the proportion of user-item cells that contain a rating)
sparsity <- function(ratings) {
  nratings(ratings) / (dim(ratings)[1] * dim(ratings)[2])
}
sparsity(Jester5k)
## [1] 0.724212
#Distribution
qplot(vector_ratings) + ggtitle("Distribution of the ratings")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#Average ratings
average_ratings <- colMeans(Jester5k)
qplot(average_ratings) + stat_bin(binwidth = 0.1) + ggtitle("Distribution of the average joke rating")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Constrain (clip) ratings to the valid [-10, 10] range
Jester5k@data@x[Jester5k@data@x[] < -10] <- -10
Jester5k@data@x[Jester5k@data@x[] > 10] <- 10
# worst joke
worst <- which.min(colMeans(Jester5k))
cat(JesterJokes[worst])
## How many teddybears does it take to change a lightbulb? It takes only one teddybear, but it takes a whole lot of lightbulbs.
# Keep only users with more than 80 ratings and jokes with more than 20 ratings
Jester5k_r <- Jester5k[rowCounts(Jester5k) > 80, colCounts(Jester5k) > 20]
e <- evaluationScheme(Jester5k_r, method = "split", train = 0.8, given = 30, goodRating = 3, k = 5)
e
## Evaluation scheme with 30 items given
## Method: 'split' with 5 run(s).
## Training set proportion: 0.800
## Good ratings: >=3.000000
## Data set: 1701 x 100 rating matrix of class 'realRatingMatrix' with 167062 ratings.
The model selection list compares user-based collaborative filtering using cosine and Pearson similarities across the three normalization options NULL, center, and Z-score. Precision-recall and ROC curves are visualized as the means to select the most effective method. The following model parameters were selected somewhat arbitrarily due to time and visual-analysis limitations.
models <- list(
UBCF_cos_null = list(name = "UBCF", param = list(method = "cosine", normalize = NULL)),
UBCF_prs_null = list(name = "UBCF", param = list(method = "pearson", normalize = NULL)),
UBCF_cos_center = list(name = "UBCF", param = list(method = "cosine", normalize = "center")),
UBCF_prs_center = list(name = "UBCF", param = list(method = "pearson", normalize = "center")),
UBCF_cos_z = list(name = "UBCF", param = list(method = "cosine", normalize = "Z-score")),
UBCF_prs_z = list(name = "UBCF", param = list(method = "pearson", normalize = "Z-score"))
)
eval_results <- suppressWarnings(evaluate(x = e, method = models, n = seq(10, 100, 10)))
plot(eval_results, "prec/rec", annotate = TRUE, main = "Precision-recall")
title("Precision-recall")
plot(eval_results, annotate = 1, legend = "topleft")
title("ROC curve")A user-user model is preferable as a baseline the Pearson similarity resulted in the highest ROC. The normalization was a negligible factor. Once again, the values need to be constrained between -10 and 10.
# Creation of the model - U(ser) B(ased) C(ollaborative) F(iltering)
Rec.model <- Recommender(getData(e, "train"), "UBCF", parameter = list(method = "pearson", normalize = "Z-score", nn=25))
#Making predictions
prediction_UJ <- predict(Rec.model, getData(e, "known"), type="ratings",n=10)
# set all predictions that fall outside the valid range to the boundary values
prediction_UJ@data@x[prediction_UJ@data@x[] < -10] <- -10
prediction_UJ@data@x[prediction_UJ@data@x[] > 10] <- 10
calcPredictionAccuracy(prediction_UJ, getData(e, "unknown"))
##      RMSE       MSE       MAE 
##  4.195269 17.600278  3.243706
rmse_ubcf <- calcPredictionAccuracy(prediction_UJ, getData(e, "unknown"))[1]
With the initial UBCF model and Z-score normalization we receive an RMSE of about 4.195. I would consider this value excessive; given more time, a more aggressive reduction of user and joke outliers ought to be applied. However, this RMSE may be more accurate than we would expect, and its variance could be helpful in generating “serendipity” for the user.
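As a sketch of what that outlier reduction might look like, one option is to drop users and jokes whose mean rating is extreme before re-running the evaluation scheme. The 5%/95% cut-offs below are arbitrary, untested assumptions, not tuned values:
# Illustrative only: trim users and jokes with extreme mean ratings (arbitrary 5% / 95% cut-offs)
avg_user <- rowMeans(Jester5k_r)
avg_joke <- colMeans(Jester5k_r)
keep_users <- avg_user > quantile(avg_user, 0.05) & avg_user < quantile(avg_user, 0.95)
keep_jokes <- avg_joke > quantile(avg_joke, 0.05) & avg_joke < quantile(avg_joke, 0.95)
Jester5k_trim <- Jester5k_r[keep_users, keep_jokes]
e_trim <- evaluationScheme(Jester5k_trim, method = "split", train = 0.8, given = 30, goodRating = 3, k = 5)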
Ideally, the top-n results would combine the output of multiple engines, for example the highest-scoring user-based selections paired with marginally randomized elements. In the following example, a second recommender engine is trained on a perturbed version of the original data as a complementary pairing to the original data set.
set.seed(223)
# Partially randomize the data: shift each rating by a random amount between -2 and +2
'%rand%' <- function(x, y) { x + sample(c(1, -1), length(y), replace = TRUE) * y }
Jester.random <- Jester5k
Jester.random@data@x <- sapply(Jester.random@data@x, function(x) { x %rand% runif(1, 0, 2) })
e.2 <- evaluationScheme(Jester.random, method = "split",train = 0.8, given = 30, goodRating = 3, k=5)
# Creation of the model - U(ser) B(ased) C(ollaborative) F(iltering)
Rec.model.2 <- Recommender(getData(e.2, "train"), "UBCF", parameter = list(method = "pearson", normalize = "Z-score", nn=25))
#Making predictions
# Target the original dataset!
prediction.2 <- predict(Rec.model.2, getData(e, "known"), type="ratings",n=10)
# set all predictions that fall outside the valid range to the boundary values
prediction.2@data@x[prediction.2@data@x[] < -10] <- -10
prediction.2@data@x[prediction.2@data@x[] > 10] <- 10
calcPredictionAccuracy(prediction.2, getData(e, "unknown"))
##      RMSE       MSE       MAE 
##  4.240971 17.985837  3.351313
rmse_ubcf.2 <- calcPredictionAccuracy(prediction.2, getData(e, "unknown"))[1]
We receive an RMSE of about 4.241, slightly higher than that of our original model.
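To actually serve the “complementary pairing”, rather than evaluating the two engines separately, one possibility is recommenderlab’s HybridRecommender(), which combines several trained recommenders using supplied weights. The sketch below is illustrative only: the 80/20 weighting is an arbitrary assumption, not a tuned value.
# Blend the two trained engines (weights are an arbitrary illustration)
Rec.hybrid <- HybridRecommender(
  Rec.model,    # UBCF trained on the original ratings
  Rec.model.2,  # UBCF trained on the perturbed ratings
  weights = c(0.8, 0.2)
)
# Top-10 jokes per user, mixing the "accurate" engine with the "diverse" one
pred.hybrid <- predict(Rec.hybrid, getData(e, "known"), n = 10)
as(pred.hybrid, "list")[1:3]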
In a real-world scenario, a recommender model needs to be measured for efficacy in an online/offline, hybrid environment. Let’s pretend it is 1995 and we are running a joke website. Users are presented with the option to “browse” or to view recommended top-n jokes, based upon known or unknown user behavior. A user may ignore the advice of the recommender and click on “browse”, or may give in entirely to the jokes engine and continuously click on the top suggestions. There may also be a random option. Each of these “clicks” can be measured and attributed to each user, for example as a “Click-Through Rate” (CTR), for additional predictive power. The CTR is the ratio of clicks on recommended items to the total number of recommended items shown to users during a session.
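As a toy illustration of that calculation (the click log below is entirely hypothetical):
# Hypothetical click log: one row per recommended joke shown in a session
clicks <- data.frame(
  user    = c("u1", "u1", "u2", "u2", "u2", "u3"),
  item    = c("j5", "j7", "j5", "j9", "j2", "j7"),
  clicked = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)
# CTR = clicks on recommended items / recommended items shown
sum(clicks$clicked) / nrow(clicks)
## [1] 0.5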
Given the relatively normal distribution, determining comparative offline versus online metrics would likely require an analysis of unpopular, average, and popular jokes, or perhaps further divisions based on “tags” or natural language processing (a rough popularity segmentation is sketched below). However, this is a jokes website from 1995, so I doubt things really need to go that far, although the potential for optimization is still there. There is only so much one can assume with an offline data model, and one should expect to uncover serious flaws while analyzing an online comparator. There should be a mutual development and feedback loop between offline and online in a development environment.
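A minimal sketch of that popularity segmentation, using arbitrary tertile cut-offs on the number of ratings per joke:
# Bucket jokes by how often they are rated (arbitrary tertile cut-offs)
joke_counts <- colCounts(Jester5k_r)
tiers <- cut(joke_counts, breaks = quantile(joke_counts, c(0, 1/3, 2/3, 1)),
             labels = c("unpopular", "average", "popular"), include.lowest = TRUE)
table(tiers)
# Average observed rating per popularity tier, as a first offline point of comparison
tapply(colMeans(Jester5k_r), tiers, mean)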