The goal of this assignment is to try out different ways of implementing and configuring a recommender, and to evaluate different approaches. Start with an existing dataset of user-item ratings, and implement at least two recommendation algorithms.
Evaluate and compare different approaches, and provide at least one graph and a summary of findings and recommendations.
For this project, we use dataset 1_2 from the Jester database of joke ratings. The dataset includes ratings of 100 jokes from 23,500 users, each of whom has rated 36 or more jokes. Ratings are real values ranging from -10.00 (worst) to +10.00 (best).
We use the recommenderlab package to develop and test our recommender system using different recommendation algorithms. In addition, we use some of the ideas and code suggestions from the recommenderlab vignette, which I also used in a project for DATA 607 last fall.
library(tidyverse)
library(recommenderlab)
library(knitr)
We start by preparing the data for the recommender system. We note from the data description for the Jester database that the rating value “99” corresponds to “null” or “not rated”, and that the first column in the dataset is the number of jokes rated by each user. So our data preparation includes the following steps:
- Check that the first column (the count of jokes rated per user) matches the number of non-99 entries in each row, then drop it.
- Replace the placeholder value 99 with NA.
- Assign row names for users (u0, u1, …) and column names for jokes (j0, j1, …).
- Convert the data frame to a realRatingMatrix object for use with recommenderlab.
After conversion to a realRatingMatrix, the data form a user-item utility matrix in which the “users” are the raters and the “items” are the jokes. We see that the ratings matrix includes 1.71 million ratings in total, with 23,500 users and 100 jokes. This means the ratings matrix is roughly 73% populated with non-blank ratings (or conversely, 27% sparse), out of a total of 2.35 million possible ratings.
# load data from github or from disk
# github repo
file <- "https://raw.githubusercontent.com/kecbenson/DATA612/master/jester-data-2.csv"
# local directory
#file <- "C:/Users/Kevin/Documents/CUNY - MSDS/DATA 612/jester_dataset_1_2/jester-data-2.csv"
df <- read.csv(file, header = FALSE)
# check that rowcounts match, then discard first column
all.equal(rowSums(df[ , 2:ncol(df)] != 99), df[ , 1])
## [1] TRUE
df <- df[-1]
# set values of 99 to NA and set row & column names
df[df == 99] <- NA
rownames(df) <- paste0("u", 0:(nrow(df)-1))
colnames(df) <- paste0("j", 0:(ncol(df)-1))
glimpse(df)
## Observations: 23,500
## Variables: 100
## $ j0 <dbl> NA, -4.37, NA, 0.34, NA, NA, 3.50, -7.67, 1.02, 3.64, 9.27...
## $ j1 <dbl> 8.11, -3.88, NA, -6.55, NA, NA, 2.28, 7.14, 6.07, 1.60, 6....
## $ j2 <dbl> NA, 0.73, NA, 2.86, NA, NA, 3.01, -6.31, 6.65, -0.39, -9.8...
## $ j3 <dbl> NA, -3.20, NA, NA, NA, NA, -0.63, NA, -0.87, 3.74, -8.83, ...
## $ j4 <dbl> -2.28, -6.41, 0.73, -3.64, 9.13, -1.46, 4.95, -9.32, 6.80,...
## $ j5 <dbl> -4.22, 1.17, NA, 1.12, NA, NA, -0.49, 6.07, 0.68, -9.61, 0...
## $ j6 <dbl> 5.49, 7.82, 5.53, 5.34, -9.32, 2.72, -0.49, 3.59, 5.34, 8....
## $ j7 <dbl> -2.62, -4.76, 3.25, 2.33, -2.04, -3.83, -0.68, -2.72, 4.56...
## $ j8 <dbl> NA, -6.41, NA, NA, NA, NA, 3.30, NA, -0.39, -7.09, -6.17, ...
## $ j9 <dbl> -2.28, 0.73, NA, 2.33, NA, NA, -1.41, -8.01, 8.25, 3.88, -...
## $ j10 <dbl> 5.34, 1.99, NA, 3.54, NA, NA, 4.71, -9.13, 8.93, -6.36, 9....
## $ j11 <dbl> 3.54, 0.63, NA, 0.92, 2.04, NA, 3.30, -8.06, 6.21, -9.76, ...
## $ j12 <dbl> -9.71, -1.41, -2.04, -5.87, -2.04, -3.83, 1.26, 2.67, 0.10...
## $ j13 <dbl> 3.30, -4.32, 2.91, 2.77, NA, 0.97, -4.03, 7.43, 9.08, -7.9...
## $ j14 <dbl> -5.58, 0.63, 2.62, 4.76, -2.23, -3.83, -5.49, -7.23, -0.73...
## $ j15 <dbl> -6.07, -9.22, -1.70, -3.54, -8.50, -1.26, -5.44, -7.23, 2....
## $ j16 <dbl> -0.29, 0.87, 2.28, -7.38, 3.16, 0.00, 2.86, -0.53, 6.46, -...
## $ j17 <dbl> 2.33, 0.68, 3.11, -4.37, 2.18, -4.90, 1.17, 1.60, -5.34, -...
## $ j18 <dbl> 3.93, 0.29, 2.91, -1.75, 3.93, -2.72, 1.80, -7.91, 0.97, 2...
## $ j19 <dbl> -0.34, -4.51, 6.75, -4.81, 2.96, -0.39, 1.07, -8.11, 3.83,...
## $ j20 <dbl> -6.89, -6.65, 2.72, -1.75, 6.41, 1.50, 2.62, -7.23, 8.59, ...
## $ j21 <dbl> -2.09, -4.03, 2.82, 2.72, 2.62, NA, 0.73, -9.13, 4.71, 4.8...
## $ j22 <dbl> -8.83, 1.94, NA, 2.96, NA, NA, -0.39, -7.77, 8.45, -9.13, ...
## $ j23 <dbl> NA, -3.11, NA, NA, NA, NA, -1.94, NA, -0.44, 6.12, -8.98, ...
## $ j24 <dbl> 5.15, -4.32, NA, 1.02, NA, NA, 1.99, -8.88, 8.59, 1.60, -6...
## $ j25 <dbl> 3.98, -0.63, 4.22, 2.96, 5.10, 2.62, 1.21, 7.38, 7.52, -9....
## $ j26 <dbl> 4.47, 5.83, 2.77, 3.25, 0.05, 3.74, 2.09, 0.00, 6.26, -9.3...
## $ j27 <dbl> 3.88, 3.64, 3.35, 0.10, 6.12, NA, 3.74, -8.59, 9.08, -3.83...
## $ j28 <dbl> 5.29, 0.39, 4.22, 2.18, 7.18, 3.06, 0.63, 1.65, 7.28, -0.4...
## $ j29 <dbl> NA, -0.44, NA, 0.97, NA, NA, 3.30, -8.16, 6.17, -6.17, -6....
## $ j30 <dbl> 7.09, 0.73, 3.20, -6.89, 8.35, 2.04, 2.04, 8.40, 6.99, -8....
## $ j31 <dbl> -2.23, 1.07, 4.51, -3.93, 7.43, 4.03, -2.48, -6.55, 8.30, ...
## $ j32 <dbl> NA, 3.06, NA, NA, NA, NA, 3.35, -7.86, -7.91, -2.67, -0.49...
## $ j33 <dbl> NA, -4.51, NA, -2.43, NA, NA, 2.18, -8.01, 6.80, -4.90, 4....
## $ j34 <dbl> 3.11, -0.39, 5.24, 4.17, 7.57, 4.27, 4.32, 0.44, 2.82, 3.2...
## $ j35 <dbl> -4.42, 3.11, 2.33, -4.47, 5.49, 5.49, 1.89, -1.50, 3.50, 4...
## $ j36 <dbl> NA, 1.02, NA, NA, NA, NA, -1.99, NA, 0.15, 3.30, -9.08, 4....
## $ j37 <dbl> -5.87, -2.09, 0.00, 0.63, 8.98, 3.06, 4.17, -8.88, 1.99, 0...
## $ j38 <dbl> NA, -6.21, 1.84, 1.50, NA, NA, 2.43, 8.20, 9.17, -5.87, -0...
## $ j39 <dbl> -7.28, -4.71, 4.42, 3.45, NA, NA, 2.86, -9.13, 8.20, -4.17...
## $ j40 <dbl> NA, 1.07, NA, 2.91, NA, NA, 4.47, -8.45, 8.40, -3.54, -9.6...
## $ j41 <dbl> -4.71, 2.77, 2.82, 1.65, 5.63, 2.72, -1.02, 7.38, 6.80, -9...
## $ j42 <dbl> -8.93, 2.04, NA, NA, NA, NA, 2.38, -7.48, 8.30, -2.62, -0....
## $ j43 <dbl> NA, -5.97, NA, 3.06, NA, NA, -0.63, NA, -1.26, 1.75, -8.83...
## $ j44 <dbl> 3.40, -0.58, NA, 0.58, 4.66, NA, -4.76, 7.86, 8.01, -2.57,...
## $ j45 <dbl> -7.18, -3.74, 4.17, 3.93, NA, 0.97, -2.09, 7.67, 7.48, -1....
## $ j46 <dbl> 2.28, 3.30, NA, 3.69, NA, NA, 3.16, 7.48, 8.11, -8.88, -8....
## $ j47 <dbl> 7.48, -1.46, 3.69, 5.15, NA, 0.97, 3.79, -8.69, 7.91, 1.31...
## $ j48 <dbl> 5.15, 6.02, 3.79, 3.30, 4.17, 4.08, 4.27, -7.57, 9.03, -3....
## $ j49 <dbl> 5.73, 0.44, 2.48, 1.55, 8.11, 1.99, 1.99, 1.60, 8.88, -3.5...
## $ j50 <dbl> NA, -4.22, NA, -4.90, 7.33, NA, 1.70, 8.20, 7.28, -9.66, -...
## $ j51 <dbl> NA, -0.44, NA, -6.99, NA, NA, 4.08, 7.82, 8.11, -9.66, -7....
## $ j52 <dbl> 5.78, -1.94, 3.64, 3.06, 6.02, 4.90, 0.87, 0.68, 9.08, -4....
## $ j53 <dbl> 3.20, 8.45, 3.01, 3.64, 4.71, 0.97, -2.62, 7.38, 8.88, -2....
## $ j54 <dbl> -0.44, -3.01, NA, 1.21, -0.58, NA, 2.04, -8.69, 6.21, -3.3...
## $ j55 <dbl> 5.73, -4.51, 2.52, 0.68, 7.09, 1.50, 3.30, -7.23, 8.83, -8...
## $ j56 <dbl> NA, 3.54, NA, NA, -2.23, 0.68, 3.06, NA, 3.54, -8.01, -9.8...
## $ j57 <dbl> NA, -0.44, 4.71, NA, NA, NA, -3.74, NA, 0.19, -5.78, -4.13...
## $ j58 <dbl> NA, 4.56, NA, 1.02, NA, NA, 0.34, -8.16, 6.65, -9.76, -4.1...
## $ j59 <dbl> NA, 2.04, NA, 2.28, NA, NA, -0.92, -7.67, 6.07, -0.34, -0....
## $ j60 <dbl> -8.25, 5.78, -0.34, 1.41, 5.78, 2.09, 1.75, 8.35, 8.93, -3...
## $ j61 <dbl> 4.61, 5.63, 3.35, 1.21, 1.94, 3.06, 0.78, -7.23, 3.40, -1....
## $ j62 <dbl> NA, 6.02, NA, 0.83, NA, NA, 3.69, -7.72, 8.11, -9.71, -6.8...
## $ j63 <dbl> NA, -4.37, NA, -0.10, NA, NA, -0.68, -7.67, 8.54, 6.75, -0...
## $ j64 <dbl> -3.98, -8.98, 3.88, 4.71, 5.24, 3.20, 3.35, -1.80, 7.82, 1...
## $ j65 <dbl> 4.90, 0.87, -0.15, 1.12, 6.36, 1.89, 1.65, -8.01, 7.67, -5...
## $ j66 <dbl> NA, -4.66, NA, 3.01, NA, NA, 0.92, -6.07, 5.29, 0.24, -0.3...
## $ j67 <dbl> -4.32, 2.82, 3.06, 2.67, NA, 1.50, 2.52, -7.23, -0.29, -0....
## $ j68 <dbl> -2.72, 2.86, 2.96, 3.06, NA, 3.69, 1.70, -8.01, 0.78, -6.0...
## $ j69 <dbl> NA, -4.03, NA, 1.31, NA, NA, 5.00, -8.01, 8.06, -9.71, -3....
## $ j70 <dbl> NA, 3.06, NA, -3.64, NA, NA, 7.18, NA, -0.05, 0.44, NA, NA...
## $ j71 <dbl> NA, 2.91, NA, NA, NA, NA, 6.75, NA, 8.50, -2.14, NA, NA, N...
## $ j72 <dbl> NA, 5.44, 4.56, 2.43, NA, NA, 4.71, NA, 8.88, -9.56, NA, N...
## $ j73 <dbl> NA, -4.76, NA, NA, NA, NA, 3.20, NA, 4.27, -9.71, NA, NA, ...
## $ j74 <dbl> NA, -8.93, NA, NA, NA, NA, 2.23, NA, 9.08, -9.27, NA, NA, ...
## $ j75 <dbl> NA, -0.49, NA, NA, NA, NA, 4.47, NA, 8.54, -9.08, NA, NA, ...
## $ j76 <dbl> NA, 8.16, NA, NA, -0.05, NA, 3.50, -4.32, 6.17, -8.01, NA,...
## $ j77 <dbl> NA, -3.16, NA, NA, NA, NA, 5.97, NA, 8.74, -9.61, NA, NA, ...
## $ j78 <dbl> NA, 0.83, NA, NA, NA, NA, 0.44, NA, 3.20, -9.61, NA, NA, N...
## $ j79 <dbl> NA, 1.94, NA, -4.66, NA, NA, 4.51, NA, 8.50, -9.56, NA, NA...
## $ j80 <dbl> NA, -0.49, NA, NA, NA, 1.80, 7.14, NA, 0.92, 4.61, 9.13, 8...
## $ j81 <dbl> NA, 2.62, NA, NA, NA, NA, 5.00, -6.21, 9.13, -0.29, -9.71,...
## $ j82 <dbl> NA, 1.99, NA, NA, NA, NA, 8.98, NA, 8.54, -9.56, NA, NA, N...
## $ j83 <dbl> NA, -3.11, NA, NA, NA, NA, 5.39, NA, 6.36, -3.45, NA, NA, ...
## $ j84 <dbl> NA, -0.44, NA, NA, NA, NA, 4.27, NA, 9.03, -9.71, NA, NA, ...
## $ j85 <dbl> NA, -6.50, NA, NA, NA, NA, 4.47, NA, 9.03, 1.12, NA, NA, N...
## $ j86 <dbl> NA, 2.04, NA, NA, NA, NA, 3.20, NA, 3.54, -6.50, 9.03, NA,...
## $ j87 <dbl> NA, -3.16, NA, NA, NA, NA, 4.56, NA, 4.66, -6.94, -5.83, N...
## $ j88 <dbl> 1.21, 2.09, NA, NA, NA, NA, 4.22, 1.65, 8.93, -5.10, NA, N...
## $ j89 <dbl> NA, 5.34, 2.38, NA, NA, NA, 3.83, NA, 6.46, -6.46, NA, NA,...
## $ j90 <dbl> NA, 5.73, NA, NA, NA, NA, 1.89, NA, 6.75, 8.59, NA, NA, NA...
## $ j91 <dbl> NA, -6.70, NA, NA, NA, NA, 3.54, NA, 8.74, -3.69, NA, NA, ...
## $ j92 <dbl> NA, 1.99, NA, NA, NA, NA, 2.38, NA, 9.17, -1.84, NA, NA, N...
## $ j93 <dbl> NA, 2.62, NA, NA, NA, 1.89, 2.77, NA, 7.52, 3.11, NA, 7.04...
## $ j94 <dbl> NA, -0.49, 3.16, NA, NA, NA, -0.24, NA, 5.73, 0.87, NA, NA...
## $ j95 <dbl> -5.92, 3.45, NA, NA, NA, NA, 2.28, NA, 5.92, 1.02, NA, NA,...
## $ j96 <dbl> NA, 3.20, NA, NA, NA, NA, 5.05, 1.60, -6.60, 4.17, NA, NA,...
## $ j97 <dbl> NA, -0.53, NA, NA, NA, NA, 4.51, NA, 0.24, -6.55, NA, NA, ...
## $ j98 <dbl> NA, -0.53, NA, NA, NA, NA, 4.08, NA, 9.08, -8.93, NA, NA, ...
## $ j99 <dbl> NA, -2.96, NA, NA, NA, NA, 2.96, NA, 8.98, -9.42, NA, NA, ...
# create raw ratings matrix
r <- as.matrix(df) %>% as("realRatingMatrix")
r
## 23500 x 100 rating matrix of class 'realRatingMatrix' with 1708993 ratings.
getRatingMatrix(r)[1:10, 1:10]
## 10 x 10 sparse Matrix of class "dgCMatrix"
## [[ suppressing 10 column names 'j0', 'j1', 'j2' ... ]]
##
## u0 . 8.11 . . -2.28 -4.22 5.49 -2.62 . -2.28
## u1 -4.37 -3.88 0.73 -3.20 -6.41 1.17 7.82 -4.76 -6.41 0.73
## u2 . . . . 0.73 . 5.53 3.25 . .
## u3 0.34 -6.55 2.86 . -3.64 1.12 5.34 2.33 . 2.33
## u4 . . . . 9.13 . -9.32 -2.04 . .
## u5 . . . . -1.46 . 2.72 -3.83 . .
## u6 3.50 2.28 3.01 -0.63 4.95 -0.49 -0.49 -0.68 3.30 -1.41
## u7 -7.67 7.14 -6.31 . -9.32 6.07 3.59 -2.72 . -8.01
## u8 1.02 6.07 6.65 -0.87 6.80 0.68 5.34 4.56 -0.39 8.25
## u9 3.64 1.60 -0.39 3.74 -6.36 -9.61 8.88 -4.08 -7.09 3.88
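As a quick sanity check on the density figure quoted above, we can compute it directly from the realRatingMatrix (a minimal sketch using the objects already defined):
# sketch: overall density / sparsity of the rating matrix
n_ratings  <- length(getRatings(r))   # number of non-missing ratings (~1.71M)
n_possible <- nrow(r) * ncol(r)       # 23,500 users x 100 jokes = 2.35M cells
round(n_ratings / n_possible, 3)      # share of populated cells (~0.73)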
For comparison, we also create the normalized forms of the rating matrix, using centering and Z-score methods for normalization.
# create normalized ratings matrix
r_n1 <- normalize(r)
r_n1
## 23500 x 100 rating matrix of class 'realRatingMatrix' with 1708993 ratings.
## Normalized using center on rows.
getRatingMatrix(r_n1)[1:10, 1:10] %>% round(2)
## 10 x 10 sparse Matrix of class "dgCMatrix"
## [[ suppressing 10 column names 'j0', 'j1', 'j2' ... ]]
##
## u0 . 8.17 . . -2.22 -4.16 5.55 -2.56 . -2.22
## u1 -4.09 -3.60 1.01 -2.92 -6.13 1.45 8.10 -4.48 -6.13 1.01
## u2 . . . . -2.17 . 2.63 0.35 . .
## u3 -0.18 -7.07 2.34 . -4.16 0.60 4.82 1.81 . 1.81
## u4 . . . . 5.65 . -12.80 -5.52 . .
## u5 . . . . -2.79 . 1.39 -5.16 . .
## u6 1.39 0.17 0.90 -2.74 2.84 -2.60 -2.60 -2.79 1.19 -3.52
## u7 -4.87 9.94 -3.51 . -6.52 8.87 6.39 0.08 . -5.21
## u8 -4.57 0.48 1.06 -6.46 1.21 -4.91 -0.25 -1.03 -5.98 2.66
## u9 7.07 5.03 3.04 7.17 -2.93 -6.18 12.31 -0.65 -3.66 7.31
r_n2 <- normalize(r, method = "Z-score")
r_n2
## 23500 x 100 rating matrix of class 'realRatingMatrix' with 1708993 ratings.
## Normalized using z-score on rows.
getRatingMatrix(r_n2)[1:10, 1:10] %>% round(2)
## 10 x 10 sparse Matrix of class "dgCMatrix"
## [[ suppressing 10 column names 'j0', 'j1', 'j2' ... ]]
##
## u0 . 1.57 . . -0.43 -0.80 1.07 -0.49 . -0.43
## u1 -1.02 -0.90 0.25 -0.73 -1.54 0.36 2.03 -1.12 -1.54 0.25
## u2 . . . . -1.23 . 1.49 0.20 . .
## u3 -0.05 -2.09 0.69 . -1.23 0.18 1.42 0.53 . 0.53
## u4 . . . . 1.27 . -2.87 -1.24 . .
## u5 . . . . -1.08 . 0.54 -2.00 . .
## u6 0.51 0.06 0.33 -1.01 1.05 -0.96 -0.96 -1.03 0.44 -1.30
## u7 -0.76 1.54 -0.54 . -1.01 1.38 0.99 0.01 . -0.81
## u8 -1.20 0.13 0.28 -1.70 0.32 -1.29 -0.07 -0.27 -1.57 0.70
## u9 1.41 1.00 0.61 1.43 -0.59 -1.23 2.45 -0.13 -0.73 1.46
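As a quick check of what the normalization does (a minimal sketch using the objects above): centering subtracts each user's mean rating from that user's ratings, and the Z-score method additionally divides by the user's rating standard deviation.
# sketch: confirm row-centering for a single user (u1)
u1_raw      <- as(r["u1", ], "matrix")
u1_centered <- as(r_n1["u1", ], "matrix")
all.equal(u1_centered, u1_raw - mean(u1_raw, na.rm = TRUE))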
We use image to visually represent portions of the raw and normalized ratings matrices.
# show part of ratings matrices
image(r[1:25, 1:50], main = "Raw Ratings")
image(r_n1[1:25, 1:50], main = "Normalized Ratings - Centered")
image(r_n2[1:25, 1:50], main = "Normalized Ratings - Z-Score")
Next we do some preliminary data analysis. Reviewing the distributions of the raw and normalized ratings, we see that the mean and median raw ratings are about 0.8 and 1.4, respectively. As expected, the normalized ratings are centered at 0, and they are approximately normally distributed, as seen in the histogram of rating Z-scores. Interestingly, the ratings distribution is shifted toward positive values, with a higher frequency of ratings above 0 than below 0. This remains true even after normalizing the ratings, which suggests a broad user bias toward positive ratings. Perhaps some people are too polite to give bad jokes a negative rating?
# distribution of ratings
par(mfrow = c(1, 3))
hist(getRatings(r), main = "Raw Ratings")
hist(getRatings(r_n1), main = "Norm. Ratings - Centered")
hist(getRatings(r_n2), prob = TRUE, xlim = c(-5, 5), breaks = 40,
main = "Norm. Ratings - Z-Score")
curve(dnorm(x), add = TRUE, col = "blue")
par(mfrow = c(1,1))
summary(getRatings(r))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -9.9500 -3.2000 1.3600 0.7573 5.1000 10.0000
summary(getRatings(r_n1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -18.7149 -2.9704 0.4446 0.0000 3.1672 18.2654
summary(getRatings(r_n2))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -9.3382 -0.7244 0.1254 0.0000 0.7520 8.8608
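To quantify the positive bias noted above, we can look at the share of ratings above zero before and after centering (a quick sketch):
# sketch: proportion of ratings above zero, raw vs. row-centered
mean(getRatings(r) > 0)     # raw ratings
mean(getRatings(r_n1) > 0)  # centered ratings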
Turning to ratings per user, we see that the number of jokes rated per user averages about 72-73. The distribution has two modes: users who rated all 100 jokes and users who rated 70-74 jokes. There is also a slight mode at the minimum of 36, which probably arises from the cutoff applied when the dataset was created (the Jester data description indicates that only users with at least 36 ratings are included). In addition, the average raw rating per user is about 0.8 and ranges from roughly -10 to +9; as expected, the average normalized rating per user is 0.
# distribution of rating count & avg rating per user
hist(rowCounts(r), breaks = 40, main = "Distribution of Rating Count per User")
summary(rowCounts(r))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 36.00 53.00 72.00 72.72 100.00 100.00
par(mfrow = c(1,3))
hist(rowMeans(r), main = "Avg. Rating per User")
hist(rowMeans(r_n1), main = "Avg. Rating (Centered) per User")
hist(rowMeans(r_n2), main = "Avg. Rating (Z-Score) per User")
par(mfrow = c(1,1))
summary(rowMeans(r))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -9.9474 -0.8841 0.8597 0.7576 2.5702 9.3133
summary(rowMeans(r_n1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.460e-14 -2.842e-16 0.000e+00 4.248e-18 2.961e-16 1.296e-14
summary(rowMeans(r_n2))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -6.399e-13 -6.550e-17 2.000e-19 -2.500e-17 6.990e-17 8.982e-14
Reviewing the rating matrix along the second dimension (by joke), we see that the number of ratings per joke has a mean of 17,090 and a median of 18,648. The distribution appears bimodal, with many jokes receiving under 11,000 ratings and others receiving well over 20,000 (including a spike from jokes rated by virtually all 23,500 users); no jokes fall in the range of 11,000 to 14,000 ratings. Further, the average rating per joke is about 0.7, with a high of 3.5 and a low of -3.6. As expected, the average normalized rating per joke is approximately 0.
# distribution of rating count & avg rating per joke
hist(colCounts(r), breaks = 40, main = "Distribution of Rating Count per Joke")
summary(colCounts(r))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8164 9732 18648 17090 23329 23499
par(mfrow = c(1,3))
hist(colMeans(r), main = "Avg. Rating per Joke")
hist(colMeans(r_n1), main = "Avg. Rating (Centered) per Joke")
hist(colMeans(r_n2), main = "Avg. Rating (Z-Score) per Joke")
par(mfrow = c(1,1))
summary(colMeans(r))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.5738 -0.4375 0.9445 0.7163 1.8245 3.4593
summary(colMeans(r_n1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.25191 -1.19483 0.20238 -0.06176 0.97356 2.69231
summary(colMeans(r_n2))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.95941 -0.25569 0.05488 -0.01312 0.22596 0.60609
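As a quick check of the gap noted above, we can count the jokes with rating counts between 11,000 and 14,000 (a sketch; we expect this to be zero):
# sketch: how many jokes fall in the 11,000-14,000 rating-count range?
sum(colCounts(r) > 11000 & colCounts(r) < 14000)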
We can see that the most frequently rated jokes were rated by virtually all 23,500 users.
# top N most frequently rated jokes
N <- 10
t <- sort(colCounts(r), decreasing = TRUE) %>% head(N)
topN <- cbind(Rank = 1:N,
Joke = names(t),
Frequency = t
)
rownames(topN) <- NULL
kable(topN, caption = "Top 10 Most Frequently Rated Jokes")
| Rank | Joke | Frequency |
|---|---|---|
| 1 | j12 | 23499 |
| 2 | j14 | 23499 |
| 3 | j16 | 23499 |
| 4 | j35 | 23499 |
| 5 | j49 | 23499 |
| 6 | j4 | 23498 |
| 7 | j19 | 23498 |
| 8 | j52 | 23498 |
| 9 | j6 | 23497 |
| 10 | j7 | 23497 |
On the other hand, the least frequently rated jokes were rated by under 9,000 users.
# top N least frequently rated jokes
N <- 10
t <- sort(colCounts(r), decreasing = FALSE) %>% head(N)
topN <- cbind(Rank = 1:N,
Joke = names(t),
Frequency = t
)
rownames(topN) <- NULL
kable(topN, caption = "10 Least Frequently Rated Jokes")
| Rank | Joke | Frequency |
|---|---|---|
| 1 | j70 | 8164 |
| 2 | j72 | 8231 |
| 3 | j71 | 8288 |
| 4 | j73 | 8392 |
| 5 | j74 | 8393 |
| 6 | j77 | 8494 |
| 7 | j75 | 8513 |
| 8 | j76 | 8551 |
| 9 | j78 | 8586 |
| 10 | j79 | 8643 |
Finally, we summarize the top 10 most highly rated jokes based on the raw ratings as well as the normalized ratings (for both centering and Z-scores). Note that the top 10 lists are almost identical regardless of rating method; only a few jokes move slightly in position.
# top N most highly rated jokes
N <- 10
t <- sort(colMeans(r), decreasing = TRUE) %>% head(N)
t1 <- sort(colMeans(r_n1), decreasing = TRUE) %>% head(N)
t2 <- sort(colMeans(r_n2), decreasing = TRUE) %>% head(N)
topN <- cbind(Rank = 1:N,
Joke = names(t), Raw = round(t, 2),
Joke1 = names(t1), Centered = round(t1, 2),
Joke2 = names(t2), "Z-Score" = round(t2, 2)
)
rownames(topN) <- NULL
kable(topN, caption = "Top 10 Most Highly Rated Jokes (Based on Raw, Centered, and Z-Score Ratings)")
| Rank | Joke | Raw | Joke1 | Centered | Joke2 | Z-Score |
|---|---|---|---|---|---|---|
| 1 | j88 | 3.46 | j49 | 2.69 | j49 | 0.61 |
| 2 | j49 | 3.45 | j88 | 2.55 | j88 | 0.58 |
| 3 | j31 | 3 | j31 | 2.24 | j31 | 0.5 |
| 4 | j26 | 2.95 | j26 | 2.19 | j26 | 0.5 |
| 5 | j34 | 2.94 | j34 | 2.18 | j35 | 0.49 |
| 6 | j35 | 2.92 | j35 | 2.16 | j34 | 0.49 |
| 7 | j61 | 2.66 | j61 | 1.9 | j28 | 0.43 |
| 8 | j71 | 2.66 | j28 | 1.9 | j61 | 0.43 |
| 9 | j28 | 2.65 | j52 | 1.78 | j52 | 0.4 |
| 10 | j52 | 2.53 | j71 | 1.75 | j48 | 0.39 |
Here are the top three jokes:
Joke 88: “A Czechoslovakian man felt his eyesight was growing steadily worse, and felt it was time to go see an optometrist. The doctor started with some simple testing, and showed him a standard eye chart with letters of diminishing size: CRKBNWXSKZY … ‘Can you read this?’ the doctor asked. ‘Read it?’ the Czech answered. ‘Doc, I know him!’”
Joke 49: “Three engineering students were gathered together discussing the possible designers of the human body. One said, ‘It was a mechanical engineer. Just look at all the joints.’ Another said, ‘No, it was an electrical engineer. The nervous system’s many thousands of electrical connections.’ The last said, ‘Actually it was a civil engineer. Who else would run a toxic waste pipeline through a recreational area?’”
Joke 31: “President Clinton looks up from his desk in the Oval Office to see one of his aides nervously approach him. ‘What is it?’ exclaims the President. ‘It’s this Abortion Bill Mr. President, what do you want to do about it?’ the aide replies. ‘Just go ahead and pay it.’ responds the President.”
Likewise, the lowest rated jokes are consistent regardless of rating method.
# top N lowest-rated jokes
N <- 10
t <- sort(colMeans(r), decreasing = FALSE) %>% head(N)
t1 <- sort(colMeans(r_n1), decreasing = FALSE) %>% head(N)
t2 <- sort(colMeans(r_n2), decreasing = FALSE) %>% head(N)
topN <- cbind(Rank = 1:N,
Joke = names(t), Raw = round(t, 2),
Joke1 = names(t1), Centered = round(t1, 2),
Joke2 = names(t2), "Z-Score" = round(t2, 2)
)
rownames(topN) <- NULL
kable(topN, caption = "10 Lowest-Rated Jokes (Based on Raw, Centered, and Z-Score Ratings)")
| Rank | Joke | Raw | Joke1 | Centered | Joke2 | Z-Score |
|---|---|---|---|---|---|---|
| 1 | j57 | -3.57 | j57 | -4.25 | j57 | -0.96 |
| 2 | j15 | -2.89 | j15 | -3.64 | j15 | -0.87 |
| 3 | j14 | -2.18 | j14 | -2.94 | j14 | -0.7 |
| 4 | j43 | -1.87 | j43 | -2.54 | j12 | -0.58 |
| 5 | j56 | -1.81 | j56 | -2.5 | j43 | -0.56 |
| 6 | j23 | -1.78 | j12 | -2.5 | j56 | -0.55 |
| 7 | j12 | -1.74 | j23 | -2.41 | j23 | -0.52 |
| 8 | j36 | -1.41 | j73 | -2.3 | j73 | -0.52 |
| 9 | j32 | -1.41 | j32 | -2.1 | j16 | -0.49 |
| 10 | j73 | -1.4 | j36 | -2.1 | j19 | -0.48 |
Here are the three worst-rated jokes in the dataset:
Joke 57: “Why are there so many Jones’s in the phone book? Because they all have phones.”
Joke 15: “Q: What did the blind person say when given some matzah? A: Who the hell wrote this?”
Joke 14: “The father was very anxious to marry off his only daughter so he wanted to impress her date. ‘Do you like to screw,’ he says. ‘Huh’ replied the surprised first date. ‘My daughter she loves to screw and she’s good at it, you and her should go screw,’ carefully explained the father. Now very interested the boy replied, ‘Yes, sir.’ Minutes later the girl came down the stairs, kissed her father goodbye and the couple left. After only a few minutes she reappeared, furious, dress torn, hair a mess and screamed ‘Dammit, Daddy, it’s the TWIST, get it straight!’”
Because of the large size of the dataset, we create a smaller, more manageable dataset by random sampling, which allows more efficient model development on my PC. We randomly sample 2,000 users from the dataset, resulting in a 2,000-by-100 rating matrix with 145,509 ratings. Again the rating matrix is roughly 73% dense (27% sparse), out of 200,000 possible ratings.
# random sample of dataset
set.seed(42)
n_samp <- 2000
idx <- sort(sample(1:nrow(r), n_samp))
rs <- r[idx]
rs
## 2000 x 100 rating matrix of class 'realRatingMatrix' with 145509 ratings.
The distributions of rating count per user and per joke are similar to the prior distributions (seen above in the exploratory data analysis), other than the fact that we have a smaller set of 2,000 users rather than 23,500.
hist(rowCounts(rs), breaks = 40, main = "Distribution of Rating Count per User")
summary(rowCounts(rs))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 36.00 53.00 72.00 72.75 100.00 100.00
hist(colCounts(rs), breaks = 20, main = "Distribution of Rating Count per Joke")
summary(colCounts(rs))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 674 827 1597 1455 1988 2000
Now we test the recommenderlab infrastructure by creating and evaluating some sample recommender algorithms.
We start by creating a sample recommender system using the “POPULAR” algorithm (see the list of prediction algorithms later in this report). First we show the top 5 and top 3 recommended jokes for several users. Note that the top-N lists are returned already sorted in descending order of predicted preference.
## create recommender, show sample outputs
rec <- Recommender(rs, method = "POPULAR")
rec
## Recommender of type 'POPULAR' for 'realRatingMatrix'
## learned using 2000 users.
names(getModel(rec))
## [1] "topN" "ratings" "normalize"
## [4] "aggregationRatings" "aggregationPopularity" "verbose"
# sample predicted top5 list, for 5 users
rec_top5 <- predict(rec, rs[51:55], n = 5)
rec_top5
## Recommendations as 'topNList' with n = 5 for 5 users.
as(rec_top5, "list")
## $u529
## [1] "j88" "j71" "j92" "j75" "j27"
##
## $u556
## [1] "j88" "j71" "j92" "j75" "j87"
##
## $u563
## [1] "j88" "j71" "j92" "j75" "j87"
##
## $u583
## [1] "j88" "j11" "j71" "j37" "j92"
##
## $u586
## [1] "j88" "j71" "j92" "j75" "j87"
# sample predicted top3 list, for 5 users
rec_top3 <- predict(rec, rs[51:55], n = 3)
rec_top3
## Recommendations as 'topNList' with n = 3 for 5 users.
as(rec_top3, "list")
## $u529
## [1] "j88" "j71" "j92"
##
## $u556
## [1] "j88" "j71" "j92"
##
## $u563
## [1] "j88" "j71" "j92"
##
## $u583
## [1] "j88" "j11" "j71"
##
## $u586
## [1] "j88" "j71" "j92"
Second, we show predicted ratings for several users. In the ratings version, items a user has already rated are returned as NA (only unrated items receive predictions), while the ratingMatrix version includes the users’ existing ratings as well. Also note that the returned ratings are normalized using centering on rows.
# sample predicted ratings (NA for actual ratings), for 5 users
rec_rtg <- predict(rec, rs[51:55], type = "ratings")
rec_rtg
## 5 x 100 rating matrix of class 'realRatingMatrix' with 244 ratings.
as(rec_rtg, "matrix")[ , 1:10] %>% round(2)
## j0 j1 j2 j3 j4 j5 j6 j7 j8 j9
## u529 -0.34 -1.13 -1.25 -2.73 NA -0.26 NA NA -1.93 NA
## u556 NA NA NA NA NA NA NA NA NA NA
## u563 -1.31 NA -2.23 -3.71 NA NA NA NA -2.91 -1.28
## u583 -3.48 -4.27 -4.40 -5.88 NA -3.40 NA NA -5.08 -3.45
## u586 1.31 NA NA -1.08 NA 1.40 NA NA -0.28 NA
# sample predicted ratings (with actual ratings), for 5 users
rec_rtg1 <- predict(rec, rs[51:55], type = "ratingMatrix")
rec_rtg1
## 5 x 100 rating matrix of class 'realRatingMatrix' with 500 ratings.
as(rec_rtg1, "matrix")[ , 1:10] %>% round(2)
## j0 j1 j2 j3 j4 j5 j6 j7 j8 j9
## u529 -0.34 -1.13 -1.25 -2.73 -0.27 -0.26 -2.17 -3.53 -1.93 2.54
## u556 1.24 -0.46 0.36 -3.71 -6.92 -3.96 -3.52 -2.31 0.51 -4.59
## u563 -1.31 -6.73 -2.23 -3.71 3.80 7.93 10.26 -2.22 -2.91 -1.28
## u583 -3.48 -4.27 -4.40 -5.88 0.34 -3.40 -2.91 6.46 -5.08 -3.45
## u586 1.31 -5.90 -8.77 -1.08 5.21 1.40 -10.81 3.66 -0.28 -7.94
Third, we measure and compare error metrics of predicted ratings from competing models. Below we compare the UBCF (user-based collaborative filtering) and IBCF (item-based collaborative filtering) algorithms. To do this, we set up an evaluation scheme with a 90%/10% train/test split, 20 given ratings per test user, and a “good rating” threshold of 5.
Calculated error metrics include the root mean squared error (RMSE), the mean squared error (MSE), and the mean absolute error (MAE).
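For reference, these metrics are computed over the set $\mathcal{K}$ of held-out (user, item) pairs, where $\hat{r}_{ui}$ is the predicted rating and $r_{ui}$ the actual rating:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|\mathcal{K}|}\sum_{(u,i)\in\mathcal{K}}\left(\hat{r}_{ui} - r_{ui}\right)^2}, \qquad \mathrm{MSE} = \mathrm{RMSE}^2, \qquad \mathrm{MAE} = \frac{1}{|\mathcal{K}|}\sum_{(u,i)\in\mathcal{K}}\left|\hat{r}_{ui} - r_{ui}\right|$$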
# sample errors of predicted ratings: UBCF vs IBCF
es <- evaluationScheme(rs, method = "split", train = 0.9, given = 20, goodRating = 5)
rec1 <- Recommender(getData(es, "train"), "UBCF")
rec1
## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 1800 users.
rec2 <- Recommender(getData(es, "train"), "IBCF")
rec2
## Recommender of type 'IBCF' for 'realRatingMatrix'
## learned using 1800 users.
pred1 <- predict(rec1, getData(es, "known"), type = "ratings")
pred1
## 200 x 100 rating matrix of class 'realRatingMatrix' with 16000 ratings.
pred2 <- predict(rec2, getData(es, "known"), type = "ratings")
pred2
## 200 x 100 rating matrix of class 'realRatingMatrix' with 15994 ratings.
error <- rbind(
UBCF = calcPredictionAccuracy(pred1, getData(es, "unknown")),
IBCF = calcPredictionAccuracy(pred2, getData(es, "unknown"))
)
error %>% round(3) %>% kable(caption = "Evaluation of Rating Prediction Errors: UBCF vs. IBCF")
| RMSE | MSE | MAE | |
|---|---|---|---|
| UBCF | 4.523 | 20.454 | 3.584 |
| IBCF | 5.107 | 26.079 | 4.018 |
Finally, we evaluate a recommender system with 3 measures of model performance: the confusion matrix for the predicted top-N lists (true/false positives and negatives, with precision and recall), the ROC curve (true positive rate vs. false positive rate), and the precision-recall plot.
We illustrate with a top-N recommender system using the UBCF algorithm under the following evaluation scheme:
# sample evaluation of top-N recommender algorithm: UBCF
es1 <- evaluationScheme(rs, method = "cross", k = 4, given = 20, goodRating = 5)
es1
## Evaluation scheme with 20 items given
## Method: 'cross-validation' with 4 run(s).
## Good ratings: >=5.000000
## Data set: 2000 x 100 rating matrix of class 'realRatingMatrix' with 145509 ratings.
results <- evaluate(es1, method = "UBCF", type = "topNList",
n = c(1, 5, 10, 20, 30, 40, 50, 60, 80))
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/1.03sec]
## 2 [0.01sec/1.27sec]
## 3 [0.02sec/1.05sec]
## 4 [0.01sec/1.05sec]
results
## Evaluation results for 4 folds/samples using method 'UBCF'.
avg(results) %>% round(3) %>%
kable(caption = "Confusion Matrix (k=4): Top-N Lists for N = 1 to 80")
| TP | FP | FN | TN | precision | recall | TPR | FPR | |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.414 | 0.586 | 13.160 | 65.839 | 0.414 | 0.042 | 0.042 | 0.008 |
| 5 | 1.924 | 3.075 | 11.650 | 63.350 | 0.385 | 0.193 | 0.193 | 0.045 |
| 10 | 3.543 | 6.457 | 10.032 | 59.968 | 0.354 | 0.339 | 0.339 | 0.094 |
| 20 | 5.878 | 14.122 | 7.696 | 52.303 | 0.294 | 0.512 | 0.512 | 0.207 |
| 30 | 7.468 | 22.532 | 6.107 | 43.894 | 0.249 | 0.613 | 0.613 | 0.333 |
| 40 | 8.834 | 31.166 | 4.741 | 35.259 | 0.221 | 0.698 | 0.698 | 0.462 |
| 50 | 10.076 | 39.924 | 3.498 | 26.502 | 0.202 | 0.770 | 0.770 | 0.594 |
| 60 | 11.293 | 48.707 | 2.281 | 17.719 | 0.188 | 0.840 | 0.840 | 0.727 |
| 80 | 13.575 | 66.425 | 0.000 | 0.000 | 0.170 | 1.000 | 1.000 | 1.000 |
plot(results, annotate = TRUE, main = "ROC Curve for UBCF Method")
plot(results, "prec/rec", annotate = TRUE, main = "Precision-Recall Plot for UBCF Method")
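For reference, the precision, recall, TPR, and FPR columns are computed per test user from the confusion-matrix counts as follows, and then averaged across users and folds:

$$\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$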
In order to develop a recommender system for the Jester dataset, we need to decide which recommendation algorithm to use. The recommenderlab package offers a variety of algorithms to choose from. For this project, we consider the 5 algorithms below, some of which depend on certain parameters:
- RANDOM: random recommendations (a baseline)
- POPULAR: recommendations based on item popularity
- IBCF: item-based collaborative filtering
- UBCF: user-based collaborative filtering
- SVD: recommendations based on a rank-k SVD approximation of the rating matrix
Note that the IBCF, UBCF, and SVD algorithms each depend on one or more input parameters. What are the optimal parameter choices for each algorithm? To determine this, we evaluate the models using ROC curves and precision-recall plots for predicted top-N lists under a range of values for the k and nn parameters and different choices of similarity measure; we did not test the normalization method (centering vs. Z-score). Based on this analysis, we choose the following parameters:
- IBCF: k = 5 nearest items, with Pearson similarity
- UBCF: nn = 50 nearest neighbors, with Pearson similarity
- SVD: k = 10 latent factors
For development of the recommender models, we use an evaluation scheme similar to those used previously:
## compare recommender algorithms
scheme <- evaluationScheme(rs, method = "split", train = 0.9, given = 20, goodRating = 5)
scheme
## Evaluation scheme with 20 items given
## Method: 'split' with 1 run(s).
## Training set proportion: 0.900
## Good ratings: >=5.000000
## Data set: 2000 x 100 rating matrix of class 'realRatingMatrix' with 145509 ratings.
# compare IBCF for various k, similarity=cosine (default)
alg_ic <- list(
"ib5" = list(name="IBCF", param = list(k = 5)),
"ib15" = list(name="IBCF", param = list(k = 15)),
"ib30" = list(name="IBCF", param = list(k = 30)),
"ib50" = list(name="IBCF", param = list(k = 50))
)
results_ic <- evaluate(scheme, alg_ic, type = "topNList",
n = c(1, 5, 10, 20, 30, 40, 50, 60, 80))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.14sec/0.03sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.14sec/0.06sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.13sec/0.04sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.13sec/0.04sec]
# compare IBCF for various k, similarity=pearson
alg_ip <- list(
"ib5" = list(name="IBCF", param = list(k = 5, method="pearson")),
"ib15" = list(name="IBCF", param = list(k = 15, method="pearson")),
"ib30" = list(name="IBCF", param = list(k = 30, method="pearson")),
"ib50" = list(name="IBCF", param = list(k = 50, method="pearson"))
)
results_ip <- evaluate(scheme, alg_ip, type = "topNList",
n = c(1, 5, 10, 20, 30, 40, 50, 60, 80))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.17sec/0.03sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.17sec/0.04sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.18sec/0.04sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.17sec/0.04sec]
par(mfrow = c(1, 2))
# plot ROC curves
plot(results_ic, annotate = 2, legend = "bottomright")
title(main = "ROC: IBCF-Cosine")
plot(results_ip, annotate = 2, legend = "bottomright")
title(main = "ROC: IBCF-Pearson")
# plot precision-recall plots
plot(results_ic, "prec/rec", annotate = 2, legend = "bottomright", ylim = c(0, 0.3))
title(main = "Prec-Recall: IBCF-Cosine")
plot(results_ip, "prec/rec", annotate = 2, legend = "bottomright", ylim = c(0, 0.3))
title(main = "Prec-Recall: IBCF-Pearson")
# compare UBCF for various nearest neighbors, similarity=cosine (default)
alg_uc <- list(
"ub5" = list(name="UBCF", param = list(nn = 5)),
"ub15" = list(name="UBCF", param = list(nn = 15)),
"ub25" = list(name="UBCF", param = list(nn = 25)),
"ub50" = list(name="UBCF", param = list(nn = 50))
)
results_uc <- evaluate(scheme, alg_uc, type = "topNList",
n = c(1, 5, 10, 20, 30, 40, 50, 60, 80))
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.43sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.44sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.01sec/0.44sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.46sec]
# compare UBCF for various nearest neighbors, similarity=pearson
alg_up <- list(
"ub5" = list(name="UBCF", param = list(nn = 5, method="pearson")),
"ub15" = list(name="UBCF", param = list(nn = 15, method="pearson")),
"ub25" = list(name="UBCF", param = list(nn = 25, method="pearson")),
"ub50" = list(name="UBCF", param = list(nn = 50, method="pearson"))
)
results_up <- evaluate(scheme, alg_up, type = "topNList",
n = c(1, 5, 10, 20, 30, 40, 50, 60, 80))
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.48sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.03sec/0.45sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.49sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.51sec]
# plot ROC curves
plot(results_uc, annotate = 4, legend = "bottomright")
title(main = "ROC: UBCF-Cosine")
plot(results_up, annotate = 4, legend = "bottomright")
title(main = "ROC: UBCF-Pearson")
# plot precision-recall plots
plot(results_uc, "prec/rec", annotate = 4, legend = "bottomleft", ylim = c(0, 0.4))
title(main = "Prec-Recall: UBCF-Cosine")
plot(results_up, "prec/rec", annotate = 4, legend = "bottomleft", ylim = c(0, 0.4))
title(main = "Prec-Recall: UBCF-Pearson")
# compare SVD for various k
alg_s <- list(
"svd3" = list(name="SVD", param = list(k = 3)),
"svd5" = list(name="SVD", param = list(k = 5)),
"svd10" = list(name="SVD", param = list(k = 10)),
"svd20" = list(name="SVD", param = list(k = 20))
)
results_s <- evaluate(scheme, alg_s, type = "topNList",
n = c(1, 5, 10, 20, 30, 40, 50, 60, 80))
## SVD run fold/sample [model time/prediction time]
## 1 [0.05sec/0.03sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.05sec/0.03sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.04sec/0.03sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.08sec/0.04sec]
par(mfrow = c(1, 1))
plot(results_s, annotate = 3, legend = "bottomright")
title(main = "ROC Curve: SVD")
plot(results_s, "prec/rec", annotate = 3, legend = "bottomleft", ylim = c(0, 0.4))
title(main = "Precision-Recall Plot: SVD")
Having chosen the algorithm parameters, we can now compare model performance across the 5 algorithms. In our case, recommenders based on the UBCF and POPULAR algorithms produce the best performance, as measured by ROC curves, precision-recall plots, and error metrics.
# now compare different algorithms with optimal parameters
algorithms <- list(
"random items" = list(name="RANDOM", param = NULL),
"popular items" = list(name="POPULAR", param = NULL),
"item-based CF" = list(name="IBCF", param = list(k = 5, method = "pearson")),
"user-based CF" = list(name="UBCF", param = list(nn = 50, method = "pearson")),
"SVD approximation" = list(name="SVD", param = list(k = 10))
)
# run algorithms for top-N list
results <- evaluate(scheme, algorithms, type = "topNList",
n = c(1, 5, 10, 20, 30, 40, 50, 60, 80))
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/0.06sec]
## POPULAR run fold/sample [model time/prediction time]
## 1 [0.02sec/0.31sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.16sec/0.03sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.45sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.04sec/0.04sec]
results
## List of evaluation results for 5 recommenders:
## Evaluation results for 1 folds/samples using method 'RANDOM'.
## Evaluation results for 1 folds/samples using method 'POPULAR'.
## Evaluation results for 1 folds/samples using method 'IBCF'.
## Evaluation results for 1 folds/samples using method 'UBCF'.
## Evaluation results for 1 folds/samples using method 'SVD'.
plot(results, annotate = 4, legend = "bottomright")
title(main = "Comparison of ROC Curves")
plot(results, "prec/rec", annotate = 4, legend = "topright", ylim = c(0, 0.4))
title(main = "Comparison of Precision-Recall Plots")
# run algorithms for predicted ratings
results <- evaluate(scheme, algorithms, type = "ratings")
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/0.01sec]
## POPULAR run fold/sample [model time/prediction time]
## 1 [0.02sec/0.01sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.16sec/0.01sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.44sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.05sec/0.02sec]
results
## List of evaluation results for 5 recommenders:
## Evaluation results for 1 folds/samples using method 'RANDOM'.
## Evaluation results for 1 folds/samples using method 'POPULAR'.
## Evaluation results for 1 folds/samples using method 'IBCF'.
## Evaluation results for 1 folds/samples using method 'UBCF'.
## Evaluation results for 1 folds/samples using method 'SVD'.
plot(results)
title(main = "Comparison of Model Error vs. Actual Ratings")
plot(results, ylim = c(0, 10))
title(main = "Same Comparison: Close-up")
The recommenderlab package offers a wide variety of tools and algorithms to test and develop recommender models. In this project we explored some of these features using a subset of the Jester dataset. Some conclusions from our work include:
- Recommenders based on the UBCF and POPULAR algorithms produced the best overall performance on this dataset, as measured by ROC curves, precision-recall plots, and rating-prediction errors.
- In the head-to-head rating-prediction comparison, UBCF outperformed IBCF (RMSE of roughly 4.5 vs. 5.1).
- Parameter tuning matters: among the settings tested, we selected Pearson similarity with k = 5 for IBCF, nn = 50 for UBCF, and k = 10 latent factors for SVD.
- The Jester ratings show a consistent positive bias, with more ratings above 0 than below, even after normalization.
Some suggestions for future work include:
- Explore other normalization approaches, although the recommenderlab documentation indicates that centering and Z-score are the only methods currently implemented.
- Build hybrid recommenders with the HybridRecommender function in recommenderlab, and evaluate their performance vs. the standalone algorithms. For instance, try combinations of UBCF, IBCF, and SVD to see if a hybrid model can out-perform POPULAR.
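For example, a weighted hybrid of the three personalized algorithms might look something like the sketch below; the weights are arbitrary placeholders, and the code assumes the scheme split defined earlier and the HybridRecommender interface described in the recommenderlab documentation.
# sketch: weighted hybrid of UBCF, IBCF, and SVD (illustrative weights only)
rec_hybrid <- HybridRecommender(
    Recommender(getData(scheme, "train"), "UBCF", parameter = list(nn = 50, method = "pearson")),
    Recommender(getData(scheme, "train"), "IBCF", parameter = list(k = 5, method = "pearson")),
    Recommender(getData(scheme, "train"), "SVD",  parameter = list(k = 10)),
    weights = c(0.5, 0.2, 0.3)
)
pred_hybrid <- predict(rec_hybrid, getData(scheme, "known"), type = "ratings")
calcPredictionAccuracy(pred_hybrid, getData(scheme, "unknown"))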