DATA643 Project4: Accuracy and Beyond

Kelly Shaffer, Yun Mai
June 29, 2017

In this project, the accuracy of five recommender system algorithms will be compared against the offline data.

Serendipity will be introduced to the user-based collaborative filtering model to improve the user experience.

Load packages

#convert to data to spase matrix
suppressWarnings(suppressMessages(library(Matrix)))
#recommedation model building
suppressWarnings(suppressMessages(library(recommenderlab)))
#data manipulation
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(tidyr)))
suppressWarnings(suppressMessages(library(stringr)))
suppressWarnings(suppressMessages(library(knitr)))
# similarity calculation
suppressWarnings(suppressMessages(library(proxy)))
#linear algebra  here used for rmse calculation
suppressWarnings(suppressMessages(library(Metrics)))
#visualizations
suppressWarnings(suppressMessages(library(ggplot2)))

1. Load the data

data(MovieLense)

2. Explore the data

2.1 Distribution of ratings

#distribution of ratings
vector_ratings <- as.vector(MovieLense@data)
(rating_frq <- table(vector_ratings))

## vector_ratings
##       0       1       2       3       4       5 
## 1469760    6059   11307   27002   33947   21077

qplot(vector_ratings) + ggtitle("Distribution of the ratings") + labs(x = "Score")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There are a lot of missing values(0). The most common rating is 4.

2.2 Select the most relevant data

#selece only the users who have rated at least 50 movies and movies that had been rated more than 100 times
(ratings_movies <- MovieLense[rowCounts(MovieLense) > 50,
colCounts(MovieLense) > 100])

## 560 x 332 rating matrix of class 'realRatingMatrix' with 55298 ratings.

2.3 Distribution of user rating mean and items and distribution of number of rated items and number of ratings a item has

#convert the data to realRatingMatrix
ml <- as(MovieLense@data, "realRatingMatrix")

#distribution of user rating means
user_summary <-  as.data.frame(cbind('mean'=rowMeans(ratings_movies),'number'=rowCounts(ml)))

## Warning in cbind(mean = rowMeans(ratings_movies), number = rowCounts(ml)):
## number of rows of result is not a multiple of vector length (arg 1)

user_summary <-as.data.frame(sapply(user_summary, function(x) as.numeric(as.character(x))))

#distribution of movie rating means
item_summary <- as.data.frame(cbind('mean'=colMeans(ml), 'number'=colCounts(ratings_movies)))

## Warning in cbind(mean = colMeans(ml), number = colCounts(ratings_movies)):
## number of rows of result is not a multiple of vector length (arg 2)

item_summary <-as.data.frame(sapply(item_summary, function(x) as.numeric(as.character(x))))

par(mfrow=c(1,2))

#distribution 1
ggplot(user_summary,aes(mean)) +
  geom_histogram(binwidth = 0.05,col='white',fill="orchid3") + labs(x = "Average User Rating", y = "Count of Movies", title = "Distribution 1") + geom_vline(xintercept = mean(user_summary$mean),col='grey',size=2)

#distribution 2
ggplot(item_summary,aes(mean)) +
  geom_histogram(binwidth = 0.05,col='white',fill="seagreen3") + labs(x = "Average Movie Rating", y = "Count of Movies", title = "Distribution 2") + geom_vline(xintercept = mean(item_summary$mean),col='grey',size=2)

#distribution 3 
ggplot(user_summary,aes(number)) +
  geom_histogram(binwidth = 0.8,fill="violetred2") + labs(x = "Count of Rated Items", y = "Count of Users", title = "Distribution 3")

#distribution 4
ggplot(item_summary,aes(number)) +
  geom_histogram(binwidth = 0.8,fill="turquoise2") + labs(x = "Count of Ratings per Movie", y = "Count of Movies", title = "Distribution 4")

From the figures, we can see:

1) Distribution 1 contains the rating means of users and distribution 2 contains the rating means of movies. We can see that the distributions are normal with a grey line indicating the mean.

2) Distribution 3 contains the number of movies rated by each user. Distribution 4 contains the number of times each movie was rated. We can see that both distributions are right skewed, suggesting that many movies have been rated by only a few users.

The movies which have been rated less than 100 times will be removed for our analysis going forward.

2.4 Viewing the matrix by building a heat map whose colors represent the ratings

image(ratings_movies, main = "Heatmap of the rating matrix")

# select the top users and movies
top_movies <- quantile(rowCounts(ratings_movies), 0.98)
top_users <- quantile(colCounts(ratings_movies), 0.98)

#heatmap of the top users and movies
image(ratings_movies[rowCounts(ratings_movies) > top_movies,
colCounts(ratings_movies) > top_users], main = "Heatmap of the top
users and movies")

3.Normalizing the data

ratings_movies_nor <- normalize(ratings_movies)
getRatingMatrix(ratings_movies_nor)[1:5,1:5]

## 5 x 5 sparse Matrix of class "dgCMatrix"
##   Toy Story (1995) GoldenEye (1995) Get Shorty (1995)
## 1       1.17341040       -0.8265896        -0.8265896
## 2       0.07317073        .                 .        
## 3       .                 .                 .        
## 5       0.71739130       -0.2826087         .        
## 6       0.38194444        .                 .        
##   Twelve Monkeys (1995) Babe (1995)
## 1             0.1734104  -2.8265896
## 2             .           .        
## 3             .           .        
## 5             .           .        
## 6            -1.6180556   0.3819444

4. Binarizing the data

(mean(getRatingMatrix(ratings_movies),na.rm=TRUE))

## [1] 1.102705

ratings_movies_bi <- binarize(ratings_movies, minRating = 1)
getRatingMatrix(ratings_movies_bi)

## itemMatrix in sparse format with
##  560 rows (elements/transactions) and
##  332 columns (items)

5. Comparing models

5.1 Evaluating the recommendations

# use the minimum number of items purchased by any user tp decide item number to keep
(min(rowCounts(ratings_movies)))

## [1] 18

n_fold <- 4
items_to_keep <- 15
rating_threshold <- 3

# Use k-fold to validate models
eval_sets <- evaluationScheme(data = ratings_movies, method = "cross-validation",k = n_fold, given = items_to_keep, goodRating = rating_threshold)

models  <- list(
  RANDOM = list(name = "RANDOM", param = NULL),
  POPULAR = list(name = "POPULAR", param = NULL),
  IBCF=list(name="IBCF",param=list(method = "cosine")),
  UBCF=list(name="UBCF", param=list(method = "cosine")),
  SVD=list(name="SVD", param=list(k =100))
)

# varying the number of items we want to recommend to users
n_rec <- c(1, 5, seq(10, 100, 10))

# evaluating the recommendations
results <- evaluate(x = eval_sets, method = models, n= n_rec)

## RANDOM run fold/sample [model time/prediction time]
##   1  [0.02sec/0.09sec] 
##   2  [0sec/0.09sec] 
##   3  [0sec/0.1sec] 
##   4  [0sec/0.1sec] 
## POPULAR run fold/sample [model time/prediction time]
##   1  [0sec/0.36sec] 
##   2  [0.01sec/0.38sec] 
##   3  [0.02sec/0.33sec] 
##   4  [0sec/0.39sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.41sec/0.11sec] 
##   2  [0.4sec/0.11sec] 
##   3  [0.36sec/0.11sec] 
##   4  [0.38sec/0.11sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0.02sec/0.37sec] 
##   2  [0sec/0.55sec] 
##   3  [0sec/0.33sec] 
##   4  [0sec/0.32sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.5sec/0.11sec] 
##   2  [0.68sec/0.12sec] 
##   3  [0.63sec/0.13sec] 
##   4  [0.58sec/0.08sec]

# extract the related average confusion matrices
(avg_matrices <- lapply(results, avg))

## $RANDOM
##            TP        FP       FN       TN precision      recall
## 1    0.300000  0.700000 72.86964 243.1304 0.3000000 0.003710834
## 5    1.362500  3.637500 71.80714 240.1929 0.2725000 0.017694191
## 10   2.655357  7.344643 70.51429 236.4857 0.2655357 0.034585415
## 20   5.119643 14.880357 68.05000 228.9500 0.2559821 0.067484632
## 30   7.323214 22.676786 65.84643 221.1536 0.2441071 0.098416096
## 40   9.542857 30.457143 63.62679 213.3732 0.2385714 0.128622660
## 50  11.730357 38.269643 61.43929 205.5607 0.2346071 0.158258313
## 60  13.921429 46.078571 59.24821 197.7518 0.2320238 0.188344250
## 70  16.212500 53.787500 56.95714 190.0429 0.2316071 0.219148970
## 80  18.483929 61.516071 54.68571 182.3143 0.2310491 0.250191267
## 90  20.783929 69.216071 52.38571 174.6143 0.2309325 0.281634800
## 100 23.062500 76.937500 50.10714 166.8929 0.2306250 0.312489665
##             TPR         FPR
## 1   0.003710834 0.002790151
## 5   0.017694191 0.014728273
## 10  0.034585415 0.029791959
## 20  0.067484632 0.060596092
## 30  0.098416096 0.092585917
## 40  0.128622660 0.124538168
## 50  0.158258313 0.156715584
## 60  0.188344250 0.188818111
## 70  0.219148970 0.220396152
## 80  0.250191267 0.252096283
## 90  0.281634800 0.283772893
## 100 0.312489665 0.315503957
## 
## $POPULAR
##             TP         FP       FN       TN precision     recall
## 1    0.7214286  0.2785714 72.44821 243.5518 0.7214286 0.01197377
## 5    2.7625000  2.2375000 70.40714 241.5929 0.5525000 0.04205479
## 10   5.0785714  4.9214286 68.09107 238.9089 0.5078571 0.07488194
## 20   9.1053571 10.8946429 64.06429 232.9357 0.4552679 0.13135044
## 30  12.2589286 17.7410714 60.91071 226.0893 0.4086310 0.17523814
## 40  14.9196429 25.0803571 58.25000 218.7500 0.3729911 0.21192055
## 50  18.0642857 31.9357143 55.10536 211.8946 0.3612857 0.25277782
## 60  20.8500000 39.1500000 52.31964 204.6804 0.3475000 0.29027315
## 70  23.4589286 46.5410714 49.71071 197.2893 0.3351276 0.32690892
## 80  26.1267857 53.8732143 47.04286 189.9571 0.3265848 0.36295416
## 90  28.9214286 61.0785714 44.24821 182.7518 0.3213492 0.39953657
## 100 31.1160714 68.8839286 42.05357 174.9464 0.3111607 0.42868182
##            TPR         FPR
## 1   0.01197377 0.001032292
## 5   0.04205479 0.008570093
## 10  0.07488194 0.019001687
## 20  0.13135044 0.042545224
## 30  0.17523814 0.070125655
## 40  0.21192055 0.100045874
## 50  0.25277782 0.127542655
## 60  0.29027315 0.156794182
## 70  0.32690892 0.186922897
## 80  0.36295416 0.216775694
## 90  0.39953657 0.245875236
## 100 0.42868182 0.277839959
## 
## $IBCF
##             TP         FP       FN       TN precision      recall
## 1    0.3517857  0.6482143 72.81786 243.1821 0.3517857 0.004704604
## 5    1.5071429  3.4928571 71.66250 240.3375 0.3014286 0.020421888
## 10   2.8375000  7.1625000 70.33214 236.6679 0.2837500 0.038260790
## 20   5.5589286 14.4410714 67.61071 229.3893 0.2779464 0.075186480
## 30   8.1500000 21.8500000 65.01964 221.9804 0.2716667 0.111546479
## 40  10.7107143 29.2892857 62.45893 214.5411 0.2677679 0.147078492
## 50  13.1946429 36.8053571 59.97500 207.0250 0.2638929 0.184117461
## 60  15.6553571 44.3446429 57.51429 199.4857 0.2609226 0.218873772
## 70  18.1875000 51.8017857 54.98214 192.0286 0.2598382 0.255920967
## 80  20.8767857 59.0946429 52.29286 184.7357 0.2609989 0.295753385
## 90  23.3946429 66.5589286 49.77500 177.2714 0.2599969 0.332184465
## 100 25.8071429 74.1214286 47.36250 169.7089 0.2581691 0.367284151
##             TPR         FPR
## 1   0.004704604 0.002585776
## 5   0.020421888 0.014064897
## 10  0.038260790 0.028969708
## 20  0.075186480 0.058513435
## 30  0.111546479 0.088679090
## 40  0.147078492 0.119099210
## 50  0.184117461 0.149967775
## 60  0.218873772 0.180853028
## 70  0.255920967 0.211373666
## 80  0.295753385 0.241237780
## 90  0.332184465 0.271860369
## 100 0.367284151 0.302996193
## 
## $UBCF
##            TP        FP       FN       TN precision     recall        TPR
## 1    0.687500  0.312500 72.48214 243.5179 0.6875000 0.01213981 0.01213981
## 5    2.867857  2.132143 70.30179 241.6982 0.5735714 0.04757398 0.04757398
## 10   5.225000  4.775000 67.94464 239.0554 0.5225000 0.08346185 0.08346185
## 20   9.253571 10.746429 63.91607 233.0839 0.4626786 0.14199344 0.14199344
## 30  12.667857 17.332143 60.50179 226.4982 0.4222619 0.19051825 0.19051825
## 40  15.848214 24.151786 57.32143 219.6786 0.3962054 0.23568372 0.23568372
## 50  18.819643 31.180357 54.35000 212.6500 0.3763929 0.27681225 0.27681225
## 60  21.580357 38.419643 51.58929 205.4107 0.3596726 0.31402751 0.31402751
## 70  24.125000 45.875000 49.04464 197.9554 0.3446429 0.34802450 0.34802450
## 80  26.558929 53.441071 46.61071 190.3893 0.3319866 0.37957817 0.37957817
## 90  28.891071 61.108929 44.27857 182.7214 0.3210119 0.41020202 0.41020202
## 100 31.117857 68.882143 42.05179 174.9482 0.3111786 0.43854524 0.43854524
##             FPR
## 1   0.001190574
## 5   0.008285847
## 10  0.018608597
## 20  0.042323293
## 30  0.068694515
## 40  0.096221087
## 50  0.124584753
## 60  0.153956768
## 70  0.184347296
## 80  0.215264680
## 90  0.246684614
## 100 0.278480965
## 
## $SVD
##            TP        FP       FN       TN precision      recall
## 1    0.375000  0.625000 72.79464 243.2054 0.3750000 0.005697137
## 5    1.887500  3.112500 71.28214 240.7179 0.3775000 0.028998859
## 10   3.467857  6.532143 69.70179 237.2982 0.3467857 0.051861848
## 20   6.275000 13.725000 66.89464 230.1054 0.3137500 0.091864973
## 30   9.030357 20.969643 64.13929 222.8607 0.3010119 0.131267811
## 40  11.616071 28.383929 61.55357 215.4464 0.2904018 0.167688858
## 50  14.116071 35.883929 59.05357 207.9464 0.2823214 0.202292078
## 60  16.566071 43.433929 56.60357 200.3964 0.2761012 0.236276017
## 70  18.885714 51.114286 54.28393 192.7161 0.2697959 0.268850232
## 80  21.194643 58.805357 51.97500 185.0250 0.2649330 0.299672173
## 90  23.514286 66.485714 49.65536 177.3446 0.2612698 0.330601939
## 100 25.746429 74.253571 47.42321 169.5768 0.2574643 0.360884564
##             TPR         FPR
## 1   0.005697137 0.002517459
## 5   0.028998859 0.012577072
## 10  0.051861848 0.026449141
## 20  0.091864973 0.055758008
## 30  0.131267811 0.085297391
## 40  0.167688858 0.115575127
## 50  0.202292078 0.146175698
## 60  0.236276017 0.177105827
## 70  0.268850232 0.208557614
## 80  0.299672173 0.240072793
## 90  0.330601939 0.271404157
## 100 0.360884564 0.303318332

To interpret these results, this key may be helpful:

. True Positives (TP): These are recommended items that have been rated

. False Positives (FP): These are recommended items that haven't been rated . False Negatives(FN): These are not recommended items that have been rated

. True Negatives (TN): These are not recommended items that haven't been rated

. True Positive Rate (TPR): This is the percentage of purchased items that have been recommended. It's the number of TP divided by the number of purchased items (TP + FN).

. False Positive Rate (FPR): This is the percentage of not purchased items that have been recommended. It's the number of FP divided by the number of not purchased items (FP + TN).

5.2 Evaluating the ratings

recommender_random <- Recommender(data = getData(eval_sets, "train"),
method = "RANDOM",parameter = NULL)

recommender_popular <- Recommender(data = getData(eval_sets, "train"),
method = "POPULAR",parameter = NULL)

recommender_ibcf <- Recommender(data = getData(eval_sets, "train"),
method = "IBCF",parameter = list(method = "cosine"))

recommender_ubcf <- Recommender(data = getData(eval_sets, "train"),
method = "UBCF",parameter = list(method = "cosine"))

recommender_svd <- Recommender(data = getData(eval_sets, "train"),
method = "SVD",parameter = list(k =100))

items_to_recommend <- 10

eval_prediction_random <- predict(object = recommender_random, newdata =
getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

eval_prediction_popular <- predict(object = recommender_popular, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

eval_prediction_ibcf <- predict(object = recommender_ibcf, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

eval_prediction_ubcf <- predict(object = recommender_ubcf, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

eval_prediction_svd <- predict(object = recommender_svd, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

# compare RMSE for different models
######################RANDOM######################
eval_accuracy_random <- calcPredictionAccuracy(
x = eval_prediction_random, data = getData(eval_sets, "unknown"), byUser = F)

eval_accuracy_random_user <- calcPredictionAccuracy(
x = eval_prediction_random, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_random_user)

##           RMSE       MSE      MAE
## [1,] 1.3597243 1.8488502 1.101434
## [2,] 0.9268175 0.8589907 0.728238
## [3,] 1.3961119 1.9491284 1.088635
## [4,] 1.8334279 3.3614578 1.503241
## [5,] 1.5094240 2.2783609 1.169797
## [6,] 1.4094622 1.9865838 1.079278

qplot(eval_accuracy_random_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the Random RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

######################POPULAR######################
eval_accuracy_popular <- calcPredictionAccuracy(
x = eval_prediction_popular, data = getData(eval_sets, "unknown"), byUser =F) 

eval_accuracy_popular_user <- calcPredictionAccuracy(
x = eval_prediction_popular, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_popular_user)

##         RMSE       MSE       MAE
## 2  0.8252278 0.6810009 0.5617565
## 10 0.4714709 0.2222848 0.3910219
## 42 0.9589071 0.9195028 0.7670577
## 49 1.3703266 1.8777950 1.1194657
## 56 0.8764282 0.7681264 0.6900810
## 59 0.9892119 0.9785402 0.7399601

qplot(eval_accuracy_popular_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the Popular RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

######################IBCF######################
eval_accuracy_ibcf <- calcPredictionAccuracy(
x = eval_prediction_ibcf, data = getData(eval_sets, "unknown"), byUser = F)

eval_accuracy_ibcf_user <- calcPredictionAccuracy(
x = eval_prediction_ibcf, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_ibcf_user)

##         RMSE       MSE       MAE
## 2  1.3116035 1.7203037 0.9834384
## 10 0.6672847 0.4452689 0.5211795
## 42 1.2864529 1.6549612 0.9421041
## 49 1.7794485 3.1664371 1.3509088
## 56 1.3133327 1.7248427 0.9765165
## 59 1.3559660 1.8386439 0.9244176

qplot(eval_accuracy_ibcf_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the IBCF RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

######################UBCF######################
eval_accuracy_ubcf <- calcPredictionAccuracy(
x = eval_prediction_ubcf, data = getData(eval_sets, "unknown"), byUser = F)

eval_accuracy_ubcf_user <- calcPredictionAccuracy(
x = eval_prediction_ubcf, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_ubcf_user)

##         RMSE       MSE       MAE
## 2  0.8609476 0.7412308 0.6314881
## 10 0.4991643 0.2491650 0.3975030
## 42 0.9342446 0.8728129 0.7246420
## 49 1.3601553 1.8500224 1.1325268
## 56 0.8884840 0.7894038 0.6565416
## 59 1.0730427 1.1514207 0.8145044

qplot(eval_accuracy_ubcf_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the IBCF RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

######################SVD######################
eval_accuracy_svd <- calcPredictionAccuracy(
x = eval_prediction_svd, data = getData(eval_sets, "unknown"), byUser = F)

eval_accuracy_svd_user <- calcPredictionAccuracy(
x = eval_prediction_svd, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_svd_user)

##         RMSE       MSE       MAE
## 2  0.9400020 0.8836037 0.7102671
## 10 0.5833815 0.3403340 0.4447397
## 42 1.0026193 1.0052454 0.7776307
## 49 1.3821925 1.9104560 1.1574958
## 56 0.9367730 0.8775437 0.6853434
## 59 1.1105352 1.2332885 0.8429122

qplot(eval_accuracy_svd_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the IBCF RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

kable(RMSE_COMPARE <- rbind(eval_accuracy_random,eval_accuracy_popular,eval_accuracy_ibcf,eval_accuracy_ubcf,eval_accuracy_svd))

	RMSE	MSE	MAE
eval_accuracy_random	1.3400123	1.7956330	1.0451535
eval_accuracy_popular	0.9486848	0.9000029	0.7402682
eval_accuracy_ibcf	1.2423668	1.5434753	0.9204428
eval_accuracy_ubcf	0.9729319	0.9465964	0.7618365
eval_accuracy_svd	1.0051898	1.0104065	0.7925634

RMSE_COMPARE <- as.data.frame(RMSE_COMPARE)
RMSE_COMPARE$model <- c('Random','Popular','IBCF','UBCF','SVD')

ggplot(RMSE_COMPARE, aes(model,RMSE))+ geom_point(colour = "blue", size = 3)

Above, we chose to look at the RMSE scores both with by user and not by user mostly for demonstration purposes. You can see that by user, the RMSE values don't give us the information we need. We want to look at the performance of the entire model, not the performance of the model for each individual. Therefore, using byUser = FALSE is the recommended approach.

We can see that the popular recommender is our best performer, with user-based collaborative filtering coming in second place, and singular value decomposition coming in third.

Serendipity, novelty, and diversity

If we want to go beyond accuracy of a model, we can start looking at measures such as serendipity, novelty, and diversity. There is a lot of research out there now stating that an accurate model is not necessarily a good one, particularly if the recommendations aren't useful to the user.

In the case of serendipity, a formal definition hasn't been created but we can think of it as the model predicting something unexpected and useful that the user was not specifically searching for. This does not necessarily improve the accuracy of the model, but it improves the quality of the predictions the model makes.

Novelty is a metric rooted in popularity. In our case, if a movie is very popular, there's a good chance a lot of our users have seen it. In the case of a recommendation system, we would want to decrease the weight of these popular shows and increase the weight of shows that are less popular but similar to the movies the user has watched before.

Diversity is our final metric, based on keeping recommendations fresh for the user. Users like new recommendations, and are going to get bored with the same ones over and over. One way to implement this in a recommender system would be to shuffle the top recommendations. That way, they're still relevant, but the user isn't seeing the same recommendations over and over.

6.Serendipity

6.1 Build a modle by combinding user-based and genre_based method

# convert data to realRatingMatrix format
mls <- as(ratings_movies,"realRatingMatrix")

items_genre_all <- MovieLenseMeta[,-c(2,3,4)]
items_genre_movie <- semi_join(data.frame('title'=colnames(ratings_movies)), items_genre_all, by = "title")
items_genre <- semi_join(items_genre_all, items_genre_movie, by = "title")

rating_m <- as.matrix(getRatingMatrix(ratings_movies))

# number of user, genre, and movie
nuser <- length(rating_m[,1])
ngenre <- length(items_genre[1,])-1
nmovie <- length(items_genre[,1])

# Compute average rating of each genre for a given user
user_genre <- data.frame(matrix(0, nrow = nuser, ncol = ngenre))

for (i in 1:nuser){
  for(j in 1:ngenre){
    v1 <- as.matrix(rating_m[i,])
    v2 <- as.matrix(items_genre[,(j+1)])
    user_genre[i,j] <- (t(v1) %*% v2)/sum(v2)
  }
}

# Compute average rating of each item based on average rating for each genre 
rating_genre_based <- data.frame(matrix(0, nrow = nuser, ncol =nmovie))
for (i in 1:nuser){
  for(j in 1:nmovie){
    v1 <- as.matrix(user_genre[i,])
    v2 <- as.matrix(items_genre[j,2:(ngenre+1)])
    rating_genre_based[i,j] <- (v1 %*% t(v2))/sum(v2)
  }
}

rownames(rating_genre_based) <- rownames(rating_m)
colnames(rating_genre_based) <- colnames(rating_m)
rownames(user_genre) <- rownames(rating_m)
colnames(user_genre) <- colnames(items_genre[,2:19])

# Select a random set of users
set.seed(1)
users <- sample(rownames(getRatingMatrix(ratings_movies)),size=20)

# This rating profile based on genre shows the aggregated inclination of each user towards movie genres. Assumming that users like similar items, movies that are most similar to a user's preference for an movie's feature could be recommended.

#convert the predicted ratings based on genre to binary 
avg_user_genre <- as.matrix(rowMeans(user_genre))
user_genre_bi <- user_genre - t(avg_user_genre)
user_genre_bi[user_genre_bi < 0] <- 0
user_genre_bi[user_genre_bi > 0] <- 1

#find the most similar movies for a selected user 
profile <- user_genre_bi[users[1],] 
colnames(profile) <- colnames(items_genre[,2:19])
sim_matrix <- rbind.data.frame(profile, items_genre[,2:19])
sim_matrix <- data.frame(lapply(sim_matrix,function(x) as.integer(x))) #convert data to type integer

#Calculate Jaccard distance between user profile and all movies
similarity <- dist(sim_matrix, method = "Jaccard")
similarity <- as.data.frame(as.matrix(similarity[1:333]))
rows <- which(similarity == min(similarity))
#Recommended movies
print(paste('Recommend use with userID', users[1],'the following movies:', items_genre[rows-1,1]))

## [1] "Recommend use with userID 255 the following movies: GoodFellas (1990)"          
## [2] "Recommend use with userID 255 the following movies: George of the Jungle (1997)"
## [3] "Recommend use with userID 255 the following movies: Fly Away Home (1996)"

6.2 Expected list of Recommendations (User-Based Collaborative Filtering)

Recommend 10 movies to 20 randomly chose users and creat recommendation list RS.

# Build a User-based recommendation model 
rec <- Recommender(mls,method="UBCF", param=list(normalize = "Z-score",method="Cosine",nn=5, minRating=1))

## Warning: Unknown parameters: minRating

## Available parameter (with default values):
## method    =  cosine
## nn    =  25
## sample    =  FALSE
## normalize     =  center
## verbose   =  FALSE

# Creat top-10 recommendation lists for five users
recom_movie <- predict(rec, ratings_movies[users], n=10)
movie_rec <- as(recom_movie, "list")
movie_ubcf <- data.frame(movie_rec[1:20])
colnames(movie_ubcf) <- users
kable(movie_ubcf[,1:5], caption = "Recommend 10 movies to users")

255	342	514	854	194
Dead Man Walking (1995)	Shawshank Redemption, The (1994)	Trainspotting (1996)	Wrong Trousers, The (1993)	Reservoir Dogs (1992)
Godfather, The (1972)	Schindler's List (1993)	Babe (1995)	GoodFellas (1990)	Shining, The (1980)
Face/Off (1997)	Godfather, The (1972)	Lion King, The (1994)	Close Shave, A (1995)	Willy Wonka and the Chocolate Factory (1971)
Close Shave, A (1995)	Cinema Paradiso (1988)	Sound of Music, The (1965)	Platoon (1986)	Rear Window (1954)
Raiders of the Lost Ark (1981)	Toy Story (1995)	Primal Fear (1996)	Godfather: Part II, The (1974)	Terminator 2: Judgment Day (1991)
Aliens (1986)	Much Ado About Nothing (1993)	American President, The (1995)	Raging Bull (1980)	People vs. Larry Flynt, The (1996)
Sling Blade (1996)	Vertigo (1958)	Pulp Fiction (1994)	Clockwork Orange, A (1971)	Wrong Trousers, The (1993)
Abyss, The (1989)	Full Monty, The (1997)	Sling Blade (1996)	To Kill a Mockingbird (1962)	Contact (1997)
Mr. Holland's Opus (1995)	Strictly Ballroom (1992)	Raising Arizona (1987)	Face/Off (1997)	Good Will Hunting (1997)
Star Trek: The Wrath of Khan (1982)	Piano, The (1993)	Fried Green Tomatoes (1991)	Back to the Future (1985)	Face/Off (1997)

# Creat the recommendation list RS.
d <- gather(movie_ubcf,user,movie,1:20)

## Warning: attributes are not identical across measure variables; they will
## be dropped

rs <- unique(as.character(d[,2])) 
rs_df<- data.frame('Expected_list_UBCF'=rs)
kable(head(rs_df))

Expected_list_UBCF

Dead Man Walking (1995)
Godfather, The (1972)
Face/Off (1997)
Close Shave, A (1995)
Raiders of the Lost Ark (1981) Aliens (1986)

6.3 Unexpected list of Recommendations

Create the PPM list for top-119 movies using a primitive recommendation method

# Primitive recommendation is based on picking top movies that has got the ratings from largest number of users and movies that got the highest average ratings.

# number of ratings for each item
number_of_users_for_items <- as.data.frame(colCounts(mls))
number_of_users_for_items[,1] <-number_of_users_for_items[order(-number_of_users_for_items[,1]),] 
colnames(number_of_users_for_items) <- c('movie')

# Compute the movie IDs with the largest number of users.
avg_rating_for_items <- as.data.frame(colMeans(mls))
avg_rating_for_items[,1] <-avg_rating_for_items[order(-avg_rating_for_items[,1]),]
colnames(avg_rating_for_items) <- c('movie')

topN_PPM <- unique(c(rownames(number_of_users_for_items),rownames(avg_rating_for_items)))

# Get the PPM list based on a primitive recommendation method. Here, top 119 movies are selected based on the highest average rating and the largest number of users for each movie.
number_of_rankings_requested = 119
top119_PPM <- topN_PPM[1:119]

# Unexpected list of Recommendations is the list of items that are in RS list but not in PPM.
top119_PPM_df <- data.frame('movie'= top119_PPM)
top119_PPM_df <- data.frame(lapply(top119_PPM_df, as.character), stringsAsFactors=FALSE)
rs_df <- data.frame('movie'= rs)
rs_df <- data.frame(lapply(rs_df, as.character), stringsAsFactors=FALSE)
unexp <- anti_join(rs_df,top119_PPM_df,by.x= 'movie', by.y='movie')

## Joining, by = "movie"

unexp_df <- data.frame('Unexpected_list'=unexp)
kable(head(unexp_df,n=10))

movie

Face/Off (1997)
Close Shave, A (1995)
Sling Blade (1996)
Star Trek: The Wrath of Khan (1982) Schindler's List (1993)
Vertigo (1958)
Full Monty, The (1997)
Strictly Ballroom (1992)
Piano, The (1993)
Trainspotting (1996)

6.4 Predict serendity list

# retrive the predict ratings generated by UBCF model
recom_rating <- predict(rec, ratings_movies[users], type="ratings")
movie_predict <- as(recom_rating, "list")
predict_rating <- as(recom_rating, "matrix") 
predict_rating <- as.data.frame(predict_rating)
head(predict_rating[,1:4])

##     Toy Story (1995) GoldenEye (1995) Get Shorty (1995)
## 255         3.115311         2.854167          2.831789
## 342         4.140593         3.293230                NA
## 514               NA         3.900019                NA
## 854               NA         3.079813                NA
## 194               NA         3.017854                NA
## 843               NA         2.813357          2.560468
##     Twelve Monkeys (1995)
## 255                    NA
## 342                    NA
## 514                    NA
## 854                    NA
## 194                    NA
## 843                    NA

nameList <- unexp[,1]
predict_rating <- predict_rating[,colnames(predict_rating) %in% nameList] 

# Subset the rating matrix to get the predictions for randomly selected users.
predict_rating_sub <- as.data.frame(predict_rating[,rownames(predict_rating) %in% users])
predict_rating_sub$user <- rownames(predict_rating_sub)
predict_rating_sub <- predict_rating_sub[,c(62,1:61)]
predict_rating_sub$user <- sapply(predict_rating_sub$user, as.numeric)

# Compute the list of useful items from the unexpected set of movies.
# Usefulness is when the ratings of movies > 2.5
predict_rating_sub_long <- gather(predict_rating_sub,movie,rating,2:62) 

usefulness <- predict_rating_sub_long[which(predict_rating_sub_long$rating>2.5),]
usefulness <- usefulness[order(usefulness$user),]
kable(head(usefulness),caption="Useful_List")

	user	movie	rating
50	58	Patton (1970)	3.902803
70	58	Young Frankenstein (1974)	4.308861
170	58	Last of the Mohicans, The (1992)	4.062107
230	58	Fifth Element, The (1997)	3.653026
250	58	Contact (1997)	4.316963
310	58	Leaving Las Vegas (1995)	4.258151

6.5 Improve Serendipity List

# subset item_genre matrix to small data set which contains only the movies in serendipity list
items_genre_imt <- semi_join(items_genre_all,data.frame('title'=unique(usefulness$movie)), by = "title")

# Compute average rating of each genre for a given user
v1 <- as.matrix(predict_rating_sub[,2:62]) 
v1[is.na(v1)] <- 0
user_genre_imt <- v1 %*% as.matrix(items_genre_imt[,2:19])
v3 <- colSums(items_genre_imt[,2:19])
v4 <- matrix(rep((1/v3),18),18,20)

## Warning in matrix(rep((1/v3), 18), 18, 20): data length [324] is not a sub-
## multiple or multiple of the number of columns [20]

v4 <- t(v4)
user_genre_imt <- user_genre_imt[,-7] # Genre Documentary has no rating, so we remove this genre
v4 <- v4[,-7] # remove genre Documentary 
user_genre_sp <- user_genre_imt * v4 # average rating of each genre 

# Compute average rating of each item based on average rating for each genre 
items_genre_sp <- items_genre_imt[,-c(1,8)] # remove genre Documentary 
rating_genre_based_sp <- user_genre_sp %*% t(items_genre_sp)#average rating of each item based on genre 
colnames(rating_genre_based_sp) <-items_genre_imt$title
kable(head(rating_genre_based_sp[,1:6]),caption="Improved_Serendipity_List")

	Patton (1970)	Young Frankenstein (1974)	Last of the Mohicans, The (1992)	Fifth Element, The (1997)	Contact (1997)	Leaving Las Vegas (1995)
255	5.364220	4.789906	8.030055	5.480612	5.688661	5.662999
342	5.444992	3.014967	6.515864	1.748378	3.015978	4.802111
514	4.717684	4.357967	8.114585	5.545373	4.926453	4.834792
854	3.844094	5.769871	6.211448	5.116786	4.981673	4.318114
194	2.326376	2.027171	2.993236	2.316702	2.149810	1.740571
843	2.731897	1.566843	4.475721	4.101452	3.434087	2.513222

6.6 Compare the serendipity list and the imporved serendipity list based on by genre

predict_rating_sub_1 <- predict_rating_sub[,-1]
# rerange the item sequence of serendipity list 
predict_rating_sub_1 <- predict_rating_sub_1 [colnames(rating_genre_based_sp)]
serendipity <- predict_rating_sub_1
#calculate the difference between serendipity list and the imporved serendipity list and normalize by rating sacle 5.
delta <- ((rating_genre_based_sp - serendipity)/5)*100
#Movies whose delta are below the threshold can be removed from the serendipity list. Here set the threshold as -5%
less_serendipitous <- delta
less_serendipitous$less <- rowSums(less_serendipitous < -5, na.rm=T)
less_serendipitous <- data.frame('UserID'=rownames(less_serendipitous), 'Number of less serendipitous movies'=less_serendipitous[,62])
kable(less_serendipitous )

UserID	Number.of.less.serendipitous.movies
255	1
342	16
514	16
854	9
194	17
843	14
885	15
591	12
555	10
58	9
195	11
167	13
617	7
344	10
693	11
447	14
637	10
910	10
336	6
921	11

From the above table, we can see that a significant number of movies are considered less serendipitous based on the above threshold,

delta$UserID <- rownames(delta)
delta_long <- gather(delta,movie,delta_rating,1:61)
delta_long$UserID <- sapply(delta_long$UserID, as.numeric)
ceiling <- max(delta_long$delta_rating,na.rm=T)
bottom <- min(delta_long$delta_rating,na.rm=T)
ggplot(delta_long,aes(x=UserID,y=delta_rating))+geom_point(fill="blue", shape=21, size  = 2) +ylim(bottom,ceiling)+geom_hline(yintercept = -5,linetype="dashed",size=1.5)+ggtitle("Serendipity List of Movies")

## Warning: Removed 338 rows containing missing values (geom_point).

The figure above shows the serendipity list of movies generated by a standard collaborative filtering method for 20 randomly selected users. The differences in useful ratings and genre-based ratings below 5% are considered less serendipitous. A significant fraction of movies has been identified as less serendipitous items based on the algorithm used here.

actual <- as.data.frame(as.matrix(getRatingMatrix(ratings_movies)))
actual$userid<-rownames(actual)
actual_long <- gather(actual, title, rating,1:332)
actual_rating <- semi_join(actual_long,data.frame('title'=unique(usefulness$movie)), by = "title")
actual_rating <- actual_rating[which(actual_rating$userid %in% c(users)),]
actual_rating$userid <- sapply(actual_rating$userid, as.numeric)
actual_rating <- actual_rating[match(delta_long$UserID, actual_rating$userid),]

rating_genre_based_sp_1 <- data.frame(rating_genre_based_sp)
rating_genre_based_sp_1$userid <- rownames(rating_genre_based_sp)
rating_genre_based_sp_long <- gather(rating_genre_based_sp_1, movie,rating,1:61)

rmse(actual_rating$rating,rating_genre_based_sp_long$rating)

## [1] 5.516745

The accuracy decrease after introduce the serendipity list into the UBCF model.

#####Diversity

ml.model <- Recommender(ml[1:900], method = "UBCF")
#top 5 recommendations for user
ml.523.top10 <- as(predict(ml.model, ml["523"], n = 10), "list")

length(ml.523.top10[[1]])
sample(ml.523.top10[[1]], 5)

r <- Recommender(MovieLense[1:800], method = "POPULAR")
r
getModel(r)$topN

#Recommend 3 movies to 2 users
recom <- predict(r, MovieLense[809:810], n = 10)
as(recom, "list")
length(recom)

rnorm(10)

Conclusion:

After introducing serendipity to our model, we noticed an increase in our RMSE. Though the model is less accurate in one measure, it is potentially more valuable in the eyes of user experience. Trade-offs like this must be made when designing a recommender system, and the designer has to choose what is more valuable.

Offline vs. online experiments

Offline experiments:

These are what we've done up to this point, using offline data and simulating user behavior. It is the least risky since there is not threat of driving away users. Therefore, they are the easiest to conduct, due to the fact that interaction with the user is not required.

Online experiments:

Where real user populations interact with the recommender system. These experiments evaluate the performance of the recommender system. The user is oblivious to the conducted experiment in these scenarios. Online experiments are known to be the most trustworthy experiment. Here, we want to look at the change in the behavior of users when interacting with different recommendation systems. In this type of setup, small subsets of the user population are directed toward different systems and then their interactions with the different systems are recorded. A user's assignment to a system must be random. Being that an online experiment can have a negative impact on the "real" system, you want to make sure that your online experiment is last, after extensive research offline.

A great way to examine the effectiveness of an online experiment is to measure the significance level (p-value) between two algorithms, A and B. This will help us determine whether there was a statistically significant difference in our results or they were due to chance. ANOVA or the Friedman test for ranking can be used for the comparison of multiple algorithms at once.

In the case of our Movielens data set, we could implement online experiments that are rolled out to users of movielens.org. In these online experiments, we want to manipulate variables that will enhance the user experience, leading the user to use the site more, rate more movies, etc.

We could design an online experiment where we increase diversity and record the user behavior in response to the change. This might be something like preventing the website from showing a user a movie too many times. If movielens suggests a movie to you and you don't watch or rate it within a certain period of time, then that movie will be removed from your list of recommendations. That way, we are keeping the selection fresh for the user, always providing them with something new to watch.

We could also implement changes to our system surrounding novelty and roll them out to a small group of users. A user is most likely aware of the incredibly popular movies, and if they haven't rated them yet, chances are they don't want to see them. Therefore, we don't want to suggest these movies to the users. We will put less weighting on the popular movies, and more weighting on the less popular movies, so that these appear more frequently in the recommendations. This will enhance the user experience by providing potentially more useful recommendations. We can measure how useful those recommendations are by observing their behavior on the site. If they rate the less popular movies we suggest, then we may be able to conclude that this approach is effective in meeting our business need.

Reference: 1. Suboojitha Sridharan,Introducing Serendipity In Recommender Systems Though Collaborative Methods,(2014). Open Access Master's Theses. Paper 453.http://digitalcommons.uri.edu/theses/453

UserID	Number.of.less.serendipitous.movies
255	1
342	16
514	16
854	9
194	17
843	14
885	15
591	12
555	10
58	9
195	11
167	13
617	7
344	10
693	11
447	14
637	10
910	10
336	6
921	11

UserID	Number.of.less.serendipitous.movies
255	1
342	16
514	16
854	9
194	17
843	14
885	15
591	12
555	10
58	9
195	11
167	13
617	7
344	10
693	11
447	14
637	10
910	10
336	6
921	11

UserID	Number.of.less.serendipitous.movies
255	1
342	16
514	16
854	9
194	17
843	14
885	15
591	12
555	10
58	9
195	11
167	13
617	7
344	10
693	11
447	14
637	10
910	10
336	6
921	11