DATA643 Project4: Accuracy and Beyond

In this project, the accuracy of five recommender system algorithms will be compared against the offline data.

Serendipity will be introduced to the user-based collaborative filtering model to improve the user experience.

Load packages

1. Load the data

data(MovieLense)

2. Explore the data

2.1 Distribution of ratings

#distribution of ratings
vector_ratings <- as.vector(MovieLense@data)
(rating_frq <- table(vector_ratings))

## vector_ratings
##       0       1       2       3       4       5 
## 1469760    6059   11307   27002   33947   21077

qplot(vector_ratings) + ggtitle("Distribution of the ratings") + labs(x = "Score")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There are a lot of missing values(0). The most common rating is 4.

2.2 Select the most relevant data

#selece only the users who have rated at least 50 movies and movies that had been rated more than 100 times
(ratings_movies <- MovieLense[rowCounts(MovieLense) > 50,
colCounts(MovieLense) > 100])

## 560 x 332 rating matrix of class 'realRatingMatrix' with 55298 ratings.

2.3 Distribution of user rating mean and items and distribution of number of rated items and number of ratings a item has

#convert the data to realRatingMatrix
ml <- as(MovieLense@data, "realRatingMatrix")

#distribution of user rating means
user_summary <-  as.data.frame(cbind('mean'=rowMeans(ratings_movies),'number'=rowCounts(ml)))

## Warning in cbind(mean = rowMeans(ratings_movies), number = rowCounts(ml)):
## number of rows of result is not a multiple of vector length (arg 1)

user_summary <-as.data.frame(sapply(user_summary, function(x) as.numeric(as.character(x))))

#distribution of movie rating means
item_summary <- as.data.frame(cbind('mean'=colMeans(ml), 'number'=colCounts(ratings_movies)))

## Warning in cbind(mean = colMeans(ml), number = colCounts(ratings_movies)):
## number of rows of result is not a multiple of vector length (arg 2)

item_summary <-as.data.frame(sapply(item_summary, function(x) as.numeric(as.character(x))))

par(mfrow=c(1,2))

#distribution 1
ggplot(user_summary,aes(mean)) +
  geom_histogram(binwidth = 0.05,col='white',fill="orchid3") + labs(x = "Average User Rating", y = "Count of Movies", title = "Distribution 1") + geom_vline(xintercept = mean(user_summary$mean),col='grey',size=2)

#distribution 2
ggplot(item_summary,aes(mean)) +
  geom_histogram(binwidth = 0.05,col='white',fill="seagreen3") + labs(x = "Average Movie Rating", y = "Count of Movies", title = "Distribution 2") + geom_vline(xintercept = mean(item_summary$mean),col='grey',size=2)

#distribution 3 
ggplot(user_summary,aes(number)) +
  geom_histogram(binwidth = 0.8,fill="violetred2") + labs(x = "Count of Rated Items", y = "Count of Users", title = "Distribution 3")

#distribution 4
ggplot(item_summary,aes(number)) +
  geom_histogram(binwidth = 0.8,fill="turquoise2") + labs(x = "Count of Ratings per Movie", y = "Count of Movies", title = "Distribution 4")

From the figures, we can see:

1) Distribution 1 contains the rating means of users and distribution 2 contains the rating means of movies. We can see that the distributions are normal with a grey line indicating the mean.

2) Distribution 3 contains the number of movies rated by each user. Distribution 4 contains the number of times each movie was rated. We can see that both distributions are right skewed, suggesting that many movies have been rated by only a few users.

The movies which have been rated less than 100 times will be removed for our analysis going forward.

2.4 Viewing the matrix by building a heat map whose colors represent the ratings

image(ratings_movies, main = "Heatmap of the rating matrix")

# select the top users and movies
top_movies <- quantile(rowCounts(ratings_movies), 0.98)
top_users <- quantile(colCounts(ratings_movies), 0.98)

#heatmap of the top users and movies
image(ratings_movies[rowCounts(ratings_movies) > top_movies,
colCounts(ratings_movies) > top_users], main = "Heatmap of the top
users and movies")

3.Normalizing the data

ratings_movies_nor <- normalize(ratings_movies)
getRatingMatrix(ratings_movies_nor)[1:5,1:5]

## 5 x 5 sparse Matrix of class "dgCMatrix"
##   Toy Story (1995) GoldenEye (1995) Get Shorty (1995)
## 1       1.17341040       -0.8265896        -0.8265896
## 2       0.07317073        .                 .        
## 3       .                 .                 .        
## 5       0.71739130       -0.2826087         .        
## 6       0.38194444        .                 .        
##   Twelve Monkeys (1995) Babe (1995)
## 1             0.1734104  -2.8265896
## 2             .           .        
## 3             .           .        
## 5             .           .        
## 6            -1.6180556   0.3819444

4. Binarizing the data

(mean(getRatingMatrix(ratings_movies),na.rm=TRUE))

## [1] 1.102705

ratings_movies_bi <- binarize(ratings_movies, minRating = 1)
getRatingMatrix(ratings_movies_bi)

## itemMatrix in sparse format with
##  560 rows (elements/transactions) and
##  332 columns (items)

5. Comparing models

5.1 Evaluating the recommendations

# use the minimum number of items purchased by any user tp decide item number to keep
(min(rowCounts(ratings_movies)))

## [1] 18

n_fold <- 4
items_to_keep <- 15
rating_threshold <- 3

# Use k-fold to validate models
eval_sets <- evaluationScheme(data = ratings_movies, method = "cross-validation",k = n_fold, given = items_to_keep, goodRating = rating_threshold)

models  <- list(
  RANDOM = list(name = "RANDOM", param = NULL),
  POPULAR = list(name = "POPULAR", param = NULL),
  IBCF=list(name="IBCF",param=list(method = "cosine")),
  UBCF=list(name="UBCF", param=list(method = "cosine")),
  SVD=list(name="SVD", param=list(k =100))
)

# varying the number of items we want to recommend to users
n_rec <- c(1, 5, seq(10, 100, 10))

# evaluating the recommendations
results <- evaluate(x = eval_sets, method = models, n= n_rec)

## RANDOM run fold/sample [model time/prediction time]
##   1  [0.02sec/0.06sec] 
##   2  [0sec/0.04sec] 
##   3  [0sec/0.07sec] 
##   4  [0sec/0.06sec] 
## POPULAR run fold/sample [model time/prediction time]
##   1  [0.01sec/0.17sec] 
##   2  [0sec/0.18sec] 
##   3  [0sec/0.2sec] 
##   4  [0sec/0.19sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.25sec/0.06sec] 
##   2  [0.23sec/0.05sec] 
##   3  [0.22sec/0.06sec] 
##   4  [0.23sec/0.05sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.2sec] 
##   2  [0sec/0.19sec] 
##   3  [0.02sec/0.19sec] 
##   4  [0sec/0.21sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.28sec/0.05sec] 
##   2  [0.31sec/0.05sec] 
##   3  [0.28sec/0.05sec] 
##   4  [0.29sec/0.05sec]

# extract the related average confusion matrices
(avg_matrices <- lapply(results, avg))

## $RANDOM
##             TP         FP       FN       TN precision      recall
## 1    0.3285714  0.6714286 72.70536 243.2946 0.3285714 0.004751494
## 5    1.3910714  3.6089286 71.64286 240.3571 0.2782143 0.018800523
## 10   2.6339286  7.3660714 70.40000 236.6000 0.2633929 0.034771013
## 20   5.0107143 14.9892857 68.02321 228.9768 0.2505357 0.067270668
## 30   7.2678571 22.7321429 65.76607 221.2339 0.2422619 0.099457029
## 40   9.4553571 30.5446429 63.57857 213.4214 0.2363839 0.129299255
## 50  11.6428571 38.3571429 61.39107 205.6089 0.2328571 0.159635206
## 60  13.8642857 46.1357143 59.16964 197.8304 0.2310714 0.189704063
## 70  16.1660714 53.8339286 56.86786 190.1321 0.2309439 0.222247748
## 80  18.4089286 61.5910714 54.62500 182.3750 0.2301116 0.252360573
## 90  20.8160714 69.1839286 52.21786 174.7821 0.2312897 0.285477541
## 100 23.0553571 76.9446429 49.97857 167.0214 0.2305536 0.315768747
##             TPR         FPR
## 1   0.004751494 0.002705416
## 5   0.018800523 0.014611863
## 10  0.034771013 0.029935347
## 20  0.067270668 0.061085528
## 30  0.099457029 0.092975963
## 40  0.129299255 0.125099069
## 50  0.159635206 0.157232959
## 60  0.189704063 0.189164756
## 70  0.222247748 0.220800022
## 80  0.252360573 0.252616528
## 90  0.285477541 0.283744918
## 100 0.315768747 0.315513966
## 
## $POPULAR
##             TP         FP       FN       TN precision     recall
## 1    0.7196429  0.2803571 72.31429 243.6857 0.7196429 0.01197822
## 5    2.7767857  2.2232143 70.25714 241.7429 0.5553571 0.04195815
## 10   5.0928571  4.9071429 67.94107 239.0589 0.5092857 0.07457286
## 20   9.1732143 10.8267857 63.86071 233.1393 0.4586607 0.13214219
## 30  12.0910714 17.9089286 60.94286 226.0571 0.4030357 0.17255224
## 40  15.0964286 24.9035714 57.93750 219.0625 0.3774107 0.21423751
## 50  18.0196429 31.9803571 55.01429 211.9857 0.3603929 0.25188087
## 60  21.0285714 38.9714286 52.00536 204.9946 0.3504762 0.29300278
## 70  23.5625000 46.4375000 49.47143 197.5286 0.3366071 0.32675936
## 80  26.0839286 53.9160714 46.95000 190.0500 0.3260491 0.36092015
## 90  28.5267857 61.4732143 44.50714 182.4929 0.3169643 0.39424370
## 100 31.1071429 68.8928571 41.92679 175.0732 0.3110714 0.42835442
##            TPR         FPR
## 1   0.01197822 0.001038530
## 5   0.04195815 0.008491175
## 10  0.07457286 0.018876969
## 20  0.13214219 0.042216412
## 30  0.17255224 0.070867267
## 40  0.21423751 0.099180570
## 50  0.25188087 0.127731178
## 60  0.29300278 0.155893341
## 70  0.32675936 0.186294041
## 80  0.36092015 0.216771486
## 90  0.39424370 0.247690674
## 100 0.42835442 0.277831869
## 
## $IBCF
##             TP         FP       FN       TN precision      recall
## 1    0.3428571  0.6571429 72.69107 243.3089 0.3428571 0.004784149
## 5    1.4589286  3.5410714 71.57500 240.4250 0.2917857 0.019239660
## 10   2.8267857  7.1732143 70.20714 236.7929 0.2826786 0.038530232
## 20   5.4267857 14.5732143 67.60714 229.3929 0.2713393 0.073704328
## 30   7.9517857 22.0482143 65.08214 221.9179 0.2650595 0.110055096
## 40  10.4982143 29.5017857 62.53571 214.4643 0.2624554 0.146730724
## 50  13.0250000 36.9750000 60.00893 206.9911 0.2605000 0.182895654
## 60  15.5732143 44.4267857 57.46071 199.5393 0.2595536 0.221087871
## 70  18.1071429 51.8928571 54.92679 192.0732 0.2586735 0.256525401
## 80  20.5339286 59.4660714 52.50000 184.5000 0.2566741 0.291873604
## 90  23.0625000 66.9375000 49.97143 177.0286 0.2562500 0.328242675
## 100 25.6142857 74.3750000 47.41964 169.5911 0.2561668 0.366631295
##             TPR         FPR
## 1   0.004784149 0.002654013
## 5   0.019239660 0.014292539
## 10  0.038530232 0.029053575
## 20  0.073704328 0.059207686
## 30  0.110055096 0.089775002
## 40  0.146730724 0.120306965
## 50  0.182895654 0.150919378
## 60  0.221087871 0.181383035
## 70  0.256525401 0.211913728
## 80  0.291873604 0.242983796
## 90  0.328242675 0.273575665
## 100 0.366631295 0.304125911
## 
## $UBCF
##             TP         FP       FN       TN precision     recall
## 1    0.6714286  0.3285714 72.36250 243.6375 0.6714286 0.01192375
## 5    2.8696429  2.1303571 70.16429 241.8357 0.5739286 0.04747620
## 10   5.2035714  4.7964286 67.83036 239.1696 0.5203571 0.08332505
## 20   9.0892857 10.9107143 63.94464 233.0554 0.4544643 0.13942781
## 30  12.5857143 17.4142857 60.44821 226.5518 0.4195238 0.18952017
## 40  15.8232143 24.1767857 57.21071 219.7893 0.3955804 0.23425245
## 50  18.6178571 31.3821429 54.41607 212.5839 0.3723571 0.27270109
## 60  21.3892857 38.6107143 51.64464 205.3554 0.3564881 0.31059864
## 70  23.9803571 46.0196429 49.05357 197.9464 0.3425765 0.34651964
## 80  26.3571429 53.6428571 46.67679 190.3232 0.3294643 0.37714342
## 90  28.7464286 61.2535714 44.28750 182.7125 0.3194048 0.40802008
## 100 31.0160714 68.9839286 42.01786 174.9821 0.3101607 0.43747336
##            TPR         FPR
## 1   0.01192375 0.001256794
## 5   0.04747620 0.008251025
## 10  0.08332505 0.018689591
## 20  0.13942781 0.042947002
## 30  0.18952017 0.068995540
## 40  0.23425245 0.096211850
## 50  0.27270109 0.125431796
## 60  0.31059864 0.154775270
## 70  0.34651964 0.184848790
## 80  0.37714342 0.215934396
## 90  0.40802008 0.246919899
## 100 0.43747336 0.278566676
## 
## $SVD
##             TP         FP       FN       TN precision      recall
## 1    0.3839286  0.6160714 72.65000 243.3500 0.3839286 0.006832558
## 5    1.7178571  3.2821429 71.31607 240.6839 0.3435714 0.026314440
## 10   3.2803571  6.7196429 69.75357 237.2464 0.3280357 0.049360758
## 20   6.2142857 13.7857143 66.81964 230.1804 0.3107143 0.092362805
## 30   8.9053571 21.0946429 64.12857 222.8714 0.2968452 0.129903637
## 40  11.4928571 28.5071429 61.54107 215.4589 0.2873214 0.166361204
## 50  14.0250000 35.9750000 59.00893 207.9911 0.2805000 0.201502311
## 60  16.3946429 43.6053571 56.63929 200.3607 0.2732440 0.233678400
## 70  18.7678571 51.2321429 54.26607 192.7339 0.2681122 0.267187336
## 80  21.0750000 58.9250000 51.95893 185.0411 0.2634375 0.297843591
## 90  23.4053571 66.5946429 49.62857 177.3714 0.2600595 0.329361873
## 100 25.7375000 74.2625000 47.29643 169.7036 0.2573750 0.361183758
##             TPR         FPR
## 1   0.006832558 0.002488234
## 5   0.026314440 0.013262865
## 10  0.049360758 0.027167188
## 20  0.092362805 0.055937053
## 30  0.129903637 0.085732737
## 40  0.166361204 0.116040518
## 50  0.201502311 0.146505653
## 60  0.233678400 0.177633129
## 70  0.267187336 0.208826591
## 80  0.297843591 0.240256589
## 90  0.329361873 0.271656799
## 100 0.361183758 0.303008329

To interpret these results, this key may be helpful:

. True Positives (TP): These are recommended items that have been rated

. False Positives (FP): These are recommended items that haven’t been rated . False Negatives(FN): These are not recommended items that have been rated

. True Negatives (TN): These are not recommended items that haven’t been rated

. True Positive Rate (TPR): This is the percentage of purchased items that have been recommended. It’s the number of TP divided by the number of purchased items (TP + FN).

. False Positive Rate (FPR): This is the percentage of not purchased items that have been recommended. It’s the number of FP divided by the number of not purchased items (FP + TN).

5.2 Evaluating the ratings

recommender_random <- Recommender(data = getData(eval_sets, "train"),
method = "RANDOM",parameter = NULL)

recommender_popular <- Recommender(data = getData(eval_sets, "train"),
method = "POPULAR",parameter = NULL)

recommender_ibcf <- Recommender(data = getData(eval_sets, "train"),
method = "IBCF",parameter = list(method = "cosine"))

recommender_ubcf <- Recommender(data = getData(eval_sets, "train"),
method = "UBCF",parameter = list(method = "cosine"))

recommender_svd <- Recommender(data = getData(eval_sets, "train"),
method = "SVD",parameter = list(k =100))

items_to_recommend <- 10

eval_prediction_random <- predict(object = recommender_random, newdata =
getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

eval_prediction_popular <- predict(object = recommender_popular, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

eval_prediction_ibcf <- predict(object = recommender_ibcf, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

eval_prediction_ubcf <- predict(object = recommender_ubcf, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

eval_prediction_svd <- predict(object = recommender_svd, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

# compare RMSE for different models
######################RANDOM######################
eval_accuracy_random <- calcPredictionAccuracy(
x = eval_prediction_random, data = getData(eval_sets, "unknown"), byUser = F)

eval_accuracy_random_user <- calcPredictionAccuracy(
x = eval_prediction_random, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_random_user)

##          RMSE      MSE       MAE
## [1,] 1.451276 2.106202 1.1151512
## [2,] 1.337349 1.788503 1.0308103
## [3,] 1.093439 1.195610 0.7633556
## [4,] 1.098184 1.206007 0.8396755
## [5,] 1.561598 2.438587 1.2467081
## [6,] 1.472916 2.169481 1.1639751

qplot(eval_accuracy_random_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the Random RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

######################POPULAR######################
eval_accuracy_popular <- calcPredictionAccuracy(
x = eval_prediction_popular, data = getData(eval_sets, "unknown"), byUser =F) 

eval_accuracy_popular_user <- calcPredictionAccuracy(
x = eval_prediction_popular, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_popular_user)

##         RMSE       MSE       MAE
## 2  0.6275100 0.3937688 0.5499331
## 14 1.1574183 1.3396171 0.8707250
## 16 0.7310052 0.5343686 0.5265590
## 18 0.8019874 0.6431838 0.6976061
## 21 1.0927126 1.1940207 0.9193394
## 23 0.8127401 0.6605464 0.6502611

qplot(eval_accuracy_popular_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the Popular RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

######################IBCF######################
eval_accuracy_ibcf <- calcPredictionAccuracy(
x = eval_prediction_ibcf, data = getData(eval_sets, "unknown"), byUser = F)

eval_accuracy_ibcf_user <- calcPredictionAccuracy(
x = eval_prediction_ibcf, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_ibcf_user)

##        RMSE      MSE       MAE
## 2  1.177366 1.386191 1.0237338
## 14 1.287356 1.657287 0.9344477
## 16 1.065587 1.135475 0.6739815
## 18 1.050587 1.103733 0.8178706
## 21 1.377252 1.896823 1.0701019
## 23 1.105258 1.221595 0.7758693

qplot(eval_accuracy_ibcf_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the IBCF RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

######################UBCF######################
eval_accuracy_ubcf <- calcPredictionAccuracy(
x = eval_prediction_ubcf, data = getData(eval_sets, "unknown"), byUser = F)

eval_accuracy_ubcf_user <- calcPredictionAccuracy(
x = eval_prediction_ubcf, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_ubcf_user)

##         RMSE       MSE       MAE
## 2  0.6675509 0.4456242 0.5819348
## 14 1.1266948 1.2694411 0.8646379
## 16 0.7847265 0.6157957 0.6119964
## 18 0.8327871 0.6935343 0.7192054
## 21 1.2129192 1.4711730 1.0343360
## 23 0.9014106 0.8125410 0.7577947

qplot(eval_accuracy_ubcf_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the IBCF RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

######################SVD######################
eval_accuracy_svd <- calcPredictionAccuracy(
x = eval_prediction_svd, data = getData(eval_sets, "unknown"), byUser = F)

eval_accuracy_svd_user <- calcPredictionAccuracy(
x = eval_prediction_svd, data = getData(eval_sets, "unknown"), byUser = TRUE)

head(eval_accuracy_svd_user)

##         RMSE       MSE       MAE
## 2  0.7653761 0.5858006 0.6283317
## 14 1.1314783 1.2802431 0.8630623
## 16 0.8101806 0.6563926 0.6521109
## 18 0.8446688 0.7134654 0.7260224
## 21 1.2770420 1.6308361 1.0812567
## 23 0.9561779 0.9142761 0.8117162

qplot(eval_accuracy_svd_user[, "RMSE"]) + geom_histogram(binwidth = 0.05) + ggtitle("Distribution of the IBCF RMSE by user")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

kable(RMSE_COMPARE <- rbind(eval_accuracy_random,eval_accuracy_popular,eval_accuracy_ibcf,eval_accuracy_ubcf,eval_accuracy_svd))

	RMSE	MSE	MAE
eval_accuracy_random	1.3191730	1.7402174	1.0385862
eval_accuracy_popular	0.9334270	0.8712860	0.7312318
eval_accuracy_ibcf	1.2042800	1.4502903	0.8944834
eval_accuracy_ubcf	0.9679624	0.9369512	0.7600586
eval_accuracy_svd	1.0023309	1.0046673	0.7925106

RMSE_COMPARE <- as.data.frame(RMSE_COMPARE)
RMSE_COMPARE$model <- c('Random','Popular','IBCF','UBCF','SVD')

ggplot(RMSE_COMPARE, aes(model,RMSE))+ geom_point(colour = "blue", size = 3)

Above, we chose to look at the RMSE scores both by user and not by user mostly for demonstration purposes. You can see that by user, the RMSE values don’t give us the information we need. We want to look at the performance of the entire model, not the performance of the model for each individual. Therefore, using byUser = FALSE is the recommended approach.

We can see that the popular recommender is our best performer, with user-based collaborative filtering coming in second place, and singular value decomposition coming in third.

Serendipity, novelty, and diversity

If we want to go beyond accuracy of a model, we can start looking at measures such as serendipity, novelty, and diversity. There is a lot of research out there now stating that an accurate model is not necessarily a good one, particularly if the recommendations aren’t useful to the user.

In the case of serendipity, a formal definition hasn’t been created but we can think of it as the model predicting something unexpected and useful that the user was not specifically searching for. This does not necessarily improve the accuracy of the model, but it improves the quality of the predictions the model makes.

Novelty is a metric rooted in popularity. In our case, if a movie is very popular, there’s a good chance a lot of our users have seen it. In the case of a recommendation system, we would want to decrease the weight of these popular shows and increase the weight of shows that are less popular but similar to the movies the user has watched before.

Diversity is our final metric, based on keeping recommendations fresh for the user. Users like new recommendations, and are going to get bored with the same ones over and over. One way to implement this in a recommender system would be to shuffle the top recommendations. That way, they’re still relevant, but the user isn’t seeing the same recommendations over and over.

6.Serendipity

6.1 Build a model by combinding user-based and genre_based method

# convert data to realRatingMatrix format
mls <- as(ratings_movies,"realRatingMatrix")

items_genre_all <- MovieLenseMeta[,-c(2,3,4)]
items_genre_movie <- semi_join(data.frame('title'=colnames(ratings_movies)), items_genre_all, by = "title")
items_genre <- semi_join(items_genre_all, items_genre_movie, by = "title")

rating_m <- as.matrix(getRatingMatrix(ratings_movies))

# number of user, genre, and movie
nuser <- length(rating_m[,1])
ngenre <- length(items_genre[1,])-1
nmovie <- length(items_genre[,1])

# Compute average rating of each genre for a given user
user_genre <- data.frame(matrix(0, nrow = nuser, ncol = ngenre))

for (i in 1:nuser){
  for(j in 1:ngenre){
    v1 <- as.matrix(rating_m[i,])
    v2 <- as.matrix(items_genre[,(j+1)])
    user_genre[i,j] <- (t(v1) %*% v2)/sum(v2)
  }
}

# Compute average rating of each item based on average rating for each genre 
rating_genre_based <- data.frame(matrix(0, nrow = nuser, ncol =nmovie))
for (i in 1:nuser){
  for(j in 1:nmovie){
    v1 <- as.matrix(user_genre[i,])
    v2 <- as.matrix(items_genre[j,2:(ngenre+1)])
    rating_genre_based[i,j] <- (v1 %*% t(v2))/sum(v2)
  }
}

rownames(rating_genre_based) <- rownames(rating_m)
colnames(rating_genre_based) <- colnames(rating_m)
rownames(user_genre) <- rownames(rating_m)
colnames(user_genre) <- colnames(items_genre[,2:19])

# Select a random set of users
set.seed(1)
users <- sample(rownames(getRatingMatrix(ratings_movies)),size=20)

# This rating profile based on genre shows the aggregated inclination of each user towards movie genres. Assumming that users like similar items, movies that are most similar to a user's preference for an movie's feature could be recommended.

#convert the predicted ratings based on genre to binary 
avg_user_genre <- as.matrix(rowMeans(user_genre))
user_genre_bi <- user_genre - t(avg_user_genre)
user_genre_bi[user_genre_bi < 0] <- 0
user_genre_bi[user_genre_bi > 0] <- 1

#find the most similar movies for a selected user 
profile <- user_genre_bi[users[1],] 
colnames(profile) <- colnames(items_genre[,2:19])
sim_matrix <- rbind.data.frame(profile, items_genre[,2:19])
sim_matrix <- data.frame(lapply(sim_matrix,function(x) as.integer(x))) #convert data to type integer

#Calculate Jaccard distance between user profile and all movies
similarity <- dist(sim_matrix, method = "Jaccard")
similarity <- as.data.frame(as.matrix(similarity[1:333]))
rows <- which(similarity == min(similarity))
#Recommended movies
print(paste('Recommend use with userID', users[1],'the following movies:', items_genre[rows-1,1]))

## [1] "Recommend use with userID 255 the following movies: GoodFellas (1990)"          
## [2] "Recommend use with userID 255 the following movies: George of the Jungle (1997)"
## [3] "Recommend use with userID 255 the following movies: Fly Away Home (1996)"

6.2 Expected list of Recommendations (User-Based Collaborative Filtering)

Recommend 10 movies to 20 randomly chose users and create recommendation list RS.

# Build a User-based recommendation model 
rec <- Recommender(mls,method="UBCF", param=list(normalize = "Z-score",method="Cosine",nn=5, minRating=1))

## Warning: Unknown parameters: minRating

## Available parameter (with default values):
## method    =  cosine
## nn    =  25
## sample    =  FALSE
## normalize     =  center
## verbose   =  FALSE

# Creat top-10 recommendation lists for five users
recom_movie <- predict(rec, ratings_movies[users], n=10)
movie_rec <- as(recom_movie, "list")
movie_ubcf <- data.frame(movie_rec[1:20])
colnames(movie_ubcf) <- users
kable(movie_ubcf[,1:5], caption = "Recommend 10 movies to users")

Recommend 10 movies to users
255	342	514	854	194
Dead Man Walking (1995)	Shawshank Redemption, The (1994)	Trainspotting (1996)	Wrong Trousers, The (1993)	Reservoir Dogs (1992)
Godfather, The (1972)	Schindler’s List (1993)	Babe (1995)	GoodFellas (1990)	Shining, The (1980)
Face/Off (1997)	Godfather, The (1972)	Lion King, The (1994)	Close Shave, A (1995)	Willy Wonka and the Chocolate Factory (1971)
Close Shave, A (1995)	Cinema Paradiso (1988)	Sound of Music, The (1965)	Platoon (1986)	Rear Window (1954)
Raiders of the Lost Ark (1981)	Toy Story (1995)	Primal Fear (1996)	Godfather: Part II, The (1974)	Terminator 2: Judgment Day (1991)
Aliens (1986)	Much Ado About Nothing (1993)	American President, The (1995)	Raging Bull (1980)	People vs. Larry Flynt, The (1996)
Sling Blade (1996)	Vertigo (1958)	Pulp Fiction (1994)	Clockwork Orange, A (1971)	Wrong Trousers, The (1993)
Abyss, The (1989)	Full Monty, The (1997)	Sling Blade (1996)	To Kill a Mockingbird (1962)	Contact (1997)
Mr. Holland’s Opus (1995)	Strictly Ballroom (1992)	Raising Arizona (1987)	Face/Off (1997)	Good Will Hunting (1997)
Star Trek: The Wrath of Khan (1982)	Piano, The (1993)	Fried Green Tomatoes (1991)	Back to the Future (1985)	Face/Off (1997)

# Creat the recommendation list RS.
d <- gather(movie_ubcf,user,movie,1:20)

## Warning: attributes are not identical across measure variables; they will
## be dropped

rs <- unique(as.character(d[,2])) 
rs_df<- data.frame('Expected_list_UBCF'=rs)
kable(head(rs_df))

Expected_list_UBCF

Dead Man Walking (1995)
Godfather, The (1972)
Face/Off (1997)
Close Shave, A (1995)
Raiders of the Lost Ark (1981) Aliens (1986)

6.3 Unexpected list of Recommendations

Create the PPM list for top-119 movies using a primitive recommendation method

# Primitive recommendation is based on picking top movies that has got the ratings from largest number of users and movies that got the highest average ratings.

# number of ratings for each item
number_of_users_for_items <- as.data.frame(colCounts(mls))
number_of_users_for_items[,1] <-number_of_users_for_items[order(-number_of_users_for_items[,1]),] 
colnames(number_of_users_for_items) <- c('movie')

# Compute the movie IDs with the largest number of users.
avg_rating_for_items <- as.data.frame(colMeans(mls))
avg_rating_for_items[,1] <-avg_rating_for_items[order(-avg_rating_for_items[,1]),]
colnames(avg_rating_for_items) <- c('movie')

topN_PPM <- unique(c(rownames(number_of_users_for_items),rownames(avg_rating_for_items)))

# Get the PPM list based on a primitive recommendation method. Here, top 119 movies are selected based on the highest average rating and the largest number of users for each movie.
number_of_rankings_requested = 119
top119_PPM <- topN_PPM[1:119]

# Unexpected list of Recommendations is the list of items that are in RS list but not in PPM.
top119_PPM_df <- data.frame('movie'= top119_PPM)
top119_PPM_df <- data.frame(lapply(top119_PPM_df, as.character), stringsAsFactors=FALSE)
rs_df <- data.frame('movie'= rs)
rs_df <- data.frame(lapply(rs_df, as.character), stringsAsFactors=FALSE)
unexp <- anti_join(rs_df,top119_PPM_df,by.x= 'movie', by.y='movie')

## Joining, by = "movie"

unexp_df <- data.frame('Unexpected_list'=unexp)
kable(head(unexp_df,n=10))

movie

Face/Off (1997)
Close Shave, A (1995)
Sling Blade (1996)
Star Trek: The Wrath of Khan (1982) Schindler’s List (1993)
Vertigo (1958)
Full Monty, The (1997)
Strictly Ballroom (1992)
Piano, The (1993)
Trainspotting (1996)

6.4 Predict serendipity list

# retrive the predict ratings generated by UBCF model
recom_rating <- predict(rec, ratings_movies[users], type="ratings")
movie_predict <- as(recom_rating, "list")
predict_rating <- as(recom_rating, "matrix") 
predict_rating <- as.data.frame(predict_rating)
head(predict_rating[,1:4])

##     Toy Story (1995) GoldenEye (1995) Get Shorty (1995)
## 255         3.115311         2.854167          2.831789
## 342         4.140593         3.293230                NA
## 514               NA         3.900019                NA
## 854               NA         3.079813                NA
## 194               NA         3.017854                NA
## 843               NA         2.813357          2.560468
##     Twelve Monkeys (1995)
## 255                    NA
## 342                    NA
## 514                    NA
## 854                    NA
## 194                    NA
## 843                    NA

nameList <- unexp[,1]
predict_rating <- predict_rating[,colnames(predict_rating) %in% nameList] 

# Subset the rating matrix to get the predictions for randomly selected users.
predict_rating_sub <- as.data.frame(predict_rating[,rownames(predict_rating) %in% users])
predict_rating_sub$user <- rownames(predict_rating_sub)
predict_rating_sub <- predict_rating_sub[,c(62,1:61)]
predict_rating_sub$user <- sapply(predict_rating_sub$user, as.numeric)

# Compute the list of useful items from the unexpected set of movies.
# Usefulness is when the ratings of movies > 2.5
predict_rating_sub_long <- gather(predict_rating_sub,movie,rating,2:62) 

usefulness <- predict_rating_sub_long[which(predict_rating_sub_long$rating>2.5),]
usefulness <- usefulness[order(usefulness$user),]
kable(head(usefulness),caption="Useful_List")

Useful_List
	user	movie	rating
50	58	Patton (1970)	3.902803
70	58	Young Frankenstein (1974)	4.308861
170	58	Last of the Mohicans, The (1992)	4.062107
230	58	Fifth Element, The (1997)	3.653026
250	58	Contact (1997)	4.316963
310	58	Leaving Las Vegas (1995)	4.258151

6.5 Improve serendipity List

# subset item_genre matrix to small data set which contains only the movies in serendipity list
items_genre_imt <- semi_join(items_genre_all,data.frame('title'=unique(usefulness$movie)), by = "title")

# Compute average rating of each genre for a given user
v1 <- as.matrix(predict_rating_sub[,2:62]) 
v1[is.na(v1)] <- 0
user_genre_imt <- v1 %*% as.matrix(items_genre_imt[,2:19])
v3 <- colSums(items_genre_imt[,2:19])
v4 <- matrix(rep((1/v3),18),18,20)

## Warning in matrix(rep((1/v3), 18), 18, 20): data length [324] is not a sub-
## multiple or multiple of the number of columns [20]

v4 <- t(v4)
user_genre_imt <- user_genre_imt[,-7] # Genre Documentary has no rating, so we remove this genre
v4 <- v4[,-7] # remove genre Documentary 
user_genre_sp <- user_genre_imt * v4 # average rating of each genre 

# Compute average rating of each item based on average rating for each genre 
items_genre_sp <- items_genre_imt[,-c(1,8)] # remove genre Documentary 
rating_genre_based_sp <- user_genre_sp %*% t(items_genre_sp)#average rating of each item based on genre 
colnames(rating_genre_based_sp) <-items_genre_imt$title
kable(head(rating_genre_based_sp[,1:6]),caption="Improved_Serendipity_List")

Improved_Serendipity_List
	Patton (1970)	Young Frankenstein (1974)	Last of the Mohicans, The (1992)	Fifth Element, The (1997)	Contact (1997)	Leaving Las Vegas (1995)
255	5.364220	4.789906	8.030055	5.480612	5.688661	5.662999
342	5.444992	3.014967	6.515864	1.748378	3.015978	4.802111
514	4.717684	4.357967	8.114585	5.545373	4.926453	4.834792
854	3.844094	5.769871	6.211448	5.116786	4.981673	4.318114
194	2.326376	2.027171	2.993236	2.316702	2.149810	1.740571
843	2.731897	1.566843	4.475721	4.101452	3.434087	2.513222

6.6 Compare the serendipity list and the improved serendipity list based on by genre

predict_rating_sub_1 <- predict_rating_sub[,-1]
# rerange the item sequence of serendipity list 
predict_rating_sub_1 <- predict_rating_sub_1 [colnames(rating_genre_based_sp)]
serendipity <- predict_rating_sub_1
#calculate the difference between serendipity list and the imporved serendipity list and normalize by rating sacle 5.
delta <- ((rating_genre_based_sp - serendipity)/5)*100
#Movies whose delta are below the threshold can be removed from the serendipity list. Here set the threshold as -5%
less_serendipitous <- delta
less_serendipitous$less <- rowSums(less_serendipitous < -5, na.rm=T)
less_serendipitous <- data.frame('UserID'=rownames(less_serendipitous), 'Number of less serendipitous movies'=less_serendipitous[,62])
kable(less_serendipitous )

UserID	Number.of.less.serendipitous.movies
255	1
342	16
514	16
854	9
194	17
843	14
885	15
591	12
555	10
58	9
195	11
167	13
617	7
344	10
693	11
447	14
637	10
910	10
336	6
921	11

From the above table, we can see that a significant number of movies are considered less serendipitous based on the above threshold,

delta$UserID <- rownames(delta)
delta_long <- gather(delta,movie,delta_rating,1:61)
delta_long$UserID <- sapply(delta_long$UserID, as.numeric)
ceiling <- max(delta_long$delta_rating,na.rm=T)
bottom <- min(delta_long$delta_rating,na.rm=T)
ggplot(delta_long,aes(x=UserID,y=delta_rating))+geom_point(fill="blue", shape=21, size  = 2) +ylim(bottom,ceiling)+geom_hline(yintercept = -5,linetype="dashed",size=1.5)+ggtitle("Serendipity List of Movies")

## Warning: Removed 338 rows containing missing values (geom_point).

The figure above shows the serendipity list of movies generated by a standard collaborative filtering method for 20 randomly selected users. The differences in useful ratings and genre-based ratings below 5% are considered less serendipitous. A significant fraction of movies has been identified as less serendipitous items based on the algorithm used here.

actual <- as.data.frame(as.matrix(getRatingMatrix(ratings_movies)))
actual$userid<-rownames(actual)
actual_long <- gather(actual, title, rating,1:332)
actual_rating <- semi_join(actual_long,data.frame('title'=unique(usefulness$movie)), by = "title")
actual_rating <- actual_rating[which(actual_rating$userid %in% c(users)),]
actual_rating$userid <- sapply(actual_rating$userid, as.numeric)
actual_rating <- actual_rating[match(delta_long$UserID, actual_rating$userid),]

rating_genre_based_sp_1 <- data.frame(rating_genre_based_sp)
rating_genre_based_sp_1$userid <- rownames(rating_genre_based_sp)
rating_genre_based_sp_long <- gather(rating_genre_based_sp_1, movie,rating,1:61)

rmse(actual_rating$rating,rating_genre_based_sp_long$rating)

## [1] 5.516745

The accuracy decrease after introduce the serendipity list into the UBCF model.

#####Diversity

ml.model <- Recommender(ml[1:900], method = "UBCF")
#top 5 recommendations for user
ml.523.top10 <- as(predict(ml.model, ml["523"], n = 10), "list")

length(ml.523.top10)
sample(ml.523.top10, 5)


r <- Recommender(MovieLense[1:800], method = "POPULAR")
r
getModel(r)$topN

#Recommend 3 movies to 2 users
recom <- predict(r, MovieLense[809:810], n = 10)
as(recom, "list")
length(recom)

rnorm(10)

Conclusion:

After introducing serendipity to our model, we noticed an increase in our RMSE. Though the model is less accurate in one measure, it is potentially more valuable in the eyes of user experience. Trade-offs like this must be made when designing a recommender system, and the designer has to choose what is more valuable.

Offline vs. online experiments

Offline experiments:

These are what we’ve done up to this point, using offline data and simulating user behavior. It is the least risky since there is not threat of driving away users. Therefore, they are the easiest to conduct, due to the fact that interaction with the user is not required.

Online experiments:

Where real user populations interact with the recommender system. These experiments evaluate the performance of the recommender system. The user is oblivious to the conducted experiment in these scenarios. Online experiments are known to be the most trustworthy experiment. Here, we want to look at the change in the behavior of users when interacting with different recommendation systems. In this type of setup, small subsets of the user population are directed toward different systems and then their interactions with the different systems are recorded. A user’s assignment to a system must be random. Being that an online experiment can have a negative impact on the “real” system, you want to make sure that your online experiment is last, after extensive research offline.

A great way to examine the effectiveness of an online experiment is to measure the significance level (p-value) between two algorithms, A and B. This will help us determine whether there was a statistically significant difference in our results or they were due to chance. ANOVA or the Friedman test for ranking can be used for the comparison of multiple algorithms at once.

In the case of our Movielens data set, we could implement online experiments that are rolled out to users of movielens.org. In these online experiments, we want to manipulate variables that will enhance the user experience, leading the user to use the site more, rate more movies, etc.

We could design an online experiment where we increase diversity and record the user behavior in response to the change. This might be something like preventing the website from showing a user a movie too many times. If movielens suggests a movie to you and you don’t watch or rate it within a certain period of time, then that movie will be removed from your list of recommendations. That way, we are keeping the selection fresh for the user, always providing them with something new to watch.

We could also implement changes to our system surrounding novelty and roll them out to a small group of users. A user is most likely aware of the incredibly popular movies, and if they haven’t rated them yet, chances are they don’t want to see them. Therefore, we don’t want to suggest these movies to the users. We will put less weighting on the popular movies, and more weighting on the less popular movies, so that these appear more frequently in the recommendations. This will enhance the user experience by providing potentially more useful recommendations. We can measure how useful those recommendations are by observing their behavior on the site. If they rate the less popular movies we suggest, then we may be able to conclude that this approach is effective in meeting our business need.

Reference: 1. Suboojitha Sridharan,Introducing Serendipity In Recommender Systems Though Collaborative Methods,(2014). Open Access Master’s Theses. Paper 453.http://digitalcommons.uri.edu/theses/453