Evaluating Recommender Systems

Motivation

The goal is to build several recommender models on one dataset and compare their accuracy on a held-out testing subset of that dataset.

Data Utilized

The dataset used here is the MovieLense dataset included in the recommenderlab package. The ratings all range between 1 and 5. Each user in the dataset has rated between 19 and 735 movies. We keep only users who have left more than 1,400 of the 1,664 movies unrated, that is, users with fewer than 264 ratings, with the aim of producing better recommendations.

library(recommenderlab)

## Loading required package: Matrix

## Loading required package: arules

## 
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

## Loading required package: proxy

## 
## Attaching package: 'proxy'

## The following object is masked from 'package:Matrix':
## 
##     as.matrix

## The following objects are masked from 'package:stats':
## 
##     as.dist, dist

## The following object is masked from 'package:base':
## 
##     as.matrix

## Loading required package: registry

data(MovieLense)
# Keep users with more than 1,400 unrated movies (fewer than 264 ratings).
ML <- MovieLense[ncol(MovieLense) - rowCounts(MovieLense) > 1400, ]
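The filter can be restated in terms of rated movies, which makes the cut-off easier to read. The following is a minimal sketch, assuming the MovieLense data shipped with recommenderlab; `n_rated`, `cutoff`, `keep_unrated`, `keep_rated`, and `ML_check` are illustrative names that are not part of the original analysis.

```r
library(recommenderlab)
data(MovieLense)

# Number of movies each user has rated.
n_rated <- rowCounts(MovieLense)

# "More than 1,400 unrated movies" is the same condition as
# "fewer than ncol(MovieLense) - 1400 rated movies".
cutoff <- ncol(MovieLense) - 1400
keep_unrated <- ncol(MovieLense) - n_rated > 1400
keep_rated <- n_rated < cutoff

# Both formulations select exactly the same users.
stopifnot(identical(keep_unrated, keep_rated))

ML_check <- MovieLense[keep_rated, ]
summary(rowCounts(ML_check))  # ratings per user after filtering
```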

Data Splitting

To evaluate accuracy, we split the filtered dataset into a training set and a testing set with the built-in evaluationScheme. The training set holds 80% of the users and the testing set the remaining 20%. For each test user, 19 ratings are given to the recommender and the rest are withheld for evaluation; 19 is used because it is the minimum number of movies rated by any user in the filtered dataset. The goodRating threshold is the minimum rating that counts as good; it is set to 3, the average of the highest rating (5) and the lowest rating (1).

eval_sets <- evaluationScheme(data = ML, method = "split",
                              train = 0.8, given = 19, goodRating = 3)
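The pieces of the split can be inspected with getData. This is a sketch under the same settings as above; the seed value and the names `train_set`, `known_set`, and `unknown_set` are assumptions added for illustration. Note that a user who rated exactly 19 movies would have all ratings given and none withheld.

```r
library(recommenderlab)
data(MovieLense)

set.seed(123)  # arbitrary seed (an assumption) so the random split is reproducible
ML <- MovieLense[ncol(MovieLense) - rowCounts(MovieLense) > 1400, ]
eval_sets <- evaluationScheme(data = ML, method = "split",
                              train = 0.8, given = 19, goodRating = 3)

train_set   <- getData(eval_sets, "train")    # users used to fit the models
known_set   <- getData(eval_sets, "known")    # test users: the given ratings
unknown_set <- getData(eval_sets, "unknown")  # test users: the withheld ratings

nrow(train_set)
nrow(known_set)
table(rowCounts(known_set))      # given ratings per test user
summary(rowCounts(unknown_set))  # withheld ratings per test user
```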

Models Considered

The algorithms used are user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF). Each algorithm is run with two similarity measures: cosine and Pearson correlation.

models <- list(
  IBCF_cos = list(name = "IBCF", param = list(method = "cosine")),
  IBCF_cor = list(name = "IBCF", param = list(method = "pearson")),
  UBCF_cos = list(name = "UBCF", param = list(method = "cosine")),
  UBCF_cor = list(name = "UBCF", param = list(method = "pearson"))
)
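To make the two similarity measures concrete, here is a small base-R sketch on two hypothetical users, restricted to the movies both have rated. The rating vectors are made up for illustration, and recommenderlab's own handling of missing ratings and normalization may differ in detail.

```r
# Hypothetical ratings for five movies; NA means "not rated".
user_a <- c(5, 3, NA, 4, 1)
user_b <- c(4, NA, 2, 5, 2)

# Only movies rated by both users enter the comparison.
both <- !is.na(user_a) & !is.na(user_b)
a <- user_a[both]
b <- user_b[both]

# Cosine similarity: angle between the raw rating vectors.
cosine_sim <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Pearson correlation: the same idea after centering each user's ratings,
# which removes the effect of one user rating systematically higher.
pearson_sim <- cor(a, b)

c(cosine = cosine_sim, pearson = pearson_sim)
```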

Model Evaluation

Each of the four models is evaluated on top-n recommendation lists, with n ranging from 1 to 19 recommendations per user.

eval_results <- evaluate(x = eval_sets, method = models, n = 1:19)

## IBCF run fold/sample [model time/prediction time]
##   1  [99.681sec/0.221sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [98.633sec/0.335sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0.009sec/12.676sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0.009sec/18.644sec]
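For a single test user, the counts behind a top-n evaluation can be sketched in base R as follows. The movie names, the liked set, and the ranking are hypothetical, and `confusion_at_n` is an illustrative helper, not a recommenderlab function.

```r
# Hypothetical test user.
candidates <- paste0("m", 1:10)             # movies not given to the recommender
liked  <- c("m2", "m5", "m9")               # withheld movies rated >= goodRating
ranked <- c("m5", "m1", "m2", "m7", "m3")   # recommender's ranking, best first

confusion_at_n <- function(n) {
  top_n <- head(ranked, n)
  tp <- sum(top_n %in% liked)               # recommended and liked
  fp <- length(top_n) - tp                  # recommended but not liked
  fn <- length(liked) - tp                  # liked but not recommended
  tn <- length(candidates) - tp - fp - fn   # everything else
  c(n = n, TP = tp, FP = fp, FN = fn, TN = tn,
    precision = tp / (tp + fp), recall = tp / (tp + fn))
}

# One row per list length, mirroring the layout of the evaluation output.
result <- t(sapply(1:5, confusion_at_n))
result
```

Averaging such rows over all test users gives fractional counts.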

Model Accuracy

The confusion matrix of each model is extracted, with one row per number of recommendations (1 to 19). The counts are fractional because they are averaged over the test users.
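The call that produced this output is not shown. Output of this shape is consistent with applying avg to each element of the evaluation results, as in the sketch below. To keep it quick to run, the sketch uses a small subset and a single model; the subset size, the seed, given = 10, and the names `small`, `scheme`, `small_models`, `small_results`, and `confusion` are all assumptions.

```r
library(recommenderlab)
data(MovieLense)

set.seed(123)                 # arbitrary seed for a reproducible split
small <- MovieLense[1:200, ]  # small subset so the sketch runs quickly
scheme <- evaluationScheme(data = small, method = "split",
                           train = 0.8, given = 10, goodRating = 3)

small_models <- list(
  UBCF_cos = list(name = "UBCF", param = list(method = "cosine"))
)
small_results <- evaluate(x = scheme, method = small_models, n = c(1, 5, 10))

# avg() averages the confusion matrices over the evaluation runs;
# lapply() applies it to every model in the result list.
confusion <- lapply(small_results, avg)
confusion$UBCF_cos
```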

## $IBCF_cos
##           TP         FP       FN       TN precision      recall
## 1  0.1445087  0.8554913 51.24277 1592.757 0.1445087 0.001878306
## 2  0.2774566  1.7225434 51.10983 1591.890 0.1387283 0.005060270
## 3  0.3872832  2.6127168 51.00000 1591.000 0.1290944 0.006512879
## 4  0.5260116  3.4739884 50.86127 1590.139 0.1315029 0.008166858
## 5  0.6763006  4.3236994 50.71098 1589.289 0.1352601 0.010604141
## 6  0.8265896  5.1734104 50.56069 1588.439 0.1377649 0.013139478
## 7  1.0173410  5.9826590 50.36994 1587.630 0.1453344 0.018491418
## 8  1.1676301  6.8323699 50.21965 1586.780 0.1459538 0.020878963
## 9  1.3236994  7.6763006 50.06358 1585.936 0.1470777 0.023280548
## 10 1.5028902  8.4971098 49.88439 1585.116 0.1502890 0.026362514
## 11 1.6531792  9.3468208 49.73410 1584.266 0.1502890 0.029774974
## 12 1.7919075 10.2080925 49.59538 1583.405 0.1493256 0.032514188
## 13 1.9132948 11.0867052 49.47399 1582.526 0.1471765 0.040454450
## 14 2.0867052 11.9132948 49.30058 1581.699 0.1490504 0.043189358
## 15 2.1849711 12.8150289 49.20231 1580.798 0.1456647 0.044549765
## 16 2.2716763 13.7283237 49.11561 1579.884 0.1419798 0.045699128
## 17 2.3988439 14.6011561 48.98844 1579.012 0.1411085 0.048793446
## 18 2.5144509 15.4855491 48.87283 1578.127 0.1396917 0.052955319
## 19 2.6473988 16.3526012 48.73988 1577.260 0.1393368 0.055316151
##            TPR          FPR
## 1  0.001878306 0.0005341964
## 2  0.005060270 0.0010767094
## 3  0.006512879 0.0016331168
## 4  0.008166858 0.0021712132
## 5  0.010604141 0.0027018712
## 6  0.013139478 0.0032327214
## 7  0.018491418 0.0037381858
## 8  0.020878963 0.0042685580
## 9  0.023280548 0.0047954160
## 10 0.026362514 0.0053075706
## 11 0.029774974 0.0058392522
## 12 0.032514188 0.0063773351
## 13 0.040454450 0.0069278148
## 14 0.043189358 0.0074433326
## 15 0.044549765 0.0080077536
## 16 0.045699128 0.0085789668
## 17 0.048793446 0.0091249502
## 18 0.052955319 0.0096784036
## 19 0.055316151 0.0102201370
## 
## $IBCF_cor
##            TP        FP       FN       TN  precision       recall
## 1  0.01734104  0.982659 51.36994 1592.630 0.01734104 9.210218e-05
## 2  0.04046243  1.959538 51.34682 1591.653 0.02023121 4.341051e-04
## 3  0.05780347  2.942197 51.32948 1590.671 0.01926782 6.282449e-04
## 4  0.06936416  3.930636 51.31792 1589.682 0.01734104 6.914951e-04
## 5  0.08670520  4.913295 51.30058 1588.699 0.01734104 8.379175e-04
## 6  0.10982659  5.890173 51.27746 1587.723 0.01830443 1.235366e-03
## 7  0.11560694  6.884393 51.27168 1586.728 0.01651528 1.648248e-03
## 8  0.12138728  7.878613 51.26590 1585.734 0.01517341 1.808813e-03
## 9  0.14450867  8.855491 51.24277 1584.757 0.01605652 2.910571e-03
## 10 0.16184971  9.838150 51.22543 1583.775 0.01618497 3.079596e-03
## 11 0.19075145 10.809249 51.19653 1582.803 0.01734104 3.434129e-03
## 12 0.21965318 11.780347 51.16763 1581.832 0.01830443 4.343691e-03
## 13 0.22543353 12.774566 51.16185 1580.838 0.01734104 4.373334e-03
## 14 0.23121387 13.768786 51.15607 1579.844 0.01651528 4.431721e-03
## 15 0.25433526 14.745665 51.13295 1578.867 0.01695568 5.037578e-03
## 16 0.27167630 15.728324 51.11561 1577.884 0.01697977 5.148778e-03
## 17 0.28323699 16.716763 51.10405 1576.896 0.01666100 5.338390e-03
## 18 0.32369942 17.676301 51.06358 1575.936 0.01798330 6.710029e-03
## 19 0.32947977 18.670520 51.05780 1574.942 0.01734104 6.751614e-03
##             TPR          FPR
## 1  9.210218e-05 0.0006163095
## 2  4.341051e-04 0.0012295701
## 3  6.282449e-04 0.0018466051
## 4  6.914951e-04 0.0024669117
## 5  8.379175e-04 0.0030835593
## 6  1.235366e-03 0.0036969112
## 7  1.648248e-03 0.0043215875
## 8  1.808813e-03 0.0049462153
## 9  2.910571e-03 0.0055599590
## 10 3.079596e-03 0.0061766962
## 11 3.434129e-03 0.0067860346
## 12 4.343691e-03 0.0073954614
## 13 4.373334e-03 0.0080196953
## 14 4.431721e-03 0.0086441767
## 15 5.037578e-03 0.0092577435
## 16 5.148778e-03 0.0098742742
## 17 5.338390e-03 0.0104949046
## 18 6.710029e-03 0.0110961620
## 19 6.751614e-03 0.0117205441
## 
## $UBCF_cos
##           TP         FP       FN       TN precision     recall        TPR
## 1  0.4682081  0.5317919 50.91908 1593.081 0.4682081 0.02246923 0.02246923
## 2  0.9075145  1.0924855 50.47977 1592.520 0.4537572 0.04047730 0.04047730
## 3  1.2312139  1.7687861 50.15607 1591.844 0.4104046 0.05062487 0.05062487
## 4  1.6184971  2.3815029 49.76879 1591.231 0.4046243 0.05874304 0.05874304
## 5  1.9364162  3.0635838 49.45087 1590.549 0.3872832 0.06709824 0.06709824
## 6  2.2485549  3.7514451 49.13873 1589.861 0.3747592 0.07531121 0.07531121
## 7  2.5144509  4.4855491 48.87283 1589.127 0.3592073 0.08395585 0.08395585
## 8  2.8786127  5.1213873 48.50867 1588.491 0.3598266 0.10179692 0.10179692
## 9  3.2196532  5.7803468 48.16763 1587.832 0.3577392 0.11772699 0.11772699
## 10 3.5144509  6.4855491 47.87283 1587.127 0.3514451 0.12502742 0.12502742
## 11 3.8208092  7.1791908 47.56647 1586.434 0.3473463 0.13501989 0.13501989
## 12 4.0867052  7.9132948 47.30058 1585.699 0.3405588 0.14095333 0.14095333
## 13 4.3815029  8.6184971 47.00578 1584.994 0.3370387 0.14864368 0.14864368
## 14 4.6358382  9.3641618 46.75145 1584.249 0.3311313 0.15450147 0.15450147
## 15 4.8843931 10.1156069 46.50289 1583.497 0.3256262 0.15892277 0.15892277
## 16 5.1156069 10.8843931 46.27168 1582.728 0.3197254 0.16246285 0.16246285
## 17 5.3641618 11.6358382 46.02312 1581.977 0.3155389 0.16929592 0.16929592
## 18 5.5953757 12.4046243 45.79191 1581.208 0.3108542 0.17531440 0.17531440
## 19 5.8092486 13.1907514 45.57803 1580.422 0.3057499 0.17943886 0.17943886
##             FPR
## 1  0.0003301076
## 2  0.0006768676
## 3  0.0010984947
## 4  0.0014787833
## 5  0.0019032269
## 6  0.0023319408
## 7  0.0027896833
## 8  0.0031863671
## 9  0.0035964458
## 10 0.0040355015
## 11 0.0044672530
## 12 0.0049236114
## 13 0.0053621312
## 14 0.0058266676
## 15 0.0062941441
## 16 0.0067726608
## 17 0.0072402567
## 18 0.0077198858
## 19 0.0082096759
## 
## $UBCF_cor
##           TP         FP       FN       TN precision      recall
## 1  0.3410405  0.6589595 51.04624 1592.954 0.3410405 0.007364798
## 2  0.6705202  1.3294798 50.71676 1592.283 0.3352601 0.016236980
## 3  0.9075145  2.0924855 50.47977 1591.520 0.3025048 0.028265067
## 4  1.1213873  2.8786127 50.26590 1590.734 0.2803468 0.031858503
## 5  1.3468208  3.6531792 50.04046 1589.960 0.2693642 0.036326013
## 6  1.5664740  4.4335260 49.82081 1589.179 0.2610790 0.039403048
## 7  1.8092486  5.1907514 49.57803 1588.422 0.2584641 0.044078839
## 8  2.0231214  5.9768786 49.36416 1587.636 0.2528902 0.047738671
## 9  2.2023121  6.7976879 49.18497 1586.815 0.2447013 0.052213434
## 10 2.3641618  7.6358382 49.02312 1585.977 0.2364162 0.054643244
## 11 2.5722543  8.4277457 48.81503 1585.185 0.2338413 0.059489309
## 12 2.7456647  9.2543353 48.64162 1584.358 0.2288054 0.063096533
## 13 2.9132948 10.0867052 48.47399 1583.526 0.2240996 0.067251720
## 14 3.0867052 10.9132948 48.30058 1582.699 0.2204789 0.070006084
## 15 3.2427746 11.7572254 48.14451 1581.855 0.2161850 0.072641740
## 16 3.3930636 12.6069364 47.99422 1581.006 0.2120665 0.074764176
## 17 3.5780347 13.4219653 47.80925 1580.191 0.2104726 0.079241516
## 18 3.7745665 14.2254335 47.61272 1579.387 0.2096981 0.082429580
## 19 3.9248555 15.0751445 47.46243 1578.538 0.2065713 0.085160415
##            TPR          FPR
## 1  0.007364798 0.0004087752
## 2  0.016236980 0.0008253316
## 3  0.028265067 0.0013004986
## 4  0.031858503 0.0017904171
## 5  0.036326013 0.0022725042
## 6  0.039403048 0.0027583873
## 7  0.044078839 0.0032301030
## 8  0.047738671 0.0037199425
## 9  0.052213434 0.0042319761
## 10 0.054643244 0.0047552775
## 11 0.059489309 0.0052483119
## 12 0.063096533 0.0057641150
## 13 0.067251720 0.0062848357
## 14 0.070006084 0.0068009779
## 15 0.072641740 0.0073274611
## 16 0.074764176 0.0078575515
## 17 0.079241516 0.0083661729
## 18 0.082429580 0.0088666933
## 19 0.085160415 0.0093973748
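The columns of these tables are related by simple arithmetic, which can be checked on one row. The sketch below uses the n = 10 row of the IBCF_cos table. The remark about per-user averaging is an inference from the numbers, not something the output states.

```r
# Row n = 10 of the IBCF_cos table (counts averaged over test users).
TP <- 1.5028902
FP <- 8.4971098
FN <- 49.88439
TN <- 1585.116

# TP + FP is the number of recommendations made.
n_recommended <- TP + FP

# The four cells add up to the candidate movies per user:
# 1,664 movies minus the 19 given ratings.
n_candidates <- TP + FP + FN + TN

# Precision from the averaged counts matches the tabulated 0.1502890,
# because every user receives the same number of recommendations.
precision <- TP / (TP + FP)

# Ratios of averaged counts only approximate the tabulated recall (0.0264)
# and FPR (0.0053); the table values appear to be averaged per user.
recall_from_means <- TP / (TP + FN)
fpr_from_means <- FP / (FP + TN)

round(c(n_recommended = n_recommended, n_candidates = n_candidates,
        precision = precision, recall_from_means = recall_from_means,
        fpr_from_means = fpr_from_means), 4)
```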

Model Performance

Both plots show that the best-performing model is the user-based model with cosine similarity. On the ROC curve it yields the largest area under the curve: in the tables above, at 19 recommendations it reaches a TPR of about 0.179 at an FPR of about 0.008, while none of the other three models exceeds a TPR of 0.086. Now the appropriate number of items to recommend needs to be set.
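The ROC comparison can be cross-checked against the tabulated values. The sketch below copies the (FPR, TPR) points for UBCF_cos and IBCF_cos at n = 1, 5, 10, 15, and 19 from the tables above and computes a trapezoidal area over those points only. `partial_auc` is an illustrative helper, and the result is a partial area over a tiny FPR range, not the full AUC.

```r
# (FPR, TPR) points copied from the tables at n = 1, 5, 10, 15, 19.
ubcf_cos <- data.frame(
  FPR = c(0.0003301076, 0.0019032269, 0.0040355015, 0.0062941441, 0.0082096759),
  TPR = c(0.02246923, 0.06709824, 0.12502742, 0.15892277, 0.17943886)
)
ibcf_cos <- data.frame(
  FPR = c(0.0005341964, 0.0027018712, 0.0053075706, 0.0080077536, 0.0102201370),
  TPR = c(0.001878306, 0.010604141, 0.026362514, 0.044549765, 0.055316151)
)

# Trapezoidal area under the ROC points, starting from the origin.
partial_auc <- function(roc) {
  fpr <- c(0, roc$FPR)
  tpr <- c(0, roc$TPR)
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# At each list length, UBCF_cos has both a higher TPR and a lower FPR.
dominates <- all(ubcf_cos$TPR > ibcf_cos$TPR) && all(ubcf_cos$FPR < ibcf_cos$FPR)
dominates

c(UBCF_cos = partial_auc(ubcf_cos), IBCF_cos = partial_auc(ibcf_cos))
```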

Optimizing a Numeric Parameter

IBCF takes into account the k closest items to each item, so k is the parameter to optimize here. In the first evaluation k was left at its default value of 30. Lower values of 1 and 5 are now considered as well as higher values of 50, 60, and 70, and the default value of 30 is kept as a control.
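The role of k can be illustrated with a toy item-item similarity matrix in which only the k most similar items are kept for each item. The similarity values are hypothetical and `keep_top_k` is an illustrative helper; recommenderlab's IBCF implementation may differ in detail.

```r
# Hypothetical item-item similarities (symmetric, zero diagonal).
sim <- matrix(c(0.0, 0.9, 0.2, 0.4,
                0.9, 0.0, 0.5, 0.1,
                0.2, 0.5, 0.0, 0.7,
                0.4, 0.1, 0.7, 0.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("item", 1:4), paste0("item", 1:4)))

# For each item (row), keep the k largest similarities and zero out the rest.
keep_top_k <- function(sim, k) {
  pruned <- sim
  for (i in seq_len(nrow(sim))) {
    ranking <- order(sim[i, ], decreasing = TRUE)
    dropped <- ranking[seq_along(ranking) > k]
    pruned[i, dropped] <- 0
  }
  pruned
}

pruned_1 <- keep_top_k(sim, 1)
pruned_2 <- keep_top_k(sim, 2)

rowSums(pruned_1 > 0)  # neighbours kept per item with k = 1
rowSums(pruned_2 > 0)  # neighbours kept per item with k = 2
```

With k = 1 each item contributes a single neighbour, so only a few items can ever be scored for a given user.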

vector_k <- c(1, 5, 30, seq(50, 70, 10))

Each value of k defines one IBCF model to evaluate, all with cosine as the similarity measure.

models_to_evaluate <- lapply(vector_k, function(k) {
  list(name = "IBCF", param = list(method = "cosine", k = k))
})
names(models_to_evaluate) <- paste0("IBCF_k_", vector_k)

eval_results_2 <- evaluate(x = eval_sets, method = models_to_evaluate, n = 1:19)

## IBCF run fold/sample [model time/prediction time]
##   1  [81.754sec/0.12sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [81.54sec/0.135sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [81.313sec/0.333sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [99.72sec/0.251sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [99.522sec/0.417sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [99.361sec/0.303sec]

From both plots below, it is evident that the area under the curve decreases as k increases, except for k = 1. Although k = 1 is a good candidate, it can never reach a high TPR: with this value, IBCF keeps only the single most similar item for each movie and therefore recommends very few items, all close to the movies the user has already rated.

The best value of k depends on what has to be achieved. According to the second graph, k has to be 1 to achieve the highest recall. If precision is more important, k has to be 5.

plot(eval_results_2, "prec/rec", annotate = TRUE, legend = "bottomright")
title("Precision-recall")

plot(eval_results_2, annotate = 1, legend = "topleft") 
title("ROC curve")

We can conclude that the best-performing model is the user-based model with cosine similarity. For the item-based model, the optimal value of k would have to be very low.