Project 3

Load packages

library(tidyverse)
library(recommenderlab)
library(recosystem)

#load MovieLense data
data("MovieLense")

I will implement ALS and SVD on the MovieLense matrix and compare the results to last week’s optimized user-based models to find out which is a better fit.

ALS

The implicit-feedback version of ALS in recommenderlab works on a binary matrix (1s and 0s), so here we binarize the data and take an image of it:

#binarize the data
binarized_matrix <- binarize(MovieLense, minRating=3)

image(binarized_matrix, main = "Binarized Matrix")

image(binarized_matrix[1:50, 1:50])

It’s one shade instead of different shades, because every remaining rating is now just a 1. This is also a sparser matrix, since ratings of 1 and 2 have become 0s.
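To make the binarization step concrete, here is a minimal base R sketch on a made-up 3 x 4 ratings matrix. The toy matrix and variable names are assumptions for illustration; it mimics what a threshold of 3 does rather than calling recommenderlab.

```r
# Toy ratings (hypothetical): rows are users, columns are movies, NA = not rated
ratings <- matrix(c(5,  3, NA, 1,
                    4, NA,  2, 3,
                    NA, 1,  5, 4),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("u", 1:3), paste0("m", 1:4)))

min_rating <- 3

# 1 where a rating exists and is at least min_rating, 0 everywhere else
binary <- (!is.na(ratings) & ratings >= min_rating) * 1L
binary

# Positive entries left per user after thresholding
positives_per_user <- rowSums(binary)
positives_per_user
```

Note that a rating of 1 or 2 and a missing rating both end up as 0, which is where the extra sparsity comes from.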

Limitations and pitfalls

It makes more sense to do ALS with a natively binary ratings matrix (for example, if someone has simply watched or not watched a movie or listened/not listened to a song). Already, I’ve chosen to include ratings of 3, 4, and 5 as 1s; 3 is arguably not a good rating, so, even if this has better results, it might be a worse system from a user perspective.

There’s also a possible (highly probable) failure with this system: it might recommend something the user has already watched and didn’t like, because a rating of 1 or 2 shows up in the matrix as a 0, the same as a movie they never watched. The success metrics may also look inflated compared to SVD, UBCF, or IBCF because the model only has to predict a 0 or 1 as opposed to the exact rating - it’s easier to get it right if there are only two choices.
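A small base R sketch of this pitfall, using a made-up two-user matrix (all names here are hypothetical). It shows that the binary matrix alone cannot tell a disliked movie from an unseen one, and that going back to the original ratings is one way to filter out what the user has already seen.

```r
# Toy ratings (hypothetical): u1 loved m1, disliked m2, never saw m3
ratings <- matrix(c(5,  1, NA,
                    NA, 4,  2),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("u1", "u2"), c("m1", "m2", "m3")))

binary <- (!is.na(ratings) & ratings >= 3) * 1L

# From the binary matrix, both m2 (disliked) and m3 (unseen) look recommendable
candidates_binary <- colnames(binary)[binary["u1", ] == 0]

# From the original ratings, only the truly unseen movie remains
candidates_unseen <- colnames(ratings)[is.na(ratings["u1", ])]

candidates_binary
candidates_unseen
```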

Inspecting the binary matrix

#how many ratings in each row?

rating_tibble <- as_tibble(as(binarized_matrix, "matrix"))


rating_counts <- rowSums(rating_tibble != 0)

rating_counts

##   [1] 218  56  27  22 111 174 368  50  21 184 157  50 419  90  63 128  20 271
##  [19]  18  35  91  95 135  67  77  79  20  69  27  38  30  33  23  16  16  16
##  [37]  49  87  20  21  48 160 195 129  42  25  20  58 111  16  20  55  25  54
##  [55]  15 166  92 125 336 206  12 176  71 174  76  34  29  21  59 118  34 124
##  [73]  52  37  56  66  57  17  47  26  45 121 119  65 260  19 195  18  67 279
##  [91]  85 306  15 343 219  52  56  22 111  42  45 121  27  61  16  62  13  29
## [109] 182 103  19  41  42  44  80  93  78  69 163  21  57  56  49  21 144  34
## [127]  19 161  13 322  28  22  20  20  48  33  42  50  21  17  83  26  18 179
## [145] 224  25  19  56  19  26 278  98  12  48  10  35  42 156  81 109  26  39
## [163]  20  57  28  16  51  48  36  20  23  24  39 129  33  52  99 241  27  53
## [181]  55  27  40 233  47  71  53 100 166  47  26  27  93 212  73  34  90 151
## [199]  24 204 268  13  36  30  15  22 191  29  19 128  27  20 121 121  86 117
## [217]  47  50  21  19 131 260  71 105  25  46  45  15  16 116  20  86 108 362
## [235]  86  98  47  26 141  20  17  19  75 198  17 126  24  46 155 103  71  20
## [253]  90 119  38 195  52  18  44  21  26 128 112 109  36  17 177 219 202 128
## [271] 239  48  19  65  83 433  48  22 343 221  19  18  50  42  30 240  56  62
## [289]  18 126 253 121 268 130 188 128 162 125 240  17 236   8 360  23 179  28
## [307] 103 382  16  18 274 216 227 208  81  53  18 149  20 144 122  48  77  64
## [325] 115 139 217 250  54 145  59 172  24 289  18  90  30  70 239  39  18 159
## [343] 212 164 209 151 153  57  29  48  42  36  16 215  26  21  73  35  26  96
## [361] 108  15 198  15  43  30  54  33  19  68  48  65 213 200  25  29  30 331
## [379] 179 127 105  44  66  22 206  22 233  49 232  25 104  97 391 136  51  47
## [397]  81 154 242  20 111  63  40  33 179 287 197  24 160  19  56  49  42  22
## [415]  22 423 288  10  31  36  54  82  57  32 135  96  26  50 353  54  18  53
## [433]  29  36 295 133 218  29  28  46  18 109  17  21  33  21 120  24  63 507
## [451]  52 154 118 157 176 192 251 153  97  56  15  28  78  49  72  82  36 134
## [469]  43  53  21 251  30 318  15  67  35  87 161  54  53  22  44 127  17 144
## [487] 193 128  86  34  28  48 115  42 192  84 216 123  92 183  73  25 137 232
## [505]  99 205  56  79  14  18  20  20  22 180  28  21  28  66  36  18 124  29
## [523]  99 252  44  46 124  46  37  43  20 248 205  76 207 147 333  76  51  62
## [541] 115 120 180  18 147  52  19 129  24  32 269  75  99 108  51  37  44  19
## [559]  66  87 271  52  25  31  34 116 141  62  62  10  17  15  48  36  16  33
## [577] 180  14  65  38  26  48  25  21  72 140  66 188  32  40  79 293 132  24
## [595]  70  18  35  21  40  75 101  28  40  18  76 222  36 134  10  67  35  21
## [613]  26  27  87  38  63 170  69  98 147 180  40 130  98  13 138  26 110  86
## [631]  13 104  51 114  25  20  50  57 100 107  36 262 194  41 117  31  57 210
## [649]  21 256  15  15 151 128 497  12  22  67 180 106 112  20 138 152 125 230
## [667]  43  37  77  39 105  20  31  37  26  57  38  24  54  37  16 304  56  73
## [685]   6  68  17  23  32  89  28  33 112 157  26  26  86  73 128  21  32  13
## [703]  39  79  96  23 195  74 120  78 200 151  23  38 129 238  76  33  45  28
## [721] 149  38  19  33  22  15 213  23  13  26  62  16  71  55  44  16  30 138
## [739]  26  17  87  22  28  26  40  59 253 103 275  24 138  54  47  30  28  86
## [757] 147 319  29  30  37  13 117 102  19 155  37  46  21  59  55  31 114  64
## [775]  26  83  31  43  36  49  34 140  28  37  25 107  44 214  30 162  22  38
## [793]  51  38 128 303  11 202  23  27  21  63  19 310 205 125 194  22  16  25
## [811]  20  15  18  26 154  22  30  15  23  13  60  19 176   9 127 111  25  74
## [829]  55  92  55  18 173  48 100  41  31  90  43 193  29  21 115  75  22 348
## [847] 104 135  23  48 167  45  29 164  21  20  16  18  36  55  40 160  68 269
## [865]  24   9  90 119  28 215 108  61  11  33  85  20  73 101  25 306 200 126
## [883] 257  35  83 191 140  20 279 101  47 210  53 232  18 223 172  25 113  17
## [901] 111  40 120  42  35  40 143  70  26  46  90  47 110  16  17 271  27  83
## [919] 183  20  85 107  73  77  26  17 101  31  43  43  55 213 102 153  36 126
## [937]  31  81  45  90  20  76 124

#matrix with more than 30 ratings
MovieLense30 <- binarized_matrix[rowCounts(binarized_matrix) > 30, ]

#how many users do we have left?
MovieLense30

## 662 x 1664 rating matrix of class 'binaryRatingMatrix' with 76093 ratings.

ALS factors the user-item matrix into two smaller matrices, one for users and one for items, by alternately holding one fixed and solving for the other. The columns of these matrices are latent factors, which can loosely correspond to categories like romance and comedy (or some combination of them), and together they are used to predict whether a user will like an item.
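To show the alternating idea, here is a minimal base R sketch of plain regularized ALS on a random toy matrix. This is an assumption-laden illustration, not recommenderlab's implementation: it leaves out the confidence weighting used for implicit feedback, and the matrix, k, and lambda are made up.

```r
set.seed(1)
R <- matrix(rbinom(6 * 5, 1, 0.4), nrow = 6)   # toy 6 users x 5 items, 0/1
k <- 2          # number of latent factors
lambda <- 0.1   # regularization strength

U <- matrix(rnorm(nrow(R) * k), ncol = k)      # user factors
V <- matrix(rnorm(ncol(R) * k), ncol = k)      # item factors

# Regularized squared-error objective
loss <- function(U, V) {
  sum((R - U %*% t(V))^2) + lambda * (sum(U^2) + sum(V^2))
}

losses <- loss(U, V)
for (iter in 1:10) {
  # Hold V fixed and solve the ridge regression for U
  U <- R %*% V %*% solve(t(V) %*% V + lambda * diag(k))
  # Hold U fixed and solve the ridge regression for V
  V <- t(R) %*% U %*% solve(t(U) %*% U + lambda * diag(k))
  losses <- c(losses, loss(U, V))
}

round(losses, 3)   # the objective should never go up between iterations
```

Each half-step has a closed-form solution, which is why the loop needs no learning rate.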

#implement ALS as a recommender system
movielense_als <- Recommender(MovieLense30, method = 'ALS') 

set.seed(1122)
scheme <- evaluationScheme(MovieLense30,
                           method     = "split",
                           #train/test split
                           train      = 0.8,
                           #model can see all but 25 ratings for each user
                           given      = -25, 
                           goodRating = 1)
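As a rough illustration of what given = -25 means for a single user, here is a base R sketch. The 40 movie names are hypothetical, and this only mimics the all-but-25 idea; it is not how recommenderlab performs the split internally.

```r
set.seed(1122)

# Hypothetical user with 40 positive entries in the binary matrix
liked <- paste0("movie_", 1:40)

# Withhold 25 at random for testing; the model sees the rest
withheld <- sample(liked, 25)
given    <- setdiff(liked, withheld)

length(given)      # items the model can see for this user
length(withheld)   # items it has to recover
```

Because every entry in the binary matrix is a positive, all 25 withheld items count as hits to find, which would explain why TP + FN comes to 25 in the ALS tables.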

#what does ALS accept as parameters? 
recommenderRegistry$get_entry("ALS", dataType = "binaryRatingMatrix")

## Recommender method: ALS_implicit for binaryRatingMatrix Description:
##   Recommender for implicit data based on latent factors, calculated by
##   alternating least squares algorithm. Reference: Yifan Hu, Yehuda
##   Koren, Chris Volinsky (2008). Collaborative Filtering for Implicit
##   Feedback Datasets, ICDM '08 Proceedings of the 2008 Eighth IEEE
##   International Conference on Data Mining, pages 263-272.
## Parameters:
##   lambda alpha n_factors n_iterations min_item_nr seed
## 1    0.1    10        10           10           1 NULL

ALS accepts lambda, alpha, n_factors, n_iterations, min_item_nr. Here’s what they mean:

alpha - not commonly tuned; a higher alpha means the model trusts observed interactions more.
n_factors - this is k, the number of latent factors (the number of columns in each of the two factor matrices)
n_iterations - number of optimization cycles
lambda - the regularization strength; a larger lambda penalizes larger weights assigned to latent factors. Too high and the model might underfit; too low and it may overfit.
min_item_nr - the minimum interactions an item needs to be included in the algorithm (so, if we set this to 5, movies with fewer than 5 positive reviews will be excluded). Not exactly a tuning parameter.
seed - setting the seed so the result is reproducible (can also be done just before implementing the algorithm)
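To get a feel for lambda and alpha, here is a small base R sketch. It assumes fixed, made-up item factors and solves for one hypothetical user's factors at several lambda values; the confidence formula is the one from the Hu, Koren, and Volinsky paper cited in the registry entry, and I am assuming recommenderlab follows it.

```r
set.seed(42)
n_items <- 20
k <- 3
V <- matrix(rnorm(n_items * k), ncol = k)   # hypothetical fixed item factors
r <- rbinom(n_items, 1, 0.5)                # one user's 0/1 interactions

# Ridge solution for this user's factor vector at a given lambda
solve_user <- function(lambda) {
  solve(t(V) %*% V + lambda * diag(k), t(V) %*% r)
}

lambdas <- c(0.1, 1, 10, 100)
factor_norms <- sapply(lambdas, function(l) sqrt(sum(solve_user(l)^2)))
round(factor_norms, 3)   # larger lambda shrinks the factors toward zero

# Confidence weights from the paper: observed entries count (1 + alpha) times as much
alpha <- 10
confidence <- 1 + alpha * r
table(confidence)
```

The shrinking norms are the reason a very large lambda tends toward underfitting: the factors get too small to express individual taste.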

#get the results for this first iteration
set.seed(1122)
results <- evaluate(scheme, 
                    method = "ALS",
                    type = "topNList", 
                    n = c(5, 10, 20, 40))

## ALS run fold/sample [model time/prediction time]
##   1  [0sec/26.21sec]

#check the results
avg(results)

##            TP        FP       FN       TN        N precision     recall
## [1,] 2.172932  2.827068 22.82707 1554.211 1582.038 0.4345865 0.08691729
## [2,] 3.887218  6.112782 21.11278 1550.925 1582.038 0.3887218 0.15548872
## [3,] 6.556391 13.443609 18.44361 1543.594 1582.038 0.3278195 0.26225564
## [4,] 9.962406 30.037594 15.03759 1527.000 1582.038 0.2490602 0.39849624
##             TPR         FPR  n
## [1,] 0.08691729 0.001828307  5
## [2,] 0.15548872 0.003950513 10
## [3,] 0.26225564 0.008679676 20
## [4,] 0.39849624 0.019380778 40

The true positive rate (the share of withheld positives that were recommended: TP / (TP + FN)) increases with more recommendations. I tried running other accuracy metrics (MAE, RMSE, MSE), but they can’t be calculated for binary data, so we will look at precision (the share of recommendations that were hits: TP / (TP + FP)) and recall (TPR).
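As a quick check on those definitions, this base R snippet recomputes precision and recall from the averaged counts in the first row (n = 5) of the table above. Dividing averaged counts only matches the averaged ratios when the denominators are the same for every user, which appears to be the case here (5 recommendations, 25 withheld positives).

```r
# Averaged confusion counts for n = 5, copied from the default ALS run
tp <- 2.172932
fp <- 2.827068
fn <- 22.82707

precision <- tp / (tp + fp)   # share of the recommendations that were hits
recall    <- tp / (tp + fn)   # share of the withheld positives that were found

round(c(precision = precision, recall = recall), 4)
```

The result should line up with the precision and recall columns of that row, about .4346 and .0869.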

I will try with different n_factors (dimensionality) and lambda to see how that affects the results. These are both factors that can contribute to over- and underfitting.

Modifying n_factors and lambda

Taking n_factors to 30 and lambda to .3

set.seed(1122)
results_2 <- evaluate(scheme, 
                    method = "ALS",
                    type = "topNList", 
                    n = c(5, 10, 20, 40), 
                    parameter = list(n_factors = 30, 
                                     lambda = .3))

## ALS run fold/sample [model time/prediction time]
##   1  [0sec/26.36sec]

avg(results_2)

##             TP        FP       FN       TN        N precision     recall
## [1,]  2.142857  2.857143 22.85714 1554.180 1582.038 0.4285714 0.08571429
## [2,]  3.894737  6.105263 21.10526 1550.932 1582.038 0.3894737 0.15578947
## [3,]  6.827068 13.172932 18.17293 1543.865 1582.038 0.3413534 0.27308271
## [4,] 10.571429 29.428571 14.42857 1527.609 1582.038 0.2642857 0.42285714
##             TPR         FPR  n
## [1,] 0.08571429 0.001850465  5
## [2,] 0.15578947 0.003940337 10
## [3,] 0.27308271 0.008492353 20
## [4,] 0.42285714 0.018973376 40

This model performs slightly better than the previous model (though it starts off a tiny bit worse). I will try with just a higher k and the same lambda, then the larger lambda and the same k.

results_3 <- evaluate(scheme, 
                    method = "ALS",
                    type = "topNList", 
                    n = c(5, 10, 20, 40), 
                    parameter = list(n_factors = 30, 
                                     lambda = .1))

## ALS run fold/sample [model time/prediction time]
##   1  [0sec/26sec]

avg(results_3)

##             TP        FP       FN       TN        N precision     recall
## [1,]  1.887218  3.112782 23.11278 1553.925 1582.038 0.3774436 0.07548872
## [2,]  3.639098  6.360902 21.36090 1550.677 1582.038 0.3639098 0.14556391
## [3,]  6.368421 13.631579 18.63158 1543.406 1582.038 0.3184211 0.25473684
## [4,] 10.105263 29.894737 14.89474 1527.143 1582.038 0.2526316 0.40421053
##             TPR         FPR  n
## [1,] 0.07548872 0.002010114  5
## [2,] 0.14556391 0.004113724 10
## [3,] 0.25473684 0.008802487 20
## [4,] 0.40421053 0.019291622 40

This is worse than the previous version on both precision and recall at every list length, though at 40 recommendations it edges past the default model.

results_4 <- evaluate(scheme, 
                    method = "ALS",
                    type = "topNList", 
                    n = c(5, 10, 20, 40), 
                    parameter = list(n_factors = 10, 
                                     lambda = .3))

## ALS run fold/sample [model time/prediction time]
##   1  [0sec/24.39sec]

avg(results_4)

##             TP        FP       FN       TN        N precision    recall
## [1,]  2.203008  2.796992 22.79699 1554.241 1582.038 0.4406015 0.0881203
## [2,]  3.887218  6.112782 21.11278 1550.925 1582.038 0.3887218 0.1554887
## [3,]  6.511278 13.488722 18.48872 1543.549 1582.038 0.3255639 0.2604511
## [4,] 10.007519 29.992481 14.99248 1527.045 1582.038 0.2501880 0.4003008
##            TPR         FPR  n
## [1,] 0.0881203 0.001805590  5
## [2,] 0.1554887 0.003944082 10
## [3,] 0.2604511 0.008701780 20
## [4,] 0.4003008 0.019340577 40

This is a little more comparable to the initial model. Across the ALS runs so far, precision ranges from about .25 to .44 and recall from about .08 to .42, depending on the list length.

Interestingly, if I use a higher lambda, results start strong and get worse. Why? Perhaps because a higher lambda can make the results more generic, which may work for a small number of recommendations (e.g. most popular), but not for anything that accounts for taste/the long tail.

results_5 <- evaluate(scheme, 
                    method = "ALS",
                    type = "topNList", 
                    n = c(5, 10, 20, 40), 
                    parameter = list(n_factors = 10, 
                                     lambda = .6))

## ALS run fold/sample [model time/prediction time]
##   1  [0sec/24.63sec]

avg(results_5)

##             TP        FP       FN       TN        N precision     recall
## [1,]  2.127820  2.872180 22.87218 1554.165 1582.038 0.4255639 0.08511278
## [2,]  3.759398  6.240602 21.24060 1550.797 1582.038 0.3759398 0.15037594
## [3,]  6.496241 13.503759 18.50376 1543.534 1582.038 0.3248120 0.25984962
## [4,] 10.082707 29.917293 14.91729 1527.120 1582.038 0.2520677 0.40330827
##             TPR         FPR  n
## [1,] 0.08511278 0.001850411  5
## [2,] 0.15037594 0.004024590 10
## [3,] 0.25984962 0.008702120 20
## [4,] 0.40330827 0.019281418 40
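With five runs it gets hard to compare tables by eye, so here is a base R sketch that stacks the precision values printed above into one data frame and picks the best setting at each list length. The numbers are copied from the outputs; the data frame layout and names are my own. Keep in mind that not every run was seeded, so small differences may partly be noise.

```r
# Precision values copied from results, results_2, results_3, results_4, results_5
runs <- data.frame(
  n_factors = rep(c(10, 30, 30, 10, 10), each = 4),
  lambda    = rep(c(0.1, 0.3, 0.1, 0.3, 0.6), each = 4),
  n         = rep(c(5, 10, 20, 40), times = 5),
  precision = c(0.4345865, 0.3887218, 0.3278195, 0.2490602,
                0.4285714, 0.3894737, 0.3413534, 0.2642857,
                0.3774436, 0.3639098, 0.3184211, 0.2526316,
                0.4406015, 0.3887218, 0.3255639, 0.2501880,
                0.4255639, 0.3759398, 0.3248120, 0.2520677)
)

# For each list length, keep the row with the highest precision
best <- do.call(rbind, lapply(split(runs, runs$n), function(d) {
  d[which.max(d$precision), ]
}))
best
```

On these numbers, lambda = .3 comes out on top at every list length, with n_factors = 10 winning at 5 recommendations and n_factors = 30 from 10 onward.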

Testing for optimal dimensionality (k)

factors_to_test <- c(5, 10, 20, 30, 40, 50)

set.seed(1122)
results_by_factors <- lapply(factors_to_test, function(k) {
  evaluate(scheme,
           method = "ALS",
           type   = "topNList",
           n      = c(5, 10, 20, 40),
           parameter = list(n_factors = k, lambda = 0.1, seed = 1))
})

## ALS run fold/sample [model time/prediction time]
##   1  [0sec/24.36sec] 
## ALS run fold/sample [model time/prediction time]
##   1  [0sec/24.53sec] 
## ALS run fold/sample [model time/prediction time]
##   1  [0sec/25.65sec] 
## ALS run fold/sample [model time/prediction time]
##   1  [0sec/26.63sec] 
## ALS run fold/sample [model time/prediction time]
##   1  [0sec/27.24sec] 
## ALS run fold/sample [model time/prediction time]
##   1  [0sec/29.38sec]

#check results
avg(results_by_factors[[1]])

##            TP        FP       FN       TN        N precision     recall
## [1,] 1.789474  3.210526 23.21053 1553.827 1582.038 0.3578947 0.07157895
## [2,] 3.300752  6.699248 21.69925 1550.338 1582.038 0.3300752 0.13203008
## [3,] 5.819549 14.180451 19.18045 1542.857 1582.038 0.2909774 0.23278195
## [4,] 9.165414 30.834586 15.83459 1526.203 1582.038 0.2291353 0.36661654
##             TPR         FPR  n
## [1,] 0.07157895 0.002071376  5
## [2,] 0.13203008 0.004321687 10
## [3,] 0.23278195 0.009146352 20
## [4,] 0.36661654 0.019877820 40

#5 suggestions has the highest precision but the lowest true positive rate

#Pull precision at each k into a plottable data frame


prec_df <- do.call(rbind, lapply(seq_along(factors_to_test), function(i) {
  a <- avg(results_by_factors[[i]])
  data.frame(n_factors = factors_to_test[i],
             n         = a[, "n"],
             precision = a[, "precision"])
}))

#graph results
ggplot(prec_df, aes(x = n, y = precision, color = factor(n_factors), group = factor(n_factors))) +
  geom_line() + geom_point() +
  labs(title = "ALS Precision by Dimensionality and Number of Suggestions",
       x = "Number of Suggestions", y = "Precision", 
       color = "Dimensionality (n_factors)") +
  theme_minimal()

Precision goes down with the number of suggestions (perhaps the first recommendations tend to be the most confident/on target). It’s generally highest for a dimensionality of 20 or 30 (however, it’s lowest for 20 with only 5 suggestions, and the lead switches between 20 and 30 as the number of suggestions increases).

Graphing true positive rate/recall:

tpr_df <- do.call(rbind, lapply(seq_along(factors_to_test), function(i) {
  a <- avg(results_by_factors[[i]])
  data.frame(n_factors = factors_to_test[i],
             n         = a[, "n"],
             TPR       = a[, "TPR"])
}))

ggplot(tpr_df, aes(x = n, y = TPR, color = factor(n_factors), group = factor(n_factors))) +
  geom_line() + geom_point() +
  labs(title = "ALS TPR (recall) by n_factors (dimensionality)\nand List Length (suggestions)",
       x = "Number of Suggestions", y = "TPR (recall)", color = "Dimensionality") +
  theme_minimal()

Recall is generally highest (at higher volumes) for dimensionalities of 20 and 30 as well. 50 is too large, and perhaps overfitted.

SVD

Creating the real ratings matrix, keeping users with more than 30 ratings. This is not exactly equivalent to the ALS matrix (the ALS matrix has fewer positive entries, since anything rated 1 or 2 was converted to a zero before filtering), so this creates issues with comparing the two. This may be evened out by the fact that ALS only has to predict a 0 or 1, whereas SVD must predict a score.

MovieLense30_nb <- MovieLense[rowCounts(MovieLense) > 30, ]

Unlike ALS, SVD requires imputation. Choosing an imputation type:

Mean imputation is the standard for SVD in the recommenderlab package. According to the registry entry, it uses column-mean imputation, so each missing rating is filled in with that item’s mean. Recommenderlab does this automatically.

There’s also SVDF, a more advanced and computationally expensive method, that does not require prior imputation.

Similar to ALS, SVD creates three (not two) smaller matrices that account for latent factors in users and items. SVD does not take a long time to run in recommenderlab (it actually takes much less time than ALS).
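Here is a minimal base R sketch of the idea on a made-up 4 x 4 ratings matrix: fill the gaps with column means, decompose, and keep only the first k factors. It uses base R's svd() and skips normalization, so it is an illustration of the technique rather than what recommenderlab runs internally.

```r
# Toy ratings (hypothetical); NA = not rated
ratings <- matrix(c(5,  3, NA, 1,
                    4, NA,  2, 1,
                    NA, 1,  5, 4,
                    1,  1, NA, 5),
                  nrow = 4, byrow = TRUE)

# Column-mean imputation: each missing entry gets its item's mean rating
filled <- apply(ratings, 2, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})

# Full decomposition into three matrices: U, the singular values d, and V
s <- svd(filled)
full <- s$u %*% diag(s$d) %*% t(s$v)

# Rank-k approximation: keep only the k strongest latent factors
k <- 2
approx_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])

round(approx_k, 2)                    # predicted scores for every user/item pair
sqrt(sum((filled - approx_k)^2))      # error from dropping the weaker factors
```

The cells that were NA in the original now hold predicted scores, which is what gets ranked to build a top-N list.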

Let’s look at the different parameters:

#checking parameters
recommenderRegistry$get_entry("SVD", dataType = "realRatingMatrix")

## Recommender method: SVD for realRatingMatrix Description: Recommender
##   based on SVD approximation with column-mean imputation. Reference: NA
## Parameters:
##    k maxiter normalize
## 1 10     100  "center"

Variables

k - dimensionality, as with ALS
maxiter - maximum number of iterations for the decomposition
normalize - how each user’s ratings are normalized before the decomposition. "center" (the default) subtracts the user’s mean rating; you can also use Z-score, which additionally divides by the user’s standard deviation. This is separate from the column-mean imputation that fills in the missing ratings.

I will go with centered, since it seems like the most neutral option.
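For reference, a tiny base R sketch of what the two normalization options do to one user's ratings. The ratings vector is made up, and I am assuming normalization is applied per user (row).

```r
# One hypothetical user's ratings; NA = not rated
user_ratings <- c(5, 3, 4, NA, 2)

# Centering: subtract the user's mean so ratings are relative to their own average
centered <- user_ratings - mean(user_ratings, na.rm = TRUE)

# Z-score: also divide by the user's standard deviation
z_scored <- centered / sd(user_ratings, na.rm = TRUE)

centered
round(z_scored, 3)
```

Centering keeps the original rating scale, which is part of why it reads as the more neutral choice.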

Create a recommender:

svd_model <- Recommender(MovieLense30_nb, method = "SVD")

Test the default model’s results:

set.seed(1122)
scheme_svd <- evaluationScheme(MovieLense30_nb,
                           method     = "split",
                           train      = 0.8,
                           given      = -25, 
                           goodRating = 4)

#get the results for the default SVD model
#(stored under its own name so it doesn't overwrite the ALS results_5)
set.seed(1122)
results_svd <- evaluate(scheme_svd, 
                    method = "SVD",
                    type = "topNList", 
                    n = c(5, 10, 20, 40))

## SVD run fold/sample [model time/prediction time]
##   1  [0.08sec/0.05sec]

#check the results
avg(results_svd)

##              TP        FP       FN       TN        N   precision       recall
## [1,] 0.03424658  4.965753 14.00000 1537.479 1556.479 0.006849315 0.0009728605
## [2,] 0.05479452  9.945205 13.97945 1532.500 1556.479 0.005479452 0.0020995070
## [3,] 0.13013699 19.869863 13.90411 1522.575 1556.479 0.006506849 0.0059093356
## [4,] 0.79452055 39.205479 13.23973 1503.240 1556.479 0.019863014 0.0487307809
##               TPR         FPR  n
## [1,] 0.0009728605 0.003226905  5
## [2,] 0.0020995070 0.006463861 10
## [3,] 0.0059093356 0.012916534 20
## [4,] 0.0487307809 0.025473557 40

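Precision and TPR are much lower than ALS. Maybe this is expected because we are trying to predict exact ratings, not just a 1 or a 0. The two are also scored against different targets: a hit for ALS is any rating of 3 or higher, while a hit here is a rating of 4 or higher (goodRating = 4). We will look at different dimensionalities, starting with 20.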

results_6 <- evaluate(scheme_svd, 
                    method = "SVD",
                    type = "topNList", 
                    n = c(5, 10, 20, 40), 
                    parameter = list(k = 20))

## SVD run fold/sample [model time/prediction time]
##   1  [0.13sec/0.1sec]

avg(results_6)

##              TP        FP       FN       TN        N   precision      recall
## [1,] 0.02739726  4.972603 14.00685 1537.473 1556.479 0.005479452 0.001300753
## [2,] 0.06164384  9.938356 13.97260 1532.507 1556.479 0.006164384 0.004154536
## [3,] 0.17123288 19.828767 13.86301 1522.616 1556.479 0.008561644 0.010413355
## [4,] 0.84931507 39.150685 13.18493 1503.295 1556.479 0.021232877 0.050421223
##              TPR         FPR  n
## [1,] 0.001300753 0.003231646  5
## [2,] 0.004154536 0.006460176 10
## [3,] 0.010413355 0.012890114 20
## [4,] 0.050421223 0.025435820 40

Recall is slightly better at every list length, and precision is slightly better everywhere except at 5 recommendations. Let’s check systematically:

k_to_test <- c(5, 10, 20, 30, 40, 50)

set.seed(1122)
results_by_k <- lapply(k_to_test, function(k) {
  evaluate(scheme_svd,
           method = "SVD",
           type   = "topNList",
           n      = c(5, 10, 20, 40),
           parameter = list(k = k))
})

## SVD run fold/sample [model time/prediction time]
##   1  [0.24sec/0.06sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.24sec/0.06sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.14sec/0.22sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.2sec/0.07sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.25sec/0.06sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.31sec/0.07sec]

# check results
avg(results_by_k[[1]])

##              TP        FP       FN       TN        N   precision       recall
## [1,] 0.02054795  4.979452 14.01370 1537.466 1556.479 0.004109589 0.0005291485
## [2,] 0.03424658  9.965753 14.00000 1532.479 1556.479 0.003424658 0.0014317807
## [3,] 0.13698630 19.863014 13.89726 1522.582 1556.479 0.006849315 0.0061330341
## [4,] 0.70547945 39.294521 13.32877 1503.151 1556.479 0.017636986 0.0468141330
##               TPR         FPR  n
## [1,] 0.0005291485 0.003236470  5
## [2,] 0.0014317807 0.006478552 10
## [3,] 0.0061330341 0.012912377 20
## [4,] 0.0468141330 0.025535730 40

# Precision data frame
prec_df_svd <- do.call(rbind, lapply(seq_along(k_to_test), function(i) {
  a <- avg(results_by_k[[i]])
  data.frame(k         = k_to_test[i],
             n         = a[, "n"],
             precision = a[, "precision"])
}))

ggplot(prec_df_svd, aes(x = n, y = precision, color = factor(k), group = factor(k))) +
  geom_line() + geom_point() +
  labs(title = "SVD Precision by Dimensionality (k) and Number of Suggestions",
       x = "Number of Suggestions", y = "Precision",
       color = "Dimensionality (k)") +
  theme_minimal()

# TPR data frame
tpr_df_svd <- do.call(rbind, lapply(seq_along(k_to_test), function(i) {
  a <- avg(results_by_k[[i]])
  data.frame(k   = k_to_test[i],
             n   = a[, "n"],
             TPR = a[, "TPR"])
}))

ggplot(tpr_df_svd, aes(x = n, y = TPR, color = factor(k), group = factor(k))) +
  geom_line() + geom_point() +
  labs(title = "SVD TPR (recall) by Dimensionality (k) and Number of Suggestions",
       x = "Number of Suggestions", y = "TPR (recall)",
       color = "Dimensionality (k)") +
  theme_minimal()

In general, a higher k means a higher recall or true positive rate. This is not as true for precision - for a smaller number of suggestions, higher values of k tend to do worse; also, larger dimensionalities drop off after about 30 suggestions. This could be due to larger dimensionalities not capturing the long tail as well.

It’s worth noting that the curve is different from ALS’s - SVD’s metrics stay really low until about 20 suggestions, then increase dramatically. ALS’s recall increases sharply until about 20, then the returns diminish, and its precision declines throughout. Also, ALS performs far better than SVD in terms of precision and recall. This, again, may be due to exact ratings vs. just having to capture a 0 or 1.

Comparing to UBCF

UBCF was the top-performing model for project 2. Let’s see how it compares to ALS and SVD.

set.seed(1122)
scheme_ubcf <- evaluationScheme(MovieLense30_nb,
                           method     = "split",
                           train      = 0.8,
                           given      = -25, 
                           goodRating = 4)

#get the results for this first iteration
set.seed(1122)
results_7 <- evaluate(scheme_ubcf, 
                    method = "UBCF",
                    type = "topNList", 
                    n = c(5, 10, 20, 40))

## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.81sec]

avg(results_7)

##             TP        FP       FN       TN        N  precision     recall
## [1,] 0.2054795  4.794521 13.82877 1537.651 1556.479 0.04109589 0.01576680
## [2,] 0.3424658  9.657534 13.69178 1532.788 1556.479 0.03424658 0.02471369
## [3,] 0.6369863 19.363014 13.39726 1523.082 1556.479 0.03184932 0.03914411
## [4,] 1.2191781 38.780822 12.81507 1503.664 1556.479 0.03047945 0.07583986
##             TPR         FPR  n
## [1,] 0.01576680 0.003113887  5
## [2,] 0.02471369 0.006273563 10
## [3,] 0.03914411 0.012572364 20
## [4,] 0.07583986 0.025183952 40

#with more optimized parameters

results_8 <- evaluate(scheme_ubcf, 
                    method = "UBCF",
                    type = "topNList", 
                    n = c(5, 10, 20, 40),
                    parameter = list(nn = 60))

## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.71sec]

avg(results_8)

##              TP        FP       FN       TN        N   precision      recall
## [1,] 0.04794521  4.952055 13.98630 1537.493 1556.479 0.009589041 0.007307777
## [2,] 0.12328767  9.876712 13.91096 1532.568 1556.479 0.012328767 0.013663760
## [3,] 0.28767123 19.712329 13.74658 1522.733 1556.479 0.014383562 0.025438753
## [4,] 0.76712329 39.232877 13.26712 1503.212 1556.479 0.019178082 0.052399706
##              TPR         FPR  n
## [1,] 0.007307777 0.003221674  5
## [2,] 0.013663760 0.006423368 10
## [3,] 0.025438753 0.012816554 20
## [4,] 0.052399706 0.025496313 40

UBCF does better than SVD, though the optimized model comes close to SVD and does worse in some places. It’s worth noting that the precision and recall are actually worse for the optimized model (I’d previously used RMSE and MAE) than for the default settings.

SVDF

SVDF (Funk SVD) uses stochastic gradient descent (an optimization algorithm) to factor the rating matrix into two matrices, one for users and one for items. This is supposed to minimize error, with a regularization term to keep the algorithm from overfitting. Imputation is not required because SGD only trains on the ratings that exist, though recommenderlab still centers the ratings by default (normalize = "center").

It takes much longer to run than SVD despite stochastic gradient descent’s efficiency, so we will only try it once with the default settings.
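To show why no imputation is needed, here is a minimal base R sketch of Funk-style SGD on a made-up 4 x 4 matrix. It only ever touches the observed ratings. The gamma and lambda values borrow recommenderlab's defaults, but the loop itself, the random initialization, and the toy data are my own assumptions, not the package's code.

```r
set.seed(7)
ratings <- matrix(c(5,  3, NA, 1,
                    4, NA,  2, 1,
                    NA, 1,  5, 4,
                    1,  1, NA, 5),
                  nrow = 4, byrow = TRUE)

obs <- which(!is.na(ratings), arr.ind = TRUE)   # (row, col) of observed ratings only
k <- 2; gamma <- 0.015; lambda <- 0.001

P <- matrix(rnorm(nrow(ratings) * k, sd = 0.1), ncol = k)   # user factors
Q <- matrix(rnorm(ncol(ratings) * k, sd = 0.1), ncol = k)   # item factors

# RMSE on the observed ratings
rmse <- function(P, Q) {
  pred <- P %*% t(Q)
  sqrt(mean((ratings[obs] - pred[obs])^2))
}

rmse_start <- rmse(P, Q)
for (epoch in 1:200) {
  for (i in sample(nrow(obs))) {
    u <- obs[i, 1]; m <- obs[i, 2]
    err <- ratings[u, m] - sum(P[u, ] * Q[m, ])
    p_old <- P[u, ]
    # Nudge both factor vectors along the error gradient, with a small penalty
    P[u, ] <- P[u, ] + gamma * (err * Q[m, ] - lambda * P[u, ])
    Q[m, ] <- Q[m, ] + gamma * (err * p_old - lambda * Q[m, ])
  }
}
rmse_end <- rmse(P, Q)

c(start = rmse_start, end = rmse_end)
```

One update per observed rating per epoch is also a hint as to why this gets slow on a matrix with tens of thousands of ratings.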

#check the params
recommenderRegistry$get_entry("SVDF", dataType = "realRatingMatrix")

## Recommender method: SVDF for realRatingMatrix Description: Recommender
##   based on Funk SVD with gradient descend
##   (https://sifter.org/~simon/journal/20061211.html). Reference: NA
## Parameters:
##    k gamma lambda min_epochs max_epochs min_improvement normalize verbose
## 1 10 0.015  0.001         50        200           1e-06  "center"   FALSE

set.seed(1122)
results_9 <- evaluate(scheme_svd, 
                    method = "SVDF",
                    type = "topNList", 
                    n = c(5, 10, 20, 40))

## SVDF run fold/sample [model time/prediction time]
##   1  [42.23sec/10.54sec]

#check the results
avg(results_9)

##             TP        FP       FN       TN        N  precision     recall
## [1,] 0.6780822  4.321918 13.35616 1538.123 1556.479 0.13561644 0.04524586
## [2,] 1.1438356  8.856164 12.89041 1533.589 1556.479 0.11438356 0.08083067
## [3,] 1.9520548 18.047945 12.08219 1524.397 1556.479 0.09760274 0.13833136
## [4,] 3.0068493 36.993151 11.02740 1505.452 1556.479 0.07517123 0.20528458
##             TPR         FPR  n
## [1,] 0.04524586 0.002791825  5
## [2,] 0.08083067 0.005727922 10
## [3,] 0.13833136 0.011685514 20
## [4,] 0.20528458 0.023974498 40

It does much better than SVD with no parameter adjustments. Recall is about half that of the best version of ALS, and precision is roughly a third (like ALS, precision declines with more suggestions while recall increases). Compared to UBCF, precision for 40 recommendations is .075 and recall is .21; UBCF’s optimized model shows .052 for recall and .019 for precision, and its default model shows .076 and .030. Without optimizing anything, this model beats UBCF.
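To put the models side by side, this base R sketch collects the n = 40 rows from the outputs above and ranks them by precision. The numbers are copied from the tables; the data frame and labels are my own, and the ALS row is scored on a different target (3 or higher rather than 4 or higher), so it is not a like-for-like comparison.

```r
# Precision and recall at 40 recommendations, copied from the outputs above
at_40 <- data.frame(
  model     = c("ALS (n_factors = 30, lambda = .3)", "SVDF (default)",
                "UBCF (default)", "UBCF (nn = 60)",
                "SVD (k = 20)", "SVD (default)"),
  precision = c(0.2642857, 0.07517123, 0.03047945,
                0.019178082, 0.021232877, 0.019863014),
  recall    = c(0.42285714, 0.20528458, 0.07583986,
                0.052399706, 0.050421223, 0.0487307809),
  stringsAsFactors = FALSE
)

# Highest precision first
ranked <- at_40[order(-at_40$precision), ]
ranked
```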

Conclusion and recommendations

ALS has better metrics for this data set, but I don’t recommend it. I would recommend it for a data set with implicit ratings (whether someone watched or didn’t watch something), but not this data set, which I converted to a binary matrix. It may have the best metrics because we are measuring slightly different things - whether the rating was 3 or higher (it’s now a 1 in the binary matrix) vs. what exact score the model thinks the user would give something. With this data set, it also runs the risk of recommending something the user has already seen and does not like. It’s probable that someone saw something but didn’t like it in a genre they enjoy.

At 40 recommendations, SVD narrowly beats UBCF’s more optimized model in precision (.020 vs. .019) and is very similar for recall (.049 vs. .052), though UBCF is ahead at shorter list lengths. Without even changing the parameters, SVDF has a .2 recall and .075 precision at 40 recommendations. The drawback is the amount of time it takes to run the model. However, a 20% recall might be worth it. SVDF and SVD may fail worse than other models with a cold start - for SVD in particular, imputation means many, many imputed ratings on not a lot of data.

Sources

Michael Hahsler, recommenderlab: An R Framework for Developing and Testing Recommendation Algorithms. May 2022. https://arxiv.org/abs/2205.12371

Claude Sonnet, 4.6. [Large language model] Accessed June 2026. Claude.ai (to make the k vs TPR and precision graphs)

Sam Barbaro

2026-06-17
