Load packages
library(tidyverse)
library(recommenderlab)
library(recosystem)
#load MovieLense data
data("MovieLense")
I will implement ALS and SVD on the MovieLense matrix and compare the results to last week’s optimized user-based models to find out which is a better fit.
The ALS variant used here (recommenderlab's implicit-feedback ALS for binary data) needs a binary matrix (1s and 0s), so here we binarize the data and take an image of it:
#binarize the data
binarized_matrix <- binarize(MovieLense, minRating=3)
image(binarized_matrix, main = "Binarized Matrix")
image(binarized_matrix[1:50, 1:50])
It’s one shade instead of different shades. This is also a sparser matrix, since we’ve lost the ratings below 3.
It makes more sense to do ALS with a natively binary ratings matrix (for example, if someone has simply watched or not watched a movie or listened/not listened to a song). Already, I’ve chosen to include ratings of 3, 4, and 5 as 1s; 3 is arguably not a good rating, so, even if this has better results, it might be a worse system from a user perspective.
There’s also a possible (highly probable) failure with this system: it might recommend something the user has already watched and didn’t like, because a movie they rated 1 or 2 shows up in the matrix as a 0 and may still line up with their preferences. The success metrics may also look inflated compared to SVD, UBCF, and IBCF because the model only has to predict a 0 or 1 as opposed to the exact rating. It’s easier to get it right if there are only two choices.
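To make that failure mode concrete, here is a toy sketch in base R. The ratings and movie names are made up for illustration, and the threshold is applied by hand rather than with binarize(). After thresholding at 3, a movie the user rated 1 or 2 looks exactly like one they never rated.

```r
# Toy ratings for one user: NA means "not rated" (hypothetical data)
ratings <- c(MovieA = 5, MovieB = 2, MovieC = NA, MovieD = 3, MovieE = 1)

# Hand-rolled threshold at 3, mimicking the idea behind minRating = 3
binary <- !is.na(ratings) & ratings >= 3

# Items that look like "no interaction" after binarization
zeros <- names(binary)[!binary]

# Of those, the ones the user actually watched and disliked
watched_and_disliked <- names(ratings)[!is.na(ratings) & ratings < 3]

print(binary)
print(zeros)
print(watched_and_disliked)
```

Every item in watched_and_disliked is also in zeros, so a model trained on the binary matrix has no way to tell it apart from an unseen movie.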
#how many ratings in each row?
rating_tibble <- as_tibble(as(binarized_matrix, "matrix"))
rating_counts <- rowSums(rating_tibble != 0)
rating_counts
## [1] 218 56 27 22 111 174 368 50 21 184 157 50 419 90 63 128 20 271
## [19] 18 35 91 95 135 67 77 79 20 69 27 38 30 33 23 16 16 16
## [37] 49 87 20 21 48 160 195 129 42 25 20 58 111 16 20 55 25 54
## [55] 15 166 92 125 336 206 12 176 71 174 76 34 29 21 59 118 34 124
## [73] 52 37 56 66 57 17 47 26 45 121 119 65 260 19 195 18 67 279
## [91] 85 306 15 343 219 52 56 22 111 42 45 121 27 61 16 62 13 29
## [109] 182 103 19 41 42 44 80 93 78 69 163 21 57 56 49 21 144 34
## [127] 19 161 13 322 28 22 20 20 48 33 42 50 21 17 83 26 18 179
## [145] 224 25 19 56 19 26 278 98 12 48 10 35 42 156 81 109 26 39
## [163] 20 57 28 16 51 48 36 20 23 24 39 129 33 52 99 241 27 53
## [181] 55 27 40 233 47 71 53 100 166 47 26 27 93 212 73 34 90 151
## [199] 24 204 268 13 36 30 15 22 191 29 19 128 27 20 121 121 86 117
## [217] 47 50 21 19 131 260 71 105 25 46 45 15 16 116 20 86 108 362
## [235] 86 98 47 26 141 20 17 19 75 198 17 126 24 46 155 103 71 20
## [253] 90 119 38 195 52 18 44 21 26 128 112 109 36 17 177 219 202 128
## [271] 239 48 19 65 83 433 48 22 343 221 19 18 50 42 30 240 56 62
## [289] 18 126 253 121 268 130 188 128 162 125 240 17 236 8 360 23 179 28
## [307] 103 382 16 18 274 216 227 208 81 53 18 149 20 144 122 48 77 64
## [325] 115 139 217 250 54 145 59 172 24 289 18 90 30 70 239 39 18 159
## [343] 212 164 209 151 153 57 29 48 42 36 16 215 26 21 73 35 26 96
## [361] 108 15 198 15 43 30 54 33 19 68 48 65 213 200 25 29 30 331
## [379] 179 127 105 44 66 22 206 22 233 49 232 25 104 97 391 136 51 47
## [397] 81 154 242 20 111 63 40 33 179 287 197 24 160 19 56 49 42 22
## [415] 22 423 288 10 31 36 54 82 57 32 135 96 26 50 353 54 18 53
## [433] 29 36 295 133 218 29 28 46 18 109 17 21 33 21 120 24 63 507
## [451] 52 154 118 157 176 192 251 153 97 56 15 28 78 49 72 82 36 134
## [469] 43 53 21 251 30 318 15 67 35 87 161 54 53 22 44 127 17 144
## [487] 193 128 86 34 28 48 115 42 192 84 216 123 92 183 73 25 137 232
## [505] 99 205 56 79 14 18 20 20 22 180 28 21 28 66 36 18 124 29
## [523] 99 252 44 46 124 46 37 43 20 248 205 76 207 147 333 76 51 62
## [541] 115 120 180 18 147 52 19 129 24 32 269 75 99 108 51 37 44 19
## [559] 66 87 271 52 25 31 34 116 141 62 62 10 17 15 48 36 16 33
## [577] 180 14 65 38 26 48 25 21 72 140 66 188 32 40 79 293 132 24
## [595] 70 18 35 21 40 75 101 28 40 18 76 222 36 134 10 67 35 21
## [613] 26 27 87 38 63 170 69 98 147 180 40 130 98 13 138 26 110 86
## [631] 13 104 51 114 25 20 50 57 100 107 36 262 194 41 117 31 57 210
## [649] 21 256 15 15 151 128 497 12 22 67 180 106 112 20 138 152 125 230
## [667] 43 37 77 39 105 20 31 37 26 57 38 24 54 37 16 304 56 73
## [685] 6 68 17 23 32 89 28 33 112 157 26 26 86 73 128 21 32 13
## [703] 39 79 96 23 195 74 120 78 200 151 23 38 129 238 76 33 45 28
## [721] 149 38 19 33 22 15 213 23 13 26 62 16 71 55 44 16 30 138
## [739] 26 17 87 22 28 26 40 59 253 103 275 24 138 54 47 30 28 86
## [757] 147 319 29 30 37 13 117 102 19 155 37 46 21 59 55 31 114 64
## [775] 26 83 31 43 36 49 34 140 28 37 25 107 44 214 30 162 22 38
## [793] 51 38 128 303 11 202 23 27 21 63 19 310 205 125 194 22 16 25
## [811] 20 15 18 26 154 22 30 15 23 13 60 19 176 9 127 111 25 74
## [829] 55 92 55 18 173 48 100 41 31 90 43 193 29 21 115 75 22 348
## [847] 104 135 23 48 167 45 29 164 21 20 16 18 36 55 40 160 68 269
## [865] 24 9 90 119 28 215 108 61 11 33 85 20 73 101 25 306 200 126
## [883] 257 35 83 191 140 20 279 101 47 210 53 232 18 223 172 25 113 17
## [901] 111 40 120 42 35 40 143 70 26 46 90 47 110 16 17 271 27 83
## [919] 183 20 85 107 73 77 26 17 101 31 43 43 55 213 102 153 36 126
## [937] 31 81 45 90 20 76 124
#matrix with more than 30 ratings
MovieLense30 <- binarized_matrix[rowCounts(binarized_matrix) > 30, ]
#how many users do we have left?
MovieLense30
## 662 x 1664 rating matrix of class 'binaryRatingMatrix' with 76093 ratings.
ALS takes a binary matrix and decomposes it into two smaller matrices, one for users and one for items, by alternating between them over several iterations. The columns of these matrices are latent factors, which can loosely correspond to categories like romance and comedy (or some combination of them), and together they are used to predict whether a user will like an item.
#implement ALS as a recommender system
movielense_als <- Recommender(MovieLense30, method = 'ALS')
set.seed(1122)
scheme <- evaluationScheme(MovieLense30,
method = "split",
#train/test split
train = 0.8,
#model can see all but 25 ratings for each user
given = -25,
goodRating = 1)
#what does ALS accept as parameters?
recommenderRegistry$get_entry("ALS", dataType = "binaryRatingMatrix")
## Recommender method: ALS_implicit for binaryRatingMatrix Description:
## Recommender for implicit data based on latent factors, calculated by
## alternating least squares algorithm. Reference: Yifan Hu, Yehuda
## Koren, Chris Volinsky (2008). Collaborative Filtering for Implicit
## Feedback Datasets, ICDM '08 Proceedings of the 2008 Eighth IEEE
## International Conference on Data Mining, pages 263-272.
## Parameters:
## lambda alpha n_factors n_iterations min_item_nr seed
## 1 0.1 10 10 10 1 NULL
ALS accepts lambda, alpha, n_factors, n_iterations, min_item_nr. Here’s what they mean:
alpha - not commonly tuned; a higher alpha means the model trusts observed interactions more.
n_factors - the number of latent factors (often called k). It sets the dimensionality of the two factor matrices: users x k and items x k.
n_iterations - number of optimization cycles
lambda - the regularization strength. A larger lambda penalizes large latent-factor weights more heavily. Too high and the model may underfit; too low and it may overfit.
min_item_nr - the minimum interactions an item needs to be included in the algorithm (so, if we set this to 5, movies with fewer than 5 positive reviews will be excluded). Not exactly a tuning parameter.
seed - setting the seed so the result is reproducible (can also be done just before implementing the algorithm)
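To show how these parameters fit together, here is a minimal base-R sketch of implicit-feedback ALS in the spirit of the Hu, Koren, and Volinsky paper cited in the registry entry. It is an illustration under my own assumptions (a made-up 6 x 5 matrix, hypothetical helper names als_loss and solve_side), not recommenderlab's implementation.

```r
# Minimal implicit-feedback ALS sketch (illustrative; not recommenderlab's code)
set.seed(1)

# Hypothetical 6-user x 5-item binary interaction matrix
P <- matrix(c(1, 1, 0, 0, 0,
              1, 1, 1, 0, 0,
              0, 1, 1, 0, 0,
              0, 0, 0, 1, 1,
              0, 0, 1, 1, 1,
              0, 0, 0, 1, 1), nrow = 6, byrow = TRUE)

n_factors    <- 2     # latent dimensions
lambda       <- 0.1   # regularization strength
alpha        <- 10    # extra confidence given to observed interactions
n_iterations <- 10    # number of alternating update cycles

C <- 1 + alpha * P    # confidence weights: 1 for zeros, 1 + alpha for ones

# Small random starting values for user factors (X) and item factors (Y)
X <- matrix(rnorm(nrow(P) * n_factors, sd = 0.1), nrow(P), n_factors)
Y <- matrix(rnorm(ncol(P) * n_factors, sd = 0.1), ncol(P), n_factors)

# Confidence-weighted squared error plus L2 penalty
als_loss <- function(X, Y) {
  sum(C * (P - X %*% t(Y))^2) + lambda * (sum(X^2) + sum(Y^2))
}

# Update one side with the other held fixed: a weighted ridge regression per row
solve_side <- function(fixed, targets, weights) {
  do.call(rbind, lapply(seq_len(nrow(targets)), function(i) {
    W <- diag(weights[i, ])
    as.vector(solve(t(fixed) %*% W %*% fixed + lambda * diag(ncol(fixed)),
                    t(fixed) %*% W %*% targets[i, ]))
  }))
}

loss <- numeric(n_iterations)
for (it in seq_len(n_iterations)) {
  X <- solve_side(Y, P, C)         # update users with items fixed
  Y <- solve_side(X, t(P), t(C))   # update items with users fixed
  loss[it] <- als_loss(X, Y)
}

scores <- X %*% t(Y)   # higher score = stronger predicted preference
print(round(loss, 4))
print(round(scores, 2))
```

Because each half-step solves its subproblem exactly, the loss cannot go up from one iteration to the next. Raising lambda shrinks the factors toward zero, and raising alpha makes the fit care more about matching the observed 1s.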
#get the results for this first iteration
set.seed(1122)
results <- evaluate(scheme,
method = "ALS",
type = "topNList",
n = c(5, 10, 20, 40))
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/26.21sec]
#check the results
avg(results)
## TP FP FN TN N precision recall
## [1,] 2.172932 2.827068 22.82707 1554.211 1582.038 0.4345865 0.08691729
## [2,] 3.887218 6.112782 21.11278 1550.925 1582.038 0.3887218 0.15548872
## [3,] 6.556391 13.443609 18.44361 1543.594 1582.038 0.3278195 0.26225564
## [4,] 9.962406 30.037594 15.03759 1527.000 1582.038 0.2490602 0.39849624
## TPR FPR n
## [1,] 0.08691729 0.001828307 5
## [2,] 0.15548872 0.003950513 10
## [3,] 0.26225564 0.008679676 20
## [4,] 0.39849624 0.019380778 40
The true positive rate (the share of true positives correctly detected: TP / (TP + FN)) increases with more recommendations. I tried running other accuracy metrics (MAE, RMSE, MSE), but they can’t be calculated for binary data, so we will look at precision (TP / (TP + FP)) and recall (the same as TPR).
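As a quick check on those definitions, the first row of the output above (n = 5) can be reproduced by hand from the averaged counts. This shortcut works here only because every user has the same TP + FP (5 recommendations) and the same TP + FN (25 withheld positives); in general, an average of per-user ratios is not the ratio of the averages.

```r
# Averaged confusion counts for n = 5, copied from the avg(results) output
TP <- 2.172932
FP <- 2.827068
FN <- 22.82707

precision <- TP / (TP + FP)   # share of recommended items that were hits
recall    <- TP / (TP + FN)   # share of withheld liked items that were recommended

print(round(precision, 4))
print(round(recall, 4))
```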
I will try with different n_factors (dimensionality) and lambda to see how that affects the results. These are both factors that can contribute to over- and underfitting.
Taking n_factors to 30 and lambda to .3
set.seed(1122)
results_2 <- evaluate(scheme,
method = "ALS",
type = "topNList",
n = c(5, 10, 20, 40),
parameter = list(n_factors = 30,
lambda = .3))
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/26.36sec]
avg(results_2)
## TP FP FN TN N precision recall
## [1,] 2.142857 2.857143 22.85714 1554.180 1582.038 0.4285714 0.08571429
## [2,] 3.894737 6.105263 21.10526 1550.932 1582.038 0.3894737 0.15578947
## [3,] 6.827068 13.172932 18.17293 1543.865 1582.038 0.3413534 0.27308271
## [4,] 10.571429 29.428571 14.42857 1527.609 1582.038 0.2642857 0.42285714
## TPR FPR n
## [1,] 0.08571429 0.001850465 5
## [2,] 0.15578947 0.003940337 10
## [3,] 0.27308271 0.008492353 20
## [4,] 0.42285714 0.018973376 40
This model performs slightly better than the previous model (though it starts off a tiny bit worse). I will try with just a higher k and the same lambda, then the larger lambda and the same k.
results_3 <- evaluate(scheme,
method = "ALS",
type = "topNList",
n = c(5, 10, 20, 40),
parameter = list(n_factors = 30,
lambda = .1))
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/26sec]
avg(results_3)
## TP FP FN TN N precision recall
## [1,] 1.887218 3.112782 23.11278 1553.925 1582.038 0.3774436 0.07548872
## [2,] 3.639098 6.360902 21.36090 1550.677 1582.038 0.3639098 0.14556391
## [3,] 6.368421 13.631579 18.63158 1543.406 1582.038 0.3184211 0.25473684
## [4,] 10.105263 29.894737 14.89474 1527.143 1582.038 0.2526316 0.40421053
## TPR FPR n
## [1,] 0.07548872 0.002010114 5
## [2,] 0.14556391 0.004113724 10
## [3,] 0.25473684 0.008802487 20
## [4,] 0.40421053 0.019291622 40
This is slightly worse than the previous version on both precision and recall (though it edges past the default model at 40 recommendations).
results_4 <- evaluate(scheme,
method = "ALS",
type = "topNList",
n = c(5, 10, 20, 40),
parameter = list(n_factors = 10,
lambda = .3))
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/24.39sec]
avg(results_4)
## TP FP FN TN N precision recall
## [1,] 2.203008 2.796992 22.79699 1554.241 1582.038 0.4406015 0.0881203
## [2,] 3.887218 6.112782 21.11278 1550.925 1582.038 0.3887218 0.1554887
## [3,] 6.511278 13.488722 18.48872 1543.549 1582.038 0.3255639 0.2604511
## [4,] 10.007519 29.992481 14.99248 1527.045 1582.038 0.2501880 0.4003008
## TPR FPR n
## [1,] 0.0881203 0.001805590 5
## [2,] 0.1554887 0.003944082 10
## [3,] 0.2604511 0.008701780 20
## [4,] 0.4003008 0.019340577 40
This is closer to the initial model. Across the ALS runs so far, precision and TPR stay high (precision is roughly 0.38 to 0.44 at 5 recommendations).
Interestingly, if I use a higher lambda, results start strong and get worse. Why? Perhaps because a higher lambda can make the results more generic, which may work for a small number of recommendations (e.g. most popular), but not for anything that accounts for taste/the long tail.
results_5 <- evaluate(scheme,
method = "ALS",
type = "topNList",
n = c(5, 10, 20, 40),
parameter = list(n_factors = 10,
lambda = .6))
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/24.63sec]
avg(results_5)
## TP FP FN TN N precision recall
## [1,] 2.127820 2.872180 22.87218 1554.165 1582.038 0.4255639 0.08511278
## [2,] 3.759398 6.240602 21.24060 1550.797 1582.038 0.3759398 0.15037594
## [3,] 6.496241 13.503759 18.50376 1543.534 1582.038 0.3248120 0.25984962
## [4,] 10.082707 29.917293 14.91729 1527.120 1582.038 0.2520677 0.40330827
## TPR FPR n
## [1,] 0.08511278 0.001850411 5
## [2,] 0.15037594 0.004024590 10
## [3,] 0.25984962 0.008702120 20
## [4,] 0.40330827 0.019281418 40
factors_to_test <- c(5, 10, 20, 30, 40, 50)
set.seed(1122)
results_by_factors <- lapply(factors_to_test, function(k) {
evaluate(scheme,
method = "ALS",
type = "topNList",
n = c(5, 10, 20, 40),
parameter = list(n_factors = k, lambda = 0.1, seed = 1))
})
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/24.36sec]
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/24.53sec]
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/25.65sec]
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/26.63sec]
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/27.24sec]
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/29.38sec]
#check results
avg(results_by_factors[[1]])
## TP FP FN TN N precision recall
## [1,] 1.789474 3.210526 23.21053 1553.827 1582.038 0.3578947 0.07157895
## [2,] 3.300752 6.699248 21.69925 1550.338 1582.038 0.3300752 0.13203008
## [3,] 5.819549 14.180451 19.18045 1542.857 1582.038 0.2909774 0.23278195
## [4,] 9.165414 30.834586 15.83459 1526.203 1582.038 0.2291353 0.36661654
## TPR FPR n
## [1,] 0.07157895 0.002071376 5
## [2,] 0.13203008 0.004321687 10
## [3,] 0.23278195 0.009146352 20
## [4,] 0.36661654 0.019877820 40
#5 suggestions has the highest precision but the lowest true positive rate
#Pull precision at each k into a plottable data frame
prec_df <- do.call(rbind, lapply(seq_along(factors_to_test), function(i) {
a <- avg(results_by_factors[[i]])
data.frame(n_factors = factors_to_test[i],
n = a[, "n"],
precision = a[, "precision"])
}))
#graph results
ggplot(prec_df, aes(x = n, y = precision, color = factor(n_factors), group = factor(n_factors))) +
geom_line() + geom_point() +
labs(title = "ALS Precision by Dimensionality and Number of Suggestions",
x = "Number of Suggestions", y = "Precision",
color = "Dimensionality (n_factors)") +
theme_minimal()
Precision goes down with the number of suggestions (perhaps the first recommendations tend to be the most confident/on target). It’s generally highest for a dimensionality of 20 or 30 (however, it’s lowest for 20 with only 5 suggestions, and the lead switches between 20 and 30 as the number of suggestions increases).
Graphing true positive rate/recall:
tpr_df <- do.call(rbind, lapply(seq_along(factors_to_test), function(i) {
a <- avg(results_by_factors[[i]])
data.frame(n_factors = factors_to_test[i],
n = a[, "n"],
TPR = a[, "TPR"])
}))
ggplot(tpr_df, aes(x = n, y = TPR, color = factor(n_factors), group = factor(n_factors))) +
geom_line() + geom_point() +
labs(title = "ALS TPR (recall) by n_factors (dimensionality) and
List Length (suggestions)",
x = "Number of Suggestions", y = "TPR (recall)", color = "Dimensionality") +
theme_minimal()
Recall is generally highest (at higher volumes) for dimensionalities of 20 and 30 as well. 50 is too large, and perhaps overfitted.
Creating the real ratings matrix, keeping users with more than 30 ratings. This is not exactly equivalent to the ALS matrix: there, the filter counted only ratings of 3 or higher, since anything rated 1 or 2 was converted to a zero, so the two models may not see exactly the same users. This creates issues with comparing the two. It may be evened out by the fact that ALS only has to predict a 0 or 1, whereas SVD must predict a score.
MovieLense30_nb <- MovieLense[rowCounts(MovieLense) > 30, ]
Unlike ALS, SVD requires imputation. Choosing an imputation type:
Column-mean imputation is what recommenderlab's SVD uses: per the method's registry description (shown further down), each missing rating is filled with that item's mean. recommenderlab does this automatically, after normalizing the ratings (centering by default).
There’s also SVDF, a more advanced and computationally expensive method, that does not require prior imputation.
Like ALS, SVD factors the ratings matrix to capture latent factors in users and items, but it produces three smaller matrices rather than two. SVD does not take a long time to run in recommenderlab (it actually takes much less time than ALS).
Let’s look at the different parameters:
#checking parameters
recommenderRegistry$get_entry("SVD", dataType = "realRatingMatrix")
## Recommender method: SVD for realRatingMatrix Description: Recommender
## based on SVD approximation with column-mean imputation. Reference: NA
## Parameters:
## k maxiter normalize
## 1 10 100 "center"
k - dimensionality, as with ALS
maxiter - number of iterations
normalize - "center" is the default, which subtracts each user's mean rating from their ratings. You can also use "Z-score", which additionally divides by the user's standard deviation. Normalization is separate from imputation (which is required for SVD): the missing cells are filled with column means, per the description above.
I will go with centered, since it seems like the most neutral option.
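The steps described above (center, impute with column means, keep the top k singular values) can be sketched in base R on a toy matrix. This is my own illustration with made-up ratings and base R's svd(); it is not recommenderlab's internal code, and its details may differ.

```r
# Toy user x item ratings with missing values (hypothetical data)
R <- matrix(c( 5,  4, NA,  1,
               4, NA,  2,  1,
              NA,  5,  1,  2,
               1,  2,  5, NA,
               2,  1, NA,  5), nrow = 5, byrow = TRUE)

# 1. Center: subtract each user's mean rating (ignoring NAs)
user_means <- rowMeans(R, na.rm = TRUE)
centered   <- R - user_means

# 2. Impute: fill each missing cell with its column (item) mean
col_means <- colMeans(centered, na.rm = TRUE)
filled    <- centered
missing   <- is.na(filled)
filled[missing] <- col_means[col(filled)[missing]]

# 3. Truncated SVD: keep only the k largest singular values
k <- 2
s <- svd(filled)
approx_centered <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])

# 4. Undo the centering to get predictions on the original rating scale
predicted <- approx_centered + user_means
print(round(predicted, 2))
```

The three matrices are s$u, the diagonal of s$d, and s$v. A smaller k gives a smoother, more generic approximation, and a larger k follows the filled-in matrix (including its imputed values) more closely.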
Create a recommender:
svd_model <- Recommender(MovieLense30_nb, method = "SVD")
Test the default model’s results:
set.seed(1122)
scheme_svd <- evaluationScheme(MovieLense30_nb,
method = "split",
train = 0.8,
given = -25,
goodRating = 4)
#get the results for this first iteration
set.seed(1122)
results_svd <- evaluate(scheme_svd,
method = "SVD",
type = "topNList",
n = c(5, 10, 20, 40))
## SVD run fold/sample [model time/prediction time]
## 1 [0.08sec/0.05sec]
#check the results
avg(results_svd)
## TP FP FN TN N precision recall
## [1,] 0.03424658 4.965753 14.00000 1537.479 1556.479 0.006849315 0.0009728605
## [2,] 0.05479452 9.945205 13.97945 1532.500 1556.479 0.005479452 0.0020995070
## [3,] 0.13013699 19.869863 13.90411 1522.575 1556.479 0.006506849 0.0059093356
## [4,] 0.79452055 39.205479 13.23973 1503.240 1556.479 0.019863014 0.0487307809
## TPR FPR n
## [1,] 0.0009728605 0.003226905 5
## [2,] 0.0020995070 0.006463861 10
## [3,] 0.0059093356 0.012916534 20
## [4,] 0.0487307809 0.025473557 40
Precision and TPR are much lower than for ALS. Maybe this is expected because we are trying to predict exact ratings, not just a 1 or a 0, and a hit here requires a rating of 4 or higher (goodRating = 4) rather than 3. We will look at different dimensionalities, starting with 20.
results_6 <- evaluate(scheme_svd,
method = "SVD",
type = "topNList",
n = c(5, 10, 20, 40),
parameter = list(k = 20))
## SVD run fold/sample [model time/prediction time]
## 1 [0.13sec/0.1sec]
avg(results_6)
## TP FP FN TN N precision recall
## [1,] 0.02739726 4.972603 14.00685 1537.473 1556.479 0.005479452 0.001300753
## [2,] 0.06164384 9.938356 13.97260 1532.507 1556.479 0.006164384 0.004154536
## [3,] 0.17123288 19.828767 13.86301 1522.616 1556.479 0.008561644 0.010413355
## [4,] 0.84931507 39.150685 13.18493 1503.295 1556.479 0.021232877 0.050421223
## TPR FPR n
## [1,] 0.001300753 0.003231646 5
## [2,] 0.004154536 0.006460176 10
## [3,] 0.010413355 0.012890114 20
## [4,] 0.050421223 0.025435820 40
Precision and recall are mostly slightly better (precision at 5 recommendations is slightly lower). Let’s check systematically:
k_to_test <- c(5, 10, 20, 30, 40, 50)
set.seed(1122)
results_by_k <- lapply(k_to_test, function(k) {
evaluate(scheme_svd,
method = "SVD",
type = "topNList",
n = c(5, 10, 20, 40),
parameter = list(k = k))
})
## SVD run fold/sample [model time/prediction time]
## 1 [0.24sec/0.06sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.24sec/0.06sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.14sec/0.22sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.2sec/0.07sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.25sec/0.06sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.31sec/0.07sec]
# check results
avg(results_by_k[[1]])
## TP FP FN TN N precision recall
## [1,] 0.02054795 4.979452 14.01370 1537.466 1556.479 0.004109589 0.0005291485
## [2,] 0.03424658 9.965753 14.00000 1532.479 1556.479 0.003424658 0.0014317807
## [3,] 0.13698630 19.863014 13.89726 1522.582 1556.479 0.006849315 0.0061330341
## [4,] 0.70547945 39.294521 13.32877 1503.151 1556.479 0.017636986 0.0468141330
## TPR FPR n
## [1,] 0.0005291485 0.003236470 5
## [2,] 0.0014317807 0.006478552 10
## [3,] 0.0061330341 0.012912377 20
## [4,] 0.0468141330 0.025535730 40
# Precision data frame
prec_df_svd <- do.call(rbind, lapply(seq_along(k_to_test), function(i) {
a <- avg(results_by_k[[i]])
data.frame(k = k_to_test[i],
n = a[, "n"],
precision = a[, "precision"])
}))
ggplot(prec_df_svd, aes(x = n, y = precision, color = factor(k), group = factor(k))) +
geom_line() + geom_point() +
labs(title = "SVD Precision by Dimensionality (k) and Number of Suggestions",
x = "Number of Suggestions", y = "Precision",
color = "Dimensionality (k)") +
theme_minimal()
# TPR data frame
tpr_df_svd <- do.call(rbind, lapply(seq_along(k_to_test), function(i) {
a <- avg(results_by_k[[i]])
data.frame(k = k_to_test[i],
n = a[, "n"],
TPR = a[, "TPR"])
}))
ggplot(tpr_df_svd, aes(x = n, y = TPR, color = factor(k), group = factor(k))) +
geom_line() + geom_point() +
labs(title = "SVD TPR (recall) by Dimensionality (k) and Number of Suggestions",
x = "Number of Suggestions", y = "TPR (recall)",
color = "Dimensionality (k)") +
theme_minimal()
In general, a higher k means a higher recall or true positive rate. This is not as true for precision: for a smaller number of suggestions, higher ks tend to do worse; also, larger dimensionalities drop off after about 30 suggestions. This could be due to larger dimensionalities not capturing the long tail as well.
It’s worth noting that the pattern is different from ALS’s: SVD’s metrics stay very low until about 20 suggestions, then rise sharply. ALS’s recall climbs quickly up to about 20 suggestions, after which the returns diminish while precision keeps declining. Also, ALS performs far better than SVD in terms of precision and recall. This, again, may be due to exact ratings vs. just having to capture a 0 or 1.
UBCF was the top-performing model for project 2. Let’s see how it compares to ALS and SVD.
set.seed(1122)
scheme_ubcf <- evaluationScheme(MovieLense30_nb,
method = "split",
train = 0.8,
given = -25,
goodRating = 4)
#get the results for this first iteration
set.seed(1122)
results_7 <- evaluate(scheme_ubcf,
method = "UBCF",
type = "topNList",
n = c(5, 10, 20, 40))
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.81sec]
avg(results_7)
## TP FP FN TN N precision recall
## [1,] 0.2054795 4.794521 13.82877 1537.651 1556.479 0.04109589 0.01576680
## [2,] 0.3424658 9.657534 13.69178 1532.788 1556.479 0.03424658 0.02471369
## [3,] 0.6369863 19.363014 13.39726 1523.082 1556.479 0.03184932 0.03914411
## [4,] 1.2191781 38.780822 12.81507 1503.664 1556.479 0.03047945 0.07583986
## TPR FPR n
## [1,] 0.01576680 0.003113887 5
## [2,] 0.02471369 0.006273563 10
## [3,] 0.03914411 0.012572364 20
## [4,] 0.07583986 0.025183952 40
#with more optimized parameters
results_8 <- evaluate(scheme_ubcf, method = "UBCF",
type = "topNList",
n = c(5, 10, 20, 40),
parameter = list(nn = 60))
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.71sec]
avg(results_8)
## TP FP FN TN N precision recall
## [1,] 0.04794521 4.952055 13.98630 1537.493 1556.479 0.009589041 0.007307777
## [2,] 0.12328767 9.876712 13.91096 1532.568 1556.479 0.012328767 0.013663760
## [3,] 0.28767123 19.712329 13.74658 1522.733 1556.479 0.014383562 0.025438753
## [4,] 0.76712329 39.232877 13.26712 1503.212 1556.479 0.019178082 0.052399706
## TPR FPR n
## [1,] 0.007307777 0.003221674 5
## [2,] 0.013663760 0.006423368 10
## [3,] 0.025438753 0.012816554 20
## [4,] 0.052399706 0.025496313 40
UBCF does better than SVD, though the optimized model comes close to SVD and does worse in some places. It’s worth noting that precision and recall are actually worse for the optimized model than for the default settings; I had tuned it using RMSE and MAE, not these metrics.
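SVDF (Funk SVD) - it uses stochastic gradient descent (an optimization algorithm) to iteratively factor the rating matrix into two matrices, user factors and item factors, rather than the three of a classic SVD. The lambda regularization term is meant to keep the algorithm from overfitting. Imputation is not required because SGD fits only the observed ratings; centering is still applied by default (normalize = "center" in the parameters below).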
It takes forever to run despite stochastic gradient descent’s efficiency, so we will only try it once with the default settings.
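For intuition about what the SGD loop is doing, here is a small base-R sketch of Funk-style factorization on a toy matrix. It is an illustration under my own assumptions (made-up ratings, random starting values, all factors updated together, no centering) and is not recommenderlab's SVDF code; gamma and lambda are borrowed from the defaults listed in the registry entry.

```r
set.seed(1)

# Toy ratings with missing values (hypothetical data)
R <- matrix(c( 5,  4, NA,  1,
               4, NA,  2,  1,
              NA,  5,  1,  2,
               1,  2,  5, NA,
               2,  1, NA,  5), nrow = 5, byrow = TRUE)

k      <- 2      # latent factors
gamma  <- 0.015  # learning rate
lambda <- 0.001  # regularization
epochs <- 200

# Small random starting values for user factors (U) and item factors (V)
U <- matrix(rnorm(nrow(R) * k, sd = 0.1), nrow(R), k)
V <- matrix(rnorm(ncol(R) * k, sd = 0.1), ncol(R), k)

# Row/column positions of the ratings that actually exist
observed <- which(!is.na(R), arr.ind = TRUE)

# RMSE over observed ratings only
rmse <- function(U, V) {
  pred <- U %*% t(V)
  sqrt(mean((R - pred)[!is.na(R)]^2))
}

rmse_start <- rmse(U, V)
for (epoch in seq_len(epochs)) {
  for (idx in sample(nrow(observed))) {      # visit observed ratings in random order
    u <- observed[idx, "row"]
    i <- observed[idx, "col"]
    err   <- R[u, i] - sum(U[u, ] * V[i, ])  # prediction error for this one rating
    u_old <- U[u, ]
    U[u, ] <- U[u, ] + gamma * (err * V[i, ] - lambda * U[u, ])
    V[i, ] <- V[i, ] + gamma * (err * u_old  - lambda * V[i, ])
  }
}
rmse_end <- rmse(U, V)

print(c(start = rmse_start, end = rmse_end))
print(round(U %*% t(V), 2))   # predictions, including the previously missing cells
```

The missing cells are never touched during training; they only get values at the end, when the two factor matrices are multiplied back together.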
#check the params
recommenderRegistry$get_entry("SVDF", dataType = "realRatingMatrix")
## Recommender method: SVDF for realRatingMatrix Description: Recommender
## based on Funk SVD with gradient descend
## (https://sifter.org/~simon/journal/20061211.html). Reference: NA
## Parameters:
## k gamma lambda min_epochs max_epochs min_improvement normalize verbose
## 1 10 0.015 0.001 50 200 1e-06 "center" FALSE
set.seed(1122)
results_9 <- evaluate(scheme_svd,
method = "SVDF",
type = "topNList",
n = c(5, 10, 20, 40))
## SVDF run fold/sample [model time/prediction time]
## 1 [42.23sec/10.54sec]
#check the results
avg(results_9)
## TP FP FN TN N precision recall
## [1,] 0.6780822 4.321918 13.35616 1538.123 1556.479 0.13561644 0.04524586
## [2,] 1.1438356 8.856164 12.89041 1533.589 1556.479 0.11438356 0.08083067
## [3,] 1.9520548 18.047945 12.08219 1524.397 1556.479 0.09760274 0.13833136
## [4,] 3.0068493 36.993151 11.02740 1505.452 1556.479 0.07517123 0.20528458
## TPR FPR n
## [1,] 0.04524586 0.002791825 5
## [2,] 0.08083067 0.005727922 10
## [3,] 0.13833136 0.011685514 20
## [4,] 0.20528458 0.023974498 40
It does much better than SVD with no parameter adjustments. Recall is about half that of the best ALS model and precision is roughly a third (like ALS, precision declines with more suggestions while recall increases). At 40 recommendations, SVDF’s precision is .075 and its recall is .21; UBCF’s optimized model shows .052 for recall and .019 for precision. Without optimizing anything, this model beats UBCF.
ALS has better metrics for this data set, but I don’t recommend it. I would recommend it for a data set with implicit ratings (whether someone watched or didn’t watch something), but not this data set, which I converted to a binary matrix. It may have the best metrics because we are measuring slightly different things - whether the rating was 3 or higher (it’s now a 1 in the binary matrix) vs. what exact score the model thinks the user would give something. With this data set, it also runs the risk of recommending something the user has already seen and does not like. It’s probable that someone saw something but didn’t like it in a genre they enjoy.
SVD narrowly beats UBCF’s more optimized model in precision at 40 recommendations (.020 vs. .019) and is very similar for recall. Without even changing the parameters, SVDF has a .2 recall and .075 precision at 40 recommendations. The drawback is the amount of time it takes to run the model. However, recovering about 20% of the withheld liked movies might be worth it. SVDF and SVD may fail worse than other models with a cold start; for SVD in particular, imputation means many, many imputed ratings on not a lot of data.
Michael Hahsler, recommenderlab: An R Framework for Developing and Testing Recommendation Algorithms. May 2022. https://arxiv.org/abs/2205.12371
Claude Sonnet, 4.6. [Large language model] Accessed June 2026. Claude.ai (to make the k vs TPR and precision graphs)