Implement a matrix factorization method—such as singular value decomposition (SVD) or Alternating Least Squares (ALS)—in the context of a recommender system.
I begin this project with a discussion of Singular Value Decomposition and how it can be used in recommender systems, followed by a toy example. I then implement the SVD on real data. Finally, I compare the SVD model with a User-Based Collaborative Filtering model and evaluate their performance.
Any matrix \(M\) with complex entries and dimensions \(m \times n\), that is \(M \in \mathbb{C}^{m \times n}\), can be factorized into three component parts:
\(M = U\Sigma V^{*}\)
where \(U\) is an \(m \times m\) unitary matrix, \(\Sigma\) is an \(m \times n\) rectangular diagonal matrix with non-negative real entries on the diagonal, and \(V^{*}\) is the conjugate transpose of \(V\), which is an \(n \times n\) unitary matrix.
Multiplying these three components together returns the original matrix. In recommender systems, SVD is used as a collaborative filtering technique. User-based collaborative filtering techniques seek to find relationships between users and user-rated items in a matrix in order to intelligently recommend items to other users based on these relationships.
Using the example above, “U is a m x r orthogonal left singular matrix, which represents the relationship between users and latent factors, S is a r x r diagonal matrix, which describes the strength of each latent factor and V is a r x n diagonal right singular matrix, which indicates the similarity between items and latent factors. The latent factors here are the characteristics of the items, for example, the genre of the music. The SVD decreases the dimension of the utility matrix A by extracting its latent factors. It maps each user and each item into a r-dimensional latent space. This mapping facilitates a clear representation of relationships between users and items.” Citation: SINGULAR VALUE DECOMPOSITION (SVD) & ITS APPLICATION IN RECOMMENDER SYSTEM by DR. VAIBHAV KUMAR, 3/25/2020
In the toy example below, I created a 20x10 matrix of 20 users giving ratings to 10 movies: 5 comedies and 5 dramas. This is a dense matrix where each user rates every movie on a scale of 1 to 5, which also means that each user has seen all 10 movies. I assigned the ratings randomly, which assumes that no user has a particular taste for one genre over another.
| Yesterday | Knives Out | Jo Jo Rabbit | Good Boys | Zombieland | Joker | The Irishman | Marriage Story | Parasite | Ad Astra |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 5 | 2 | 4 | 2 | 2 | 5 | 2 | 4 | 2 |
| 4 | 4 | 3 | 1 | 4 | 4 | 4 | 3 | 1 | 4 |
| 3 | 4 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 3 |
| 5 | 5 | 2 | 2 | 4 | 5 | 5 | 2 | 2 | 4 |
| 5 | 4 | 2 | 4 | 1 | 5 | 4 | 2 | 4 | 1 |
| 1 | 4 | 2 | 3 | 3 | 1 | 4 | 2 | 3 | 3 |
| 3 | 3 | 2 | 4 | 5 | 3 | 3 | 2 | 4 | 5 |
| 5 | 3 | 3 | 4 | 5 | 5 | 3 | 3 | 4 | 5 |
| 3 | 2 | 2 | 4 | 5 | 3 | 2 | 2 | 4 | 5 |
| 3 | 2 | 4 | 3 | 2 | 3 | 2 | 4 | 3 | 2 |
| 5 | 5 | 1 | 4 | 2 | 5 | 5 | 1 | 4 | 2 |
| 3 | 5 | 3 | 4 | 4 | 3 | 5 | 3 | 4 | 4 |
| 4 | 4 | 4 | 4 | 2 | 4 | 4 | 4 | 4 | 2 |
| 3 | 4 | 1 | 1 | 4 | 3 | 4 | 1 | 1 | 4 |
| 1 | 1 | 3 | 3 | 2 | 1 | 1 | 3 | 3 | 2 |
| 5 | 3 | 2 | 2 | 2 | 5 | 3 | 2 | 2 | 2 |
| 2 | 4 | 2 | 3 | 4 | 2 | 4 | 2 | 3 | 4 |
| 1 | 2 | 4 | 3 | 1 | 1 | 2 | 4 | 3 | 1 |
| 2 | 2 | 5 | 2 | 3 | 2 | 2 | 5 | 2 | 3 |
| 5 | 2 | 2 | 1 | 3 | 5 | 2 | 2 | 1 | 3 |
Using the R function svd(), I computed the singular value decomposition of A and stored the components in the variables U, Sigma, and V.
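A minimal sketch of this step, assuming a small stand-in matrix `A_demo` (hypothetical, not the toy ratings matrix itself). Base R's `svd()` returns the singular values as a vector `d`, so Sigma is built with `diag()`. By default `svd()` returns the thin decomposition, in which U has min(m, n) columns rather than m.

```r
# Hypothetical 4 x 3 stand-in for the ratings matrix A
A_demo <- matrix(c(5, 3, 1,
                   4, 2, 1,
                   1, 1, 5,
                   2, 1, 4), nrow = 4, byrow = TRUE)

decomp <- svd(A_demo)      # list with components d, u and v
U      <- decomp$u         # 4 x 3 left singular vectors (thin form)
Sigma  <- diag(decomp$d)   # 3 x 3 diagonal matrix of singular values
V      <- decomp$v         # 3 x 3 right singular vectors

dim(U)
dim(Sigma)
dim(V)
```

The singular values in `decomp$d` come back sorted in decreasing order, which is what makes truncating to the first few dimensions meaningful.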
Sigma is a special matrix with zeroes except on the diagonal.
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 8.264242 0.000000 0.000000 0.000000 0.000000e+00 0.000000e+00
## [2,] 0.000000 6.880135 0.000000 0.000000 0.000000e+00 0.000000e+00
## [3,] 0.000000 0.000000 6.549364 0.000000 0.000000e+00 0.000000e+00
## [4,] 0.000000 0.000000 0.000000 4.633776 0.000000e+00 0.000000e+00
## [5,] 0.000000 0.000000 0.000000 0.000000 2.062719e-15 0.000000e+00
## [6,] 0.000000 0.000000 0.000000 0.000000 0.000000e+00 9.345221e-16
## [7,] 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000e+00
## [8,] 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000e+00
## [9,] 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000e+00
## [10,] 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000e+00
## [,7] [,8] [,9] [,10]
## [1,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [2,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [3,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [4,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [5,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [6,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [7,] 4.771221e-16 0.000000e+00 0.000000e+00 0.000000e+00
## [8,] 0.000000e+00 4.299478e-16 0.000000e+00 0.000000e+00
## [9,] 0.000000e+00 0.000000e+00 4.638853e-17 0.000000e+00
## [10,] 0.000000e+00 0.000000e+00 0.000000e+00 4.172817e-31
U is the Left Singular Matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.12488869 0.31615487 0.253203128 -0.145294363 0.31570695
## [2,] 0.21158159 -0.16884147 -0.025775501 0.461069432 0.12006594
## [3,] 0.20142369 0.21863714 0.259807608 0.223328859 0.36430385
## [4,] 0.32693663 -0.14054134 0.077605437 0.153588875 0.19824291
## [5,] 0.09697744 -0.12128183 0.353545274 -0.327278503 0.09113037
## [6,] 0.12544501 0.40755375 -0.002214306 0.053939009 0.02375480
## [7,] 0.16706288 0.10130947 -0.346017308 -0.261859794 -0.06105016
## [8,] 0.13753108 -0.25139470 -0.275806239 -0.260190333 0.13399714
## [9,] 0.07700449 0.01360064 -0.391696633 -0.305590914 0.24445613
## [10,] -0.33041467 -0.14623695 0.111295965 0.001648263 -0.28381272
## [11,] 0.23989723 -0.04235175 0.268534575 -0.296793679 -0.25334000
## [12,] 0.22025012 0.34242399 0.056372883 0.002916405 -0.57044012
## [13,] -0.11387204 -0.03141331 0.424789094 -0.123522223 0.09437724
## [14,] 0.33331941 -0.01378286 -0.080422408 0.228956532 -0.12010697
## [15,] -0.31357425 0.14677715 -0.141786801 -0.136872435 0.11693824
## [16,] 0.16398852 -0.33500526 0.203443906 -0.059289444 -0.24693447
## [17,] 0.23519960 0.30054092 -0.142806520 0.057680857 -0.09501954
## [18,] -0.31434841 0.16871814 0.140898427 0.049283640 0.14489816
## [19,] -0.27257591 -0.01344144 -0.101213833 0.402447609 -0.10802648
## [20,] 0.14603303 -0.39392145 -0.030107286 0.086123873 0.11736681
## [,6] [,7] [,8] [,9] [,10]
## [1,] -0.779445222 -0.157176215 0.100146806 0.010267283 0.002684606
## [2,] -0.164425705 0.495312555 0.046318702 0.092314918 -0.031282677
## [3,] 0.465057585 -0.359007646 0.505593354 0.075316635 0.013797395
## [4,] 0.062007913 -0.459900586 -0.702456705 -0.256559362 -0.008677261
## [5,] 0.110837562 0.274012117 0.054570018 -0.408802059 -0.672721791
## [6,] -0.057716049 0.075863669 -0.031074502 -0.002966979 -0.033234840
## [7,] 0.025475996 -0.147230447 0.162896994 -0.137427010 0.080831047
## [8,] -0.105206020 -0.064795178 0.108028730 0.036475973 0.042039056
## [9,] 0.086502979 -0.072451108 0.095848544 -0.002635708 -0.188236923
## [10,] -0.111521490 -0.331644171 0.137644685 -0.096875613 -0.093857180
## [11,] 0.052488501 -0.097568609 -0.155869829 0.772212460 -0.208009184
## [12,] 0.050442936 -0.008363157 0.009937926 -0.253690383 0.025873605
## [13,] 0.143415737 0.114860414 -0.027190109 -0.049199037 0.313682236
## [14,] -0.069772194 0.007646291 0.089203960 -0.007748251 -0.126134128
## [15,] 0.090908732 -0.027995163 -0.135769814 0.067883047 -0.062249879
## [16,] -0.126085341 -0.152957236 0.257969106 -0.177057445 0.210163791
## [17,] 0.117309008 0.100959937 -0.082451459 0.036321224 -0.140487215
## [18,] 0.074803673 0.081468140 -0.162756448 0.044456853 -0.032607753
## [19,] -0.154249090 -0.321561765 0.098155909 0.075286595 -0.511720553
## [20,] 0.002257128 -0.019061988 0.086476455 0.122752450 -0.060006421
V is the Right Singular Matrix
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.1606829 -0.57792941 0.15341176 -0.1290094 -0.1862738 -0.67309897
## [2,] 0.3509315 0.31712440 0.35872341 0.2181668 -0.2981999 -0.04035824
## [3,] -0.5276481 -0.03413488 0.05969463 0.3418462 -0.3732062 -0.33727279
## [4,] -0.1823604 0.24937615 0.01468802 -0.5516706 -0.3328740 -0.05210360
## [5,] 0.1983942 0.04556374 -0.58651782 0.1206671 -0.3098544 -0.03822333
## [6,] 0.1606829 -0.57792941 0.15341176 -0.1290094 -0.4334349 0.59665232
## [7,] 0.3509315 0.31712440 0.35872341 0.2181668 -0.3215089 -0.03608841
## [8,] -0.5276481 -0.03413488 0.05969463 0.3418462 -0.2465025 0.26082614
## [9,] -0.1823604 0.24937615 0.01468802 -0.5516706 -0.2868347 -0.02434305
## [10,] 0.1983942 0.04556374 -0.58651782 0.1206671 -0.3098544 -0.03822333
## [,7] [,8] [,9] [,10]
## [1,] -0.25549383 0.21665464 -0.0048491063 -3.653051e-15
## [2,] -0.18678983 -0.15550006 -0.6711015892 3.344319e-15
## [3,] 0.41360107 -0.41896880 0.0190656271 -2.889688e-15
## [4,] 0.42125243 0.50506483 -0.2322412609 6.754800e-16
## [5,] -0.04358234 0.02495286 -0.0026795854 -7.071068e-01
## [6,] 0.16832915 -0.16674892 -0.0005100645 1.925451e-15
## [7,] 0.09962515 0.20540577 0.6657424184 -5.172332e-15
## [8,] -0.50076575 0.46887452 -0.0244247978 1.106110e-15
## [9,] -0.50841711 -0.45515912 0.2268820902 -2.389677e-15
## [10,] -0.04358234 0.02495286 -0.0026795854 7.071068e-01
The columns of U and V are orthonormal, so both \(U^{T}U\) and \(V^{T}V\) return the identity matrix.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 0 0 0 0 0 0 0 0 0
## [2,] 0 1 0 0 0 0 0 0 0 0
## [3,] 0 0 1 0 0 0 0 0 0 0
## [4,] 0 0 0 1 0 0 0 0 0 0
## [5,] 0 0 0 0 1 0 0 0 0 0
## [6,] 0 0 0 0 0 1 0 0 0 0
## [7,] 0 0 0 0 0 0 1 0 0 0
## [8,] 0 0 0 0 0 0 0 1 0 0
## [9,] 0 0 0 0 0 0 0 0 1 0
## [10,] 0 0 0 0 0 0 0 0 0 1
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 0 0 0 0 0 0 0 0 0
## [2,] 0 1 0 0 0 0 0 0 0 0
## [3,] 0 0 1 0 0 0 0 0 0 0
## [4,] 0 0 0 1 0 0 0 0 0 0
## [5,] 0 0 0 0 1 0 0 0 0 0
## [6,] 0 0 0 0 0 1 0 0 0 0
## [7,] 0 0 0 0 0 0 1 0 0 0
## [8,] 0 0 0 0 0 0 0 1 0 0
## [9,] 0 0 0 0 0 0 0 0 1 0
## [10,] 0 0 0 0 0 0 0 0 0 1
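The identity matrices above can be checked numerically. The sketch below assumes a randomly generated 20 x 10 stand-in matrix (`A_demo` and the seed are hypothetical) and measures how far `t(U) %*% U` and `t(V) %*% V` are from the identity.

```r
set.seed(1)  # arbitrary seed for the stand-in data
A_demo <- matrix(sample(1:5, 20 * 10, replace = TRUE), nrow = 20)
decomp <- svd(A_demo)

# crossprod(X) computes t(X) %*% X
UtU <- crossprod(decomp$u)
VtV <- crossprod(decomp$v)

# Largest absolute deviation from the 10 x 10 identity matrix
dev_U <- max(abs(UtU - diag(10)))
dev_V <- max(abs(VtV - diag(10)))
c(dev_U = dev_U, dev_V = dev_V)
```

Both deviations should be on the order of floating-point rounding error.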
Multiplying the three matrices together returns us to the original matrix.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] -1 1 -1 1 -1 -1 1 -1 1 -1
## [2,] 1 1 0 -2 1 1 1 0 -2 1
## [3,] 0 2 0 0 0 0 2 0 0 0
## [4,] 1 1 -1 -1 0 1 1 -1 -1 0
## [5,] 1 1 -1 1 -1 1 1 -1 1 -1
## [6,] -1 1 -1 0 0 -1 1 -1 0 0
## [7,] 0 0 -1 1 1 0 0 -1 1 1
## [8,] 1 -1 -1 0 1 1 -1 -1 0 1
## [9,] 0 -1 -1 1 1 0 -1 -1 1 1
## [10,] 0 -1 2 0 -1 0 -1 2 0 -1
## [11,] 1 1 -1 0 -1 1 1 -1 0 -1
## [12,] -1 2 -1 0 0 -1 2 -1 0 0
## [13,] 0 0 0 0 -2 0 0 0 0 -2
## [14,] 0 1 -1 -1 1 0 1 -1 -1 1
## [15,] -1 -1 1 1 0 -1 -1 1 1 0
## [16,] 2 0 -1 -1 -1 2 0 -1 -1 -1
## [17,] -1 1 -1 0 1 -1 1 -1 0 1
## [18,] -1 0 1 1 -1 -1 0 1 1 -1
## [19,] -1 -1 2 -1 0 -1 -1 2 -1 0
## [20,] 2 0 0 -1 0 2 0 0 -1 0
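As a sketch of the reconstruction step, again on a hypothetical stand-in matrix, the product of the three components can be compared with the input. The same code shows what is lost when only the first k singular values are kept: the Frobenius norm of the error equals the square root of the sum of the discarded squared singular values.

```r
set.seed(2)  # arbitrary seed for the stand-in data
A_demo <- matrix(as.numeric(sample(1:5, 20 * 10, replace = TRUE)), nrow = 20)
decomp <- svd(A_demo)

# Full reconstruction: U %*% Sigma %*% t(V)
A_full <- decomp$u %*% diag(decomp$d) %*% t(decomp$v)
max_abs_diff <- max(abs(A_full - A_demo))

# Rank-k approximation keeps only the first k singular triplets
k   <- 2
A_k <- decomp$u[, 1:k] %*% diag(decomp$d[1:k]) %*% t(decomp$v[, 1:k])

# Approximation error measured directly and from the discarded singular values
err_direct   <- sqrt(sum((A_demo - A_k)^2))
err_from_svd <- sqrt(sum(decomp$d[-(1:k)]^2))
c(max_abs_diff = max_abs_diff, err_direct = err_direct, err_from_svd = err_from_svd)
```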
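In the custom function below, I reduce the number of dimensions based on how much of the “energy” (the sum of the squared singular values) of the original diagonal matrix Sigma is retained. The code uses a cutoff of 0.85 and stops at the first dimension count whose retained energy falls below that cutoff.
get_reduced_dimensions <- function(d){
  # capture the number of dimensions
  n <- length(d)
  # compute the energy of all dimensions
  energy <- sum(d^2)
  # drop one dimension per iteration, starting from all n dimensions
  # (same sequence as the original `1:n-1`, which R parses as (1:n) - 1)
  for (i in 0:(n - 1)){
    reduced_dimensions <- n - i
    new_energy <- round(sum(d[1:reduced_dimensions]^2) / energy, 2)
    # stop at the first count whose retained energy is below the cutoff
    if (new_energy < .85){
      break
    }
  }
  return(reduced_dimensions)
}
For this example, the number of dimensions is reduced to 2.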
## [1] 2
Below, we see that keeping 2 dimensions retains about 0.642 of the energy of the original diagonal matrix.
## [1] 0.6424109
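The retained energy can be recomputed from the four non-negligible singular values printed in the Sigma output (the remaining ones are on the order of 1e-15 and are treated as zero here). The sketch also shows one alternative selection rule, which is an assumption of mine rather than the function above: pick the smallest k whose retained energy meets a target.

```r
# Non-negligible singular values copied from the Sigma output
d <- c(8.264242, 6.880135, 6.549364, 4.633776)

# Share of total energy kept by the first k dimensions, for k = 1..4
retained <- cumsum(d^2) / sum(d^2)
round(retained, 4)

# Alternative rule: smallest k whose retained energy meets the target
target <- 0.80
k_min  <- which(retained >= target)[1]
k_min
```

With these values, two dimensions retain about 0.642 of the energy, and three dimensions are needed to exceed 0.80.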
In the code blocks below, we see the reduced versions of the component matrices.
## [,1] [,2]
## [1,] 0.12488869 0.31615487
## [2,] 0.21158159 -0.16884147
## [3,] 0.20142369 0.21863714
## [4,] 0.32693663 -0.14054134
## [5,] 0.09697744 -0.12128183
## [6,] 0.12544501 0.40755375
## [7,] 0.16706288 0.10130947
## [8,] 0.13753108 -0.25139470
## [9,] 0.07700449 0.01360064
## [10,] -0.33041467 -0.14623695
## [11,] 0.23989723 -0.04235175
## [12,] 0.22025012 0.34242399
## [13,] -0.11387204 -0.03141331
## [14,] 0.33331941 -0.01378286
## [15,] -0.31357425 0.14677715
## [16,] 0.16398852 -0.33500526
## [17,] 0.23519960 0.30054092
## [18,] -0.31434841 0.16871814
## [19,] -0.27257591 -0.01344144
## [20,] 0.14603303 -0.39392145
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.1606829 -0.5779294 0.1534118 -0.1290094 -0.1862738 -0.67309897
## [2,] 0.3509315 0.3171244 0.3587234 0.2181668 -0.2981999 -0.04035824
## [,7] [,8] [,9] [,10]
## [1,] -0.2554938 0.2166546 -0.004849106 -3.653051e-15
## [2,] -0.1867898 -0.1555001 -0.671101589 3.344319e-15
In the code blocks below, we map two new users onto the concept space. Even though both users rated comedies higher, we do not see a similarity between the two users in the two-dimensional concept space.
This may be because the original matrix in this example was created by assigning the ratings randomly, without accounting for the personal tastes of the users. The rating vectors of the two new users are also sparse.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 5 0 0 0 0 0 1 0 1 1
## [,1] [,2]
## [1,] 0.5430716 0.8967659
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0 4 5 4 0 0 0 0 0 1
## [,1] [,2]
## [1,] -2.060696 3.934782
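Mapping a new user onto a k-dimensional concept space amounts to multiplying the user's 1 x n rating vector by the first k columns of V. The sketch below reuses the two rating vectors shown above but, as an assumption, takes V from a randomly generated stand-in matrix, so the coordinates will not match the output above. The `cosine_sim` helper is a hypothetical addition for comparing two users in concept space.

```r
set.seed(3)  # arbitrary seed for the stand-in data
A_demo <- matrix(sample(1:5, 20 * 10, replace = TRUE), nrow = 20)
V_k    <- svd(A_demo)$v[, 1:2]   # 10 x 2 matrix mapping items to concepts

# The two new users from the text (0 = not rated)
user_a <- matrix(c(5, 0, 0, 0, 0, 0, 1, 0, 1, 1), nrow = 1)
user_b <- matrix(c(0, 4, 5, 4, 0, 0, 0, 0, 0, 1), nrow = 1)

# 1 x 2 coordinates of each user in concept space
coords_a <- user_a %*% V_k
coords_b <- user_b %*% V_k

# Cosine similarity between two vectors (hypothetical helper)
cosine_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
similarity <- cosine_sim(coords_a, coords_b)
similarity
```

A similarity near 1 would indicate that the two users point in the same direction in concept space. A value near 0 or below indicates little or no shared taste along the retained dimensions.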
In the next section, I applied the SVD to actual movie ratings data using the MovieLens dataset.
I began by loading the MovieLense rating matrix, which is included in the recommenderlab R package.
## 943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.
The movie_ratings_matrix comprises 147 users who have each rated more than 200 movies and 116 movies that have each been rated more than 200 times.
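library(recommenderlab)
data(MovieLense)
# keep users with more than 200 ratings and movies with more than 200 ratings
ratings_movies <- MovieLense[rowCounts(MovieLense) > 200, colCounts(MovieLense) > 200]
# normalize each user's ratings, then convert to a base matrix
ratings_movies_norm <- recommenderlab::normalize(ratings_movies)
movie_ratings_matrix <- as.matrix(ratings_movies_norm@data)
dim(movie_ratings_matrix)
## [1] 147 116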
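By default, recommenderlab's `normalize()` centers each user's ratings by subtracting that user's mean rating. A base-R sketch of the same idea, on a hypothetical 3 x 3 matrix where `NA` marks an unrated movie:

```r
# Hypothetical ratings: rows are users, columns are movies, NA = not rated
ratings <- matrix(c(5, 3, NA,
                    1, NA, 2,
                    4, 4, 4), nrow = 3, byrow = TRUE)

# Mean rating per user, ignoring unrated movies
user_means <- rowMeans(ratings, na.rm = TRUE)

# Subtract each user's mean from that user's row (sweep defaults to "-")
centered <- sweep(ratings, 1, user_means)
centered

# After centering, each user's mean rating is zero
rowMeans(centered, na.rm = TRUE)
```

After centering, a positive value means the user rated the movie above his or her own average, and a negative value means below it.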
I put the movie_ratings_matrix through the same process as the toy matrix from part one. The key differences are that the movie_ratings_matrix is not as densely populated as the toy matrix and that the ratings are not assigned randomly. Here we have real user sentiment.
## [1] 116
## [1] 147 116
## [1] 116 116
I was able to reduce the number of dimensions of the movie_ratings_matrix from 116 to 46.
## [1] 46
## [1] 0.8415052
I reduced the dimensions of the components of the main matrix.
Here, I put the SVD recommender to the task of recommending movies to user 35. First, I projected user 35’s ratings onto the movies' concept space by multiplying user 35’s rating vector by the transpose of A1V_prime.
I found 11 movies with a projected score greater than 1, which the SVD recommender would recommend to user 35.
library(dplyr)  # provides %>% and arrange()
# Project the ratings of user 35 onto the concept space.
svd_rec <- movie_ratings_matrix[35,] %*% t(A1V_prime)
svd_rec_df <- as.data.frame(t(svd_rec))
# attach titles and put the Titles column first
svd_rec_df["Titles"] <- colnames(movie_ratings_matrix)[1:reduced_dimensions2]
colnames(svd_rec_df) <- c("SVD_Ratings", "Titles")
svd_rec_df <- svd_rec_df[,c(2,1)]
rownames(svd_rec_df) <- c(1:nrow(svd_rec_df))
# sort from highest to lowest projected score
svd_rec_df <- svd_rec_df %>%
  arrange(desc(SVD_Ratings))
# keep only scores above the recommendation cutoff of 1
svd_rec_df[svd_rec_df$SVD_Ratings > 1,]
## Titles SVD_Ratings
## 1 Fish Called Wanda, A (1988) 2.444057
## 2 Aladdin (1992) 2.339122
## 3 Toy Story (1995) 2.126828
## 4 Monty Python and the Holy Grail (1974) 2.022419
## 5 Twister (1996) 1.858616
## 6 Wizard of Oz, The (1939) 1.762824
## 7 Usual Suspects, The (1995) 1.600618
## 8 Sleepless in Seattle (1993) 1.269091
## 9 2001: A Space Odyssey (1968) 1.267198
## 10 Mr. Holland's Opus (1995) 1.256991
## 11 Seven (Se7en) (1995) 1.148647
Of these 11 movies that the SVD recommended, User 35 had rated 3 positively, had not rated 4, and had rated another 4 negatively.
# recommended_titles is defined outside this excerpt
userSample35 <- as.data.frame(movie_ratings_matrix[35,])
userSample35["Titles"] <- rownames(userSample35)
userSample35 <- userSample35[,c(2,1)]
colnames(userSample35) <- c("Titles", "User35_Rating")
rownames(userSample35) <- c(1:nrow(userSample35))
userSample35 <- userSample35 %>%
  arrange(desc(User35_Rating))
recommended_to_35 <- userSample35[userSample35$Titles %in% recommended_titles,]
recommended_to_35
## [1] Titles User35_Rating
## <0 rows> (or 0-length row.names)
Below are the 4 movies that the SVD recommended to User 35 that User 35 had rated below his or her own average (a negative normalized rating).
user35_dislikes <- userSample35[userSample35$Titles %in% recommended_titles & userSample35$User35_Rating < 0,]
user35_dislikes
## [1] Titles User35_Rating
## <0 rows> (or 0-length row.names)
In summary, a manual implementation of the SVD yielded a recommendation success rate of about 27% (3 of 11) and a failure rate of about 36% (4 of 11). Of the 4 recommended movies that User 35 had not rated, I expect that s/he will like only one, most likely “Usual Suspects, The (1995)”.
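The percentages follow directly from the counts reported for User 35. A short sketch of the arithmetic (the variable names are mine):

```r
n_recommended <- 11  # movies with a projected score above 1
n_liked       <- 3   # recommended and rated positively by User 35
n_disliked    <- 4   # recommended and rated negatively by User 35
n_unrated     <- 4   # recommended but not rated by User 35

counts <- c(success = n_liked, failure = n_disliked, unrated = n_unrated)
rates  <- round(100 * counts / n_recommended)
rates
```

The three shares do not sum to exactly 100 because each one is rounded separately.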
In the final section, I take a more formal evaluation approach and compare the SVD model to the User-Based Collaborative Filtering (UBCF) model.
k_fold_split <- function(data, method, n_fold, items_to_keep, rating_threshold){
  return(evaluationScheme(data = data, method = method, k = n_fold, given = items_to_keep, goodRating = rating_threshold))
}
## [1] 126 126 126 126 126 126 126 126
To evaluate the two approaches, I created a list of models with different parameters: UBCF with cosine similarity, UBCF with Pearson correlation, and SVD with its default parameters (NULL). This list was fed into the evaluate function. The visualizations below both show that the UBCF model using the “pearson” method had the larger area under the curve.
recommender_models <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")
recommender_models$SVD_realRatingMatrix$parameters
## $k
## [1] 10
##
## $maxiter
## [1] 100
##
## $normalize
## [1] "center"
## $method
## [1] "cosine"
##
## $nn
## [1] 25
##
## $sample
## [1] FALSE
##
## $normalize
## [1] "center"
models_to_evaluate <- list(
  UBCF_cos = list(name = "UBCF", param = list(method = "cosine")),
  UBCF_cor = list(name = "UBCF", param = list(method = "pearson")),
  SVD = list(name = "SVD", param = NULL))
n_recommendations <- c(1, 5, seq(10, 100, 10))
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.05sec]
## 2 [0sec/0.06sec]
## 3 [0sec/0.02sec]
## 4 [0sec/0.01sec]
## 5 [0sec/0.02sec]
## 6 [0.02sec/0.01sec]
## 7 [0sec/0.02sec]
## 8 [0sec/0.02sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.01sec]
## 2 [0sec/0.01sec]
## 3 [0sec/0.02sec]
## 4 [0sec/0.02sec]
## 5 [0sec/0.01sec]
## 6 [0sec/0.01sec]
## 7 [0sec/0.02sec]
## 8 [0sec/0.02sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.01sec/0sec]
## 2 [0sec/0.01sec]
## 3 [0sec/0sec]
## 4 [0sec/0.01sec]
## 5 [0sec/0.01sec]
## 6 [0.02sec/0sec]
## 7 [0sec/0.02sec]
## 8 [0sec/0sec]
## integer(0)
## $UBCF_cos
## TP FP FN TN precision recall TPR
## 1 0.7321429 0.2678571 57.107143 47.892857 0.7321429 0.01309719 0.01309719
## 5 3.6190476 1.3809524 54.220238 46.779762 0.7238095 0.06308192 0.06308192
## 10 6.9166667 3.0833333 50.922619 45.077381 0.6916667 0.12159588 0.12159588
## 20 13.1488095 6.8511905 44.690476 41.309524 0.6574405 0.23069135 0.23069135
## 30 19.0773810 10.9226190 38.761905 37.238095 0.6359127 0.33373189 0.33373189
## 40 25.0773810 14.9226190 32.761905 33.238095 0.6269345 0.43763359 0.43763359
## 50 30.6130952 19.3869048 27.226190 28.773810 0.6122619 0.53223866 0.53223866
## 60 36.0892857 23.9107143 21.750000 24.250000 0.6014881 0.62635088 0.62635088
## 70 41.6488095 28.3511905 16.190476 19.809524 0.5949830 0.72389868 0.72389868
## 80 46.4702381 33.5297619 11.369048 14.630952 0.5808780 0.80631695 0.80631695
## 90 51.0178571 38.9821429 6.821429 9.178571 0.5668651 0.88450125 0.88450125
## 100 55.3630952 44.6369048 2.476190 3.523810 0.5536310 0.95929082 0.95929082
## FPR
## 1 0.004920696
## 5 0.025619306
## 10 0.060597975
## 20 0.138903446
## 30 0.223048226
## 40 0.305263160
## 50 0.396820943
## 60 0.489769377
## 70 0.582047823
## 80 0.690164542
## 90 0.806528233
## 100 0.926202762
##
## $UBCF_cor
## TP FP FN TN precision recall TPR
## 1 0.7797619 0.2202381 57.059524 47.940476 0.7797619 0.01362783 0.01362783
## 5 3.7916667 1.2083333 54.047619 46.952381 0.7583333 0.06715826 0.06715826
## 10 7.1726190 2.8273810 50.666667 45.333333 0.7172619 0.12661884 0.12661884
## 20 13.6488095 6.3511905 44.190476 41.809524 0.6824405 0.24138699 0.24138699
## 30 19.6845238 10.3154762 38.154762 37.845238 0.6561508 0.34569971 0.34569971
## 40 25.5952381 14.4047619 32.244048 33.755952 0.6398810 0.44867168 0.44867168
## 50 31.3214286 18.6785714 26.517857 29.482143 0.6264286 0.54785881 0.54785881
## 60 36.8154762 23.1845238 21.023810 24.976190 0.6135913 0.64162625 0.64162625
## 70 42.0357143 27.9642857 15.803571 20.196429 0.6005102 0.73149773 0.73149773
## 80 46.8154762 33.1845238 11.023810 14.976190 0.5851935 0.81319244 0.81319244
## 90 51.4107143 38.5892857 6.428571 9.571429 0.5712302 0.89170461 0.89170461
## 100 55.5238095 44.4761905 2.315476 3.684524 0.5552381 0.96120372 0.96120372
## FPR
## 1 0.004050474
## 5 0.023419391
## 10 0.055886420
## 20 0.128333107
## 30 0.208123748
## 40 0.293958117
## 50 0.383662801
## 60 0.476357845
## 70 0.574105896
## 80 0.684013372
## 90 0.797297860
## 100 0.922120660
##
## $SVD
## TP FP FN TN precision recall TPR
## 1 0.7678571 0.2321429 57.071429 47.928571 0.7678571 0.01356337 0.01356337
## 5 3.6428571 1.3571429 54.196429 46.803571 0.7285714 0.06492456 0.06492456
## 10 6.9880952 3.0119048 50.851190 45.148810 0.6988095 0.12423557 0.12423557
## 20 13.0654762 6.9345238 44.773810 41.226190 0.6532738 0.22984463 0.22984463
## 30 18.9464286 11.0535714 38.892857 37.107143 0.6315476 0.33305743 0.33305743
## 40 24.6428571 15.3571429 33.196429 32.803571 0.6160714 0.43157439 0.43157439
## 50 29.9880952 20.0119048 27.851190 28.148810 0.5997619 0.52335405 0.52335405
## 60 35.2202381 24.7797619 22.619048 23.380952 0.5870040 0.61315416 0.61315416
## 70 40.4404762 29.5595238 17.398810 18.601190 0.5777211 0.70228899 0.70228899
## 80 45.4404762 34.5595238 12.398810 13.601190 0.5680060 0.78746813 0.78746813
## 90 50.4583333 39.5416667 7.380952 8.619048 0.5606481 0.87325001 0.87325001
## 100 55.0892857 44.9107143 2.750000 3.250000 0.5508929 0.95232488 0.95232488
## FPR
## 1 0.004418973
## 5 0.027713119
## 10 0.061456056
## 20 0.141856640
## 30 0.227157736
## 40 0.314263089
## 50 0.410835990
## 60 0.510507924
## 70 0.608971849
## 80 0.714144416
## 90 0.817527646
## 100 0.931696091
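The area-under-the-curve comparison can be approximated from the averaged FPR and TPR columns in the tables above. The sketch below applies the trapezoidal rule to the UBCF_cor and SVD values. Anchoring each curve at (0, 0) and (1, 1) is my assumption so that both curves cover the same FPR range, and `roc_auc` is a hypothetical helper rather than a recommenderlab function.

```r
# Trapezoidal area under a ROC curve, anchored at (0, 0) and (1, 1)
roc_auc <- function(fpr, tpr) {
  x <- c(0, fpr, 1)
  y <- c(0, tpr, 1)
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
}

# FPR and TPR columns copied from the $UBCF_cor table
ubcf_cor_fpr <- c(0.004050474, 0.023419391, 0.055886420, 0.128333107,
                  0.208123748, 0.293958117, 0.383662801, 0.476357845,
                  0.574105896, 0.684013372, 0.797297860, 0.922120660)
ubcf_cor_tpr <- c(0.01362783, 0.06715826, 0.12661884, 0.24138699,
                  0.34569971, 0.44867168, 0.54785881, 0.64162625,
                  0.73149773, 0.81319244, 0.89170461, 0.96120372)

# FPR and TPR columns copied from the $SVD table
svd_fpr <- c(0.004418973, 0.027713119, 0.061456056, 0.141856640,
             0.227157736, 0.314263089, 0.410835990, 0.510507924,
             0.608971849, 0.714144416, 0.817527646, 0.931696091)
svd_tpr <- c(0.01356337, 0.06492456, 0.12423557, 0.22984463,
             0.33305743, 0.43157439, 0.52335405, 0.61315416,
             0.70228899, 0.78746813, 0.87325001, 0.95232488)

auc_ubcf_cor <- roc_auc(ubcf_cor_fpr, ubcf_cor_tpr)
auc_svd      <- roc_auc(svd_fpr, svd_tpr)
c(UBCF_cor = auc_ubcf_cor, SVD = auc_svd)
```

Because the UBCF_cor curve has a higher TPR and a lower FPR than the SVD curve at every list length in the tables, its approximate area comes out larger.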
set.seed(123)
model_to_evaluate <- "SVD"
model_parameters <- NULL
SVD_eval_recommender <- Recommender(data = getData(eval_sets, "train"), method = model_to_evaluate, parameter = model_parameters)
items_to_recommend <- 5
SVD_eval_prediction <- predict(object = SVD_eval_recommender, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")
SVD_eval_accuracy <- calcPredictionAccuracy(x = SVD_eval_prediction, data = getData(eval_sets, "unknown"), byUser = TRUE)
head(SVD_eval_accuracy)
## RMSE MSE MAE
## 7 0.8652095 0.7485875 0.7185775
## 13 1.2740816 1.6232839 1.0979437
## 59 1.1829367 1.3993392 0.7697918
## 145 1.1176664 1.2491781 0.9455625
## 194 1.0487255 1.0998253 0.8717113
## 314 1.2985812 1.6863131 0.9187730
SVD_eval_accuracy_full <- calcPredictionAccuracy(x = SVD_eval_prediction, data = getData(eval_sets, "unknown"), byUser = FALSE)
SVD_eval_accuracy_full
## RMSE MSE MAE
## 1.0195877 1.0395591 0.8050596
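For reference, the three error measures are related in a simple way: MSE is the mean squared difference between predicted and actual ratings, RMSE is its square root, and MAE is the mean absolute difference. A base-R sketch on hypothetical rating vectors (not the evaluation data above):

```r
# Hypothetical predicted and actual ratings; NA marks a missing prediction
predicted <- c(3.8, 2.5, NA, 4.1, 1.9)
actual    <- c(4,   3,   5,  4,   1)

err  <- predicted - actual
mse  <- mean(err^2, na.rm = TRUE)     # mean squared error
rmse <- sqrt(mse)                     # root mean squared error
mae  <- mean(abs(err), na.rm = TRUE)  # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```

The same relationship holds in the output above: squaring the reported RMSE of 1.0195877 gives the reported MSE of about 1.0395591.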
set.seed(123)
model_to_evaluate <- "UBCF"
model_parameters <- "pearson"
UBCF_eval_recommender <- Recommender(data = getData(eval_sets, "train"), method = model_to_evaluate, parameter = model_parameters)
items_to_recommend <- 5
UBCF_eval_prediction <- predict(object = UBCF_eval_recommender, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")
UBCF_eval_accuracy <- calcPredictionAccuracy(x = UBCF_eval_prediction, data = getData(eval_sets, "unknown"), byUser = TRUE)
head(UBCF_eval_accuracy, 20)
## RMSE MSE MAE
## 7 0.8172493 0.6678964 0.6736455
## 13 1.1810835 1.3949581 0.9845381
## 59 1.1551622 1.3343997 0.7479544
## 145 1.0628515 1.1296534 0.8978653
## 194 0.9384819 0.8807483 0.7610926
## 314 1.3267070 1.7601516 0.9592185
## 363 1.2236546 1.4973306 1.0022238
## 393 0.8385594 0.7031819 0.6577984
## 437 1.0567320 1.1166825 0.8219672
## 457 0.5728772 0.3281883 0.4261889
## 524 0.9582465 0.9182363 0.8226710
## 618 0.8456415 0.7151095 0.6352287
## 682 0.8131040 0.6611380 0.6452044
## 716 1.1388042 1.2968750 0.8874681
## 807 0.7694216 0.5920096 0.6513251
## 843 1.1527737 1.3288873 1.0204700
## 854 0.9375588 0.8790165 0.7359421
## 880 0.7721467 0.5962105 0.5873766
## 916 0.9253646 0.8562997 0.7636788
## 919 0.9315090 0.8677091 0.7501819
UBCF_eval_accuracy_full <- calcPredictionAccuracy(x = UBCF_eval_prediction, data = getData(eval_sets, "unknown"), byUser = FALSE)
UBCF_eval_accuracy_full
## RMSE MSE MAE
## 0.9799797 0.9603602 0.7649969
This project applied the SVD model to randomly assigned user ratings and to user ratings from the MovieLens dataset, and then formally compared SVD with User-Based Collaborative Filtering. We saw that the SVD model performed poorly against randomly assigned ratings, improved somewhat against actual user ratings, and did not perform as well as the UBCF models in the formal comparison (overall RMSE of about 1.02 for SVD versus about 0.98 for UBCF).