Rather than using some of the methods explored in previous projects, we can make recommendations to users by estimating the blank entries (unrated items) with Singular Value Decomposition (SVD). Because most utility matrices used in recommendation systems are quite large and highly sparse, a dimensionality reduction technique like SVD allows for an approximate representation of the matrix while retaining its most important structure.
Performing SVD on the original ratings matrix (\(M\), with \(m\) users and \(n\) items) will result in three different matrices: \(U\), an \(m \times r\) matrix of user factors; \(\Sigma\), an \(r \times r\) diagonal matrix of singular values; and \(V\), an \(n \times r\) matrix of item factors, such that \(M = U \Sigma V^T\).
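As a quick illustration of the decomposition (a minimal sketch on a small hypothetical matrix, not the MovieLense data), base R’s svd returns these three components, and multiplying them back together recovers the original matrix:
toy <- matrix(c(5, 3, 0, 1,
                4, 0, 0, 1,
                1, 1, 0, 5), nrow = 4) # hypothetical 4 x 3 ratings matrix
toy_svd <- svd(toy)                    # u is 4 x 3, d holds 3 singular values, v is 3 x 3
toy_hat <- toy_svd$u %*% diag(toy_svd$d) %*% t(toy_svd$v)
all.equal(toy, toy_hat)                # TRUE, up to floating-point error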
For this project, we’ll continue to look at the included MovieLense data from the recommenderlab package.
data("MovieLense")
#dense_ratings <- MovieLense[rowCounts(MovieLense) > 25, colCounts(MovieLense) > 100]
#dense_ratings
Using the custom function below, we can see that the data in the MovieLense set is quite sparse, with only approximately 6.3% of the entries containing ratings. One of the issues with SVD is that the matrix cannot contain any missing values, so different methods of imputation (filling with the mean or with zeros, for example) must be used.
# returns the fraction of entries that contain a rating (i.e., the density of the matrix)
sparsity <- function(ratings){
  nratings(ratings) / (dim(ratings)[1] * dim(ratings)[2])
}
sparsity(MovieLense)
## [1] 0.06334122
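For reference, the counts behind that figure can be pulled straight from the recommenderlab object:
dim(MovieLense)       # number of users x number of movies
nratings(MovieLense)  # number of observed ratings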
Utilizing the normalize function to center each user’s ratings, and then filling in the missing values with 0s (which, after centering, correspond to each user’s mean rating), we can use SVD on the MovieLense data:
movies_normalized <- normalize(MovieLense)
movies_norm <- as(movies_normalized, "matrix")
movies_norm[is.na(movies_norm)] <- 0
movies_norm[1:5, 1:5]
## Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995)
## 1 1.394834 -0.6051661 0.3948339 -0.6051661
## 2 0.295082 0.0000000 0.0000000 0.0000000
## 3 0.000000 0.0000000 0.0000000 0.0000000
## 4 0.000000 0.0000000 0.0000000 0.0000000
## 5 1.125714 0.1257143 0.0000000 0.0000000
## Copycat (1995)
## 1 -0.6051661
## 2 0.0000000
## 3 0.0000000
## 4 0.0000000
## 5 0.0000000
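As an aside, if the raw (un-centered) ratings were used instead, the missing entries could be imputed with each user’s mean rating; a minimal sketch of that alternative (not used in the rest of this write-up):
movies_raw <- as(MovieLense, "matrix")
user_means <- rowMeans(movies_raw, na.rm = TRUE)
na_idx <- which(is.na(movies_raw), arr.ind = TRUE)
movies_raw[na_idx] <- user_means[na_idx[, "row"]]  # fill each missing entry with that user's mean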
Using base R’s svd function on the matrix above, we obtain the \(U\), \(\Sigma\), and \(V\) matrices.
movies_svd <- svd(movies_norm)
u <- movies_svd$u
v <- movies_svd$v
sigma <- Diagonal(x = movies_svd$d) # movies_svd$d is just a vector of singular values; Diagonal() from the Matrix package turns it into a diagonal matrix
To obtain the predicted values of user/item pairs, the following equation is used:
\(\hat{R} = \left(U_k \sqrt{\Sigma_k}^{\,T}\right)\left(\sqrt{\Sigma_k}\, V_k^T\right)\)
The first part of the equation results in an \(m \times k\) matrix and the second part in a \(k \times n\) matrix, where \(k\) is the number of singular values retained; in the example below, the full-rank matrix (all singular values) is used. According to the text (Mining of Massive Datasets, p. 424), enough singular values should be retained so that the sum of their squares is at least 90% of the sum of squares of all the singular values (a sketch of this calculation follows the output below).
R <- (u %*% t(sqrt(sigma))) %*% (sqrt(sigma) %*% t(v))
R[1:5, 1:5]
## 5 x 5 Matrix of class "dgeMatrix"
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.394834e+00 -6.051661e-01 3.948339e-01 -6.051661e-01 -6.051661e-01
## [2,] 2.950820e-01 -5.378603e-13 -1.931648e-14 6.537811e-15 -7.209699e-15
## [3,] -3.779684e-15 5.839553e-15 2.401795e-14 -7.312265e-15 1.516144e-15
## [4,] 6.444840e-15 1.627828e-15 1.540465e-14 2.900976e-14 2.855274e-15
## [5,] 1.125714e+00 1.257143e-01 1.327418e-14 -3.542753e-14 6.944948e-14
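Following the 90% rule mentioned above, the number of singular values to retain can be found from the cumulative "energy" of the decomposition already computed:
energy <- cumsum(movies_svd$d^2) / sum(movies_svd$d^2)
k <- which(energy >= 0.9)[1]  # smallest k capturing at least 90% of the energy
k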
In order to predict ratings for a new user, the new user’s vector of ratings would be projected onto the item-factor matrix \(V\) given by SVD, rather than comparing the new user to all of the other users as was done with the previous methods.
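A minimal sketch of that fold-in step, reusing the first row of the normalized, zero-imputed matrix as a stand-in for a hypothetical new user (in practice only the top \(k\) factors would be kept, since near-zero singular values make the inversion unstable):
new_ratings <- movies_norm[1, , drop = FALSE]           # stand-in for a new user's normalized, zero-imputed ratings
new_user_factors <- new_ratings %*% v %*% solve(sigma)  # project the new user into the latent factor space
new_user_preds <- new_user_factors %*% sigma %*% t(v)   # map back to predicted (normalized) ratings for every item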
The RMSE of the matrix reconstructed by SVD can be calculated using the same formula as before, taking the element-wise difference between the reconstructed matrix and the original (normalized and imputed) matrix.
# root-mean-square error over every entry of the two matrices
RMSE <- function(rec_mat, orig_mat){
  size <- dim(rec_mat)[1] * dim(rec_mat)[2]
  se <- sum((rec_mat - orig_mat)^2)
  e_val <- se / size
  rmse <- sqrt(e_val)
  return(rmse)
}
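Applying it to the full-rank reconstruction and the normalized, zero-imputed matrix it came from (the reconstruction is coerced back to a base matrix first, since the multiplication above returned a Matrix object):
RMSE(as.matrix(R), movies_norm)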
Here, we find the RMSE to be very low. However, this is probably a result of how the matrix was normalized and of the imputation method used for the missing values, so we may have some overfitting of the data. Experimenting with different combinations of normalization (centering or z-score), imputation of missing values (mean of user ratings, mean of item ratings, overall mean, or zeros), and the value of \(k\) used for the number of singular values retained in the diagonal \(\Sigma\) matrix should result in the best estimates of the missing values for making recommendations; a sketch of comparing a few values of \(k\) follows.
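As one way to explore the effect of \(k\), a small sketch comparing the RMSE of rank-\(k\) reconstructions for a few arbitrary candidate values (the same loop could be wrapped around different normalization or imputation choices as well):
for (k_try in c(10, 50, 100, 200)) {
  sigma_k <- Diagonal(x = movies_svd$d[1:k_try])
  R_k <- (u[, 1:k_try] %*% t(sqrt(sigma_k))) %*% (sqrt(sigma_k) %*% t(v[, 1:k_try]))
  cat("k =", k_try, "RMSE =", RMSE(as.matrix(R_k), movies_norm), "\n")
}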