Rather than using some of the methods explored in previous projects, we can make recommendations to users by estimating the blank entries (unrated items) with Singular Value Decomposition (SVD). Because most utility matrices used in recommendation systems are quite large and highly sparse, a dimensionality reduction technique like SVD allows for an approximate representation of the matrix while retaining its most important structure.
Performing SVD on the original ratings matrix (\(M\), with \(m\) users and \(n\) items) will result in three different matrices: \(U\), an \(m \times r\) matrix of user factors; \(\Sigma\), an \(r \times r\) diagonal matrix of singular values; and \(V\), an \(n \times r\) matrix of item factors, such that \(M = U \Sigma V^T\).
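As a quick illustration of the decomposition (a minimal sketch on a small hypothetical matrix, not the MovieLense data), base R’s svd returns these three components, and multiplying them back together recovers the original matrix:
toy <- matrix(c(5, 3, 0, 1,
                4, 0, 0, 1,
                1, 1, 0, 5), nrow = 4) # hypothetical 4 x 3 ratings matrix
toy_svd <- svd(toy)                    # u is 4 x 3, d holds 3 singular values, v is 3 x 3
toy_hat <- toy_svd$u %*% diag(toy_svd$d) %*% t(toy_svd$v)
all.equal(toy, toy_hat)                # TRUE, up to floating-point error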
For this project, we’ll continue to look at the included MovieLense data from the recommenderlab package.
data("MovieLense")
#dense_ratings <- MovieLense[rowCounts(MovieLense) > 25, colCounts(MovieLense) > 100]
#dense_ratings
Using the custom function below, we can see that the data in the MovieLense set is quite sparse, with only approximately 6.3% of the entries containing ratings. One of the issues with SVD is that the matrix cannot contain any missing values, so different methods of imputation (filling with the mean or with zeros, for example) must be used.
# returns the fraction of entries that contain a rating (i.e., the density of the matrix)
sparsity <- function(ratings){
  nratings(ratings) / (dim(ratings)[1] * dim(ratings)[2])
}
sparsity(MovieLense)
## [1] 0.06334122
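For reference, the counts behind that figure can be pulled straight from the recommenderlab object:
dim(MovieLense)       # number of users x number of movies
nratings(MovieLense)  # number of observed ratings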
Utilizing the normalize function to center each user’s ratings, and then filling in the missing values with 0s (which, after centering, correspond to each user’s mean rating), we can use SVD on the MovieLense data:
movies_normalized <- normalize(MovieLense)
movies_norm <- as(movies_normalized, "matrix")
movies_norm[is.na(movies_norm)] <- 0
movies_norm[1:5, 1:5]
## Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995)
## 1 1.394834 -0.6051661 0.3948339 -0.6051661
## 2 0.295082 0.0000000 0.0000000 0.0000000
## 3 0.000000 0.0000000 0.0000000 0.0000000
## 4 0.000000 0.0000000 0.0000000 0.0000000
## 5 1.125714 0.1257143 0.0000000 0.0000000
## Copycat (1995)
## 1 -0.6051661
## 2 0.0000000
## 3 0.0000000
## 4 0.0000000
## 5 0.0000000
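As an aside, if the raw (un-centered) ratings were used instead, the missing entries could be imputed with each user’s mean rating; a minimal sketch of that alternative (not used in the rest of this write-up):
movies_raw <- as(MovieLense, "matrix")
user_means <- rowMeans(movies_raw, na.rm = TRUE)
na_idx <- which(is.na(movies_raw), arr.ind = TRUE)
movies_raw[na_idx] <- user_means[na_idx[, "row"]]  # fill each missing entry with that user's mean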
Using base R’s svd function on the matrix above, we obtain the \(U\), \(\Sigma\), and \(V\) matrices.
movies_svd <- svd(movies_norm)
u <- movies_svd$u
v <- movies_svd$v
sigma <- Diagonal(x = movies_svd$d) # movies_svd$d is just a vector of singular values; Diagonal() from the Matrix package turns it into a diagonal matrix
To obtain the predicted values of user/item pairs, the following equation is used:
\(\hat{R} = \left(U_k \sqrt{\Sigma_k}^{\,T}\right)\left(\sqrt{\Sigma_k}\, V_k^T\right)\)
The first part of the equation results in an \(m \times k\) matrix and the second part in a \(k \times n\) matrix, where \(k\) is the number of singular values retained; in the example below, the full-rank matrix (all singular values) is used. According to the text (Mining of Massive Datasets, p. 424), enough singular values should be retained so that the sum of their squares is at least 90% of the sum of squares of all the singular values (a sketch of this calculation follows the output below).
R <- (u %*% t(sqrt(sigma))) %*% (sqrt(sigma) %*% t(v))
R[1:5, 1:5]
## 5 x 5 Matrix of class "dgeMatrix"
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.394834e+00 -6.051661e-01 3.948339e-01 -6.051661e-01 -6.051661e-01
## [2,] 2.950820e-01 -5.378603e-13 -1.931648e-14 6.537811e-15 -7.209699e-15
## [3,] -3.779684e-15 5.839553e-15 2.401795e-14 -7.312265e-15 1.516144e-15
## [4,] 6.444840e-15 1.627828e-15 1.540465e-14 2.900976e-14 2.855274e-15
## [5,] 1.125714e+00 1.257143e-01 1.327418e-14 -3.542753e-14 6.944948e-14
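Following the 90% rule mentioned above, the number of singular values to retain can be found from the cumulative "energy" of the decomposition already computed:
energy <- cumsum(movies_svd$d^2) / sum(movies_svd$d^2)
k <- which(energy >= 0.9)[1]  # smallest k capturing at least 90% of the energy
k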
In order to predict ratings for a new user, the new user’s vector of ratings would be projected onto the item-factor matrix \(V\) given by SVD, rather than comparing the new user to all of the other users as was done with the previous methods.
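A minimal sketch of that fold-in step, reusing the first row of the normalized, zero-imputed matrix as a stand-in for a hypothetical new user (in practice only the top \(k\) factors would be kept, since near-zero singular values make the inversion unstable):
new_ratings <- movies_norm[1, , drop = FALSE]           # stand-in for a new user's normalized, zero-imputed ratings
new_user_factors <- new_ratings %*% v %*% solve(sigma)  # project the new user into the latent factor space
new_user_preds <- new_user_factors %*% sigma %*% t(v)   # map back to predicted (normalized) ratings for every item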
The RMSE of the matrix reconstructed by SVD can be calculated using the same formula as before, taking the element-wise difference between the reconstructed matrix and the original (normalized and imputed) matrix.
# root-mean-square error over every entry of the two matrices
RMSE <- function(rec_mat, orig_mat){
  size <- dim(rec_mat)[1] * dim(rec_mat)[2]
  se <- sum((rec_mat - orig_mat)^2)
  e_val <- se / size
  rmse <- sqrt(e_val)
  return(rmse)
}
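Applying it to the full-rank reconstruction and the normalized, zero-imputed matrix it came from (the reconstruction is coerced back to a base matrix first, since the multiplication above returned a Matrix object):
RMSE(as.matrix(R), movies_norm)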
Here, we find the RMSE to be very low. However, this is probably a result of how the matrix was normalized and of the imputation method used for the missing values, so we may have some overfitting of the data. Experimenting with different combinations of normalization (centering or z-score), imputation of missing values (mean of user ratings, mean of item ratings, overall mean, or zeros), and the value of \(k\) used for the number of singular values retained in the diagonal \(\Sigma\) matrix should result in the best estimates of the missing values for making recommendations; a sketch of comparing a few values of \(k\) follows.
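As one way to explore the effect of \(k\), a small sketch comparing the RMSE of rank-\(k\) reconstructions for a few arbitrary candidate values (the same loop could be wrapped around different normalization or imputation choices as well):
for (k_try in c(10, 50, 100, 200)) {
  sigma_k <- Diagonal(x = movies_svd$d[1:k_try])
  R_k <- (u[, 1:k_try] %*% t(sqrt(sigma_k))) %*% (sqrt(sigma_k) %*% t(v[, 1:k_try]))
  cat("k =", k_try, "RMSE =", RMSE(as.matrix(R_k), movies_norm), "\n")
}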