Singular Value Decomposition (SVD) is a valuable tool for recommendation engines because it uncovers latent features in a ratings matrix, features generated by the collective behavior of users. We can also quantify the relative strength of these features and build approximations from only a subset of them. This dimensionality reduction can reduce overfitting and leads to more robust, compact representations of the data.
In this project, we'll illustrate the utility of SVD by first decomposing a ratings matrix and then building approximations from different numbers of singular values. By comparing the RMSE of each approximation against the original matrix, we'll see how the number of retained singular values affects the error.
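In matrix form (a standard formulation, not specific to any library), SVD factors a ratings matrix M into three components, and a rank-k approximation keeps only the k largest singular values:

M = U \Sigma V^T, \qquad M_k = U_k \Sigma_k V_k^T

where \Sigma is diagonal and holds the singular values in decreasing order, and U_k, \Sigma_k, and V_k retain only the first k columns (or diagonal entries).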
library(recommenderlab)
library(tidyverse)
As a first step, we'll create a hypothetical 5 x 5 ratings matrix, covering 5 users and 5 items.
set.seed(1)
# Note: sample() draws only 20 values for a 25-cell matrix, so matrix()
# recycles them; column 5 repeats column 1, making the matrix rank 4.
sample_m <- matrix(sample(1:20, 20), 5, 5)
sample_m
[,1] [,2] [,3] [,4] [,5]
[1,] 4 19 18 15 4
[2,] 7 11 5 12 7
[3,] 1 17 9 10 1
[4,] 2 14 16 20 2
[5,] 13 3 6 8 13
We can now decompose the matrix. The values of the diagonal matrix sigma (returned as d) are sorted in decreasing order and vary widely in magnitude. This magnitude represents the strength of each latent feature and will serve as a guide for deciding which features can be removed. Note that the fifth singular value is effectively zero: since column 5 of our matrix duplicates column 1, the matrix has rank 4.
sample_m.svd <- svd(sample_m)
sample_m.svd
$d
[1] 5.204189e+01 1.788957e+01 7.289431e+00 5.241076e+00 4.369955e-16
$u
[,1] [,2] [,3] [,4] [,5]
[1,] -0.5828116 0.1819515 -0.08129541 0.63589168 -0.4650346
[2,] -0.3560074 -0.2915283 -0.33147937 -0.65266871 -0.5024091
[3,] -0.3956197 0.2960241 -0.61185202 -0.07738601 0.6127832
[4,] -0.5486571 0.2169155 0.71071652 -0.32036354 0.2101721
[5,] -0.2757714 -0.8644202 0.06349492 0.24707070 0.3341442
$v
[,1] [,2] [,3] [,4] [,5]
[1,] -0.1902556 -0.66074795 -0.1386284 0.08942665 -7.071068e-01
[2,] -0.5807542 0.32008771 -0.7479122 -0.02992771 0.000000e+00
[3,] -0.5046774 0.15460563 0.4287099 0.73321597 -2.914335e-16
[4,] -0.5793366 -0.02156999 0.4673322 -0.66746118 2.220446e-16
[5,] -0.1902556 -0.66074795 -0.1386284 0.08942665 7.071068e-01
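Before truncating anything, we can confirm that the three factors multiply back to the original matrix (a minimal check in base R; sample_m.full is a name of our own choosing):

# Full reconstruction: u %*% diag(d) %*% t(v) recovers the original matrix
sample_m.full <- sample_m.svd$u %*% diag(sample_m.svd$d) %*% t(sample_m.svd$v)
all.equal(sample_m, sample_m.full)  # should be TRUE up to floating-point tolerance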
As the check above shows, taking the product of u, d, and the transpose of v reconstructs the original matrix precisely, but this by itself isn't very helpful. Instead, we can limit our reconstruction to the largest n singular values in sigma (d). In this first example, we'll use only the first two terms.
# Keep only the two largest singular values and their associated vectors
ds <- diag(sample_m.svd$d[1:2])
us <- as.matrix(sample_m.svd$u[, 1:2])
vs <- as.matrix(sample_m.svd$v[, 1:2])
sample_m.1 <- us %*% ds %*% t(vs)
sample_m.1
[,1] [,2] [,3] [,4] [,5]
[1,] 3.6198119 18.656531 15.810422 17.501425 3.6198119
[2,] 6.9709332 9.090449 8.543992 10.846038 6.9709332
[3,] 0.4179801 13.652134 11.209451 11.813614 0.4179801
[4,] 2.8683467 17.824473 15.010080 16.458183 2.8683467
[5,] 12.9483630 3.384919 4.852122 8.648005 12.9483630
Even though we are using only two features to reconstruct the data, the values are already quite close to the originals. If we instead use the three largest singular values, the reconstruction approximates the original matrix even more closely.
# Repeat the reconstruction, now keeping the three largest singular values
ds <- diag(sample_m.svd$d[1:3])
us <- as.matrix(sample_m.svd$u[, 1:3])
vs <- as.matrix(sample_m.svd$v[, 1:3])
sample_m.2 <- us %*% ds %*% t(vs)
sample_m.2
[,1] [,2] [,3] [,4] [,5]
[1,] 3.701963 19.099742 15.556370 17.224485 3.701963
[2,] 7.305901 10.897627 7.508102 9.716825 7.305901
[3,] 1.036270 16.987862 9.297382 9.729287 1.036270
[4,] 2.150152 13.949750 17.231106 18.879300 2.150152
[5,] 12.884200 3.038754 5.050547 8.864306 12.884200
As we expect, the RMSE decreases as we add more features and would drop to zero if we included all of them, effectively reconstructing the original matrix. (Here the drop to zero already happens at four features, since the fifth singular value is zero.)
RMSE(sample_m, sample_m.1)
[1] 1.795602
RMSE(sample_m, sample_m.2)
[1] 1.048215
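To make this trend explicit, we can compute the RMSE for every possible number of retained singular values (a small sketch; rank_k_approx is a helper name of our own, not part of any package):

# Rank-k approximation assembled from a precomputed SVD
rank_k_approx <- function(s, k) {
  s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k, drop = FALSE])
}
# RMSE for k = 1..5 retained singular values; the errors should shrink toward zero
sapply(1:5, function(k) RMSE(sample_m, rank_k_approx(sample_m.svd, k)))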
More importantly, we see that SVD allows us to uncover latent features in the data. We can measure their relative strength and reconstruct the ratings using only those with the most predictive power.
Deciding how many features to keep will depend on the particular application, but a rule of thumb is to retain about 90% of the energy in sigma; that is, the sum of squares of the retained singular values should be about 90% of the total sum of squares (Leskovec, Rajaraman, & Ullman, 2014).
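For our toy matrix, the cumulative energy fraction at each cutoff is easy to compute:

# Fraction of total energy retained when keeping the first k singular values
cumsum(sample_m.svd$d^2) / sum(sample_m.svd$d^2)

Here the largest singular value alone carries about 87% of the energy, and the first two together exceed 97%, so the 90% rule would suggest keeping two features for this example.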
References
Gorakala, S. K., & Usuelli, M. (2015). Building a Recommendation System with R. Retrieved from https://learning.oreilly.com/library/view/building-a-recommendation/9781783554492/
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37. https://doi.org/10.1109/MC.2009.263
Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets. https://doi.org/10.1017/CBO9781139924801