Objective

• Your task is to implement a matrix factorization method, such as singular value decomposition (SVD) or Alternating Least Squares (ALS), in the context of a recommender system.

• You may approach this assignment in a number of ways. You are welcome to start with an existing recommender system written by yourself or someone else. Remember as always to cite your sources, so that you can be graded on what you added, not what you found.

Introduction

SVD can be thought of as a pre-processing step for feature engineering. You might easily start with thousands or millions of items, and use SVD to compress them into a much smaller set of k latent features (e.g. 20 or 70).

• This project is based on the work done in Project 2.

• In this project we will add SVD to further explore the recommender system. I have used the recommenderlab package.

The goal of this assignment is to implement a matrix factorization method, such as singular value decomposition (SVD), in the context of a recommender system. We will work with the existing recommender system we built for the previous assignment.
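As a toy illustration of the reduction described above (synthetic numbers, not the MovieLens data), a small rating matrix can be compressed to \(k = 2\) latent features with base R's svd():

# toy example: a 4 x 3 rating matrix compressed to k = 2 latent features
A <- matrix(c(5, 4, 0, 1,
              4, 5, 1, 0,
              0, 1, 5, 4), nrow = 4)
s <- svd(A)
k <- 2
A_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
round(A_k, 1)  # rank-2 approximation of A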

Loading Libraries

# load required packages
library(tidyverse)
library(kableExtra)
library(knitr)
library(recommenderlab)

Overview of dataset

There are many datasets available for recommender systems, but for the sake of simplicity I will use the MovieLens dataset. This data was collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. The dataset contains about 100,000 ratings (1-5) from 943 users on 1664 movies. Movie metadata is also provided in MovieLenseMeta. It is a built-in dataset in the recommenderlab package; we preview the first few ratings and confirm its size below.

# ratings of first five users and five movies
data("MovieLense")
data.frame(as(MovieLense, "matrix")[1:5, 1:5]) %>% 
  kable(col.names = colnames(MovieLense)[1:5]) %>% 
  kable_styling(full_width = T)
Toy Story (1995)   GoldenEye (1995)   Four Rooms (1995)   Get Shorty (1995)   Copycat (1995)
               5                  3                   4                   3                3
               4                 NA                  NA                  NA               NA
              NA                 NA                  NA                  NA               NA
              NA                 NA                  NA                  NA               NA
               4                  3                  NA                  NA               NA
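A quick check of the figures quoted above (dim() and MovieLenseMeta both come with recommenderlab):

# confirm the size of the rating matrix and peek at the movie metadata
dim(MovieLense)          # 943 users x 1664 movies
head(MovieLenseMeta, 3)  # title, year, URL, and one column per genre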

Data Exploration

In this section, we explore the dataset through visualizations. Looking at the distribution of the ratings, we can observe that a rating of 4 has the highest count. We also plot a heat map of the rating matrix to get a quick view of which users and movies have ratings.

# distribution of ratings (zero entries in the sparse matrix are unrated)
MovieLense@data %>% 
  as.vector() %>% 
  enframe(name = NULL) %>% 
  filter(value != 0) %>% 
  ggplot(aes(value)) + 
  geom_bar() +
  labs(title = "Distribution of the ratings", y = "", x = "Ratings") +
  theme_minimal()

# heatmap of the rating matrix
image(MovieLense@data, main = "Heat map of the rating matrix")
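The full matrix is hard to read at this scale. A common refinement, sketched below with arbitrary thresholds, is to restrict the plot to the most active users and most-rated movies using recommenderlab's image() method for rating matrices:

# heatmap restricted to users with > 200 ratings and movies with > 300 ratings
image(MovieLense[rowCounts(MovieLense) > 200, colCounts(MovieLense) > 300],
      main = "Heat map of the most active users and most-rated movies")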

Data Preparation

In this section we prepare the data for modeling. First, we normalize the data, because users who give high (or low) ratings to all their movies might bias the results; we remove this effect by normalizing the data so that the average rating of each user is 0. Next, we impute the missing values of each column using the mean of the respective column. Finally, we keep only the most relevant data: ratings of movies that have been viewed only a small number of times might be biased because of lack of data, and the same holds for users who have rated only a few movies, so we keep users with more than 50 ratings and movies with more than 100 ratings.

# normalize the data (center each user's ratings at 0)
ratings_movies <- normalize(MovieLense)
rcm <- colMeans(ratings_movies)
ratings_movies <- as(ratings_movies, "matrix")
# impute each missing value with the mean of its own column
na_idx <- which(is.na(ratings_movies), arr.ind = TRUE)
ratings_movies[na_idx] <- rcm[na_idx[, "col"]]
ratings_movies <-  as(ratings_movies, "realRatingMatrix")
# keep users with > 50 ratings and movies with > 100 ratings
ratings_movies <- ratings_movies[rowCounts(ratings_movies) > 50, 
                                 colCounts(ratings_movies) > 100]
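As a quick sanity check (a minimal sketch), the user means should be approximately zero immediately after normalization:

# user means after normalization should all be ~0
summary(rowMeans(normalize(MovieLense)))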

Feature Engineering

Matrix factorization describes a given matrix as a product of constituent matrices. Perhaps the best-known and most widely used matrix decomposition method is the singular value decomposition, or SVD. A popular application of SVD is dimensionality reduction: data with a large number of features, for example more features (columns) than observations (rows), can be reduced to a smaller subset of features that are most relevant to the prediction problem.

# compute SVD of the rating matrix 
ratings_svd <- as(ratings_movies, "matrix")
ratings_svd <- svd(ratings_svd)

The singular value decomposition of a rating matrix is the factorization of the matrix into the product of three matrices, \(A = U\Sigma V^T\), where (writing \(r = \min(m, n)\) for the form returned by R's svd()):
- A is the \(m \times n\) rating matrix
- U is an \(m \times r\) column-orthogonal matrix (users-to-concepts similarity matrix)
- \(\Sigma\) is an \(r \times r\) diagonal matrix of singular values (strength of each concept)
- V is an \(n \times r\) column-orthogonal matrix (items-to-concepts similarity matrix)

# get U, V, Sigma
u <- ratings_svd$u
v <- ratings_svd$v
sigma <- ratings_svd$d
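As a sanity check (a minimal sketch), the columns of U and V should be orthonormal, and \(U\Sigma V^T\) should reconstruct the rating matrix up to floating-point error:

# verify t(U) %*% U = I, t(V) %*% V = I, and A = U %*% Sigma %*% t(V)
A <- as(ratings_movies, "matrix")
max(abs(crossprod(u) - diag(ncol(u))))    # ~ 0
max(abs(crossprod(v) - diag(ncol(v))))    # ~ 0
max(abs(u %*% diag(sigma) %*% t(v) - A))  # ~ 0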

We can think of the columns of U as concepts or genres and the rows as users. Each cell tells us how strongly or weakly a user corresponds to a concept.

# inspect U
u[1:5, 1:5] %>% 
  kable(col.names = c("Concept 1", "Concept 2", "Concept 3", "Concept 4", "Concept 5")) %>%
  kable_styling(full_width = F)
Concept 1 Concept 2 Concept 3 Concept 4 Concept 5
-0.0317931 0.0367800 -0.0758131 0.0308862 -0.0127029
-0.0294429 0.0431055 0.0077715 0.0425499 -0.0083487
-0.0297605 0.0389936 0.0004914 0.0442306 -0.0016387
-0.0289623 0.0428377 0.0170361 0.0440233 -0.0087771
-0.0304693 0.0392753 -0.0492876 0.0581139 0.0253540

Similarly, we can think of the columns of V as concepts or genres and the rows as items. Each cell tells us how strongly or weakly an item corresponds to a concept.

# inspect v
v[1:5, 1:5] %>% 
  kable(col.names = c("Concept 1", "Concept 2", "Concept 3", "Concept 4", "Concept 5")) %>%
  kable_styling(full_width = F)
Concept 1 Concept 2 Concept 3 Concept 4 Concept 5
-0.0084781 0.0102579 -0.0397447 -0.0051186 0.0090736
0.0282665 0.0162188 0.0055088 0.0052262 0.0136006
0.0219882 -0.0287801 0.0318839 -0.0483796 0.0702317
0.0218338 0.0201228 -0.0573249 0.0222450 0.0024390
0.0234690 -0.0398573 0.0155170 -0.0460988 0.0151125
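To make the concepts concrete (an illustrative sketch, not part of the original analysis), we can list the movies that load most strongly on a given concept:

# movies with the largest absolute loadings on the second concept
movie_titles <- colnames(as(ratings_movies, "matrix"))
movie_titles[order(abs(v[, 2]), decreasing = TRUE)][1:5]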

A limitation of SVD is that the extracted features may not map neatly onto individual items or genres. I think we can improve the interpretability of the mapping by sorting the items by group (genre), as seen in the video.

We can think of \(\Sigma\) as the strength of the concepts. The values are sorted in descending order, and each one tells us how strong the corresponding concept is. Based on these values, we can reduce the dimensionality of our dataset.

# inspect sigma
head(sigma, 5) %>% kable(col.names = "Strength of Sigma") %>% kable_styling(full_width = F)
Strength of Sigma
338.98022
110.20649
85.36032
74.24059
66.56139
# visualize the singular values (scree plot)
plot(sigma)

A standard way to reduce the dimensionality of the three matrices is to set the smallest singular values to zero. If we set the \(s\) smallest singular values to 0, we can also eliminate the corresponding \(s\) columns of U and V. A useful rule of thumb is to retain enough singular values to make up 90% of the energy in \(\Sigma\); that is, the sum of the squares of the retained singular values should be at least 90% of the sum of the squares of all the singular values. As we can see below, setting the 300 smallest singular values to 0 still retains about 94.7% of the energy in \(\Sigma\).

# energy retained after dropping the 300 smallest singular values
a <- sum(sigma ^ 2)                           # total energy
b <- sum((sigma[1:(length(sigma) - 300)])^2)  # energy of the retained values
print(paste0("The energy in sigma after dimension reduction is ", round((b/a)*100, 2), "%"))
## [1] "The energy in sigma after dimension reduction is 94.72%"

Finally, we use the RMSE to quantify the difference between the reduced-rank reconstruction and the original matrix.
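For an \(m \times n\) matrix,

\[RMSE = \sqrt{\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\hat{a}_{ij} - a_{ij}\right)^2}\]

where \(a_{ij}\) is an entry of the original matrix and \(\hat{a}_{ij}\) is the corresponding entry of the reconstruction.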

# reconstruct the matrix from the k = 643 largest singular values
k <- length(sigma) - 300
u1 <- u[, 1:k]
d1 <- diag(sigma[1:k])
v1 <- v[, 1:k]
ratings_reduced <- u1 %*% d1 %*% t(v1)
# RMSE between the reduced and the original matrix
sqrt(mean((ratings_reduced - as(ratings_movies, "matrix"))^2, na.rm = TRUE))
## [1] 0.1561126
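To close the loop back to recommendation, the rows of the reconstructed matrix can be read as per-user scores over movies. A minimal, illustrative sketch (in practice we would also reverse the normalization and exclude movies the user has already rated):

# illustrative: the five movies with the highest reconstructed (normalized)
# scores for the first user
movie_titles <- colnames(as(ratings_movies, "matrix"))
top5 <- order(ratings_reduced[1, ], decreasing = TRUE)[1:5]
movie_titles[top5]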