Instruction

In this assignment, we will implement a matrix factorization method - such as singular value decomposition (SVD) or Alternating Least Squares (ALS) - in the context of a recommender system by starting with an existing recommender system written by ourselves.

Introduction

In this project, we will implement a matrix factorization method, singular value decomposition (SVD), in the context of a recommender system, and compare its performance with the User-Based Collaborative Filtering (UBCF) model and Item-Based Collaborative Filtering (IBCF) model evaluated with different approaches from our previous assignment.

From the brief explanation of SVD by Standford University [https://youtu.be/yLdOS6xyM_Q], in linear algebra, the singular value decomposition (SVD) is a factorization of a matrix that generalizes the eigendecomposition of a square normal matrix to any \(m*n\) matrix via an extension of the polar decomposition, which requires no missing values. The singular value decomposition of an \(m*n\) matrix \(A\) is a factorization of the form \(U \Sigma V^T\), i.e. \(A = U \Sigma V^T\), where \(U\) is an \(m*r\) orthogonal matrix, \(\Sigma\) is an \(r*r\) diagonal matrix with the \(\sigma_{ii}\) values in descending order, and \(V\) is an \(n*r\) orthogonal matrix, where \(r \le min(m,n)\) is the rank of the original matrix \(A\).

From the dimensionality reduction introduced by Standford University [https://youtu.be/c7e-D2tmRE0], SVD gives the best low rank approximation matrix \(B = USV^T\) from matrix \(A\) by keeping 80% to 90% of the sum of the squares of the singular values (\(\sum \sigma_{ii}^2\)), where \(S\) is a diagonal \(r*r\) matrix with its diagonal values \(s_{ii} = \sigma_{ii} (i=1...k)\) else \(s_{ii}=0\), which implies the formula \[\frac {\sum_{i=1}^{k}\sigma_{ii}^2} {\sum_{i=1}^{r} \sigma_{ii}^2} \approx 0.9\].

We will use one of the Jester datasets for Recommender Systems and Collaborative Filtering Research by Ken Goldberg, AUTOLab, UC Berkeley [http://eigentaste.berkeley.edu/dataset/] as our raw dataset.

Read Data

Only the first five rows of data are displayed in order to compress the size of the output file.

Data Exploration

The dataset is a matrix with dimensions 24,938 rows x 101 columns. It contains ratings from 24,938 users who have rated between 15 and 35 jokes ranging from -10.00 to +10.00, with ‘99’ corresponds to ‘null’ value. One row represents one user, while the first column gives the number of jokes rated by that user. The next 100 columns give the ratings for jokes 01-100. As the dataset is relatively large, we will randomly select 10,000 rows from the original dataset for our studies below. The first 5 rows are displayed.

After replacing the ‘99’ values with ‘NA’, we got the summary statistics of the User_Rating_Cnt and Rating after eliminating the ‘NA’ values. From the summary statistics shown below, we can see that the users have rated about 26.05 jokes in average with range (15,35) and the ratings are in the range of (-9.95, 10) with mean 0.2863.

##  User_Rating_Cnt     Rating       
##  Min.   :15.00   Min.   :-9.9500  
##  1st Qu.:21.00   1st Qu.:-4.2200  
##  Median :26.00   Median : 0.8700  
##  Mean   :26.05   Mean   : 0.2863  
##  3rd Qu.:31.00   3rd Qu.: 4.7600  
##  Max.   :35.00   Max.   :10.0000

The histogram also shows the distributions of the ratings of our dataset. It also shows the mode rating of the dataset as -0.29.

Handle the Missing Values

As singular value decomposition (SVD) requires no missing values from the dataset, we will first handle the missing values with baseline predictors.

By calculating the mean, user biases, and item biases, we can fill in the missing values with baseline predictors and have a full dataset. The first 5 rows are displayed.

We have the summary statistics of ratings of our baseline predictors dataset as below.

##  User_Rating_Cnt     Rating        
##  Min.   :15.00   Min.   :-10.0000  
##  1st Qu.:20.00   1st Qu.: -2.0170  
##  Median :24.00   Median :  0.5522  
##  Mean   :24.72   Mean   :  0.3397  
##  3rd Qu.:30.00   3rd Qu.:  2.9262  
##  Max.   :35.00   Max.   : 10.0000

From the boxplots which showing the user ratings from the original dataset with NA values omitted and the new dataset with NA values being replaced by baseline predictors, we can see that the interquartile range of the new dataset has narrowed down relatively from the original dataset. Also, the distribution of the new dataset is more normal than the original dataset by looking at their histograms of user ratings.

Singular Value Decomposition (SVD)

The singular value decomposition (SVD) of an \(m*n\) matrix \(A\) is a factorization of the form \(U \Sigma V^T\), i.e. \(A = U \Sigma V^T\), where \(U\) is an \(m*r\) orthogonal matrix, \(\Sigma\) is an \(r*r\) diagonal matrix with the \(\sigma_{ii}\) values in descending order, and \(V\) is an \(n*r\) orthogonal matrix, where \(r \le min(m,n)\) is the rank of the original matrix \(A\). In order to apply SVD to our dataset, we will first convert the dataset to matrix format.

## [1] "The rank r of our matrix is: 100"

As the rank 100 equals to our number of columns, we have \(m=10,000\) and \(r=n=100\) for our matrix \(A = U \Sigma V^T\), where \(U\) is an \(m*r\) orthogonal matrix, \(\Sigma\) is an \(n*n\) diagonal matrix with the \(\sigma_{ii}\) values in descending order, and \(V\) is an \(n*n\) orthogonal matrix.

We can then calculate the SVD using the svd function. The singular values \(\sigma_{ii}\) are plotted in a scatterplot below, which shows that the singular values are in descending order.

Dimensionality Reduction

Find k

From the dimensionality reduction introduced by Standford University [https://youtu.be/c7e-D2tmRE0], SVD gives the best low rank approximation matrix \(B = USV^T\) from matrix \(A\) by keeping 80% to 90% of the sum of the squares of the singular values (\(\sum \sigma_{ii}^2\)), where \(S\) is a diagonal \(r*r\) matrix with its diagonal values \(s_{ii} = \sigma_{ii} (i=1...k)\) else \(s_{ii}=0\), which implies the formula \[\frac {\sum_{i=1}^{k}\sigma_{ii}^2} {\sum_{i=1}^{r} \sigma_{ii}^2} \approx 0.9\]

By calculation, the value k that keeps approximately 90% of the sum of the squares of the singular values is 31. The scatterplot below shows the ratios of the changing k to all squares of singular values.

## [1] "The value k is: 31"

Reduce Matrices’ Dimensionality

As we have the value \(k=31\) by calculation, we will reduce the matrices’ dimensionality of \(A = U \Sigma V^T\) by crossing out the (k+1 and after) columns of \(U\), the (k+1 and after) singular values of \(\Sigma\), and the (k+1 and after) rows for \(V^T\) to produce \(B = USV^T\).

Best Low Rank Approximation

From the above section, we used SVD and Dimensionality Reduction to find the best rank-k approximation \(B = USV^T\) to \(A\), with rank(\(B\))=k. In this part, we will generate the matrix \(B\) using the matrices u_k, s_k, and v_k. The dimension of matrix \(B\) should match the dimension of matrix \(A\), which is 10,000 rows x 100 columns.

## [1] 10000   100

Let’s see some summary statistics and the distribution of the predicted matrix \(B\) and compare them with our original dataset w/o NA and the dataset with NA replaced by baseline predictors.

Summary Statistics
Original Dataset
Dataset with Baseline Predictors
Predicted Matrix B
Metrics Value Metrics Value Metrics Value
Min. -9.9500 Min. -10.0000 Min. -10.0000
1st Qu. -4.2200 1st Qu. -2.0170 1st Qu. -1.8845
Median 0.8700 Median 0.5522 Median 0.5120
Mean 0.2863 Mean 0.3397 Mean 0.3396
3rd Qu. 4.7600 3rd Qu. 2.9262 3rd Qu. 2.7566
Max. 10.0000 Max. 10.0000 Max. 10.0000

From the summary statistics and the boxplots below, it is obvious that the predictor matrix \(B\)’s interquartile range has narrowed down relatively from the previous two datasets. Also, the distribution of the new dataset is similar to the dataset with baseline predictors but more normal than it.

Build Recommendation Models with UBCF and IBCF

By splitting our original dataset into train and test datasets, we will implement the User-Based Collaborative Filtering (UBCF) and Item-Based Collaborative Filtering (IBCF) algorithms to the original dataset. We will also use different normalization techniques (centering and Z-score) and similarity measures (Cosine distance, Pearson correlation, and Euclidean distance).

User-Based Collaborative Filtering Models

We will create 6 models of User-Based Collaborative Filtering algorithm by using the Recommender function from the recommenderlab package with two normalization techniques (center and Z-score) and three similarity measures (Cosine distance, Pearson correlation, and Euclidean distance).

After restricting the rating boundary to (-10, 10), we calculate the accuracies of the predictions with the actual ratings given by users. The result is sorted by RMSE in ascending order.

#UBCF models
model_UBCF_CC <- Recommender(data = ui_mtrx_train, method = 'UBCF', parameter = list(normalize = "center", method="Cosine"))
model_UBCF_CP <- Recommender(data = ui_mtrx_train, method = 'UBCF', parameter = list(normalize = "center", method="Pearson"))
model_UBCF_CE <- Recommender(data = ui_mtrx_train, method = 'UBCF', parameter = list(normalize = "center", method="Euclidean"))
model_UBCF_ZC <- Recommender(data = ui_mtrx_train, method = 'UBCF', parameter = list(normalize = "Z-score", method="Cosine"))
model_UBCF_ZP <- Recommender(data = ui_mtrx_train, method = 'UBCF', parameter = list(normalize = "Z-score", method="Pearson"))
model_UBCF_ZE <- Recommender(data = ui_mtrx_train, method = 'UBCF', parameter = list(normalize = "Z-score", method="Euclidean"))

suppress_rating <- function(x, min = -10, max = 10){
  return(pmax(pmin(x, 10),-10))
  }

#predictions with boundaries set
p_UBCF_CC <- predict(model_UBCF_CC, ui_mtrx_test_known, type='ratings')
p_UBCF_CC@data@x <- pmax(pmin(p_UBCF_CC@data@x, 10),-10)

p_UBCF_CP <- predict(model_UBCF_CP, ui_mtrx_test_known, type='ratings') 
p_UBCF_CP@data@x <- pmax(pmin(p_UBCF_CP@data@x, 10),-10)

p_UBCF_CE <- predict(model_UBCF_CE, ui_mtrx_test_known, type='ratings') 
p_UBCF_CE@data@x <- pmax(pmin(p_UBCF_CE@data@x, 10),-10)

p_UBCF_ZC <- predict(model_UBCF_ZC, ui_mtrx_test_known, type='ratings') 
p_UBCF_ZC@data@x <- pmax(pmin(p_UBCF_ZC@data@x, 10),-10)

p_UBCF_ZP <- predict(model_UBCF_ZP, ui_mtrx_test_known, type='ratings') 
p_UBCF_ZP@data@x <- pmax(pmin(p_UBCF_ZP@data@x, 10),-10)

p_UBCF_ZE <- predict(model_UBCF_ZE, ui_mtrx_test_known, type='ratings') 
p_UBCF_ZE@data@x <- pmax(pmin(p_UBCF_ZE@data@x, 10),-10)

#accuracies
UBCF_Model_Metrics <- rbind(
  'UBCF_CC' = calcPredictionAccuracy(p_UBCF_CC, ui_mtrx_test_unknown),
  'UBCF_CP' = calcPredictionAccuracy(p_UBCF_CP, ui_mtrx_test_unknown),
  'UBCF_CE' = calcPredictionAccuracy(p_UBCF_CE, ui_mtrx_test_unknown),
  'UBCF_ZC' = calcPredictionAccuracy(p_UBCF_ZC, ui_mtrx_test_unknown),
  'UBCF_ZP' = calcPredictionAccuracy(p_UBCF_ZP, ui_mtrx_test_unknown),
  'UBCF_ZE' = calcPredictionAccuracy(p_UBCF_ZE, ui_mtrx_test_unknown)
) %>%
  data.frame() %>%
  rownames_to_column('Model') %>%
  arrange(RMSE)

UBCF_Model_Metrics %>%
  mutate_if(is.numeric, ~round(.,6)) %>%
  mutate(RMSE = cell_spec(RMSE, bold  = ifelse(RMSE == min(RMSE),TRUE,FALSE)),
         MSE = cell_spec(MSE, bold  = ifelse(MSE == min(MSE),TRUE,FALSE)),
         MAE = cell_spec(MAE, bold  = ifelse(MAE == min(MAE),TRUE,FALSE))
         ) %>%
  kable(escape = FALSE) %>%
  kable_styling(bootstrap_options = c('striped','bordered'), full_width = FALSE) %>%
  add_header_above(c('Comparison of User-Based Collaborative Filtering Models' = 4)) 
Comparison of User-Based Collaborative Filtering Models
Model RMSE MSE MAE
UBCF_ZE 4.575687 20.93691 3.605343
UBCF_CE 4.586729 21.038087 3.631914
UBCF_ZP 4.601771 21.176299 3.649253
UBCF_CP 4.615412 21.302023 3.675552
UBCF_ZC 4.639071 21.520984 3.671835
UBCF_CC 4.649169 21.614776 3.693617

Item-Based Collaborative Filtering Models

We will then create 6 models of Item-Based Collaborative Filtering algorithm with the same method: by using the Recommender function from the recommenderlab package with two normalization techniques (center and Z-score) and three similarity measures (Cosine distance, Pearson correlation, and Euclidean distance).

After restricting the rating boundary to (-10, 10), we calculate the accuracies of the predictions with the actual ratings given by users. The result is sorted by RMSE in ascending order.

#IBCF models
model_IBCF_CC <- Recommender(data = ui_mtrx_train, method = 'IBCF', parameter = list(normalize = "center", method="Cosine"))

model_IBCF_CP <- Recommender(data = ui_mtrx_train, method = 'IBCF', parameter = list(normalize = "center", method="Pearson"))

model_IBCF_CE <- Recommender(data = ui_mtrx_train, method = 'IBCF', parameter = list(normalize = "center", method="Euclidean"))

model_IBCF_ZC <- Recommender(data = ui_mtrx_train, method = 'IBCF', parameter = list(normalize = "Z-score", method="Cosine"))

model_IBCF_ZP <- Recommender(data = ui_mtrx_train, method = 'IBCF', parameter = list(normalize = "Z-score", method="Pearson"))

model_IBCF_ZE <- Recommender(data = ui_mtrx_train, method = 'IBCF', parameter = list(normalize = "Z-score", method="Euclidean"))


#predictions with boundaries set
p_IBCF_CC <- predict(model_IBCF_CC, ui_mtrx_test_known, type='ratings')
p_IBCF_CC@data@x <- pmax(pmin(p_IBCF_CC@data@x, 10),-10)

p_IBCF_CP <- predict(model_IBCF_CP, ui_mtrx_test_known, type='ratings') 
p_IBCF_CP@data@x <- pmax(pmin(p_IBCF_CP@data@x, 10),-10)

p_IBCF_CE <- predict(model_IBCF_CE, ui_mtrx_test_known, type='ratings') 
p_IBCF_CE@data@x <- pmax(pmin(p_IBCF_CE@data@x, 10),-10)

p_IBCF_ZC <- predict(model_IBCF_ZC, ui_mtrx_test_known, type='ratings') 
p_IBCF_ZC@data@x <- pmax(pmin(p_IBCF_ZC@data@x, 10),-10)

p_IBCF_ZP <- predict(model_IBCF_ZP, ui_mtrx_test_known, type='ratings') 
p_IBCF_ZP@data@x <- pmax(pmin(p_IBCF_ZP@data@x, 10),-10)

p_IBCF_ZE <- predict(model_IBCF_ZE, ui_mtrx_test_known, type='ratings') 
p_IBCF_ZE@data@x <- pmax(pmin(p_IBCF_ZE@data@x, 10),-10)


#accuracies
IBCF_Model_Metrics <- rbind(
  'IBCF_CC' = calcPredictionAccuracy(p_IBCF_CC, ui_mtrx_test_unknown),
  'IBCF_CP' = calcPredictionAccuracy(p_IBCF_CP, ui_mtrx_test_unknown),
  'IBCF_CE' = calcPredictionAccuracy(p_IBCF_CE, ui_mtrx_test_unknown),
  'IBCF_ZC' = calcPredictionAccuracy(p_IBCF_ZC, ui_mtrx_test_unknown),
  'IBCF_ZP' = calcPredictionAccuracy(p_IBCF_ZP, ui_mtrx_test_unknown),
  'IBCF_ZE' = calcPredictionAccuracy(p_IBCF_ZE, ui_mtrx_test_unknown)
) %>%
  data.frame() %>%
  rownames_to_column('Model') %>%
  arrange(RMSE)


IBCF_Model_Metrics %>%
  mutate_if(is.numeric, ~round(.,6)) %>%
  mutate(RMSE = cell_spec(RMSE, bold  = ifelse(RMSE == min(RMSE),TRUE,FALSE)),
         MSE = cell_spec(MSE, bold  = ifelse(MSE == min(MSE),TRUE,FALSE)),
         MAE = cell_spec(MAE, bold  = ifelse(MAE == min(MAE),TRUE,FALSE))
         ) %>%
  kable(escape = FALSE) %>%
  kable_styling(bootstrap_options = c('striped','bordered'), full_width = FALSE) %>%
  add_header_above(c('Comparison of Item-Based Collaborative Filtering Models' = 4))
Comparison of Item-Based Collaborative Filtering Models
Model RMSE MSE MAE
IBCF_CP 4.757738 22.636069 3.598812
IBCF_CE 4.777298 22.82258 3.584753
IBCF_ZP 4.790433 22.94825 3.614372
IBCF_ZE 4.793924 22.981705 3.595296
IBCF_ZC 5.79627 33.596741 4.653197
IBCF_CC 5.821297 33.887494 4.670235

Summary

The barplot below compares the accuracies between our SVD model, 6 UBCF models, and 6 IBCF models. It is sorted by RMSE in ascending order. The lower the RMSE value, the better the performance of the model. Although the UBCF models generally performs better than the IBCF models, our SVD-dimensionality-reduction model (with NA values replaced by baseline predictors) still outperforms all those 12 models, which indicates that SVD-dimensionality-reduction recommender system provides more accurate user rating predictions than collaborative filtering models.