Introduction

The goal of this assignment is to implement a matrix factorization method—such as singular value decomposition (SVD) or Alternating Least Squares (ALS)—in the context of a recommender system.

Matrix factorization, or matrix decomposition, is the factorization of a matrix into a product of two or more matrices; multiplying these factor matrices back together reproduces (or approximates) the original matrix. Matrix factorization is one of the most popular approaches to collaborative filtering and related co-clustering problems.

Singular Value Decomposition (SVD) is a matrix factorization technique used to reduce the dimensionality of the input data. The idea is to find the directions of maximum variance and retain only those that explain a considerable share of the variation in the data. While SVD can achieve very good results on dense data, in real life it does not work as well, because real-life rating data is highly sparse. The SVD function in the recommenderlab package uses column means as the default method for imputing missing values; this usually works acceptably, but the results tend to be biased.

Funk SVD implements matrix decomposition via the stochastic gradient descent optimization popularized by Simon Funk, minimizing the error on the known ratings. Funk SVD ignores the missing values and computes the latent factors using only the ratings we do know. Conceptually, it is a simple iterative optimization process that assumes a cost function and arbitrary initial values for the optimization variables. If the dimensionality of the rating matrix is very high, gradient descent usually does not perform well.
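To make these two ideas concrete, here is a minimal, self-contained sketch in base R. It is only an illustration of the concepts, not the recommenderlab implementation; the toy matrix, the learning rate gamma and the regularization lambda are made-up values. The first part builds a rank-k approximation with plain SVD after column-mean imputation; the second fits latent user and item factors with Funk-style stochastic gradient descent updates that touch only the observed ratings.

set.seed(1)

# toy 4 x 5 rating matrix with missing entries (NA = not rated); values are made up
R <- matrix(c(5, 3, NA, 1, 4,
              4, NA, NA, 1, 3,
              1, 1, NA, 5, NA,
              NA, 1, 5, 4, 2),
            nrow = 4, byrow = TRUE)

# plain SVD: impute missing values with column means, then keep the top k components
R_imp <- apply(R, 2, function(x) { x[is.na(x)] <- mean(x, na.rm = TRUE); x })
s <- svd(R_imp)
k <- 2
R_hat_svd <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])   # rank-k approximation

# Funk-style SGD: latent user/item factors fitted on the observed ratings only
P <- matrix(runif(nrow(R) * k, 0, 0.1), nrow(R), k)   # user factors
Q <- matrix(runif(ncol(R) * k, 0, 0.1), ncol(R), k)   # item factors
gamma  <- 0.015    # learning rate (illustrative value)
lambda <- 0.001    # regularization (illustrative value)
obs <- which(!is.na(R), arr.ind = TRUE)                # positions of known ratings

for (epoch in 1:200) {
  for (idx in sample(nrow(obs))) {
    u <- obs[idx, 1]; i <- obs[idx, 2]
    err <- R[u, i] - sum(P[u, ] * Q[i, ])              # error on a known rating
    P[u, ] <- P[u, ] + gamma * (err * Q[i, ] - lambda * P[u, ])
    Q[i, ] <- Q[i, ] + gamma * (err * P[u, ] - lambda * Q[i, ])
  }
}
R_hat_funk <- P %*% t(Q)   # predicted ratings for every cell, including the missing ones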

Data loading

I will be working with the MovieLens dataset. The MovieLens datasets were collected by GroupLens Research at the University of Minnesota. This particular data set has 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. It is the data set the GroupLens group recommends for research and education. It is not a static data set; the data I will be using was last updated in September 2018.

library(recommenderlab)
library(kableExtra)
library(ggplot2)
library(dplyr)
library(tidyr)
# reading data
data(MovieLense)
movieMatrix <- MovieLense

# preview: first 5 users and first 5 movies
data.frame(as(MovieLense, "matrix")[1:5, 1:5]) %>%
  kable(col.names = colnames(MovieLense)[1:5]) %>%
  kable_styling(full_width = T)
Toy Story (1995)   GoldenEye (1995)   Four Rooms (1995)   Get Shorty (1995)   Copycat (1995)
       5                  3                   4                   3                  3
       4                 NA                  NA                  NA                 NA
      NA                 NA                  NA                  NA                 NA
      NA                 NA                  NA                  NA                 NA
       4                  3                  NA                  NA                 NA
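
It can also help to check the overall size and sparsity of the rating matrix. This is an optional check (output not shown here), using recommenderlab's dim() and nratings() on the loaded matrix.

# size of the rating matrix and the share of entries that are actually rated
dim(movieMatrix)                                  # users x movies
nratings(movieMatrix)                             # number of observed ratings
nratings(movieMatrix) / prod(dim(movieMatrix))    # fill rate (sparsity check)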

Data Exploration

Let’s look at the distribution of the ratings through a visualization.

# distribution of ratings (zero entries in the sparse matrix are missing ratings, so they are excluded)
data.frame(rating = as.vector(MovieLense@data)) %>% 
  filter(rating != 0) %>% 
  ggplot(aes(rating)) + 
  geom_bar(fill = "red") +
  labs(title = "Distribution of the ratings", y = "", x = "Ratings") +
  theme_minimal()

Looks like 4 was the most popular rating value.
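
The bar chart can be backed up numerically with a quick check using recommenderlab's getRatings() (an optional step; output omitted here).

# counts of each rating value across the whole matrix
table(getRatings(MovieLense))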

Modeling

Let’s set up the evaluation scheme.

# creating evaluation scheme
set.seed(12345)
# 5-fold CV; ratings of 3 and above are considered good; 5 ratings per test user are given to the recommender to make predictions
eval_scheme <- evaluationScheme(movieMatrix, method = "cross", train = 0.9, given = 5, goodRating = 3, k = 5)
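
Printing the scheme object is a quick way to confirm the setup (optional; output not shown here).

# inspect the evaluation scheme (folds, given items, good-rating threshold)
eval_scheme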

Let’s evaluate the user-based CF approach first, using centered data and cosine similarity as the measure for finding neighbours.

param_ubcf <- list(normalize = "center", method = "Cosine")
result1 <- evaluate(eval_scheme, method = "UBCF", type = "ratings", param = param_ubcf)
## UBCF run fold/sample [model time/prediction time]
##   1  [0.02sec/1.51sec] 
##   2  [0.04sec/1.16sec] 
##   3  [0.02sec/1.12sec] 
##   4  [0.01sec/1.03sec] 
##   5  [0.02sec/1sec]
avg(result1)
##        RMSE      MSE       MAE
## res 1.19497 1.428133 0.9394618

Testing the SVD method with the following parameters: k = 100, maxiter = 100, normalize = center.

param_svd <- list(normalize = "center", maxiter = 100, k = 100)
result2 <- evaluate(eval_scheme, method = "SVD", param = param_svd, type = "ratings")
## SVD run fold/sample [model time/prediction time]
##   1  [1.65sec/0.32sec] 
##   2  [1.71sec/0.28sec] 
##   3  [1.76sec/0.18sec] 
##   4  [1.97sec/0.24sec] 
##   5  [1.73sec/0.31sec]
avg(result2)
##         RMSE      MSE       MAE
## res 1.127554 1.271983 0.8997781

Testing the Funk SVD method with the following parameters: k = 10, gamma = 0.015, lambda = 0.001, normalize = center, min_epochs = 50, max_epochs = 200.

# Funk SVD with k = 10, gamma = 0.015, lambda = 0.001, center normalization, 50-200 epochs
param_svdfunk <- list(normalize = "center", k = 10, gamma = 0.015, lambda = 0.001, min_epochs = 50, max_epochs = 200)
result3<- evaluate(eval_scheme, method = "SVDF", type = "ratings", param = param_svdfunk)
## SVDF run fold/sample [model time/prediction time]
##   1  [92.41sec/17.16sec] 
##   2  [95.44sec/16.25sec] 
##   3  [107.05sec/19.33sec] 
##   4  [115.39sec/18.21sec] 
##   5  [92.21sec/16.37sec]
avg(result3)
##         RMSE      MSE      MAE
## res 1.079125 1.165111 0.854065

Model performance summary:

summary <- rbind(avg(result1), avg(result2), avg(result3))
rownames(summary) <- c("UBCF", "SVD", "FUNK SVD")
summary
##              RMSE      MSE       MAE
## UBCF     1.194970 1.428133 0.9394618
## SVD      1.127554 1.271983 0.8997781
## FUNK SVD 1.079125 1.165111 0.8540650
# summary %>% kable() %>% kable_styling(full_width = F)

Funk SVD has the lowest RMSE compared to user-based collaborative filtering and SVD, although the differences between the three methods are small.

Let’s look at the ROC curve of all three methods for 5, 10, 15 and 20 recommendations.

algorithms <- list(
  "USER-BASED" = list(name = "UBCF", param = list(normalize = "center", method = "Cosine")),
  "SVD" = list(name = "SVD", param = list(normalize = "center", maxiter = 100, k = 100)),
  "FUNK SVD" = list(name = "SVDF", param = list(normalize = "center", k = 10, gamma = 0.015, lambda = 0.001, min_epochs = 50, max_epochs = 200))
)
results <- evaluate(eval_scheme, algorithms, n=c(5, 10, 15, 20))
## UBCF run fold/sample [model time/prediction time]
##   1  [0.01sec/1.25sec] 
##   2  [0sec/1.28sec] 
##   3  [0.01sec/13.04sec] 
##   4  [0sec/1.17sec] 
##   5  [0.02sec/1.08sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [1.7sec/0.33sec] 
##   2  [1.66sec/0.27sec] 
##   3  [1.65sec/0.2sec] 
##   4  [1.59sec/0.33sec] 
##   5  [1.65sec/0.37sec] 
## SVDF run fold/sample [model time/prediction time]
##   1  [96.75sec/16.59sec] 
##   2  [94.99sec/27.91sec] 
##   3  [99.68sec/18.32sec] 
##   4  [119.87sec/17.53sec] 
##   5  [80.85sec/14.95sec]
plot(results, annotate = 1:4, legend="topleft", main = "ROC")

As the ROC curves show, user-based CF clearly outperformed both SVD and Funk SVD in top-N recommendation accuracy, reaching roughly five times the true positive rate at comparable false positive rates.
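
A complementary view is the precision/recall curve, which recommenderlab can draw from the same evaluation results (an optional follow-up, not run above).

# precision/recall curves for the same three algorithms and n = 5, 10, 15, 20
plot(results, "prec/rec", annotate = 1:3, legend = "bottomright", main = "Precision / Recall")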

Predicting

Finally, let’s build the full model using the SVD method and make recommendations.

# splitting our data to training and testing sets
King_evaluation_scheme<- evaluationScheme(movieMatrix, method = "split", train = 0.9, given = 5, goodRating = 3)
data.training <-getData(King_evaluation_scheme, "train")
data.testing <-getData(King_evaluation_scheme, "unknown")
data.testing.known <- getData(King_evaluation_scheme, "known")

# building the SVD recommendation model
param_final <- list(normalize = "center", maxiter = 100, k = 100)
king_model <- Recommender(data.training, method = "SVD", param = param_final)
king_model
## Recommender of type 'SVD' for 'realRatingMatrix' 
## learned using 848 users.
# getting recommendations (top 10)
final_prediction<- predict (king_model, data.testing, n = 10, type = "topNList")
final_prediction@items[1]
## $`2`
##  [1] 181 745 564  15 187 182 222 504 511 420
final_prediction@ratings[1]
## $`2`
##  [1] 4.129448 4.082534 4.051517 4.041946 4.036961 4.025895 4.012446 4.003680
##  [9] 3.991674 3.972872
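
The items slot above holds column indices. To translate the top-10 list for the first test user into movie titles, the topNList object can be coerced to a list (a small optional step; the titles depend on the random split).

# movie titles corresponding to the top-10 recommendations for the first test user
as(final_prediction, "list")[[1]]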

Conclusion

User-based collaborative filtering is very slow at prediction time compared to SVD; once trained, the SVD model predicts much faster.

SVD addresses the scalability and sparsity problems posed by collaborative filtering.

However, the how and why of SVD recommending a particular item to a particular user is beyond what we do here; a solid background in the underlying mathematics would be required to demystify the theory behind it.