The goal of this assignment is to implement a matrix factorization method—such as singular value decomposition (SVD) or Alternating Least Squares (ALS)—in the context of a recommender system.
Matrix factorization, or matrix decomposition, breaks a matrix down into a product of smaller matrices (two in the case of Funk SVD, three in the case of SVD); multiplying these factors back together reproduces, or closely approximates, the original matrix. Matrix factorization is one of the most popular approaches to solving the co-clustering problem.
Singular Value Decomposition (SVD) is a matrix factorization technique for reducing the dimensionality of the input data. The idea is to find the directions of maximum variance and retain only those that explain a substantial share of the variation in the data. While SVD can achieve very good results on dense data, it does not work as well in practice, because real-life data is significantly sparse. The SVD function in the recommenderlab package uses the column mean as the default method for imputing missing values; this usually works reasonably well, but it tends to bias the results.
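To make the idea concrete, here is a minimal sketch using base R's svd() on a small dense matrix (illustrative only, not what recommenderlab does internally; keeping the largest singular values gives a low-rank approximation of the original matrix):
# full SVD of a small dense matrix: R = U %*% diag(d) %*% t(V)
R <- matrix(c(5, 3, 4,
              4, 1, 1,
              1, 1, 5), nrow = 3, byrow = TRUE)
s <- svd(R)
k <- 2  # retain only the two largest singular values
R_approx <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
round(R_approx, 2)  # rank-2 approximation of R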
Funk SVD implements matrix decomposition via the stochastic gradient descent optimization popularized by Simon Funk, minimizing the error on the known values. Funk SVD ignores the missing values and computes the latent factors using only the ratings we know. Conceptually, it is a simple iterative optimization process that assumes the existence of a cost function and arbitrary initial values for the optimization variables. If the dimensionality of the rating matrix is high, however, gradient descent usually does not perform well. A sketch of the per-rating update appears after the data description below.
I will be working with the MovieLens dataset. The MovieLens datasets were collected by GroupLens Research at the University of Minnesota. This particular data set has 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. It is the data set the GroupLens group recommends for research and education. It is not a static data set; the data I will be using was last updated in September 2018.
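Here is a minimal sketch of that per-rating update in plain R (illustrative only, not recommenderlab's internal code; U and V are the user and item latent factor matrices, gamma is the learning rate and lambda the regularization weight):
# one Funk SVD stochastic gradient step for a single observed rating r_ij
# (a sketch, not recommenderlab's implementation)
funk_sgd_step <- function(U, V, i, j, r_ij, gamma = 0.015, lambda = 0.001) {
  err   <- r_ij - sum(U[i, ] * V[j, ])  # error on the known rating
  u_old <- U[i, ]
  U[i, ] <- U[i, ] + gamma * (err * V[j, ] - lambda * U[i, ])  # update user factors
  V[j, ] <- V[j, ] + gamma * (err * u_old - lambda * V[j, ])   # update item factors
  list(U = U, V = V)
}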
library(recommenderlab)
library(kableExtra)
library(ggplot2)
library(dplyr)
library(tidyr)
# reading Data
data(MovieLense)
movieMatrix<-MovieLense
data.frame(as(MovieLense, "matrix")[1:5, 1:5]) %>% kable(col.names = colnames(MovieLense)[1:5]) %>% kable_styling(full_width = T)
Toy Story (1995) | GoldenEye (1995) | Four Rooms (1995) | Get Shorty (1995) | Copycat (1995) |
---|---|---|---|---|
5 | 3 | 4 | 3 | 3 |
4 | NA | NA | NA | NA |
NA | NA | NA | NA | NA |
NA | NA | NA | NA | NA |
4 | 3 | NA | NA | NA |
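Even this small corner of the matrix is mostly NAs. A quick sparsity check (a sketch using recommenderlab's nratings() accessor):
# fraction of user-item cells with no rating
sparsity <- 1 - nratings(MovieLense) / (nrow(MovieLense) * ncol(MovieLense))
sparsity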
Let’s look at the distribution of ratings through a visualization.
# distribution of ratings
tibble(value = as.vector(MovieLense@data)) %>%
  filter(value != 0) %>%   # zeros in the sparse matrix are missing ratings
  ggplot(aes(value)) +
  geom_bar(fill = "red") +
  labs(title = "Distribution of the ratings", y = "", x = "Ratings") +
  theme_minimal()
Looks like 4 was the most popular rating value.
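We can confirm this by tabulating the raw rating values directly (a sketch using recommenderlab's getRatings()):
# count how often each rating value occurs
table(getRatings(MovieLense))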
Let’s set up the evaluation scheme.
# creating evaluation scheme
set.seed(12345)
# 5-fold cross-validation; everything above 3 is considered a good rating;
# given = 5: five ratings per test user are kept as "known" input, the rest are withheld for evaluation
eval_scheme <- evaluationScheme(movieMatrix, method = "cross", train = 0.9, given = 5, goodRating = 3, k = 5)
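As a sanity check on the protocol, every test user in the "known" partition should hold exactly five ratings (a sketch; getData() defaults to the first fold):
# each test user keeps exactly `given = 5` ratings in the "known" set
head(rowCounts(getData(eval_scheme, "known")))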
testing user-based CF using centered data and the cosine coefficient as the similarity measure for finding neighbours
param_ubcf<-list(normalize="center", method = "Cosine")
result1<- evaluate(eval_scheme, method = "UBCF", type = "ratings", param = param_ubcf)
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/1.51sec]
## 2 [0.04sec/1.16sec]
## 3 [0.02sec/1.12sec]
## 4 [0.01sec/1.03sec]
## 5 [0.02sec/1sec]
avg(result1)
## RMSE MSE MAE
## res 1.19497 1.428133 0.9394618
testing the SVD method using the following parameters: k = 100, maxiter = 100, normalize = center
param_svd <- list(normalize = "center", maxiter = 100, k = 100)
result2<- evaluate(eval_scheme, method = "SVD", param = param_svd, type = "ratings")
## SVD run fold/sample [model time/prediction time]
## 1 [1.65sec/0.32sec]
## 2 [1.71sec/0.28sec]
## 3 [1.76sec/0.18sec]
## 4 [1.97sec/0.24sec]
## 5 [1.73sec/0.31sec]
avg(result2)
## RMSE MSE MAE
## res 1.127554 1.271983 0.8997781
testing the Funk SVD method using the following parameters: k = 10, gamma = 0.015, lambda = 0.001, normalize = center, min_epochs = 50, max_epochs = 200
param_svdfunk <- list(normalize = "center", k = 10, gamma = 0.015, lambda = 0.001, min_epochs = 50, max_epochs = 200)
result3<- evaluate(eval_scheme, method = "SVDF", type = "ratings", param = param_svdfunk)
## SVDF run fold/sample [model time/prediction time]
## 1 [92.41sec/17.16sec]
## 2 [95.44sec/16.25sec]
## 3 [107.05sec/19.33sec]
## 4 [115.39sec/18.21sec]
## 5 [92.21sec/16.37sec]
avg(result3)
## RMSE MSE MAE
## res 1.079125 1.165111 0.854065
model1 <- avg(result1)
model2 <- avg(result2)
model3 <- avg(result3)
summary <- rbind(model1, model2, model3)
rownames(summary) <- c("UBCF","SVD","FUNK SVD")
summary
## RMSE MSE MAE
## UBCF 1.194970 1.428133 0.9394618
## SVD 1.127554 1.271983 0.8997781
## FUNK SVD 1.079125 1.165111 0.8540650
# summary %>% kable() %>% kable_styling(full_width = F)
It looks like Funk SVD has the lowest RMSE compared to the user-based collaborative filtering and SVD methods, although the differences are small.
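A quick visual comparison of the three models, reshaping the summary matrix with the tidyr and ggplot2 packages loaded earlier (a sketch):
# bar chart of RMSE/MSE/MAE per model
as.data.frame(summary) %>%
  tibble::rownames_to_column("model") %>%
  pivot_longer(-model, names_to = "metric", values_to = "value") %>%
  ggplot(aes(model, value, fill = metric)) +
  geom_col(position = "dodge") +
  labs(title = "Error metrics by model", x = "", y = "") +
  theme_minimal()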
algorithms <- list(
  "USER-BASED" = list(name = "UBCF", param = list(normalize = "center", method = "Cosine")),
  "SVD" = list(name = "SVD", param = list(normalize = "center", maxiter = 100, k = 100)),
  "FUNK SVD" = list(name = "SVDF", param = list(normalize = "center", k = 10, gamma = 0.015,
                                                lambda = 0.001, min_epochs = 50, max_epochs = 200))
)
results <- evaluate(eval_scheme, algorithms, n=c(5, 10, 15, 20))
## UBCF run fold/sample [model time/prediction time]
## 1 [0.01sec/1.25sec]
## 2 [0sec/1.28sec]
## 3 [0.01sec/13.04sec]
## 4 [0sec/1.17sec]
## 5 [0.02sec/1.08sec]
## SVD run fold/sample [model time/prediction time]
## 1 [1.7sec/0.33sec]
## 2 [1.66sec/0.27sec]
## 3 [1.65sec/0.2sec]
## 4 [1.59sec/0.33sec]
## 5 [1.65sec/0.37sec]
## SVDF run fold/sample [model time/prediction time]
## 1 [96.75sec/16.59sec]
## 2 [94.99sec/27.91sec]
## 3 [99.68sec/18.32sec]
## 4 [119.87sec/17.53sec]
## 5 [80.85sec/14.95sec]
plot(results, annotate = 1:4, legend="topleft", main = "ROC")
As the ROC curves show, user-based CF outperformed both SVD and Funk SVD on top-N recommendation, with roughly five times the prediction accuracy.
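The same evaluation can also be viewed as precision/recall curves:
plot(results, "prec/rec", annotate = 1:3, legend = "bottomright")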
Finally, let’s build the model on the full training set using the SVD method and make recommendations.
# splitting our data to training and testing sets
King_evaluation_scheme<- evaluationScheme(movieMatrix, method = "split", train = 0.9, given = 5, goodRating = 3)
data.training <-getData(King_evaluation_scheme, "train")
data.testing <-getData(King_evaluation_scheme, "unknown")
data.testing.known <- getData(King_evaluation_scheme, "known")
# building user-based recommendation model
param_final <- list(normalize = "center", maxiter = 100, k = 100)
king_model <- Recommender(data.training, method = "SVD", param = param_final)
king_model
## Recommender of type 'SVD' for 'realRatingMatrix'
## learned using 848 users.
# getting recommendations (top 10)
final_prediction <- predict(king_model, data.testing, n = 10, type = "topNList")
final_prediction@items[1]
## $`2`
## [1] 181 745 564 15 187 182 222 504 511 420
final_prediction@ratings[1]
## $`2`
## [1] 4.129448 4.082534 4.051517 4.041946 4.036961 4.025895 4.012446 4.003680
## [9] 3.991674 3.972872
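The indices above are column positions in the rating matrix, so mapping them back to titles makes the list readable. We can also score the final model in the usual way, predicting from each test user's five known ratings and comparing against the withheld ones (a sketch using calcPredictionAccuracy()):
# translate the recommended item indices for the first test user into titles
colnames(MovieLense)[final_prediction@items[[1]]]
# predict ratings from the "known" ratings and score against the withheld ones
final_ratings <- predict(king_model, data.testing.known, type = "ratings")
calcPredictionAccuracy(final_ratings, data.testing)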
Collaborative filtering is very slow compared to SVD; SVD is much faster and handles the scalability and sparsity problems posed by collaborative filtering.
However, explaining how and why SVD recommends a particular item to a given user is beyond what we do here; a solid grounding in linear algebra and statistics is needed to demystify the theory behind it.