The goal of this assignment is to give you practice working with matrix factorization techniques. Your task is to implement a matrix factorization method, such as Singular Value Decomposition (SVD) or Alternating Least Squares (ALS), in the context of a recommender system.
You may approach this assignment in a number of ways. You are welcome to start with an existing recommender system written by yourself or someone else. Remember as always to cite your sources, so that you can be graded on what you added, not what you found.
SVD can be thought of as a pre-processing step for feature engineering. You might easily start with thousands or millions of items, and use SVD to create a much smaller set of “k” items (e.g. 20 or 70).
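To make the idea of reducing many items to "k" features concrete, here is a minimal sketch in base R. The toy matrix `R`, the choice `k = 2`, and the names `user_features` and `item_features` are assumptions made for illustration only; they are not part of the assignment or of the MovieLense analysis.

```r
# Toy ratings matrix: 4 users x 5 items (values are made up for illustration)
R <- matrix(c(5, 4, 0, 1, 1,
              4, 5, 1, 0, 1,
              1, 0, 5, 4, 4,
              0, 1, 4, 5, 5), nrow = 4, byrow = TRUE)

k <- 2                      # number of latent features to keep (assumed)
s <- svd(R)                 # base R SVD: R = u %*% diag(d) %*% t(v)

# Keep only the k largest singular values and their vectors
U_k <- s$u[, 1:k, drop = FALSE]
D_k <- diag(s$d[1:k], nrow = k)
V_k <- s$v[, 1:k, drop = FALSE]

user_features <- U_k %*% D_k        # each user described by k numbers
item_features <- V_k                # each item described by k numbers
R_k <- user_features %*% t(item_features)  # rank-k approximation of R

dim(user_features)
dim(item_features)
round(R_k, 2)
```

Here each of the 5 items is summarized by 2 latent features instead of 4 user ratings; on a real dataset the same step would shrink thousands of columns to a few dozen.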
For this project, I'll continue to use the MovieLense dataset from the recommenderlab package.
library(recommenderlab)
data("MovieLense")
dim(MovieLense@data)
## [1] 943 1664
ratings <- as.vector(MovieLense@data)
unique(ratings)
## [1] 5 4 0 3 1 2
table_ratings <- table(ratings)
table_ratings
## ratings
##       0       1       2       3       4       5 
## 1469760    6059   11307   27002   33947   21077
#remove 0s since these are missing data
ratings <- ratings[ratings != 0]
ratings <- factor(ratings)
# keep only users with more than 100 ratings and movies with more than 100 ratings
working_data <- MovieLense[rowCounts(MovieLense) > 100, colCounts(MovieLense) > 100]
#Normalize the data
working_data <- normalize(working_data)
# split training and test data
set.seed(100)
train_index <- evaluationScheme(working_data, method = "split", train = 0.8, given = -1, goodRating = 4, k = 20)
# set train and test sets
train <- getData(train_index, "train")
test <- getData(train_index, "known")
# Prepare evaluation set
evaluation <- getData(train_index, "unknown")
#create IBCF recommender
rec_IBCF <- Recommender(data = train, method = 'IBCF', parameter = NULL)
#predict
predict_IBCF <- predict(object = rec_IBCF, newdata = test, n = 5, type = "ratings")
calcPredictionAccuracy(x = predict_IBCF, data = evaluation, byUser = FALSE)
##      RMSE       MSE       MAE 
## 0.9927537 0.9855599 0.7845723
#create UBCF recommender
rec_UBCF <- Recommender(data = train, method = 'UBCF', parameter = NULL)
#predict
predict_UBCF <- predict(rec_UBCF, newdata = test, n = 5, type = "ratings")
calcPredictionAccuracy(x = predict_UBCF, data = evaluation, byUser = FALSE)
##      RMSE       MSE       MAE 
## 0.8865394 0.7859521 0.7166813
# create SVD recommender with k = 20 latent factors
SVD_Model <- Recommender(train, method = "SVD", parameter = list(k = 20))
# Making Predictions with newdata = test
SVD_Predict <- predict(SVD_Model, newdata = test, type = "ratings")
# Model evaluation - aggregated over all users (byUser = FALSE)
calcPredictionAccuracy(x = SVD_Predict, data = evaluation, byUser = FALSE)
##      RMSE       MSE       MAE 
## 0.8726825 0.7615747 0.6958724
Singular Value Decomposition (SVD) is a matrix decomposition method that reduces a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler.
SVD supports dimensionality reduction, which can reduce overfitting and provide robust, compact representations of the data.
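The decomposition and its truncated form can be written out as follows. The symbols are assumptions chosen for this sketch: R is an m x n ratings matrix, the sigma_i are its singular values in decreasing order, and k is the number of factors kept (k = 20 in the model above). How recommenderlab fills in missing ratings before factorizing is not covered here.

```latex
\begin{align*}
R &= U \Sigma V^{\top}
   = \sum_{i=1}^{\min(m,n)} \sigma_i \, u_i v_i^{\top},
   \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge 0 \\
R_k &= U_k \Sigma_k V_k^{\top}
   = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top} \\
\hat{r}_{ui} &= (R_k)_{ui} \\
\lVert R - R_k \rVert_F^2 &= \sum_{i > k} \sigma_i^2
\end{align*}
```

The last line is the Eckart-Young result: among all matrices of rank at most k, R_k has the smallest Frobenius-norm error. Dropping the small singular values therefore discards the least informative directions, which is why the truncated form gives a compact representation.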
As the RMSE values show, SVD (0.873) is slightly better than UBCF (0.887), and both are clearly better than IBCF (0.993).
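For a side-by-side view, the three accuracy outputs can be gathered into one table. This is only a convenience sketch: the numbers are copied by hand from the calcPredictionAccuracy() results reported above, and the data frame name `results` and the column `RMSE_gain_vs_IBCF` are assumptions introduced here.

```r
# Metrics copied from the calcPredictionAccuracy() outputs reported above
results <- data.frame(
  method = c("IBCF", "UBCF", "SVD"),
  RMSE   = c(0.9927537, 0.8865394, 0.8726825),
  MSE    = c(0.9855599, 0.7859521, 0.7615747),
  MAE    = c(0.7845723, 0.7166813, 0.6958724)
)

# Sort from best (lowest RMSE) to worst
results <- results[order(results$RMSE), ]

# Percentage reduction in RMSE relative to IBCF (hypothetical helper column)
ibcf_rmse <- results$RMSE[results$method == "IBCF"]
results$RMSE_gain_vs_IBCF <- round(100 * (1 - results$RMSE / ibcf_rmse), 1)

print(results, row.names = FALSE)
```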
SVD also has disadvantages: it can be computationally expensive, and because it operates on a fixed matrix, it is not well suited to problems that require adaptive algorithms.
Brownlee, Jason. How to Calculate the SVD from Scratch with Python. Machine Learning Mastery Pty. Ltd., 2020. https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/
Leach, Sonia. Singular Value Decomposition - A Primer. Department of Computer Science, Brown University, Providence, RI. https://pdfs.semanticscholar.org/2478/bffc2ac484695d6ce263cd36014c968e4909.pdf