Data 612 Project 3 | Matrix Factorization Methods
Assignment Instructions
Your task is to implement a matrix factorization method—such as singular value decomposition (SVD) or Alternating Least Squares (ALS)—in the context of a recommender system.
You may approach this assignment in a number of ways. You are welcome to start with an existing recommender system written by yourself or someone else. Remember as always to cite your sources, so that you can be graded on what you added, not what you found.
SVD can be thought of as a pre-processing step for feature engineering. You might easily start with thousands or millions of items, and use SVD to create a much smaller set of “k” items (e.g. 20 or 70).
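In SVD terms, the smaller set of "k" items corresponds to k latent factors. SVD factors the ratings matrix into U, a diagonal matrix of singular values, and V. Keeping only the k largest singular values gives each user a k-dimensional representation. The sketch below is a minimal base R illustration on an invented 6 x 5 toy matrix with no missing values. Real rating matrices are sparse, so they would need imputation or a dedicated factorization first.

```r
# Invented toy ratings: 6 users (rows) x 5 movies (columns), no missing values
ratings <- matrix(c(5, 4, 1, 1, 2,
                    4, 5, 1, 2, 1,
                    1, 1, 5, 4, 5,
                    2, 1, 4, 5, 4,
                    5, 5, 2, 1, 1,
                    1, 2, 5, 5, 4),
                  nrow = 6, byrow = TRUE)

# Full decomposition: ratings = s$u %*% diag(s$d) %*% t(s$v)
s <- svd(ratings)

# Keep only the k largest singular values
k <- 2
U_k <- s$u[, 1:k]
D_k <- diag(s$d[1:k], nrow = k)
V_k <- s$v[, 1:k]

# Each user is now described by k latent factors instead of 5 movie columns;
# this equals projecting the ratings onto the top-k right singular vectors
user_factors <- U_k %*% D_k
dim(user_factors)

# Rank-k reconstruction: an approximation of the original ratings
approx_ratings <- U_k %*% D_k %*% t(V_k)
round(approx_ratings, 2)
```

Because `user_factors` equals `ratings %*% V_k`, a new user's ratings can be mapped into the same k-dimensional space by multiplying them by `V_k`.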
Introduction
For this assignment, I am going to expand upon the MovieLense recommender system covered in Project 2 and compare the prediction error of user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF) against a model based on singular value decomposition (SVD).
Data Manipulation
Load the MovieLense dataset
library(recommenderlab)
set.seed(150)
data(MovieLense)
show(MovieLense)
## 943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.
Select only users who have rated more than 50 movies, and movies that have been rated more than 100 times
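The same count-based filter can be sketched in base R on a plain matrix where NA marks a missing rating. This toy illustration uses invented values. It assumes that recommenderlab's `rowCounts()` and `colCounts()` count the ratings present in each row and column.

```r
# Invented toy matrix: rows = users, columns = movies, NA = not rated
m <- matrix(c( 5, NA,  3,  4,
              NA, NA,  2, NA,
               4,  2, NA,  5,
               3,  1,  4,  4),
            nrow = 4, byrow = TRUE)

# Number of ratings per user (row) and per movie (column)
user_counts  <- rowSums(!is.na(m))
movie_counts <- colSums(!is.na(m))

# Keep users and movies with more than 2 ratings; both conditions use the
# counts from the unfiltered matrix, as in the MovieLense subset
filtered <- m[user_counts > 2, movie_counts > 2, drop = FALSE]
dim(filtered)
```

User 2 (1 rating) and movie 2 (2 ratings) fall below the threshold, leaving a 3 x 3 matrix.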
movie_ratings <- MovieLense[rowCounts(MovieLense) > 50, colCounts(MovieLense) > 100]
movie_ratings <- as.matrix(movie_ratings@data)
Convert the 0 values (missing ratings) to NA, and then replace each NA with the mean of that user's (row's) observed ratings
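The `which(..., arr.ind = TRUE)` indexing used for this step can be checked on an invented 3 x 3 matrix, where the expected result is easy to verify by hand. This is a toy sketch, not part of the analysis.

```r
# Invented toy matrix where 0 marks a missing rating
m <- matrix(c(4, 0, 2,
              0, 5, 3,
              1, 1, 0),
            nrow = 3, byrow = TRUE)

m[m == 0] <- NA                                # treat 0 as missing
idx <- which(is.na(m), arr.ind = TRUE)         # (row, col) of each missing cell
m[idx] <- rowMeans(m, na.rm = TRUE)[idx[, 1]]  # fill with that row's mean
m
```

Row 1 has mean (4 + 2) / 2 = 3, so its missing cell becomes 3. Rows 2 and 3 are filled with 4 and 1. A row with no observed ratings would be filled with NaN, but the earlier filter on users with more than 50 ratings rules that out.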
movie_ratings[movie_ratings == 0] <- NA
value <- which(is.na(movie_ratings), arr.ind = TRUE)
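movie_ratings[value] <- rowMeans(movie_ratings, na.rm = TRUE)[value[,1]]
Split the data into training and test sets (for reference only; the model evaluation below uses the separate 90/10 split created by evaluationScheme)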
train <- sample(x = c(TRUE, FALSE), size = nrow(movie_ratings), replace = TRUE, prob = c(0.8, 0.2))
training_data <- movie_ratings[train, ]
test_data <- movie_ratings[!train, ]
print(nrow(training_data))
## [1] 449
print(nrow(test_data))
## [1] 111
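Each user is assigned to the training set independently with probability 0.8, so the split is only approximately 80/20. In the run above, 449 of 560 users (about 80.2%) went to training. If an exact proportion is preferred, sampling row indices is one alternative. The sketch below is standalone, uses an invented 560-row matrix, and is not the code used in this analysis.

```r
# Standalone sketch with an invented 560-row stand-in for the ratings matrix;
# note that sample() advances the random-number state
toy_ratings <- matrix(seq_len(560 * 3), nrow = 560)

n_train   <- round(0.8 * nrow(toy_ratings))   # exactly 448 rows
train_idx <- sample(seq_len(nrow(toy_ratings)), n_train)

toy_train <- toy_ratings[train_idx, ]
toy_test  <- toy_ratings[-train_idx, ]
c(nrow(toy_train), nrow(toy_test))
```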
Normalize the data
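Suppose `normalize()` here is recommenderlab's method with its default `method = "center"`. In that case, each user's mean rating is subtracted from that user's ratings, so every rating is expressed relative to how generously that user rates overall. The sketch below shows that row centering in base R on an invented matrix where NA marks an unrated movie.

```r
# Invented toy ratings; NA = not rated
ratings <- matrix(c(5,  3, 4,
                    2, NA, 4,
                    1,  1, 4),
                  nrow = 3, byrow = TRUE)

# Subtract each user's (row's) mean rating, ignoring missing entries
row_means <- rowMeans(ratings, na.rm = TRUE)
centered  <- sweep(ratings, 1, row_means)
centered
```

After centering, every user's observed ratings average to zero, and missing entries stay NA.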
normalized_data <- normalize(movie_ratings)
Model Evaluation
movie_data <- as(normalized_data, 'realRatingMatrix')
e <- evaluationScheme(movie_data, method = 'split', train = 0.9, given = 15, goodRating = 3, k = 10)
ubcf <- Recommender(getData(e, 'train'), 'UBCF')
ibcf <- Recommender(getData(e, 'train'), 'IBCF')
svd_model <- Recommender(getData(e, 'train'), 'SVD')
predict_ubcf <- predict(ubcf, getData(e, 'known'), type = 'ratings')
predict_ibcf <- predict(ibcf, getData(e, 'known'), type = 'ratings')
predict_svd <- predict(svd_model, getData(e, 'known'), type = 'ratings')
ubcf_error <- calcPredictionAccuracy(predict_ubcf, getData(e, 'unknown'))
ibcf_error <- calcPredictionAccuracy(predict_ibcf, getData(e, 'unknown'))
svd_error <- calcPredictionAccuracy(predict_svd, getData(e, 'unknown'))
error <- rbind(ubcf_error, ibcf_error, svd_error)
rownames(error) <- c('UBCF','IBCF', 'SVD')
error
##          RMSE      MSE       MAE
## UBCF 1.029016 1.058874 0.6044413
## IBCF 1.294613 1.676023 0.7280877
## SVD  1.026054 1.052788 0.5576620
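`calcPredictionAccuracy()` summarizes how far the predicted ratings are from the held-out ("unknown") ratings. The three columns follow the standard definitions, sketched here in base R on invented vectors. This shows the formulas, not recommenderlab's internal code. MSE is the square of RMSE, which is why the MSE column ranks the models the same way.

```r
# Invented predicted and held-out ratings for four user-movie pairs
predicted <- c(3.5, 4.0, 2.0, 5.0)
actual    <- c(4.0, 3.0, 2.0, 4.0)

residual <- predicted - actual
mse  <- mean(residual^2)      # mean squared error
rmse <- sqrt(mse)             # root mean squared error
mae  <- mean(abs(residual))   # mean absolute error
c(RMSE = rmse, MSE = mse, MAE = mae)
# RMSE = 0.75, MSE = 0.5625, MAE = 0.625
```

RMSE squares each error, so it weights large misses more heavily than MAE does. This is one reason the two metrics can separate models by different margins.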
Prediction Error Comparison
barplot(error,
        main = "Prediction Error Comparison",
        ylab = 'Error',
        xlab = 'Error Type',
        col = c('RoyalBlue', 'Tomato', 'YellowGreen'),
        legend.text = rownames(error),
        beside = TRUE)
Summary
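Singular value decomposition (SVD) is a matrix factorization method that decomposes a matrix into the product of simpler constituent matrices. This makes it possible to approximate the ratings matrix with a small number of latent factors. In the results above, the SVD-based recommender has the lowest RMSE, MSE, and MAE of the three models. Its advantage over UBCF is marginal on RMSE (1.026 vs. 1.029) and somewhat larger on MAE (0.558 vs. 0.604). Both models clearly outperform IBCF (RMSE 1.295). The small gap between SVD and UBCF may be due to an error somewhere in the pipeline or to the relatively small size of the filtered dataset. The row-mean imputation step may also contribute. Because every missing rating was filled with the user's mean before evaluation, many of the held-out ratings the models are scored against may be imputed values rather than actual ratings, which can narrow the differences between models.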
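In matrix form, the decomposition can be written as follows. Here A is the m x n user-by-movie ratings matrix (with missing entries filled in), r is its rank, and the singular values σ_1 ≥ … ≥ σ_r > 0 sit on the diagonal of Σ. Truncating to the k largest singular values gives the rank-k approximation that SVD-based recommenders typically use to estimate the rating of user i for movie j. By the Eckart–Young theorem, it is the closest rank-k matrix to A in the Frobenius norm.

```latex
A = U \Sigma V^\top, \qquad U \in \mathbb{R}^{m \times r},\ \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r),\ V \in \mathbb{R}^{n \times r}

A_k = U_k \Sigma_k V_k^\top = \sum_{\ell=1}^{k} \sigma_\ell \, \mathbf{u}_\ell \mathbf{v}_\ell^\top, \qquad \hat{r}_{ij} = (A_k)_{ij}

\lVert A - A_k \rVert_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}
```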