DATA 612 - Project 3

PROJECT 3 - Matrix Factorization Methods

The goal of this assignment is to give you practice working with matrix factorization techniques. Your task is to implement a matrix factorization method, such as Singular Value Decomposition (SVD) or Alternating Least Squares (ALS), in the context of a recommender system.

You may approach this assignment in a number of ways. You are welcome to start with an existing recommender system written by yourself or someone else. Remember as always to cite your sources, so that you can be graded on what you added, not what you found.

SVD can be thought of as a pre-processing step for feature engineering. You might easily start with thousands or millions of items, and use SVD to reduce them to a much smaller set of “k” latent features (e.g. 20 or 70).
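As a minimal sketch of this idea, the base R function svd() can reduce a small ratings matrix to k features per user and per item. The matrix ratings_toy and the choice k = 2 are invented for illustration and are not part of the project data.

```r
# Hypothetical toy ratings matrix: 6 users x 5 items (values invented for illustration).
set.seed(1)
ratings_toy <- matrix(sample(1:5, 30, replace = TRUE), nrow = 6, ncol = 5)

k <- 2                      # number of latent features to keep (assumed)
s <- svd(ratings_toy)       # ratings_toy = U %*% diag(d) %*% t(V)

# Keep only the k largest singular values and their singular vectors.
U_k <- s$u[, 1:k, drop = FALSE]
D_k <- diag(s$d[1:k], nrow = k)
V_k <- s$v[, 1:k, drop = FALSE]

user_features <- U_k %*% D_k      # 6 x k: each user described by k numbers
item_features <- V_k              # 5 x k: each item described by k numbers

# Rank-k approximation of the original matrix.
approx_k <- user_features %*% t(item_features)
dim(user_features)
dim(item_features)
```

The squared reconstruction error of approx_k equals the sum of the squared singular values that were dropped, so a larger k gives a closer approximation at the cost of a less compact representation.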

Load Data

For this project, I'll again use the MovieLense dataset from the recommenderlab package.

library(recommenderlab)
data("MovieLense")

dim(MovieLense@data)

## [1]  943 1664

Explore and Clean Ratings

ratings <- as.vector(MovieLense@data)
unique(ratings)

## [1] 5 4 0 3 1 2

table_ratings <- table(ratings)
table_ratings

## ratings
##       0       1       2       3       4       5 
## 1469760    6059   11307   27002   33947   21077

# Remove 0s, since these mark missing ratings
ratings <- ratings[ratings != 0]
ratings <- factor(ratings)

Prepare and Normalize the Data

working_data <- MovieLense[rowCounts(MovieLense) > 100, colCounts(MovieLense) > 100]

# Normalize the data (by default, center each user's ratings on that user's mean)
working_data <- normalize(working_data)
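To illustrate what centering does, here is a base R sketch on a tiny matrix. The matrix toy is invented for illustration, and the code mimics row-wise mean centering rather than reproducing the internals of normalize().

```r
# Hypothetical toy ratings: rows are users, columns are items, NA = not rated.
toy <- matrix(c(5, 3, NA,
                4, NA, 2,
                NA, 1, 3),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("u", 1:3), paste0("i", 1:3)))

user_means <- rowMeans(toy, na.rm = TRUE)   # mean over rated items only
centered   <- sweep(toy, 1, user_means)     # subtract each user's mean from their row

centered
rowMeans(centered, na.rm = TRUE)            # each user's centered ratings average to 0
```

Centering removes the effect of users who rate everything high or everything low, so the remaining values express how much a user liked an item relative to their own average.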

Split Data Into Training and Test Sets

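# Split training and test data: 80% of users go to training. For each test
# user, all but one rating (given = -1) is shown to the model and the held-out
# rating is used for scoring. Ratings of 4 or more count as good.
# k = 20 requests 20 repeated splits; getData() below returns the first run.
set.seed(100)

train_index <- evaluationScheme(working_data, method = "split", train = 0.8, given = -1, goodRating = 4, k = 20)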


# set train and test sets  

train <- getData(train_index, "train")
test <- getData(train_index, "known")

# Prepare evaluation set (the held-out ratings)
evaluation <- getData(train_index, "unknown")
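The all-but-one scheme requested by given = -1 can be sketched in base R for a single test user. The vector user_ratings is invented for illustration, and this is only a picture of the idea, not the code evaluationScheme() runs.

```r
set.seed(100)
# Hypothetical ratings for one test user (NA = not rated).
user_ratings <- c(i1 = 4, i2 = NA, i3 = 5, i4 = 2, i5 = NA, i6 = 3)

rated <- which(!is.na(user_ratings))
held  <- rated[sample.int(length(rated), 1)]   # withhold exactly one rated item

known   <- user_ratings
known[held] <- NA        # what the model is shown ("known")

unknown <- user_ratings
unknown[-held] <- NA     # what the prediction is scored against ("unknown")

known
unknown
```

The model predicts from the known vector, and the single value left in unknown is compared with the prediction for that item.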

IBCF - Item-Based Collaborative Filtering

#create IBCF recommender
rec_IBCF <- Recommender(data = train, method = 'IBCF', parameter = NULL)

#predict
predict_IBCF <- predict(object = rec_IBCF, newdata = test, n = 5, type = "ratings")

calcPredictionAccuracy(x = predict_IBCF, data = evaluation, byUser = FALSE)

##      RMSE       MSE       MAE 
## 0.9927537 0.9855599 0.7845723

UBCF - User-Based Collaborative Filtering

#create UBCF recommender
rec_UBCF <- Recommender(data = train, method = 'UBCF', parameter = NULL)

#predict
predict_UBCF <- predict(rec_UBCF, newdata = test, n = 5, type = "ratings") 

calcPredictionAccuracy(x = predict_UBCF, data = evaluation, byUser = FALSE)

##      RMSE       MSE       MAE 
## 0.8865394 0.7859521 0.7166813

Build SVD Model

SVD_Model <- Recommender(train, method = "SVD", parameter = list(k = 20))

# Making Predictions with newdata = test
SVD_Predict <- predict(SVD_Model, newdata = test, type = "ratings")

# Model evaluation, aggregated over all users (byUser = FALSE)
calcPredictionAccuracy(x = SVD_Predict, data = evaluation, byUser = FALSE)

##      RMSE       MSE       MAE 
## 0.8726825 0.7615747 0.6958724
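The three error measures reported for each model follow their standard definitions, which can be checked by hand in base R. The vectors actual and predicted are invented for illustration and are not taken from the models above.

```r
# Hypothetical held-out ratings and model predictions (values invented for illustration).
actual    <- c(4, 3, 5, 2, 4)
predicted <- c(3.5, 3.0, 4.0, 2.5, 5.0)

err  <- predicted - actual
mse  <- mean(err^2)       # mean squared error
rmse <- sqrt(mse)         # root mean squared error, in rating units
mae  <- mean(abs(err))    # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```

Because RMSE squares the errors before averaging, it penalizes a few large misses more heavily than MAE does, which is why the two measures are usually reported together.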

Summary:

The Singular Value Decomposition (SVD) is a matrix decomposition method that reduces a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler.
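The constituent parts are two matrices of singular vectors and a vector of singular values. A short base R check, on a matrix A invented for illustration, shows that they multiply back to the original matrix:

```r
# Hypothetical 4 x 3 matrix (values invented for illustration).
A <- matrix(c(5, 3, 1,
              4, 2, 1,
              1, 1, 5,
              2, 1, 4),
            nrow = 4, byrow = TRUE)

s <- svd(A)   # list with components d (singular values), u and v (singular vectors)

# The three constituent parts multiply back to A.
A_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
all.equal(A, A_rebuilt)

# Columns of U and V are orthonormal; singular values are non-negative and sorted.
round(crossprod(s$u), 10)
round(crossprod(s$v), 10)
s$d
```

The orthonormal columns are what simplify later calculations: projecting onto them or inverting them only requires a transpose.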

SVD performs dimensionality reduction, which can reduce overfitting and yields a robust, compact representation of the data.

As the RMSE values show, SVD (0.873) is slightly better than UBCF (0.887), and both are clearly better than IBCF (0.993).

SVD also has disadvantages: it can be computationally expensive, and because it operates on a fixed matrix, it is poorly suited to problems that require adaptive algorithms.

References:

Brownlee, Jason. How to Calculate the SVD from Scratch with Python. Machine Learning Mastery Pty. Ltd., 2020. https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/

Leach, Sonia. Singular Value Decomposition - A Primer. Department of Computer Science, Brown University, Providence, RI. https://pdfs.semanticscholar.org/2478/bffc2ac484695d6ce263cd36014c968e4909.pdf