Data 612 Project 3 | Matrix Factorization Methods

Assignment Instructions

Your task is to implement a matrix factorization method—such as singular value decomposition (SVD) or Alternating Least Squares (ALS)—in the context of a recommender system.

You may approach this assignment in a number of ways. You are welcome to start with an existing recommender system written by yourself or someone else. Remember as always to cite your sources, so that you can be graded on what you added, not what you found.

SVD can be thought of as a pre-processing step for feature engineering. You might easily start with thousands or millions of items, and use SVD to create a much smaller set of “k” items (e.g. 20 or 70).
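To make the reduction to "k" features concrete, here is a minimal sketch using base R's svd() on a small made-up ratings matrix. The matrix values and the choice of k = 2 are assumptions for illustration only and are not taken from the MovieLense data.

```r
# Toy ratings matrix: 4 users x 5 items (values are made up for illustration)
ratings <- matrix(c(5, 4, 1, 1, 2,
                    4, 5, 2, 1, 1,
                    1, 1, 5, 4, 4,
                    2, 1, 4, 5, 5),
                  nrow = 4, byrow = TRUE)

# Full decomposition: ratings = U %*% diag(d) %*% t(V)
decomp <- svd(ratings)

# Keep only the k largest singular values and their vectors
k <- 2
approx_k <- decomp$u[, 1:k] %*% diag(decomp$d[1:k], nrow = k) %*% t(decomp$v[, 1:k])

# Each user is now described by k latent features instead of 5 item columns
user_features <- decomp$u[, 1:k] %*% diag(decomp$d[1:k], nrow = k)

# Share of the total squared singular values kept by the rank-k approximation
retained <- sum(decomp$d[1:k]^2) / sum(decomp$d^2)
retained
```

The closer retained is to 1, the less information is lost by working with k columns instead of the full item set.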

Introduction

For this assignment, I am going to expand upon the MovieLense recommender system covered in Project 2 and compare the error rates of user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF) against those of a model fitted with SVD.

Data Manipulation

Load the MovieLense dataset

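library(recommenderlab)  # provides MovieLense, realRatingMatrix, and Recommender()
set.seed(150)            # fix the random seed so the splits are reproducible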
data(MovieLense)
show(MovieLense)

## 943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.

Select only users who have rated more than 50 movies, and movies that have been rated more than 100 times

movie_ratings <- MovieLense[rowCounts(MovieLense) > 50, colCounts(MovieLense) > 100]
movie_ratings <- as.matrix(movie_ratings@data)

Convert 0 values (missing data) to NA values, and then replace the NA values with the corresponding row (user) means

movie_ratings[movie_ratings == 0] <- NA
value <- which(is.na(movie_ratings), arr.ind = TRUE)
movie_ratings[value] <- rowMeans(movie_ratings, na.rm = TRUE)[value[,1]]
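The indexing in the imputation step is compact, so here is the same pattern on a small made-up matrix where the result can be checked by hand. The matrix toy and its values are assumptions for illustration.

```r
# Toy matrix with missing ratings (values are made up for illustration)
toy <- matrix(c(4, NA, 2,
                NA, 5, 3),
              nrow = 2, byrow = TRUE)

# (row, col) position of every missing entry
missing_idx <- which(is.na(toy), arr.ind = TRUE)

# Per-row means over the observed ratings only: 3 for row 1, 4 for row 2
row_means <- rowMeans(toy, na.rm = TRUE)

# Fill each missing cell with the mean of its own row
toy[missing_idx] <- row_means[missing_idx[, 1]]
toy
```

Indexing row_means by the first column of missing_idx picks, for each missing cell, the mean of the row that cell belongs to.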

Split the data into training and test sets by assigning each user (row) to the training set with probability 0.8

train <- sample(x = c(TRUE, FALSE), size = nrow(movie_ratings), replace = TRUE, prob = c(0.8, 0.2))
training_data <- movie_ratings[train, ]
test_data <- movie_ratings[!train, ]

print(nrow(training_data))

## [1] 449

print(nrow(test_data))

## [1] 111
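The two row counts above sum to 560, and because each row is assigned independently with probability 0.8, the split is only approximately 80/20. If an exact proportion is wanted, one alternative (not the approach used in this report) is to draw a fixed number of row indices without replacement. The variable names below are hypothetical.

```r
set.seed(150)  # seed reused from the report; any fixed seed works

n_rows <- 560                                  # 449 + 111 rows reported above
n_train <- floor(0.8 * n_rows)                 # exact training-set size

# Draw distinct row indices for the training set
train_idx <- sample(n_rows, size = n_train)

# Logical mask in the same shape as the report's `train` vector
is_train <- seq_len(n_rows) %in% train_idx

sum(is_train)    # number of training rows
sum(!is_train)   # number of test rows
```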

Normalize the data

normalized_data <- normalize(movie_ratings)
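As a rough illustration of what normalization does, the sketch below assumes it means row centering, that is, subtracting each user's mean rating, which is the default behaviour of recommenderlab's normalize(). It uses base R on a small made-up matrix rather than the report's data.

```r
# Toy matrix after row-mean imputation (values are made up for illustration)
toy <- matrix(c(4, 3, 2,
                4, 5, 3),
              nrow = 2, byrow = TRUE)

# Subtract each row's mean from every entry in that row
centered <- sweep(toy, 1, rowMeans(toy), FUN = "-")
centered

rowMeans(centered)  # every row now averages to zero
```

One consequence worth noting: any cell that was imputed with its row mean becomes exactly zero after centering, so imputed entries carry no signal about that user's preferences.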

Model Evaluation

movie_data <- as(normalized_data, 'realRatingMatrix')

e <- evaluationScheme(movie_data, method = 'split', train = 0.9, given = 15, goodRating = 3, k = 10)

ubcf <- Recommender(getData(e, 'train'), 'UBCF')
ibcf <- Recommender(getData(e, 'train'), 'IBCF')
svd_model <- Recommender(getData(e, 'train'), 'SVD')

predict_ubcf <- predict(ubcf, getData(e, 'known'), type = 'ratings')
predict_ibcf <- predict(ibcf, getData(e, 'known'), type = 'ratings')
predict_svd <- predict(svd_model, getData(e, 'known'), type = 'ratings')

ubcf_error <- calcPredictionAccuracy(predict_ubcf, getData(e, 'unknown'))
ibcf_error <- calcPredictionAccuracy(predict_ibcf, getData(e, 'unknown'))
svd_error <- calcPredictionAccuracy(predict_svd, getData(e, 'unknown'))

error <- rbind(ubcf_error, ibcf_error, svd_error)
rownames(error) <- c('UBCF','IBCF', 'SVD')

error

##          RMSE      MSE       MAE
## UBCF 1.029016 1.058874 0.6044413
## IBCF 1.294613 1.676023 0.7280877
## SVD  1.026054 1.052788 0.5576620
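For reference, the three columns in this table are standard error measures over the held-out ratings. The sketch below computes them by hand on made-up vectors; actual and predicted are hypothetical and not drawn from the evaluation above.

```r
# Hypothetical held-out ratings and model predictions (made up for illustration)
actual    <- c(4, 3, 5, 2)
predicted <- c(3.5, 3, 4, 3)

residuals <- actual - predicted   # 0.5, 0, 1, -1

mse  <- mean(residuals^2)         # mean squared error
rmse <- sqrt(mse)                 # root mean squared error
mae  <- mean(abs(residuals))      # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```

Because RMSE is the square root of MSE, the two always rank models in the same order; MAE can rank them differently because it penalizes large errors less heavily.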

Prediction Error Comparison

barplot(error,
        main = "Prediction Error Comparison",
        ylab = 'Error Rate',
        xlab = 'Error Type',
        col=c('RoyalBlue','Tomato', 'YellowGreen'),
        legend = rownames(error),
        beside = TRUE)

Summary

Singular value decomposition (SVD) is a matrix factorization method that breaks a matrix into its constituent parts to make matrix calculations simpler. In the results above, the recommender system using the SVD model has a lower RMSE, MSE, and MAE than both collaborative filtering models. The margin over UBCF is small (RMSE 1.026 vs. 1.029, MAE 0.558 vs. 0.604), while the gap to IBCF is larger (RMSE 1.295). The narrow margin over UBCF may be due to an error in the calculations somewhere along the line, or to the fact that the dataset is relatively small.