Assignment

The goal of this assignment is to give you practice working with matrix factorization techniques.

Your task is to implement a matrix factorization method, such as singular value decomposition (SVD) or alternating least squares (ALS), in the context of a recommender system.

You may approach this assignment in a number of ways. You are welcome to start with an existing recommender system, whether written by yourself or someone else. Remember, as always, to cite your sources so that you can be graded on what you added, not what you found.

SVD can be thought of as a pre-processing step for feature engineering. You might easily start with thousands or millions of items and use SVD to create a much smaller set of \(k\) latent features (e.g., 20 or 70).

Introduction and Motivation

We will use Singular Value Decomposition (SVD) to factor the user-item matrix and reduce the dimensionality of the latent factor space. Based on the results of our last project, we will use the built-in user-based collaborative filtering (UBCF) recommender from the recommenderlab package to train and evaluate models across varying reduction sizes \(k\).

Technique

In Singular Value Decomposition (SVD), the user-item matrix \(A\) (with \(m\) users and \(n\) items) is factored into three new matrices: \[A = U \Sigma V^T\]

where:
\(U\) is an \(m \times r\) user-to-concept similarity matrix that measures each user’s affinity to the latent factors (characteristics),
\(\Sigma\) is an \(r \times r\) diagonal concept matrix that describes the strength of each latent factor,
\(V\) is an \(n \times r\) item-to-concept similarity matrix that measures each item’s similarity to the latent factors.

The goal will be to reduce \(r\), the number of latent factors used to develop the ratings, to varying sizes \(k\). For each value of \(k\), we will calculate and compare the RMSE of the rating predictions. Ultimately, this reduction of the feature space removes the “noisy” components that contribute little to the final prediction.

Data Source

We will use the MovieLense dataset from the recommenderlab library. Each row represents a user and each column represents a movie. Since this dataset is very sparse (most users have rated only a few of the movies), we will subset the data to include only those users who have rated at least 50 movies and only those movies that are rated by at least 100 users.

Since successful application of SVD requires a matrix with no missing values, we will impute all missing entries in the dataset with the median rating for that movie.

Data Cleansing

Before applying SVD to the user-item matrix, we will need to (1) load the data, (2) subset it to the relevant records, and (3) impute missing values.

Load Data

First, we will load the MovieLense dataset and subset it to users who have rated at least 50 movies and movies that have at least 100 ratings. The resulting user-item matrix has 560 users and 332 movies. It contains 55,298 ratings, so many entries in the matrix are left blank.
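A minimal sketch of that loading and filtering step, using recommenderlab’s rowCounts() and colCounts() (the object name `movies` is our own):

```r
# Sketch: load MovieLense and keep heavily-rated users and movies.
library(recommenderlab)
data(MovieLense)

# Users with at least 50 ratings; movies rated by at least 100 users
movies <- MovieLense[rowCounts(MovieLense) >= 50,
                     colCounts(MovieLense) >= 100]
movies
```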

## 560 x 332 rating matrix of class 'realRatingMatrix' with 55298 ratings.

Data Transformation

Next, we will impute the missing values using the median rating of each movie. Choosing the median keeps the ratings on a discrete scale. We will use the methodology for replacing nulls with the column medians from this Stack Overflow post.
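A sketch of that imputation, done on a base matrix and converted back (variable names are our own):

```r
# Sketch: replace each movie's missing ratings with that movie's median.
m <- as(movies, "matrix")   # realRatingMatrix -> base matrix with NAs

for (j in seq_len(ncol(m))) {
  med <- median(m[, j], na.rm = TRUE)
  m[is.na(m[, j]), j] <- med
}

movies_imputed <- as(m, "realRatingMatrix")
movies_imputed
```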

## 560 x 332 rating matrix of class 'realRatingMatrix' with 185920 ratings.

We can see that our matrix now contains 185,920 ratings (\(560 \times 332 = 185{,}920\)), which means that our imputation successfully replaced all missing values.

Singular Value Decomposition - SVD

We can now perform SVD on our matrix and examine the effect of varying the number of latent factors \(k\). We want to reduce the matrix to the factors that account for \(90\%\) of the variability in the data. In essence, the reduced matrix will represent our “predictions” for the ratings, and we can examine how good these predictions are by comparing them to the actual user-item matrix.

First, we’ll normalize the data and perform SVD on our matrix. We will also create a table of the variance explained at each value of \(k\).
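A sketch of that step with recommenderlab’s normalize() and base R’s svd() (object names are assumptions):

```r
# Sketch: center the ratings, factor with svd(), and tabulate variance.
m_centered <- as(normalize(movies_imputed), "matrix")

s <- svd(m_centered)   # s$u, s$d (singular values), s$v

# Variance explained by each factor is proportional to the squared
# singular values
variance  <- s$d^2 / sum(s$d^2)
var_table <- data.frame(k          = seq_along(variance),
                        variance   = variance,
                        cumulative = cumsum(variance))
head(var_table)
```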

Visualize RMSE & Variability

Next, we’ll define a function to calculate the RMSE between the reduced matrix and the actual ratings. For each value of \(k\), we can visualize the RMSE.
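One way to write that helper is to reconstruct a rank-\(k\) approximation from the SVD factors and compare it to the centered ratings (a sketch; names are our own):

```r
# Sketch: RMSE between the rank-k reconstruction and the centered ratings.
rmse_at_k <- function(s, actual, k) {
  approx <- s$u[, 1:k, drop = FALSE] %*%
            diag(s$d[1:k], nrow = k) %*%
            t(s$v[, 1:k, drop = FALSE])
  sqrt(mean((actual - approx)^2))
}

ks     <- seq(10, 300, by = 10)
rmse_k <- sapply(ks, function(k) rmse_at_k(s, m_centered, k))
plot(ks, rmse_k, type = "b",
     xlab = "k (number of latent factors)", ylab = "RMSE")
```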

As expected, as the number of latent factors increases, the RMSE decreases.

We can also visualize the cumulative variance for increasing dimensionality of the latent factors.
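A sketch of that plot, together with the check for the \(90\%\) threshold, using the variance table from above:

```r
# Sketch: cumulative variance curve and the smallest k reaching 90%.
plot(var_table$k, var_table$cumulative, type = "l",
     xlab = "k (number of latent factors)",
     ylab = "cumulative proportion of variance")
abline(h = 0.9, lty = 2)

min(which(var_table$cumulative >= 0.9))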

We can see that the number of latent factors accounting for \(90\%\) of the variability lies between 100 and 150. We can verify that the actual value of \(k\) is 137.

## [1] 137

Build Recommender Systems

Now that we’ve performed SVD and identified the number of latent factors that account for \(90\%\) of the variance in the data, we can start to build the recommender systems.
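A sketch of that setup: UBCF recommenders trained on the original (imputed) matrix and on the rank-137 reconstruction. The split parameters, seed, and object names here are assumptions.

```r
# Sketch: build the rank-k matrix, then train UBCF on both datasets.
k <- 137
reduced <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
dimnames(reduced) <- dimnames(m_centered)
reduced_rrm <- as(reduced, "realRatingMatrix")

set.seed(1)  # assumed; any seed makes the split reproducible
scheme_orig <- evaluationScheme(movies_imputed, method = "split",
                                train = 0.8, given = 15)
scheme_trim <- evaluationScheme(reduced_rrm, method = "split",
                                train = 0.8, given = 15)

rec_orig <- Recommender(getData(scheme_orig, "train"), method = "UBCF")
rec_trim <- Recommender(getData(scheme_trim, "train"), method = "UBCF")
```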

Evaluation

We’ll evaluate our models based on prediction error and total run time.

Prediction Error

We can examine the differences in RMSE, MSE, and MAE between the reduced matrix and the unaltered matrix.
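A sketch of that comparison with recommenderlab’s calcPredictionAccuracy(), assuming the schemes and models from the previous step:

```r
# Sketch: predict held-out ratings and compare error measures.
pred_orig <- predict(rec_orig, getData(scheme_orig, "known"),
                     type = "ratings")
pred_trim <- predict(rec_trim, getData(scheme_trim, "known"),
                     type = "ratings")

error_ubcf <- calcPredictionAccuracy(pred_orig, getData(scheme_orig, "unknown"))
error_trim <- calcPredictionAccuracy(pred_trim, getData(scheme_trim, "unknown"))
rbind(error_ubcf, error_trim)
```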

##                 RMSE       MSE       MAE
## error_ubcf 0.5757865 0.3315301 0.3550683
## error_trim 0.5231757 0.2737128 0.3502227

The recommender system using the reduced data has a lower RMSE, MSE, and MAE than the one using the unaltered data.

Run Time

We can also visualize the run time for each recommender system.
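The “sec elapsed” lines below match the output format of the tictoc package; a sketch of that timing pattern for one of the systems:

```r
# Sketch: wrap each step in tic()/toc() to log elapsed time.
library(tictoc)

tic("Reduced Data - Train")
rec_trim <- Recommender(getData(scheme_trim, "train"), method = "UBCF")
toc()

tic("Reduced Data - Predict")
pred_trim <- predict(rec_trim, getData(scheme_trim, "known"),
                     type = "ratings")
toc()
```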

## Reduced Data - Train: 0.1 sec elapsed
## Reduced Data - Predict: 0.28 sec elapsed
## Original Data - Train: 0.03 sec elapsed
## Original Data - Predict: 0.31 sec elapsed

Training was faster on the original data, while predictions took about the same amount of time for both systems.

Discussion

Our analysis suggests (1) that as the number of latent factors increases, RMSE decreases, and (2) that SVD, by reducing dimensionality, yields recommender predictions with lower error than predictions made using the non-reduced data. Future analyses could investigate additional recommenders beyond UBCF, as well as factorization using alternating least squares (ALS).