Implementing Low-Rank Matrix Factorization with Alternating Least Squares Optimization for Collaborative Filtering Recommender System in R

In this article, the low rank matrix facotrization will be used to predict the unrated movie ratings for the users from MovieLense (100k) dataset (given that each user has rated some of the movies). The data was collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. The data set contains about 100,000 ratings (1-5) from 943 users on 1664 movies.

Each cell \(R_{\{i,j\}}\) of the movie rating matrix contains the Rating given to the \(i^{th}\) movie by the \(j^{th}\) user (if any, otherwise NA), so that the rows represent the movies and the columns represents the users.
The intuition behind using matrix factorization is that there must be some latent features (much smaller than the number of users and the number of movies) that determine how a user rates a movie. We have a set \(n_m\) of movies and a set \(n_u\) of users and a rating matrix \(R\) of size \(n_m \times n_u\) be the matrix that contains all the ratings that the movies are assigned by the users. We are interested to discover \(n\) latent features (\(n=10\) latent features are used). Then our task, then, is to find two matrics \(\theta\) (a \(n_m \times n\) matrix) and \(X\) (a \(n_u \times n\) matrix) such that their product approximates \(R\), such that the error for the users/movie pairs where we know the correct ratings is minimized.
The couple of optimization problems are to be solved that can be done in two different ways:
- Solve with Alternate Least Squares Method (appears as an assigment in the BerkeleyX edX Course on Big Data Analysis with Spark): The ALS algorithm first randomly fills the users matrix with values and then optimizes the value of the movies such that the error is minimized. Then, it holds the movies matrix constant and optimizes the value of the user’s matrix. This alternation between which matrix to optimize is the reason for the “alternating” in the name. Given a fixed set of user factors (i.e., values in the users matrix), we use the known ratings to find the best values for the movie factors using the optimization written at the bottom of the figure. Then we “alternate” and pick the best user factors given fixed movie factors. The next figure shows the outline of this algorithm.
- Simultaneous optimization Method (from the Coursera ML Course by Stanford Prof. Andrew Ng.): By combining the two optimization problems and solving them simultaneously. Both the methods are show in the next figure.
The next figures show the results from both the algorithms: Conjugate Gradient method (with 100 iterations) is used in both cases and both the methods were iteratively called for 20 times, with regularization parameter \(\lambda=10\). Then the RMSE value between the original rating matrix and the estimated rating matrix is computed for both the cases. The following figures show that the Simultaneous Optimization method achieved lower RMSE value immediately, while ALS took some iterations to get to the same RMSE value.

Data File Format

The next figures show how the RMSE changed with the regularization parameter \(\lambda\), for both the methods. As expected, with lower values of \(\lambda\) the RMSE is much lower (due to possible overfitting).

Data File Format

The next figures show the original rating matrix and the estimated rating matrix with ALS matrix factorization (the rating matrix is shown as an adjacency matrix of a neighborhood graph).

Data File Format