In this article, the low rank matrix facotrization will be used to predict the unrated movie ratings for the users from MovieLense (100k) dataset (given that each user has rated some of the movies). The data was collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. The data set contains about 100,000 ratings (1-5) from 943 users on 1664 movies.
Each cell \(R_{\{i,j\}}\) of the movie rating matrix contains the Rating given to the \(i^{th}\) movie by the \(j^{th}\) user (if any, otherwise NA), so that the rows represent the movies and the columns represents the users.
The intuition behind using matrix factorization is that there must be some latent features (much smaller than the number of users and the number of movies) that determine how a user rates a movie. We have a set \(n_m\) of movies and a set \(n_u\) of users and a rating matrix \(R\) of size \(n_m \times n_u\) be the matrix that contains all the ratings that the movies are assigned by the users. We are interested to discover \(n\) latent features (\(n=10\) latent features are used). Then our task, then, is to find two matrics \(\theta\) (a \(n_m \times n\) matrix) and \(X\) (a \(n_u \times n\) matrix) such that their product approximates \(R\), such that the error for the users/movie pairs where we know the correct ratings is minimized.
The couple of optimization problems are to be solved that can be done in two different ways:
The next figures show the results from both the algorithms: Conjugate Gradient method (with 100 iterations) is used in both cases and both the methods were iteratively called for 20 times, with regularization parameter \(\lambda=10\). Then the RMSE value between the original rating matrix and the estimated rating matrix is computed for both the cases. The following figures show that the Simultaneous Optimization method achieved lower RMSE value immediately, while ALS took some iterations to get to the same RMSE value.