1 - Description
MovieLens is a data set that contains ratings from the MovieLens website (http://movielens.org). The data set is published in a number of different sizes for research purposes. We will be using the small data set, which contains 100,000 ratings applied to 9,000 movies by 700 users.
Using the MovieLens data set, we will construct a simple recommender system to recommend movies to users. The system will be built with user-based collaborative filtering, using both hand-built algorithms and the recommenderlab package.
2 - Data Set
The MovieLens data includes two data sets that we are interested in using. The first is the movies data, which contains the movie ID, the title, and the genres. The second is the user ratings data, which contains the user ID, the movie ID, the rating, and a time-stamp in UNIX time. We will drop the time-stamp data for this project.
Reading in the required data sets.
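The loading code itself is not shown here. The following is a minimal sketch of what it could look like, assuming the two files use the column layouts described above (userId, movieId, rating, timestamp and movieId, title, genres). To stay self-contained it writes a few made-up rows to temporary files instead of reading the real download; the paths and rows are placeholders, not the project's data.

```r
# Made-up stand-ins for the ratings and movies files (assumption: real files
# share this header layout). Replace the temp paths with the downloaded files.
ratings_path <- tempfile(fileext = ".csv")
writeLines(c("userId,movieId,rating,timestamp",
             "1,1,4.0,1000000000",
             "1,2,3.5,1000000100",
             "2,1,5.0,1000000200"),
           ratings_path)

movies_path <- tempfile(fileext = ".csv")
writeLines(c("movieId,title,genres",
             "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy",
             '2,"American President, The (1995)",Comedy|Drama|Romance'),
           movies_path)

# Read both files; titles that contain commas are quoted in the file
ratings <- read.csv(ratings_path, stringsAsFactors = FALSE)
movies  <- read.csv(movies_path, stringsAsFactors = FALSE)

# Drop the UNIX time-stamp, which is not used in this project
ratings$timestamp <- NULL

str(ratings)
str(movies)
```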
Let's take a quick look at the data that we have loaded.
3 - Building the Recommender by Hand
The first thing we need to do is construct the functions that we will use to generate recommendations. We will use the cosine similarity between users to generate our recommendations, so we first build a cosine similarity function. We then build the function that generates the recommendations. This function takes the similarity matrix produced by the cosine similarity function, the movies and ratings data, the target user, the number of recommendations requested, and the number of nearest neighbors to use.
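The two functions are not reproduced here, so the following is only a sketch of how they could be written, not the code that produced the results in this report. The names cosine_sim and recommend_movies, the argument order, and the use of a user-by-movie matrix (rather than the raw ratings table) are assumptions made for illustration, as is the movieId column used to look up titles.

```r
# Cosine similarity between the rows (users) of a numeric matrix.
# Assumes missing ratings have already been replaced with 0.
cosine_sim <- function(mat) {
  dot   <- mat %*% t(mat)          # pairwise dot products between users
  norms <- sqrt(diag(dot))         # Euclidean length of each user's vector
  sim   <- dot / (norms %o% norms) # divide by the product of the lengths
  sim[!is.finite(sim)] <- 0        # users with an all-zero vector get 0
  sim
}

# Hypothetical recommender: score each movie by the similarity-weighted
# average of the ratings given by the user's nearest neighbors.
recommend_movies <- function(sim, user_mat, movies, user,
                             num_recs = 10, neigh = 30) {
  s <- sim[user, ]
  s[user] <- -Inf                                # never pick the user themself
  neigh <- min(neigh, length(s) - 1)
  nn <- order(s, decreasing = TRUE)[seq_len(neigh)]
  w  <- sim[user, nn]
  scores <- as.vector(w %*% user_mat[nn, , drop = FALSE]) / sum(abs(w))
  # Treat 0 as "not rated" and exclude everything the user has a value for
  scores[user_mat[user, ] != 0] <- -Inf
  top <- order(scores, decreasing = TRUE)[seq_len(num_recs)]
  # Column names are assumed to hold movie IDs
  movies$title[match(colnames(user_mat)[top], movies$movieId)]
}

# Toy data: 4 users, 4 movies, 0 = not rated
toy_mat <- matrix(c(5, 4, 0, 0,
                    5, 5, 4, 0,
                    0, 0, 5, 4,
                    4, 5, 5, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(as.character(1:4),
                                  c("10", "20", "30", "40")))
toy_movies <- data.frame(movieId = c(10, 20, 30, 40),
                         title = c("Movie A", "Movie B", "Movie C", "Movie D"),
                         stringsAsFactors = FALSE)

toy_sim  <- cosine_sim(toy_mat)
toy_recs <- recommend_movies(toy_sim, toy_mat, toy_movies,
                             user = 1, num_recs = 1, neigh = 2)
print(toy_recs)
```

In the toy data, user 1's two nearest neighbors are users 2 and 4, both of whom rated the third movie highly, so that is the movie the sketch returns.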
4 - Recommender with Normalization
We first try to build a recommendation system using normalized ratings, which raises some issues that we will see below. The first is the meaning of zero. For R to carry out the matrix multiplication used in our similarity function, we need to replace the NAs with 0. If we don't, we get a matrix with zeros on the diagonal and NA in every other element. However, once the ratings are normalized, 0 also means a rating exactly at the user's center, so an unrated movie can no longer be told apart from an average one. This leads to issues with the list of movies that get recommended.
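A small made-up example may help make both points concrete. It assumes normalization means subtracting each user's mean rating; the matrix values are invented for illustration.

```r
# Three users, three movies; NA = not rated (made-up values)
toy <- matrix(c(5, NA, 3,
                4,  2, NA,
                4,  3, 2),
              nrow = 3, byrow = TRUE)

# NA propagates through matrix multiplication, so any dot product that
# touches a missing rating is itself NA
raw_products <- toy %*% t(toy)
print(raw_products)

# Center each user's ratings on that user's mean, then fill NA with 0
centered <- toy - rowMeans(toy, na.rm = TRUE)
filled <- centered
filled[is.na(filled)] <- 0
print(filled)

# User 1 never rated movie 2, user 3 rated it exactly at their mean.
# After filling, the two rows are identical.
same_rows <- identical(filled[1, ], filled[3, ])
print(same_rows)
```

Once the rows are filled, nothing in the matrix records that user 1's zero is a missing rating while user 3's zero is a real, average one.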
4.1 - Converting the Pairwise Ratings into a Sparse Matrix with Normalization
To calculate the user-based collaborative filter we need to transform the data from pairwise (user, movie, rating) records into a sparse user-by-movie matrix. We will use the tidyr package to reshape the data.
dim(user_movie_mat)
[1] 671 9066
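The reshaping code is not shown. As an illustration of the same long-to-wide step, here is a sketch that uses base R matrix indexing instead of tidyr, so it runs without extra packages. The column names userId, movieId, and rating and the toy rows are assumptions.

```r
# Made-up pairwise ratings: one row per (user, movie) pair
toy_ratings <- data.frame(
  userId  = c(1, 1, 2, 3, 3),
  movieId = c(10, 20, 10, 20, 30),
  rating  = c(4, 5, 3, 2, 4)
)

users <- sort(unique(toy_ratings$userId))
items <- sort(unique(toy_ratings$movieId))

# Start with an all-NA matrix: one row per user, one column per movie
toy_user_movie_mat <- matrix(NA_real_,
                             nrow = length(users), ncol = length(items),
                             dimnames = list(as.character(users),
                                             as.character(items)))

# Fill in the observed ratings using (row, column) index pairs
idx <- cbind(match(toy_ratings$userId, users),
             match(toy_ratings$movieId, items))
toy_user_movie_mat[idx] <- toy_ratings$rating

print(toy_user_movie_mat)
dim(toy_user_movie_mat)
```

Every user and movie pair without a rating stays NA, which is why the full matrix is so sparse.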
4.2 - Calculating Similarity using the Cosine Similarity
Now that we have the data loaded into a sparse matrix, we want to calculate the cosine similarity between each pair of users. We also plot the similarity values. The plot already suggests a problem: there is no diagonal line marking each user's similarity with themselves.

4.3 - Recommending a Movie Based on the Cosine Similarity with Normalization
We will now use the recommender and the normalized similarity matrix to recommend a set of 10 movies to user 100 using the 30 nearest neighbors.
norm_recs
[,1]
[1,] "Young Poisoner's Handbook, The (1995)"
[2,] "Addams Family Values (1993)"
[3,] "Contact (1997)"
[4,] "No Holds Barred (1989)"
[5,] "National Velvet (1944)"
[6,] "Dracula: Dead and Loving It (1995)"
[7,] "Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)"
[8,] "Friday (1995)"
[9,] "Misérables, Les (1995)"
[10,] "Screamers (1995)"
5 - Building a Recommendation System Using Non-Normalized Data
Given that we ran into a potential issue with the normalized data, we will build the recommender again using the same data and functions, this time without normalizing the data first.
5.1 - Generating the Similarity Matrix
We first need to recreate the user matrix without normalization.
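One reason the two approaches can pick different neighbors is that raw ratings are never negative, so their cosine similarities cannot be negative either, while centered ratings can point in opposite directions. The sketch below illustrates this on made-up data; the helper name toy_cosine and the values are assumptions, not the project's code.

```r
# Made-up ratings for three users; NA = not rated
raw <- matrix(c(5, NA, 1,
                1,  5, NA,
                4,  2, 2),
              nrow = 3, byrow = TRUE)

# Non-normalized version: simply replace NA with 0
raw_filled <- raw
raw_filled[is.na(raw_filled)] <- 0

# Normalized version: subtract each user's mean, then replace NA with 0
centered <- raw - rowMeans(raw, na.rm = TRUE)
centered[is.na(centered)] <- 0

# Hypothetical helper: cosine similarity between the rows of a matrix
toy_cosine <- function(m) {
  d <- m %*% t(m)
  d / sqrt(diag(d) %o% diag(d))
}

raw_sim      <- toy_cosine(raw_filled)
centered_sim <- toy_cosine(centered)

print(round(raw_sim, 3))
print(round(centered_sim, 3))
```

Users 1 and 2 disagree on the one movie they both rated. With raw ratings they still get a small positive similarity, while with centered ratings their similarity is negative.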
5.2 - Generating the Cosine Similarities
Now that we have our non-normalized matrix, we will generate the cosine similarities. The plot of this similarity matrix also lacks any real pattern and, not surprisingly, we get a different set of movies recommended to the user.

5.3 - Non-Normalized Movie Recommendations
We note that this is a different set than the one generated with the normalized data.
recs
[,1]
[1,] "Crossing Guard, The (1995)"
[2,] "Murder in the First (1995)"
[3,] "On Golden Pond (1981)"
[4,] "Howling, The (1980)"
[5,] "Fifth Element, The (1997)"
[6,] "Hot Lead and Cold Feet (1978)"
[7,] "Metroland (1997)"
[8,] "Morning After, The (1986)"
[9,] "Sister Act 2: Back in the Habit (1993)"
[10,] "Where the Money Is (2000)"
6 - Building A Recommendation System Using recommenderlab
For the final portion of the project we will use the recommenderlab package to build recommendations for users. The process is relatively straightforward: the function below takes the sparse user-review matrix, converts it to a real rating matrix, and then builds the recommender model from it.
library(recommenderlab)

movie_lab <- function(user_mat, movies, user, num_recs = 10, neigh = 10){
  # Convert the rating matrix into a recommenderlab realRatingMatrix
  user_mat <- as(user_mat, "realRatingMatrix")
  # Create the recommender model. "UBCF" stands for User-Based Collaborative Filtering
  recommender_model <- Recommender(user_mat,
                                   method = "UBCF",
                                   param = list(method = "Cosine", nn = neigh))
  # Obtain the top num_recs recommendations for the requested user
  recom <- predict(recommender_model,
                   user_mat[user],
                   n = num_recs)
  recom_list <- as(recom, "list") # convert the recommenderlab object to a readable list
  # Look up a title for each recommended item.
  # Note: this treats each item label as a row index into movies.
  recom_result <- matrix(0, num_recs)
  for (i in c(1:num_recs)){
    recom_result[i] <- movies$title[as.integer(recom_list[[1]][i])]
  }
  return(recom_result)
}
6.1 - Recommenderlab Movie Recommendations
Using the function above we can now generate the top 10 recommendations for user 100 using the 30 nearest neighbors. These recommendations are once again different from the ones that we generated by hand above.
lab_recs
[,1]
[1,] "Scarlet Letter, The (1926)"
[2,] "Only You (1994)"
[3,] "Crooklyn (1994)"
[4,] "Substitute, The (1996)"
[5,] "Mary Reilly (1996)"
[6,] "Month by the Lake, A (1995)"
[7,] "Top Hat (1935)"
[8,] "Eye for an Eye (1996)"
[9,] "Mighty Aphrodite (1995)"
[10,] "Tales from the Hood (1995)"
