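library(tidyverse)       # dplyr, tidyr, ggplot2 and related packages
library(kableExtra)      # HTML table styling
library(knitr)           # kable()
library(recommenderlab)  # realRatingMatrix, Recommender(), evaluationScheme()
library(tictoc)          # tic() / toc() run-time logging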

Introduction

The goal of this assignment is to give you practice working with matrix factorization techniques.

Your task is to implement a matrix factorization method, such as singular value decomposition (SVD) or alternating least squares (ALS), in the context of a recommender system.

You may approach this assignment in a number of ways. You are welcome to start with an existing recommender system written by yourself or someone else.

As always, remember to cite your sources, so that you are graded on what you added, not on what you found.

SVD can be thought of as a pre-processing step for feature engineering.

You might start with thousands or millions of items and use SVD to reduce them to a much smaller set of k latent factors, or “concepts” (e.g. 20 or 70).
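The following minimal base-R sketch makes the idea of keeping only k concepts concrete. It is not part of the original assignment. It truncates the SVD of a small rating matrix to k = 2 factors. The matrix values and the choice of k are made-up assumptions for illustration.

```r
# Toy rating-like matrix: 6 users x 5 items (hypothetical values)
R <- matrix(c(5, 4, 1, 1, 2,
              4, 5, 1, 2, 1,
              5, 5, 2, 1, 1,
              1, 1, 5, 4, 5,
              2, 1, 4, 5, 4,
              1, 2, 5, 5, 4),
            nrow = 6, byrow = TRUE)

k <- 2        # number of latent factors ("concepts") to keep
s <- svd(R)   # R = U %*% diag(d) %*% t(V)

# Keep only the first k singular values and vectors
U_k <- s$u[, 1:k, drop = FALSE]
d_k <- s$d[1:k]
V_k <- s$v[, 1:k, drop = FALSE]
R_k <- U_k %*% diag(d_k, nrow = k) %*% t(V_k)   # best rank-k approximation

# Each item is now described by k numbers instead of 6 user ratings
item_factors <- V_k %*% diag(d_k, nrow = k)
dim(item_factors)   # 5 items x 2 factors

# Frobenius error of the approximation = size of the discarded singular values
err <- sqrt(sum((R - R_k)^2))
err
```

The error check reflects the Eckart–Young result: no other rank-k matrix is closer to `R` than `R_k`.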

Load the Data

The data set comes from the MovieLens project and was downloaded from MovieLens.

ratings <- read.csv("https://raw.githubusercontent.com/josephsimone/Data-612/master/project_2/Movie_Lens/ratings.csv")
movies <- read.csv("https://raw.githubusercontent.com/josephsimone/Data-612/master/project_2/Movie_Lens/movies.csv")

Convert to Matrix

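# Reshape from long (userId, movieId, rating) to wide: one row per user, one column per movie
m_m <- ratings %>%
  select(-timestamp) %>%
  spread(movieId, rating)
row.names(m_m) <- m_m[, 1]  # use userId as the row name
m_m <- m_m[-c(1)]           # drop the userId column
m_m <- as(as.matrix(m_m), "realRatingMatrix")  # sparse recommenderlab format
m_m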
## 610 x 9724 rating matrix of class 'realRatingMatrix' with 100836 ratings.

Normalization

norm_films <- normalize(m_m)  # center each user's ratings on that user's mean
avg_rating <- round(rowMeans(norm_films), 5)
table(avg_rating)
## avg_rating
##   0 
## 610
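All 610 users now have a mean rating of 0. This confirms that `normalize()` has centered each user's ratings. The base-R sketch below reproduces that row-centering on a toy matrix. It assumes the default `"center"` method, which subtracts each user's mean from that user's observed ratings. The matrix values are hypothetical.

```r
# Hypothetical 3 users x 4 movies; NA = not rated
R <- matrix(c(5,  3, NA,  1,
              4, NA,  4, NA,
              NA, 2,  1,  3),
            nrow = 3, byrow = TRUE)

# Subtract each user's mean rating from that user's observed ratings
user_means <- rowMeans(R, na.rm = TRUE)   # 3, 4, 2
R_centered <- sweep(R, 1, user_means)     # NA entries stay NA

round(rowMeans(R_centered, na.rm = TRUE), 5)   # every user's mean is now 0
```

Centering removes differences in how generously each user rates, so similarities and factors reflect relative preferences.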

The rating matrix contains 610 users and 9,724 movies (items).
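Combined with the 100,836 ratings reported above, these dimensions show how sparse the matrix is. The arithmetic below uses only the numbers printed for `m_m`:

```r
n_users   <- 610
n_movies  <- 9724
n_ratings <- 100836

cells   <- n_users * n_movies    # 5,931,640 possible user-movie pairs
density <- n_ratings / cells     # share of cells that hold a rating
round(100 * density, 2)          # about 1.7 percent
```

Roughly 98% of the cells are empty, which is why the models below must predict most ratings from very little direct evidence.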

Train and Test Sets

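Next, we split the data into training and test sets. 80% of the users go to training. For each test user, 20 ratings are given to the model (“known”) and the rest are held out (“unknown”) to measure accuracy.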

set.seed(123)
eval <- evaluationScheme(norm_films, method = "split",
                         train = 0.8, given= 20, goodRating=3)
movie_train <- getData(eval, "train")
movie_known <- getData(eval, "known")
movie_unknown <- getData(eval, "unknown")
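The `given = 20` split for one test user can be illustrated with a short base-R sketch. The user, the 30 ratings and the random draw are hypothetical. This is not recommenderlab's internal code.

```r
set.seed(123)
# Hypothetical test user with 30 ratings on the half-star scale, named by movieId
ratings_u <- sample(seq(0.5, 5, by = 0.5), 30, replace = TRUE)
names(ratings_u) <- 1:30

given     <- 20
known_idx <- sample(seq_along(ratings_u), given)   # ratings shown to the model
known     <- ratings_u[known_idx]
unknown   <- ratings_u[-known_idx]                 # ratings held out for scoring

c(known = length(known), unknown = length(unknown))
```

The model predicts ratings from `known`, and the accuracy metrics are computed only against `unknown`.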

First, let’s compare the accuracy and run time of a User-Based Collaborative Filtering (UBCF) model and a Singular Value Decomposition (SVD) model.

User-Based Collaborative Filtering

tic("UBCF Model - Training")
UBCF_model <- Recommender(movie_train, method = "UBCF")
## Warning in .local(x, ...): x was already normalized by row!
toc(log = TRUE, quiet = TRUE)
tic("UBCF Model - Predicting")
UBCF_predict <- predict(UBCF_model, newdata = movie_known, type = "ratings")
toc(log = TRUE, quiet = TRUE)
(UBCF_accuracy <- calcPredictionAccuracy(UBCF_predict, movie_unknown) )
##      RMSE       MSE       MAE 
## 0.9041400 0.8174691 0.6956109
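The UBCF model predicts a rating from the ratings of similar users. The base-R sketch below shows the core idea for one user and one item. It is a simplified illustration, not recommenderlab's implementation. It assumes cosine similarity on mean-centered ratings over co-rated items, a neighborhood of `nn` users, and a similarity-weighted average. The function names and toy data are hypothetical.

```r
# Cosine similarity over the items both users rated
cosine_sim <- function(a, b) {
  both  <- !is.na(a) & !is.na(b)
  denom <- sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2))
  if (!any(both) || denom == 0) return(0)
  sum(a[both] * b[both]) / denom
}

# Predict one user's rating for one item from the nn most similar users
predict_ubcf <- function(R, user, item, nn = 2) {
  means  <- rowMeans(R, na.rm = TRUE)
  C      <- sweep(R, 1, means)                        # mean-center each user
  raters <- setdiff(which(!is.na(R[, item])), user)   # other users who rated item
  if (length(raters) == 0) return(means[user])
  sims <- vapply(raters, function(u) cosine_sim(C[user, ], C[u, ]), numeric(1))
  top  <- order(sims, decreasing = TRUE)[seq_len(min(nn, length(raters)))]
  w    <- sims[top]
  if (sum(abs(w)) == 0) return(means[user])
  # User's own mean plus the weighted average deviation of the neighbors
  means[user] + sum(w * C[raters[top], item]) / sum(abs(w))
}

R <- matrix(c(5, 4, NA, 1,
              4, 5,  1, 1,
              5, 4,  2, NA,
              1, 1,  5, 4),
            nrow = 4, byrow = TRUE)
p <- predict_ubcf(R, user = 1, item = 3, nn = 2)
p   # roughly 1.63
```

All the similarity work happens inside the prediction call. This is why a memory-based method like UBCF spends little time in training but much more at prediction time.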

Singular Value Decomposition (SVD) Model

The SVD model below keeps k = 50 latent factors (“concepts”).
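The base-R sketch below shows one simple way a truncated SVD can predict missing ratings. It mean-centers each user, fills unrated cells with 0 (that is, with the user's average), keeps k factors, and adds the means back. This is a simplified assumption. recommenderlab's `"SVD"` method may impute and compute the decomposition differently. `svd_predict` and the toy data are hypothetical.

```r
# Rank-k SVD prediction of a full users x items rating matrix
svd_predict <- function(R, k) {
  means <- rowMeans(R, na.rm = TRUE)
  C <- sweep(R, 1, means)   # center each user's ratings
  C[is.na(C)] <- 0          # treat unrated cells as the user's average
  s <- svd(C)
  C_k <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*%
    t(s$v[, 1:k, drop = FALSE])
  sweep(C_k, 1, means, FUN = "+")   # back to the rating scale
}

R <- matrix(c(5, 4, NA, 1,
              4, 5,  1, 1,
              5, 4,  2, NA,
              1, 1,  5, 4),
            nrow = 4, byrow = TRUE)
P <- svd_predict(R, k = 2)
round(P[1, 3], 2)   # predicted rating of user 1 for item 3
```

Once the factors are computed, any missing cell is just a dot product of a user vector and an item vector. This keeps prediction cheap compared with UBCF.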

tic("SVD Model - Training")
modelSVD <- Recommender(movie_train, method = "SVD", parameter = list(k = 50))
## Warning in .local(x, ...): x was already normalized by row!
toc(log = TRUE, quiet = TRUE)
tic("SVD Model - Predicting")
predSVD <- predict(modelSVD, newdata = movie_known, type = "ratings")
toc(log = TRUE, quiet = TRUE)
( accSVD <- calcPredictionAccuracy(predSVD, movie_unknown) )
##      RMSE       MSE       MAE 
## 0.9069787 0.8226103 0.6985789
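`calcPredictionAccuracy()` reports three standard error metrics, computed against the held-out ("unknown") ratings. The sketch below shows their definitions on a few hypothetical ratings.

```r
actual    <- c(4.0, 3.5, 5.0, 2.0)   # hypothetical held-out ratings
predicted <- c(3.5, 3.0, 4.0, 3.0)   # hypothetical model predictions

err  <- predicted - actual
mse  <- mean(err^2)     # mean squared error
rmse <- sqrt(mse)       # root mean squared error, on the rating scale
mae  <- mean(abs(err))  # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```

MSE is simply RMSE squared, which matches the outputs above (0.9041400² ≈ 0.8174691 for UBCF).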

At first glance, the two models are very similar in accuracy: the SVD model’s RMSE (0.907) is only about 0.003 higher than that of the UBCF model (0.904).
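The gap can be quantified as a relative difference, using the values printed by `calcPredictionAccuracy()` above:

```r
ubcf_acc <- c(RMSE = 0.9041400, MSE = 0.8174691, MAE = 0.6956109)
svd_acc  <- c(RMSE = 0.9069787, MSE = 0.8226103, MAE = 0.6985789)

diff_abs <- svd_acc - ubcf_acc           # positive = SVD has the larger error
diff_pct <- 100 * diff_abs / ubcf_acc    # relative to UBCF, in percent
round(diff_pct, 2)
```

Every metric differs by less than 1%. The results come from a single random split, so a difference this small is weak evidence that one model is more accurate than the other.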

Next, we compare their run times.

Run Time

Let’s look at the logged timings to better understand the cost of each model.

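# Collect the tic()/toc() timings recorded above into a one-column table
# (named run_log to avoid masking base::log)
run_log <- as.data.frame(unlist(tic.log(format = TRUE)))
colnames(run_log) <- c("Run Time")
knitr::kable(run_log, format = "html") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))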
Run Time
UBCF Model - Training: 0.07 sec elapsed
UBCF Model - Predicting: 8.05 sec elapsed
SVD Model - Training: 3.39 sec elapsed
SVD Model - Predicting: 1.52 sec elapsed

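The biggest difference between the SVD and UBCF models is their run time.

UBCF builds its model almost instantly (0.07 s versus 3.39 s for SVD) because it mostly stores the training data. However, it is much slower at prediction time (8.05 s versus 1.52 s), because the similarities between users have to be computed when predicting.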
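The timings logged above can be summed per model to compare total cost. Timings will vary between machines and runs, so the numbers below describe only this run.

```r
times <- data.frame(
  model   = c("UBCF", "UBCF", "SVD", "SVD"),
  step    = c("Training", "Predicting", "Training", "Predicting"),
  seconds = c(0.07, 8.05, 3.39, 1.52)   # copied from the run-time log above
)

totals <- aggregate(seconds ~ model, data = times, FUN = sum)
totals                      # SVD 4.91 s, UBCF 8.12 s
round(8.05 / 1.52, 1)       # UBCF prediction is about 5.3 times slower
```

Which model is cheaper depends on the workload. If a model is trained once and queried many times, the prediction cost dominates.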

Evaluation

Let’s evaluate our predictions for a specific user, starting with the ratings that user actually gave.

Here we use user 3, the third user in the data set.

# Ratings given by user 3 (stored zeros are movies the user did not rate)
movie_rating <- as.data.frame(m_m@data[c("3"), ])
colnames(movie_rating) <- c("Rating")
movie_rating$movieId <- as.integer(rownames(movie_rating))
movie_rating <- movie_rating %>%
  filter(Rating != 0) %>%                 # keep only rated movies
  inner_join(movies, by = "movieId") %>%  # attach titles
  arrange(Rating) %>%
  select(Movie = "title", Rating)
knitr::kable(movie_rating, format = "html") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Movie Rating
Dangerous Minds (1995) 0.5
Schindler’s List (1993) 0.5
Courage Under Fire (1996) 0.5
Operation Dumbo Drop (1995) 0.5
Wallace & Gromit: The Best of Aardman Animation (1996) 0.5
My Fair Lady (1964) 0.5
Doors, The (1991) 0.5
On Golden Pond (1981) 0.5
Deer Hunter, The (1978) 0.5
Patton (1970) 0.5
Field of Dreams (1989) 0.5
Bambi (1942) 0.5
Lady and the Tramp (1955) 0.5
Rescuers, The (1977) 0.5
You’ve Got Mail (1998) 0.5
Fast Times at Ridgemont High (1982) 0.5
Requiem for a Dream (2000) 0.5
Snow Dogs (2002) 0.5
Green Card (1990) 0.5
2012 (2009) 0.5
Tron (1982) 2.0
Star Trek: The Motion Picture (1979) 3.0
Highlander (1986) 3.5
Thing, The (1982) 4.0
Conan the Barbarian (1982) 4.5
Piranha (1978) 4.5
Looker (1981) 4.5
Master of the Flying Guillotine (Du bi quan wang da po xue di zi) (1975) 4.5
Clonus Horror, The (1979) 4.5
Escape from L.A. (1996) 5.0
Saturn 3 (1980) 5.0
Road Warrior, The (Mad Max 2) (1981) 5.0
The Lair of the White Worm (1988) 5.0
Hangar 18 (1980) 5.0
Galaxy of Terror (Quest) (1981) 5.0
Android (1982) 5.0
Alien Contamination (1980) 5.0
Death Race 2000 (1975) 5.0
Troll 2 (1990) 5.0

As the table shows, user 3’s highest ratings (4.5 and 5.0) go mostly to science fiction, horror and action films.

On the other hand, this user gave the lowest rating (0.5) to many dramas, romances and family or animated films.

SVD Model for User 3

Next, we explore the movies the SVD model suggests for user 3.

# Predicted ratings for user 3 (predSVD only holds users from the "known" test set)
recommend_movie <- as.data.frame(predSVD@data[c("3"), ])
colnames(recommend_movie) <- c("Rating")
recommend_movie$movieId <- as.integer(rownames(recommend_movie))
recommend_movie <- recommend_movie %>%
  arrange(desc(Rating)) %>%
  head(6) %>%                             # six highest predicted ratings
  inner_join(movies, by = "movieId") %>%  # attach titles
  select(Movie = "title")
knitr::kable(recommend_movie, format = "html") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Movie
Dangerous Minds (1995)
Courage Under Fire (1996)
Operation Dumbo Drop (1995)
Wallace & Gromit: The Best of Aardman Animation (1996)
Escape from L.A. (1996)
My Fair Lady (1964)

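Looking at the top 6 movies recommended to user 3, however, all six are titles this user has already rated. Five of them were rated 0.5 (e.g. Dangerous Minds and My Fair Lady), and only Escape from L.A. matches the user’s preferred genres. This suggests that the lookup above is not returning genuinely new recommendations for this user, so the prediction code should be checked before drawing conclusions about genre.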
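Recommendations should normally exclude movies the user has already rated before the top n are taken. The base-R sketch below shows that filtering step. The function name `top_n_unrated`, the movieIds and the predicted values are hypothetical.

```r
# Return the ids of the n highest predicted ratings among movies not yet rated
top_n_unrated <- function(pred, rated, n = 3) {
  # pred: named predicted ratings for one user (names = movieId)
  # rated: movieIds the user has already rated
  pred <- pred[!(names(pred) %in% rated)]
  pred <- pred[!is.na(pred)]
  names(sort(pred, decreasing = TRUE))[seq_len(min(n, length(pred)))]
}

pred  <- c(`31` = 4.8, `527` = 4.6, `1200` = 4.1, `260` = 3.9, `2571` = 3.2)
rated <- c("31", "527")
top_n_unrated(pred, rated, n = 2)   # "1200" "260"
```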

Summary

User-Based Collaborative Filtering:

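Several problems can occur with UBCF. The first is scalability: the computation grows with the number of customers and products, because predicting for a user requires comparing that user with every stored user. A second is sparsity: with only about 1.7% of the user-movie matrix filled in, two users often share few rated movies, which makes their similarity estimates unreliable.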

Singular Value Decomposition:

An SVD model reduces the dimension of the rating matrix by extracting latent factors. It can therefore mitigate the problems of scalability and sparsity.
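The dimension reduction for this data set with k = 50 can be worked out directly. The sparse rating matrix stores only the 100,836 observed ratings. The benefit of SVD is therefore a short, dense description of every user and movie, not a saving in raw storage.

```r
n_users  <- 610
n_movies <- 9724
k        <- 50

# Each user goes from a 9,724-long row of (mostly missing) ratings to k numbers
round(n_movies / k, 1)   # about 194 times fewer dimensions per user

# Values in the rank-k factors: U_k (users x k), V_k (movies x k), k singular values
factor_values <- k * (n_users + n_movies) + k
factor_values            # 516750
```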

However, SVD is still not a perfect model. One drawback is that there is no clear explanation of why a recommendation was made to a user. This becomes problematic if the user wants to know why a movie was recommended.

Appendix

R code: GitHub

Project Repo: GitHub