library(tidyverse)
library(splitstackshape)
library(recommenderlab)

Matrix Factorization Methods

This submission is a continuation of P1 and P2 and utilizes the MovieLens movie ratings data linked from grouplens.org.

This third project of the summer course explores the Matrix Factorization method of Singular Value Decomposition (SVD) and how it can be applied to reduce the amount of stored data needed to make recommendation predictions.

For reference, a preview of the MovieLens data:

ratings <- read.csv("ratings.csv", header = T)
links <- read.csv("links.csv", header = T, stringsAsFactors = F)
movies <- read.csv("movies.csv", header = T, stringsAsFactors = F) %>% 
  left_join(links, by = "movieId") %>% select(-tmdbId)

head(ratings)
##   userId movieId rating  timestamp
## 1      1       2    3.5 1112486027
## 2      1      29    3.5 1112484676
## 3      1      32    3.5 1112484819
## 4      1      47    3.5 1112484727
## 5      1      50    3.5 1112484580
## 6      1     112    3.5 1094785740

Below I create the vectors for filtering. I include only users with more than 1,000 reviews and movies with more than 2,500 reviews.

users <- ratings %>% select(userId, rating) %>% group_by(userId) %>%
  summarise(Qty_Rated = n()) %>%
  arrange(desc(Qty_Rated)) %>% 
  filter(Qty_Rated > 1000)

items <- ratings %>% select(movieId, rating) %>% 
  filter(rating > 0) %>%
  group_by(movieId) %>%
  summarise(Qty_Rated = n()) %>%
  arrange(desc(Qty_Rated)) %>% 
  filter(Qty_Rated > 2500)

#nrow(users);nrow(items)

Even after the filtering, the resulting matrix will still be large: roughly 1,800 users x 1,700 movies.

Now I convert the ratings table into a sparse matrix of ratings. Unlike the previous assignment, I’m using a larger proportion of the total data (1,397,024 more ratings, to be precise).

set.seed(4)
# "mm" is my matrix for creating the realRatingsMatrix
mm <- ratings %>% 
  filter(rating > 0,
         userId %in% users$userId,
         movieId %in% items$movieId) %>% 
  select(-timestamp) %>% 
  mutate(userId = as.factor(userId), 
         movieId = as.factor(movieId))

mM <- evaluationScheme(as(mm, "realRatingMatrix"),
                       method = "split", 
                       train = 0.7, 
                       given = 5,
                       goodRating = 5)

mM
## Evaluation scheme with 5 items given
## Method: 'split' with 1 run(s).
## Training set proportion: 0.700
## Good ratings: >=5.000000
## Data set: 1884 x 1765 rating matrix of class 'realRatingMatrix' with 1538188 ratings.

Below, I preview the similarity between users using cosine distance and find that, among a small sample of 25, most users differ substantially in their ratings.

#slotNames(mM)
s_users <- similarity(mM@data[1:25,],
                      method = "cosine", 
                      which = "users")
image(as.matrix(s_users), main = "User Similarity")
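
As a quick sanity check on that plot, the cosine similarity between two users can also be computed by hand from their raw rating vectors. A minimal sketch, assuming unrated movies are treated as zeros (recommenderlab’s similarity() may handle missing values and normalization differently, so the value won’t necessarily match the plotted matrix exactly):

# pull the first 25 users' rating vectors as a dense matrix (NA = not rated)
rm25 <- as(mM@data[1:25, ], "matrix")
u1 <- rm25[1, ]; u2 <- rm25[2, ]

# treat unrated movies as 0 and compute cosine similarity directly
u1[is.na(u1)] <- 0; u2[is.na(u2)] <- 0
sum(u1 * u2) / (sqrt(sum(u1^2)) * sqrt(sum(u2^2)))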

Before proceeding, I’ll preview the rating data using recommenderlab’s image function - just a corner of the matrix showing 25 users and 25 movies, as the full matrix is too large to display properly.

image(getData(mM, "train")[c(1:25),c(1:25)], 
      main = "25x25 corner of the rating matrix")

Utilize SVD:

One of the options for the Recommender object from the recommenderlab library is SVD. The resulting model contains the components of the SVD form:

\[A = U \Sigma V^{T} \]

One of the results of the above equation is that one can store the ratings matrix as the product of \(U\), \(\Sigma\), and \(V^T\).

The diagonal entries of \(\Sigma\) are the singular values, ordered from largest to smallest. Each acts as a kind of “genre weight” (a latent-factor strength) that projects a user’s ratings onto the movie factors to create a prediction.
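
To make the decomposition concrete, here is a minimal sketch on a small toy matrix (the 4x3 matrix below is made up purely for illustration). svd() returns the three components, and multiplying them back together recovers the original matrix:

# toy 4 x 3 ratings-like matrix (values are arbitrary)
A <- matrix(c(5, 4, 0,
              4, 5, 1,
              1, 0, 5,
              0, 1, 4), nrow = 4, byrow = TRUE)

dec <- svd(A)
dec$d                                          # singular values, largest first
round(dec$u %*% diag(dec$d) %*% t(dec$v), 10)  # reproduces A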

Below, I create a ratings matrix from the original dataset to mirror the training set inside the recommenderlab evaluation scheme. I do this so that I can compare the SVD results from base R’s svd() with the results of recommenderlab’s SVD method.

# obtain the same matrix as in the training set 
# of the realRatingMatrix object:  
rl_train <- getData(mM, "train")
train_users <- as.numeric(rl_train@data@Dimnames[[1]])
train_movie <- as.numeric(rl_train@data@Dimnames[[2]])

tr_ratings <- ratings %>% 
  filter(userId %in% train_users, 
         movieId %in% train_movie) %>% 
  select(-timestamp) %>% 
  spread(movieId, rating)

tr_ratings$userId <- rownames(tr_ratings)
tr_ratings <- tr_ratings %>% 
  select(-userId) %>% 
  as.matrix()                        # svd() cannot handle NAs, so impute
tr_ratings[is.na(tr_ratings)] <- 0   # missing ratings as 0 (mean imputation is an alternative)

dim(rl_train);dim(tr_ratings)      # confirms same size
## [1] 1318 1765
## [1] 1318 1765

Now, I calculate the SVD decomposition using R’s base function svd() and then use the recommender function using method “SVD”.

t_svd_base <- proc.time()
svd_base <- svd(tr_ratings)
t_svd_base <- proc.time() - t_svd_base

The values in svd_base$d are the diagonal entries of \(\Sigma\). A general rule of thumb is that if 90% of the prediction “energy” (the sum of the squared singular values) is retained in the reduced \(\Sigma\), the recommendation power will largely remain.
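
One way to see where that 90% threshold falls is to look at the cumulative “energy” of the singular values, i.e. the running sum of their squares relative to the total. A quick sketch:

# cumulative share of squared singular values, largest first
energy <- cumsum(svd_base$d^2) / sum(svd_base$d^2)

# smallest number of singular values whose combined energy reaches 90%
min(which(energy >= 0.9))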

Next, I’ll use recommenderlab’s SVD method as a reference decomposition to “aspire to.”

# build model with "train", predict with "known", 
# test quality of prediction with "unknown"
t_svd <- proc.time()
m_svd <- Recommender(rl_train, method = "SVD")
#p_svd <- predict(m_svd, getData(mM, "known"), type = "ratings")
t_svd <- proc.time() - t_svd
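
The dimensions printed further below show that this model keeps only 10 latent dimensions by default. As far as I can tell, the number of retained dimensions can be set through the parameter list; the name k below is an assumption based on recommenderlab’s SVD registry entry, so check ?Recommender before relying on it:

# hypothetical: request 20 latent dimensions instead of the default 10
# (parameter name "k" is assumed from recommenderlab's SVD registry entry)
m_svd_k20 <- Recommender(rl_train, method = "SVD", parameter = list(k = 20))
dim(m_svd_k20@model$svd$u)   # expected 1318 x 20 if the assumption holds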

Dropping lowest singular values

The function below truncates the values of \(\Sigma\), dropping the smallest singular values until roughly 90% of the “energy” in prediction power is retained. By the rule of thumb above, a reduced matrix that keeps 90% of the projection “energy” of the original should provide very similar predicting power.

#energy_total <- sum((svd_base$d)**2)
# an_svd <- svd_base
reduce_dims <- function(an_svd){
  # total "energy" is the sum of the squared singular values
  energy_total = sum((an_svd)**2)
  e_ratio  = 1
  indx = 0
  # drop the smallest singular values one at a time, stopping once the
  # retained energy ratio first falls to (just under) 90% of the total
  while(e_ratio > .9){
    c = an_svd[1:(length(an_svd)-indx)]
    e_ratio = sum((c)**2)/energy_total
    indx = indx+1
  }
  return(c)
}

revised_d <- reduce_dims(svd_base$d)

length(revised_d)
## [1] 360
sum((revised_d)**2)/sum((svd_base$d)**2)
## [1] 0.899941

Above, after imputing the missing ratings to zero (probably a bad idea), I reduce the length of d from 1318 down to 360.

What’s more, because the \(\Sigma\) matrix is reduced, fewer dimensions need to be processed to produce an accurate recommender.
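
To illustrate, the truncated decomposition can be used to build a rank-k approximation of the ratings matrix from only the first k columns of \(U\) and \(V\). A short sketch, with k taken from the reduced d computed above:

k <- length(revised_d)   # 360 retained dimensions

# rank-k approximation: only k columns of U and V and k singular values
A_k <- svd_base$u[, 1:k] %*% diag(svd_base$d[1:k]) %*% t(svd_base$v[, 1:k])

dim(A_k)                                             # still 1318 x 1765
norm(tr_ratings - A_k, "F") / norm(tr_ratings, "F")  # relative reconstruction error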

Below, I compare the dimensions of the constituent parts of \(A = U \Sigma V^{T}\) from recommenderlab’s SVD method and from base R’s svd():

# recommender model version is very streamlined
dim(m_svd@model$svd$u)   # 1318  10
## [1] 1318   10
length(m_svd@model$svd$d)# 10
## [1] 10
dim(m_svd@model$svd$v)   # 1765  10
## [1] 1765   10
t_svd
##    user  system elapsed 
##    0.51    0.12    0.67
# the base svd version is MUCH larger - this represents
# the original decomposition. 
dim(svd_base$u)          # 1318  1318
## [1] 1318 1318
length(svd_base$d)       # 1318
## [1] 1318
dim(svd_base$v)          # 1765  1318
## [1] 1765 1318
t_svd_base
##    user  system elapsed 
##   14.29    0.00   14.65

As shown above, the recommenderlab SVD ran in far less time and produced constituent components that are much smaller in dimension.

Finally, I’ll rebuild the original rating matrix from each decomposition (one from recommenderlab and the other from base R’s svd() function):

# A = U %*% Sigma %*% t(V)
dim(round(svd_base$u %*% diag(svd_base$d) %*% t(svd_base$v)))       #  X = U D V'
## [1] 1318 1765
dim(round(m_svd@model$svd$u %*% diag(m_svd@model$svd$d) %*% t(m_svd@model$svd$v)))
## [1] 1318 1765

The recommenderlab SVD method provides a stark reduction in stored data dimensions and shows the power of singular value decomposition.
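
To put a rough number on that reduction, the stored components of each decomposition can be compared directly. A sketch using object.size() (the exact byte counts will vary with how the objects are stored):

# total bytes stored by the full base-R decomposition
sum(as.numeric(object.size(svd_base$u)),
    as.numeric(object.size(svd_base$d)),
    as.numeric(object.size(svd_base$v)))

# total bytes stored by recommenderlab's 10-dimension SVD model
sum(as.numeric(object.size(m_svd@model$svd$u)),
    as.numeric(object.size(m_svd@model$svd$d)),
    as.numeric(object.size(m_svd@model$svd$v)))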


References: