You may already be familiar with what a recommendation system is. With the rise of big data and computation power, you can see recommender systems everywhere, from the famous Netflix movie recommendation1 and the controversial Youtube algorithm2 to the product placement of your e-commerce platform or nearby supermarket3. But how does a recommendation system work? There are a lot of articles and websites that have discussed this topic already. This article will not be much different, but I will try to guide you through how a recommendation system works and how to build and evaluate one, especially for R users.
So, what is a recommendation system, and what makes it different from other machine learning algorithms like regression and classification models? A recommendation system, as its name suggests, is an algorithm that tries to predict which items (movies, products, etc.) you are likely to enjoy, based on historical data and other complementary information.
For a simple illustration, this is how a recommendation system commonly works: based on retail sales data, a person who likes to buy bananas is more likely to buy milk, bread, and carrots as well.
There are many types of recommendation algorithms, but we can group them into the following categories, each with some example algorithms.
The content-based method is a collection of algorithms that rely on information about the items and/or the users. For example, if we build a recommendation system for e-commerce, the user information might be age, gender, and residence, while the item information might be category, price, and grade. The idea of content-based methods is to build a model, based on the available “features”, that explains the observed user-item interactions. Sometimes the algorithm is combined with other approaches, such as suggesting popular items or giving random recommendations to users.
The collaborative filtering method gets its name because the algorithm uses all the available historical information about ratings given by users. Thus, this method only uses the user-item interactions, like the following figure. The advantage of collaborative filtering is that it doesn't require additional information and is easier to implement than the content-based method, where we need to gather information about the items/users. However, collaborative filtering suffers from the cold start problem: when a new user arrives and has not interacted with any item yet, the algorithm can't give that user any recommendations.
Typically, the user-item interactions are represented as a matrix with the ratings as its values. If rating information is not available or hard to collect, the values can be filled with whether the user has bought the item or watched the movie.
set.seed(123)
mat_sample <- matrix(round(runif(16, min = -10, max = 10)),
                     nrow = 4) %>%
  as.data.frame() %>%
  mutate_all(.funs = function(x) ifelse(x < 1, NA, x)) %>%
  `rownames<-`(paste0("user_", 1:4)) %>%
  setNames(paste0("item_", 1:4))
mat_sample

The collaborative filtering method can be divided into two different groups:
The memory-based method recommends items to a user based on the similarity between users. Users will be recommended items from other users with similar behavior (buying the same products, watching the same movies, giving products the same ratings). This approach works directly with the ratings or values of the matrix, without fitting any model or representation. Some people call this a nearest neighbour approach, since it tries to find the closest neighbours: the other users with the highest similarity. Some examples of algorithms that belong to this group are item-based collaborative filtering (IBCF), user-based collaborative filtering (UBCF), and association rules. I have written about association rules in another post4 if you wish to understand them.
The model-based method tries to build a representation of the user-item matrix to predict the items that a user has not yet rated or interacted with. The popular way to get this representation is matrix factorization, which breaks down the user-item matrix into several smaller matrices.
The hybrid method combines the results of several algorithms, whether content-based or collaborative filtering.
I recommend reading the following article5 to get a full explanation of recommendation systems. In this article, I will only focus on the matrix factorization method called Funk SVD, developed by Simon Funk.
Matrix factorization decomposes or breaks down the user-item matrix into lower-dimensionality matrices. One popular matrix factorization method is Funk SVD, which rose to fame during the Netflix Prize competition6.
The Funk SVD method tries to find a matrix decomposition that approximates the values of the real matrix as closely as possible (with minimum error).
\[A \approx U\ V^T\]
The method uses the Sum of Squared Error (SSE) as the error term, which is minimized via Stochastic Gradient Descent7. To make the model generalize well and not over-fit the training set, we add a penalty term to our minimization equation. This is represented by a regularization factor \(\lambda\) multiplied by the sum of squares of the magnitudes of the user and item vectors.
\[min\ \Sigma_{i,j \in obs}\ (A_{ij} - U_i\ V_j^T)^2 + \lambda (\ ||U_i||^2 + ||V^T_j||^2\ )\]
Notation
\(A\) : the real user-item rating matrix
\(A_{i,j}\) : rating given by user \(i\) for item \(j\)
\(U_i\) : latent factor for user \(i\)
\(V^T_j\) : latent factor for item \(j\)
Suppose we have the following actual rating matrix.
We initialize the \(U\) matrix of user latent factors with 3 features/factors, filled with randomized values.
set.seed(123)
u_init <- matrix(runif(12, min = -3, max = 3) %>% round(2),
                 nrow = 4) %>%
  as.data.frame() %>%
  `rownames<-`(paste0("user_", 1:4)) %>%
  setNames(paste0("factor_", 1:3))
u_init

We initialize the \(V^T\) matrix of item latent factors with 3 features/factors, filled with randomized values.
set.seed(13)
i_init <- matrix(runif(12, min = -3, max = 3) %>% round(2),
                 nrow = 3) %>%
  as.data.frame() %>%
  `rownames<-`(paste0("factor_", 1:3)) %>%
  setNames(paste0("item_", 1:4))
i_init

Suppose we want to predict the rating of \(user_1\) for \(item_2\), which has an actual rating of 9. We simply multiply the \(user_1\) row of the \(U\) matrix by the \(item_2\) column of the \(V^T\) matrix.
\[\hat {r} = U\ V^T = \begin{bmatrix} -1.27 & 2.64 & 0.31 \end{bmatrix} \begin{bmatrix} -2.45 \\ 2.77 \\ -2.93 \\ \end{bmatrix} = 9.516\]
Now we calculate the squared error of the prediction.
\[Error = (9 - 9.516)^2 = 0.266\]
After that, we update the values of the \(U\) and \(V^T\) matrices using SGD with regularization.
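To make this update concrete, here is a minimal base R sketch of one common form of the Funk SVD update for \(user_1\) and \(item_2\), using the gamma (learning rate) and lambda values that appear in recommenderlab's SVDF defaults later in this post:

u1 <- c(-1.27, 2.64, 0.31)   # latent factors of user_1 (from u_init)
v2 <- c(-2.45, 2.77, -2.93)  # latent factors of item_2 (from i_init)
r <- 9                       # actual rating
gamma <- 0.015               # learning rate
lambda <- 0.001              # regularization factor

e <- r - sum(u1 * v2)        # prediction error: 9 - 9.516 = -0.516

# Nudge each latent vector against the gradient of the regularized error
u1_new <- u1 + gamma * (e * v2 - lambda * u1)
v2_new <- v2 + gamma * (e * u1 - lambda * v2)

Repeating this update over all observed ratings for several epochs gradually shrinks the SSE.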
Now, to get your hands dirty, we will work through a recommendation system use case using data from Amazon.
The following packages are required if you wish to replicate the results of this post. All source code is available in my github repository.
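The exact library chunk is omitted here, but as a sketch, the following set covers the functions called throughout this post:

library(tidyverse)      # dplyr, ggplot2, purrr, tibble, tidyr, readr
library(lubridate)      # as_datetime(), floor_date(), year()
library(scales)         # number_format(), date_format()
library(recommenderlab) # rating matrices, recommender models, evaluation
library(skimr)          # skim() data summaries
# called with explicit namespaces below: data.table::fread(),
# reshape2::dcast(), tidytext::reorder_within(), beepr::beep()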
Data is acquired from Amazon Review Data (2018). The data consists of 2 separate datasets:
We have over 2 million ratings given by users and more than 80,000 different video game items. The metadata has already been cleansed beforehand. You can visit the github repo to check the cleansing method.
rating <- data.table::fread("data/Video_Games.csv")
metadata <- data.table::fread("data/metadata.csv")

metadata <- metadata %>%
  select(asin, title, category1, category2, category3, price, brand, feature, tech1, tech2, image, description)

cat("----------Rating Dataset----------\n")

## ----------Rating Dataset----------
## Rows: 2,565,349
## Columns: 4
## $ V1 <chr> "0439381673", "0439381673", "0439381673", "0439381673", "043938167…
## $ V2 <chr> "A21ROB4YDOZA5P", "A3TNZ2Q5E7HTHD", "A1OKRM3QFEATQO", "A2XO1JFCNEY…
## $ V3 <dbl> 1, 3, 4, 1, 4, 5, 3, 5, 5, 5, 5, 5, 1, 1, 5, 5, 5, 2, 5, 5, 4, 3, …
## $ V4 <int> 1402272000, 1399680000, 1391731200, 1391731200, 1389830400, 138905…
Data Description :
We will rename the columns of the rating dataset and transform the timestamp from an integer into a proper date format. The rating dataset contains the essential information for building the recommendation system. In the later analysis, we will convert this rating dataset into something called a user-item rating matrix.
rating <- rating %>%
  set_names(c("item", "user", "rating", "timestamp")) %>%
  mutate(
    timestamp = as_datetime(timestamp) # convert integer timestamp to date-time
  )
head(rating, 10)

The metadata consists of the relevant information for each video game title and accessory, including the product name, category, price, etc. The metadata can be used to present a proper recommendation output, since we surely won't hand users a raw item id and let them figure out the product name of the recommended item themselves. Beyond that, the metadata is only essential if we use it as part of the recommendation system itself, for example if we want to build the recommendation system based on the category and brand of each item.
##
##
## ----------Metadata Dataset----------
## Rows: 84,819
## Columns: 12
## $ asin <chr> "0042000742", "0078764343", "0276425316", "0324411812", "…
## $ title <chr> "Reversi Sensory Challenger", "Medal of Honor: Warfighter…
## $ category1 <chr> "Video Games", "Video Games", "Video Games", "Video Games…
## $ category2 <chr> "PC", "Xbox 360", "Retro Gaming & Microconsoles", "Xbox 3…
## $ category3 <chr> "Games", "Games", "Super Nintendo", "Accessories", "Games…
## $ price <chr> "", "\n\t\t\t\t\t\t\t\t\t\t\t\t<span class=\"\"verticalAl…
## $ brand <chr> "Fidelity Electronics", "by\n \n EA Games", "Ninten…
## $ feature <chr> NA, NA, NA, NA, NA, "Sim City 3000 CD-ROM", "Phonics Aliv…
## $ tech1 <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ tech2 <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ image <chr> "https://images-na.ssl-images-amazon.com/images/I/31nTxlN…
## $ description <chr> NA, "Brand new sealed!", NA, "MAS's Pro Xbox 360 Stick (P…
Data description:
We need to become more familiar with our data before building the model. Through exploratory data analysis, we first look at the data more closely by summarizing and inspecting it. As we go through the EDA process, we may find that not all data should be used, or that the data needs to be transformed or preprocessed first.
We will check by counting how many times an item is rated by a single user.
As we can see, an item can be rated more than once by a single user, perhaps because they bought it multiple times or because there are duplicate records. Depending on your purpose, you can calculate the mean of the ratings or keep only the most recent rating. For the next analysis, we will only consider the latest rating given by each user and ignore the rest.
rating <- rating %>%
  group_by(item, user) %>%
  arrange(desc(timestamp)) %>% # Sort ratings from newest to oldest
  slice(1) %>%                 # Take only the latest rating
  ungroup()

cat(
  paste("Number of Unique Rating :", nrow(rating) %>% prettyNum(big.mark = ","))
)

## Number of Unique Rating : 2,489,395
Let’s check how often each game is rated. Based on the summary, some games are rated only once or twice; the median item receives only 5 ratings.
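The counting chunk is not shown; as a sketch, the summary below can be reproduced with the following (the name game_count is reused later when filtering, and the summary table appears to be skimr output):

game_count <- rating %>%
  count(item)   # number of ratings each game received

skim(game_count)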
| Data summary | |
|---|---|
| Name | Piped data |
| Number of rows | 71982 |
| Number of columns | 2 |
| Column type frequency: | |
| character | 1 |
| numeric | 1 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| item | 0 | 1 | 10 | 10 | 0 | 71982 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| n | 0 | 1 | 34.58 | 133.49 | 1 | 2 | 5 | 19 | 6462 | ▇▁▁▁▁ |
For the next analysis, we will only consider items that have been rated more than 50 times. The choice of threshold is arbitrary; you can set the limit yourself. After we filter out the items, the number of ratings is significantly reduced.
select_item <- game_count %>%
  filter(n > 50) %>%
  pull(item)

# Update the rating
rating <- rating %>%
  filter(item %in% select_item)

cat(
  paste("Number of Rating :", nrow(rating) %>% prettyNum(big.mark = ","))
)

## Number of Rating : 1,957,649
Now that we have the updated rating data, we check the frequency of each rating score (1-5) given by users. Based on the bar chart, most users give a rating of 5 to the games they've bought.
rating %>%
  ggplot(aes(rating)) +
  geom_bar(fill = "firebrick") +
  scale_y_continuous(labels = number_format(big.mark = ",")) +
  labs(x = "Rating", y = "Frequency",
       title = "Number of Rating Given by User") +
  theme_minimal()

We also check how many video games each user has rated by looking at the distribution. Based on the statistics, most users rate only a single game. Such data may not be informative for us, since we don't know what other items those users also bought.
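As before, a sketch of the counting step behind the summary below (user_count is reused later when filtering users):

user_count <- rating %>%
  count(user)   # number of games each user rated

skim(user_count)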
| Data summary | |
|---|---|
| Name | Piped data |
| Number of rows | 1284091 |
| Number of columns | 2 |
| Column type frequency: | |
| character | 1 |
| numeric | 1 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| user | 0 | 1 | 10 | 20 | 0 | 1284091 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| n | 0 | 1 | 1.52 | 2.04 | 1 | 1 | 1 | 1 | 514 | ▇▁▁▁▁ |
For the next analysis, we will only consider users who have rated more than 10 different games. Again, this choice is arbitrary.
select_user <- user_count %>%
  filter(n > 10) %>%
  pull(user)

# Update the rating
rating <- rating %>%
  filter(user %in% select_user)

cat(
  paste("Number of Rating :", nrow(rating) %>% prettyNum(big.mark = ","))
)

## Number of Rating : 133,397
This also decreases the dimensions of our dataset, with only 7113 users remaining. We omit most of the users since they gave only one rating, and most games are rated only a handful of times, so they are not very informative.
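A quick sanity check of the remaining dimensions can be done with something like:

# Count the distinct users and items left after filtering
cat("Remaining users :", n_distinct(rating$user), "\n")
cat("Remaining items :", n_distinct(rating$item), "\n")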
Let’s once again check the rating distribution.
rating %>%
  ggplot(aes(rating)) +
  geom_bar(fill = "firebrick") +
  scale_y_continuous(labels = number_format(big.mark = ",")) +
  labs(x = "Rating", y = "Frequency",
       title = "Number of Rating Given by User") +
  theme_minimal()

We may also check the rating frequency over time. The graph below shows that rating activity reached its peak around 2015 and started to decrease afterwards.
rating %>%
  mutate(
    timestamp = floor_date(timestamp, unit = "week")
  ) %>%
  count(timestamp, rating) %>%
  ggplot(aes(timestamp, n, color = as.factor(rating), group = rating)) +
  geom_line() +
  scale_color_brewer(palette = "Dark2") +
  scale_x_datetime(date_breaks = "2 year", labels = date_format(format = "%Y")) +
  labs(x = NULL, y = "Frequency",
       title = "Weekly Frequency of Rating Activity",
       color = "Rating") +
  theme_minimal() +
  theme(legend.position = "top")

We can check the most rated games between 2014 and 2016 by joining the rating data with the item metadata.
rating %>%
  filter(year(timestamp) > 2014,
         year(timestamp) < 2016) %>%
  count(item) %>%
  arrange(desc(n)) %>%
  head(10) %>%
  left_join(metadata, by = c("item" = "asin"))

After we finish exploring the data, we convert it into a matrix, where each row is a user and each column is an item. The value in each cell is the rating given by the user. If the user hasn't rated an item, the cell will be a missing value (NA).
We can build a recommendation system using 2 types of matrices:
Each cell represents the rating given by the user, which may later be normalized. Below is an example of a non-normalized real rating matrix with 3 users and 3 items.
# example
matrix(data = c(NA, NA, 1, 5, NA, 3, 4, NA, 2), nrow = 3,
       dimnames = list(c("user_1", "user_2", "user_3"),
                       c("item_1", "item_2", "item_3")))

##        item_1 item_2 item_3
## user_1     NA      5      4
## user_2     NA     NA     NA
## user_3      1      3      2
Each cell represents a response given by the user and can only take binary values (recommended/not recommended, good/bad).
# example
matrix(data = c(NA, 1, NA, 0, 1, 1, 0, NA, 1), nrow = 3,
       dimnames = list(c("user_1", "user_2", "user_3"),
                       c("item_1", "item_2", "item_3")))

##        item_1 item_2 item_3
## user_1     NA      0      0
## user_2      1      1     NA
## user_3     NA      1      1
Below is the real rating matrix of our data. From 133,397 ratings, we have built a 7,113 x 8,550 rating matrix.
rating_matrix <- rating %>%
  select(item, user, rating) %>%
  reshape2::dcast(user ~ item) %>% # Convert long data.frame to wide data.frame
  column_to_rownames("user") %>%
  as.matrix() %>%
  as("realRatingMatrix")
rating_matrix

## 7113 x 8550 rating matrix of class 'realRatingMatrix' with 133397 ratings.
We may peek at a sample of the matrix. Since a user rarely gives ratings to all available items, the matrix is mostly empty. This kind of matrix is called a sparse matrix, because most of its values are missing. In the example below, we only have 2 ratings from 9 users and 9 items.
## 9 x 9 sparse Matrix of class "dgCMatrix"
## B000OPPR7W B000OYITQO B000OYKQBU B000OYMSL6 B000OYMYZQ
## A0380485C177Q6QQNJIX . . . . .
## A0685888WB02Q69S553P . . . . .
## A100JCBNALJFAW . . . . .
## A100U2O7L15XNL . . . . .
## A1027EV8A9PV1O . . . . .
## A102MU6ZC9H1N6 . 5 . . .
## A102RLOGIBBDMW . . . . .
## A103B6MQ5IF2BK . . . . .
## A103KKI1Y4TFNQ . . . . .
## B000P0QIM4 B000P0QIP6 B000P0QJ1E B000P0QJD2
## A0380485C177Q6QQNJIX . . . .
## A0685888WB02Q69S553P . . . .
## A100JCBNALJFAW . . . 4
## A100U2O7L15XNL . . . .
## A1027EV8A9PV1O . . . .
## A102MU6ZC9H1N6 . . . .
## A102RLOGIBBDMW . . . .
## A103B6MQ5IF2BK . . . .
## A103KKI1Y4TFNQ . . . .
As you can see, the values are not normalized yet and are still in the range [1,5]. Since we use the real rating matrix, we need to normalize the ratings. However, you can also skip this step, since the model fitting will normalize the data by default. You can normalize the data via two methods:
We normalize the data by subtracting each user's own mean rating.
\[normalized\ x = x - \overline x\]
We scale the data using the Z-score, dividing each user's centered rating by their standard deviation.
\[Z = \frac{x - \overline x}{s}\]
We don't have to manually normalize the rating matrix, since the model fitting process in recommenderlab will normalize our data by default. But if you want to do it outside the model, you can use the normalize() function and choose the method, either the center method or the Z-score method.
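For example, the centered output below can be reproduced with something like:

rating_matrix %>%
  normalize(method = "center")  # use method = "Z-score" for Z-score scaling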
## 7113 x 8550 rating matrix of class 'realRatingMatrix' with 133397 ratings.
## Normalized using center on rows.
The data is ready, so now we can start building the recommendation system. There are several algorithms that you can use to build a recommendation system with the recommenderlab package. You can check them by looking at the registry and specifying the data type. Below are some recommendation algorithms for a rating matrix with real values.
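The listing below comes from the recommenderlab registry; a sketch of the call:

recommenderRegistry$get_entries(dataType = "realRatingMatrix") %>%
  names()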
## [1] "HYBRID_realRatingMatrix" "ALS_realRatingMatrix"
## [3] "ALS_implicit_realRatingMatrix" "IBCF_realRatingMatrix"
## [5] "LIBMF_realRatingMatrix" "POPULAR_realRatingMatrix"
## [7] "RANDOM_realRatingMatrix" "RERECOMMEND_realRatingMatrix"
## [9] "SVD_realRatingMatrix" "SVDF_realRatingMatrix"
## [11] "UBCF_realRatingMatrix"
Description :
For now, let's start building a recommendation system with the Funk SVD method. You can check the initial/default parameters of the model.
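The defaults below can be pulled from the registry with something like:

recommenderRegistry$get_entry("SVDF", dataType = "realRatingMatrix")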
## Recommender method: SVDF for realRatingMatrix
## Description: Recommender based on Funk SVD with gradient descend (https://sifter.org/~simon/journal/20061211.html).
## Reference: NA
## Parameters:
## k gamma lambda min_epochs max_epochs min_improvement normalize verbose
## 1 10 0.015 0.001 50 200 0.000001 "center" FALSE
Description:
We will modify the parameters to use Z-score normalization instead.
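Fitting the model then looks something like this (the object name model_svdf is my own choice, reused in the prediction sketches below):

model_svdf <- Recommender(rating_matrix,
                          method = "SVDF",
                          parameter = list(normalize = "Z-score"))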
Now we will generate some random new users to simulate the recommendation process. Let's say we have the following new users, each of whom gave only one or two ratings.
select_item <- unique(rating$item)

set.seed(251)
new_user <- data.frame(user = sample(10, 10, replace = T),
                       item = sample(select_item, 10),
                       rating = sample(1:5, 10, replace = T)
                       ) %>%
  arrange(user)

new_user %>%
  left_join(metadata %>% select(asin, title), by = c("item" = "asin"))

We also need to convert them into the same real rating matrix format.
dummy_df <- data.frame(user = -1,
                       item = select_item,
                       rating = NA) %>%
  reshape2::dcast(user ~ item) %>%
  select(-user)

new_matrix <- new_user %>%
  reshape2::dcast(user ~ item) %>%
  column_to_rownames("user")
new_matrix

Let's convert them into a proper real rating matrix.
select_empty <- select_item[!(select_item %in% names(new_matrix))]

new_matrix <- new_matrix %>%
  bind_cols(
    dummy_df %>% select(all_of(select_empty))
  ) %>%
  as.matrix() %>%
  as("realRatingMatrix")
new_matrix

## 6 x 8550 rating matrix of class 'realRatingMatrix' with 10 ratings.
You can check the content of the rating matrix.
## 6 x 9 sparse Matrix of class "dgCMatrix"
## B000NUBY0C B000SFK0PW B0018ZWH0W B001CRM3YQ B003WY86L6 B0050SZ49Y
## [1,] . . . . . .
## [2,] 1 . . 3 . .
## [3,] . . . . . 4
## [4,] . . . . . .
## [5,] . . . . 1 .
## [6,] . 4 2 . . .
## B005HRZ29K B006JA7EWW B00EZV6HHU
## [1,] 1 . .
## [2,] . . .
## [3,] . 1 .
## [4,] . . 5
## [5,] . . .
## [6,] . . .
To get recommendations for the new data, we simply use predict(). Here, we want the top 5 recommendations for each user based on the items they have already rated. To get the recommended items, use type = "topNList" and specify the number of top-N recommendations. The top-N method will automatically give you the n items with the highest predicted score/rating for each new user.
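The call looks something like this, assuming the fitted model object model_svdf from earlier:

predict_new <- predict(model_svdf,
                       newdata = new_matrix,
                       type = "topNList",
                       n = 5)
predict_new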
## Recommendations as 'topNList' with n = 5 for 6 users.
We then build a proper data.frame to display the recommendations. Below are the top 5 recommended items for each user.
as(predict_new, 'list') %>%
  map_df(as.data.frame) %>%
  rename("asin" = 1) %>%
  mutate(
    user = map(unique(new_user$user), rep, 5) %>% unlist()
  ) %>%
  select(user, everything()) %>%
  left_join(metadata %>% select(asin, title)) %>%
  distinct()

You can also get the predicted ratings for all items each user has not rated. The missing values (the dots .) are the items that were already rated by the user, so they don't get a new predicted rating.
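A sketch of that prediction (again assuming model_svdf); the column subset simply mirrors the sample printed below:

predict_rating <- predict(model_svdf,
                          newdata = new_matrix,
                          type = "ratings")
getRatingMatrix(predict_rating)[, sort(unique(new_user$item))]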
## 6 x 9 sparse Matrix of class "dgCMatrix"
## B000NUBY0C B000SFK0PW B0018ZWH0W B001CRM3YQ B003WY86L6 B0050SZ49Y
## [1,] 1.070433 1.099133 1.110897 1.080597 1.102519 1.103395
## [2,] . 2.448270 2.461899 . 2.452165 2.453192
## [3,] 2.647957 2.707811 2.732478 2.669147 2.714929 .
## [4,] 5.070331 5.098954 5.110698 5.080468 5.102336 5.103209
## [5,] 1.070360 1.099013 1.110764 1.080507 . 1.103271
## [6,] 3.098442 . . 3.112634 3.143250 3.144473
## B005HRZ29K B006JA7EWW B00EZV6HHU
## [1,] . 1.112544 1.107992
## [2,] 2.437817 2.463955 2.458629
## [3,] 2.689297 . 2.726323
## [4,] 5.090099 5.112334 .
## [5,] 1.090148 1.112405 1.107859
## [6,] 3.126118 3.157248 3.150889
Now that we've successfully built our model, how do we know that the recommendation system is good enough and not just throwing out random suggestions?
As with classical regression and classification problems, we can use cross-validation by splitting the data into a train set and a test set, with 90% of the rating data becoming the training dataset. Setting given = -1 means that for the test users, all but 1 randomly selected item is withheld for evaluation.
The goodRating parameter determines the threshold for classifying whether an item should be recommended or not, similar to how we determine a threshold in a classification problem. goodRating is set to 0 since our normalized data is zero-centered; any rating with a value above 0 will be considered positive and worth recommending.
Using the top-N recommendation, we will get the following confusion matrix from the model.
\[\begin{matrix} & \underline{Actually\ Buy} & \underline{Actually\ Not\ Buy} \\ Recommended & TP & FP\\ Not\ Recommended & FN & TN \end{matrix}\]
We then evaluate the model using the same metrics as the usual classification method, such as model accuracy, recall, and precision.
\[Recall (Sensitivity) = \frac{TP}{TP + FN}\]
\[Precision = \frac{TP}{TP + FP}\]
set.seed(123)
scheme <- rating_matrix %>%
  evaluationScheme(method = "split",
                   train = 0.9, # 90% of the data for training
                   given = -1,
                   goodRating = 0
                   )
scheme

## Evaluation scheme using all-but-1 items
## Method: 'split' with 1 run(s).
## Training set proportion: 0.900
## Good ratings: >=0.000000
## Data set: 7113 x 8550 rating matrix of class 'realRatingMatrix' with 133397 ratings.
Now we will run the training process for the Funk SVD method with Z-score normalization. We will look at the model performance when it gives us 1, 4, 8, 12, 16, and 20 recommended items.
You can also get the rating scores of the recommended items and calculate the error instead. The top-N evaluation method relies on the good rating threshold for classifying positive and negative recommendations. For a real rating matrix, we can also directly measure how well the model predicts the ratings using error measures, including MAE, MSE, and RMSE.
\[RMSE = \sqrt \frac{\Sigma_{i,j \in K} (r_{i,j} - \hat r_{i,j})^2}{|K|}\]
Notation:
\[MAE = \frac{\Sigma_{i,j \in K} |r_{i,j} - \hat r_{i,j}|}{|K|}\]
result_rating <- evaluate(scheme,
                          method = "svdf",
                          parameter = list(normalize = "Z-score", k = 20),
                          type = "ratings"
                          )
beepr::beep(8)

From the evaluation process, we can summarize the mean of each performance measure across the runs.
result_rating@results %>%
  map(function(x) x@cm) %>%
  unlist() %>%
  matrix(ncol = 3, byrow = T) %>%
  as.data.frame() %>%
  summarise_all(mean) %>%
  setNames(c("RMSE", "MSE", "MAE"))

set.seed(123)
result <- evaluate(scheme,
                   method = "svdf",
                   parameter = list(normalize = "Z-score", k = 20),
                   type = "topNList",
                   n = c(1, seq(4, 20, 4))
                   )

The evaluation scheme took some time to run, so I have provided the saved object as well. Here is the recap of the model performance using the top-N recommendations.
result <- read_rds("output/svdf_val.Rds")

result@results %>%
  map_df(function(x) x@cm %>%
           as.data.frame %>%
           rownames_to_column("n")) %>%
  mutate(n = as.numeric(n)) %>%
  arrange(n) %>%
  rename("Top-N" = n)

From the result of the evaluation, we can get the performance metrics. Here, we will visualize the ROC curve of the model.
result %>%
  getConfusionMatrix() %>%
  map_df(~as.data.frame(.) %>% rownames_to_column("n")) %>%
  group_by(n) %>%
  summarise_all(mean) %>%
  ggplot(aes(x = FPR, y = TPR)) +
  geom_line() +
  geom_point(shape = 21, fill = "skyblue", size = 2.5) +
  scale_x_continuous(limits = c(0, 0.0025)) +
  labs(title = "ROC Curve",
       x = "False Positive Rate",
       y = "True Positive Rate",
       subtitle = "method : SVD") +
  theme_minimal()

We can also see the precision-recall curve.
result %>%
  getConfusionMatrix() %>%
  map_df(~as.data.frame(.) %>% rownames_to_column("n")) %>%
  group_by(n) %>%
  summarise_all(mean) %>%
  ggplot(aes(x = recall, y = precision)) +
  geom_line() +
  geom_point(shape = 21, fill = "skyblue", size = 2.5) +
  labs(title = "Precision-Recall Curve",
       x = "Recall", y = "Precision",
       subtitle = "method : SVD") +
  theme_minimal()

Now that we've learned how to evaluate a recommendation model, we can start comparing multiple models to get the best one for our dataset. Since we already evaluated Funk SVD in the previous step, in this part we will evaluate the following methods:
algorithms <- list(
  "Random items" = list(name = "RANDOM"),
  "Popular items" = list(name = "POPULAR"),
  "SVD" = list(name = "SVD"),
  "ALS" = list(name = "ALS"),
  "item-based CF" = list(name = "IBCF")
)

We will evaluate the models by measuring the predicted ratings and getting the RMSE, MSE, and MAE values.
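The evaluation chunk itself is not shown; a sketch consistent with the result_error object used in the plotting code below:

set.seed(123)
result_error <- evaluate(scheme,
                         algorithms,
                         type = "ratings")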
Then, we visualize the result.
get_error <- function(x){
  x %>%
    map(function(x) x@cm) %>%
    unlist() %>%
    matrix(ncol = 3, byrow = T) %>%
    as.data.frame() %>%
    summarise_all(mean) %>%
    setNames(c("RMSE", "MSE", "MAE"))
}

result_error_svdf <- result_rating@results %>%
  get_error() %>%
  mutate(method = "Funk SVD")

map2_df(.x = result_error@.Data,
        .y = c("Random", "Popular", "SVD", "ALS", "IBCF"),
        .f = function(x,y) x@results %>% get_error() %>% mutate(method = y)) %>%
  bind_rows(result_error_svdf) %>%
  pivot_longer(-method) %>%
  mutate(method = tidytext::reorder_within(method, -value, name)) %>%
  ggplot(aes(y = method,
             x = value)) +
  geom_segment(aes(x = 0, xend = value, yend = method)) +
  geom_point(size = 2.5, color = "firebrick" ) +
  tidytext::scale_y_reordered() +
  labs(y = NULL, x = NULL, title = "Model Comparison") +
  facet_wrap(~name, scales = "free_y") +
  theme_minimal()

The Funk SVD method achieves the lowest error compared to the other algorithms. However, the difference from the regular SVD method is not that significant.
If you are interested, you may also evaluate all the algorithms using top-N recommendations instead.
result_multi <- evaluate(scheme,
                         algorithms,
                         type = "topNList",
                         n = c(1, seq(4, 20, 4))
                         )
beepr::beep(8)

The Popular and SVD methods compete as the best method for this problem, with Funk SVD following behind. With bigger N, the popular method is expected to do better, since during preprocessing we only kept game items rated more than 50 times, so less popular items are already out of the data.
get_recap <- function(x){
  x %>%
    getConfusionMatrix() %>%
    map_df(~as.data.frame(.) %>% rownames_to_column("n")) %>%
    group_by(n) %>%
    summarise_all(mean)
}

result_svdf <- result %>%
  get_recap() %>%
  mutate(method = "Funk SVD")

result_eval <- map2_df(.x = result_multi,
                       .y = c("Random", "Popular", "SVD","ALS", "IBCF"),
                       .f = function(x, y) x %>% get_recap() %>% mutate(method = y)
                       ) %>%
  bind_rows(result_svdf)

result_eval %>%
  ggplot(aes(x = FPR, y = TPR, color = method)) +
  geom_line() +
  geom_point() +
  labs(title = "ROC Curve", color = "Method",
       y = "True Positive Rate", x = "False Positive Rate") +
  theme_minimal() +
  theme(legend.position = "top")