1. Introduction

1.1 Recommender systems

The goal of a recommender system is to help users – usually consumers – find what they want and discover new, useful information. Recommender systems are now ubiquitous in online marketing for books, music, health care services, the arts and more. Recommender systems may follow different paradigms: recommendations can be based on similarity between items or between users, and that similarity can be quantified with a variety of measures, including cosine similarity, k-nearest neighbors, Pearson correlation and logistic regression models.

In User Based Collaborative Filtering (UBCF), items are recommended assuming that users with similar preferences will rate items similarly. In Item Based Collaborative Filtering (IBCF), the presumption is that users will prefer items that are similar to other items they like. Information about users and items is stored in a matrix that is modeled and used to make predictions – the recommendations.
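To make this concrete, here is a small sketch (toy data, separate from the project code below) that builds a miniature user-item rating matrix and computes the cosine similarity between pairs of users; both UBCF and IBCF start from comparisons like these.

# A toy user-item rating matrix: rows are users, columns are items, NA = not rated.
ratings <- matrix(c(5, 3, NA, 1,
                    4, NA, 4, 1,
                    1, 1, 5, 5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("user", 1:3), paste0("item", 1:4)))

# Cosine similarity between two users, using only the items both have rated.
cosine_sim <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  sum(a[ok] * b[ok]) / (sqrt(sum(a[ok]^2)) * sqrt(sum(b[ok]^2)))
}

cosine_sim(ratings["user1", ], ratings["user2", ])  # similar tastes, close to 1
cosine_sim(ratings["user1", ], ratings["user3", ])  # different tastes, noticeably lower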

In this project, we use the recommenderlab package in R and textual analysis to design User Based and Item Based recommender systems based on a database of Amazon Fine Food reviews.

1.2 Amazon Fine Foods and the Recommender System

Amazon is an American electronic commerce and cloud computing company founded on July 5, 1994, by Jeff Bezos and based in Seattle. It is the largest internet-based retailer in the world by total sales and market capitalization. Part of Amazon’s success is due to the company’s recommender systems and their effectiveness in marketing products based on user preferences and product similarity, which has boosted sales and become a major asset to the company. Amazon’s own recommender system, known as item-to-item collaborative filtering, is a hybrid approach designed to produce strong results using fast, efficient computing in a high-volume, information-dense environment. By comparison, the recommender systems we offer here are basic.

Amazon’s recommender algorithms can be found here.

In this exercise, we will create a recommender system that helps guide users toward their next meals. As with many other products on Amazon, a user purchases a food item and rates it on a 1 to 5 scale, with 1 being terrible and 5 being terrific. Though we are not using Amazon’s algorithms, the models we created have a true positive rate approximating XX percent, meaning recommendations are more likely than not to fit a particular user’s interests based on similarities between users or items.

References are here. [^1]



2. Database Connection

We’ve stored our data in a PostgreSQL database on an Amazon RDS instance. The code in this section logs in and downloads 568,454 records. We include the code for reference, but because this is a large dataset we have commented it out and load only the cleaned subsets used for modeling. The clean-up code, which uses regular expressions and dplyr, is also included but commented out for computing efficiency.

# amzpw <- "lezted17"
# amzhost <- 'tdpostgres.cfbzmuyqdlau.us-east-1.rds.amazonaws.com'
# amzuser <- 'tomdetz'

# loads the RPostgreSQL driver
# drv <- dbDriver("PostgreSQL")

# https://434017622503.signin.aws.amazon.com/console

# connect to the postgres db, normal setup
# con <- dbConnect(drv, dbname = "kaggle",
#                 host = amzhost, port = 5432,
#                 user = amzuser, password = amzpw)

# check for the table
# dbExistsTable(con, "reviews")

# pull 100K rows for testing
# data <- dbGetQuery(con, "SELECT * FROM reviews ORDER BY hashfloat8(random())   LIMIT 100000 ;")

# data <- dbGetQuery(con, "SELECT * FROM reviews;")

load("amDat.rda")
# data <- data[sample(nrow(data), 100000), ]

kable(data[c(1:6), c(2,3,7,9)], caption="Reviews Raw Import")
Reviews Raw Import

| ProductId | UserId | Score | Summary |
|-----------|--------|-------|---------|
| B000E7WM0K | A3BSTFFIKK5YTW | 3 | OKAY IN A PINCH… |
| B000E7WM0K | A3NITES609Z2LN | 3 | Now containing gluten! |
| B000E7WM0K | A3EL8P50JQ5OIF | 2 | not bad, but I won’t reorder |
| B000E7WM0K | AINZRFRVI8H8E | 4 | Great gluten free quick fix meal |
| B000E7WM0K | A3BJ9NS09YGQT5 | 5 | Easy and delicious |
| B000E7WM0K | A160JZXED5K7P1 | 4 | Gluten free goodness |
***

3. Data Preparation

In the following steps, we load the raw data, clean up some dirty fields with regex, create a mean review score for each user and build a user-item matrix that will be fed to recommenderlab to build our recommendation models.

The data includes variables for User and Product identification codes, a score (1-5) that is each user’s product recommendation, a summary review and an extended review that goes into details. In addition to recommendation models built on user and item similarity, we will use a semantic analysis of the summary field to recommend products.

# get product counts
count <- ungroup(data) %>% 
          group_by(ProductId) %>% 
          summarize(Count=n()) %>% 
          arrange(desc(Count))

# get mean score for each product
mean_score <- ungroup(data) %>% 
                group_by(ProductId) %>% 
                summarize(Mean_score = mean(Score)) %>% 
                arrange(desc(Mean_score))

# merge counts and mean into data frame
data <- merge(data, count, by.x='ProductId', by.y='ProductId', all.x=T)
data <- merge(data, mean_score, by.x='ProductId', by.y='ProductId', all.x=T)

# drop unneeded columns
data2 <- data[, c(1:4,7,9:12)]

# remove stray characters
data2$UserId <- gsub('#oc-', '', data2$UserId)

# trim white space
data2[, c(1:6)] <- lapply(data2[, c(1:6)], trimws)

# make Score numeric
data2$Score <- as.numeric(data2$Score)

# create a new data set with a column that groups by product and combines the Summary reviews; this df is used for semantic analysis later

data3 <- ungroup(data2) %>%
            group_by(ProductId) %>% 
            mutate(combine_summary = paste0(Summary, collapse = ' '))

# check lengths
# length(unique(data3$combine_summary))
# length(unique(data3$ProductId))

# end data cleanup on original data; clean data in data3

## for recommenderlab, the data must be imported in a particular format
## the following steps create 'datRlab' in the right format 

# drop products with fewer than median count
medianProds <- median(data2$Count)

datRlab <- ungroup(data3) %>%
            filter(Count >= medianProds)

# remove unneeded columns
datRlab <- datRlab[, c(3,1,5)]

# remove duplicates
datRlab <- datRlab[!duplicated(datRlab[,c(1,2)]),]
## datRlab is now in the format needed for recommenderlab

kable(head(datRlab), caption='Data in Recommenderlab format')
Data in Recommenderlab format

| UserId | ProductId | Score |
|--------|-----------|-------|
| ADBFSA9KTQANE | 0006641040 | 5 |
| A1S3C5OFU508P3 | 0006641040 | 4 |
| A2P4F2UO0UMP8C | 0006641040 | 4 |
| A3OI7ZGH6WZJ5G | 0006641040 | 5 |
| A12HY5OZ2QNK4N | 0006641040 | 5 |
| A3SJWISOCP31TR | 0006641040 | 5 |
***

4. Data Exploration

4.1 How many unique users and products?

Unique users: 256,059. Unique products: 74,258

4.2 What are most-reviewed products?

The most-reviewed item is Quaker Soft-Baked Cookies, with 913 reviews. The second most-reviewed product is Nature’s Way coconut oil moisturizer.

4.3 Who are the top reviewers?

The top reviewer, with 181 reviews, is ‘C. F. Hill’. The second-ranked reviewer, with 106 reviews, is ‘B. Davis, The Happy Hermit’.

4.4 What is the distribution of reviewer scores?

Most users review only a single item, so the distribution of reviews per user is strongly skewed to the right, with a long tail of prolific reviewers. Review scores, by comparison, are skewed to the left: the average review score is 4.18, and reviewers tend to give positive reviews overall.

# top products sort
# ungroup(data4) %>% 
#   group_by(ProductId) %>% 
#   summarize(Count=n()) %>% 
#   arrange(desc(Count))

# top reviewers
# ungroup(data4) %>% 
#   group_by(UserId) %>% 
#   summarize(Count=n()) %>%
#   arrange(desc(Count))

# limit to products with more than median number of reviews
data4 <- data3[which(data3$Count >= medianProds),]

# remove any duplicate reviews
data4 <- data4[!duplicated(data4[,c(1,3)]),]

# add a reviewer count
reviewer_count <- ungroup(data4) %>% 
                    group_by(UserId) %>% 
                    summarize(RCount = n())

# merge into df
data4 <- merge(data4, reviewer_count, by.x='UserId', by.y='UserId', all.x=T)

avgScore <-round(mean(data4$Score), 2)

# distributon of product mean scores 
 ggplot(data4[which(data4$RCount <= 50),], aes(x=Mean_score)) +
    geom_histogram(binwidth=.01, alpha=.5, position="identity") +
    geom_vline(aes(xintercept=mean(Score)), color="red") +
    annotate("text", x=4.6, y=1500, label=paste("Mean = ", avgScore)) +
    labs(x="Mean Score", y="Count",
         title="Distribution of Review Scores") +
    theme_tufte()

# number of reviews 
ggplot(data4[which(data4$RCount <= 50),], aes(x=RCount)) +
    geom_histogram(binwidth=1, alpha=.5, position="identity") +
    labs(x="Count of Reviews", y="Count of Users",
         title="Distribution, Number of Reviews per User") +
    theme_tufte()

4.5 Differences among reviewers

Do users who review more products tend to give different scores? The table below shows that scoring is fairly consistent regardless of reviewer experience. Scores are about the same across groups, though skewed positive overall, with averages near or above 4 in all groups.

# suggestion for further exploration: difference in average scores for bins of reviewers 0-5 reviews 5-10 reviews, etc.

data4$Rcut<-cut(data4$RCount, c(0,5,10,15,20,25,30,35,40,45,50,100,200))

# descriptive stats based on reviewer activity

statbox <- ungroup(data4) %>% 
              group_by(Rcut) %>% 
              summarize(avgScore = round(mean(Score, na.rm=T), 2),
                        medScore = median(Score),
                        sdScore = round(sd(Score, na.rm=T), 2))

colnames(statbox) <- c("Review Count", "Average Score", 
                       "Median Score", "Std Deviation")
kable(statbox)
| Review Count | Average Score | Median Score | Std Deviation |
|--------------|---------------|--------------|---------------|
| (0,5] | 4.17 | 5 | 1.34 |
| (5,10] | 4.27 | 5 | 1.22 |
| (10,15] | 4.14 | 5 | 1.27 |
| (15,20] | 4.05 | 5 | 1.22 |
| (20,25] | 4.14 | 5 | 1.21 |
| (25,30] | 4.02 | 4 | 1.14 |
| (30,35] | 3.94 | 4 | 1.17 |
| (35,40] | 4.10 | 4 | 1.07 |
| (40,45] | 4.13 | 5 | 1.09 |
| (45,50] | 4.15 | 5 | 1.11 |
| (50,100] | 4.14 | 5 | 1.08 |
| (100,200] | 4.47 | 5 | 0.77 |
***

5. Recommenderlab modeling

The Amazon Fine Foods data has been downloaded and transformed into the data frame datRlab. The recommenderlab package allows us to build and evaluate recommender systems, but to use it the data must first be converted into a matrix format. Below, the datRlab data frame prepared earlier is cast into a user-by-product matrix and converted to recommenderlab’s realRatingMatrix class.

# cast the long data frame into a wide user x product matrix (reshape2::acast)
g <- acast(datRlab, UserId ~ ProductId)
R <- as.matrix(g)
# coerce into recommenderlab's sparse rating-matrix class
r <- as(R, "realRatingMatrix")
r
## 101400 x 2921 rating matrix of class 'realRatingMatrix' with 278406 ratings.

5.1 Matrix inspection

recommenderlab provides functions to inspect the data. As with most sales websites with reviews, there will be many users, but the majority of users will not have reviewed most of the items (foods in this case) listed on the website. As a result, the matrix will be sparse (in other words, lots of empty fields!).

# This function shows what the sparse matrix looks like.
getRatingMatrix(r[c(1:5),c(1:4)])
## 5 x 4 sparse Matrix of class "dgCMatrix"
##                       0006641040 7310172001 7310172101 B00002N8SM
## A015565634RZNSDLJBE5M          .          .          .          .
## A06364072LBY1F3ING9XN          .          .          .          .
## A09539661HB8JHRFVRDC           .          .          .          .
## A1000WA98BFTQB                 .          .          .          .
## A1001H5ESPYUFH                 .          .          .          .

5.2 Reviewer bias

Sparse data is not the only issue. Most review sites suffer from reviewer bias: some users tend to rate nearly everything a 4 or 5, while others hand out 1s and 2s. To deal with these biases, we will normalize the data, centering each user’s scores around a mean of 0.
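As a minimal illustration (toy data, not the Amazon matrix), recommenderlab’s normalize() subtracts each user’s mean rating by default, so a habitual 5-star rater and a habitual 2-star rater become comparable:

# Toy example: "center" normalization subtracts each user's own mean rating.
toy <- matrix(c(5, 4, 5, NA,
                2, 1, NA, 2),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("user1", "user2"), paste0("item", 1:4)))
toy_r <- as(toy, "realRatingMatrix")

getRatingMatrix(normalize(toy_r))   # ratings are now expressed relative to each user's mean
rowMeans(normalize(toy_r))          # per-user means are now (approximately) zero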

5.3 Distribution of normalized scores

# Histogram of getRatings using Normalized Scores
hist(getRatings(normalize(r)), breaks=100, xlim = c(-2,2), main = "Normalized Scores Histogram")

hist(getRatings(normalize(r, method="Z-score")), breaks = 100, xlim = c(-2,2), main = "Z-score Histogram")

# Let's calculate the average_ratings_per_user and visualize the distribution

average_ratings_per_user <- rowMeans(r)

qplot(average_ratings_per_user) + stat_bin(binwidth = 0.1) +
      ggtitle("Distribution of the average rating per user")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# A strange-looking distribution; this is likely due to rating bias.

# We will look at how many foods each user has rated and what the mean rating for each food is.

hist(rowCounts(r), breaks = 100, xlim = c(0,25), main = "Histogram of Number of ratings per User", xlab = "Reviews per User", col = "blue")

hist(colMeans(r), breaks = 20, main = "Frequency of mean ratings per food", col = "lightgreen")

# Let's take a look at a heatmap of the top 60 x 60 submatrix (if we visualized the entire matrix, the heatmap would be too small to read).
image(r[1:60, 1:60], main = "Heatmap of the first rows and columns")

A side note before we continue to the main discussion of recommender systems: a heatmap is helpful for visualizing overall ratings between users and items. This one, however, suffers from sparse data, which renders it unhelpful. Below is an attempt to remove most of the sparseness (empty cells) and to identify the users who rate often and, likewise, the products that are rated frequently.

# What if, instead, we selected the most relevant users and items?

# This means visualizing only the users who have tried many different foods and the foods that have been tried/eaten/rated by many users. To identify and select the most relevant users and foods, follow these steps:

# 1. Determine the minimum number of foods per user.
# 2. Determine the minimum number of users per food.
# 3. Select the users and foods matching these criteria.

min_n_foods <- quantile(rowCounts(r), 0.999) 
min_n_users <- quantile(colCounts(r), 0.99)

# Now, we can visualize the rows and columns matching the criteria
image(r[rowCounts(r) > min_n_foods, colCounts(r) > min_n_users], main = "Heatmap of the top users and foods")

As you can see, the heatmap is fun to look at, but it takes some work to make it genuinely useful.

Returning to the earlier histograms, we noted that most users have written fewer than 5 reviews (which explains why the matrix is sparse). When users do rate, however, the mean rating is above 4 and the distribution has a left-sided skew. Let’s quantify the mean, median, and skew.

library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
average_ratings <- colMeans(r)
describe(average_ratings)
##    vars    n mean   sd median trimmed mad  min max range skew kurtosis
## X1    1 2921 4.19 0.46   4.27    4.24 0.4 1.32   5  3.68 -1.4     3.79
##      se
## X1 0.01

Let’s look further into the data. While preparing it for the recommender system, it’s worth seeing what else the data can reveal. In the histogram above, we noticed that many of the ratings fall between 4 and 5. Let’s take a closer look at how often each rating occurs.

# Exploring the values of the ratings.
# Let's take a look at the ratings. We can convert the matrix into a vector and explore its values:
# The ratings are integers in the range 0-5. Let's count the occurrences of each of them.

vector_ratings <- as.vector(r@data)
table_ratings <- table(vector_ratings)
table_ratings
## vector_ratings
##         0         1         2         3         4         5 
## 295910994     23701     15053     23179     43590    172883
# According to the documentation, a rating equal to 0 represents a missing value, so we can remove them from vector_ratings.
vector_ratings <- vector_ratings[vector_ratings != 0]

# Now we can build a frequency plot of the ratings. In order to visualize a bar plot with frequencies, we can use ggplot2. Let's convert them into categories using factor and build a quick chart.
vector_ratings <- factor(vector_ratings)

# Now let's visualize their distribution
qplot(vector_ratings) + ggtitle("Distribution of the Ratings")

It’s always interesting to see the occurrences of the ratings. From this data visualization, the most common rating is a 5.

When customers go into a restaurant or a grocery store, they naturally gravitate toward the most popular items. If an item sells well and maintains high ratings, it is reasonable to expect customers to purchase it again and again. However, Amazon’s job is not only to sell the most popular items, but also to sell similar items that the user may be interested in buying.

There are many recommender algorithms we can use. In this chunk, we explore the “POPULAR” method, which recommends items based on their overall popularity.

# Let's take a recommender that generates recommendations solely on the popularity of items.
# We will create a recommender from the first 10000 users in the Amazon Food Dataset.

r.popular <- Recommender(r[1:10000], method = "POPULAR")

# Now get a top-5 recommendation list for the next user, who was not used to learn the model.
recom <- predict(r.popular, r[10001], n =5)

# The result contains ordered top-N (n = 5) recommendation lists, one for each user. The recommended items can be inspected as a list.
as(recom, "list")
## $A1DMBZHY2EJGCE
## [1] "B0051COPH6" "B0045XE32E" "B004FEN3GK" "B001EO5U3I" "B008J1HO4C"
# With this information, we can take these top 5 corresponding product items for user 10001 and check to see what this POPULAR recommender system recommends.

Another great resource we found is the book “Building a Recommendation System with R”, which explores item-based and user-based collaborative filtering as two popular approaches to recommending items. We will also explore cosine and Pearson correlation similarity measures. Before we start building the recommender models, however, we have to prepare the data.

5.4 Data Preparation

When we looked at the data, we noticed that the table contains many foods that have been reviewed only a few times. Ratings for these items may be unreliable because of the lack of data and/or biased reviewers.

We need to set minimum thresholds for reviews per user and users per food, prepare the data, build a recommendation model, and validate it. Since this is our first time building the model, we use a rule of thumb; after the model is built, we can come back and adjust the data preparation.

We will use users who have rated more than 30 foods, and foods that have been reviewed more than 100 times.

ratings_foods <- r[rowCounts(r) > 30, colCounts(r) > 100]
ratings_foods1 <- ratings_foods[rowCounts(ratings_foods) > 30,]
ratings_foods1
## 100 x 815 rating matrix of class 'realRatingMatrix' with 3860 ratings.

Now the data is prepared and ready for use for recommender system development.

5.5 Item-based collaborative filtering

Collaborative filtering is a family of recommendation methods that take into account information from many users. In the item-based variant, given a new user, the algorithm looks at the user’s purchases and recommends similar items.

The core algorithm is based on these steps (a small illustration of the first two steps follows this list):

  1. For each pair of items, measure how similar they are in terms of having received similar ratings from similar users.
  2. For each item, identify the k most similar items.
  3. For each user, identify the items that are most similar to the user’s purchases.
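As a rough illustration of steps 1 and 2 (a sketch using recommenderlab’s similarity() helper on the prepared ratings, not the internals of the IBCF model built below), we can compute an item-item cosine similarity matrix and list the most similar items for one product:

# Sketch of steps 1-2: item-item cosine similarities on the prepared matrix.
# The IBCF recommender below performs the equivalent computation internally
# before aggregating ratings into recommendations.
item_sim <- as.matrix(similarity(ratings_foods1, method = "cosine", which = "items"))
diag(item_sim) <- NA            # ignore self-similarity

# The k most similar items to the first product (names are ProductIds).
k <- 5
head(sort(item_sim[1, ], decreasing = TRUE), k)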

We will define the training and test set and create an IBCF recommender system.

# We randomly define the which_train vector that is TRUE for users in the training set and FALSE for the others.
# We set the probability of being in the training set to 80%.
which_train <- sample(x = c(TRUE, FALSE), size = nrow(ratings_foods1), replace = TRUE, prob = c(0.8, 0.2))

# Define the training and the test sets
recc_data_train <- ratings_foods1[which_train, ]
recc_data_test <- ratings_foods1[!which_train, ]

# Let's build the recommender IBCF - cosine:
recc_model <- Recommender(data = recc_data_train, method = "IBCF", parameter = list(k = 30)) 

# We have now created a IBCF Recommender Model

Now that we have trained the IBCF recommender model on the training set, we will apply it to the test set.

# n_recommended defines the number of items to recommend to each user; the predict function then creates the recommendations for the test set.
n_recommended <- 6
recc_predicted <- predict(object = recc_model, newdata = recc_data_test, n = n_recommended)

# This is the recommendation for the first user
recc_predicted@items[[1]]
## [1] 315 379 556 148 703 561
# Now let's define a list with the recommendations for each user
recc_matrix <- lapply(recc_predicted@items, function(x){
  colnames(ratings_foods)[x]
})

# Let's take a look at the recommendations for the first four users:
recc_matrix[1:4]
## $A1B05INWIDZ74O
## [1] "B001EO5Q64" "B001PQTYN2" "B0041CIR62" "B000HDI5O8" "B0062A87HA"
## [6] "B0043WOANY"
## 
## $A1H7Y5XKPGT0OS
## [1] "B000CQBZPG" "B000CQC05U" "B000CQC064" "B000CQG8K8" "B000CQG8KS"
## [6] "B000CQID6U"
## 
## $A1IH42TUIZ2XJL
## [1] "B0007A0AQM" "B0007A0AQW" "B0009YUEG2" "B000BRR8VQ" "B000CPZSC8"
## [6] "B000EVOSE4"
## 
## $A1P2XYD265YE21
## [1] "B0007A0AQM" "B0007A0AQW" "B000BRR8VQ" "B000CQ01NS" "B000CQBZOW"
## [6] "B000CQC04Q"
# Simply by plugging in any user index, we can identify that user's top recommendations.
# IBCF recommends items on the basis of the similarity matrix.

5.6 User-based collaborative filtering

User-based collaborative filtering identifies users whose purchases and ratings are similar to those of the target user. Based on what those similar users have purchased and rated highly, the system then recommends new items to the target user.

For each new user, these steps are taken (a toy sketch of the scoring step follows this list):

  1. Measure how similar each existing user is to the new one. As with IBCF, popular similarity measures are correlation and cosine.
  2. Identify the most similar users, either by taking the top k users (k-nearest neighbors) or by taking all users whose similarity is above a defined threshold.
  3. Score the items purchased by the most similar users, using either the average rating or a weighted average rating with the similarities as weights.
  4. Pick the top-rated items.
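Here is a toy sketch of the scoring step (invented numbers, not recommenderlab output): an unseen item is scored with a similarity-weighted average of the neighbours’ ratings, so the most similar users count the most.

# Toy example of step 3: similarity-weighted average rating for one item
# that the target user has not tried yet (numbers are invented for illustration).
neighbour_sims    <- c(u2 = 0.9, u3 = 0.6, u4 = 0.2)   # similarity of each neighbour to the target user
neighbour_ratings <- c(u2 = 5,   u3 = 4,   u4 = 2)     # the neighbours' ratings of the candidate item

predicted_rating <- sum(neighbour_sims * neighbour_ratings) / sum(neighbour_sims)
predicted_rating  # about 4.3; dominated by the most similar users

# On the real data, user-user similarities could be computed with, e.g.:
# user_sim <- as.matrix(similarity(ratings_foods1, method = "cosine", which = "users"))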

Let’s build the recommendation model with UBCF.

# The method computes the similarity between users with cosine

# Let's build a recommender model leaving the parameters to their defaults. 
recc_model <- Recommender(data = recc_data_train, method = "UBCF")

# A UBCF recommender has now been created

As with the IBCF recommender, we will now generate predictions for the test set.

recc_predicted <- predict(object = recc_model, newdata = recc_data_test, n = n_recommended)

# Let's define a list with the recommendations to the test set users.
recc_matrix <- sapply(recc_predicted@items, function(x) {
  colnames(ratings_foods)[x]
})

# Again, let's look at the first four users
recc_matrix[1:4]
## $A1B05INWIDZ74O
## [1] "B001BCVY4W" "B001BCVY9W" "B0051COPH6" "B001BCXTGS" "B001BDDT8K"
## [6] "B001BDDTB2"
## 
## $A1H7Y5XKPGT0OS
## [1] "B000YSRK7E" "B000YSTIL0" "B001AHFVHO" "B001AHJ2D8" "B001AHJ2FQ"
## [6] "B001AHL6CI"
## 
## $A1IH42TUIZ2XJL
## [1] "B004FEN3GK" "B0045XE32E" "B004YV80O4" "B007JFXWRC" "B000YSRK7E"
## [6] "B000YSTIL0"
## 
## $A1P2XYD265YE21
## [1] "B004T80BYE" "B0061IUIDY" "B005OVPK9G" "B008EG5ADY" "B004FEN3GK"
## [6] "B004R8J8E0"
# And again, very much like the previous recommender, we can plug in any number for the user to find their recommendations.

5.7 Evaluating the Recommender Systems

As you can see, we have several methods to choose from. But how do we know which method predicts best, and how do we decide which one to use? Fortunately, there are ways to make this determination, such as ROC curves, precision, accuracy, and confusion matrices. Before making these evaluations, we have to prepare the data so it can be evaluated by recommenderlab. The most popular approach (at least according to recommenderlab) is k-fold cross-validation, which is what we will do here.

Let’s prepare the data to evaluate the models.

# We can split the data into several chunks, take a chunk out as the test set, and evaluate the accuracy. Then we can do the same with each other chunk and compute the average accuracy.

n_fold <- 4
rating_threshold <- 4 # threshold at which we consider the item to be good
items_to_keep <- 20 # items given to the recommender per test user; chosen to be below the minimum of 30 ratings per user enforced above
eval_sets <- evaluationScheme(data = ratings_foods1, method = "cross-validation", k = n_fold, 
                              given = items_to_keep, goodRating = rating_threshold)

size_sets <-sapply(eval_sets@runsTrain, length)

Now that we have developed an evaluationScheme, we are prepared to make some evaluations. Before evaluating multiple methods, we start with a single example to demonstrate the inner workings of the evaluation scheme.

model_to_evaluate <- "IBCF"
model_parameters <- NULL

eval_recommender <-Recommender(data = getData(eval_sets, "train"), method = model_to_evaluate, parameter = model_parameters)

# The IBCF can recommend new items and predict their ratings. In order to build the model, we need to specify how many items we want to recommend, for example, 5.
items_to_recommend <- 5

# We can build the matrix with the predicted ratings using the predict function:
eval_prediction <- predict(object = eval_recommender, newdata = getData(eval_sets, "known"), n = items_to_recommend, type = "ratings")

# By using the calcPredictionAccuracy, we can calculate the Root mean square error (RMSE), Mean squared error (MSE), and the Mean absolute error (MAE).

eval_accuracy <- calcPredictionAccuracy(
  x = eval_prediction, data = getData(eval_sets, "unknown"), byUser = TRUE
)

# This is a small sample of the results for the Prediction and Accuracy
head(eval_accuracy)
##                     RMSE       MSE       MAE
## A13MKSASQ6YWL7 0.7386599 0.5456185 0.4108137
## A17HMM1M7T9PJ1 0.6614378 0.4375000 0.3125000
## A1B05INWIDZ74O 0.8458838 0.7155194 0.7636865
## A1IH42TUIZ2XJL 0.6096025 0.3716152 0.2747885
## A1JMR1N9NBYJ1X 0.5641074 0.3182172 0.3191068
## A1P2XYD265YE21 1.0834083 1.1737736 0.5292990
# Now, let's take a look at the RMSE by each user
qplot(eval_accuracy[,"RMSE"]) + geom_histogram(binwidth = 0.1) +
  ggtitle("Distribution of the RMSE by user")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# However, we need to evaluate the model as a whole, so we set byUser to FALSE
eval_accuracy <- calcPredictionAccuracy(
  x = eval_prediction, data = getData(eval_sets, "unknown"), byUser = FALSE
)

eval_accuracy #for IBCF
##      RMSE       MSE       MAE 
## 0.8192373 0.6711497 0.4200525

So, as you can see, the IBCF model (cosine is the default distance) demonstrates an RMSE of 0.8192373, an MSE of 0.6711497, and an MAE of 0.4200525.

Let’s take a look at confusion matrices.

# Confusion matrix
results <- evaluate(x = eval_sets, method = model_to_evaluate, n = seq(10, 100, 10))
## IBCF run fold/sample [model time/prediction time]
##   1  [3.017sec/0.015sec] 
##   2  [3.091sec/0.014sec] 
##   3  [3.08sec/0.015sec] 
##   4  [3.654sec/0.014sec]
# results object is an evaluationResults object containing the results of the evaluation.
# Each element of the list corresponds to a different split of the k-fold.
# Let's look at the first element
head(getConfusionMatrix(results)[[1]])
##      TP    FP    FN     TN  precision     recall        TPR        FPR
## 10 0.16  9.84 11.60 773.40 0.01600000 0.01203008 0.01203008 0.01256307
## 20 1.20 18.80 10.56 764.44 0.06000000 0.11229279 0.11229279 0.02400286
## 30 2.20 27.80  9.56 755.44 0.07333333 0.24583455 0.24583455 0.03549432
## 40 2.76 37.24  9.00 746.00 0.06900000 0.31753524 0.31753524 0.04754634
## 50 3.16 46.84  8.60 736.40 0.06320000 0.35597969 0.35597969 0.05980321
## 60 4.04 55.96  7.72 727.28 0.06733333 0.41740084 0.41740084 0.07144091
# In this case, look at the first four columns
# True Positives (TP): These are recommended items that have been purchased.
# False Positives (FP): These are recommended items that haven't been purchased
# False Negatives (FN): These are not recommended items that have been purchased.
# True Negatives (TN): These are not recommended items that haven't been purchased.

# If we want to take account of all the splits at the same time, we can just sum up the indices:
columns_to_sum <- c("TP", "FP", "FN", "TN")
indices_summed <- Reduce("+", getConfusionMatrix(results))[, columns_to_sum]
head(indices_summed)
##       TP     FP    FN      TN
## 10  1.48  37.72 51.40 3089.40
## 20  5.20  73.20 47.68 3053.92
## 30  7.80 109.56 45.08 3017.56
## 40 10.52 145.64 42.36 2981.48
## 50 12.60 182.36 40.28 2944.76
## 60 15.32 218.44 37.56 2908.68
# Building an ROC curve. Will need these factors
# 1. True Positive Rate (TPR): Percentage of purchased items that have been recommended. TP/(TP + FN)
# 2. False Positive Rate (FPR): Percentage of not purchased items that have been recommended. FP/(FP + TN)
plot(results, annotate = TRUE, main = "ROC curve")

# We can also look at the accuracy metrics as well
# Precision: Percentage of recommended items that have been purchased. TP/(TP + FP)
# Recall: Percentage of purchased items that have been recommended. TP/(TP + FN) = True Positive Rate

plot(results, "prec/rec", annotate = TRUE, main = "Precision-Recall")
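The formulas in the comments above can be checked directly from the pooled counts; a quick sketch using the indices_summed object built earlier:

# Recompute the rate metrics from the pooled TP/FP/FN/TN counts
# (recall is the same quantity as the true positive rate).
perf_check <- data.frame(
  precision = indices_summed[, "TP"] / (indices_summed[, "TP"] + indices_summed[, "FP"]),
  recall    = indices_summed[, "TP"] / (indices_summed[, "TP"] + indices_summed[, "FN"]),
  FPR       = indices_summed[, "FP"] / (indices_summed[, "FP"] + indices_summed[, "TN"])
)
round(perf_check, 4)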

Now that we have demonstrated how to evaluate a single model, it is time to compare multiple models. The performance indices are useful for comparing different models and/or parameters. By applying different techniques to the same data and comparing a performance index, we can pick the most appropriate recommender.

Again, the starting point is the k-fold evaluation framework that we defined in the previous section, stored as eval_sets. In order to evaluate different models, we define them in a list. We will compare the following models:

  1. Item-based collaborative filtering, using the Cosine as the distance function
  2. Item-based collaborative filtering, using the Pearson correlation as the distance function.
  3. User-based collaborative filtering, using the Cosine as the distance function.
  4. User-based collaborative filtering, using the Pearson correlation as the distance function.
  5. Random recommendations to have a base line.
models_to_evaluate <- list(
  IBCF_cos = list(name = "IBCF", param = list(method = "cosine")),
  IBCF_cor = list(name = "IBCF", param = list(method = "pearson")),
  UBCF_cos = list(name = "UBCF", param = list(method = "cosine")),
  UBCF_cor = list(name = "UBCF", param = list(method = "pearson")),
  random   = list(name = "RANDOM", param = NULL)
)

# In order to evaluate the models, we need to test them, varying the number of items.
n_recommendations <- c(1,5,seq(10,100,10))

# Now let's run and evaluate the models
list_results <- evaluate(x = eval_sets, method = models_to_evaluate, n = n_recommendations)
## IBCF run fold/sample [model time/prediction time]
##   1  [2.856sec/0.012sec] 
##   2  [2.801sec/0.016sec] 
##   3  [2.904sec/0.016sec] 
##   4  [2.969sec/0.014sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [2.932sec/0.018sec] 
##   2  [2.91sec/0.014sec] 
##   3  [2.916sec/0.016sec] 
##   4  [2.87sec/0.013sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0.026sec/0.046sec] 
##   2  [0.002sec/0.054sec] 
##   3  [0.002sec/0.049sec] 
##   4  [0.001sec/0.039sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0.001sec/0.038sec] 
##   2  [0.001sec/0.046sec] 
##   3  [0.001sec/0.149sec] 
##   4  [0.002sec/0.033sec] 
## RANDOM run fold/sample [model time/prediction time]
##   1  [0sec/0.027sec] 
##   2  [0.022sec/0.028sec] 
##   3  [0.001sec/0.023sec] 
##   4  [0sec/0.024sec]
# We can extract the related average confusion matrices
avg_matrices <- lapply(list_results, avg)

# We can explore the performance evaluation. For example, IBCF with Cosine distance
head(avg_matrices$IBCF_cos[, 5:8])
##     precision      recall         TPR         FPR
## 1  0.05041667 0.003898642 0.003898642 0.001189154
## 5  0.03050000 0.013863112 0.013863112 0.006074036
## 10 0.03787500 0.032869309 0.032869309 0.012058757
## 20 0.06654167 0.110590258 0.110590258 0.023398505
## 30 0.06645139 0.173397407 0.173397407 0.035019746
## 40 0.06742014 0.237749530 0.237749530 0.046552802

5.8 Identifying the most suitable model

We can compare the models by building a chart displaying their ROC curves, using the plot function with the annotate parameter.

plot(list_results, annotate = 1, legend = "topleft")
title("ROC curve")

A good performance index is the area under the ROC curve (AUC). We can see that the highest AUC belongs to IBCF with Pearson correlation. We can also build a precision-recall chart.

plot(list_results, "prec/rec", annotate = 1, legend = "bottomright", ylim = c(0,0.4))
title("Precision-recall")
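As a rough numerical check on the AUC comparison (an AUC was not computed above), we can approximate the area under each sampled ROC curve with the trapezoidal rule over the averaged (FPR, TPR) points stored in avg_matrices:

# Approximate AUC per model by trapezoidal integration over the sampled
# (FPR, TPR) points; coarse, since the curve is only evaluated at the
# values in n_recommendations.
auc_approx <- sapply(avg_matrices, function(m) {
  fpr <- c(0, m[, "FPR"], 1)
  tpr <- c(0, m[, "TPR"], 1)
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
})
round(sort(auc_approx, decreasing = TRUE), 3)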

Again, IBCF with Pearson correlation is the top model, and it is likely the one we should choose for our recommender system.



[^1]:
    https://ashokharnal.wordpress.com/2014/12/18/using-recommenderlab-for-predicting-ratings-for-movielens-data/
    https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
    Hahsler, Michael (2014). recommenderlab: Lab for Developing and Testing Recommender Algorithms. R package version 0.1-5. http://CRAN.R-project.org/package=recommenderlab