1 INTRODUCTION

A recommendation system is a type of machine learning algorithm that provides personalized suggestions or recommendations to users. The main goal of recommendation systems is to predict and offer items such as products, services, content, or information that a user is likely to find interesting, relevant, or useful. Recommendation systems are widely used in various domains, including e-commerce, streaming services, social media, and online content platforms, to enhance user experience and help users discover relevant products or content.

Recommendation systems contribute to business success by enhancing user satisfaction, increasing sales, fostering customer loyalty, and leveraging data to make informed decisions. As technology continues to evolve, businesses that prioritize and invest in effective recommendation systems are better positioned to thrive in competitive markets.

There are several types of recommendation systems. Some of them are:

Collaborative Filtering: User-Based Collaborative Filtering recommends items based on the preferences and behaviors of users with similar tastes. Item-Based Collaborative Filtering recommends items based on their similarity to other items that the user has shown interest in.
Content-Based Filtering: Recommends items similar to those the user has liked in the past, based on the content or features of the items.
Association Rule Mining: Identifies associations or relationships between items in a dataset and uses these associations to make recommendations.
Popular Items: Recommends items based on their overall popularity or frequency of interaction with users. This method is simple to implement and can be effective, especially when user preferences are not well defined or when dealing with new users.
Hybrid Methods: Combines collaborative filtering and content-based filtering to leverage the strengths of both approaches.
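Association rules are usually described by two measures, support and confidence. The following is a minimal sketch in base R, assuming a handful of hypothetical toy orders; the data and the helper names support() and confidence() are illustrative and are not part of recommenderlab.

```r
# Toy orders (hypothetical data, not taken from the bakery file)
orders <- list(
  c("Coffee", "Bread"),
  c("Coffee", "Cake"),
  c("Coffee", "Bread", "Cake"),
  c("Tea", "Bread"),
  c("Coffee")
)

# Support: share of orders that contain every item in `items`
support <- function(items, orders) {
  mean(vapply(orders, function(o) all(items %in% o), logical(1)))
}

# Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)
confidence <- function(lhs, rhs, orders) {
  support(c(lhs, rhs), orders) / support(lhs, orders)
}

supp_rule <- support(c("Coffee", "Bread"), orders)   # 2 of 5 orders
conf_rule <- confidence("Coffee", "Bread", orders)   # 2 of the 4 coffee orders
```

A rule such as Coffee -> Bread is only used for recommendations when both values exceed the chosen minimum thresholds.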

2 ANALYSIS

In this project we will create several models using collaborative filtering, association rules, and item popularity within the recommenderlab package¹. The recommenderlab package is a powerful tool for building recommendation systems. It offers a variety of algorithms for collaborative filtering, content-based filtering, and hybrid methods. The package includes various metrics for evaluating the performance of recommendation models, helping users choose the most effective algorithm for their specific use case.

2.1 Reading Libraries

library(recommenderlab) 
library(tidyverse) 
library(tidyquant)
library(knitr)
library(glue)
library(DT)

2.2 Importing Data

The data comes from The Bread Basket², a bakery located in Edinburgh, and contains online customer transactions. The timestamps in the file start on October 30, 2016, as the data glimpse in Section 2.4 shows. The dataset comprises 20,507 entries (rows), over 9,000 transactions, and 5 columns (features).

Features:

TransactionNo: unique transaction number
Items: name of purchased product
DateTime: transaction date and time
Daypart: part of the day when the transaction occurred (morning, afternoon, evening, night)
DayType: whether the transaction was made on weekends or weekdays

# Reading Data
data_bakery <- read.csv("Data/Bakery.csv")
datatable(data_bakery, caption = htmltools::tags$caption(
    style = 'caption-side: top; text-align: center;',
    'Table 1: ', htmltools::em('The Bread Basket Dataset ')))

2.3 Business Understanding

The primary goal for this bakery is to enhance the overall customer experience and drive key business outcomes. It wants to provide personalized and relevant recommendations to customers, improving their overall satisfaction with the platform or service. It also wants to encourage additional purchases by recommending products that align with customers’ preferences, increasing the average transaction value and overall revenue. Finally, it wants to keep users engaged by offering tailored products, leading to increased time spent on the platform and a higher likelihood of repeat visits.

2.4 Data Understanding

Let’s check for missing values, see how frequently individual items are purchased, and identify popular items.

# Glimpse data
data_bakery %>% glimpse()

## Rows: 20,507
## Columns: 5
## $ TransactionNo <int> 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8,…
## $ Items         <chr> "Bread", "Scandinavian", "Scandinavian", "Hot chocolate"…
## $ DateTime      <chr> "2016-10-30 09:58:11", "2016-10-30 10:05:34", "2016-10-3…
## $ Daypart       <chr> "Morning", "Morning", "Morning", "Morning", "Morning", "…
## $ DayType       <chr> "Weekend", "Weekend", "Weekend", "Weekend", "Weekend", "…

# Formatting date-time variables
data_bakery_tbl <- data_bakery %>% 
  mutate(Date = as.Date(DateTime),
         Time = format(as.POSIXct(DateTime), format = "%H:%M:%S"),
         Daypart = as.factor(Daypart))
  
# Check for missing values
data_bakery_tbl %>% summarise_all(~sum(is.na(.)))

# Items Freq
data_bakery_freq <- data_bakery_tbl %>% 
  count(Items) %>% 
  arrange(desc(n)) %>% 
  mutate(pct = n/sum(n),
         cum_pct = cumsum(pct)) 

data_bakery_freq

We split the DateTime variable into two distinct features (date and time) and confirmed that there are no missing values. The best-selling item is coffee, accounting for 27% of the total number of products sold. Coffee, bread, and tea together represent 50% of the total sales. Now we will visualize the top 10 selling items.

data_bakery_freq %>% 
  top_n(10, wt = n) %>% 
  ggplot(aes(x = reorder(Items,n), y = n)) +
  geom_bar(stat = "identity", fill = "#5b0b15") +
  labs(title = "Top 10 Selling Items", x = "", y = "Number of Items Sold") +
  theme_tq() +
  scale_color_tq() +
  coord_flip() +
  scale_y_continuous(n.breaks = 6)

We can see that the top 10 best-selling items include coffee, bread, tea, cake, pastry, sandwich, medialunas (Argentinian pastry), hot chocolate, cookies, and brownie.

Let’s see some other interesting facts.

# Number of orders by Daypart
data_bakery_tbl %>% 
  ggplot(aes(Daypart)) +
  geom_bar(fill = "#5b0b15") +
  theme_tq() +
  labs(title = "Number of Orders by Daypart",
        y    = "",
        x    = "Daypart")

# Orders by Day of Week
data_bakery_tbl %>% 
  ggplot(aes(wday(Date,
                  week_start = getOption("lubridate.week.start", 1)))) +
  geom_bar(fill = "#5b0b15") +
  scale_x_continuous(breaks = 1:7,
                     labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")) +
  theme_tq() +
  labs(title = "Orders by Day of Week",
       y     = "",
       x     = "Day of Week")

# Orders by Hour of Day
data_bakery_tbl %>% 
  ggplot(aes(hour(hms(Time)))) +
  geom_bar(fill = "#5b0b15") +
  theme_tq() +
  labs(title = "Orders by Hour of Day",
       y     = "",
       x     = "Hour of Day")

Most orders are placed in the morning and afternoon. Saturday is the busiest day, and the busiest hours are between 10:00 AM and 2:00 PM.

2.5 Data Preparation

We will format the data into a two-dimensional matrix with orders in rows and items in columns. This format is called a user-item matrix because users (here, orders) are in rows and items (products) are in columns. Then we convert the matrix into a binary “rating matrix” (binaryRatingMatrix), the class recommenderlab requires for this analysis. Because the values are only 0 and 1, the data do not need to be normalized.

Before creating the matrix, we will check if the orders have the same product more than once.

# Filtering orders
data_bakery_tbl %>% 
  filter(TransactionNo == 2 & Items == "Scandinavian") %>% 
  select(TransactionNo, Items)

Some orders have the same items multiple times. We will remove duplicates because we are only interested in whether a particular item was ordered or not.

# Creating a unique identifier for identical items in the same order
data_bakery_tbl <- data_bakery_tbl %>% 
  mutate(TranNo_Item = paste(TransactionNo, Items, sep = ' ')) %>% 
  # Removing duplicates and the identifier
  distinct(TranNo_Item, .keep_all = TRUE) %>% 
  select(-TranNo_Item)

Now we can create the user-item matrix: we select the order number and items, spread the items into columns, remove the transaction number that is no longer needed, and convert the result into a binary rating matrix.

ratings_matrix <- data_bakery_tbl %>% 
  select(TransactionNo, Items) %>% 
  mutate(value = 1) %>% 
  spread(Items, value, fill = 0) %>% 
  select(-TransactionNo) %>% 
  as.matrix() %>% 
  as("binaryRatingMatrix")
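To make the shape of a user-item matrix concrete, here is a small base R sketch on hypothetical toy data. It only illustrates the 0/1 layout (orders in rows, items in columns) and does not create a binaryRatingMatrix; the object names toy and toy_matrix are assumptions made for this example.

```r
# Toy long-format data: one row per (order, item) pair (hypothetical values)
toy <- data.frame(
  TransactionNo = c(1, 2, 2, 3, 3, 3),
  Items         = c("Bread", "Coffee", "Bread", "Tea", "Coffee", "Cake")
)

# Cross-tabulate orders against items, then cap any count at 1
toy_matrix <- unclass(table(toy$TransactionNo, toy$Items))
toy_matrix[toy_matrix > 1] <- 1

toy_matrix
```

Each row is one order and each column one item; a 1 means the item appears in the order at least once.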

2.6 Data Modeling

2.6.1 Training/test split

The recommenderlab³ package lets us evaluate different recommendation models under a common scheme, which supports an informed choice of algorithm. To measure the models’ effectiveness, we split the data into training and testing sets with evaluationScheme(). We set the evaluation method to “cross”, indicating cross-validation, a technique where the dataset is divided into k subsets (folds) and the model is trained and evaluated k times, each time using a different fold for testing and the remaining folds for training. With k = 5, each iteration trains on four folds (80% of the data) and tests on the remaining fold (20%). The train = 0.8 argument states the same proportion, although with cross-validation the size of the split follows from k. The setting given = -1 controls how recommendations are evaluated: for each test order, all items except one are given to the recommender, and the withheld item is used to check the recommendations. This is the “all-but-1” protocol reported in the output below.
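The two ideas in this paragraph, k-fold splitting and withholding one item per test order, can be sketched in base R. This is an illustration under stated assumptions (20 hypothetical orders, an arbitrary seed, one made-up test order), not a reproduction of what evaluationScheme() does internally.

```r
set.seed(123)      # arbitrary seed, only for reproducibility of the sketch
n_orders <- 20     # hypothetical number of orders
k        <- 5      # number of folds

# Assign each order to one of k equally sized folds
fold_id <- sample(rep(seq_len(k), length.out = n_orders))

# Fold 1 as the test fold: the remaining folds form the training set
test_idx    <- which(fold_id == 1)
train_idx   <- which(fold_id != 1)
train_share <- length(train_idx) / n_orders   # 0.8 when k = 5

# All-but-1 for a single test order: withhold one item, give the rest
test_order <- c("Coffee", "Bread", "Cake")
withheld   <- sample(test_order, 1)
given      <- setdiff(test_order, withheld)
```

A recommendation counts as a hit for this order if the withheld item appears in the list produced from the given items.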

eval_scheme <- ratings_matrix %>% 
  evaluationScheme(method = "cross",
                   train  = 0.8,
                   k      = 5,
                   given  = -1)
eval_scheme

## Evaluation scheme using all-but-1 items
## Method: 'cross-validation' with 5 run(s).
## Good ratings: NA
## Data set: 5517 x 94 rating matrix of class 'binaryRatingMatrix' with 14939 ratings.

list_algorithms <- list(
  "random items"      = list(name = "RANDOM",
                        param = NULL),
  "popular items"     = list(name = "POPULAR",
                        param = NULL),
  "user-based CF"     = list(name = "UBCF",
                        param = list(method = "Cosine", nn = 500)),
  "item-based CF"     = list(name = "IBCF",
                        param = list(k = 5)),
  "association rules" = list(name = "AR",
                        param = list(supp = 0.01, conf = 0.01))
)

2.6.2 Algorithm Evaluation

Evaluating the algorithms is crucial for understanding their performance and effectiveness. We will analyze them using the evaluate() function, specifying type = "topNList" to assess top-N lists of recommended products and n = 1:10 to evaluate accuracy for lists of 1 to 10 recommendations.

results <- recommenderlab::evaluate(
  eval_scheme,
  list_algorithms,
  type  = "topNList",
  n     = 1:10)

## RANDOM run fold/sample [model time/prediction time]
##   1  [0.004sec/0.084sec] 
##   2  [0sec/0.078sec] 
##   3  [0sec/0.066sec] 
##   4  [0sec/0.063sec] 
##   5  [0sec/0.063sec] 
## POPULAR run fold/sample [model time/prediction time]
##   1  [0.001sec/0.131sec] 
##   2  [0sec/0.131sec] 
##   3  [0sec/0.131sec] 
##   4  [0.001sec/0.134sec] 
##   5  [0.001sec/0.145sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/2.548sec] 
##   2  [0sec/2.555sec] 
##   3  [0sec/2.348sec] 
##   4  [0sec/2.452sec] 
##   5  [0sec/2.492sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.047sec/0.045sec] 
##   2  [0.043sec/0.043sec] 
##   3  [0.043sec/0.045sec] 
##   4  [0.043sec/0.042sec] 
##   5  [0.043sec/0.043sec] 
## AR run fold/sample [model time/prediction time]
##   1  [0.028sec/4.299sec] 
##   2  [0.01sec/4.444sec] 
##   3  [0.007sec/4.487sec] 
##   4  [0.006sec/4.462sec] 
##   5  [0.006sec/4.495sec]

2.6.3 Model Evaluation

The result is a list containing 5 evaluation results, one per algorithm. Each can be explored using the getConfusionMatrix() function, which returns a list with one confusion matrix per fold. Below is an example for the “Random Items” model.

results

## List of evaluation results for 5 recommenders:
## 
## $`random items`
## Evaluation results for 5 folds/samples using method 'RANDOM'.
## 
## $`popular items`
## Evaluation results for 5 folds/samples using method 'POPULAR'.
## 
## $`user-based CF`
## Evaluation results for 5 folds/samples using method 'UBCF'.
## 
## $`item-based CF`
## Evaluation results for 5 folds/samples using method 'IBCF'.
## 
## $`association rules`
## Evaluation results for 5 folds/samples using method 'AR'.

cf_matrix_model <- results$`random items` %>% 
  getConfusionMatrix() %>% 
  as.list()

# Calculating the average value of 5-fold cross-validation and selecting columns
as.data.frame(Reduce("+", cf_matrix_model) / length(cf_matrix_model)) %>% 
  select("n", "precision", "recall", "TPR", "FPR")

We will wrap the previous steps in a function, use map() to apply it to every model in the results list, and then use enframe() and unnest() to flatten the output into a single table for model assessment.
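The averaging step relies on Reduce("+", ...), which adds a list of equally shaped matrices element by element. A minimal sketch with two hypothetical per-fold matrices (the numbers are invented for illustration):

```r
# Two hypothetical per-fold result matrices with identical layout
fold_1 <- matrix(c(1, 0.10,
                   2, 0.20), nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("n", "precision")))
fold_2 <- matrix(c(1, 0.30,
                   2, 0.40), nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("n", "precision")))
folds <- list(fold_1, fold_2)

# Element-wise sum across the list, divided by the number of folds
avg <- Reduce("+", folds) / length(folds)
avg
```

Because every fold reports the same values of n, the n column is unchanged by the averaging while the metric columns become per-fold means.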

# Function
avg_confusion_matrix <- function(results) {
  cf_matrix_model <- results %>%
    getConfusionMatrix() %>%
    as.list()

  # Average the per-fold matrices and keep the columns of interest
  as.data.frame(Reduce("+", cf_matrix_model) / length(cf_matrix_model)) %>%
    select("n", "precision", "recall", "TPR", "FPR")
}

# Iteration through all models
results_tbl <- results %>% 
  map(avg_confusion_matrix) %>% 
  enframe() %>% 
  unnest(cols = value)

results_tbl

2.6.4 Visualizing Model Performance

To assess model performance, ROC (Receiver Operating Characteristic) and PR (Precision-Recall) curves are often utilized. ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), providing insights into a model’s ability to discriminate between classes. PR curves, on the other hand, illustrate the trade-off between precision and recall, particularly important when dealing with imbalanced datasets.

TPR is the ratio of correctly predicted positive observations to the total actual positives (TPR = True Positives / (True Positives + False Negatives)) and is plotted on the y-axis. FPR is the ratio of negative observations incorrectly predicted as positive to the total actual negatives (FPR = False Positives / (False Positives + True Negatives)) and is plotted on the x-axis. The AUC is the area under the ROC curve and summarizes the model’s overall performance in a single value. A higher AUC indicates better performance.
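As a worked example of these two ratios, the sketch below counts true and false positives for one hypothetical top-3 recommendation list. The catalogue, the list, and the purchased items are invented for illustration, and the counting is a simplified version of what an evaluation package would do.

```r
# Hypothetical catalogue, recommendation list, and withheld purchases
catalogue   <- c("Coffee", "Bread", "Tea", "Cake", "Pastry",
                 "Sandwich", "Cookies", "Brownie", "Juice", "Jam")
recommended <- c("Coffee", "Tea", "Cake")   # top-3 list
purchased   <- c("Tea", "Jam")              # items actually bought

tp <- length(intersect(recommended, purchased))  # recommended and bought
fp <- length(setdiff(recommended, purchased))    # recommended, not bought
fn <- length(setdiff(purchased, recommended))    # bought, not recommended
tn <- length(catalogue) - tp - fp - fn           # neither recommended nor bought

tpr <- tp / (tp + fn)   # share of purchased items that were recommended
fpr <- fp / (fp + tn)   # share of non-purchased items that were recommended
```

Lengthening the list can only raise or keep both ratios, which is why each value of n contributes one point along the ROC curve.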

results_tbl %>% 
  ggplot(aes(FPR, TPR,
             colour = fct_reorder2(as.factor(name), FPR, TPR))) +
  geom_line() +
  geom_label(aes(label = n)) +
  theme_tq() +
  scale_color_tq() + 
  theme(legend.position = "right",
        legend.direction = "vertical") +
  labs(title = "ROC Curve",
       subtitle = "The Best Model: Popular Items",
       color = "Model")

The popular items model has proven to be the best as it achieves the highest TPR for any level of FPR, meaning the model generates the highest number of relevant recommendations (TPR) for the same level of irrelevant recommendations (FPR).

Precision shows how sensitive models are to false positives (i.e., the model suggests items that are unlikely to be purchased), while recall shows how sensitive models are to false negatives (i.e., the model does not suggest an item that is highly likely to be purchased). Ultimately, the goal is to accurately predict items that are very likely to be purchased, as this would have a positive impact on sales and revenue. In other words, we want to increase recall (identifying more relevant items) while maintaining a certain level of precision (minimizing false positives).
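The trade-off between the two measures can be seen by computing them for growing list lengths. The sketch below uses one hypothetical ranked list and two hypothetical purchased items; the names ranked, purchased, and pr_at_n are assumptions made for this example.

```r
ranked    <- c("Coffee", "Tea", "Cake", "Pastry", "Jam")  # hypothetical ranking
purchased <- c("Tea", "Jam")                              # hypothetical purchases

# Precision and recall for the top-n list, for n = 1, ..., 5
pr_at_n <- do.call(rbind, lapply(seq_along(ranked), function(n) {
  hits <- sum(ranked[seq_len(n)] %in% purchased)
  data.frame(n         = n,
             precision = hits / n,
             recall    = hits / length(purchased))
}))

pr_at_n
```

Recall never decreases as the list grows, while precision drops whenever an added item is not purchased.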

results_tbl %>% 
  ggplot(aes(recall, precision,
             color = fct_reorder2(as.factor(name), recall, precision))) +
  geom_line() +
  geom_label(aes(label = n)) +
  theme_tq() +
  scale_color_tq() +
  theme(legend.position = "right",
        legend.direction = "vertical") +
  labs( title = "Precision Vs Recall",
        subtitle = "The Best Model: Popular Items",
        color = "Model")

The precision-recall curve also points to the popular items model as the best, because it achieves the highest precision at each level of recall.

2.7 Generating Predictions

After determining which model performs the best, we can proceed to test predictions, and for that, we need a new hypothetical order. We will create an order containing jam, mineral water, and bread, format it appropriately for recommenderlab, and pass our hypothetical order to the prediction function.

First, we fit the recommender with the best model settings and create a new order.

# Recommender
train_recLab <- getData(eval_scheme, "train")

fit_PopularItm <- recommenderlab::Recommender(
  train_recLab,
  method = "POPULAR",
  param = NULL)

fit_PopularItm

## Recommender of type 'POPULAR' for 'binaryRatingMatrix' 
## learned using 4412 users.

# Hypothetical Order
new_order <- c("Jam", "Mineral Water", "Bread")

Before making predictions, we will convert the order into a binary rating matrix with column names corresponding to the training dataset and using 1 and 0 to indicate the presence or absence of an item in the order.

# Extracting column names from the training dataset
items_names <- train_recLab@data %>% colnames()

# Building new order matrix
new_order_matrix <- tibble(
  item = items_names) %>% 
  mutate(value = as.numeric(item %in% new_order)) %>% 
  spread(key = item, value = value) %>% 
  as.matrix() %>% 
  as("binaryRatingMatrix")

Now we can use the predict() function, entering the recommender, the new order, and the number of predictions we want to make.

prediction <- predict(fit_PopularItm, 
                newdata = new_order_matrix, 
                n       = 5)

as(prediction, "list")

## $`0`
## [1] "Coffee"   "Tea"      "Cake"     "Pastry"   "Sandwich"

The prediction produces a list of the top 5 recommended products, including coffee, tea, cake, pastry, and sandwich.
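The idea behind a popularity-based recommendation can be sketched in a few lines of base R. This is a simplified illustration, not recommenderlab’s implementation: the purchase counts are hypothetical, the helper recommend_popular() is an assumed name, and the sketch assumes that items already in the order are excluded, which is consistent with Bread being absent from the output above.

```r
# Hypothetical purchase counts per item
item_counts <- c(Coffee = 50, Bread = 40, Tea = 20, Cake = 15,
                 Pastry = 10, Sandwich = 8, Jam = 2)
new_order <- c("Jam", "Bread")

# Rank items by count, drop those already in the order, keep the top n
recommend_popular <- function(item_counts, known_items, n = 5) {
  ranked <- names(sort(item_counts, decreasing = TRUE))
  head(setdiff(ranked, known_items), n)
}

top_items <- recommend_popular(item_counts, new_order, n = 3)
top_items
```

Because the ranking depends only on overall counts, every order receives the same list apart from the items it already contains.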

3 EVALUATION

The analysis revealed that the Popular Items model outperformed the others: it achieved the highest true positive rate at every false positive rate on the ROC curve and the best trade-off between precision and recall. This implies that the model excels in suggesting items that are likely to be purchased, aligning with the ultimate goal of boosting sales and revenue. In the subsequent prediction phase, the recommender system demonstrated its practical application by generating recommendations for a hypothetical order. The predicted products align with the system’s learned patterns, reflecting its capability to provide relevant and valuable suggestions.

In summary, recommender systems play a crucial role in creating a personalized, efficient, and user-friendly environment for customers. By understanding and anticipating user needs, businesses can foster stronger connections, drive sales, and position themselves as leaders in their respective markets.

Building a Recommendation System

Marijana Andabaka, marijana@andalytics.com

2023-04-03