The Bread Basket

Description

This report describes transaction data from The Bread Basket, a cafe located in Edinburgh. The dataset has 20507 entries and over 9000 transactions. It records customers who ordered different items from this bakery online, and the time period of the data is given as 26-01-11 to 27-12-03. The dataset used in this report is The Bread Basket, hosted on Kaggle.

Dataset: https://www.kaggle.com/mittalvasu95/the-bread-basket

Report Outline
1. Data Extraction
2. Exploratory Data Analysis
3. Data Processing
4. Modelling
5. Recommendation

1. Data Extraction

The dataset is downloaded from Kaggle and saved in the data folder. We used the read_excel() function to read the dataset into the bread_df data frame.

library(readxl)
bread_df <- read_excel("data/bread-basket.xlsx")

To see the number of rows and columns in the data frame, we used the dim() function. The dataset has 20507 rows and 5 columns.

dim(bread_df)

## [1] 20507     5

2. Exploratory Data Analysis

To find out the column names and types, we used the str() function.
To see the first 6 rows, we used the head() function.
To get a summary of each column, we used the summary() function.

str(bread_df)

## tibble [20,507 x 5] (S3: tbl_df/tbl/data.frame)
##  $ Transaction    : num [1:20507] 1 2 2 3 3 3 4 5 5 5 ...
##  $ Item           : chr [1:20507] "Bread" "Scandinavian" "Scandinavian" "Hot chocolate" ...
##  $ date_time      : chr [1:20507] "30-10-2016 09:58" "30-10-2016 10:05" "30-10-2016 10:05" "30-10-2016 10:07" ...
##  $ period_day     : chr [1:20507] "morning" "morning" "morning" "morning" ...
##  $ weekday_weekend: chr [1:20507] "weekend" "weekend" "weekend" "weekend" ...

head(bread_df)

## # A tibble: 6 x 5
##   Transaction Item          date_time        period_day weekday_weekend
##         <dbl> <chr>         <chr>            <chr>      <chr>          
## 1           1 Bread         30-10-2016 09:58 morning    weekend        
## 2           2 Scandinavian  30-10-2016 10:05 morning    weekend        
## 3           2 Scandinavian  30-10-2016 10:05 morning    weekend        
## 4           3 Hot chocolate 30-10-2016 10:07 morning    weekend        
## 5           3 Jam           30-10-2016 10:07 morning    weekend        
## 6           3 Cookies       30-10-2016 10:07 morning    weekend

summary(bread_df)

##   Transaction       Item            date_time          period_day       
##  Min.   :   1   Length:20507       Length:20507       Length:20507      
##  1st Qu.:2552   Class :character   Class :character   Class :character  
##  Median :5137   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :4976                                                           
##  3rd Qu.:7357                                                           
##  Max.   :9684                                                           
##  weekday_weekend   
##  Length:20507      
##  Class :character  
##  Mode  :character  
##                    
##                    
##

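From the result above, we know the following:
1. The second column is Item. It is a categorical variable that is currently stored as character, so it should be converted to a factor.
2. The third column is date_time. It is currently stored as character in "dd-mm-yyyy HH:MM" form and should be converted to a Date.

bread_df$Item <- as.factor(bread_df$Item)
# Caution: no format is passed here, so as.Date() reads "30-10-2016" as year 0030
# (the Date column in later outputs shows values such as 0030-10-20).
# Passing format = "%d-%m-%Y %H:%M" would parse the strings as intended.
bread_df$Date <- as.Date(bread_df$date_time)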
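The date_time strings are in day-month-year order, which as.Date() does not assume by default. The sketch below uses two sample strings copied from the str() output above to compare the default parse with an explicit format. The variable names are illustrative only and are not part of the report's pipeline.

```r
# Two sample values copied from the date_time column shown above
x <- c("30-10-2016 09:58", "31-10-2016 08:28")

# Default parsing tries "%Y-%m-%d" first, so the leading "30" is read as the year
default_parse <- as.Date(x)

# Supplying the day-month-year format yields the intended calendar dates
explicit_parse <- as.Date(x, format = "%d-%m-%Y %H:%M")

print(default_parse)
print(explicit_parse)
```

If the Date column were needed for analysis (it is not used by the rule mining below), the explicit format would be the safer choice.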

2.1 Univariate Data Analysis

To plot the number of rows recorded on weekdays and weekends, we can use ggplot().

library(ggplot2)
ggplot(data=bread_df, aes(x = weekday_weekend)) +
  geom_bar(color = "blue")

2.2 Bivariate Data Analysis

Analysis of two variables: the distribution of transactions across weekday_weekend, broken down by period_day.

ggplot(bread_df, aes(x=weekday_weekend, fill = period_day)) +
  geom_bar(position = "dodge")
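The dodged bar chart shows counts per combination of weekday_weekend and period_day. The same numbers can be tabulated directly, which is sometimes easier to read than a plot. This is a minimal sketch on a toy data frame with made-up rows, not the real bread_df.

```r
# Toy rows standing in for bread_df (illustrative values, not the real data)
toy <- data.frame(
  weekday_weekend = c("weekday", "weekday", "weekend", "weekend", "weekday"),
  period_day      = c("morning", "afternoon", "morning", "morning", "morning")
)

# Counts per combination: the quantities a dodged geom_bar() draws
counts <- table(toy$weekday_weekend, toy$period_day)

# Row-wise shares: how each day type splits across periods
shares <- prop.table(counts, margin = 1)

print(counts)
print(round(shares, 2))
```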

3. Data Processing

Convert the item-level rows into transaction-based data, with one basket per transaction.

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v tibble  3.0.5     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## v purrr   0.3.4

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(knitr)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(arules)

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

## 
## Attaching package: 'arules'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

library(arulesViz)

## Loading required package: grid

library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

## The following object is masked from 'package:purrr':
## 
##     compact

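# Drop incomplete rows and make sure the transaction id is numeric
bread_df <- bread_df[complete.cases(bread_df),]
bread_df$Transaction <- as.numeric(bread_df$Transaction)
# Sorted copy (not used below; ddply groups by Transaction on its own)
bread_df_sorted <- bread_df[order(bread_df$Transaction), ]

# Collapse the items of each transaction into one comma-separated string
itemlist <- ddply(bread_df, c("Transaction"),
                  function(df1)paste(df1$Item, collapse = ","))

itemlist$Transaction <- NULL
colnames(itemlist) <- c("items")

# Write to csv file
# Note: row.names = TRUE puts the row index in front of every line, and
# write.csv() also writes a header line. read.transactions() below reads both
# as items, which is why the summary lists labels such as 1, 10, 100.
write.csv(itemlist, "data/bread basket.csv", 
          quote = FALSE,
          row.names = TRUE)

# Import as transactions
tr_df <- read.transactions("data/bread basket.csv",
                           format = "basket",
                           sep = ",")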
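As an alternative to the CSV round trip, the baskets can be built in memory by splitting the Item column on the transaction id. The sketch below uses a toy data frame modelled on the head() output above. The optional last step assumes the arules package is installed and relies on its list-to-transactions coercion. It is an illustration, not the method used for the outputs in this report.

```r
# Toy item rows standing in for bread_df (values modelled on head() above)
toy <- data.frame(
  Transaction = c(1, 2, 2, 3, 3, 3),
  Item = c("Bread", "Scandinavian", "Scandinavian",
           "Hot chocolate", "Jam", "Cookies"),
  stringsAsFactors = FALSE
)

# One character vector of distinct items per transaction id
baskets <- lapply(split(toy$Item, toy$Transaction), unique)
print(baskets)

# Optional: coerce the list to a transactions object when arules is available;
# no CSV file, header line, or row index is involved in this route
if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))
  tr_toy <- as(baskets, "transactions")
  print(tr_toy)
}
```

Because no file is written, the item set contains only product names, so item counts and density would differ slightly from the summaries printed in this report.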

We can now view the summary of these transactions and the top 10 products at The Bread Basket cafe.

summary(tr_df)

## transactions as itemMatrix in sparse format with
##  9466 rows (elements/itemsets/transactions) and
##  9570 columns (items) and a density of 0.0003127952 
## 
## most frequent items:
##  Coffee   Bread     Tea    Cake  Pastry (Other) 
##    4526    3094    1349     983     814   17570 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11 
##    1 3954 3058 1469  662  231   64   17    4    5    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.993   4.000  11.000 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100

itemFrequencyPlot(tr_df, topN = 10)

4. Modelling

We generate rules using apriori and set values for the minimum support and the minimum confidence. We set different values for each day type and period in order to see what recommendations can be given to the company. The rules can also be plotted using the grouped, graph, and paracoord methods.
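To make the three rule measures concrete, the following sketch computes support, confidence, and lift by hand for one rule on four toy baskets. The baskets and the helper support_of() are hypothetical and exist only for this illustration. apriori() computes the same quantities far more efficiently.

```r
# Four toy baskets (illustrative, not taken from the dataset)
baskets <- list(
  c("Coffee", "Toast"),
  c("Coffee", "Bread"),
  c("Toast", "Coffee", "Pastry"),
  c("Bread", "Tea")
)

# Share of baskets that contain every item in `items`
support_of <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "Toast"
rhs <- "Coffee"

supp <- support_of(c(lhs, rhs), baskets)  # share of baskets with lhs and rhs
conf <- supp / support_of(lhs, baskets)   # share of lhs baskets that also hold rhs
lift <- conf / support_of(rhs, baskets)   # confidence relative to the rhs base rate

print(c(support = supp, confidence = conf, lift = lift))
```

A lift above 1 means the right-hand item appears more often with the left-hand items than it does overall.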

4.1 All Transactions

For all transactions, we set the minimum support to 0.001 and the minimum confidence to 0.8.

rules <- apriori(tr_df, parameter = list(supp = 0.001,
                                         conf = 0.8, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[9570 item(s), 9466 transaction(s)] done [0.04s].
## sorting and recoding items ... [55 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
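The "Absolute minimum support count" line in the log is the relative support threshold expressed as a number of transactions. The arithmetic below reproduces it for each of the five rule sets in this report, using the transaction counts and thresholds printed in the respective logs. The observation that the logged value is the whole-number part of the product is inferred from those five logs, not from the arules documentation.

```r
# Transaction counts and support thresholds as printed in this report's logs
n_transactions <- c(all = 9466, weekday = 6146, weekend = 3321,
                    weekday_morning = 2649, weekday_afternoon = 3326)
min_support    <- c(all = 0.001, weekday = 0.001, weekend = 0.002,
                    weekday_morning = 0.002, weekday_afternoon = 0.002)

# Relative threshold expressed as a number of transactions
raw_count <- min_support * n_transactions

# The logged counts in this report match the whole-number part of the product
print(data.frame(n_transactions, min_support, raw_count,
                 logged = floor(raw_count)))
```

With thresholds this low, a rule can rest on fewer than ten baskets, so the counts in the inspect() tables deserve as much attention as the confidence values.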

rules <- sort(rules, by="confidence", decreasing = TRUE)
rules

## set of 7 rules

inspect(rules)

##     lhs                             rhs      support     confidence coverage   
## [1] {Extra Salami or Feta,Salad} => {Coffee} 0.001478977 0.8750000  0.001690260
## [2] {Pastry,Toast}               => {Coffee} 0.001373336 0.8666667  0.001584619
## [3] {Hearty & Seasonal,Sandwich} => {Coffee} 0.001267695 0.8571429  0.001478977
## [4] {Cake,Vegan mincepie}        => {Coffee} 0.001056412 0.8333333  0.001267695
## [5] {Salad,Sandwich}             => {Coffee} 0.001584619 0.8333333  0.001901542
## [6] {Extra Salami or Feta}       => {Coffee} 0.003274879 0.8157895  0.004014367
## [7] {Keeping It Local}           => {Coffee} 0.005387703 0.8095238  0.006655398
##     lift     count
## [1] 1.830038 14   
## [2] 1.812609 13   
## [3] 1.792690 12   
## [4] 1.742893 10   
## [5] 1.742893 15   
## [6] 1.706200 31   
## [7] 1.693096 51

plot(rules)

plot(rules,method = "grouped")

plot(rules,method = "graph")

plot(rules,method = "paracoord")

As we can see, there are 7 rules, all with Coffee on the right-hand side. The item frequencies above show that Coffee is the best-selling product at the cafe, followed by Bread and Tea, across all transactions. The rules show which products are bought along with Coffee: by count, customers most often buy Coffee with Keeping It Local (51 transactions) and Extra Salami or Feta (31 transactions).
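The measures in the inspect() table can be checked by hand from the figures printed above. This worked example does so for rule [7], {Keeping It Local} => {Coffee}. All inputs are read off the outputs in this section, and the variable names are illustrative.

```r
# Figures read off the outputs above
n_total       <- 9466         # transactions in tr_df
n_coffee      <- 4526         # transactions containing Coffee
n_rule        <- 51           # count for {Keeping It Local} => {Coffee}
rule_coverage <- 0.006655398  # share of transactions containing the lhs

support    <- n_rule / n_total
confidence <- support / rule_coverage
lift       <- confidence / (n_coffee / n_total)

print(round(c(support = support, confidence = confidence, lift = lift), 6))
```

The results agree with the table (support 0.005388, confidence 0.809524, lift 1.693096): Coffee is in about 48% of all baskets, so a confidence of 81% corresponds to a lift of roughly 1.7.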

4.2 Weekday

We can separate weekday and weekend transactions into different data frames and then generate the rules for each. For weekdays, we set the minimum support to 0.001 and the minimum confidence to 0.8.

bread_weekday_df <- bread_df %>%
  filter(weekday_weekend=="weekday")

str(bread_weekday_df)

## tibble [12,807 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Transaction    : num [1:12807] 81 81 82 82 83 83 84 85 85 85 ...
##  $ Item           : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 24 16 83 12 24 12 12 24 24 66 ...
##  $ date_time      : chr [1:12807] "31-10-2016 08:28" "31-10-2016 08:28" "31-10-2016 08:47" "31-10-2016 08:47" ...
##  $ period_day     : chr [1:12807] "morning" "morning" "morning" "morning" ...
##  $ weekday_weekend: chr [1:12807] "weekday" "weekday" "weekday" "weekday" ...
##  $ Date           : Date[1:12807], format: "0031-10-20" "0031-10-20" ...

head(bread_weekday_df)

## # A tibble: 6 x 6
##   Transaction Item    date_time        period_day weekday_weekend Date      
##         <dbl> <fct>   <chr>            <chr>      <chr>           <date>    
## 1          81 Coffee  31-10-2016 08:28 morning    weekday         0031-10-20
## 2          81 Cake    31-10-2016 08:28 morning    weekday         0031-10-20
## 3          82 Tartine 31-10-2016 08:47 morning    weekday         0031-10-20
## 4          82 Bread   31-10-2016 08:47 morning    weekday         0031-10-20
## 5          83 Coffee  31-10-2016 08:57 morning    weekday         0031-10-20
## 6          83 Bread   31-10-2016 08:57 morning    weekday         0031-10-20

summary(bread_weekday_df)

##   Transaction         Item       date_time          period_day       
##  Min.   :  81   Coffee  :3543   Length:12807       Length:12807      
##  1st Qu.:2446   Bread   :2092   Class :character   Class :character  
##  Median :4929   Tea     : 976   Mode  :character   Mode  :character  
##  Mean   :4888   Cake    : 612                                        
##  3rd Qu.:7322   Pastry  : 566                                        
##  Max.   :9550   Sandwich: 512                                        
##                 (Other) :4506                                        
##  weekday_weekend         Date           
##  Length:12807       Min.   :0001-02-20  
##  Class :character   1st Qu.:0007-11-20  
##  Mode  :character   Median :0015-11-20  
##                     Mean   :0015-08-24  
##                     3rd Qu.:0023-02-20  
##                     Max.   :0031-10-20  
##

itemlist_weekday <- ddply(bread_weekday_df, c("Transaction"),
                  function(df1)paste(df1$Item, collapse = ","))

itemlist_weekday$Transaction <- NULL
colnames(itemlist_weekday) <- c("items")

# Write to csv file
write.csv(itemlist_weekday, "data/bread basket weekday.csv", 
          quote = FALSE,
          row.names = TRUE)

# IMPORT AS TRANSACTION
bread_weekday_tr_df <- read.transactions("data/bread basket weekday.csv",
                           format = "basket",
                           sep = ",")

bread_weekday_tr_df

## transactions in sparse format with
##  6146 transactions (rows) and
##  6239 items (columns)

summary(bread_weekday_tr_df)

## transactions as itemMatrix in sparse format with
##  6146 rows (elements/itemsets/transactions) and
##  6239 columns (items) and a density of 0.000468537 
## 
## most frequent items:
##  Coffee   Bread     Tea    Cake  Pastry (Other) 
##    2982    1951     917     589     537   10990 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11 
##    1 2651 2087  860  388  115   31    7    2    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.923   3.000  11.000 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100

itemFrequencyPlot(bread_weekday_tr_df, topN = 10)

rules_weekday <- apriori(bread_weekday_tr_df, parameter = list(supp = 0.001,
                                          conf = 0.8, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 6 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6239 item(s), 6146 transaction(s)] done [0.02s].
## sorting and recoding items ... [49 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [9 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_weekday <- sort(rules_weekday, by="confidence", decreasing = TRUE)
rules_weekday

## set of 9 rules

inspect(rules_weekday)

##     lhs                             rhs      support     confidence coverage   
## [1] {Cake,Vegan mincepie}        => {Coffee} 0.001301660 1.0000000  0.001301660
## [2] {Cake,Salad}                 => {Coffee} 0.001138952 1.0000000  0.001138952
## [3] {Hearty & Seasonal,Sandwich} => {Coffee} 0.001464367 0.9000000  0.001627075
## [4] {Pastry,Toast}               => {Coffee} 0.001464367 0.9000000  0.001627075
## [5] {Extra Salami or Feta,Salad} => {Coffee} 0.001301660 0.8888889  0.001464367
## [6] {Salad,Sandwich}             => {Coffee} 0.001301660 0.8888889  0.001464367
## [7] {Cake,Toast}                 => {Coffee} 0.001789782 0.8461538  0.002115197
## [8] {Extra Salami or Feta}       => {Coffee} 0.002440612 0.8333333  0.002928734
## [9] {Muffin,Sandwich}            => {Coffee} 0.001301660 0.8000000  0.001627075
##     lift     count
## [1] 2.061033  8   
## [2] 2.061033  7   
## [3] 1.854930  9   
## [4] 1.854930  9   
## [5] 1.832029  8   
## [6] 1.832029  8   
## [7] 1.743951 11   
## [8] 1.717527 15   
## [9] 1.648826  8

plot(rules_weekday)

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(rules_weekday,method = "grouped")

plot(rules_weekday,method = "graph")

plot(rules_weekday,method = "paracoord")

As we can see, there are 9 rules, all with Coffee on the right-hand side. Coffee is still the best-selling product at the cafe, followed by Bread and Tea, in the weekday transactions. The rules show which products are bought along with Coffee: by count, customers most often buy Coffee with Extra Salami or Feta (15 transactions) and with Cake and Toast together (11 transactions).

4.3 Weekend

For weekend transactions, we set the minimum support to 0.002 and the minimum confidence to 0.8.

bread_weekend_df <- bread_df %>%
  filter(weekday_weekend=="weekend")

str(bread_weekend_df)

## tibble [7,700 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Transaction    : num [1:7700] 1 2 2 3 3 3 4 5 5 5 ...
##  $ Item           : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 12 75 75 49 50 27 61 24 66 12 ...
##  $ date_time      : chr [1:7700] "30-10-2016 09:58" "30-10-2016 10:05" "30-10-2016 10:05" "30-10-2016 10:07" ...
##  $ period_day     : chr [1:7700] "morning" "morning" "morning" "morning" ...
##  $ weekday_weekend: chr [1:7700] "weekend" "weekend" "weekend" "weekend" ...
##  $ Date           : Date[1:7700], format: "0030-10-20" "0030-10-20" ...

head(bread_weekend_df)

## # A tibble: 6 x 6
##   Transaction Item         date_time       period_day weekday_weekend Date      
##         <dbl> <fct>        <chr>           <chr>      <chr>           <date>    
## 1           1 Bread        30-10-2016 09:~ morning    weekend         0030-10-20
## 2           2 Scandinavian 30-10-2016 10:~ morning    weekend         0030-10-20
## 3           2 Scandinavian 30-10-2016 10:~ morning    weekend         0030-10-20
## 4           3 Hot chocola~ 30-10-2016 10:~ morning    weekend         0030-10-20
## 5           3 Jam          30-10-2016 10:~ morning    weekend         0030-10-20
## 6           3 Cookies      30-10-2016 10:~ morning    weekend         0030-10-20

summary(bread_weekend_df)

##   Transaction          Item       date_time          period_day       
##  Min.   :   1   Coffee   :1928   Length:7700        Length:7700       
##  1st Qu.:2578   Bread    :1233   Class :character   Class :character  
##  Median :5490   Tea      : 459   Mode  :character   Mode  :character  
##  Mean   :5123   Cake     : 413                                        
##  3rd Qu.:7545   Pastry   : 290                                        
##  Max.   :9684   Medialuna: 277                                        
##                 (Other)  :3100                                        
##  weekday_weekend         Date           
##  Length:7700        Min.   :0001-01-20  
##  Class :character   1st Qu.:0007-01-20  
##  Mode  :character   Median :0014-01-20  
##                     Mean   :0015-05-30  
##                     3rd Qu.:0022-01-20  
##                     Max.   :0031-12-20  
##

itemlist_weekend <- ddply(bread_weekend_df, c("Transaction"),
                          function(df1)paste(df1$Item, collapse = ","))

itemlist_weekend$Transaction <- NULL
colnames(itemlist_weekend) <- c("items")

# Write to csv file
write.csv(itemlist_weekend, "data/bread basket weekend.csv", 
          quote = FALSE,
          row.names = TRUE)

# IMPORT AS TRANSACTION
bread_weekend_tr_df <- read.transactions("data/bread basket weekend.csv",
                                      format = "basket",
                                      sep = ",")

bread_weekend_tr_df

## transactions in sparse format with
##  3321 transactions (rows) and
##  3407 items (columns)

summary(bread_weekend_tr_df)

## transactions as itemMatrix in sparse format with
##  3321 rows (elements/itemsets/transactions) and
##  3407 columns (items) and a density of 0.0009165995 
## 
## most frequent items:
##  Coffee   Bread     Tea    Cake  Pastry (Other) 
##    1544    1143     432     394     277    6581 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10 
##    1 1303  971  609  274  116   33   10    2    2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   3.123   4.000  10.000 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100

itemFrequencyPlot(bread_weekend_tr_df, topN = 10)

rules_weekend <- apriori(bread_weekend_tr_df, parameter = list(supp = 0.002,
                                                            conf = 0.8, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.002      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 6 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3407 item(s), 3321 transaction(s)] done [0.01s].
## sorting and recoding items ... [51 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_weekend <- sort(rules_weekend, by="confidence", decreasing = TRUE)
rules_weekend

## set of 6 rules

inspect(rules_weekend)

##     lhs                       rhs      support     confidence coverage   
## [1] {Juice,Toast}          => {Coffee} 0.003011141 0.9090909  0.003312255
## [2] {Keeping It Local}     => {Coffee} 0.004516712 0.8823529  0.005118940
## [3] {Juice,Spanish Brunch} => {Coffee} 0.004215598 0.8235294  0.005118940
## [4] {Brownie,Medialuna}    => {Coffee} 0.002710027 0.8181818  0.003312255
## [5] {Extra Salami or Feta} => {Coffee} 0.004817826 0.8000000  0.006022282
## [6] {Muffin,Sandwich}      => {Bread}  0.002408913 0.8000000  0.003011141
##     lift     count
## [1] 1.955370 10   
## [2] 1.897859 15   
## [3] 1.771335 14   
## [4] 1.759833  9   
## [5] 1.720725 16   
## [6] 2.324409  8

plot(rules_weekend)

plot(rules_weekend,method = "grouped")

plot(rules_weekend,method = "graph")

plot(rules_weekend,method = "paracoord")

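As we can see, there are 6 rules. Coffee is still the best-selling product at the cafe, followed by Bread and Tea, in the weekend transactions. Five of the rules have Coffee on the right-hand side: by count, customers most often buy Coffee with Extra Salami or Feta (16 transactions), Keeping It Local (15 transactions), and Juice together with Spanish Brunch (14 transactions). The sixth rule, {Muffin,Sandwich} => {Bread}, is the only one that points to Bread, and it has the highest lift (2.32).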
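Weekend rules [5] and [6] share the same confidence (0.8) but have different lifts. The arithmetic below, using item counts from the weekend summary above, shows that the difference comes from the base rate of the right-hand item. The variable names are illustrative.

```r
# Figures read off the weekend outputs above
n_total  <- 3321
n_coffee <- 1544   # weekend transactions containing Coffee
n_bread  <- 1143   # weekend transactions containing Bread
conf     <- 0.8    # confidence shared by rules [5] and [6]

lift_coffee <- conf / (n_coffee / n_total)  # {Extra Salami or Feta} => {Coffee}
lift_bread  <- conf / (n_bread / n_total)   # {Muffin,Sandwich} => {Bread}

print(round(c(coffee_rule = lift_coffee, bread_rule = lift_bread), 6))
```

Bread appears in fewer baskets than Coffee, so the same confidence is a larger departure from its base rate. That is why sorting by confidence alone places the Bread rule last even though its lift is the highest.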

4.2.1 Weekday Morning

For weekday mornings, we set the minimum support to 0.002 and the minimum confidence to 0.7.

bread_morning <- bread_df
bread_morning$period_day <- as.factor(bread_morning$period_day)

bread_weekday_morning_df <- bread_weekday_df %>%
  filter(period_day=="morning")

str(bread_weekday_morning_df)

## tibble [5,174 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Transaction    : num [1:5174] 81 81 82 82 83 83 84 85 85 85 ...
##  $ Item           : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 24 16 83 12 24 12 12 24 24 66 ...
##  $ date_time      : chr [1:5174] "31-10-2016 08:28" "31-10-2016 08:28" "31-10-2016 08:47" "31-10-2016 08:47" ...
##  $ period_day     : chr [1:5174] "morning" "morning" "morning" "morning" ...
##  $ weekday_weekend: chr [1:5174] "weekday" "weekday" "weekday" "weekday" ...
##  $ Date           : Date[1:5174], format: "0031-10-20" "0031-10-20" ...

head(bread_weekday_morning_df)

## # A tibble: 6 x 6
##   Transaction Item    date_time        period_day weekday_weekend Date      
##         <dbl> <fct>   <chr>            <chr>      <chr>           <date>    
## 1          81 Coffee  31-10-2016 08:28 morning    weekday         0031-10-20
## 2          81 Cake    31-10-2016 08:28 morning    weekday         0031-10-20
## 3          82 Tartine 31-10-2016 08:47 morning    weekday         0031-10-20
## 4          82 Bread   31-10-2016 08:47 morning    weekday         0031-10-20
## 5          83 Coffee  31-10-2016 08:57 morning    weekday         0031-10-20
## 6          83 Bread   31-10-2016 08:57 morning    weekday         0031-10-20

summary(bread_weekday_morning_df)

##   Transaction          Item       date_time          period_day       
##  Min.   :  81   Coffee   :1679   Length:5174        Length:5174       
##  1st Qu.:2316   Bread    : 987   Class :character   Class :character  
##  Median :4538   Pastry   : 391   Mode  :character   Mode  :character  
##  Mean   :4688   Tea      : 328                                        
##  3rd Qu.:7285   Medialuna: 207                                        
##  Max.   :9522   Cake     : 152                                        
##                 (Other)  :1430                                        
##  weekday_weekend         Date           
##  Length:5174        Min.   :0001-02-20  
##  Class :character   1st Qu.:0007-04-20  
##  Mode  :character   Median :0015-02-20  
##                     Mean   :0015-07-11  
##                     3rd Qu.:0023-03-20  
##                     Max.   :0031-10-20  
##

itemlist_weekday_morning <- ddply(bread_weekday_morning_df, c("Transaction"),
                          function(df1)paste(df1$Item, collapse = ","))

itemlist_weekday_morning$Transaction <- NULL
colnames(itemlist_weekday_morning) <- c("items")

# Write to csv file
write.csv(itemlist_weekday_morning, "data/bread basket weekday morning.csv", 
          quote = FALSE,
          row.names = TRUE)

# IMPORT AS TRANSACTION
bread_weekday_morning_tr_df <- read.transactions("data/bread basket weekday morning.csv",
                                         format = "basket",
                                         sep = ",")

bread_weekday_morning_tr_df

## transactions in sparse format with
##  2649 transactions (rows) and
##  2718 items (columns)

summary(bread_weekday_morning_tr_df)

## transactions as itemMatrix in sparse format with
##  2649 rows (elements/itemsets/transactions) and
##  2718 columns (items) and a density of 0.001026669 
## 
## most frequent items:
##    Coffee     Bread    Pastry       Tea Medialuna   (Other) 
##      1400       917       371       318       196      4190 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8 
##    1 1246  896  356  121   22    6    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00    3.00    2.79    3.00    8.00 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100

itemFrequencyPlot(bread_weekday_morning_tr_df, topN = 10)

rules_weekday_morning <- apriori(bread_weekday_morning_tr_df, parameter = list(supp = 0.002,
                                                               conf = 0.7, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.7    0.1    1 none FALSE            TRUE       5   0.002      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[2718 item(s), 2649 transaction(s)] done [0.01s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_weekday_morning <- sort(rules_weekday_morning, by="confidence", decreasing = TRUE)
rules_weekday_morning

## set of 7 rules

inspect(rules_weekday_morning)

##     lhs                   rhs      support     confidence coverage    lift    
## [1] {Pastry,Toast}     => {Coffee} 0.003020008 1.0000000  0.003020008 1.892143
## [2] {Cookies,Toast}    => {Coffee} 0.002642507 0.8750000  0.003020008 1.655625
## [3] {Juice,Pastry}     => {Coffee} 0.004152510 0.7857143  0.005285013 1.486684
## [4] {Medialuna,Toast}  => {Coffee} 0.002265006 0.7500000  0.003020008 1.419107
## [5] {Keeping It Local} => {Coffee} 0.010570026 0.7368421  0.014345036 1.394211
## [6] {Toast}            => {Coffee} 0.040392601 0.7181208  0.056247641 1.358787
## [7] {Cookies,Pastry}   => {Coffee} 0.003775009 0.7142857  0.005285013 1.351531
##     count
## [1]   8  
## [2]   7  
## [3]  11  
## [4]   6  
## [5]  28  
## [6] 107  
## [7]  10

plot(rules_weekday_morning)

plot(rules_weekday_morning,method = "grouped")

plot(rules_weekday_morning,method = "graph")

plot(rules_weekday_morning,method = "paracoord")

As we can see, there are 7 rules, all with Coffee on the right-hand side. Coffee is still the best-selling product at the cafe, followed by Bread and Pastry, in the weekday morning transactions. The rules show which products are bought along with Coffee: by count, customers most often buy Coffee with Toast (107 transactions) and Keeping It Local (28 transactions).

4.2.2 Weekday Afternoon

For weekday afternoons, we set the minimum support to 0.002 and the minimum confidence to 0.7.

bread_afternoon <- bread_df
bread_afternoon$period_day <- as.factor(bread_afternoon$period_day)

bread_weekday_afternoon_df <- bread_weekday_df %>%
  filter(period_day=="afternoon")

str(bread_weekday_afternoon_df)

## tibble [7,273 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Transaction    : num [1:7273] 130 130 130 131 131 132 132 132 132 133 ...
##  $ Item           : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 24 24 61 12 24 84 52 67 27 47 ...
##  $ date_time      : chr [1:7273] "31-10-2016 12:03" "31-10-2016 12:03" "31-10-2016 12:03" "31-10-2016 12:08" ...
##  $ period_day     : chr [1:7273] "afternoon" "afternoon" "afternoon" "afternoon" ...
##  $ weekday_weekend: chr [1:7273] "weekday" "weekday" "weekday" "weekday" ...
##  $ Date           : Date[1:7273], format: "0031-10-20" "0031-10-20" ...

head(bread_weekday_afternoon_df)

## # A tibble: 6 x 6
##   Transaction Item   date_time        period_day weekday_weekend Date      
##         <dbl> <fct>  <chr>            <chr>      <chr>           <date>    
## 1         130 Coffee 31-10-2016 12:03 afternoon  weekday         0031-10-20
## 2         130 Coffee 31-10-2016 12:03 afternoon  weekday         0031-10-20
## 3         130 Muffin 31-10-2016 12:03 afternoon  weekday         0031-10-20
## 4         131 Bread  31-10-2016 12:08 afternoon  weekday         0031-10-20
## 5         131 Coffee 31-10-2016 12:08 afternoon  weekday         0031-10-20
## 6         132 Tea    31-10-2016 12:14 afternoon  weekday         0031-10-20

summary(bread_weekday_afternoon_df)

##   Transaction         Item       date_time          period_day       
##  Min.   : 130   Coffee  :1798   Length:7273        Length:7273       
##  1st Qu.:2739   Bread   :1060   Class :character   Class :character  
##  Median :5265   Tea     : 608   Mode  :character   Mode  :character  
##  Mean   :5049   Sandwich: 442                                        
##  3rd Qu.:7355   Cake    : 437                                        
##  Max.   :9549   Soup    : 256                                        
##                 (Other) :2672                                        
##  weekday_weekend         Date           
##  Length:7273        Min.   :0001-02-20  
##  Class :character   1st Qu.:0007-11-20  
##  Mode  :character   Median :0015-12-20  
##                     Mean   :0015-10-14  
##                     3rd Qu.:0023-02-20  
##                     Max.   :0031-10-20  
##

itemlist_weekday_afternoon <- ddply(bread_weekday_afternoon_df, c("Transaction"),
                                  function(df1)paste(df1$Item, collapse = ","))

itemlist_weekday_afternoon$Transaction <- NULL
colnames(itemlist_weekday_afternoon) <- c("items")

# Write to csv file
write.csv(itemlist_weekday_afternoon, "data/bread basket weekday afternoon.csv", 
          quote = FALSE,
          row.names = TRUE)

# IMPORT AS TRANSACTION
bread_weekday_afternoon_tr_df <- read.transactions("data/bread basket weekday afternoon.csv",
                                                 format = "basket",
                                                 sep = ",")

bread_weekday_afternoon_tr_df

## transactions in sparse format with
##  3326 transactions (rows) and
##  3409 items (columns)

summary(bread_weekday_afternoon_tr_df)

## transactions as itemMatrix in sparse format with
##  3326 rows (elements/itemsets/transactions) and
##  3409 columns (items) and a density of 0.0008876084 
## 
## most frequent items:
##   Coffee    Bread      Tea     Cake Sandwich  (Other) 
##     1523      991      561      417      399     6173 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10 
##    1 1322 1143  484  255   87   23    6    2    3 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   3.026   4.000  10.000 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100

itemFrequencyPlot(bread_weekday_afternoon_tr_df, topN = 10)

rules_weekday_afternoon <- apriori(bread_weekday_afternoon_tr_df,
                                   parameter = list(supp = 0.002, conf = 0.7, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.7    0.1    1 none FALSE            TRUE       5   0.002      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 6 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3409 item(s), 3326 transaction(s)] done [0.01s].
## sorting and recoding items ... [44 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [11 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_weekday_afternoon <- sort(rules_weekday_afternoon, by="confidence", decreasing = TRUE)
rules_weekday_afternoon

## set of 11 rules

inspect(rules_weekday_afternoon)

##      lhs                              rhs      support     confidence
## [1]  {Keeping It Local}            => {Coffee} 0.002405292 1.0000000 
## [2]  {Cake,Vegan mincepie}         => {Coffee} 0.002104630 1.0000000 
## [3]  {Hearty & Seasonal,Sandwich}  => {Coffee} 0.002705953 0.9000000 
## [4]  {Extra Salami or Feta,Salad}  => {Coffee} 0.002405292 0.8888889 
## [5]  {Salad,Sandwich}              => {Coffee} 0.002405292 0.8888889 
## [6]  {Extra Salami or Feta}        => {Coffee} 0.003307276 0.8461538 
## [7]  {Muffin,Sandwich}             => {Coffee} 0.002104630 0.7777778 
## [8]  {Alfajores,Cake}              => {Coffee} 0.003006615 0.7692308 
## [9]  {Coffee,Extra Salami or Feta} => {Salad}  0.002405292 0.7272727 
## [10] {Toast}                       => {Coffee} 0.016837041 0.7000000 
## [11] {Brownie,Sandwich}            => {Coffee} 0.002104630 0.7000000 
##      coverage    lift      count
## [1]  0.002405292  2.183848  8   
## [2]  0.002104630  2.183848  7   
## [3]  0.003006615  1.965463  9   
## [4]  0.002705953  1.941198  8   
## [5]  0.002705953  1.941198  8   
## [6]  0.003908599  1.847871 11   
## [7]  0.002705953  1.698548  7   
## [8]  0.003908599  1.679883 10   
## [9]  0.003307276 40.315152  8   
## [10] 0.024052916  1.528693 56   
## [11] 0.003006615  1.528693  7

plot(rules_weekday_afternoon)

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(rules_weekday_afternoon,method = "grouped")

plot(rules_weekday_afternoon,method = "graph")

plot(rules_weekday_afternoon,method = "paracoord")

As we can see, there are 11 rules, and coffee is still the best-selling product at the cafe in weekday afternoon transactions, followed by bread and tea. The rules show which products are bought together with coffee: customers who buy Toast, or the item "Extra Salami or Feta", tend to buy coffee as well.
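
To make the columns of the inspect() output concrete, the sketch below recomputes support, confidence, coverage and lift for rule [10], {Toast} => {Coffee}, from the counts printed above: 3326 transactions, 1523 containing Coffee and 56 containing both items. The number of Toast transactions (80) is not printed directly; it is inferred here from the coverage value (0.024052916 x 3326), so treat it as an assumption.

```r
n_total  <- 3326  # transactions in the weekday afternoon segment
n_coffee <- 1523  # transactions containing Coffee
n_toast  <- 80    # transactions containing Toast (inferred from coverage)
n_both   <- 56    # transactions containing Toast and Coffee

support    <- n_both / n_total                   # share of all baskets with both items
confidence <- n_both / n_toast                   # share of Toast baskets that also have Coffee
coverage   <- n_toast / n_total                  # share of all baskets with the left-hand side
lift       <- confidence / (n_coffee / n_total)  # confidence relative to Coffee's base rate

round(c(support = support, confidence = confidence,
        coverage = coverage, lift = lift), 6)
```

A lift above 1 means Coffee appears more often in Toast baskets than in baskets overall. Because Coffee is already present in almost half of all baskets, even a confidence of 0.7 gives only a moderate lift.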

4.2.3 Weekday Evening

For weekday evenings, we set the minimum support to 0.01 and the minimum confidence to 0.8.

bread_evening <- bread_df
bread_evening$period_day <- as.factor(bread_evening$period_day)

bread_weekday_evening_df <- bread_weekday_df %>%
  filter(period_day=="evening")

str(bread_weekday_evening_df)

## tibble [356 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Transaction    : num [1:356] 172 172 173 173 174 175 176 176 253 254 ...
##  $ Item           : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 12 24 12 61 24 12 27 24 75 56 ...
##  $ date_time      : chr [1:356] "31-10-2016 17:06" "31-10-2016 17:06" "31-10-2016 17:08" "31-10-2016 17:08" ...
##  $ period_day     : chr [1:356] "evening" "evening" "evening" "evening" ...
##  $ weekday_weekend: chr [1:356] "weekday" "weekday" "weekday" "weekday" ...
##  $ Date           : Date[1:356], format: "0031-10-20" "0031-10-20" ...

head(bread_weekday_evening_df)

## # A tibble: 6 x 6
##   Transaction Item   date_time        period_day weekday_weekend Date      
##         <dbl> <fct>  <chr>            <chr>      <chr>           <date>    
## 1         172 Bread  31-10-2016 17:06 evening    weekday         0031-10-20
## 2         172 Coffee 31-10-2016 17:06 evening    weekday         0031-10-20
## 3         173 Bread  31-10-2016 17:08 evening    weekday         0031-10-20
## 4         173 Muffin 31-10-2016 17:08 evening    weekday         0031-10-20
## 5         174 Coffee 31-10-2016 17:40 evening    weekday         0031-10-20
## 6         175 Bread  31-10-2016 17:49 evening    weekday         0031-10-20

summary(bread_weekday_evening_df)

##   Transaction          Item      date_time          period_day       
##  Min.   : 172   Coffee   : 66   Length:356         Length:356        
##  1st Qu.:1474   Bread    : 45   Class :character   Class :character  
##  Median :4156   Tea      : 40   Mode  :character   Mode  :character  
##  Mean   :4448   Cake     : 23                                        
##  3rd Qu.:7075   Cookies  : 20                                        
##  Max.   :9550   Alfajores: 15                                        
##                 (Other)  :147                                        
##  weekday_weekend         Date           
##  Length:356         Min.   :0001-02-20  
##  Class :character   1st Qu.:0006-12-05  
##  Mode  :character   Median :0014-11-20  
##                     Mean   :0014-08-14  
##                     3rd Qu.:0022-02-20  
##                     Max.   :0031-10-20  
##

itemlist_weekday_evening <- ddply(bread_weekday_evening_df, c("Transaction"),
                                    function(df1)paste(df1$Item, collapse = ","))

itemlist_weekday_evening$Transaction <- NULL
colnames(itemlist_weekday_evening) <- c("items")

# Write to csv file
write.csv(itemlist_weekday_evening, "data/bread basket weekday evening.csv", 
          quote = FALSE,
          row.names = TRUE)

# IMPORT AS TRANSACTION
bread_weekday_evening_tr_df <- read.transactions("data/bread basket weekday evening.csv",
                                                   format = "basket",
                                                   sep = ",")

bread_weekday_evening_tr_df

## transactions in sparse format with
##  170 transactions (rows) and
##  213 items (columns)

summary(bread_weekday_evening_tr_df)

## transactions as itemMatrix in sparse format with
##  170 rows (elements/itemsets/transactions) and
##  213 columns (items) and a density of 0.01394642 
## 
## most frequent items:
##  Coffee   Bread     Tea    Cake Cookies (Other) 
##      59      43      38      21      18     326 
## 
## element (itemset/transaction) length distribution:
## sizes
##  1  2  3  4  5  6  7 11 
##  1 81 47 20 12  6  2  1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.971   3.000  11.000 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100

itemFrequencyPlot(bread_weekday_evening_tr_df, topN = 10)

rules_weekday_evening <- apriori(bread_weekday_evening_tr_df,
                                 parameter = list(supp = 0.01, conf = 0.8, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 1 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[213 item(s), 170 transaction(s)] done [0.00s].
## sorting and recoding items ... [32 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_weekday_evening <- sort(rules_weekday_evening, by="confidence", decreasing = TRUE)
rules_weekday_evening

## set of 15 rules

inspect(rules_weekday_evening)

##      lhs                          rhs         support    confidence coverage  
## [1]  {Salad}                   => {Coffee}    0.01176471 1          0.01176471
## [2]  {Scone}                   => {Cake}      0.01176471 1          0.01176471
## [3]  {Scone}                   => {Tea}       0.01176471 1          0.01176471
## [4]  {Mineral water}           => {Alfajores} 0.01764706 1          0.01764706
## [5]  {Cake,Scone}              => {Tea}       0.01176471 1          0.01176471
## [6]  {Scone,Tea}               => {Cake}      0.01176471 1          0.01176471
## [7]  {Alfajores,Coke}          => {Coffee}    0.01176471 1          0.01176471
## [8]  {Coffee,Mineral water}    => {Alfajores} 0.01176471 1          0.01176471
## [9]  {Alfajores,Brownie}       => {Bread}     0.01176471 1          0.01176471
## [10] {Bread,Brownie}           => {Alfajores} 0.01176471 1          0.01176471
## [11] {Cookies,Juice}           => {Tea}       0.01176471 1          0.01176471
## [12] {Juice,Tea}               => {Cookies}   0.01176471 1          0.01176471
## [13] {Hot chocolate,Medialuna} => {Coffee}    0.01176471 1          0.01176471
## [14] {Alfajores,Pastry}        => {Tea}       0.01176471 1          0.01176471
## [15] {Bread,Hot chocolate}     => {Tea}       0.01176471 1          0.01176471
##      lift      count
## [1]   2.881356 2    
## [2]   8.095238 2    
## [3]   4.473684 2    
## [4]  12.142857 3    
## [5]   4.473684 2    
## [6]   8.095238 2    
## [7]   2.881356 2    
## [8]  12.142857 2    
## [9]   3.953488 2    
## [10] 12.142857 2    
## [11]  4.473684 2    
## [12]  9.444444 2    
## [13]  2.881356 2    
## [14]  4.473684 2    
## [15]  4.473684 2

plot(rules_weekday_evening)

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(rules_weekday_evening,method = "grouped")

plot(rules_weekday_evening,method = "graph")

plot(rules_weekday_evening,method = "paracoord")

As we can see, there are 15 rules, and coffee is still the best-selling product at the cafe in weekday evening transactions, followed by bread and tea. The rules show which products are bought together with coffee: customers who buy Alfajores and Coke together tend to buy coffee as well. Every rule in this segment rests on only 2 or 3 transactions, so these patterns should be read with caution.
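
With only 170 transactions in this segment, a minimum support of 0.01 is a very low bar. The sketch below is plain arithmetic rather than arules output: it computes the smallest transaction count that satisfies the support threshold, and the support that would be needed to require a chosen minimum count. The target of 5 transactions is an arbitrary value picked for illustration.

```r
n_transactions <- 170   # transactions in the weekday evening segment
supp           <- 0.01  # minimum support used above

# Smallest whole number of transactions whose share reaches the threshold
smallest_count <- ceiling(supp * n_transactions)

# Support needed so that every rule rests on at least target_count baskets
target_count <- 5       # hypothetical choice
supp_needed  <- target_count / n_transactions

c(smallest_count = smallest_count, supp_needed = round(supp_needed, 4))
```

Raising the threshold this way would return fewer rules, but each remaining rule would be backed by more than a handful of baskets.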

4.3.1 Weekend Morning

For weekend mornings, we set the minimum support to 0.005 and the minimum confidence to 0.8.

bread_weekend_morning_df <- bread_weekend_df %>%
  filter(period_day=="morning")

str(bread_weekend_morning_df)

## tibble [3,230 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Transaction    : num [1:3230] 1 2 2 3 3 3 4 5 5 5 ...
##  $ Item           : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 12 75 75 49 50 27 61 24 66 12 ...
##  $ date_time      : chr [1:3230] "30-10-2016 09:58" "30-10-2016 10:05" "30-10-2016 10:05" "30-10-2016 10:07" ...
##  $ period_day     : chr [1:3230] "morning" "morning" "morning" "morning" ...
##  $ weekday_weekend: chr [1:3230] "weekend" "weekend" "weekend" "weekend" ...
##  $ Date           : Date[1:3230], format: "0030-10-20" "0030-10-20" ...

head(bread_weekend_morning_df)

## # A tibble: 6 x 6
##   Transaction Item         date_time       period_day weekday_weekend Date      
##         <dbl> <fct>        <chr>           <chr>      <chr>           <date>    
## 1           1 Bread        30-10-2016 09:~ morning    weekend         0030-10-20
## 2           2 Scandinavian 30-10-2016 10:~ morning    weekend         0030-10-20
## 3           2 Scandinavian 30-10-2016 10:~ morning    weekend         0030-10-20
## 4           3 Hot chocola~ 30-10-2016 10:~ morning    weekend         0030-10-20
## 5           3 Jam          30-10-2016 10:~ morning    weekend         0030-10-20
## 6           3 Cookies      30-10-2016 10:~ morning    weekend         0030-10-20

summary(bread_weekend_morning_df)

##   Transaction          Item       date_time          period_day       
##  Min.   :   1   Coffee   : 882   Length:3230        Length:3230       
##  1st Qu.:2186   Bread    : 623   Class :character   Class :character  
##  Median :5124   Pastry   : 213   Mode  :character   Mode  :character  
##  Mean   :4941   Medialuna: 195                                        
##  3rd Qu.:7234   Tea      : 128                                        
##  Max.   :9665   Cake     : 112                                        
##                 (Other)  :1077                                        
##  weekday_weekend         Date           
##  Length:3230        Min.   :0001-01-20  
##  Class :character   1st Qu.:0007-01-20  
##  Mode  :character   Median :0015-01-20  
##                     Mean   :0015-12-01  
##                     3rd Qu.:0024-12-20  
##                     Max.   :0031-12-20  
##

itemlist_weekend_morning <- ddply(bread_weekend_morning_df, c("Transaction"),
                                  function(df1)paste(df1$Item, collapse = ","))

itemlist_weekend_morning$Transaction <- NULL
colnames(itemlist_weekend_morning) <- c("items")

# Write to csv file
write.csv(itemlist_weekend_morning, "data/bread basket weekend morning.csv", 
          quote = FALSE,
          row.names = TRUE)

# IMPORT AS TRANSACTION
bread_weekend_morning_tr_df <- read.transactions("data/bread basket weekend morning.csv",
                                                 format = "basket",
                                                 sep = ",")

bread_weekend_morning_tr_df

## transactions in sparse format with
##  1456 transactions (rows) and
##  1523 items (columns)

summary(bread_weekend_morning_tr_df)

## transactions as itemMatrix in sparse format with
##  1456 rows (elements/itemsets/transactions) and
##  1523 columns (items) and a density of 0.001986482 
## 
## most frequent items:
##    Coffee     Bread    Pastry Medialuna       Tea   (Other) 
##       711       572       200       183       123      2616 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8  10 
##   1 592 449 272  92  33  11   5   1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   3.025   4.000  10.000 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100

itemFrequencyPlot(bread_weekend_morning_tr_df, topN = 10)

rules_weekend_morning <- apriori(bread_weekend_morning_tr_df,
                                 parameter = list(supp = 0.005, conf = 0.8, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.005      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 7 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1523 item(s), 1456 transaction(s)] done [0.00s].
## sorting and recoding items ... [34 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [5 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_weekend_morning <- sort(rules_weekend_morning, by="confidence", decreasing = TRUE)
rules_weekend_morning

## set of 5 rules

inspect(rules_weekend_morning)

##     lhs                   rhs      support     confidence coverage    lift    
## [1] {The Nomad}        => {Coffee} 0.010302198 0.9375000  0.010989011 1.919831
## [2] {Keeping It Local} => {Coffee} 0.006868132 0.9090909  0.007554945 1.861655
## [3] {Smoothies}        => {Coffee} 0.006868132 0.9090909  0.007554945 1.861655
## [4] {Cake,Medialuna}   => {Coffee} 0.005494505 0.8888889  0.006181319 1.820284
## [5] {Spanish Brunch}   => {Coffee} 0.019917582 0.8529412  0.023351648 1.746670
##     count
## [1] 15   
## [2] 10   
## [3] 10   
## [4]  8   
## [5] 29

plot(rules_weekend_morning)

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(rules_weekend_morning,method = "grouped")

plot(rules_weekend_morning,method = "graph")

plot(rules_weekend_morning,method = "paracoord")

As we can see, there are 5 rules, and coffee is still the best-selling product at the cafe in weekend morning transactions, followed by bread and pastry. The rules show which products are bought together with coffee: customers who order Spanish Brunch or The Nomad tend to buy coffee as well.
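
The same steps (filter one period, group items by transaction, mine rules, sort by confidence) are repeated for every segment. One way to express them once is a small helper, sketched below. This is not the code used in this report: mine_segment is a hypothetical name, the toy data is made up, and the baskets are converted with as(..., "transactions") instead of the CSV round-trip.

```r
library(arules)

# Hypothetical helper: mine association rules for one period of the day
mine_segment <- function(df, period, supp, conf) {
  seg <- df[df$period_day == period, ]
  # One vector of distinct item names per transaction ID
  baskets <- lapply(split(as.character(seg$Item), seg$Transaction), unique)
  tr <- as(baskets, "transactions")
  rules <- apriori(tr,
                   parameter = list(supp = supp, conf = conf, minlen = 2),
                   control = list(verbose = FALSE))
  sort(rules, by = "confidence", decreasing = TRUE)
}

# Made-up data with the same column names as the report's data frame
toy_df <- data.frame(
  Transaction = c(1, 1, 2, 2, 3, 3, 4),
  Item        = c("Toast", "Coffee", "Toast", "Coffee", "Bread", "Coffee", "Bread"),
  period_day  = "morning"
)

toy_rules <- mine_segment(toy_df, "morning", supp = 0.5, conf = 0.8)
inspect(toy_rules)
```

Each segment would then differ only in the data frame passed in and in the supp and conf values, which makes the thresholds easier to compare across segments.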

4.3.2 Weekend Afternoon

For weekend afternoons, we set the minimum support to 0.004 and the minimum confidence to 0.7.

bread_weekend_afternoon_df <- bread_weekend_df %>%
  filter(period_day=="afternoon")

str(bread_weekend_afternoon_df)

## tibble [4,296 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Transaction    : num [1:4296] 43 43 44 44 45 45 45 46 47 47 ...
##  $ Item           : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 75 41 24 56 24 49 56 24 34 52 ...
##  $ date_time      : chr [1:4296] "30-10-2016 12:00" "30-10-2016 12:00" "30-10-2016 12:05" "30-10-2016 12:05" ...
##  $ period_day     : chr [1:4296] "afternoon" "afternoon" "afternoon" "afternoon" ...
##  $ weekday_weekend: chr [1:4296] "weekend" "weekend" "weekend" "weekend" ...
##  $ Date           : Date[1:4296], format: "0030-10-20" "0030-10-20" ...

head(bread_weekend_afternoon_df)

## # A tibble: 6 x 6
##   Transaction Item         date_time       period_day weekday_weekend Date      
##         <dbl> <fct>        <chr>           <chr>      <chr>           <date>    
## 1          43 Scandinavian 30-10-2016 12:~ afternoon  weekend         0030-10-20
## 2          43 Fudge        30-10-2016 12:~ afternoon  weekend         0030-10-20
## 3          44 Coffee       30-10-2016 12:~ afternoon  weekend         0030-10-20
## 4          44 Medialuna    30-10-2016 12:~ afternoon  weekend         0030-10-20
## 5          45 Coffee       30-10-2016 12:~ afternoon  weekend         0030-10-20
## 6          45 Hot chocola~ 30-10-2016 12:~ afternoon  weekend         0030-10-20

summary(bread_weekend_afternoon_df)

##   Transaction              Item       date_time          period_day       
##  Min.   :  43   Coffee       :1025   Length:4296        Length:4296       
##  1st Qu.:2619   Bread        : 601   Class :character   Class :character  
##  Median :5543   Tea          : 322   Mode  :character   Mode  :character  
##  Mean   :5215   Cake         : 294                                        
##  3rd Qu.:7577   Sandwich     : 229                                        
##  Max.   :9684   Hot chocolate: 138                                        
##                 (Other)      :1687                                        
##  weekday_weekend         Date           
##  Length:4296        Min.   :0001-04-20  
##  Class :character   1st Qu.:0007-01-20  
##  Mode  :character   Median :0014-01-20  
##                     Mean   :0015-04-10  
##                     3rd Qu.:0022-01-20  
##                     Max.   :0031-12-20  
##

itemlist_weekend_afternoon <- ddply(bread_weekend_afternoon_df, c("Transaction"),
                                  function(df1)paste(df1$Item, collapse = ","))

itemlist_weekend_afternoon$Transaction <- NULL
colnames(itemlist_weekend_afternoon) <- c("items")

# Write to csv file
write.csv(itemlist_weekend_afternoon, "data/bread basket weekend afternoon.csv", 
          quote = FALSE,
          row.names = TRUE)

# IMPORT AS TRANSACTION
bread_weekend_afternoon_tr_df <- read.transactions("data/bread basket weekend afternoon.csv",
                                                 format = "basket",
                                                 sep = ",")

bread_weekend_afternoon_tr_df

## transactions in sparse format with
##  1765 transactions (rows) and
##  1841 items (columns)

summary(bread_weekend_afternoon_tr_df)

## transactions as itemMatrix in sparse format with
##  1765 rows (elements/itemsets/transactions) and
##  1841 columns (items) and a density of 0.001756035 
## 
## most frequent items:
##   Coffee    Bread      Tea     Cake Sandwich  (Other) 
##      817      563      302      279      191     3554 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9  10 
##   1 645 501 331 176  81  22   5   2   1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   3.233   4.000  10.000 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100

itemFrequencyPlot(bread_weekend_afternoon_tr_df, topN = 10)

rules_weekend_afternoon <- apriori(bread_weekend_afternoon_tr_df,
                                   parameter = list(supp = 0.004, conf = 0.7, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.7    0.1    1 none FALSE            TRUE       5   0.004      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 7 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1841 item(s), 1765 transaction(s)] done [0.00s].
## sorting and recoding items ... [41 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_weekend_afternoon <- sort(rules_weekend_afternoon, by="confidence", decreasing = TRUE)
rules_weekend_afternoon

## set of 6 rules

inspect(rules_weekend_afternoon)

##     lhs                        rhs      support     confidence coverage   
## [1] {Juice,Toast}           => {Coffee} 0.004532578 1.0000000  0.004532578
## [2] {Art Tray}              => {Coffee} 0.004532578 0.8888889  0.005099150
## [3] {Sandwich,Soup}         => {Coffee} 0.005665722 0.7692308  0.007365439
## [4] {Hot chocolate,Scone}   => {Coffee} 0.005099150 0.7500000  0.006798867
## [5] {Cake,Tiffin}           => {Coffee} 0.004532578 0.7272727  0.006232295
## [6] {Cookies,Hot chocolate} => {Coffee} 0.004532578 0.7272727  0.006232295
##     lift     count
## [1] 2.160343  8   
## [2] 1.920305  8   
## [3] 1.661802 10   
## [4] 1.620257  9   
## [5] 1.571158  8   
## [6] 1.571158  8

plot(rules_weekend_afternoon)

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(rules_weekend_afternoon,method = "grouped")

plot(rules_weekend_afternoon,method = "graph")

plot(rules_weekend_afternoon,method = "paracoord")

As we can see, there are 6 rules, and coffee is still the best-selling product at the cafe in weekend afternoon transactions, followed by bread and tea. The rules show which products are bought together with coffee: customers who buy Sandwich and Soup together, or Hot chocolate and Scone together, tend to buy coffee as well.

4.3.3 Weekend Evening

For weekend evenings, we set the minimum support to 0.02 and the minimum confidence to 0.9.

bread_weekend_evening_df <- bread_weekend_df %>%
  filter(period_day=="evening")

str(bread_weekend_evening_df)

## tibble [164 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Transaction    : num [1:164] 647 647 737 738 738 ...
##  $ Item           : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 56 66 3 66 32 5 83 41 62 54 ...
##  $ date_time      : chr [1:164] "05-11-2016 18:45" "05-11-2016 18:45" "06-11-2016 17:15" "06-11-2016 17:19" ...
##  $ period_day     : chr [1:164] "evening" "evening" "evening" "evening" ...
##  $ weekday_weekend: chr [1:164] "weekend" "weekend" "weekend" "weekend" ...
##  $ Date           : Date[1:164], format: "0005-11-20" "0005-11-20" ...

head(bread_weekend_evening_df)

## # A tibble: 6 x 6
##   Transaction Item          date_time      period_day weekday_weekend Date      
##         <dbl> <fct>         <chr>          <chr>      <chr>           <date>    
## 1         647 Medialuna     05-11-2016 18~ evening    weekend         0005-11-20
## 2         647 Pastry        05-11-2016 18~ evening    weekend         0005-11-20
## 3         737 Alfajores     06-11-2016 17~ evening    weekend         0006-11-20
## 4         738 Pastry        06-11-2016 17~ evening    weekend         0006-11-20
## 5         738 Dulce de Lec~ 06-11-2016 17~ evening    weekend         0006-11-20
## 6        1273 Art Tray      13-11-2016 17~ evening    weekend         0013-11-20

summary(bread_weekend_evening_df)

##   Transaction                         Item     date_time        
##  Min.   : 647   Coffee                  :21   Length:164        
##  1st Qu.:5993   Tshirt                  :21   Class :character  
##  Median :6018   Afternoon with the baker:15   Mode  :character  
##  Mean   :6119   Postcard                :10                     
##  3rd Qu.:7663   Bread                   : 9                     
##  Max.   :9231   Tea                     : 9                     
##                 (Other)                 :79                     
##   period_day        weekday_weekend         Date           
##  Length:164         Length:164         Min.   :0001-04-20  
##  Class :character   Class :character   1st Qu.:0004-02-20  
##  Mode  :character   Mode  :character   Median :0005-03-20  
##                                        Mean   :0009-05-25  
##                                        3rd Qu.:0012-07-28  
##                                        Max.   :0031-12-20  
##

itemlist_weekend_evening <- ddply(bread_weekend_evening_df, c("Transaction"),
                                  function(df1)paste(df1$Item, collapse = ","))

itemlist_weekend_evening$Transaction <- NULL
colnames(itemlist_weekend_evening) <- c("items")

# Write to csv file
write.csv(itemlist_weekend_evening, "data/bread basket weekend evening.csv", 
          quote = FALSE,
          row.names = TRUE)

# IMPORT AS TRANSACTION
bread_weekend_evening_tr_df <- read.transactions("data/bread basket weekend evening.csv",
                                                 format = "basket",
                                                 sep = ",")

bread_weekend_evening_tr_df

## transactions in sparse format with
##  93 transactions (rows) and
##  135 items (columns)

summary(bread_weekend_evening_tr_df)

## transactions as itemMatrix in sparse format with
##  93 rows (elements/itemsets/transactions) and
##  135 columns (items) and a density of 0.01943449 
## 
## most frequent items:
##                   Tshirt                   Coffee Afternoon with the baker 
##                       19                       16                       14 
##                 Postcard                    Bread                  (Other) 
##                        9                        8                      178 
## 
## element (itemset/transaction) length distribution:
## sizes
##  1  2  3  4  5  6 
##  1 57 21  6  6  2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   2.000   2.624   3.000   6.000 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3     11

itemFrequencyPlot(bread_weekend_evening_tr_df, topN = 10)

rules_weekend_evening <- apriori(bread_weekend_evening_tr_df,
                                 parameter = list(supp = 0.02, conf = 0.9, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5    0.02      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 1 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[135 item(s), 93 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_weekend_evening <- sort(rules_weekend_evening, by="confidence", decreasing = TRUE)
rules_weekend_evening

## set of 4 rules

inspect(rules_weekend_evening)

##     lhs                   rhs      support    confidence coverage   lift    
## [1] {Scone}            => {Coffee} 0.02150538 1          0.02150538  5.81250
## [2] {Medialuna,Tea}    => {Coffee} 0.02150538 1          0.02150538  5.81250
## [3] {Coffee,Medialuna} => {Tea}    0.02150538 1          0.02150538 13.28571
## [4] {Bread,Cake}       => {Coffee} 0.02150538 1          0.02150538  5.81250
##     count
## [1] 2    
## [2] 2    
## [3] 2    
## [4] 2

plot(rules_weekend_evening)

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(rules_weekend_evening,method = "grouped")

plot(rules_weekend_evening,method = "graph")

plot(rules_weekend_evening,method = "paracoord")

As we can see, there are 4 rules, and Tshirt is the most frequent item in weekend evening transactions (19 transactions), followed by Coffee (16) and Afternoon with the baker (14). This is unusual, because coffee is the best-selling product in every other segment. One possible explanation is that customers visit on weekend evenings mostly with family and are therefore more inclined to buy merchandise. Each of the 4 rules rests on only 2 transactions, so they should be read with caution.
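
The item ranking for a segment can also be read directly from the transactions object rather than from the summary() printout. The sketch below uses itemFrequency() from arules on made-up baskets (toy_baskets is a hypothetical stand-in, not the weekend evening data).

```r
library(arules)

# Made-up baskets for illustration only
toy_baskets <- list(
  c("Tshirt", "Postcard"),
  c("Tshirt", "Coffee"),
  c("Coffee", "Bread"),
  c("Tshirt")
)
toy_tr <- as(toy_baskets, "transactions")

# Number of transactions containing each item, most frequent first
item_counts <- sort(itemFrequency(toy_tr, type = "absolute"), decreasing = TRUE)
head(item_counts, 3)
```

These counts are per transaction, so an item bought twice in one basket is counted once. This is one reason the counts can differ from the row counts in the data frame summary.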

4. Recommendations

Offer a bundle such as Coffee + Bread on weekday and weekend mornings, because coffee and bread are the items customers buy most in the morning.
Offer a bundle such as Tea + Bread on weekday evenings, because besides coffee, customers mostly buy tea and bread in the evening.
Launch a new food product in the weekday/weekend afternoon.
Launch a new drink product in the weekday/weekend morning.
Launch new merchandise in the weekend evening.