This report describes transaction data from The Bread Basket Cafe, located in Edinburgh. The dataset has 20507 entries covering over 9000 transactions, in which customers ordered different items from this bakery online. The dataset description gives the time period as 26-01-11 to 27-12-03, although the date_time values shown below start on 30-10-2016. The dataset used in this report is The Bread Basket, hosted on Kaggle.
DataSet -> (https://www.kaggle.com/mittalvasu95/the-bread-basket)
Report Outline
1. Data Extraction
2. Exploratory Data Analysis
3. Modelling
4. Recommendation
The dataset is downloaded from Kaggle and saved in the data folder. We used the read_excel() function to read the dataset into the bread_df data frame.
library(readxl)
bread_df <- read_excel("data/bread-basket.xlsx")
To see the number of rows and columns in the data frame, we used the dim() function. The dataset has 20507 rows and 5 columns.
dim(bread_df)
## [1] 20507 5
To find out the column names and types, we used the str() function.
To see the first 6 rows, we used the head() function.
To get a summary of the data, we used the summary() function.
str(bread_df)
## tibble [20,507 x 5] (S3: tbl_df/tbl/data.frame)
## $ Transaction : num [1:20507] 1 2 2 3 3 3 4 5 5 5 ...
## $ Item : chr [1:20507] "Bread" "Scandinavian" "Scandinavian" "Hot chocolate" ...
## $ date_time : chr [1:20507] "30-10-2016 09:58" "30-10-2016 10:05" "30-10-2016 10:05" "30-10-2016 10:07" ...
## $ period_day : chr [1:20507] "morning" "morning" "morning" "morning" ...
## $ weekday_weekend: chr [1:20507] "weekend" "weekend" "weekend" "weekend" ...
head(bread_df)
## # A tibble: 6 x 5
## Transaction Item date_time period_day weekday_weekend
## <dbl> <chr> <chr> <chr> <chr>
## 1 1 Bread 30-10-2016 09:58 morning weekend
## 2 2 Scandinavian 30-10-2016 10:05 morning weekend
## 3 2 Scandinavian 30-10-2016 10:05 morning weekend
## 4 3 Hot chocolate 30-10-2016 10:07 morning weekend
## 5 3 Jam 30-10-2016 10:07 morning weekend
## 6 3 Cookies 30-10-2016 10:07 morning weekend
summary(bread_df)
## Transaction Item date_time period_day
## Min. : 1 Length:20507 Length:20507 Length:20507
## 1st Qu.:2552 Class :character Class :character Class :character
## Median :5137 Mode :character Mode :character Mode :character
## Mean :4976
## 3rd Qu.:7357
## Max. :9684
## weekday_weekend
## Length:20507
## Class :character
## Mode :character
##
##
##
From the results above, we know the following:
1. The second column is Item. It is a categorical variable; its type is currently chr, so it should be converted to a factor.
2. The third column is date_time. It holds a date and time; its type is currently chr, so it should be converted to Date.
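bread_df$Item <- as.factor(bread_df$Item)
# date_time is stored as "dd-mm-YYYY HH:MM"; as.Date() without a format argument
# misreads it, which is why Date shows values such as 0031-10-20 in later outputs.
bread_df$Date <- as.Date(bread_df$date_time)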
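The Date values printed later in this report (for example 0030-10-20 for "30-10-2016 09:58") suggest that the conversion read the day as the year. The following is a minimal sketch of how the column could be parsed instead, assuming every date_time value follows the "dd-mm-YYYY HH:MM" pattern seen in the head() output; the two sample strings are copied from that output and nothing else about the data is assumed.

```r
# Two sample strings copied from the head() outputs in this report.
date_time <- c("30-10-2016 09:58", "31-10-2016 08:28")

# Without a format, as.Date() tries "%Y-%m-%d" first, so the leading
# "30" is taken as the year and the first two digits of "2016" as the day.
misparsed <- as.Date(date_time)

# With an explicit day-month-year format the date is read as intended;
# the trailing time of day is ignored by as.Date().
parsed <- as.Date(date_time, format = "%d-%m-%Y")

print(format(parsed, "%Y-%m-%d"))
```

If the time of day is also needed, as.POSIXct() with format = "%d-%m-%Y %H:%M" is one option; that is a suggestion rather than something the report does.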
To plot the number of weekday and weekend entries (one row per item sold), we can use ggplot().
library(ggplot2)
ggplot(data=bread_df, aes(x = weekday_weekend)) +
geom_bar(color = "blue")
Analysis of two variables: the distribution of entries across the weekday_weekend variable, split by period_day.
ggplot(bread_df, aes(x=weekday_weekend, fill = period_day)) +
geom_bar(position = "dodge")
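Because bread_df has one row per item, geom_bar() counts item rows rather than transactions. The sketch below shows, on a small hypothetical data frame shaped like bread_df (the values are made up for illustration), how the two counts can differ and how a per-transaction count could be obtained in base R.

```r
# Hypothetical toy rows in the same shape as bread_df (not real data).
toy <- data.frame(
  Transaction     = c(1, 2, 2, 3, 3, 3, 4),
  Item            = c("Bread", "Coffee", "Cake", "Tea", "Jam", "Cookies", "Coffee"),
  period_day      = c("morning", "morning", "morning",
                      "afternoon", "afternoon", "afternoon", "evening"),
  weekday_weekend = c("weekend", "weekend", "weekend",
                      "weekday", "weekday", "weekday", "weekday"),
  stringsAsFactors = FALSE
)

# Row-level counts: one count per item line, which is what geom_bar() tallies.
item_counts <- table(toy$weekday_weekend, toy$period_day)

# Transaction-level counts: keep one row per Transaction before counting.
per_transaction <- unique(toy[, c("Transaction", "weekday_weekend", "period_day")])
transaction_counts <- table(per_transaction$weekday_weekend,
                            per_transaction$period_day)

print(item_counts)
print(transaction_counts)
```

Passing a de-duplicated data frame like per_transaction to ggplot() would give bars that count transactions instead of items.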
Convert the data to a transaction-based format.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble 3.0.5 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## v purrr 0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(knitr)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
##
## recode
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
## Loading required package: grid
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
bread_df <- bread_df[complete.cases(bread_df),]
bread_df$Transaction <- as.numeric(bread_df$Transaction)
bread_df_sorted <- bread_df[order(bread_df$Transaction), ]
itemlist <- ddply(bread_df, c("Transaction"),
function(df1)paste(df1$Item, collapse = ","))
itemlist$Transaction <- NULL
colnames(itemlist) <- c("items")
# Write to csv file
write.csv(itemlist, "data/bread basket.csv",
quote = FALSE,
row.names = TRUE)
# IMPORT AS TRANSACTION
tr_df <- read.transactions("data/bread basket.csv",
format = "basket",
sep = ",")
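One detail of this export is worth checking: with row.names = TRUE, write.csv() puts a row number in front of every basket and also writes a header line. The sketch below uses a hypothetical two-basket item list (not the real data) to show what ends up in the file; it appears consistent with the item labels 1, 10, 100 and the single transaction of size 1 reported by summary() in this report.

```r
# Hypothetical two-basket item list in the same shape as itemlist.
itemlist <- data.frame(items = c("Bread", "Coffee,Cake"),
                       stringsAsFactors = FALSE)

# Same arguments as in the report: row names and a header end up in the file.
with_rownames <- tempfile(fileext = ".csv")
write.csv(itemlist, with_rownames, quote = FALSE, row.names = TRUE)
lines_with <- readLines(with_rownames)

# Variant without row names: each data line holds only real items,
# although the header line is still written.
without_rownames <- tempfile(fileext = ".csv")
write.csv(itemlist, without_rownames, quote = FALSE, row.names = FALSE)
lines_without <- readLines(without_rownames)

print(lines_with)
print(lines_without)
```

In the first file every basket carries its row number as an extra, unique "item". The remaining header line could be skipped when importing, for instance with the skip argument of read.transactions(); whether that argument is available should be checked against the installed arules version.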
We can see the summary of these transactions and the top 10 products in The Bread Basket Cafe.
summary(tr_df)
## transactions as itemMatrix in sparse format with
## 9466 rows (elements/itemsets/transactions) and
## 9570 columns (items) and a density of 0.0003127952
##
## most frequent items:
## Coffee Bread Tea Cake Pastry (Other)
## 4526 3094 1349 983 814 17570
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11
## 1 3954 3058 1469 662 231 64 17 4 5 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 2.993 4.000 11.000
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 100
itemFrequencyPlot(tr_df, topN = 10)
We generate rules using apriori() and set values for the minimum support and minimum confidence. We set different values for each period in order to see what recommendations can be given to the company. We can also plot the rules using the grouped, graph, and paracoord methods.
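To make the effect of the different thresholds concrete, the sketch below converts each relative minimum support into an absolute basket count. The transaction counts and thresholds are the ones that appear in the apriori() logs throughout this report; using floor() is an assumption that happens to reproduce the "Absolute minimum support count" lines in those logs.

```r
# Transaction counts and minimum supports as they appear in the apriori() logs.
segments <- data.frame(
  segment  = c("all", "weekday", "weekend",
               "weekday morning", "weekday afternoon", "weekday evening"),
  n_trans  = c(9466, 6146, 3321, 2649, 3326, 170),
  min_supp = c(0.001, 0.001, 0.002, 0.002, 0.002, 0.01),
  stringsAsFactors = FALSE
)

# Smallest number of baskets an itemset must appear in to pass the threshold
# (assumption: the fractional part is dropped, as the logs suggest).
segments$min_count <- floor(segments$n_trans * segments$min_supp)

print(segments)
```

For the small weekday evening segment the threshold comes down to a single basket, so rules found there rest on very few observations.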
For all transactions, we set the minimum support to 0.001 and the minimum confidence to 0.8.
rules <- apriori(tr_df, parameter = list(supp = 0.001,
conf = 0.8, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[9570 item(s), 9466 transaction(s)] done [0.04s].
## sorting and recoding items ... [55 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules <- sort(rules, by="confidence", decreasing = TRUE)
rules
## set of 7 rules
inspect(rules)
## lhs rhs support confidence coverage
## [1] {Extra Salami or Feta,Salad} => {Coffee} 0.001478977 0.8750000 0.001690260
## [2] {Pastry,Toast} => {Coffee} 0.001373336 0.8666667 0.001584619
## [3] {Hearty & Seasonal,Sandwich} => {Coffee} 0.001267695 0.8571429 0.001478977
## [4] {Cake,Vegan mincepie} => {Coffee} 0.001056412 0.8333333 0.001267695
## [5] {Salad,Sandwich} => {Coffee} 0.001584619 0.8333333 0.001901542
## [6] {Extra Salami or Feta} => {Coffee} 0.003274879 0.8157895 0.004014367
## [7] {Keeping It Local} => {Coffee} 0.005387703 0.8095238 0.006655398
## lift count
## [1] 1.830038 14
## [2] 1.812609 13
## [3] 1.792690 12
## [4] 1.742893 10
## [5] 1.742893 15
## [6] 1.706200 31
## [7] 1.693096 51
plot(rules)
plot(rules,method = "grouped")
plot(rules,method = "graph")
plot(rules,method = "paracoord")
As we can see, there are 7 rules, and Coffee is the best-selling product at the cafe across all transactions, followed by Bread and Tea. Every rule has Coffee on the right-hand side, so the rules show which products tend to be bought together with Coffee. The rules with the highest counts are {Keeping It Local} => {Coffee} (51 transactions) and {Extra Salami or Feta} => {Coffee} (31 transactions).
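The support, confidence and lift columns can be checked by hand. The sketch below recomputes them for the rule {Keeping It Local} => {Coffee} from counts read off the summary() and inspect() output above; the left-hand-side count of 63 is derived from the coverage column (0.006655398 x 9466) rather than printed directly.

```r
# Counts read off the summary() and inspect() output for all transactions.
n_transactions <- 9466  # rows in tr_df
n_coffee       <- 4526  # baskets containing Coffee
n_rule         <- 51    # baskets with Keeping It Local and Coffee (count column)
n_lhs          <- 63    # baskets with Keeping It Local (coverage * n_transactions)

support    <- n_rule / n_transactions               # share of all baskets
confidence <- n_rule / n_lhs                        # share of lhs baskets with Coffee
lift       <- confidence / (n_coffee / n_transactions)  # relative to Coffee's base rate

print(round(c(support = support, confidence = confidence, lift = lift), 6))
```

A lift above 1 means Coffee appears more often in baskets containing the left-hand side than in baskets overall.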
We can separate weekday and weekend transactions into different data frames and then generate the rules. For weekday transactions, we set the minimum support to 0.001 and the minimum confidence to 0.8.
bread_weekday_df <- bread_df %>%
filter(weekday_weekend=="weekday")
str(bread_weekday_df)
## tibble [12,807 x 6] (S3: tbl_df/tbl/data.frame)
## $ Transaction : num [1:12807] 81 81 82 82 83 83 84 85 85 85 ...
## $ Item : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 24 16 83 12 24 12 12 24 24 66 ...
## $ date_time : chr [1:12807] "31-10-2016 08:28" "31-10-2016 08:28" "31-10-2016 08:47" "31-10-2016 08:47" ...
## $ period_day : chr [1:12807] "morning" "morning" "morning" "morning" ...
## $ weekday_weekend: chr [1:12807] "weekday" "weekday" "weekday" "weekday" ...
## $ Date : Date[1:12807], format: "0031-10-20" "0031-10-20" ...
head(bread_weekday_df)
## # A tibble: 6 x 6
## Transaction Item date_time period_day weekday_weekend Date
## <dbl> <fct> <chr> <chr> <chr> <date>
## 1 81 Coffee 31-10-2016 08:28 morning weekday 0031-10-20
## 2 81 Cake 31-10-2016 08:28 morning weekday 0031-10-20
## 3 82 Tartine 31-10-2016 08:47 morning weekday 0031-10-20
## 4 82 Bread 31-10-2016 08:47 morning weekday 0031-10-20
## 5 83 Coffee 31-10-2016 08:57 morning weekday 0031-10-20
## 6 83 Bread 31-10-2016 08:57 morning weekday 0031-10-20
summary(bread_weekday_df)
## Transaction Item date_time period_day
## Min. : 81 Coffee :3543 Length:12807 Length:12807
## 1st Qu.:2446 Bread :2092 Class :character Class :character
## Median :4929 Tea : 976 Mode :character Mode :character
## Mean :4888 Cake : 612
## 3rd Qu.:7322 Pastry : 566
## Max. :9550 Sandwich: 512
## (Other) :4506
## weekday_weekend Date
## Length:12807 Min. :0001-02-20
## Class :character 1st Qu.:0007-11-20
## Mode :character Median :0015-11-20
## Mean :0015-08-24
## 3rd Qu.:0023-02-20
## Max. :0031-10-20
##
itemlist_weekday <- ddply(bread_weekday_df, c("Transaction"),
function(df1)paste(df1$Item, collapse = ","))
itemlist_weekday$Transaction <- NULL
colnames(itemlist_weekday) <- c("items")
# Write to csv file
write.csv(itemlist_weekday, "data/bread basket weekday.csv",
quote = FALSE,
row.names = TRUE)
# IMPORT AS TRANSACTION
bread_weekday_tr_df <- read.transactions("data/bread basket weekday.csv",
format = "basket",
sep = ",")
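The same filter, collapse and export steps are repeated for every segment in this report. As a sketch of how the collapsing step could be shared, the helper below builds one comma-separated basket per transaction in base R. The name make_baskets and the toy data are hypothetical, and unlike the ddply() call above the helper drops repeated items within a basket, which is a deliberate simplification.

```r
# Hypothetical helper (not part of the original report): collapse item rows
# into one comma-separated basket string per transaction.
make_baskets <- function(df) {
  items_by_transaction <- split(as.character(df$Item), df$Transaction)
  vapply(items_by_transaction,
         function(items) paste(unique(items), collapse = ","),
         character(1))
}

# Toy rows shaped like bread_weekday_df (values chosen for illustration).
toy <- data.frame(
  Transaction = c(81, 81, 82, 82, 130, 130, 130),
  Item        = factor(c("Coffee", "Cake", "Tartine", "Bread",
                         "Coffee", "Coffee", "Muffin")),
  period_day  = c("morning", "morning", "morning", "morning",
                  "afternoon", "afternoon", "afternoon"),
  stringsAsFactors = FALSE
)

# One character vector of baskets per period_day.
baskets_by_period <- lapply(split(toy, toy$period_day), make_baskets)

print(baskets_by_period)
```

Each element of baskets_by_period could then be written to its own file and imported with read.transactions() in the same way as above.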
bread_weekday_tr_df
## transactions in sparse format with
## 6146 transactions (rows) and
## 6239 items (columns)
summary(bread_weekday_tr_df)
## transactions as itemMatrix in sparse format with
## 6146 rows (elements/itemsets/transactions) and
## 6239 columns (items) and a density of 0.000468537
##
## most frequent items:
## Coffee Bread Tea Cake Pastry (Other)
## 2982 1951 917 589 537 10990
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11
## 1 2651 2087 860 388 115 31 7 2 3 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 2.923 3.000 11.000
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 100
itemFrequencyPlot(bread_weekday_tr_df, topN = 10)
rules_weekday <- apriori(bread_weekday_tr_df, parameter = list(supp = 0.001,
conf = 0.8, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 6
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6239 item(s), 6146 transaction(s)] done [0.02s].
## sorting and recoding items ... [49 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [9 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_weekday <- sort(rules_weekday, by="confidence", decreasing = TRUE)
rules_weekday
## set of 9 rules
inspect(rules_weekday)
## lhs rhs support confidence coverage
## [1] {Cake,Vegan mincepie} => {Coffee} 0.001301660 1.0000000 0.001301660
## [2] {Cake,Salad} => {Coffee} 0.001138952 1.0000000 0.001138952
## [3] {Hearty & Seasonal,Sandwich} => {Coffee} 0.001464367 0.9000000 0.001627075
## [4] {Pastry,Toast} => {Coffee} 0.001464367 0.9000000 0.001627075
## [5] {Extra Salami or Feta,Salad} => {Coffee} 0.001301660 0.8888889 0.001464367
## [6] {Salad,Sandwich} => {Coffee} 0.001301660 0.8888889 0.001464367
## [7] {Cake,Toast} => {Coffee} 0.001789782 0.8461538 0.002115197
## [8] {Extra Salami or Feta} => {Coffee} 0.002440612 0.8333333 0.002928734
## [9] {Muffin,Sandwich} => {Coffee} 0.001301660 0.8000000 0.001627075
## lift count
## [1] 2.061033 8
## [2] 2.061033 7
## [3] 1.854930 9
## [4] 1.854930 9
## [5] 1.832029 8
## [6] 1.832029 8
## [7] 1.743951 11
## [8] 1.717527 15
## [9] 1.648826 8
plot(rules_weekday)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(rules_weekday,method = "grouped")
plot(rules_weekday,method = "graph")
plot(rules_weekday,method = "paracoord")
As we can see, there are 9 rules, and Coffee is still the best-selling product at the cafe in the weekday transactions, followed by Bread and Tea. All nine rules have Coffee on the right-hand side. The rules with the highest counts are {Extra Salami or Feta} => {Coffee} (15 transactions) and {Cake,Toast} => {Coffee} (11 transactions).
For weekend transactions, we set the minimum support to 0.002 and the minimum confidence to 0.8.
bread_weekend_df <- bread_df %>%
filter(weekday_weekend=="weekend")
str(bread_weekend_df)
## tibble [7,700 x 6] (S3: tbl_df/tbl/data.frame)
## $ Transaction : num [1:7700] 1 2 2 3 3 3 4 5 5 5 ...
## $ Item : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 12 75 75 49 50 27 61 24 66 12 ...
## $ date_time : chr [1:7700] "30-10-2016 09:58" "30-10-2016 10:05" "30-10-2016 10:05" "30-10-2016 10:07" ...
## $ period_day : chr [1:7700] "morning" "morning" "morning" "morning" ...
## $ weekday_weekend: chr [1:7700] "weekend" "weekend" "weekend" "weekend" ...
## $ Date : Date[1:7700], format: "0030-10-20" "0030-10-20" ...
head(bread_weekend_df)
## # A tibble: 6 x 6
## Transaction Item date_time period_day weekday_weekend Date
## <dbl> <fct> <chr> <chr> <chr> <date>
## 1 1 Bread 30-10-2016 09:~ morning weekend 0030-10-20
## 2 2 Scandinavian 30-10-2016 10:~ morning weekend 0030-10-20
## 3 2 Scandinavian 30-10-2016 10:~ morning weekend 0030-10-20
## 4 3 Hot chocola~ 30-10-2016 10:~ morning weekend 0030-10-20
## 5 3 Jam 30-10-2016 10:~ morning weekend 0030-10-20
## 6 3 Cookies 30-10-2016 10:~ morning weekend 0030-10-20
summary(bread_weekend_df)
## Transaction Item date_time period_day
## Min. : 1 Coffee :1928 Length:7700 Length:7700
## 1st Qu.:2578 Bread :1233 Class :character Class :character
## Median :5490 Tea : 459 Mode :character Mode :character
## Mean :5123 Cake : 413
## 3rd Qu.:7545 Pastry : 290
## Max. :9684 Medialuna: 277
## (Other) :3100
## weekday_weekend Date
## Length:7700 Min. :0001-01-20
## Class :character 1st Qu.:0007-01-20
## Mode :character Median :0014-01-20
## Mean :0015-05-30
## 3rd Qu.:0022-01-20
## Max. :0031-12-20
##
itemlist_weekend <- ddply(bread_weekend_df, c("Transaction"),
function(df1)paste(df1$Item, collapse = ","))
itemlist_weekend$Transaction <- NULL
colnames(itemlist_weekend) <- c("items")
# Write to csv file
write.csv(itemlist_weekend, "data/bread basket weekend.csv",
quote = FALSE,
row.names = TRUE)
# IMPORT AS TRANSACTION
bread_weekend_tr_df <- read.transactions("data/bread basket weekend.csv",
format = "basket",
sep = ",")
bread_weekend_tr_df
## transactions in sparse format with
## 3321 transactions (rows) and
## 3407 items (columns)
summary(bread_weekend_tr_df)
## transactions as itemMatrix in sparse format with
## 3321 rows (elements/itemsets/transactions) and
## 3407 columns (items) and a density of 0.0009165995
##
## most frequent items:
## Coffee Bread Tea Cake Pastry (Other)
## 1544 1143 432 394 277 6581
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10
## 1 1303 971 609 274 116 33 10 2 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 3.123 4.000 10.000
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 100
itemFrequencyPlot(bread_weekend_tr_df, topN = 10)
rules_weekend <- apriori(bread_weekend_tr_df, parameter = list(supp = 0.002,
conf = 0.8, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.002 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 6
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3407 item(s), 3321 transaction(s)] done [0.01s].
## sorting and recoding items ... [51 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_weekend <- sort(rules_weekend, by="confidence", decreasing = TRUE)
rules_weekend
## set of 6 rules
inspect(rules_weekend)
## lhs rhs support confidence coverage
## [1] {Juice,Toast} => {Coffee} 0.003011141 0.9090909 0.003312255
## [2] {Keeping It Local} => {Coffee} 0.004516712 0.8823529 0.005118940
## [3] {Juice,Spanish Brunch} => {Coffee} 0.004215598 0.8235294 0.005118940
## [4] {Brownie,Medialuna} => {Coffee} 0.002710027 0.8181818 0.003312255
## [5] {Extra Salami or Feta} => {Coffee} 0.004817826 0.8000000 0.006022282
## [6] {Muffin,Sandwich} => {Bread} 0.002408913 0.8000000 0.003011141
## lift count
## [1] 1.955370 10
## [2] 1.897859 15
## [3] 1.771335 14
## [4] 1.759833 9
## [5] 1.720725 16
## [6] 2.324409 8
plot(rules_weekend)
plot(rules_weekend,method = "grouped")
plot(rules_weekend,method = "graph")
plot(rules_weekend,method = "paracoord")
As we can see, there are 6 rules, and Coffee is still the best-selling product at the cafe in the weekend transactions, followed by Bread and Tea. Five of the six rules have Coffee on the right-hand side; the exception is {Muffin,Sandwich} => {Bread}. The rules with the highest counts are {Extra Salami or Feta} => {Coffee} (16 transactions), {Keeping It Local} => {Coffee} (15 transactions) and {Juice,Spanish Brunch} => {Coffee} (14 transactions).
For weekday morning transactions, we set the minimum support to 0.002 and the minimum confidence to 0.7.
bread_morning <- bread_df
bread_morning$period_day <- as.factor(bread_morning$period_day)
bread_weekday_morning_df <- bread_weekday_df %>%
filter(period_day=="morning")
str(bread_weekday_morning_df)
## tibble [5,174 x 6] (S3: tbl_df/tbl/data.frame)
## $ Transaction : num [1:5174] 81 81 82 82 83 83 84 85 85 85 ...
## $ Item : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 24 16 83 12 24 12 12 24 24 66 ...
## $ date_time : chr [1:5174] "31-10-2016 08:28" "31-10-2016 08:28" "31-10-2016 08:47" "31-10-2016 08:47" ...
## $ period_day : chr [1:5174] "morning" "morning" "morning" "morning" ...
## $ weekday_weekend: chr [1:5174] "weekday" "weekday" "weekday" "weekday" ...
## $ Date : Date[1:5174], format: "0031-10-20" "0031-10-20" ...
head(bread_weekday_morning_df)
## # A tibble: 6 x 6
## Transaction Item date_time period_day weekday_weekend Date
## <dbl> <fct> <chr> <chr> <chr> <date>
## 1 81 Coffee 31-10-2016 08:28 morning weekday 0031-10-20
## 2 81 Cake 31-10-2016 08:28 morning weekday 0031-10-20
## 3 82 Tartine 31-10-2016 08:47 morning weekday 0031-10-20
## 4 82 Bread 31-10-2016 08:47 morning weekday 0031-10-20
## 5 83 Coffee 31-10-2016 08:57 morning weekday 0031-10-20
## 6 83 Bread 31-10-2016 08:57 morning weekday 0031-10-20
summary(bread_weekday_morning_df)
## Transaction Item date_time period_day
## Min. : 81 Coffee :1679 Length:5174 Length:5174
## 1st Qu.:2316 Bread : 987 Class :character Class :character
## Median :4538 Pastry : 391 Mode :character Mode :character
## Mean :4688 Tea : 328
## 3rd Qu.:7285 Medialuna: 207
## Max. :9522 Cake : 152
## (Other) :1430
## weekday_weekend Date
## Length:5174 Min. :0001-02-20
## Class :character 1st Qu.:0007-04-20
## Mode :character Median :0015-02-20
## Mean :0015-07-11
## 3rd Qu.:0023-03-20
## Max. :0031-10-20
##
itemlist_weekday_morning <- ddply(bread_weekday_morning_df, c("Transaction"),
function(df1)paste(df1$Item, collapse = ","))
itemlist_weekday_morning$Transaction <- NULL
colnames(itemlist_weekday_morning) <- c("items")
# Write to csv file
write.csv(itemlist_weekday_morning, "data/bread basket weekday morning.csv",
quote = FALSE,
row.names = TRUE)
# IMPORT AS TRANSACTION
bread_weekday_morning_tr_df <- read.transactions("data/bread basket weekday morning.csv",
format = "basket",
sep = ",")
bread_weekday_morning_tr_df
## transactions in sparse format with
## 2649 transactions (rows) and
## 2718 items (columns)
summary(bread_weekday_morning_tr_df)
## transactions as itemMatrix in sparse format with
## 2649 rows (elements/itemsets/transactions) and
## 2718 columns (items) and a density of 0.001026669
##
## most frequent items:
## Coffee Bread Pastry Tea Medialuna (Other)
## 1400 917 371 318 196 4190
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8
## 1 1246 896 356 121 22 6 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 3.00 2.79 3.00 8.00
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 100
itemFrequencyPlot(bread_weekday_morning_tr_df, topN = 10)
rules_weekday_morning <- apriori(bread_weekday_morning_tr_df, parameter = list(supp = 0.002,
conf = 0.7, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.002 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[2718 item(s), 2649 transaction(s)] done [0.01s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_weekday_morning <- sort(rules_weekday_morning, by="confidence", decreasing = TRUE)
rules_weekday_morning
## set of 7 rules
inspect(rules_weekday_morning)
## lhs rhs support confidence coverage lift
## [1] {Pastry,Toast} => {Coffee} 0.003020008 1.0000000 0.003020008 1.892143
## [2] {Cookies,Toast} => {Coffee} 0.002642507 0.8750000 0.003020008 1.655625
## [3] {Juice,Pastry} => {Coffee} 0.004152510 0.7857143 0.005285013 1.486684
## [4] {Medialuna,Toast} => {Coffee} 0.002265006 0.7500000 0.003020008 1.419107
## [5] {Keeping It Local} => {Coffee} 0.010570026 0.7368421 0.014345036 1.394211
## [6] {Toast} => {Coffee} 0.040392601 0.7181208 0.056247641 1.358787
## [7] {Cookies,Pastry} => {Coffee} 0.003775009 0.7142857 0.005285013 1.351531
## count
## [1] 8
## [2] 7
## [3] 11
## [4] 6
## [5] 28
## [6] 107
## [7] 10
plot(rules_weekday_morning)
plot(rules_weekday_morning,method = "grouped")
plot(rules_weekday_morning,method = "graph")
plot(rules_weekday_morning,method = "paracoord")
As we can see, there are 7 rules, and Coffee is still the best-selling product at the cafe in the weekday morning transactions, followed by Bread and Pastry. All seven rules have Coffee on the right-hand side. The rules with the highest counts are {Toast} => {Coffee} (107 transactions) and {Keeping It Local} => {Coffee} (28 transactions).
For weekday afternoon transactions, we set the minimum support to 0.002 and the minimum confidence to 0.7.
bread_afternoon <- bread_df
bread_afternoon$period_day <- as.factor(bread_afternoon$period_day)
bread_weekday_afternoon_df <- bread_weekday_df %>%
filter(period_day=="afternoon")
str(bread_weekday_afternoon_df)
## tibble [7,273 x 6] (S3: tbl_df/tbl/data.frame)
## $ Transaction : num [1:7273] 130 130 130 131 131 132 132 132 132 133 ...
## $ Item : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 24 24 61 12 24 84 52 67 27 47 ...
## $ date_time : chr [1:7273] "31-10-2016 12:03" "31-10-2016 12:03" "31-10-2016 12:03" "31-10-2016 12:08" ...
## $ period_day : chr [1:7273] "afternoon" "afternoon" "afternoon" "afternoon" ...
## $ weekday_weekend: chr [1:7273] "weekday" "weekday" "weekday" "weekday" ...
## $ Date : Date[1:7273], format: "0031-10-20" "0031-10-20" ...
head(bread_weekday_afternoon_df)
## # A tibble: 6 x 6
## Transaction Item date_time period_day weekday_weekend Date
## <dbl> <fct> <chr> <chr> <chr> <date>
## 1 130 Coffee 31-10-2016 12:03 afternoon weekday 0031-10-20
## 2 130 Coffee 31-10-2016 12:03 afternoon weekday 0031-10-20
## 3 130 Muffin 31-10-2016 12:03 afternoon weekday 0031-10-20
## 4 131 Bread 31-10-2016 12:08 afternoon weekday 0031-10-20
## 5 131 Coffee 31-10-2016 12:08 afternoon weekday 0031-10-20
## 6 132 Tea 31-10-2016 12:14 afternoon weekday 0031-10-20
summary(bread_weekday_afternoon_df)
## Transaction Item date_time period_day
## Min. : 130 Coffee :1798 Length:7273 Length:7273
## 1st Qu.:2739 Bread :1060 Class :character Class :character
## Median :5265 Tea : 608 Mode :character Mode :character
## Mean :5049 Sandwich: 442
## 3rd Qu.:7355 Cake : 437
## Max. :9549 Soup : 256
## (Other) :2672
## weekday_weekend Date
## Length:7273 Min. :0001-02-20
## Class :character 1st Qu.:0007-11-20
## Mode :character Median :0015-12-20
## Mean :0015-10-14
## 3rd Qu.:0023-02-20
## Max. :0031-10-20
##
itemlist_weekday_afternoon <- ddply(bread_weekday_afternoon_df, c("Transaction"),
function(df1)paste(df1$Item, collapse = ","))
itemlist_weekday_afternoon$Transaction <- NULL
colnames(itemlist_weekday_afternoon) <- c("items")
# Write to csv file
write.csv(itemlist_weekday_afternoon, "data/bread basket weekday afternoon.csv",
quote = FALSE,
row.names = TRUE)
# IMPORT AS TRANSACTION
bread_weekday_afternoon_tr_df <- read.transactions("data/bread basket weekday afternoon.csv",
format = "basket",
sep = ",")
bread_weekday_afternoon_tr_df
## transactions in sparse format with
## 3326 transactions (rows) and
## 3409 items (columns)
summary(bread_weekday_afternoon_tr_df)
## transactions as itemMatrix in sparse format with
## 3326 rows (elements/itemsets/transactions) and
## 3409 columns (items) and a density of 0.0008876084
##
## most frequent items:
## Coffee Bread Tea Cake Sandwich (Other)
## 1523 991 561 417 399 6173
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10
## 1 1322 1143 484 255 87 23 6 2 3
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 3.026 4.000 10.000
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 100
itemFrequencyPlot(bread_weekday_afternoon_tr_df, topN = 10)
rules_weekday_afternoon <- apriori(bread_weekday_afternoon_tr_df, parameter = list(supp = 0.002,
conf = 0.7, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.002 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 6
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3409 item(s), 3326 transaction(s)] done [0.01s].
## sorting and recoding items ... [44 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [11 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_weekday_afternoon <- sort(rules_weekday_afternoon, by="confidence", decreasing = TRUE)
rules_weekday_afternoon
## set of 11 rules
inspect(rules_weekday_afternoon)
## lhs rhs support confidence
## [1] {Keeping It Local} => {Coffee} 0.002405292 1.0000000
## [2] {Cake,Vegan mincepie} => {Coffee} 0.002104630 1.0000000
## [3] {Hearty & Seasonal,Sandwich} => {Coffee} 0.002705953 0.9000000
## [4] {Extra Salami or Feta,Salad} => {Coffee} 0.002405292 0.8888889
## [5] {Salad,Sandwich} => {Coffee} 0.002405292 0.8888889
## [6] {Extra Salami or Feta} => {Coffee} 0.003307276 0.8461538
## [7] {Muffin,Sandwich} => {Coffee} 0.002104630 0.7777778
## [8] {Alfajores,Cake} => {Coffee} 0.003006615 0.7692308
## [9] {Coffee,Extra Salami or Feta} => {Salad} 0.002405292 0.7272727
## [10] {Toast} => {Coffee} 0.016837041 0.7000000
## [11] {Brownie,Sandwich} => {Coffee} 0.002104630 0.7000000
## coverage lift count
## [1] 0.002405292 2.183848 8
## [2] 0.002104630 2.183848 7
## [3] 0.003006615 1.965463 9
## [4] 0.002705953 1.941198 8
## [5] 0.002705953 1.941198 8
## [6] 0.003908599 1.847871 11
## [7] 0.002705953 1.698548 7
## [8] 0.003908599 1.679883 10
## [9] 0.003307276 40.315152 8
## [10] 0.024052916 1.528693 56
## [11] 0.003006615 1.528693 7
plot(rules_weekday_afternoon)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(rules_weekday_afternoon,method = "grouped")
plot(rules_weekday_afternoon,method = "graph")
plot(rules_weekday_afternoon,method = "paracoord")
As we can see, there are 11 rules, and Coffee is still the best-selling product at the cafe in the weekday afternoon transactions, followed by Bread and Tea. Ten of the eleven rules have Coffee on the right-hand side; the exception is {Coffee,Extra Salami or Feta} => {Salad}. The rules with the highest counts are {Toast} => {Coffee} (56 transactions) and {Extra Salami or Feta} => {Coffee} (11 transactions).
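Rule [9] in the inspect() output above, {Coffee,Extra Salami or Feta} => {Salad}, has a lift of about 40, far above the other rules. Since lift is confidence divided by the support of the right-hand side, the sketch below backs out how common Salad is in this segment. The Salad figure is derived from the printed confidence and lift, not read from the data, so it should be treated as an estimate.

```r
# Values printed for the weekday afternoon segment.
n_transactions <- 3326       # weekday afternoon baskets
n_coffee       <- 1523       # baskets containing Coffee
confidence     <- 0.7272727  # rule [9]
lift           <- 40.315152  # rule [9]

# lift = confidence / support(rhs), so support(rhs) = confidence / lift.
salad_support  <- confidence / lift
salad_baskets  <- salad_support * n_transactions
coffee_support <- n_coffee / n_transactions

print(round(c(salad_support = salad_support,
              salad_baskets = salad_baskets,
              coffee_support = coffee_support), 4))
```

A consequent that appears in only a small share of baskets inflates lift, and this rule rests on a count of 8 transactions, so it is better read as a hint than as a firm pattern.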
For weekday evening transactions, we set the minimum support to 0.01 and the minimum confidence to 0.8.
bread_evening <- bread_df
bread_evening$period_day <- as.factor(bread_evening$period_day)
bread_weekday_evening_df <- bread_weekday_df %>%
filter(period_day=="evening")
str(bread_weekday_evening_df)
## tibble [356 x 6] (S3: tbl_df/tbl/data.frame)
## $ Transaction : num [1:356] 172 172 173 173 174 175 176 176 253 254 ...
## $ Item : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 12 24 12 61 24 12 27 24 75 56 ...
## $ date_time : chr [1:356] "31-10-2016 17:06" "31-10-2016 17:06" "31-10-2016 17:08" "31-10-2016 17:08" ...
## $ period_day : chr [1:356] "evening" "evening" "evening" "evening" ...
## $ weekday_weekend: chr [1:356] "weekday" "weekday" "weekday" "weekday" ...
## $ Date : Date[1:356], format: "0031-10-20" "0031-10-20" ...
head(bread_weekday_evening_df)
## # A tibble: 6 x 6
## Transaction Item date_time period_day weekday_weekend Date
## <dbl> <fct> <chr> <chr> <chr> <date>
## 1 172 Bread 31-10-2016 17:06 evening weekday 0031-10-20
## 2 172 Coffee 31-10-2016 17:06 evening weekday 0031-10-20
## 3 173 Bread 31-10-2016 17:08 evening weekday 0031-10-20
## 4 173 Muffin 31-10-2016 17:08 evening weekday 0031-10-20
## 5 174 Coffee 31-10-2016 17:40 evening weekday 0031-10-20
## 6 175 Bread 31-10-2016 17:49 evening weekday 0031-10-20
summary(bread_weekday_evening_df)
## Transaction Item date_time period_day
## Min. : 172 Coffee : 66 Length:356 Length:356
## 1st Qu.:1474 Bread : 45 Class :character Class :character
## Median :4156 Tea : 40 Mode :character Mode :character
## Mean :4448 Cake : 23
## 3rd Qu.:7075 Cookies : 20
## Max. :9550 Alfajores: 15
## (Other) :147
## weekday_weekend Date
## Length:356 Min. :0001-02-20
## Class :character 1st Qu.:0006-12-05
## Mode :character Median :0014-11-20
## Mean :0014-08-14
## 3rd Qu.:0022-02-20
## Max. :0031-10-20
##
itemlist_weekday_evening <- ddply(bread_weekday_evening_df, c("Transaction"),
function(df1)paste(df1$Item, collapse = ","))
itemlist_weekday_evening$Transaction <- NULL
colnames(itemlist_weekday_evening) <- c("items")
# Write to csv file
write.csv(itemlist_weekday_evening, "data/bread basket weekday evening.csv",
quote = FALSE,
row.names = TRUE)
# IMPORT AS TRANSACTION
bread_weekday_evening_tr_df <- read.transactions("data/bread basket weekday evening.csv",
format = "basket",
sep = ",")
bread_weekday_evening_tr_df
## transactions in sparse format with
## 170 transactions (rows) and
## 213 items (columns)
summary(bread_weekday_evening_tr_df)
## transactions as itemMatrix in sparse format with
## 170 rows (elements/itemsets/transactions) and
## 213 columns (items) and a density of 0.01394642
##
## most frequent items:
## Coffee Bread Tea Cake Cookies (Other)
## 59 43 38 21 18 326
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 11
## 1 81 47 20 12 6 2 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 2.971 3.000 11.000
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 100
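The labels `1`, `10`, `100` listed above look like row numbers rather than products. Because the CSV is written with `row.names = TRUE`, each line starts with its row number, and `read.transactions` appears to treat that number (and the `items` header line) as items. This would explain why 213 items are reported although `Item` has only 94 levels, and why exactly one transaction has size 1. A minimal sketch of an alternative that skips the CSV round trip, assuming the `arules` package is installed; `toy_df`, `baskets` and `toy_tr` are illustrative names, and the rows are copied from the `head(bread_df)` output:

```r
# Toy rows copied from the head(bread_df) output shown earlier in the report.
toy_df <- data.frame(
  Transaction = c(1, 2, 2, 3, 3, 3),
  Item = c("Bread", "Scandinavian", "Scandinavian",
           "Hot chocolate", "Jam", "Cookies"),
  stringsAsFactors = FALSE
)

# One character vector of items per transaction ID.
baskets <- split(toy_df$Item, toy_df$Transaction)

# Drop repeated items inside a basket (e.g. "Scandinavian" twice in basket 2).
baskets <- lapply(baskets, unique)

# Coerce the list to a transactions object when arules is available.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  toy_tr <- as(baskets, "transactions")
  inspect(toy_tr)
}
```

Built this way, the item labels are product names only, so item counts and transaction lengths are not inflated by row numbers.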
itemFrequencyPlot(bread_weekday_evening_tr_df, topN = 10)
rules_weekday_evening <- apriori(bread_weekday_evening_tr_df, parameter = list(supp = 0.01,
conf = 0.8, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 1
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[213 item(s), 170 transaction(s)] done [0.00s].
## sorting and recoding items ... [32 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_weekday_evening <- sort(rules_weekday_evening, by="confidence", decreasing = TRUE)
rules_weekday_evening
## set of 15 rules
inspect(rules_weekday_evening)
## lhs rhs support confidence coverage
## [1] {Salad} => {Coffee} 0.01176471 1 0.01176471
## [2] {Scone} => {Cake} 0.01176471 1 0.01176471
## [3] {Scone} => {Tea} 0.01176471 1 0.01176471
## [4] {Mineral water} => {Alfajores} 0.01764706 1 0.01764706
## [5] {Cake,Scone} => {Tea} 0.01176471 1 0.01176471
## [6] {Scone,Tea} => {Cake} 0.01176471 1 0.01176471
## [7] {Alfajores,Coke} => {Coffee} 0.01176471 1 0.01176471
## [8] {Coffee,Mineral water} => {Alfajores} 0.01176471 1 0.01176471
## [9] {Alfajores,Brownie} => {Bread} 0.01176471 1 0.01176471
## [10] {Bread,Brownie} => {Alfajores} 0.01176471 1 0.01176471
## [11] {Cookies,Juice} => {Tea} 0.01176471 1 0.01176471
## [12] {Juice,Tea} => {Cookies} 0.01176471 1 0.01176471
## [13] {Hot chocolate,Medialuna} => {Coffee} 0.01176471 1 0.01176471
## [14] {Alfajores,Pastry} => {Tea} 0.01176471 1 0.01176471
## [15] {Bread,Hot chocolate} => {Tea} 0.01176471 1 0.01176471
## lift count
## [1] 2.881356 2
## [2] 8.095238 2
## [3] 4.473684 2
## [4] 12.142857 3
## [5] 4.473684 2
## [6] 8.095238 2
## [7] 2.881356 2
## [8] 12.142857 2
## [9] 3.953488 2
## [10] 12.142857 2
## [11] 4.473684 2
## [12] 9.444444 2
## [13] 2.881356 2
## [14] 4.473684 2
## [15] 4.473684 2
plot(rules_weekday_evening)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(rules_weekday_evening,method = "grouped")
plot(rules_weekday_evening,method = "graph")
plot(rules_weekday_evening,method = "paracoord")
The algorithm returns 15 rules. Coffee is still the best-selling product in weekday evening transactions, followed by bread and tea. The rules with coffee on the right-hand side show which products are bought along with it: Salad, Alfajores together with Coke, and Hot chocolate together with Medialuna. Each of the 15 rules is backed by only 2 or 3 transactions, so they are weak signals rather than firm patterns.
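To make the rule metrics concrete, the values for `{Salad} => {Coffee}` can be recomputed by hand from counts printed in the weekday evening output. This is a small arithmetic sketch, not part of the original analysis; the variable names are illustrative.

```r
# Counts read off the weekday evening output above.
n_transactions <- 170  # rows in bread_weekday_evening_tr_df
count_both     <- 2    # "count" column for {Salad} => {Coffee}
count_lhs      <- 2    # coverage 0.01176471 * 170 transactions
count_rhs      <- 59   # Coffee in "most frequent items"

# Support: share of all transactions containing both sides of the rule.
support <- count_both / n_transactions

# Confidence: share of left-hand-side transactions that also contain the right-hand side.
confidence <- count_both / count_lhs

# Lift: confidence relative to the overall share of the right-hand side.
lift <- confidence / (count_rhs / n_transactions)

round(c(support = support, confidence = confidence, lift = lift), 6)
```

The lift of about 2.88 agrees with the `inspect()` output: a confidence of 1 is compared against coffee appearing in 59 of 170 transactions.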
For weekend mornings, we set the minimum support to 0.005 and the minimum confidence to 0.8.
bread_weekend_morning_df <- bread_weekend_df %>%
filter(period_day=="morning")
str(bread_weekend_morning_df)
## tibble [3,230 x 6] (S3: tbl_df/tbl/data.frame)
## $ Transaction : num [1:3230] 1 2 2 3 3 3 4 5 5 5 ...
## $ Item : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 12 75 75 49 50 27 61 24 66 12 ...
## $ date_time : chr [1:3230] "30-10-2016 09:58" "30-10-2016 10:05" "30-10-2016 10:05" "30-10-2016 10:07" ...
## $ period_day : chr [1:3230] "morning" "morning" "morning" "morning" ...
## $ weekday_weekend: chr [1:3230] "weekend" "weekend" "weekend" "weekend" ...
## $ Date : Date[1:3230], format: "0030-10-20" "0030-10-20" ...
head(bread_weekend_morning_df)
## # A tibble: 6 x 6
## Transaction Item date_time period_day weekday_weekend Date
## <dbl> <fct> <chr> <chr> <chr> <date>
## 1 1 Bread 30-10-2016 09:~ morning weekend 0030-10-20
## 2 2 Scandinavian 30-10-2016 10:~ morning weekend 0030-10-20
## 3 2 Scandinavian 30-10-2016 10:~ morning weekend 0030-10-20
## 4 3 Hot chocola~ 30-10-2016 10:~ morning weekend 0030-10-20
## 5 3 Jam 30-10-2016 10:~ morning weekend 0030-10-20
## 6 3 Cookies 30-10-2016 10:~ morning weekend 0030-10-20
summary(bread_weekend_morning_df)
## Transaction Item date_time period_day
## Min. : 1 Coffee : 882 Length:3230 Length:3230
## 1st Qu.:2186 Bread : 623 Class :character Class :character
## Median :5124 Pastry : 213 Mode :character Mode :character
## Mean :4941 Medialuna: 195
## 3rd Qu.:7234 Tea : 128
## Max. :9665 Cake : 112
## (Other) :1077
## weekday_weekend Date
## Length:3230 Min. :0001-01-20
## Class :character 1st Qu.:0007-01-20
## Mode :character Median :0015-01-20
## Mean :0015-12-01
## 3rd Qu.:0024-12-20
## Max. :0031-12-20
##
itemlist_weekend_morning <- ddply(bread_weekend_morning_df, c("Transaction"),
function(df1)paste(df1$Item, collapse = ","))
itemlist_weekend_morning$Transaction <- NULL
colnames(itemlist_weekend_morning) <- c("items")
# Write to csv file
write.csv(itemlist_weekend_morning, "data/bread basket weekend morning.csv",
quote = FALSE,
row.names = TRUE)
# IMPORT AS TRANSACTION
bread_weekend_morning_tr_df <- read.transactions("data/bread basket weekend morning.csv",
format = "basket",
sep = ",")
bread_weekend_morning_tr_df
## transactions in sparse format with
## 1456 transactions (rows) and
## 1523 items (columns)
summary(bread_weekend_morning_tr_df)
## transactions as itemMatrix in sparse format with
## 1456 rows (elements/itemsets/transactions) and
## 1523 columns (items) and a density of 0.001986482
##
## most frequent items:
## Coffee Bread Pastry Medialuna Tea (Other)
## 711 572 200 183 123 2616
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 10
## 1 592 449 272 92 33 11 5 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 3.025 4.000 10.000
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 100
itemFrequencyPlot(bread_weekend_morning_tr_df, topN = 10)
rules_weekend_morning <- apriori(bread_weekend_morning_tr_df, parameter = list(supp = 0.005,
conf = 0.8, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.005 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 7
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1523 item(s), 1456 transaction(s)] done [0.00s].
## sorting and recoding items ... [34 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [5 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_weekend_morning <- sort(rules_weekend_morning, by="confidence", decreasing = TRUE)
rules_weekend_morning
## set of 5 rules
inspect(rules_weekend_morning)
## lhs rhs support confidence coverage lift
## [1] {The Nomad} => {Coffee} 0.010302198 0.9375000 0.010989011 1.919831
## [2] {Keeping It Local} => {Coffee} 0.006868132 0.9090909 0.007554945 1.861655
## [3] {Smoothies} => {Coffee} 0.006868132 0.9090909 0.007554945 1.861655
## [4] {Cake,Medialuna} => {Coffee} 0.005494505 0.8888889 0.006181319 1.820284
## [5] {Spanish Brunch} => {Coffee} 0.019917582 0.8529412 0.023351648 1.746670
## count
## [1] 15
## [2] 10
## [3] 10
## [4] 8
## [5] 29
plot(rules_weekend_morning)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(rules_weekend_morning,method = "grouped")
plot(rules_weekend_morning,method = "graph")
plot(rules_weekend_morning,method = "paracoord")
The algorithm returns 5 rules. Coffee is still the best-selling product in weekend morning transactions, followed by bread and pastry. All five rules have coffee on the right-hand side: customers who buy The Nomad, Keeping It Local, Smoothies, Spanish Brunch, or Cake together with Medialuna usually buy coffee as well. Spanish Brunch has the highest support (29 transactions) and The Nomad the highest confidence (0.94).
For weekend afternoons, we set the minimum support to 0.004 and the minimum confidence to 0.7.
bread_weekend_afternoon_df <- bread_weekend_df %>%
filter(period_day=="afternoon")
str(bread_weekend_afternoon_df)
## tibble [4,296 x 6] (S3: tbl_df/tbl/data.frame)
## $ Transaction : num [1:4296] 43 43 44 44 45 45 45 46 47 47 ...
## $ Item : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 75 41 24 56 24 49 56 24 34 52 ...
## $ date_time : chr [1:4296] "30-10-2016 12:00" "30-10-2016 12:00" "30-10-2016 12:05" "30-10-2016 12:05" ...
## $ period_day : chr [1:4296] "afternoon" "afternoon" "afternoon" "afternoon" ...
## $ weekday_weekend: chr [1:4296] "weekend" "weekend" "weekend" "weekend" ...
## $ Date : Date[1:4296], format: "0030-10-20" "0030-10-20" ...
head(bread_weekend_afternoon_df)
## # A tibble: 6 x 6
## Transaction Item date_time period_day weekday_weekend Date
## <dbl> <fct> <chr> <chr> <chr> <date>
## 1 43 Scandinavian 30-10-2016 12:~ afternoon weekend 0030-10-20
## 2 43 Fudge 30-10-2016 12:~ afternoon weekend 0030-10-20
## 3 44 Coffee 30-10-2016 12:~ afternoon weekend 0030-10-20
## 4 44 Medialuna 30-10-2016 12:~ afternoon weekend 0030-10-20
## 5 45 Coffee 30-10-2016 12:~ afternoon weekend 0030-10-20
## 6 45 Hot chocola~ 30-10-2016 12:~ afternoon weekend 0030-10-20
summary(bread_weekend_afternoon_df)
## Transaction Item date_time period_day
## Min. : 43 Coffee :1025 Length:4296 Length:4296
## 1st Qu.:2619 Bread : 601 Class :character Class :character
## Median :5543 Tea : 322 Mode :character Mode :character
## Mean :5215 Cake : 294
## 3rd Qu.:7577 Sandwich : 229
## Max. :9684 Hot chocolate: 138
## (Other) :1687
## weekday_weekend Date
## Length:4296 Min. :0001-04-20
## Class :character 1st Qu.:0007-01-20
## Mode :character Median :0014-01-20
## Mean :0015-04-10
## 3rd Qu.:0022-01-20
## Max. :0031-12-20
##
itemlist_weekend_afternoon <- ddply(bread_weekend_afternoon_df, c("Transaction"),
function(df1)paste(df1$Item, collapse = ","))
itemlist_weekend_afternoon$Transaction <- NULL
colnames(itemlist_weekend_afternoon) <- c("items")
# Write to csv file
write.csv(itemlist_weekend_afternoon, "data/bread basket weekend afternoon.csv",
quote = FALSE,
row.names = TRUE)
# IMPORT AS TRANSACTION
bread_weekend_afternoon_tr_df <- read.transactions("data/bread basket weekend afternoon.csv",
format = "basket",
sep = ",")
bread_weekend_afternoon_tr_df
## transactions in sparse format with
## 1765 transactions (rows) and
## 1841 items (columns)
summary(bread_weekend_afternoon_tr_df)
## transactions as itemMatrix in sparse format with
## 1765 rows (elements/itemsets/transactions) and
## 1841 columns (items) and a density of 0.001756035
##
## most frequent items:
## Coffee Bread Tea Cake Sandwich (Other)
## 817 563 302 279 191 3554
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10
## 1 645 501 331 176 81 22 5 2 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 3.233 4.000 10.000
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 100
itemFrequencyPlot(bread_weekend_afternoon_tr_df, topN = 10)
rules_weekend_afternoon <- apriori(bread_weekend_afternoon_tr_df, parameter = list(supp = 0.004,
conf = 0.7, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.004 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 7
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1841 item(s), 1765 transaction(s)] done [0.00s].
## sorting and recoding items ... [41 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_weekend_afternoon <- sort(rules_weekend_afternoon, by="confidence", decreasing = TRUE)
rules_weekend_afternoon
## set of 6 rules
inspect(rules_weekend_afternoon)
## lhs rhs support confidence coverage
## [1] {Juice,Toast} => {Coffee} 0.004532578 1.0000000 0.004532578
## [2] {Art Tray} => {Coffee} 0.004532578 0.8888889 0.005099150
## [3] {Sandwich,Soup} => {Coffee} 0.005665722 0.7692308 0.007365439
## [4] {Hot chocolate,Scone} => {Coffee} 0.005099150 0.7500000 0.006798867
## [5] {Cake,Tiffin} => {Coffee} 0.004532578 0.7272727 0.006232295
## [6] {Cookies,Hot chocolate} => {Coffee} 0.004532578 0.7272727 0.006232295
## lift count
## [1] 2.160343 8
## [2] 1.920305 8
## [3] 1.661802 10
## [4] 1.620257 9
## [5] 1.571158 8
## [6] 1.571158 8
plot(rules_weekend_afternoon)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(rules_weekend_afternoon,method = "grouped")
plot(rules_weekend_afternoon,method = "graph")
plot(rules_weekend_afternoon,method = "paracoord")
The algorithm returns 6 rules. Coffee is still the best-selling product in weekend afternoon transactions, followed by bread and tea. All six rules have coffee on the right-hand side. The combinations most often bought with coffee are Sandwich with Soup (10 transactions) and Hot chocolate with Scone (9 transactions).
For weekend evenings, we set the minimum support to 0.02 and the minimum confidence to 0.9.
bread_weekend_evening_df <- bread_weekend_df %>%
filter(period_day=="evening")
str(bread_weekend_evening_df)
## tibble [164 x 6] (S3: tbl_df/tbl/data.frame)
## $ Transaction : num [1:164] 647 647 737 738 738 ...
## $ Item : Factor w/ 94 levels "Adjustment","Afternoon with the baker",..: 56 66 3 66 32 5 83 41 62 54 ...
## $ date_time : chr [1:164] "05-11-2016 18:45" "05-11-2016 18:45" "06-11-2016 17:15" "06-11-2016 17:19" ...
## $ period_day : chr [1:164] "evening" "evening" "evening" "evening" ...
## $ weekday_weekend: chr [1:164] "weekend" "weekend" "weekend" "weekend" ...
## $ Date : Date[1:164], format: "0005-11-20" "0005-11-20" ...
head(bread_weekend_evening_df)
## # A tibble: 6 x 6
## Transaction Item date_time period_day weekday_weekend Date
## <dbl> <fct> <chr> <chr> <chr> <date>
## 1 647 Medialuna 05-11-2016 18~ evening weekend 0005-11-20
## 2 647 Pastry 05-11-2016 18~ evening weekend 0005-11-20
## 3 737 Alfajores 06-11-2016 17~ evening weekend 0006-11-20
## 4 738 Pastry 06-11-2016 17~ evening weekend 0006-11-20
## 5 738 Dulce de Lec~ 06-11-2016 17~ evening weekend 0006-11-20
## 6 1273 Art Tray 13-11-2016 17~ evening weekend 0013-11-20
summary(bread_weekend_evening_df)
## Transaction Item date_time
## Min. : 647 Coffee :21 Length:164
## 1st Qu.:5993 Tshirt :21 Class :character
## Median :6018 Afternoon with the baker:15 Mode :character
## Mean :6119 Postcard :10
## 3rd Qu.:7663 Bread : 9
## Max. :9231 Tea : 9
## (Other) :79
## period_day weekday_weekend Date
## Length:164 Length:164 Min. :0001-04-20
## Class :character Class :character 1st Qu.:0004-02-20
## Mode :character Mode :character Median :0005-03-20
## Mean :0009-05-25
## 3rd Qu.:0012-07-28
## Max. :0031-12-20
##
itemlist_weekend_evening <- ddply(bread_weekend_evening_df, c("Transaction"),
function(df1)paste(df1$Item, collapse = ","))
itemlist_weekend_evening$Transaction <- NULL
colnames(itemlist_weekend_evening) <- c("items")
# Write to csv file
write.csv(itemlist_weekend_evening, "data/bread basket weekend evening.csv",
quote = FALSE,
row.names = TRUE)
# IMPORT AS TRANSACTION
bread_weekend_evening_tr_df <- read.transactions("data/bread basket weekend evening.csv",
format = "basket",
sep = ",")
bread_weekend_evening_tr_df
## transactions in sparse format with
## 93 transactions (rows) and
## 135 items (columns)
summary(bread_weekend_evening_tr_df)
## transactions as itemMatrix in sparse format with
## 93 rows (elements/itemsets/transactions) and
## 135 columns (items) and a density of 0.01943449
##
## most frequent items:
## Tshirt Coffee Afternoon with the baker
## 19 16 14
## Postcard Bread (Other)
## 9 8 178
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6
## 1 57 21 6 6 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 2.000 2.624 3.000 6.000
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 11
itemFrequencyPlot(bread_weekend_evening_tr_df, topN = 10)
rules_weekend_evening <- apriori(bread_weekend_evening_tr_df, parameter = list(supp = 0.02,
conf = 0.9, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.02 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 1
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[135 item(s), 93 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_weekend_evening <- sort(rules_weekend_evening, by="confidence", decreasing = TRUE)
rules_weekend_evening
## set of 4 rules
inspect(rules_weekend_evening)
## lhs rhs support confidence coverage lift
## [1] {Scone} => {Coffee} 0.02150538 1 0.02150538 5.81250
## [2] {Medialuna,Tea} => {Coffee} 0.02150538 1 0.02150538 5.81250
## [3] {Coffee,Medialuna} => {Tea} 0.02150538 1 0.02150538 13.28571
## [4] {Bread,Cake} => {Coffee} 0.02150538 1 0.02150538 5.81250
## count
## [1] 2
## [2] 2
## [3] 2
## [4] 2
plot(rules_weekend_evening)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(rules_weekend_evening,method = "grouped")
plot(rules_weekend_evening,method = "graph")
plot(rules_weekend_evening,method = "paracoord")
The algorithm returns 4 rules. In weekend evening transactions, Tshirt is the most frequent item, followed by coffee and Afternoon with the baker. This is unusual, because coffee is the best-selling product in every other segment. One possible explanation is that customers visit with family at the weekend and are more inclined to buy merchandise. With only 93 transactions in this segment and each rule backed by just 2 of them, these results should be read with caution.
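The support thresholds used for the four segments above are easier to compare as transaction counts. The sketch below takes the transaction totals and `supp` values printed in this section and computes the smallest whole number of transactions whose share reaches each threshold. The `segments` data frame and the `min_count` column are illustrative names, not objects from the original analysis.

```r
# Transaction totals and supp values as printed in the outputs above.
segments <- data.frame(
  segment        = c("weekday evening", "weekend morning",
                     "weekend afternoon", "weekend evening"),
  n_transactions = c(170, 1456, 1765, 93),
  min_support    = c(0.01, 0.005, 0.004, 0.02)
)

# Smallest whole number of transactions whose share reaches min_support.
segments$min_count <- ceiling(segments$min_support * segments$n_transactions)

segments
```

The smallest `count` values in the `inspect()` outputs (2 for both evening segments, 8 for weekend mornings and afternoons) line up with these figures. The apriori log lines report 1 and 7, which appear to be the same products rounded down. In the evening segments a rule therefore needs only two supporting transactions, which is why their confidence values of 1 carry little weight.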