> library(arules)
> library(readxl)
> library(dplyr)
The data was obtained from the Kaggle Market Basket Analysis dataset. The file has 999 rows and 17 columns. Only columns 2 to 17 are kept, which leaves 16 item columns whose values are ‘TRUE’ or ‘FALSE’.
> data_ap = read_excel("C:/Users/acer/Downloads/Market_Basket_Optimisation.xlsx")[,2:17]
>
> head(data_ap, 10)
# A tibble: 10 × 16
Apple Bread Butter Cheese Corn Dill Eggs `Ice cream` `Kidney Beans` Milk
<lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1 FALSE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
2 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
3 TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE
4 FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
5 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
6 TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
7 FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
8 TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
9 TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
10 TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE
# ℹ 6 more variables: Nutmeg <lgl>, Onion <lgl>, Sugar <lgl>, Unicorn <lgl>,
# Yogurt <lgl>, chocolate <lgl>
> cols_to_change = c('Apple','Bread','Butter','Cheese','Corn','Dill','Eggs', 'Ice cream','Kidney Beans','Milk','Nutmeg','Onion','Sugar','Unicorn','Yogurt','chocolate')
>
> data_apriori = data_ap %>% mutate(across(all_of(cols_to_change), as.numeric))
>
> head(data_apriori, 10)
# A tibble: 10 × 16
Apple Bread Butter Cheese Corn Dill Eggs `Ice cream` `Kidney Beans` Milk
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 1 0 0 1 1 0 1 0 0
2 0 0 0 0 0 0 0 0 0 1
3 1 0 1 0 0 1 0 1 0 1
4 0 0 1 1 0 1 0 0 0 1
5 1 1 0 0 0 0 0 0 0 0
6 1 1 1 1 0 1 0 1 0 0
7 0 0 1 0 0 0 1 1 1 1
8 1 0 0 1 0 0 1 0 0 0
9 1 0 0 0 1 1 1 1 0 1
10 1 0 0 0 0 1 1 1 0 1
# ℹ 6 more variables: Nutmeg <dbl>, Onion <dbl>, Sugar <dbl>, Unicorn <dbl>,
# Yogurt <dbl>, chocolate <dbl>
> trans <- as(as.matrix(data_ap), "transactions")
The transactions object is built directly from the logical data_ap, so the 0/1 copy data_apriori is not needed for apriori(). Here we use a minimum support of 20%. Support measures the popularity of an itemset: the proportion of all transactions in which the itemset appears. The minimum confidence is 50%. For a rule {X -> Y}, confidence is the probability that Y is bought given that X is bought. It is computed as the proportion of transactions containing X that also contain Y, i.e. support(X ∪ Y) / support(X).
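As a minimal sketch of these two definitions, the base R code below computes support and confidence directly from a logical item matrix. The five-basket matrix toy and the helpers support() and confidence() are hypothetical, made up for illustration; they are not part of arules and do not use the Kaggle data.

```r
# Hypothetical toy data: 5 transactions (rows) x 3 items (columns),
# TRUE means the item is in the basket.
toy <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, TRUE,
    TRUE,  TRUE,  TRUE,
    FALSE, TRUE,  FALSE,
    TRUE,  TRUE,  FALSE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Milk", "chocolate", "Butter"))
)

# Hypothetical helper. Support: share of transactions containing every item in `items`.
support <- function(m, items) {
  mean(apply(m[, items, drop = FALSE], 1, all))
}

# Hypothetical helper. Confidence of {lhs} => {rhs}: support(lhs and rhs) / support(lhs).
confidence <- function(m, lhs, rhs) {
  support(m, c(lhs, rhs)) / support(m, lhs)
}

support(toy, "Milk")                  # 0.8  (4 of 5 baskets contain Milk)
support(toy, c("Milk", "chocolate"))  # 0.6  (3 of 5 baskets contain both)
confidence(toy, "Milk", "chocolate")  # 0.75 (3 of the 4 Milk baskets)
```

Support is taken over all baskets, while confidence only looks at the baskets that already contain the left-hand side, which is why the two numbers answer different questions.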
> rules <- apriori(trans, parameter = list(support = 0.2, confidence = 0.5))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.5 0.1 1 none FALSE TRUE 5 0.2 1
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 199
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[16 item(s), 999 transaction(s)] done [0.00s].
sorting and recoding items ... [16 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [3 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
> inspect(rules)
lhs rhs support confidence coverage lift count
[1] {Milk} => {chocolate} 0.2112112 0.5209877 0.4054054 1.236263 211
[2] {chocolate} => {Milk} 0.2112112 0.5011876 0.4214214 1.236263 211
[3] {Ice cream} => {Butter} 0.2072072 0.5048780 0.4104104 1.200889 207
With a minimum support of 20% and a minimum confidence of 50%, three rules were obtained, each involving two items. The rules {Milk → chocolate} and {chocolate → Milk} both have a support of 0.211, meaning that 21.1% of all transactions contain both milk and chocolate. Their confidences differ: 52.1% of transactions containing milk also contain chocolate, while 50.1% of transactions containing chocolate also contain milk. With a lift of about 1.24, milk and chocolate are bought together more often than would be expected if they were bought independently, which suggests that promoting chocolate to milk buyers (and vice versa) could increase sales.
Meanwhile, the rule {Ice cream → Butter} has a support of 0.207, meaning that 20.7% of all transactions contain both ice cream and butter. Its confidence of 0.505 means that about half of the transactions containing ice cream also contain butter. Knowing this, retailers could consider placing butter near the ice cream section, since roughly one in two customers buying ice cream also buys butter.
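The difference between support and confidence can be checked by hand against the inspect(rules) output. This is a back-of-the-envelope sketch in base R, not a call into arules: the joint counts come from the count column, and the per-item counts (405 for Milk, 421 for chocolate, 410 for Ice cream) are recovered by multiplying each rule's coverage by the 999 transactions.

```r
n <- 999  # number of transactions reported by apriori()

# Transactions containing both items (the `count` column of inspect(rules))
both_milk_choc  <- 211
both_ice_butter <- 207

# Transactions containing the lhs item (coverage * n)
n_milk <- 405
n_choc <- 421
n_ice  <- 410

# Support: share of ALL transactions that contain both items
supp_milk_choc  <- both_milk_choc / n   # 0.2112112
supp_ice_butter <- both_ice_butter / n  # 0.2072072

# Confidence: share of lhs transactions that also contain the rhs
conf_milk_choc  <- both_milk_choc / n_milk   # 0.5209877  {Milk} => {chocolate}
conf_choc_milk  <- both_milk_choc / n_choc   # 0.5011876  {chocolate} => {Milk}
conf_ice_butter <- both_ice_butter / n_ice   # 0.5048780  {Ice cream} => {Butter}

round(c(supp_milk_choc, conf_milk_choc, conf_choc_milk,
        supp_ice_butter, conf_ice_butter), 7)
```

The two milk and chocolate rules share the same numerator (211 joint transactions) but divide by different left-hand-side counts, which is why their support is identical while their confidence is not.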
This information enables targeted marketing strategies. Retailers can tailor promotions or discounts to consumers buying specific combinations of products. For instance, a discount on chocolate for customers purchasing milk or a bundled offer for ice cream and butter could be implemented to encourage certain purchasing behaviors.
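All three rules have a lift greater than 1 (1.24 for the two milk and chocolate rules and 1.20 for ice cream and butter), meaning that these items appear together roughly 20% to 24% more often than expected if they were bought independently. The larger the lift, the stronger the positive association between the two sides of a rule; a lift of exactly 1 would indicate no association.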
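Lift compares the observed co-occurrence with what independence would predict: lift({X} => {Y}) = support(X ∪ Y) / (support(X) × support(Y)). The minimal base R check below uses the support and coverage values printed by inspect(rules) for the milk and chocolate rules; the helper lift() is hypothetical and is not the arules implementation.

```r
# Hypothetical helper: lift = support(X and Y) / (support(X) * support(Y))
lift <- function(supp_xy, supp_x, supp_y) {
  supp_xy / (supp_x * supp_y)
}

n <- 999
supp_both <- 211 / n  # Milk and chocolate together
supp_milk <- 405 / n  # coverage of {Milk}
supp_choc <- 421 / n  # coverage of {chocolate}

lift_mc <- lift(supp_both, supp_milk, supp_choc)  # {Milk} => {chocolate}
lift_cm <- lift(supp_both, supp_choc, supp_milk)  # {chocolate} => {Milk}
round(c(lift_mc, lift_cm), 6)  # both 1.236263: lift is symmetric in X and Y

# Equivalent form: confidence of {Milk} => {chocolate} divided by support of {chocolate}
(211 / 405) / supp_choc
```

Because the formula is symmetric in X and Y, the two directions of the milk and chocolate rule always share one lift value, matching the 1.236263 shown for rules [1] and [2].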
In summary, the knowledge gained from these association rules gives businesses actionable insights for optimizing their product offerings and marketing strategies, which can lead to increased sales and improved customer satisfaction.