Libraries Used

> library(arules)
> library(readxl)
> library(dplyr)

Data Overview

The data were obtained from the Kaggle Market Basket Analysis dataset. The data consist of 999 rows and 17 columns; the first column is dropped on import, leaving 16 item columns. Each item column holds the logical values ‘TRUE’ and ‘FALSE’.

Import Data and Display

> data_ap = read_excel("C:/Users/acer/Downloads/Market_Basket_Optimisation.xlsx")[,2:17]
> 
> head(data_ap, 10)
# A tibble: 10 × 16
   Apple Bread Butter Cheese Corn  Dill  Eggs  `Ice cream` `Kidney Beans` Milk 
   <lgl> <lgl> <lgl>  <lgl>  <lgl> <lgl> <lgl> <lgl>       <lgl>          <lgl>
 1 FALSE TRUE  FALSE  FALSE  TRUE  TRUE  FALSE TRUE        FALSE          FALSE
 2 FALSE FALSE FALSE  FALSE  FALSE FALSE FALSE FALSE       FALSE          TRUE 
 3 TRUE  FALSE TRUE   FALSE  FALSE TRUE  FALSE TRUE        FALSE          TRUE 
 4 FALSE FALSE TRUE   TRUE   FALSE TRUE  FALSE FALSE       FALSE          TRUE 
 5 TRUE  TRUE  FALSE  FALSE  FALSE FALSE FALSE FALSE       FALSE          FALSE
 6 TRUE  TRUE  TRUE   TRUE   FALSE TRUE  FALSE TRUE        FALSE          FALSE
 7 FALSE FALSE TRUE   FALSE  FALSE FALSE TRUE  TRUE        TRUE           TRUE 
 8 TRUE  FALSE FALSE  TRUE   FALSE FALSE TRUE  FALSE       FALSE          FALSE
 9 TRUE  FALSE FALSE  FALSE  TRUE  TRUE  TRUE  TRUE        FALSE          TRUE 
10 TRUE  FALSE FALSE  FALSE  FALSE TRUE  TRUE  TRUE        FALSE          TRUE 
# ℹ 6 more variables: Nutmeg <lgl>, Onion <lgl>, Sugar <lgl>, Unicorn <lgl>,
#   Yogurt <lgl>, chocolate <lgl>

Preprocessing Data

> cols_to_change = c('Apple','Bread','Butter','Cheese','Corn','Dill','Eggs', 'Ice cream','Kidney Beans','Milk','Nutmeg','Onion','Sugar','Unicorn','Yogurt','chocolate')  
> 
> data_apriori = data_ap %>% mutate(across(all_of(cols_to_change), ~ifelse(. == 'TRUE', 1, 0)))
> 
> head(data_apriori, 10)
# A tibble: 10 × 16
   Apple Bread Butter Cheese  Corn  Dill  Eggs `Ice cream` `Kidney Beans`  Milk
   <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>       <dbl>          <dbl> <dbl>
 1     0     1      0      0     1     1     0           1              0     0
 2     0     0      0      0     0     0     0           0              0     1
 3     1     0      1      0     0     1     0           1              0     1
 4     0     0      1      1     0     1     0           0              0     1
 5     1     1      0      0     0     0     0           0              0     0
 6     1     1      1      1     0     1     0           1              0     0
 7     0     0      1      0     0     0     1           1              1     1
 8     1     0      0      1     0     0     1           0              0     0
 9     1     0      0      0     1     1     1           1              0     1
10     1     0      0      0     0     1     1           1              0     1
# ℹ 6 more variables: Nutmeg <dbl>, Onion <dbl>, Sugar <dbl>, Unicorn <dbl>,
#   Yogurt <dbl>, chocolate <dbl>
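
The recoding step above can also be sketched in base R. This is a minimal illustration on a made-up data frame (`toy` is a hypothetical stand-in for `data_ap`, not the Kaggle file); it relies only on `as.integer()` mapping TRUE to 1 and FALSE to 0.

```r
# Hypothetical stand-in for data_ap: a small data frame of logical columns.
toy <- data.frame(
  Apple = c(FALSE, TRUE, TRUE),
  Bread = c(TRUE, FALSE, TRUE)
)

# as.integer() maps TRUE to 1 and FALSE to 0.
# Assigning into toy_numeric[] keeps the data frame shape and column names.
toy_numeric <- toy
toy_numeric[] <- lapply(toy, as.integer)

print(toy_numeric)
```

The result holds integer columns rather than the double columns produced by `ifelse(..., 1, 0)`; for a 0/1 indicator table the two are interchangeable in most uses.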

Create transactions object from data

> trans <- as(as.matrix(data_ap), "transactions")

Run the Apriori Algorithm

In this case, we’ll use a minimum support of 20%. Support indicates the popularity of an itemset: it is the proportion of all transactions that contain the itemset. The minimum confidence we’ll use is 50%. Confidence is the estimated probability that item Y is bought when item X is bought, written {X -> Y}. It is the number of transactions containing both X and Y divided by the number of transactions containing X.
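
To make the two definitions concrete, here is a minimal base-R sketch on a made-up table of five transactions. The data frame `baskets` and the helper functions `support()` and `confidence()` are hypothetical names introduced for illustration; they are not part of arules.

```r
# Toy basket data (hypothetical, not the Kaggle file): one row per transaction.
baskets <- data.frame(
  Milk      = c(TRUE,  TRUE,  FALSE, TRUE,  FALSE),
  chocolate = c(TRUE,  FALSE, FALSE, TRUE,  TRUE),
  Butter    = c(FALSE, TRUE,  TRUE,  FALSE, FALSE)
)

# Support of an itemset: share of transactions that contain every item in it.
support <- function(data, items) {
  mean(rowSums(data[, items, drop = FALSE]) == length(items))
}

# Confidence of {lhs -> rhs}: support of both sides divided by support of lhs.
confidence <- function(data, lhs, rhs) {
  support(data, c(lhs, rhs)) / support(data, lhs)
}

supp_mc <- support(baskets, c("Milk", "chocolate"))
conf_mc <- confidence(baskets, "Milk", "chocolate")
print(c(support = supp_mc, confidence = conf_mc))
```

In this toy table, milk and chocolate appear together in 2 of 5 transactions (support 0.4), and 2 of the 3 milk transactions also contain chocolate (confidence about 0.67).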

> rules <- apriori(trans, parameter = list(support = 0.2, confidence = 0.5))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.2      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 199 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[16 item(s), 999 transaction(s)] done [0.00s].
sorting and recoding items ... [16 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [3 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

Show the Result of the Rules

> inspect(rules)
    lhs            rhs         support   confidence coverage  lift     count
[1] {Milk}      => {chocolate} 0.2112112 0.5209877  0.4054054 1.236263 211  
[2] {chocolate} => {Milk}      0.2112112 0.5011876  0.4214214 1.236263 211  
[3] {Ice cream} => {Butter}    0.2072072 0.5048780  0.4104104 1.200889 207
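
The columns for rule [1] can be checked by hand. The sketch below assumes that the item totals are 405 transactions with Milk and 421 with chocolate; these are not printed in the output but are recovered from the coverage values (0.4054054 × 999 and 0.4214214 × 999).

```r
# Counts for rule [1] {Milk} => {chocolate}.
# n_trans and n_both come from the output above; n_milk and n_chocolate are
# assumptions recovered as coverage * 999.
n_trans     <- 999
n_milk      <- 405
n_chocolate <- 421
n_both      <- 211

support_rule    <- n_both / n_trans                           # share of all transactions
confidence_rule <- n_both / n_milk                            # share of Milk transactions
lift_rule       <- confidence_rule / (n_chocolate / n_trans)  # confidence / support of rhs

print(round(c(support = support_rule,
              confidence = confidence_rule,
              lift = lift_rule), 7))
```

Under these assumed counts, the three values agree with the support, confidence, and lift shown for rule [1] to the printed precision.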

Conclusion

With a minimum support of 20% and a minimum confidence of 50%, three rules were obtained, each containing two items. The rules {Milk → chocolate} and {chocolate → Milk} both have a support of 0.211, meaning that 21.1% of all transactions contain both milk and chocolate. Their confidence values show that 52.1% of transactions containing milk also contain chocolate, and 50.1% of transactions containing chocolate also contain milk. Together with a lift of about 1.24, this suggests a positive association between the two products, so promoting chocolate to milk buyers (and vice versa) could increase sales.

Meanwhile, the rule {Ice cream → Butter} has a support of 0.207, meaning that 20.7% of all transactions contain both ice cream and butter. Its confidence of 0.505 means that 50.5% of transactions containing ice cream also contain butter. Retailers could therefore consider placing butter near the ice cream section, since about half of the purchases that include ice cream also include butter.

This information enables targeted marketing strategies. Retailers can tailor promotions or discounts to consumers buying specific combinations of products. For instance, a discount on chocolate for customers purchasing milk or a bundled offer for ice cream and butter could be implemented to encourage certain purchasing behaviors.

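All three rules have a lift greater than 1 (about 1.24, 1.24, and 1.20), indicating that the items in each rule are bought together more often than would be expected if they were purchased independently. The larger the lift, the stronger the association between the itemsets in the rule.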
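
The comparison with 1 refers to the lift column of the inspect(rules) table. Lift is the observed joint support divided by the joint support expected if the two sides were independent, so dividing support by lift recovers that expected value. The sketch below copies the support and lift values from the table; the column names `expected_support` and `excess` are introduced here for illustration only.

```r
# Values copied from the inspect(rules) output.
rules_tbl <- data.frame(
  rule    = c("Milk => chocolate", "chocolate => Milk", "Ice cream => Butter"),
  support = c(0.2112112, 0.2112112, 0.2072072),
  lift    = c(1.236263, 1.236263, 1.200889)
)

# lift = observed support / expected support under independence,
# so support / lift gives the expected joint support.
rules_tbl$expected_support <- rules_tbl$support / rules_tbl$lift

# Positive excess means the items co-occur more often than independence predicts.
rules_tbl$excess <- rules_tbl$support - rules_tbl$expected_support

print(rules_tbl)
```

For every rule the observed support exceeds the expected support, which is the same statement as lift being greater than 1.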

In summary, these association rules give businesses actionable insights for optimizing their product offerings and marketing strategies, which may ultimately lead to increased sales and improved customer satisfaction.

Market Basket Analysis

Kayla Aisya Zahra

2023-12-05