In this mini-project, I explore association rules using data from a beauty products shop.
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules in those databases using measures of interestingness. In any given transaction with a variety of items, association rules are meant to discover the rules that determine how or why certain items are connected (Kotsiantis and Kanellopoulos 2006).
The questions that I explore in this analysis are:
Which beauty products have the highest demand? (See section 3)
Which combinations of beauty products are often purchased together? (See section 4)
How best can the owner of this beauty shop utilize this analysis? (See section 6)
R (R Core Team 2022), arules, arulesViz, Quarto, Data Science, Association Rules Mining.
Note, however, that many rules generated by association rules mining may be trivial. It would require domain expertise to sift through the output and spot actionable rules.
2 Data
The file Cosmetics.csv contains 1000 transaction records of cosmetics sales in a beauty shop (1: purchased; 0: not purchased). Each transaction is tied to an invoice so that the data shows the basket of items that each consumer bought on a given date.
Code
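## read in the data
library(tidyverse)  # read_csv(), dplyr verbs, pivot_longer()
library(janitor)    # clean_names()

cosmetics <- read_csv("Cosmetics.csv") %>%
  clean_names() %>%
  ## give every row (transaction) its own invoice id
  mutate(invoice_number = paste0("c", 1:nrow(.))) %>%
  ## reshape from one column per item to one row per invoice-item pair
  pivot_longer(cols = -invoice_number, names_to = "item", values_to = "bought") %>%
  ## keep only the items that were actually purchased
  filter(bought == 1) %>%
  mutate(invoice_number = factor(invoice_number)) %>%
  group_by(invoice_number) %>%
  filter(!duplicated(item)) %>%
  ungroup()

## formatting_function() is the author's own table helper (defined elsewhere)
head(cosmetics, 10) %>%
  formatting_function(caption = "A Peek into the Data")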
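A Peek into the Data

| invoice_number | item | bought |
|---|---|---|
| c1 | blush | 1 |
| c1 | nail_polish | 1 |
| c1 | brushes | 1 |
| c1 | concealer | 1 |
| c1 | bronzer | 1 |
| c1 | lip_liner | 1 |
| c1 | mascara | 1 |
| c1 | eyeliner | 1 |
| c2 | nail_polish | 1 |
| c2 | concealer | 1 |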
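The reshaping step above can be illustrated on a tiny example. This is only a sketch: the `wide` table and its values are made up for illustration and are not the shop's data, and the column layout (one 0/1 column per item) is assumed from the description of Cosmetics.csv.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in a wide 0/1 layout (one column per item)
wide <- tibble(
  invoice_number = c("c1", "c2", "c3", "c4"),
  blush          = c(1, 0, 1, 0),
  mascara        = c(1, 1, 0, 0),
  lip_gloss      = c(0, 1, 0, 0)
)

# One row per (invoice, item) pair, keeping only purchased items
long <- wide %>%
  pivot_longer(cols = -invoice_number, names_to = "item", values_to = "bought") %>%
  filter(bought == 1)

print(long)
```

In the toy data, invoice c4 bought nothing, so it disappears after the filter. The same effect may explain why a transactions object built from the long data can have fewer rows than the raw file.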
3 Most Popular Items
After reading in the data and converting it into a transactions object, I build an item frequency plot. The plot shows the most popular items across all transactions in the data. The two most popular items are foundation and lip gloss.
Code
## Split the data by invoice number
my_bundles <- split(cosmetics$item, cosmetics$invoice_number)
## lapply() always returns a list, even if every basket has the same size
my_bundles <- lapply(my_bundles, unique)
Code
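library(arules)  # provides the "transactions" class

## convert the list of baskets into a transactions object
trans <- as(my_bundles, "transactions")
summary(trans)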
transactions as itemMatrix in sparse format with
 957 rows (elements/itemsets/transactions) and
 14 columns (items) and a density of 0.33

most frequent items:
foundation  lip_gloss   eyeliner  concealer eye_shadow    (Other)
       536        490        457        442        381       2080

element (itemset/transaction) length distribution:
sizes
  1   2   3   4   5   6   7   8   9  10  11  12  13
 67 116 166 158 156 107  79  53  17  18  16   3   1

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0     3.0     4.0     4.6     6.0    13.0

includes extended item information - examples:
   labels
1     bag
2   blush
3 bronzer

includes extended transaction information - examples:
  transactionID
1            c1
2           c10
3          c100
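The item frequency plot mentioned at the start of this section could be produced along the following lines. This is a minimal sketch using `itemFrequency()` and `itemFrequencyPlot()` from arules; the `toy_bundles` baskets are hypothetical stand-ins so the example runs on its own, and the choice of `topN` and absolute counts is an assumption rather than the settings used for the original plot.

```r
library(arules)

# Hypothetical toy baskets (not the shop's data)
toy_bundles <- list(
  c1 = c("foundation", "lip_gloss", "eyeliner"),
  c2 = c("foundation", "lip_gloss"),
  c3 = c("foundation", "concealer"),
  c4 = c("eyeliner")
)
toy_trans <- as(toy_bundles, "transactions")

# Number of baskets containing each item, most frequent first
item_counts <- sort(itemFrequency(toy_trans, type = "absolute"), decreasing = TRUE)
print(item_counts)

# Bar chart of the most frequent items
itemFrequencyPlot(toy_trans, topN = 3, type = "absolute")
```

With the real data, `toy_trans` would be replaced by the `trans` object created above.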
Next, I build an association rules model, setting the minimum support to 0.01 and the minimum confidence to 0.1. I then sort the resulting rules by lift and show the first eight.
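A model with these thresholds could be fitted roughly as follows. This is a sketch, not necessarily the exact call used in the analysis: the `toy_bundles` baskets are hypothetical, and `minlen = 2` is my own assumption, added to exclude rules with an empty left-hand side.

```r
library(arules)

# Hypothetical toy baskets (not the shop's data)
toy_bundles <- list(
  c1 = c("bronzer", "brushes", "nail_polish"),
  c2 = c("bronzer", "brushes"),
  c3 = c("foundation", "lip_gloss"),
  c4 = c("foundation", "lip_gloss", "eyeliner"),
  c5 = c("nail_polish", "brushes"),
  c6 = c("foundation")
)
toy_trans <- as(toy_bundles, "transactions")

# Mine rules with the thresholds stated in the text
rules <- apriori(
  toy_trans,
  parameter = list(support = 0.01, confidence = 0.1, minlen = 2),  # minlen is an assumption
  control   = list(verbose = FALSE)
)

# Sort by lift (highest first) and show the first eight rules
rules_by_lift <- sort(rules, by = "lift")
inspect(head(rules_by_lift, 8))
```

Sorting by lift puts the rules whose left-hand side most strongly raises the chance of the right-hand side at the top, which is why very specific, low-support rules often appear first.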
Important
I define the terms support, confidence and lift later in section 6.
This rule means that consumers who buy the basket of goods on the left (bronzer, eyeliner, lip_liner, mascara, nail_polish) often buy brushes. But how often do they buy brushes? To answer this question, we explore the values for support, confidence and lift for this rule.
Support: The combination of bronzer, eyeliner, lip_liner, mascara and nail_polish together with brushes appears in 3% of all transactions.
Confidence: When a customer buys bronzer, eyeliner, lip_liner, mascara and nail_polish, there is a 96% chance that they also buy brushes.
Lift: Having bronzer, eyeliner, lip_liner, mascara and nail_polish in a shopping basket makes a customer roughly six times (6.2) more likely to buy brushes than the average customer.
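To make these three numbers less abstract, they can be computed by hand from raw baskets. The sketch below uses base R only; the `baskets`, the one-item rule bronzer => brushes, and the helper `contains()` are all hypothetical and chosen for illustration, so the resulting values are unrelated to the 3%, 96% and 6.2 reported above.

```r
# Hypothetical toy baskets (not the shop's data)
baskets <- list(
  c("bronzer", "brushes"),
  c("bronzer", "brushes", "mascara"),
  c("bronzer", "mascara"),
  c("brushes"),
  c("foundation"),
  c("foundation", "mascara"),
  c("lip_gloss"),
  c("eyeliner")
)

lhs <- "bronzer"  # left-hand side of the toy rule
rhs <- "brushes"  # right-hand side of the toy rule

# TRUE for each basket that contains every item in `items`
contains <- function(items) {
  vapply(baskets, function(b) all(items %in% b), logical(1))
}

support    <- mean(contains(c(lhs, rhs)))       # share of baskets with LHS and RHS
confidence <- support / mean(contains(lhs))     # share of LHS baskets that also have RHS
lift       <- confidence / mean(contains(rhs))  # confidence relative to the baseline share of RHS

print(c(support = support, confidence = confidence, lift = lift))
```

A lift above 1 means the right-hand side is more common among baskets with the left-hand side than among baskets overall.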
6 Implications of the Analysis
In this section, I give the precise meaning of “support”, “confidence”, and “lift”. I then discuss the implications of the results above in the setting of association rules and business.
Support:
Support is the percentage of transactions that contain all of the items listed in the association rule. In the first rule above, the support is 0.03, meaning that 3% of the transactions contained every item in the rule (bronzer, eyeliner, lip_liner, mascara, nail_polish => brushes).
Implication of support: Support shows the share of all transactions that contain a product or group of products. Combined with confidence and lift, it lets us focus on the rules that cover the most sales. In other words, support allows us to filter for meaningful rules: a rule with a confidence of 99% is of little use if it applies to only 0.0001% of sales.
Confidence:
Confidence is the proportion of times that a customer buys item Y given that they buy item X. In our first rule (bronzer, eyeliner, lip_liner, mascara, nail_polish => brushes), a customer who buys bronzer, eyeliner, lip_liner, mascara and nail_polish also buys brushes 96% of the time.
The implication of confidence: We can target customers buying bronzer, eyeliner, lip_liner, mascara and nail_polish with brushes, as they have a 96% chance of buying them. Alternatively, in a store, these products could be placed close to each other.
Lift:
The lift value captures rule importance. It is the confidence of a rule X => Y divided by the support of the right-hand-side product Y. In other words, it measures how many times more likely the purchase of product Y becomes once we know that product X is in the basket; a lift above 1 means that X raises the chance of buying Y. Again, the implication is that it helps managers decide on product placements in stores.
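The three definitions can be written compactly. The notation below is my own shorthand rather than the source's: supp(A) is the share of the N transactions that contain every item in the itemset A, and X => Y is a rule with left-hand side X and right-hand side Y.

```latex
\[
\begin{aligned}
\operatorname{supp}(X \Rightarrow Y) &= \operatorname{supp}(X \cup Y)
  = \frac{\lvert \{\, t : X \cup Y \subseteq t \,\} \rvert}{N} \\[4pt]
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\[4pt]
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
\end{aligned}
\]

% Plugging in the rounded values reported for the first rule
\[
\operatorname{supp}(Y) = \frac{\operatorname{conf}}{\operatorname{lift}}
  = \frac{0.96}{6.2} \approx 0.155,
\qquad
\operatorname{supp}(X) = \frac{\operatorname{supp}(X \Rightarrow Y)}{\operatorname{conf}}
  = \frac{0.03}{0.96} \approx 0.031
\]
```

Read this way, the reported figures imply that brushes appear in roughly 15% of all baskets, while the five-item left-hand side appears in roughly 3% of them. Both values are approximate because the inputs are rounded.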
7 Conclusion
In this mini-project, I have explored association rules using data from a beauty products shop. The analysis has implications for the conduct of business. However, many rules generated using this technique may be trivial. It would require domain expertise to spot actionable rules.
8 Technology and Packages Utilised
In this analysis, I have utilised Zorin OS, R, Quarto and the following R packages.
Kotsiantis, Sotiris, and Dimitris Kanellopoulos. 2006. “Association Rules Mining: A Recent Overview.” GESTS International Transactions on Computer Science and Engineering 32 (1): 71–82.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.