Imagine 10,000 receipts sitting on your table. Each receipt represents a transaction with the items that were purchased. In other words, a receipt records what went into a customer’s basket, hence the name ‘Market Basket Analysis’.
That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item. The data set is attached.
Your assignment is to use R to mine the data for association rules. You should report support, confidence and lift and your top 10 rules by lift.
Extra credit: do a simple cluster analysis on the data as well.
First, I read in the data and explore it a bit by looking at the top 10 items purchased.
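# Read data and data exploration
# Packages used throughout this write-up
library(arules)
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)
library(igraph)
transactions <- read.transactions("GroceryDataSet.csv", sep=",")
itemFrequencyPlot(transactions, topN=10, main="Top 10")

Based on the above, the most frequently purchased item is whole milk.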
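What `itemFrequencyPlot()` charts is each item's relative frequency, that is, the share of transactions that contain it. A minimal base-R sketch of that calculation follows; the `baskets` list is made-up toy data for illustration, not the Groceries file.

```r
# Hypothetical toy baskets; each element stands for one receipt
baskets <- list(
  c("whole milk", "rolls/buns"),
  c("whole milk", "yogurt", "soda"),
  c("soda"),
  c("whole milk", "other vegetables")
)

# Count each item at most once per basket, then divide by the number of baskets
item_counts <- table(unlist(lapply(baskets, unique)))
item_freq <- sort(item_counts / length(baskets), decreasing = TRUE)

# Most frequent items first
head(item_freq, 3)
```

The relative frequency of a single item is the same quantity as its support, which is why it reappears in the rule metrics later on.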
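The rules, with their support, confidence and lift, are displayed below, mined using the Apriori algorithm. Apriori is an algorithm for frequent itemset mining and association rule learning over transactional databases. Here it is run with a minimum support of 0.001 and a minimum confidence of 0.5. Note that `top_n(10)` is called without a weighting column, so it selects on the last column (count), as the "Selecting by count" message in the output indicates; the table therefore lists the 10 rules with the highest count, ordered by lift.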
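# Mine rules with Apriori, convert them to a data frame and sort by lift
apriori(transactions, parameter=list(supp=0.001, conf=0.5), control=list(verbose=FALSE)) %>%
  DATAFRAME() %>%
  arrange(desc(lift)) %>%
  top_n(10) %>%  # no wt argument, so rows are selected by the last column
  kable() %>%
  kable_styling()
## Selecting by count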
| LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|
| {root vegetables,tropical fruit} | {other vegetables} | 0.0123030 | 0.5845411 | 0.0210473 | 3.020999 | 121 |
| {rolls/buns,root vegetables} | {other vegetables} | 0.0122013 | 0.5020921 | 0.0243010 | 2.594890 | 120 |
| {root vegetables,yogurt} | {other vegetables} | 0.0129131 | 0.5000000 | 0.0258261 | 2.584078 | 127 |
| {root vegetables,yogurt} | {whole milk} | 0.0145399 | 0.5629921 | 0.0258261 | 2.203354 | 143 |
| {domestic eggs,other vegetables} | {whole milk} | 0.0123030 | 0.5525114 | 0.0222674 | 2.162336 | 121 |
| {rolls/buns,root vegetables} | {whole milk} | 0.0127097 | 0.5230126 | 0.0243010 | 2.046888 | 125 |
| {other vegetables,pip fruit} | {whole milk} | 0.0135231 | 0.5175097 | 0.0261312 | 2.025351 | 133 |
| {tropical fruit,yogurt} | {whole milk} | 0.0151500 | 0.5173611 | 0.0292832 | 2.024770 | 149 |
| {other vegetables,yogurt} | {whole milk} | 0.0222674 | 0.5128806 | 0.0434164 | 2.007235 | 219 |
| {other vegetables,whipped/sour cream} | {whole milk} | 0.0146416 | 0.5070423 | 0.0288765 | 1.984385 | 144 |
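The columns in the table are related by simple ratios: confidence is the rule's support divided by the support of its left-hand side (the coverage column), and lift is confidence divided by the support of the right-hand side. As a quick check, the sketch below recomputes these from the first row of the table in base R. The right-hand-side support and the number of transactions `n` are not reported in the output, so they are back-calculated here and should be read as implied values, not as figures from the original analysis.

```r
# Values copied from the first row of the table
support    <- 0.0123030
confidence <- 0.5845411
coverage   <- 0.0210473   # support of the left-hand side
lift       <- 3.020999
count      <- 121

# confidence = support(LHS and RHS) / support(LHS)
support / coverage

# lift = confidence / support(RHS), so the implied RHS support is
rhs_support <- confidence / lift
rhs_support

# support = count / n, so the implied number of transactions is
n <- round(count / support)
n
```

A lift above 1 means the right-hand side shows up more often in baskets containing the left-hand side than it does across all baskets.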
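I use a network graph to perform the cluster analysis and determine groupings (communities) using the Louvain method. Louvain is a community detection method that extracts communities from large networks by optimizing modularity. In the graph built here, every shopper (receipt) and every item is a node, and an edge links a shopper to each item on that receipt.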
# Reshape to one row per (shopper, item) pair
temp <- read.csv("GroceryDataSet.csv", header = FALSE) %>%
  mutate(shoper_id = row_number()) %>%
  pivot_longer(-shoper_id) %>%
  filter(value != "") %>%
  select(-name)
# Build an undirected shopper-item graph and detect communities
louvain_communities <- temp %>%
  rename(to = value, from = shoper_id) %>%
  graph_from_data_frame(directed = FALSE) %>%
  cluster_louvain() %>%
  communities()

The communities will be summarized by naming the items in each cluster and showing the count of shoppers in it.
items <- as.character(unique(temp$value))
# Empty data frame that collects one row per community
cluster_df <- data.frame(name = character(), members = numeric())
for (i in seq_along(louvain_communities)) {
  cluster_name <- paste0(i, ": ")
  cluster_members <- 0
  for (member in louvain_communities[[i]]) {
    if (member %in% items) {
      # Item node: append it to the cluster label
      cluster_name <- paste0(cluster_name, member, " + ")
    } else {
      # Shopper node: count it
      cluster_members <- cluster_members + 1
    }
  }
  # Drop the trailing " + " separator
  cluster_name <- substr(cluster_name, 1, nchar(cluster_name) - 3)
  cluster_df <- rbind(cluster_df, data.frame(name = cluster_name, members = cluster_members))
}
cluster_df %>%
  arrange(desc(members)) %>%
  kable() %>%
  kable_styling()

| name | members |
|---|---|
| 8: chocolate + soda + specialty bar + pastry + salty snack + waffles + candy + dessert + chocolate marshmallow + specialty chocolate + popcorn + cake bar + snack products + finished products + make up remover + potato products + hair spray + light bulbs + baby food + tidbits | 1292 |
| 10: other vegetables + rice + abrasive cleaner + flour + beef + chicken + root vegetables + bathroom cleaner + spices + pork + turkey + oil + curd cheese + onions + herbs + dog food + frozen fish + salad dressing + vinegar + roll products + frozen fruits | 1087 |
| 12: ready soups + rolls/buns + frankfurter + sausage + spread cheese + hard cheese + canned fish + seasonal products + frozen potato products + sliced cheese + soft cheese + meat + mustard + mayonnaise + nut snack + ketchup + cream | 1053 |
| 13: whole milk + butter + cereals + curd + detergent + hamburger meat + flower (seeds) + canned vegetables + pasta + softener + Instant food products + honey + cocoa drinks + cleaner + soups + soap + pudding powder | 857 |
| 5: liquor (appetizer) + canned beer + shopping bags + misc. beverages + chewing gum + brandy + liqueur + whisky | 730 |
| 7: yogurt + cream cheese + meat spreads + packaged fruit/vegetables + butter milk + berries + whipped/sour cream + baking powder + specialty cheese + instant coffee + organic sausage + cooking chocolate + kitchen utensil | 674 |
| 4: tropical fruit + pip fruit + white bread + processed cheese + sweet spreads + beverages + ham + cookware + tea + syrup + baby cosmetics + specialty vegetables + sound storage medium | 624 |
| 15: citrus fruit + hygiene articles + domestic eggs + cat food + cling film/bags + canned fruit + dental care + flower soil/fertilizer + female sanitary products + dish cleaner + house keeping products + rubbing alcohol + preservation products | 569 |
| 16: bottled beer + red/blush wine + prosecco + liquor + rum | 432 |
| 11: UHT-milk + bottled water + white wine + male cosmetics | 349 |
| 2: long life bakery product + pot plants + fruit/vegetable juice + pickled vegetables + jam + bags | 341 |
| 3: semi-finished bread + newspapers + pet care + nuts/prunes + toilet cleaner | 298 |
| 6: dishes + napkins + grapes + zwieback + decalcifier | 293 |
| 1: coffee + condensed milk + sparkling wine + fish + kitchen towels | 287 |
| 18: sugar + frozen vegetables + salt + skin care + liver loaf + frozen chicken | 273 |
| 14: frozen dessert + ice cream + frozen meals | 262 |
| 9: margarine + artif. sweetener + specialty fat + candles + organic products | 207 |
| 17: brown bread + sauces | 128 |
| 19: photo/film | 79 |
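
To see how the shopper-item graph and the Louvain step used above behave on a small case, here is a minimal sketch with made-up data. The shopper ids (`s1` to `s4`) and the edge list are hypothetical; only `graph_from_data_frame()`, `cluster_louvain()` and `membership()` from igraph are used.

```r
library(igraph)

# Hypothetical toy data: one row per (shopper, item) pair
edges <- data.frame(
  from = c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"),
  to   = c("beer", "chips", "beer", "chips", "milk", "bread", "milk", "bread")
)

# Undirected graph whose vertices are both shoppers and items
g <- graph_from_data_frame(edges, directed = FALSE)

# Louvain community detection; the seed is set because results
# can vary between runs in recent igraph versions
set.seed(1)
comm <- cluster_louvain(g)

# Named vector mapping each vertex (shopper or item) to a community id
m <- membership(comm)
m
```

Because shoppers and items share one graph, each community mixes both kinds of node, which is why the summary above separates item names from the shopper count. Nodes with no path between them, such as `beer` and `milk` here, cannot end up in the same community.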