DATA 624 Homework 10
Introduction
The assignment is to use R to mine the data for association rules, reporting support, confidence and lift for the top 10 rules by lift.
Market Basket Analysis
First I will read in the data and explore it a bit by looking at the 10 most frequently purchased items.
library(arules)  # provides read.transactions(), itemFrequencyPlot() and apriori()
transactions <- read.transactions("GroceryDataSet.csv", sep = ",")
itemFrequencyPlot(transactions, topN = 10, type = "absolute", main = "Top 10 Items")
Whole milk is the most frequently purchased item.
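To complete this market basket analysis, I will use the Apriori algorithm with a minimum support of 0.001 and a minimum confidence of 0.5, and print 10 rules with their support, confidence and lift, sorted by lift. Because top_n(10) is called without a wt argument, it ranks by the last column (count), so the table lists the 10 most frequent rules rather than the 10 with the highest lift.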
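# Assumes dplyr, knitr and kableExtra are loaded in addition to arules.
apriori(transactions, parameter = list(supp = 0.001, conf = 0.5), control = list(verbose = FALSE)) %>%
  DATAFRAME() %>%
  arrange(desc(lift)) %>%
  # Keep the 10 rules with the highest count (top_n()'s default is the last column).
  # To rank by lift instead, use slice_max(lift, n = 10).
  top_n(10, wt = count) %>%
  kable() %>%
  kable_styling()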
LHS | RHS | support | confidence | lift | count |
---|---|---|---|---|---|
{root vegetables,tropical fruit} | {other vegetables} | 0.0123030 | 0.5845411 | 3.020999 | 121 |
{rolls/buns,root vegetables} | {other vegetables} | 0.0122013 | 0.5020921 | 2.594890 | 120 |
{root vegetables,yogurt} | {other vegetables} | 0.0129131 | 0.5000000 | 2.584078 | 127 |
{root vegetables,yogurt} | {whole milk} | 0.0145399 | 0.5629921 | 2.203354 | 143 |
{domestic eggs,other vegetables} | {whole milk} | 0.0123030 | 0.5525114 | 2.162336 | 121 |
{rolls/buns,root vegetables} | {whole milk} | 0.0127097 | 0.5230126 | 2.046888 | 125 |
{other vegetables,pip fruit} | {whole milk} | 0.0135231 | 0.5175097 | 2.025351 | 133 |
{tropical fruit,yogurt} | {whole milk} | 0.0151500 | 0.5173611 | 2.024770 | 149 |
{other vegetables,yogurt} | {whole milk} | 0.0222674 | 0.5128806 | 2.007235 | 219 |
{other vegetables,whipped/sour cream} | {whole milk} | 0.0146416 | 0.5070423 | 1.984385 | 144 |
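To make the three columns concrete, here is a minimal sketch that computes support, confidence and lift by hand for a single rule. The baskets and the helper `support_of()` are hypothetical and are not taken from GroceryDataSet.csv; the sketch uses base R only.

```r
# Toy baskets (hypothetical data, for illustration only)
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "eggs"),
  c("bread", "eggs"),
  c("milk", "eggs"),
  c("milk", "bread", "butter")
)

# Hypothetical helper: fraction of baskets that contain every item in `items`
support_of <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Rule under test: {bread} => {milk}
lhs <- "bread"
rhs <- "milk"

rule_support    <- support_of(c(lhs, rhs), baskets)          # share of baskets with both sides
rule_confidence <- rule_support / support_of(lhs, baskets)   # share of LHS baskets that also hold RHS
rule_lift       <- rule_confidence / support_of(rhs, baskets) # confidence relative to RHS base rate

round(c(support = rule_support, confidence = rule_confidence, lift = rule_lift), 4)
```

The same relationships hold in the table above: for the first rule, count divided by support (121 / 0.0123030) gives roughly 9,835 transactions, and confidence divided by lift (0.5845411 / 3.020999) gives roughly 0.193, the share of baskets containing other vegetables. A lift above 1 means the right-hand side shows up more often with the left-hand side than it does overall.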
Cluster Analysis
I am going to look for item groupings, using a network graph to perform the cluster analysis. First I will create a network graph from the transaction data, with shoppers and items as vertices and purchases as edges. Then I will detect the communities in the graph using the Louvain algorithm.
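# Requires dplyr, tidyr and igraph.
# Reshape to one row per (shopper, item) pair; each CSV row is one shopper's basket.
temp <- read.csv("GroceryDataSet.csv", header = FALSE) %>%
  mutate(shoper_id = row_number()) %>%
  pivot_longer(-shoper_id) %>%
  filter(value != "") %>%  # drop empty cells from baskets shorter than the widest row
  select(-name)
# Build an undirected shopper-item graph and detect communities with Louvain.
louvain_communities <- temp %>%
  rename(to = value, from = shoper_id) %>%
  graph_from_data_frame(directed = FALSE) %>%
  cluster_louvain() %>%
  communities()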
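As a small illustration of the same idea, the sketch below builds a shopper-item graph from a handful of hypothetical purchases and runs Louvain on it. The edge list is made up for this example; it assumes the igraph package is installed.

```r
library(igraph)

# Hypothetical shopper-item pairs: shoppers "1" and "2" buy beer and chips,
# shoppers "3" and "4" buy milk and bread
edges <- data.frame(
  from = c("1", "1", "2", "2", "3", "3", "4", "4"),
  to   = c("beer", "chips", "beer", "chips", "milk", "bread", "milk", "bread")
)

# Shoppers and items both become vertices; each purchase is an undirected edge
g <- graph_from_data_frame(edges, directed = FALSE)

set.seed(1)  # Louvain visits vertices in random order
cl <- cluster_louvain(g)

# communities() returns a list with one character vector of vertex names per community
groups <- communities(cl)
groups
```

Because shoppers and items live in the same graph, each community mixes both kinds of vertex, which is why the summary step has to separate item names from shopper ids.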
Now that every shopper and item is assigned to one of 19 clusters, I will summarize each cluster by naming it after its items and counting the shoppers in it.
items <- as.character(unique(temp$value))
# Start from an empty data frame and append one row per cluster.
cluster_df <- data.frame(name = character(), members = numeric())
for (i in seq_along(louvain_communities)) {
  cluster_name <- paste0(i, ": ")
  cluster_members <- 0
  for (member in louvain_communities[[i]]) {
    if (member %in% items) {
      # Item vertex: append it to the cluster's name.
      cluster_name <- paste0(cluster_name, member, " + ")
    } else {
      # Shopper vertex: count it.
      cluster_members <- cluster_members + 1
    }
  }
  # Drop the trailing " + " separator.
  cluster_name <- substr(cluster_name, 1, nchar(cluster_name) - 3)
  cluster_df <- rbind(cluster_df, data.frame(name = cluster_name, members = cluster_members))
}
cluster_df %>%
arrange(desc(members)) %>%
kable() %>%
kable_styling()
name | members |
---|---|
8: chocolate + soda + specialty bar + pastry + salty snack + waffles + candy + dessert + chocolate marshmallow + specialty chocolate + popcorn + cake bar + snack products + finished products + make up remover + potato products + hair spray + light bulbs + baby food + tidbits | 1292 |
10: other vegetables + rice + abrasive cleaner + flour + beef + chicken + root vegetables + bathroom cleaner + spices + pork + turkey + oil + curd cheese + onions + herbs + dog food + frozen fish + salad dressing + vinegar + roll products + frozen fruits | 1087 |
12: ready soups + rolls/buns + frankfurter + sausage + spread cheese + hard cheese + canned fish + seasonal products + frozen potato products + sliced cheese + soft cheese + meat + mustard + mayonnaise + nut snack + ketchup + cream | 1053 |
13: whole milk + butter + cereals + curd + detergent + hamburger meat + flower (seeds) + canned vegetables + pasta + softener + Instant food products + honey + cocoa drinks + cleaner + soups + soap + pudding powder | 857 |
5: liquor (appetizer) + canned beer + shopping bags + misc. beverages + chewing gum + brandy + liqueur + whisky | 730 |
7: yogurt + cream cheese + meat spreads + packaged fruit/vegetables + butter milk + berries + whipped/sour cream + baking powder + specialty cheese + instant coffee + organic sausage + cooking chocolate + kitchen utensil | 674 |
4: tropical fruit + pip fruit + white bread + processed cheese + sweet spreads + beverages + ham + cookware + tea + syrup + baby cosmetics + specialty vegetables + sound storage medium | 624 |
15: citrus fruit + hygiene articles + domestic eggs + cat food + cling film/bags + canned fruit + dental care + flower soil/fertilizer + female sanitary products + dish cleaner + house keeping products + rubbing alcohol + preservation products | 569 |
16: bottled beer + red/blush wine + prosecco + liquor + rum | 432 |
11: UHT-milk + bottled water + white wine + male cosmetics | 349 |
2: long life bakery product + pot plants + fruit/vegetable juice + pickled vegetables + jam + bags | 341 |
3: semi-finished bread + newspapers + pet care + nuts/prunes + toilet cleaner | 298 |
6: dishes + napkins + grapes + zwieback + decalcifier | 293 |
1: coffee + condensed milk + sparkling wine + fish + kitchen towels | 287 |
18: sugar + frozen vegetables + salt + skin care + liver loaf + frozen chicken | 273 |
14: frozen dessert + ice cream + frozen meals | 262 |
9: margarine + artif. sweetener + specialty fat + candles + organic products | 207 |
17: brown bread + sauces | 128 |
19: photo/film | 79 |
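The cluster summary can also be written without growing a data frame inside a loop. The sketch below is one possible alternative, shown on hypothetical communities shaped like the output of communities(); `summarise_cluster()` and the toy data are assumptions made for illustration, and it uses base R only.

```r
# Hypothetical communities: character vectors mixing shopper ids and item names
toy_communities <- list(
  c("1", "2", "beer", "chips"),
  c("3", "4", "5", "milk", "bread")
)
toy_items <- c("beer", "chips", "milk", "bread")

# Hypothetical helper: one summary row for community i
summarise_cluster <- function(i, comms, items) {
  is_item <- comms[[i]] %in% items
  data.frame(
    name    = paste0(i, ": ", paste(comms[[i]][is_item], collapse = " + ")),
    members = sum(!is_item)  # every vertex that is not an item is a shopper
  )
}

# Build all rows first, then bind them once
toy_df <- do.call(
  rbind,
  lapply(seq_along(toy_communities), summarise_cluster,
         comms = toy_communities, items = toy_items)
)
toy_df
```

Using paste() with collapse removes the need to trim a trailing separator, and binding the rows once avoids repeated copying as the number of clusters grows.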