The aim of this assignment is to use R to mine the data for association rules. You should report support, confidence and lift, and your top 10 rules by lift.
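The three measures can be illustrated on a handful of baskets. Below is a minimal base-R sketch. The five baskets and the rule {yogurt} => {milk} are hypothetical toy data, not the grocery file, and the helper `supp()` is made up for this illustration.

- Support is the share of baskets that contain both sides of a rule.
- Confidence is support divided by the support of the left-hand side (LHS).
- Lift is confidence divided by the support of the right-hand side (RHS), so a lift above 1 means the items occur together more often than they would if they were independent.

```r
# Hypothetical toy baskets (not the grocery data)
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("milk", "bread", "yogurt"),
  c("bread"),
  c("milk", "yogurt")
)

# Hypothetical helper: fraction of baskets that contain every item in `itemset`
supp <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

lhs <- "yogurt"
rhs <- "milk"
support    <- supp(c(lhs, rhs), baskets)       # P(lhs and rhs)
confidence <- support / supp(lhs, baskets)      # P(rhs | lhs)
lift       <- confidence / supp(rhs, baskets)   # P(rhs | lhs) / P(rhs)

c(support = support, confidence = confidence, lift = lift)
# expected: support = 0.6, confidence = 1, lift = 1.25
```

Every yogurt basket here also contains milk, so the confidence is 1. Milk is already in 80% of baskets, so the lift is only 1 / 0.8 = 1.25.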
The first rows of the raw file are shown below, read as a plain CSV with a header row (as `read.csv()` does by default). Each line is one basket. Because of the header row, the first basket (citrus fruit, semi-finished bread, margarine, ready soups) becomes the column names, and the remaining fields get the placeholder names X and X.1 to X.27. Columns X.1 to X.27 are empty in these rows and are omitted here:
```
##   citrus.fruit     semi.finished.bread margarine      ready.soups              X
## 1 tropical fruit   yogurt              coffee
## 2 whole milk
## 3 pip fruit        yogurt              cream cheese   meat spreads
## 4 other vegetables whole milk          condensed milk long life bakery product
## 5 whole milk       butter              yogurt         rice                     abrasive cleaner
## 6 rolls/buns
```
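The header behaviour can be seen in isolation with a minimal base-R sketch. It feeds the first three basket lines to `read.csv()` through its `text =` argument. The lines are inlined for illustration and are not read from the real file. The sketch reads them once with the default header and once with `header = FALSE`, which is what the clustering code later in this document uses.

```r
# First three basket lines of the file, inlined for illustration
csv_lines <- c(
  "citrus fruit,semi-finished bread,margarine,ready soups",
  "tropical fruit,yogurt,coffee",
  "whole milk"
)

# Default header = TRUE: the first basket becomes (make.names-mangled) column names
with_header <- read.csv(text = csv_lines)
names(with_header)
nrow(with_header)  # only 2 baskets remain as data

# header = FALSE keeps every line as a basket, with default names V1, V2, ...
no_header <- read.csv(text = csv_lines, header = FALSE)
nrow(no_header)    # all 3 baskets
```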
We first read the data in as transactions and explore it by plotting the 15 most frequently purchased items.
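```r
library(arules)  # read.transactions(), itemFrequencyPlot(), apriori()

# Each line of the CSV is one basket; items are separated by commas
data.df <- read.transactions("GroceryDataSet.csv", format = "basket", sep = ",")
itemFrequencyPlot(data.df, topN = 15, type = "absolute", main = "Top 15 Items", col = rainbow(15))
```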
Whole milk is the most frequently purchased item.
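With `type = "absolute"`, `itemFrequencyPlot()` plots, for each item, the number of baskets that contain it. The same tally can be sketched in base R. This sketch assumes that the raw lines are comma-separated baskets, possibly with blank padding fields. It uses only the first seven baskets of the file, and the trailing commas on the last line are a hypothetical stand-in for padding. The counts are therefore illustrative, not the full-data frequencies behind the plot.

```r
# First seven baskets of GroceryDataSet.csv as raw lines; the trailing commas
# on the last line are a hypothetical stand-in for blank padding fields
lines <- c(
  "citrus fruit,semi-finished bread,margarine,ready soups",
  "tropical fruit,yogurt,coffee",
  "whole milk",
  "pip fruit,yogurt,cream cheese,meat spreads",
  "other vegetables,whole milk,condensed milk,long life bakery product",
  "whole milk,butter,yogurt,rice,abrasive cleaner",
  "rolls/buns,,,"
)

# Split each line into items, trim whitespace, drop blanks and duplicates
baskets <- lapply(strsplit(lines, ",", fixed = TRUE), function(b) {
  b <- trimws(b)
  unique(b[b != ""])
})

# Absolute frequency: number of baskets containing each item, largest first
item_counts <- sort(table(unlist(baskets)), decreasing = TRUE)
head(item_counts, 15)
```

Even in this small sample, whole milk and yogurt each appear in three of the seven baskets.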
To find the association rules for this market basket analysis, we use the `apriori()` function from arules. It keeps only rules with a support of at least 0.001 and a confidence of at least 0.5. We then list the top 10 rules with their support, confidence and lift.
```r
rules <- apriori(data.df, parameter = list(supp = 0.001, conf = 0.5), control = list(verbose = FALSE))
summary(rules)
```
```
## set of 5668 rules
##
## rule length distribution (lhs + rhs):sizes
##    2    3    4    5    6
##   11 1461 3211  939   46
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    2.00    3.00    4.00    3.92    4.00    6.00
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001017 Min. :0.5000 Min. :0.001017 Min. : 1.957
## 1st Qu.:0.001118 1st Qu.:0.5455 1st Qu.:0.001729 1st Qu.: 2.464
## Median :0.001322 Median :0.6000 Median :0.002135 Median : 2.899
## Mean :0.001668 Mean :0.6250 Mean :0.002788 Mean : 3.262
## 3rd Qu.:0.001729 3rd Qu.:0.6842 3rd Qu.:0.002949 3rd Qu.: 3.691
## Max. :0.022267 Max. :1.0000 Max. :0.043416 Max. :18.996
## count
## Min. : 10.0
## 1st Qu.: 11.0
## Median : 13.0
## Mean : 16.4
## 3rd Qu.: 17.0
## Max. :219.0
##
## mining info:
## data ntransactions support confidence
## data.df 9835 0.001 0.5
## call
## apriori(data = data.df, parameter = list(supp = 0.001, conf = 0.5), control = list(verbose = FALSE))
```
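The Apriori algorithm finds frequent itemsets level by level. It relies on the Apriori property that every subset of a frequent itemset must itself be frequent. A candidate of size k can therefore be discarded without counting its support if any of its (k-1)-subsets is infrequent. Below is a minimal base-R sketch of that idea on hypothetical toy baskets with a hypothetical absolute threshold of 2 baskets. It is not the arules implementation, which is far more efficient. It also stops at frequent itemsets, whereas `apriori()` goes on to split each itemset into rules and keep those whose confidence reaches `conf` (0.5 above).

```r
# Hypothetical toy baskets (not the grocery data)
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("milk", "bread", "yogurt"),
  c("bread"),
  c("milk", "yogurt")
)
min_count <- 2   # absolute support threshold: 2 of 5 baskets = 0.4

# Number of baskets containing every item of `itemset`
supp_count <- function(itemset) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}
key <- function(itemset) paste(itemset, collapse = ",")

# Level 1: frequent single items
level <- Filter(function(s) supp_count(s) >= min_count,
                as.list(sort(unique(unlist(baskets)))))
frequent <- level

# Level k: join frequent (k-1)-itemsets, prune with the Apriori property, count support
while (length(level) > 1) {
  k <- length(level[[1]]) + 1
  candidates <- list()
  for (i in seq_len(length(level) - 1)) {
    for (j in (i + 1):length(level)) {
      u <- sort(union(level[[i]], level[[j]]))
      if (length(u) == k) candidates[[length(candidates) + 1]] <- u
    }
  }
  candidates <- unique(candidates)
  # Drop any candidate with an infrequent (k-1)-subset before counting
  prev_keys <- vapply(level, key, character(1))
  candidates <- Filter(function(cand) {
    all(combn(cand, k - 1, FUN = key) %in% prev_keys)
  }, candidates)
  level <- Filter(function(s) supp_count(s) >= min_count, candidates)
  frequent <- c(frequent, level)
}

frequent_keys <- vapply(frequent, key, character(1))
frequent_keys
# expected: "bread" "milk" "yogurt" "bread,milk" "milk,yogurt"
```

The triple {bread, milk, yogurt} is pruned without a support count, because its subset {bread, yogurt} occurs in only one basket.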
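```r
library(dplyr)       # arrange(), top_n() and the %>% pipe
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

rules %>%
  DATAFRAME() %>%          # arules: one row per rule with LHS, RHS and quality measures
  arrange(desc(lift)) %>%
  top_n(10) %>%            # no `wt` given, so dplyr ranks by the last column, count
  kable() %>%
  kable_styling()
```
```
## Selecting by count
```
Because `top_n(10)` is called without a `wt` argument, dplyr ranks the rules by the last column, `count` (hence the "Selecting by count" message). The table below therefore shows the 10 rules with the highest count, sorted by lift. These are not the 10 rules with the highest lift: the summary above reports lifts up to 18.996. To select by lift instead, replace `top_n(10)` with `slice_max(lift, n = 10)`.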
| LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|
| {root vegetables,tropical fruit} | {other vegetables} | 0.0123030 | 0.5845411 | 0.0210473 | 3.020999 | 121 |
| {rolls/buns,root vegetables} | {other vegetables} | 0.0122013 | 0.5020921 | 0.0243010 | 2.594890 | 120 |
| {root vegetables,yogurt} | {other vegetables} | 0.0129131 | 0.5000000 | 0.0258261 | 2.584078 | 127 |
| {root vegetables,yogurt} | {whole milk} | 0.0145399 | 0.5629921 | 0.0258261 | 2.203354 | 143 |
| {domestic eggs,other vegetables} | {whole milk} | 0.0123030 | 0.5525114 | 0.0222674 | 2.162336 | 121 |
| {rolls/buns,root vegetables} | {whole milk} | 0.0127097 | 0.5230126 | 0.0243010 | 2.046888 | 125 |
| {other vegetables,pip fruit} | {whole milk} | 0.0135231 | 0.5175097 | 0.0261312 | 2.025351 | 133 |
| {tropical fruit,yogurt} | {whole milk} | 0.0151500 | 0.5173611 | 0.0292832 | 2.024770 | 149 |
| {other vegetables,yogurt} | {whole milk} | 0.0222674 | 0.5128806 | 0.0434164 | 2.007235 | 219 |
| {other vegetables,whipped/sour cream} | {whole milk} | 0.0146416 | 0.5070423 | 0.0288765 | 1.984385 | 144 |
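The columns of this table are linked:

- coverage is the support of the LHS, so confidence = support / coverage;
- lift = confidence / support(RHS), so the support of the RHS can be backed out from a rule;
- count / support gives the number of transactions.

As a quick arithmetic check, the base-R sketch below plugs the first row of the table into these identities. The values are copied from the table and are rounded, so the matches are approximate.

```r
# First row of the table: {root vegetables,tropical fruit} => {other vegetables}
support    <- 0.0123030
confidence <- 0.5845411
coverage   <- 0.0210473
lift       <- 3.020999
count      <- 121

# coverage is the support of the LHS, so support / coverage recovers the confidence
support / coverage

# lift = confidence / support(RHS): share of baskets containing "other vegetables"
rhs_support <- confidence / lift
rhs_support

# count / support recovers the number of transactions reported in the mining info
count / support
```

The last value comes out at about 9835, the `ntransactions` in the summary. The implied support of "other vegetables" is about 0.19, so this rule makes that item roughly three times as likely as it is in an average basket.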
Next, we look for groups of items that tend to be bought together. A network graph can be used for this cluster analysis. First, I build an undirected graph from the transaction data, with an edge between each shopper and every item in their basket. Then I detect the communities in that graph using the Louvain algorithm.
```r
library(tidyr)   # pivot_longer()
library(igraph)  # graph_from_data_frame(), cluster_louvain(), communities()

# Long format: one row per (shopper, item) pair; blank padding cells are dropped
temp <- read.csv("GroceryDataSet.csv", header = FALSE) %>%
  mutate(shopper_id = row_number()) %>%
  pivot_longer(-shopper_id) %>%
  filter(value != "") %>%
  select(-name)

# Undirected shopper-item graph; Louvain puts shoppers and items into communities
louvain_communities <- temp %>%
  rename(from = shopper_id, to = value) %>%
  graph_from_data_frame(directed = FALSE) %>%
  cluster_louvain() %>%
  communities()

items <- as.character(unique(temp$value))

# One row per community: its items joined with " + ", and how many shoppers it holds
cluster_df <- data.frame(
  name = vapply(seq_along(louvain_communities), function(i) {
    vertices <- louvain_communities[[i]]
    paste0(i, ": ", paste(vertices[vertices %in% items], collapse = " + "))
  }, character(1)),
  shoppers = vapply(seq_along(louvain_communities), function(i) {
    sum(!louvain_communities[[i]] %in% items)
  }, integer(1))
)

cluster_df %>%
  arrange(desc(shoppers)) %>%
  kable() %>%
  kable_styling()
```
| name | shoppers |
|---|---|
| 5: citrus fruit + other vegetables + rice + abrasive cleaner + beef + chicken + root vegetables + spices + pork + turkey + curd cheese + canned vegetables + onions + herbs + specialty cheese + dog food + frozen fish + salad dressing + vinegar + roll products + rubbing alcohol + jam + toilet cleaner + preservation products | 1147 |
| 9: chocolate + soda + specialty bar + pastry + waffles + candy + beverages + chocolate marshmallow + frozen potato products + cake bar + snack products + finished products + potato products + baby food + tidbits + bags + sound storage medium | 1104 |
| 3: margarine + whole milk + butter + cereals + curd + flour + sugar + detergent + whipped/sour cream + baking powder + specialty fat + flower (seeds) + salt + honey + cocoa drinks + skin care + soups + rum + soap + organic sausage + pudding powder + frozen fruits + cooking chocolate | 1067 |
| 1: ready soups + rolls/buns + frankfurter + sausage + spread cheese + hard cheese + cat food + canned fish + sliced cheese + soft cheese + meat + mustard + mayonnaise + organic products + nut snack + kitchen towels + cream | 1022 |
| 14: shopping bags + misc. beverages + chewing gum + specialty chocolate + sparkling wine + brandy + liqueur + whisky | 517 |
| 4: yogurt + cream cheese + meat spreads + packaged fruit/vegetables + butter milk + bathroom cleaner + berries + fish + instant coffee + frozen chicken + kitchen utensil | 507 |
| 6: liquor (appetizer) + canned beer + candles | 433 |
| 16: frozen dessert + ice cream + frozen vegetables + frozen meals + cleaner + liver loaf | 408 |
| 15: bottled beer + red/blush wine + prosecco + liquor | 394 |
| 18: hygiene articles + domestic eggs + oil + canned fruit + dish cleaner + house keeping products + baby cosmetics + ketchup | 382 |
| 11: UHT-milk + bottled water + artif. sweetener + white wine + male cosmetics | 377 |
| 7: long life bakery product + pot plants + fruit/vegetable juice + sweet spreads + pickled vegetables | 352 |
| 8: tropical fruit + white bread + processed cheese + ham + tea + syrup + specialty vegetables | 341 |
| 13: dishes + napkins + grapes + zwieback + light bulbs + decalcifier | 329 |
| 10: brown bread + hamburger meat + pasta + Instant food products + sauces + hair spray | 323 |
| 19: semi-finished bread + newspapers + pet care + nuts/prunes | 291 |
| 2: coffee + condensed milk + cling film/bags + female sanitary products | 269 |
| 17: pip fruit + photo/film + softener + cookware | 231 |
| 20: salty snack + dental care + popcorn + make up remover | 175 |
| 12: dessert + seasonal products + flower soil/fertilizer | 166 |
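Louvain searches for a partition that approximately maximises the modularity

$$Q = \frac{1}{2m}\sum_{i,j}\Big[A_{ij} - \frac{k_i k_j}{2m}\Big]\,\delta(c_i, c_j),$$

where $A$ is the adjacency matrix, $k_i$ is the degree of vertex $i$, $m$ is the number of edges and $c_i$ is the community of vertex $i$. Below is a minimal base-R sketch that computes $Q$ for a given partition of a hypothetical toy graph (two triangles joined by one edge). It shows what the algorithm is scoring but does not implement the Louvain moves themselves. The helper `modularity_q()` is made up for this illustration.

```r
# Hypothetical toy graph: triangle 1-2-3, triangle 4-5-6, bridge edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1          # undirected: make A symmetric

modularity_q <- function(A, membership) {
  k <- rowSums(A)                                 # vertex degrees
  two_m <- sum(A)                                 # 2m: each edge counted twice
  same <- outer(membership, membership, "==")     # delta(c_i, c_j)
  sum((A - outer(k, k) / two_m) * same) / two_m
}

q_split  <- modularity_q(A, c(1, 1, 1, 2, 2, 2))  # 5/14, about 0.357
q_single <- modularity_q(A, rep(1, 6))            # about 0: one community scores nothing
c(q_split, q_single)
```

For the real shopper-item graph, igraph's `modularity()` function reports the same score for the partition that `cluster_louvain()` returns.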