Data 624 HW 10
library(knitr)
library(rmdformats)
## Global options
options(max.print = 85)
opts_chunk$set(cache=TRUE,
prompt=FALSE,
tidy=TRUE,
comment=NA,
message=FALSE,
warning=FALSE)
opts_knit$set(width=35)
library(arules)
library(igraph)
library(kableExtra)
library(dplyr)
library(tidyr)
Description of problem
The assignment is to use R to mine the grocery transaction data for association rules, report the support, confidence, and lift of the rules, and list the top 10 rules by lift.
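For reference, for a rule X => Y, support is the share of transactions containing both X and Y, confidence is support(X and Y) / support(X), and lift is confidence / support(Y), so a lift above 1 means Y is bought more often together with X than overall. A minimal base-R sketch of these formulas follows; the five baskets and the helper `supp()` are hypothetical, made up for illustration rather than taken from GroceryDataSet.csv.

```r
# Five toy baskets (hypothetical data, for illustration only)
baskets <- list(
  c("whole milk", "bread"),
  c("whole milk", "yogurt"),
  c("bread", "butter"),
  c("whole milk", "bread", "butter"),
  c("yogurt")
)

# Share of baskets that contain every item in `items` (hypothetical helper)
supp <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "bread"       # X
rhs <- "whole milk"  # Y
support    <- supp(c(lhs, rhs))        # P(X and Y)
coverage   <- supp(lhs)                # P(X), the "coverage" column in arules
confidence <- support / coverage       # P(Y | X)
lift       <- confidence / supp(rhs)   # P(Y | X) / P(Y)
print(c(support = support, coverage = coverage, confidence = confidence, lift = lift))
```

Here the lift is about 1.11: in these toy baskets, bread buyers pick up whole milk slightly more often than shoppers overall.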
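Loading the data
transactions <- read.transactions("GroceryDataSet.csv", sep = ",")
itemFrequencyPlot(transactions, topN = 10, type = "absolute", main = "Top 10 Items sold")
Whole milk is by far the most frequent item, appearing in more transactions than any other item. The transaction data record whether an item was bought, not how many units, so the plot counts transactions rather than units sold.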
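The same per-item counting, plus the pruning idea behind the minimum-support threshold used below, can be sketched in base R. Apriori relies on the fact that an itemset can only be frequent if all of its subsets are, so candidate pairs are built only from items that already meet the threshold. The baskets and `min_count` below are hypothetical, and the sketch stops at pairs rather than implementing full Apriori.

```r
# Toy baskets (hypothetical data); "candles" appears only once
baskets <- list(
  c("whole milk", "bread"),
  c("whole milk", "yogurt"),
  c("bread", "butter"),
  c("whole milk", "bread", "butter"),
  c("yogurt", "candles")
)
min_count <- 2  # hypothetical absolute support threshold

# Pass 1: number of baskets containing each item (presence, not quantity)
item_counts <- table(unlist(lapply(baskets, unique)))
frequent_items <- names(item_counts)[item_counts >= min_count]

# Pass 2: candidate pairs are built only from frequent items (Apriori pruning)
candidates <- combn(sort(frequent_items), 2, simplify = FALSE)
pair_counts <- vapply(candidates, function(p) {
  sum(vapply(baskets, function(b) all(p %in% b), logical(1)))
}, integer(1))
names(pair_counts) <- vapply(candidates, paste, character(1), collapse = " + ")
frequent_pairs <- pair_counts[pair_counts >= min_count]
print(frequent_pairs)
```

"candles" never enters a candidate pair, which is why raising the support threshold shrinks the search space (and can leave no rules at all).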
Market Basket Analysis
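# A minimum support of 0.01 was too high to yield useful rules at confidence 0.95
# (the apriori() default is 0.1), so lower it to 0.001.
# top_n() ranks by the last column (count) when wt is not given, and keeps ties;
# use head(10) after arrange(desc(lift)) to keep the 10 highest-lift rules instead.
apriori(transactions, parameter = list(support = 0.001, conf = 0.95), control = list(verbose = FALSE)) %>%
DATAFRAME() %>% arrange(desc(lift)) %>% top_n(10, wt = count) %>% kable() %>% kable_styling()
| LHS | RHS | support | confidence | coverage | lift | count |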
|---|---|---|---|---|---|---|
| {brown bread,pip fruit,whipped/sour cream} | {other vegetables} | 0.0011185 | 1 | 0.0011185 | 5.168156 | 11 |
| {ham,pip fruit,tropical fruit,whole milk} | {other vegetables} | 0.0011185 | 1 | 0.0011185 | 5.168156 | 11 |
| {citrus fruit,root vegetables,tropical fruit,whipped/sour cream} | {other vegetables} | 0.0012201 | 1 | 0.0012201 | 5.168156 | 12 |
| {rice,sugar} | {whole milk} | 0.0012201 | 1 | 0.0012201 | 3.913649 | 12 |
| {canned fish,hygiene articles} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |
| {flour,root vegetables,whipped/sour cream} | {whole milk} | 0.0017285 | 1 | 0.0017285 | 3.913649 | 17 |
| {cream cheese,domestic eggs,sugar} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |
| {cream cheese,domestic eggs,napkins} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |
| {oil,root vegetables,tropical fruit,yogurt} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |
| {oil,other vegetables,root vegetables,yogurt} | {whole milk} | 0.0014235 | 1 | 0.0014235 | 3.913649 | 14 |
| {butter,domestic eggs,other vegetables,whipped/sour cream} | {whole milk} | 0.0012201 | 1 | 0.0012201 | 3.913649 | 12 |
| {bottled water,other vegetables,pip fruit,root vegetables} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |
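The table above lists the mined rules with their support, confidence, coverage, lift, and count, sorted by lift. Because `top_n(10)` ranks by the last column (`count`) unless `wt` is given, the table holds the rules with the 10 highest counts (ties included, hence 12 rows) rather than strictly the 10 highest lifts. Every rule shown has confidence 1: the three rules predicting {other vegetables} have the highest lift (5.17), while those predicting {whole milk} have a lift of 3.91, because whole milk is the more common item.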
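Since every rule in the table has confidence 1, its lift reduces to 1 / support(RHS), so the lift values reveal how common each consequent is. The base-R sketch below copies four rows from the table to show this and to contrast ordering by `count` with ordering by lift; `order()` plus `head()` stands in for dplyr's `arrange()` and `slice_max()`.

```r
# Four rules copied from the table above
rules <- data.frame(
  lhs = c("{brown bread,pip fruit,whipped/sour cream}", "{rice,sugar}",
          "{flour,root vegetables,whipped/sour cream}",
          "{oil,other vegetables,root vegetables,yogurt}"),
  rhs = c("other vegetables", "whole milk", "whole milk", "whole milk"),
  confidence = c(1, 1, 1, 1),
  lift = c(5.168156, 3.913649, 3.913649, 3.913649),
  count = c(11, 12, 17, 14),
  stringsAsFactors = FALSE
)

# lift = confidence / support(RHS), so support(RHS) = confidence / lift
rules$rhs_support <- rules$confidence / rules$lift

# Top 2 by lift versus top 2 by count
by_lift  <- head(rules[order(-rules$lift), ], 2)
by_count <- head(rules[order(-rules$count), ], 2)
print(by_lift[, c("rhs", "lift", "count")])
print(by_count[, c("rhs", "lift", "count")])
```

The implied support is about 0.256 for whole milk and 0.193 for other vegetables, and ordering by count drops the highest-lift rule entirely.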
Cluster Analysis
# Reshape the wide basket file into long format: one row per shopper-item pair
grocery_data <- read.csv("GroceryDataSet.csv", header = FALSE) %>% mutate(shopper_id = row_number()) %>%
pivot_longer(-shopper_id) %>% filter(value != "") %>% select(-name)
head(grocery_data)
# A tibble: 6 x 2
shopper_id value
<int> <fct>
1 1 citrus fruit
2 1 semi-finished bread
3 1 margarine
4 1 ready soups
5 2 tropical fruit
6 2 yogurt
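The graph-building code is not shown. The later output, where communities contain both shopper ids and item names, suggests a bipartite shopper-item graph, so one plausible construction is an undirected edge list with one edge per distinct shopper-item pair. The base-R sketch below assumes that design and uses a small hypothetical table shaped like `grocery_data`; the igraph step is only noted in a comment so the sketch runs without igraph.

```r
# Hypothetical long-format data shaped like grocery_data (one row per shopper-item)
long <- data.frame(
  shopper_id = c(1, 1, 1, 2, 2, 3),
  value = c("citrus fruit", "margarine", "ready soups",
            "tropical fruit", "yogurt", "yogurt"),
  stringsAsFactors = FALSE
)

# One undirected edge per distinct shopper-item pair
edges <- unique(data.frame(from = as.character(long$shopper_id), to = long$value,
                           stringsAsFactors = FALSE))

# Vertex set: every shopper plus every item (item names must not clash with ids)
shoppers <- unique(edges$from)
items <- unique(edges$to)
vertices <- data.frame(name = c(shoppers, items),
                       type = rep(c("shopper", "item"), c(length(shoppers), length(items))),
                       stringsAsFactors = FALSE)

# Assumed next step with igraph installed (not run here):
# g <- igraph::graph_from_data_frame(edges, directed = FALSE, vertices = vertices)
print(vertices)
```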
Louvain community detection in R is used to group shoppers and the items they buy into clusters of items that tend to be purchased together.
Printing the sizes of the Louvain clusters:
Community sizes
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
292 347 303 637 738 298 687 1312 212 1108 353 1070 874 265 582 437
17 18 19
130 279 80
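Louvain builds these communities by greedily moving vertices between communities, then merging communities, to increase the modularity Q = (1/2m) * sum over i, j of [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j), where A is the adjacency matrix, k_i the degree of vertex i, m the number of edges, and delta is 1 when i and j share a community. The base-R sketch below only computes Q for a given partition of a hypothetical graph (two triangles joined by one edge); it shows the objective, not the Louvain search itself. For an unweighted graph, igraph's `modularity(graph, membership)` should give the same value.

```r
# Hypothetical undirected graph: triangle {1,2,3}, triangle {4,5,6}, bridge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, c(2, 1)]] <- 1   # make the adjacency matrix symmetric

modularity_q <- function(A, membership) {
  k <- rowSums(A)                                # vertex degrees
  two_m <- sum(A)                                # 2m for an undirected graph
  same <- outer(membership, membership, "==")    # delta(c_i, c_j)
  sum((A - outer(k, k) / two_m) * same) / two_m
}

q_split <- modularity_q(A, c(1, 1, 1, 2, 2, 2))  # the two triangles
q_one   <- modularity_q(A, rep(1, n))            # everything in one community
print(c(split = q_split, one = q_one))
```

Splitting the graph along the bridge gives Q = 5/14 (about 0.357), while a single community always gives 0, which is why Louvain prefers the split.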
Community detection with the igraph functions cluster_louvain() and cluster_infomap()
Get a list of all communities and their members using communities().
Check which community each vertex (shopper id or item) is assigned to using membership():
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
12 1 13 7 1 10 12 5 2 13 4 10 10 12 10 18 2 7 8 8 10 7 13 13 10 5
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
7 8 10 17 4 10 4 9 13 8 2 5 8 6 8 13 7 5 12 1 8 12 5 12 15 5
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
6 10 2 13 7 8 15 18 12 8 10 18 8 13 10 16 13 12 16 9 1 7 12 18 7 18
79 80 81 82 83 84 85
18 14 8 10 12 9 11
[ reached getOption("max.print") -- omitted 9919 entries ]
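In the output above, each name is a vertex and each value is its community id. The relationship between `membership()`, `communities()`, and `sizes()` can be sketched in base R: `split()` inverts a named membership vector into a list of members per community, and `table()` counts them. The membership vector below is hypothetical.

```r
# Hypothetical membership vector: vertex name -> community id
memb <- c("1" = 2, "2" = 1, "3" = 2, "whole milk" = 2, "yogurt" = 1, "soda" = 3)

# Inverse mapping, community id -> member names (analogous to communities())
comms <- split(names(memb), memb)

# Number of members per community (analogous to sizes())
comm_sizes <- table(memb)
print(comms)
print(comm_sizes)
```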
Results of the Louvain Community Detection Method: Clusters of items purchased
# Vertex names that are items; all other community members are shopper ids
items <- as.character(unique(grocery_data$value))
cluster_df <- data.frame(Name = character(0), Transactions = integer(0), stringsAsFactors = FALSE)
for (i in seq_len(length(louvain_communities))) {
  members <- louvain_communities[[i]]
  is_item <- members %in% items
  # Label each community by its items and count its shoppers (transactions)
  cluster_name <- paste0("Community id: ", i, " -- ", paste(members[is_item], collapse = " + "))
  cluster_df <- rbind(cluster_df, data.frame(Name = cluster_name, Transactions = sum(!is_item),
                                             stringsAsFactors = FALSE))
}
cluster_df %>% arrange(desc(Transactions)) %>% kable(format = "html", caption = "<B>Clusters</B>") %>%
kable_styling()
| Name | Transactions |
|---|---|
| Community id: 8 – chocolate + soda + specialty bar + pastry + salty snack + waffles + candy + dessert + chocolate marshmallow + specialty chocolate + popcorn + cake bar + snack products + finished products + make up remover + potato products + hair spray + light bulbs + baby food + tidbits | 1292 |
| Community id: 10 – other vegetables + rice + abrasive cleaner + flour + beef + chicken + root vegetables + bathroom cleaner + spices + pork + turkey + oil + curd cheese + onions + herbs + dog food + frozen fish + salad dressing + vinegar + roll products + frozen fruits | 1087 |
| Community id: 12 – ready soups + rolls/buns + frankfurter + sausage + spread cheese + hard cheese + canned fish + seasonal products + frozen potato products + sliced cheese + soft cheese + meat + mustard + mayonnaise + nut snack + ketchup + cream | 1053 |
| Community id: 13 – whole milk + butter + cereals + curd + detergent + hamburger meat + flower (seeds) + canned vegetables + pasta + softener + Instant food products + honey + cocoa drinks + cleaner + soups + soap + pudding powder | 857 |
| Community id: 5 – liquor (appetizer) + canned beer + shopping bags + misc. beverages + chewing gum + brandy + liqueur + whisky | 730 |
| Community id: 7 – yogurt + cream cheese + meat spreads + packaged fruit/vegetables + butter milk + berries + whipped/sour cream + baking powder + specialty cheese + instant coffee + organic sausage + cooking chocolate + kitchen utensil | 674 |
| Community id: 4 – tropical fruit + pip fruit + white bread + processed cheese + sweet spreads + beverages + ham + cookware + tea + syrup + baby cosmetics + specialty vegetables + sound storage medium | 624 |
| Community id: 15 – citrus fruit + hygiene articles + domestic eggs + cat food + cling film/bags + canned fruit + dental care + flower soil/fertilizer + female sanitary products + dish cleaner + house keeping products + rubbing alcohol + preservation products | 569 |
| Community id: 16 – bottled beer + red/blush wine + prosecco + liquor + rum | 432 |
| Community id: 11 – UHT-milk + bottled water + white wine + male cosmetics | 349 |
| Community id: 2 – long life bakery product + pot plants + fruit/vegetable juice + pickled vegetables + jam + bags | 341 |
| Community id: 3 – semi-finished bread + newspapers + pet care + nuts/prunes + toilet cleaner | 298 |
| Community id: 6 – dishes + napkins + grapes + zwieback + decalcifier | 293 |
| Community id: 1 – coffee + condensed milk + sparkling wine + fish + kitchen towels | 287 |
| Community id: 18 – sugar + frozen vegetables + salt + skin care + liver loaf + frozen chicken | 273 |
| Community id: 14 – frozen dessert + ice cream + frozen meals | 262 |
| Community id: 9 – margarine + artif. sweetener + specialty fat + candles + organic products | 207 |
| Community id: 17 – brown bread + sauces | 128 |
| Community id: 19 – photo/film | 79 |