Data 624 HW 10

library(knitr)
library(rmdformats)

## Global options
options(max.print="85")
opts_chunk$set(cache=TRUE,
               prompt=FALSE,
               tidy=TRUE,
               comment=NA,
               message=FALSE,
               warning=FALSE)
opts_knit$set(width=35)

library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(kableExtra)
library(igraph)
## 
## Attaching package: 'igraph'
## The following object is masked from 'package:arules':
## 
##     union
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(mlbench)    # for Ionosphere data
library(psych)      # for cor2dist
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:igraph':
## 
##     as_data_frame, groups, union
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following objects are masked from 'package:arules':
## 
##     intersect, recode, setdiff, setequal, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:igraph':
## 
##     crossing
## The following objects are masked from 'package:Matrix':
## 
##     expand, pack, unpack
library(geomnet)
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(visNetwork)
library(stringr)

Description of problem

The assignment is to use R to mine the grocery transaction data for association rules, and to report the support, confidence, and lift of the top 10 rules ranked by lift.
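To make the three measures concrete, here is a minimal base R sketch that computes them by hand for one rule. The five baskets and the rule {bread} -> {milk} are made up for illustration and are not taken from the grocery data.

```r
# Hypothetical toy baskets (not from GroceryDataSet.csv)
baskets <- list(
    c("milk", "bread"),
    c("milk", "butter"),
    c("bread", "butter"),
    c("milk", "bread", "butter"),
    c("milk")
)

# Support of an itemset: share of baskets that contain every item in it
support_of <- function(items) {
    mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "bread"
rhs <- "milk"

rule_support <- support_of(c(lhs, rhs))             # P(lhs and rhs)
rule_confidence <- rule_support / support_of(lhs)   # P(rhs | lhs)
rule_lift <- rule_confidence / support_of(rhs)      # confidence relative to P(rhs)

print(c(support = rule_support, confidence = rule_confidence, lift = rule_lift))
```

A lift above 1 means the left-hand side makes the right-hand side more likely than its baseline frequency; a lift below 1 means less likely.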

Loading of data

transactions <- read.transactions("GroceryDataSet.csv", sep = ",")
itemFrequencyPlot(transactions, topN = 10, type = "absolute", main = "Top 10 Items sold")

Whole milk is by far the best seller in terms of absolute units sold.
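The counts behind the plot can also be inspected as numbers. This is a minimal sketch that assumes the `Groceries` dataset bundled with arules as a stand-in for GroceryDataSet.csv; that substitution is an assumption, and the `transactions` object read above can be used in its place.

```r
library(arules)

# Assumption: the bundled Groceries data stands in for GroceryDataSet.csv
data("Groceries")

# Absolute purchase counts per item, largest first
item_counts <- sort(itemFrequency(Groceries, type = "absolute"), decreasing = TRUE)
top_items <- head(item_counts, 10)
print(top_items)
```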

Market Basket Analysis

# A minimum support of 0.01 did not work here, so the support threshold is
# lowered to 0.001
apriori(transactions, parameter = list(support = 0.001, conf = 0.95), control = list(verbose = FALSE)) %>% 
    DATAFRAME() %>% arrange(desc(lift)) %>% top_n(10) %>% kable() %>% kable_styling()
| LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|
| {brown bread,pip fruit,whipped/sour cream} | {other vegetables} | 0.0011185 | 1 | 0.0011185 | 5.168156 | 11 |
| {ham,pip fruit,tropical fruit,whole milk} | {other vegetables} | 0.0011185 | 1 | 0.0011185 | 5.168156 | 11 |
| {citrus fruit,root vegetables,tropical fruit,whipped/sour cream} | {other vegetables} | 0.0012201 | 1 | 0.0012201 | 5.168156 | 12 |
| {rice,sugar} | {whole milk} | 0.0012201 | 1 | 0.0012201 | 3.913649 | 12 |
| {canned fish,hygiene articles} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |
| {flour,root vegetables,whipped/sour cream} | {whole milk} | 0.0017285 | 1 | 0.0017285 | 3.913649 | 17 |
| {cream cheese,domestic eggs,sugar} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |
| {cream cheese,domestic eggs,napkins} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |
| {oil,root vegetables,tropical fruit,yogurt} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |
| {oil,other vegetables,root vegetables,yogurt} | {whole milk} | 0.0014235 | 1 | 0.0014235 | 3.913649 | 14 |
| {butter,domestic eggs,other vegetables,whipped/sour cream} | {whole milk} | 0.0012201 | 1 | 0.0012201 | 3.913649 | 12 |
| {bottled water,other vegetables,pip fruit,root vegetables} | {whole milk} | 0.0011185 | 1 | 0.0011185 | 3.913649 | 11 |

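The table above lists the selected rules with their support, confidence, and lift, sorted by lift in descending order. It contains 12 rows rather than 10 because top_n(10), called without a wt argument, ranks by the last column (count) and keeps ties, so the rows are the most frequent rules that meet the thresholds rather than strictly the 10 with the highest lift.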
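If the goal is strictly the ten rules with the highest lift, one option is to rank on lift explicitly before truncating. The sketch below uses the `by` argument of arules' `head()` method for rules. It assumes the bundled `Groceries` dataset as a stand-in for GroceryDataSet.csv, so its output may not match the table above exactly.

```r
library(arules)

# Assumption: the bundled Groceries data stands in for GroceryDataSet.csv
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.95),
                 control = list(verbose = FALSE))

# head() on a rules object can sort by a quality measure before truncating,
# so this keeps at most the 10 rules with the highest lift
top_rules <- head(rules, n = 10, by = "lift")
inspect(top_rules)
```

Rules with identical lift are cut off arbitrarily at the tenth row, which is worth keeping in mind when many rules tie, as they do in the table above.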

Cluster Analysis

grocery_data <- read.csv("GroceryDataSet.csv", header = FALSE) %>% mutate(shopper_id = row_number()) %>% 
    pivot_longer(-shopper_id) %>% filter(value != "") %>% select(-name)
head(grocery_data)
# A tibble: 6 x 2
  shopper_id value              
       <int> <fct>              
1          1 citrus fruit       
2          1 semi-finished bread
3          1 margarine          
4          1 ready soups        
5          2 tropical fruit     
6          2 yogurt             

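Use Louvain community detection to cluster shoppers together with the items they buy. Each shopper and each item becomes a node in an undirected graph, and an edge links a shopper to every item in that shopper's basket.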

clusterlouvain <- grocery_data %>% rename(to = value, from = shopper_id) %>% graph_from_data_frame(directed = FALSE) %>% 
    cluster_louvain()
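The same construction can be tried on a tiny example to see what the algorithm returns. In this sketch the six shoppers and four items are hypothetical: shoppers 1 to 3 only buy snacks and shoppers 4 to 6 only buy staples, so the graph has two disconnected groups.

```r
library(igraph)

# Hypothetical toy baskets: one row per (shopper, item) purchase
toy_edges <- data.frame(
    from = as.character(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)),
    to = c("beer", "chips", "beer", "chips", "beer", "chips",
           "milk", "bread", "milk", "bread", "milk", "bread")
)

# Shoppers and items both become nodes; each purchase becomes an undirected edge
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)
toy_louvain <- cluster_louvain(toy_graph)

# Community id per node, and number of nodes per community
toy_membership <- membership(toy_louvain)
print(toy_membership)
print(sizes(toy_louvain))
```

Because Louvain only merges nodes that share edges, the snack buyers and the staple buyers can never end up in the same community here.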

Print the size of each Louvain community (each count includes both shopper nodes and item nodes):

clusterlouvain %>% sizes()
Community sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
 292  347  303  637  738  298  687 1312  212 1108  353 1070  874  265  582  437 
  17   18   19 
 130  279   80 

Community detection in igraph is available through functions such as cluster_louvain() and cluster_infomap(); only cluster_louvain() is used here.

communities() returns a list of all communities and their members.

membership() shows which community each node (shopper or item) belongs to.

membership(clusterlouvain)
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
12  1 13  7  1 10 12  5  2 13  4 10 10 12 10 18  2  7  8  8 10  7 13 13 10  5 
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 
 7  8 10 17  4 10  4  9 13  8  2  5  8  6  8 13  7  5 12  1  8 12  5 12 15  5 
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 
 6 10  2 13  7  8 15 18 12  8 10 18  8 13 10 16 13 12 16  9  1  7 12 18  7 18 
79 80 81 82 83 84 85 
18 14  8 10 12  9 11 
 [ reached getOption("max.print") -- omitted 9919 entries ]

Results of the Louvain Community Detection Method: Clusters of items purchased

items <- as.character(unique(grocery_data$value))

# List of communities and their members (shopper ids and item names)
louvain_communities <- communities(clusterlouvain)

# Empty data frame that collects one row per community
cluster_df <- data.frame(Name = character(), Transactions = numeric())

for (i in seq_along(louvain_communities)) {
    cluster_name <- paste0("Community id: ", i, " -- ")
    cluster_members <- 0
    for (member in louvain_communities[[i]]) {
        if (member %in% items) {
            # Item node: append the item to the community label
            cluster_name <- paste0(cluster_name, member, " + ")
        } else {
            # Shopper node: count it as one transaction
            cluster_members <- cluster_members + 1
        }
    }
    # Drop the trailing " + " separator
    cluster_name <- substr(cluster_name, 1, nchar(cluster_name) - 3)
    cluster_df <- rbind(cluster_df, data.frame(Name = cluster_name, Transactions = cluster_members))
}

cluster_df %>% arrange(desc(Transactions)) %>% kable(format = "html", caption = "<B>Clusters</B>") %>% 
    kable_styling()
Clusters

| Name | Transactions |
|---|---|
| Community id: 8 -- chocolate + soda + specialty bar + pastry + salty snack + waffles + candy + dessert + chocolate marshmallow + specialty chocolate + popcorn + cake bar + snack products + finished products + make up remover + potato products + hair spray + light bulbs + baby food + tidbits | 1292 |
| Community id: 10 -- other vegetables + rice + abrasive cleaner + flour + beef + chicken + root vegetables + bathroom cleaner + spices + pork + turkey + oil + curd cheese + onions + herbs + dog food + frozen fish + salad dressing + vinegar + roll products + frozen fruits | 1087 |
| Community id: 12 -- ready soups + rolls/buns + frankfurter + sausage + spread cheese + hard cheese + canned fish + seasonal products + frozen potato products + sliced cheese + soft cheese + meat + mustard + mayonnaise + nut snack + ketchup + cream | 1053 |
| Community id: 13 -- whole milk + butter + cereals + curd + detergent + hamburger meat + flower (seeds) + canned vegetables + pasta + softener + Instant food products + honey + cocoa drinks + cleaner + soups + soap + pudding powder | 857 |
| Community id: 5 -- liquor (appetizer) + canned beer + shopping bags + misc. beverages + chewing gum + brandy + liqueur + whisky | 730 |
| Community id: 7 -- yogurt + cream cheese + meat spreads + packaged fruit/vegetables + butter milk + berries + whipped/sour cream + baking powder + specialty cheese + instant coffee + organic sausage + cooking chocolate + kitchen utensil | 674 |
| Community id: 4 -- tropical fruit + pip fruit + white bread + processed cheese + sweet spreads + beverages + ham + cookware + tea + syrup + baby cosmetics + specialty vegetables + sound storage medium | 624 |
| Community id: 15 -- citrus fruit + hygiene articles + domestic eggs + cat food + cling film/bags + canned fruit + dental care + flower soil/fertilizer + female sanitary products + dish cleaner + house keeping products + rubbing alcohol + preservation products | 569 |
| Community id: 16 -- bottled beer + red/blush wine + prosecco + liquor + rum | 432 |
| Community id: 11 -- UHT-milk + bottled water + white wine + male cosmetics | 349 |
| Community id: 2 -- long life bakery product + pot plants + fruit/vegetable juice + pickled vegetables + jam + bags | 341 |
| Community id: 3 -- semi-finished bread + newspapers + pet care + nuts/prunes + toilet cleaner | 298 |
| Community id: 6 -- dishes + napkins + grapes + zwieback + decalcifier | 293 |
| Community id: 1 -- coffee + condensed milk + sparkling wine + fish + kitchen towels | 287 |
| Community id: 18 -- sugar + frozen vegetables + salt + skin care + liver loaf + frozen chicken | 273 |
| Community id: 14 -- frozen dessert + ice cream + frozen meals | 262 |
| Community id: 9 -- margarine + artif. sweetener + specialty fat + candles + organic products | 207 |
| Community id: 17 -- brown bread + sauces | 128 |
| Community id: 19 -- photo/film | 79 |
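A summary of this shape can also be built without an explicit loop. The base R sketch below is one possible alternative, not the method used above. The named vector is a hypothetical stand-in for the output of membership(), with node labels as names and community ids as values.

```r
# Hypothetical membership vector: names are node labels, values are community ids
toy_membership <- c("1" = 1, "2" = 1, "3" = 2, "beer" = 1, "chips" = 1, "milk" = 2)
toy_items <- c("beer", "chips", "milk")

# Separate item nodes from shopper nodes
is_item <- names(toy_membership) %in% toy_items

# Item labels per community, joined with " + "
items_by_comm <- tapply(names(toy_membership)[is_item], toy_membership[is_item],
                        paste, collapse = " + ")

# Number of shopper nodes (transactions) per community
shoppers_by_comm <- table(toy_membership[!is_item])

# One row per community; assumes every community contains at least one shopper
toy_summary <- data.frame(
    Community = names(items_by_comm),
    Items = as.vector(items_by_comm),
    Transactions = as.vector(shoppers_by_comm[names(items_by_comm)])
)
print(toy_summary)
```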

Plotting the undirected graph (the code and output are intentionally hidden because the plot takes a long time to generate and is not very informative)

Edges are unfortunately not given (skipped)
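One way to keep a plot of such a graph tractable is to draw only a random sample of shoppers. The sketch below is not the hidden code from this report. The toy edge list, the seed, and the sample size are all assumptions chosen for illustration.

```r
library(igraph)

set.seed(624)  # arbitrary seed so the sample is reproducible

# Hypothetical toy edge list standing in for grocery_data
toy_edges <- data.frame(
    from = as.character(c(1, 1, 2, 2, 3, 4, 4, 5, 6, 6)),
    to = c("beer", "chips", "beer", "soda", "chips",
           "milk", "bread", "milk", "bread", "butter")
)

# Keep only the purchases of a random subset of shoppers
sampled_shoppers <- sample(unique(toy_edges$from), size = 4)
sample_edges <- toy_edges[toy_edges$from %in% sampled_shoppers, ]

sample_graph <- graph_from_data_frame(sample_edges, directed = FALSE)
sample_louvain <- cluster_louvain(sample_graph)

# Colour each node by its Louvain community
plot(sample_graph,
     vertex.color = membership(sample_louvain),
     vertex.size = 12,
     vertex.label.cex = 0.8)
```

On the full data the same idea would sample from `grocery_data$shopper_id`. The communities found on a sample will generally differ from those found on the whole graph.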