DATA 624 Homework 10

Name: Charles Ugiagbe.

Date: 12/16/23

Homework Intro

The aim of this assignment is to use R to mine the grocery data for association rules. The report should give the support, confidence and lift of the rules, along with the top 10 rules by lift.
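As a small illustration of the three measures named above, the sketch below computes them by hand in base R for one rule. The baskets, the `support()` helper and the rule {bread} => {milk} are made up for this example and are not taken from the grocery data.

```r
# Hypothetical baskets (made-up for illustration, not from GroceryDataSet.csv)
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("milk", "bread", "yogurt"),
  c("bread"),
  c("yogurt")
)

# Share of baskets that contain every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "bread"
rhs <- "milk"

rule_support    <- support(c(lhs, rhs))            # share of baskets with lhs and rhs
rule_confidence <- rule_support / support(lhs)     # share of lhs baskets that also hold rhs
rule_lift       <- rule_confidence / support(rhs)  # confidence relative to the rhs base rate

print(c(support = rule_support, confidence = rule_confidence, lift = rule_lift))
```

A lift above 1 suggests the two sides appear together more often than they would if they were independent; a lift near 1 suggests little association.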

library(tidyverse)
library(kableExtra)
library(arules)
library(igraph)

Load the data

receipt.df <- read.csv("GroceryDataSet.csv")

Take a look at the first rows of the data

head(receipt.df)

##       citrus.fruit semi.finished.bread      margarine              ready.soups
## 1   tropical fruit              yogurt         coffee                         
## 2       whole milk                                                            
## 3        pip fruit              yogurt  cream cheese              meat spreads
## 4 other vegetables          whole milk condensed milk long life bakery product
## 5       whole milk              butter         yogurt                     rice
## 6       rolls/buns                                                            
##                  X X.1 X.2 X.3 X.4 X.5 X.6 X.7 X.8 X.9 X.10 X.11 X.12 X.13 X.14
## 1                                                                              
## 2                                                                              
## 3                                                                              
## 4                                                                              
## 5 abrasive cleaner                                                             
## 6                                                                              
##   X.15 X.16 X.17 X.18 X.19 X.20 X.21 X.22 X.23 X.24 X.25 X.26 X.27
## 1                                                                 
## 2                                                                 
## 3                                                                 
## 4                                                                 
## 5                                                                 
## 6
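The column names in the output above (citrus.fruit, semi.finished.bread, and so on) are themselves grocery items, which suggests that read.csv used the first basket as a header row. The sketch below reproduces the effect on a hypothetical three-basket file written to a temporary path; the file contents and variable names are illustrative only.

```r
# Hypothetical three-basket file written to a temporary path
path <- tempfile(fileext = ".csv")
writeLines(c("citrus fruit,margarine", "whole milk,", "yogurt,coffee"), path)

# Default header = TRUE: the first basket becomes the column names
with_header <- read.csv(path)

# header = FALSE: all three baskets are kept as data rows
without_header <- read.csv(path, header = FALSE)

print(names(with_header))    # "citrus.fruit" "margarine"
print(nrow(with_header))     # 2
print(nrow(without_header))  # 3
```

Under that assumption, passing header = FALSE when loading GroceryDataSet.csv would keep the first transaction as a data row.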

Market Analysis

We first read in the transactions and explore the data by plotting the 15 most frequently purchased items.

data.df <- read.transactions("GroceryDataSet.csv", sep=",")
itemFrequencyPlot(data.df, topN=15, type="absolute", main="Top 15 Items", col=rainbow(15))

Whole milk is the most frequently purchased item.

Association Rules

To complete the market basket analysis, we mine the association rules with the apriori function from the arules package, using a minimum support of 0.001 and a minimum confidence of 0.5. We then report 10 rules with their support, confidence and lift.

rules <- apriori(data.df, parameter = list(supp = 0.001, conf = 0.5), control = list(verbose = FALSE))
summary(rules)

## set of 5668 rules
## 
## rule length distribution (lhs + rhs):sizes
##    2    3    4    5    6 
##   11 1461 3211  939   46 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    3.00    4.00    3.92    4.00    6.00 
## 
## summary of quality measures:
##     support           confidence        coverage             lift       
##  Min.   :0.001017   Min.   :0.5000   Min.   :0.001017   Min.   : 1.957  
##  1st Qu.:0.001118   1st Qu.:0.5455   1st Qu.:0.001729   1st Qu.: 2.464  
##  Median :0.001322   Median :0.6000   Median :0.002135   Median : 2.899  
##  Mean   :0.001668   Mean   :0.6250   Mean   :0.002788   Mean   : 3.262  
##  3rd Qu.:0.001729   3rd Qu.:0.6842   3rd Qu.:0.002949   3rd Qu.: 3.691  
##  Max.   :0.022267   Max.   :1.0000   Max.   :0.043416   Max.   :18.996  
##      count      
##  Min.   : 10.0  
##  1st Qu.: 11.0  
##  Median : 13.0  
##  Mean   : 16.4  
##  3rd Qu.: 17.0  
##  Max.   :219.0  
## 
## mining info:
##     data ntransactions support confidence
##  data.df          9835   0.001        0.5
##                                                                                                  call
##  apriori(data = data.df, parameter = list(supp = 0.001, conf = 0.5), control = list(verbose = FALSE))

apriori(data.df, parameter=list(supp=0.001, conf=0.5) , control=list(verbose=FALSE)) %>%
  DATAFRAME() %>%
  arrange(desc(lift)) %>%
  top_n(10) %>%
  kable() %>%
  kable_styling()

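## Selecting by count

Because top_n(10) is called without a weighting column, dplyr selects on the last column (count), as the message above reports. The table below therefore lists the 10 rules with the highest count, ordered by lift, rather than the 10 rules with the highest lift. The summary above shows lift values up to 18.996, while the largest lift in the table is about 3.02. Passing the column explicitly, as in top_n(10, lift), selects by lift instead.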

LHS	RHS	support	confidence	coverage	lift	count
{root vegetables,tropical fruit}	{other vegetables}	0.0123030	0.5845411	0.0210473	3.020999	121
{rolls/buns,root vegetables}	{other vegetables}	0.0122013	0.5020921	0.0243010	2.594890	120
{root vegetables,yogurt}	{other vegetables}	0.0129131	0.5000000	0.0258261	2.584078	127
{root vegetables,yogurt}	{whole milk}	0.0145399	0.5629921	0.0258261	2.203354	143
{domestic eggs,other vegetables}	{whole milk}	0.0123030	0.5525114	0.0222674	2.162336	121
{rolls/buns,root vegetables}	{whole milk}	0.0127097	0.5230126	0.0243010	2.046888	125
{other vegetables,pip fruit}	{whole milk}	0.0135231	0.5175097	0.0261312	2.025351	133
{tropical fruit,yogurt}	{whole milk}	0.0151500	0.5173611	0.0292832	2.024770	149
{other vegetables,yogurt}	{whole milk}	0.0222674	0.5128806	0.0434164	2.007235	219
{other vegetables,whipped/sour cream}	{whole milk}	0.0146416	0.5070423	0.0288765	1.984385	144
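As a sketch of how the ranking column changes which rules are kept, the example below compares top_n() without a weight against slice_max() on lift. The table rules_df and its values are hypothetical and are not taken from the grocery data; by_count and by_lift are illustrative names.

```r
library(dplyr)

# Hypothetical rule table (made-up values for illustration)
rules_df <- data.frame(
  rule  = c("A => B", "C => D", "E => F", "G => H"),
  lift  = c(2.0, 11.5, 3.1, 8.4),
  count = c(219, 10, 143, 12),
  stringsAsFactors = FALSE
)

# top_n() without a weight column ranks by the last column (count here),
# so the preceding arrange() only affects the display order
by_count <- rules_df %>%
  arrange(desc(lift)) %>%
  top_n(2)

# Naming the column explicitly keeps the rows with the highest lift
by_lift <- rules_df %>%
  slice_max(lift, n = 2)

print(by_count$rule)  # "E => F" "A => B"
print(by_lift$rule)   # "C => D" "G => H"
```

Applied to the pipeline above, replacing top_n(10) with slice_max(lift, n = 10) should return the ten highest-lift rules.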

Cluster Analysis

Next we look for item groupings. A network graph can be used to perform the cluster analysis. First, I create a graph from the transaction data in which each shopper is linked to the items in their basket. Then I detect the communities in the graph using the Louvain algorithm.

temp <- read.csv("GroceryDataSet.csv", header = FALSE) %>%
  mutate(shoper_id = row_number()) %>%
  pivot_longer(-shoper_id) %>%
  filter(value != "") %>%
  select(-name)

louvain_communities <- temp %>%
  rename(to = value, from = shoper_id) %>%
  graph_from_data_frame(directed = FALSE) %>%
  cluster_louvain() %>%
  communities()

items <- as.character(unique(temp$value))

# Start from an empty data frame and append one row per community
cluster_df <- data.frame(name = character(), members = integer())

for (i in seq_along(louvain_communities)) {
  cluster_name <- paste0(i, ": ")
  cluster_members <- 0
  for (member in louvain_communities[[i]]) {
    if (member %in% items) {
      # Item vertex: append its name to the cluster label
      cluster_name <- paste0(cluster_name, member, " + ")
    } else {
      # Shopper vertex: count it toward the cluster size
      cluster_members <- cluster_members + 1
    }
  }
  # Drop the trailing " + " separator
  cluster_name <- substr(cluster_name, 1, nchar(cluster_name) - 3)
  cluster_df <- rbind(cluster_df, data.frame(name = cluster_name, members = cluster_members))
}

cluster_df %>%
  arrange(desc(members)) %>%
  kable() %>%
  kable_styling()

name	members
5: citrus fruit + other vegetables + rice + abrasive cleaner + beef + chicken + root vegetables + spices + pork + turkey + curd cheese + canned vegetables + onions + herbs + specialty cheese + dog food + frozen fish + salad dressing + vinegar + roll products + rubbing alcohol + jam + toilet cleaner + preservation products	1147
9: chocolate + soda + specialty bar + pastry + waffles + candy + beverages + chocolate marshmallow + frozen potato products + cake bar + snack products + finished products + potato products + baby food + tidbits + bags + sound storage medium	1104
3: margarine + whole milk + butter + cereals + curd + flour + sugar + detergent + whipped/sour cream + baking powder + specialty fat + flower (seeds) + salt + honey + cocoa drinks + skin care + soups + rum + soap + organic sausage + pudding powder + frozen fruits + cooking chocolate	1067
1: ready soups + rolls/buns + frankfurter + sausage + spread cheese + hard cheese + cat food + canned fish + sliced cheese + soft cheese + meat + mustard + mayonnaise + organic products + nut snack + kitchen towels + cream	1022
14: shopping bags + misc. beverages + chewing gum + specialty chocolate + sparkling wine + brandy + liqueur + whisky	517
4: yogurt + cream cheese + meat spreads + packaged fruit/vegetables + butter milk + bathroom cleaner + berries + fish + instant coffee + frozen chicken + kitchen utensil	507
6: liquor (appetizer) + canned beer + candles	433
16: frozen dessert + ice cream + frozen vegetables + frozen meals + cleaner + liver loaf	408
15: bottled beer + red/blush wine + prosecco + liquor	394
18: hygiene articles + domestic eggs + oil + canned fruit + dish cleaner + house keeping products + baby cosmetics + ketchup	382
11: UHT-milk + bottled water + artif. sweetener + white wine + male cosmetics	377
7: long life bakery product + pot plants + fruit/vegetable juice + sweet spreads + pickled vegetables	352
8: tropical fruit + white bread + processed cheese + ham + tea + syrup + specialty vegetables	341
13: dishes + napkins + grapes + zwieback + light bulbs + decalcifier	329
10: brown bread + hamburger meat + pasta + Instant food products + sauces + hair spray	323
19: semi-finished bread + newspapers + pet care + nuts/prunes	291
2: coffee + condensed milk + cling film/bags + female sanitary products	269
17: pip fruit + photo/film + softener + cookware	231
20: salty snack + dental care + popcorn + make up remover	175
12: dessert + seasonal products + flower soil/fertilizer	166
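The cluster numbering and, to some extent, the groupings above can differ between runs, since the Louvain implementation in recent igraph versions may visit vertices in random order. The sketch below shows the same shopper-to-item approach on a tiny graph with a fixed seed. The edge list, the shopper ids s1 to s4 and the seed value are hypothetical and are not taken from the grocery data.

```r
library(igraph)

# Hypothetical shopper-item pairs (made-up, not from GroceryDataSet.csv)
edges <- data.frame(
  from = c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"),
  to   = c("milk", "butter", "milk", "butter", "beer", "chips", "beer", "chips"),
  stringsAsFactors = FALSE
)

# Undirected graph linking each shopper to the items in their basket
g <- graph_from_data_frame(edges, directed = FALSE)

# Fix the seed so that repeated runs give the same communities
set.seed(624)
clusters <- cluster_louvain(g)

# Community id per vertex, keyed by vertex name
groups <- setNames(as.integer(membership(clusters)), V(g)$name)

# Keep only the item vertices and list them per community
item_names  <- unique(edges$to)
item_groups <- split(item_names, groups[item_names])
print(item_groups)
```

Because the two sets of shoppers share no items, one would expect milk and butter to land in one community and beer and chips in another. On the full grocery graph the communities overlap far more, so setting a seed before cluster_louvain() is one way to keep the reported clusters stable.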