Libraries

library(tidyverse)
library(caret)
library(earth)
library(knitr)
library(pls)
library(glmnet)
library(Amelia)
library(mice)
library(psych)
library(rattle)
library(arules)
library(igraph)

Instruction

Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. The receipt is a representation of stuff that went into a customer’s basket - and therefore ‘Market Basket Analysis’.

That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item. The data set is attached.

Your assignment is to use R to mine the data for association rules. You should report support, confidence and lift and your top 10 rules by lift.

Extra credit: do a simple cluster analysis on the data as well. Use whichever packages you like. Due Nov 29 before midnight.

Data Exploration

I will load the data using read.transactions() and look at the top 20 items purchased.

grocery <- read.transactions('GroceryDataSet.csv', sep=',')
itemFrequencyPlot(grocery, topN=20, type="absolute", main="Top 20 Items", col = "Blue")
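
read.transactions() treats each line of a basket-format file as one transaction and each comma-separated field as an item. Here is a minimal sketch on three made-up receipts. The temporary file and its contents are hypothetical and are not taken from GroceryDataSet.csv.

```r
library(arules)

# Hypothetical toy file: one receipt per line, items separated by commas
toy_file <- tempfile(fileext = ".csv")
writeLines(c("whole milk,rolls/buns",
             "whole milk,yogurt,soda",
             "soda"),
           toy_file)

# Read the file in basket format (one transaction per line)
toy <- read.transactions(toy_file, format = "basket", sep = ",")

# Absolute item counts, the same quantity itemFrequencyPlot() draws with type = "absolute"
item_counts <- sort(itemFrequency(toy, type = "absolute"), decreasing = TRUE)
print(item_counts)
```

Sorting the counts in decreasing order gives the same ranking that the frequency plot displays as bars.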

The plot shows that whole milk is the most frequently purchased item, which is unsurprising, since whole milk is known as a destination item. The top 20 list is also broadly consistent with recent data I found on popular grocery categories here: https://www.statista.com/statistics/962873/product-category-grocery-bought-online-most-popular-us/

Association Rules
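
A rule LHS => RHS is scored by three measures. Support is the share of transactions that contain every item in the rule. Confidence is that support divided by the support of the LHS. Lift is the confidence divided by the support of the RHS. The base-R sketch below works them out by hand on five made-up baskets. The data and the helper support_of() are hypothetical and are used only for illustration.

```r
# Five hypothetical baskets (not from the grocery data)
baskets <- list(
  c("milk", "bread"),
  c("milk", "eggs"),
  c("bread", "eggs"),
  c("milk", "bread", "eggs"),
  c("milk")
)

# Illustrative helper: share of baskets that contain all of the given items
support_of <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "bread"
rhs <- "milk"

support    <- support_of(c(lhs, rhs))       # 2 of 5 baskets contain both items
confidence <- support / support_of(lhs)     # of the 3 bread baskets, 2 also hold milk
lift       <- confidence / support_of(rhs)  # confidence relative to milk's base rate

print(c(support = support, confidence = confidence, lift = lift))
```

A lift below 1, as in this toy case, means the RHS appears less often alongside the LHS than its overall frequency would suggest. A lift above 1 indicates a positive association.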

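I will use apriori() with a minimum support of 0.001 and a minimum confidence of 0.5, then tabulate support, confidence and lift for ten rules sorted by lift. Note that top_n(10) is called without a ranking column, so dplyr ranks by the last column (count), as the "Selecting by count" message in the output confirms. The table therefore lists the ten rules with the highest count, ordered by lift, rather than the ten highest-lift rules overall.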

apriori(grocery, parameter=list(supp=0.001, conf=0.5) , control=list(verbose=FALSE)) %>%
  DATAFRAME() %>%
  arrange(desc(lift)) %>%
  top_n(10) %>%
  kable()
## Selecting by count
| LHS | RHS | support | confidence | coverage | lift | count |
|:---|:---|---:|---:|---:|---:|---:|
| {root vegetables,tropical fruit} | {other vegetables} | 0.0123030 | 0.5845411 | 0.0210473 | 3.020999 | 121 |
| {rolls/buns,root vegetables} | {other vegetables} | 0.0122013 | 0.5020921 | 0.0243010 | 2.594890 | 120 |
| {root vegetables,yogurt} | {other vegetables} | 0.0129131 | 0.5000000 | 0.0258261 | 2.584078 | 127 |
| {root vegetables,yogurt} | {whole milk} | 0.0145399 | 0.5629921 | 0.0258261 | 2.203354 | 143 |
| {domestic eggs,other vegetables} | {whole milk} | 0.0123030 | 0.5525114 | 0.0222674 | 2.162336 | 121 |
| {rolls/buns,root vegetables} | {whole milk} | 0.0127097 | 0.5230126 | 0.0243010 | 2.046888 | 125 |
| {other vegetables,pip fruit} | {whole milk} | 0.0135231 | 0.5175097 | 0.0261312 | 2.025351 | 133 |
| {tropical fruit,yogurt} | {whole milk} | 0.0151500 | 0.5173611 | 0.0292832 | 2.024770 | 149 |
| {other vegetables,yogurt} | {whole milk} | 0.0222674 | 0.5128806 | 0.0434164 | 2.007235 | 219 |
| {other vegetables,whipped/sour cream} | {whole milk} | 0.0146416 | 0.5070423 | 0.0288765 | 1.984385 | 144 |
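
The assignment asks for the top 10 rules by lift, whereas top_n(10) without a wt argument ranks by the last column of the data frame. Below is a minimal sketch of ranking on lift directly, using arules' own sort() and head() methods. It assumes that the Groceries data set bundled with arules is equivalent to GroceryDataSet.csv, which I have not verified. The names rules, top_by_lift and top_df are illustrative.

```r
library(arules)

# Assumption: the bundled Groceries data stands in for GroceryDataSet.csv
data("Groceries", package = "arules")

# Same thresholds as in the analysis above
rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.5),
                 control = list(verbose = FALSE))

# Sort all rules by lift (descending) and keep the first ten
top_by_lift <- head(sort(rules, by = "lift"), n = 10)

# Convert to a data frame with LHS, RHS and the quality measures
top_df <- DATAFRAME(top_by_lift)
print(top_df[, c("LHS", "RHS", "support", "confidence", "lift")])
```

With dplyr, slice_max(lift, n = 10) on the DATAFRAME() output would be an equivalent way to make the ranking column explicit.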

Cluster Analysis

For the cluster analysis, I will look for groups of items that tend to be bought together. The first step is to build a network graph from the transaction data, with an edge linking each shopper to every item on their receipt. Then I will use cluster_louvain() to detect communities.

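# Read the raw CSV (one receipt per row) and reshape it to one row per shopper-item pair
grocery_data <- read.csv("GroceryDataSet.csv", header = FALSE) %>%
  mutate(shopper_id = row_number()) %>%
  pivot_longer(-shopper_id) %>%
  filter(value != "") %>%   # drop empty cells
  select(-name)

# Build an undirected shopper-item graph and detect communities with the Louvain method
communities <- grocery_data %>%
  rename(to = value, from = shopper_id) %>%
  graph_from_data_frame(directed = FALSE) %>%
  cluster_louvain() %>%
  communities()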
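
To illustrate the idea on a small scale, the sketch below runs the same graph construction and Louvain step on a made-up edge list of four shoppers and four items. The shopper IDs, the item pairings and all toy_ names are hypothetical.

```r
library(igraph)

# Hypothetical toy edge list: shopper IDs in `from`, item names in `to`
toy_edges <- data.frame(
  from = c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"),
  to   = c("whole milk", "butter", "whole milk", "butter",
           "bottled beer", "liquor", "bottled beer", "liquor")
)

# Undirected shopper-item graph, as in the analysis above
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Louvain community detection
toy_clusters <- cluster_louvain(toy_graph)

# Named integer vector: vertex name -> community id
toy_membership <- setNames(as.integer(membership(toy_clusters)),
                           names(membership(toy_clusters)))

# List the vertices (shoppers and items together) in each community
print(split(names(toy_membership), toy_membership))
```

Because shoppers and items are vertices of the same graph, every community contains a mix of both, which is why the summary step has to separate item names from shopper IDs.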

All customers and items are now assigned to 19 clusters. Next, I can summarize each cluster by its item names and the number of shoppers it contains.

products <- as.character(unique(grocery_data$value)) # Get unique item list

# Empty data frame to collect one row per cluster
df <- data.frame(name = character(), members = integer())

for (i in seq_along(communities)) {
  cluster <- communities[[i]]
  is_item <- cluster %in% products   # vertices that are not items are shopper IDs
  # Label: cluster number followed by its items joined with " + "
  cluster_name <- paste0(i, ": ", paste(cluster[is_item], collapse = " + "))
  df <- rbind(df, data.frame(name = cluster_name, members = sum(!is_item)))
}

df %>%
  arrange(desc(members)) %>%
  kable()
| name | members |
|:---|---:|
| 8: chocolate + soda + specialty bar + pastry + salty snack + waffles + candy + dessert + chocolate marshmallow + specialty chocolate + popcorn + cake bar + snack products + finished products + make up remover + potato products + hair spray + light bulbs + baby food + tidbits | 1292 |
| 10: other vegetables + rice + abrasive cleaner + flour + beef + chicken + root vegetables + bathroom cleaner + spices + pork + turkey + oil + curd cheese + onions + herbs + dog food + frozen fish + salad dressing + vinegar + roll products + frozen fruits | 1087 |
| 12: ready soups + rolls/buns + frankfurter + sausage + spread cheese + hard cheese + canned fish + seasonal products + frozen potato products + sliced cheese + soft cheese + meat + mustard + mayonnaise + nut snack + ketchup + cream | 1053 |
| 13: whole milk + butter + cereals + curd + detergent + hamburger meat + flower (seeds) + canned vegetables + pasta + softener + Instant food products + honey + cocoa drinks + cleaner + soups + soap + pudding powder | 857 |
| 5: liquor (appetizer) + canned beer + shopping bags + misc. beverages + chewing gum + brandy + liqueur + whisky | 730 |
| 7: yogurt + cream cheese + meat spreads + packaged fruit/vegetables + butter milk + berries + whipped/sour cream + baking powder + specialty cheese + instant coffee + organic sausage + cooking chocolate + kitchen utensil | 674 |
| 4: tropical fruit + pip fruit + white bread + processed cheese + sweet spreads + beverages + ham + cookware + tea + syrup + baby cosmetics + specialty vegetables + sound storage medium | 624 |
| 15: citrus fruit + hygiene articles + domestic eggs + cat food + cling film/bags + canned fruit + dental care + flower soil/fertilizer + female sanitary products + dish cleaner + house keeping products + rubbing alcohol + preservation products | 569 |
| 16: bottled beer + red/blush wine + prosecco + liquor + rum | 432 |
| 11: UHT-milk + bottled water + white wine + male cosmetics | 349 |
| 2: long life bakery product + pot plants + fruit/vegetable juice + pickled vegetables + jam + bags | 341 |
| 3: semi-finished bread + newspapers + pet care + nuts/prunes + toilet cleaner | 298 |
| 6: dishes + napkins + grapes + zwieback + decalcifier | 293 |
| 1: coffee + condensed milk + sparkling wine + fish + kitchen towels | 287 |
| 18: sugar + frozen vegetables + salt + skin care + liver loaf + frozen chicken | 273 |
| 14: frozen dessert + ice cream + frozen meals | 262 |
| 9: margarine + artif. sweetener + specialty fat + candles + organic products | 207 |
| 17: brown bread + sauces | 128 |
| 19: photo/film | 79 |