Methods

A: Use R to mine the data for association rules

Sections

Load the data set as “transaction data” for use with the arules library

grocerydf <- read.transactions("C:/Users/vitug/OneDrive/Desktop/CUNY Masters/DATA_624/GroceryDataSet.csv", format = "basket", sep=",")

summary(grocerydf)

## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics
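As a quick sanity check, the density and mean basket size reported by summary() can be reproduced by hand from the counts printed above. This is a minimal base R sketch; the variable names are my own, and the only inputs are the figures copied from the output.

```r
# Figures copied from the summary(grocerydf) output above
n_transactions <- 9835
n_items        <- 169
top_counts     <- c(2513, 1903, 1809, 1715, 1372, 34055)  # five top items + "(Other)"

# Total number of item occurrences across all baskets
total_purchases <- sum(top_counts)

# Density = filled cells / all cells of the transaction-by-item matrix
density <- total_purchases / (n_transactions * n_items)

# Mean basket size = item occurrences per transaction
mean_size <- total_purchases / n_transactions

print(round(density, 8))    # compare with the reported density 0.02609146
print(round(mean_size, 3))  # compare with the reported mean 4.409
```

A density of about 2.6% means a typical basket holds only four or five of the 169 items, which is why a low support threshold such as 0.01 is needed later.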

Plot the top 10 items using the “itemFrequencyPlot” function.

itemFrequencyPlot(grocerydf, topN=10, type="absolute", main="Top 10 Items")

After plotting the item frequencies of the transactions, I can clearly see that whole milk, other vegetables, rolls/buns, and soda are the most frequently purchased items.

Use the Apriori algorithm to find frequent itemsets

frequent_itemsets <- apriori(grocerydf, parameter = list(supp = 0.01, conf = 0.5))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.01s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

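# Inspect the result. Note: apriori() was run with its default target = "rules"
# (see "target rules" in the parameter output above), so this object holds rules, not frequent itemsets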
inspect(frequent_itemsets)

##      lhs                                       rhs                support   
## [1]  {curd, yogurt}                         => {whole milk}       0.01006609
## [2]  {butter, other vegetables}             => {whole milk}       0.01148958
## [3]  {domestic eggs, other vegetables}      => {whole milk}       0.01230300
## [4]  {whipped/sour cream, yogurt}           => {whole milk}       0.01087951
## [5]  {other vegetables, whipped/sour cream} => {whole milk}       0.01464159
## [6]  {other vegetables, pip fruit}          => {whole milk}       0.01352313
## [7]  {citrus fruit, root vegetables}        => {other vegetables} 0.01037112
## [8]  {root vegetables, tropical fruit}      => {other vegetables} 0.01230300
## [9]  {root vegetables, tropical fruit}      => {whole milk}       0.01199797
## [10] {tropical fruit, yogurt}               => {whole milk}       0.01514997
## [11] {root vegetables, yogurt}              => {other vegetables} 0.01291307
## [12] {root vegetables, yogurt}              => {whole milk}       0.01453991
## [13] {rolls/buns, root vegetables}          => {other vegetables} 0.01220132
## [14] {rolls/buns, root vegetables}          => {whole milk}       0.01270971
## [15] {other vegetables, yogurt}             => {whole milk}       0.02226741
##      confidence coverage   lift     count
## [1]  0.5823529  0.01728521 2.279125  99  
## [2]  0.5736041  0.02003050 2.244885 113  
## [3]  0.5525114  0.02226741 2.162336 121  
## [4]  0.5245098  0.02074225 2.052747 107  
## [5]  0.5070423  0.02887646 1.984385 144  
## [6]  0.5175097  0.02613116 2.025351 133  
## [7]  0.5862069  0.01769192 3.029608 102  
## [8]  0.5845411  0.02104728 3.020999 121  
## [9]  0.5700483  0.02104728 2.230969 118  
## [10] 0.5173611  0.02928317 2.024770 149  
## [11] 0.5000000  0.02582613 2.584078 127  
## [12] 0.5629921  0.02582613 2.203354 143  
## [13] 0.5020921  0.02430097 2.594890 120  
## [14] 0.5230126  0.02430097 2.046888 125  
## [15] 0.5128806  0.04341637 2.007235 219
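To make the quality measures concrete, here is a small worked example that recomputes support, coverage, confidence, and lift for rule [7], {citrus fruit, root vegetables} => {other vegetables}. It is only a sketch in base R. The left-hand-side count of 174 is not printed anywhere; I am assuming it from coverage × number of transactions (0.01769192 × 9835), rounded to the nearest integer.

```r
n        <- 9835  # number of transactions
count_xy <- 102   # baskets containing lhs and rhs together (the "count" column)
count_x  <- 174   # ASSUMED: baskets containing the lhs, inferred from coverage * n
count_y  <- 1903  # baskets containing "other vegetables" (from the summary output)

support    <- count_xy / n               # P(lhs and rhs)
coverage   <- count_x / n                # P(lhs)
confidence <- count_xy / count_x         # P(rhs | lhs)
lift       <- confidence / (count_y / n) # P(rhs | lhs) / P(rhs)

print(round(c(support = support, coverage = coverage,
              confidence = confidence, lift = lift), 7))
```

A lift of about 3 says that baskets with citrus fruit and root vegetables contain other vegetables roughly three times as often as a randomly chosen basket does.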

Generate association rules

rules <- apriori(grocerydf, parameter = list(supp = 0.01, conf = 0.5))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

# Inspect the rules
inspect(rules)

##      lhs                                       rhs                support   
## [1]  {curd, yogurt}                         => {whole milk}       0.01006609
## [2]  {butter, other vegetables}             => {whole milk}       0.01148958
## [3]  {domestic eggs, other vegetables}      => {whole milk}       0.01230300
## [4]  {whipped/sour cream, yogurt}           => {whole milk}       0.01087951
## [5]  {other vegetables, whipped/sour cream} => {whole milk}       0.01464159
## [6]  {other vegetables, pip fruit}          => {whole milk}       0.01352313
## [7]  {citrus fruit, root vegetables}        => {other vegetables} 0.01037112
## [8]  {root vegetables, tropical fruit}      => {other vegetables} 0.01230300
## [9]  {root vegetables, tropical fruit}      => {whole milk}       0.01199797
## [10] {tropical fruit, yogurt}               => {whole milk}       0.01514997
## [11] {root vegetables, yogurt}              => {other vegetables} 0.01291307
## [12] {root vegetables, yogurt}              => {whole milk}       0.01453991
## [13] {rolls/buns, root vegetables}          => {other vegetables} 0.01220132
## [14] {rolls/buns, root vegetables}          => {whole milk}       0.01270971
## [15] {other vegetables, yogurt}             => {whole milk}       0.02226741
##      confidence coverage   lift     count
## [1]  0.5823529  0.01728521 2.279125  99  
## [2]  0.5736041  0.02003050 2.244885 113  
## [3]  0.5525114  0.02226741 2.162336 121  
## [4]  0.5245098  0.02074225 2.052747 107  
## [5]  0.5070423  0.02887646 1.984385 144  
## [6]  0.5175097  0.02613116 2.025351 133  
## [7]  0.5862069  0.01769192 3.029608 102  
## [8]  0.5845411  0.02104728 3.020999 121  
## [9]  0.5700483  0.02104728 2.230969 118  
## [10] 0.5173611  0.02928317 2.024770 149  
## [11] 0.5000000  0.02582613 2.584078 127  
## [12] 0.5629921  0.02582613 2.203354 143  
## [13] 0.5020921  0.02430097 2.594890 120  
## [14] 0.5230126  0.02430097 2.046888 125  
## [15] 0.5128806  0.04341637 2.007235 219

Inspect the top 10 rules sorted by “lift”

inspect(sort(rules, by = "lift")[1:10])

##      lhs                                  rhs                support   
## [1]  {citrus fruit, root vegetables}   => {other vegetables} 0.01037112
## [2]  {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [3]  {rolls/buns, root vegetables}     => {other vegetables} 0.01220132
## [4]  {root vegetables, yogurt}         => {other vegetables} 0.01291307
## [5]  {curd, yogurt}                    => {whole milk}       0.01006609
## [6]  {butter, other vegetables}        => {whole milk}       0.01148958
## [7]  {root vegetables, tropical fruit} => {whole milk}       0.01199797
## [8]  {root vegetables, yogurt}         => {whole milk}       0.01453991
## [9]  {domestic eggs, other vegetables} => {whole milk}       0.01230300
## [10] {whipped/sour cream, yogurt}      => {whole milk}       0.01087951
##      confidence coverage   lift     count
## [1]  0.5862069  0.01769192 3.029608 102  
## [2]  0.5845411  0.02104728 3.020999 121  
## [3]  0.5020921  0.02430097 2.594890 120  
## [4]  0.5000000  0.02582613 2.584078 127  
## [5]  0.5823529  0.01728521 2.279125  99  
## [6]  0.5736041  0.02003050 2.244885 113  
## [7]  0.5700483  0.02104728 2.230969 118  
## [8]  0.5629921  0.02582613 2.203354 143  
## [9]  0.5525114  0.02226741 2.162336 121  
## [10] 0.5245098  0.02074225 2.052747 107

Find the number of rules using the “summary” function.

summary(rules)

## set of 15 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3 
## 15 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       3       3       3       3       3       3 
## 
## summary of quality measures:
##     support          confidence        coverage            lift      
##  Min.   :0.01007   Min.   :0.5000   Min.   :0.01729   Min.   :1.984  
##  1st Qu.:0.01174   1st Qu.:0.5151   1st Qu.:0.02089   1st Qu.:2.036  
##  Median :0.01230   Median :0.5245   Median :0.02430   Median :2.203  
##  Mean   :0.01316   Mean   :0.5411   Mean   :0.02454   Mean   :2.299  
##  3rd Qu.:0.01403   3rd Qu.:0.5718   3rd Qu.:0.02598   3rd Qu.:2.432  
##  Max.   :0.02227   Max.   :0.5862   Max.   :0.04342   Max.   :3.030  
##      count      
##  Min.   : 99.0  
##  1st Qu.:115.5  
##  Median :121.0  
##  Mean   :129.4  
##  3rd Qu.:138.0  
##  Max.   :219.0  
## 
## mining info:
##       data ntransactions support confidence
##  grocerydf          9835    0.01        0.5
##                                                                  call
##  apriori(data = grocerydf, parameter = list(supp = 0.01, conf = 0.5))

Graph of the top 10 rules by “lift” using igraph

# 'by' is not a plot() argument here (it produced "Unknown control parameters: by"),
# so sort the rules by lift first and plot the top 10
plot(sort(rules, by = "lift")[1:10], method = "graph", engine = "igraph", layout = igraph::in_circle())

The item frequency plot shows that the most frequently purchased items are whole milk, other vegetables, rolls/buns, soda, and yogurt. The apriori function returns 15 association rules for the transaction data. Sorting the rules by lift shows that the four rules with the highest lift (about 2.58 to 3.03) all have “other vegetables” as the consequent, while the other 11 rules all have “whole milk” as the consequent. These relationships are pictured above in the network plot, where most arrows point toward the whole milk and other vegetables nodes.

B: Perform a Market Basket Analysis

Sections

Create a co-occurrence matrix from the transaction data

co_occurrence_matrix <- crossTable(grocerydf)

# Convert the co-occurrence matrix to a weighted, undirected graph
# (graph_from_adjacency_matrix() replaces graph.adjacency(), which is deprecated since igraph 2.0.0)
g <- graph_from_adjacency_matrix(co_occurrence_matrix, mode = "undirected", weighted = TRUE)

# Inspect the graph
summary(g)

## IGRAPH 793b7ef UNW- 169 9805 -- 
## + attr: name (v/c), weight (e/n)
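To show what a co-occurrence matrix contains, here is a toy version built in base R with crossprod(). The four baskets are made up for illustration, and I am assuming that crossTable() with its default count measure gives the same kind of table: pairwise counts off the diagonal and single-item counts on the diagonal.

```r
# Four made-up baskets (hypothetical data, not from GroceryDataSet.csv)
baskets <- list(
  c("whole milk", "yogurt"),
  c("whole milk", "rolls/buns"),
  c("whole milk", "yogurt", "rolls/buns"),
  c("soda")
)

items <- sort(unique(unlist(baskets)))

# Incidence matrix: one row per basket, one column per item, 1 if present
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items

# t(incidence) %*% incidence counts how often each pair of items shares a basket
co_occurrence <- crossprod(incidence)
print(co_occurrence)
```

In this toy table, whole milk and yogurt share two baskets, and soda never appears with anything else. When such a matrix is turned into a weighted graph, the off-diagonal counts become edge weights.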

Create a graph displaying the top 10 items (nodes) in the network

# Function to plot the top 10 items (nodes) based on degree centrality
plot_top_items_network <- function(g, top_n = 10, layout_type = "fr", transparency = 0.5) {
  
  # Calculate degree centrality for each node
  importance_scores <- degree(g)
  
  # Get the top10 nodes based on degree
  top_nodes <- order(importance_scores, decreasing = TRUE)[1:top_n]
  
  # Create a sub graph for the top10 nodes
  top_subgraph <- induced_subgraph(g, top_nodes)
  
  # Use specified layout type for the network
  if (layout_type == "fr") {
    layout_fr <- layout_with_fr(top_subgraph)  # Fruchterman-Reingold layout
  } else if (layout_type == "kk") {
    layout_fr <- layout_with_kk(top_subgraph)  # Kamada-Kawai layout
  } else {
    layout_fr <- layout_in_circle(top_subgraph)  # Circle layout as fallback
  }
  
  # Plot the top10 items (nodes)
  plot(top_subgraph,
       layout = layout_fr,
       vertex.size = 20,               # Size of nodes
       vertex.label.cex = 0.8,         # Size of labels
       vertex.color = "lightblue",     # Color for all nodes
       vertex.frame.color = "white",   # White frame around nodes
       edge.width = E(top_subgraph)$weight / 10,  # Width of edges
       edge.color = adjustcolor("gray", alpha.f = transparency), # Transparent edges
       edge.arrow.size = 1.5,          # Size of edge arrows
       main = paste("Top", top_n, "Items in Network"),
       vertex.label.color = "black",   # Label color
       vertex.label.dist = 1,          # Label distance from node
       edge.curved = 0.2)              # Curved edges
}

plot_top_items_network(g, top_n = 10, layout_type = "fr", transparency = 0.3)

Based on the graph above, whole milk and yogurt have a direct relationship, as do other vegetables and rolls/buns. Shopping bags, pastry, and butter are more independent items in the top 10 network.
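One thing worth keeping in mind is that degree only counts how many neighbors an item has and ignores how strong each link is. The sketch below uses a made-up 3-item co-occurrence matrix in base R to contrast degree with strength (the sum of edge weights). If a weighted ranking is wanted, igraph's strength() function could be used on g in place of degree(), although I did not test that on this data set.

```r
# Made-up symmetric co-occurrence counts (hypothetical values)
item_names <- c("whole milk", "yogurt", "soda")
A <- matrix(c(0, 5, 1,
              5, 0, 1,
              1, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(item_names, item_names))

# Degree: number of neighbors, regardless of weight
degree_by_hand <- rowSums(A > 0)

# Strength: total weight of all edges touching the node
strength_by_hand <- rowSums(A)

print(degree_by_hand)
print(strength_by_hand)
```

All three items have the same degree here, but whole milk and yogurt have three times the strength of soda, so the two measures can rank items very differently in a dense co-occurrence graph.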

Run community detection using the “Louvain method”

communities <- cluster_louvain(g)

membership(communities)

##          abrasive cleaner          artif. sweetener            baby cosmetics 
##                         1                         2                         1 
##                 baby food                      bags             baking powder 
##                         3                         4                         2 
##          bathroom cleaner                      beef                   berries 
##                         1                         5                         6 
##                 beverages              bottled beer             bottled water 
##                         7                         8                         8 
##                    brandy               brown bread                    butter 
##                         9                        10                        11 
##               butter milk                  cake bar                   candles 
##                        12                         3                         2 
##                     candy               canned beer               canned fish 
##                        13                         9                        14 
##              canned fruit         canned vegetables                  cat food 
##                        14                        15                         1 
##                   cereals               chewing gum                   chicken 
##                        12                         9                        16 
##                 chocolate     chocolate marshmallow              citrus fruit 
##                        13                        13                        17 
##                   cleaner           cling film/bags              cocoa drinks 
##                         1                         1                         2 
##                    coffee            condensed milk         cooking chocolate 
##                         1                         1                         2 
##                  cookware                     cream              cream cheese 
##                        13                        18                        12 
##                      curd               curd cheese               decalcifier 
##                        12                         1                        19 
##               dental care                   dessert                 detergent 
##                        20                         3                        19 
##              dish cleaner                    dishes                  dog food 
##                         1                        21                        22 
##             domestic eggs  female sanitary products         finished products 
##                        11                         1                        13 
##                      fish                     flour            flower (seeds) 
##                         1                         2                        16 
##    flower soil/fertilizer               frankfurter            frozen chicken 
##                        19                        14                         3 
##            frozen dessert               frozen fish             frozen fruits 
##                        18                        18                         6 
##              frozen meals    frozen potato products         frozen vegetables 
##                        18                        18                        16 
##     fruit/vegetable juice                    grapes                hair spray 
##                        23                        22                        13 
##                       ham            hamburger meat               hard cheese 
##                        24                        15                        24 
##                     herbs                     honey    house keeping products 
##                         5                        25                        19 
##          hygiene articles                 ice cream            instant coffee 
##                        19                        18                        16 
##     Instant food products                       jam                   ketchup 
##                        15                         5                        14 
##            kitchen towels           kitchen utensil               light bulbs 
##                         1                        22                        19 
##                   liqueur                    liquor        liquor (appetizer) 
##                         9                         8                         9 
##                liver loaf  long life bakery product           make up remover 
##                        12                        13                        23 
##            male cosmetics                 margarine                mayonnaise 
##                         9                         2                        14 
##                      meat              meat spreads           misc. beverages 
##                        10                        12                         9 
##                   mustard                   napkins                newspapers 
##                        14                        19                        20 
##                 nut snack               nuts/prunes                       oil 
##                        18                        20                         2 
##                    onions          organic products           organic sausage 
##                         5                         2                        24 
##          other vegetables packaged fruit/vegetables                     pasta 
##                         5                        16                        15 
##                    pastry                  pet care                photo/film 
##                        26                         1                        19 
##        pickled vegetables                 pip fruit                   popcorn 
##                        14                        17                        18 
##                      pork                pot plants           potato products 
##                         5                        22                        26 
##     preservation products          processed cheese                  prosecco 
##                        13                        24                         8 
##            pudding powder               ready soups            red/blush wine 
##                         2                        22                         8 
##                      rice             roll products                rolls/buns 
##                         2                         2                         4 
##           root vegetables           rubbing alcohol                       rum 
##                         5                        19                         8 
##            salad dressing                      salt               salty snack 
##                         5                         2                        18 
##                    sauces                   sausage         seasonal products 
##                        15                         4                         3 
##       semi-finished bread             shopping bags                 skin care 
##                        22                         9                         2 
##             sliced cheese            snack products                      soap 
##                        24                        13                        23 
##                      soda               soft cheese                  softener 
##                         9                        24                        19 
##      sound storage medium                     soups            sparkling wine 
##                        24                         2                         1 
##             specialty bar          specialty cheese       specialty chocolate 
##                        13                        24                        13 
##             specialty fat      specialty vegetables                    spices 
##                         2                        24                        16 
##             spread cheese                     sugar             sweet spreads 
##                         4                         2                        15 
##                     syrup                       tea                   tidbits 
##                         1                         8                         4 
##            toilet cleaner            tropical fruit                    turkey 
##                        20                        17                        22 
##                  UHT-milk                   vinegar                   waffles 
##                         1                         2                        13 
##        whipped/sour cream                    whisky               white bread 
##                         6                         9                        24 
##                white wine                whole milk                    yogurt 
##                         8                        25                        12 
##                  zwieback 
##                        21

Create a plot of the top 3 communities and their items

# Function to plot the top N communities (by size) in the network
plot_top_communities_network <- function(g, top_n = 3, layout_type = "fr", transparency = 0.5) {
  
  # Perform community detection using the Louvain method
  communities <- cluster_louvain(g)
  
  # Get community membership (which community each node belongs to)
  community_membership <- membership(communities)
  
  # Identify the top N communities based on size (number of nodes)
  community_sizes <- table(community_membership)
  top_communities <- order(community_sizes, decreasing = TRUE)[1:top_n]
  
  # Create a subgraph that includes only the nodes in the top N communities
  nodes_in_top_communities <- which(community_membership %in% top_communities)
  top_community_subgraph <- induced_subgraph(g, nodes_in_top_communities)
  
  # Use specified layout type for the network
  if (layout_type == "fr") {
    layout_fr <- layout_with_fr(top_community_subgraph)  # Fruchterman-Reingold layout
  } else if (layout_type == "kk") {
    layout_fr <- layout_with_kk(top_community_subgraph)  # Kamada-Kawai layout
  } else {
    layout_fr <- layout_in_circle(top_community_subgraph)  # Circle layout as fallback
  }
  
  # Plot the network of the top N communities
  plot(top_community_subgraph,
       layout = layout_fr,
       vertex.size = 10,               # Size of nodes
       vertex.label.cex = 0.8,         # Size of labels
       vertex.color = community_membership[nodes_in_top_communities] + 1,  # Color by community
       vertex.frame.color = "white",   # White frame around nodes
       edge.width = E(top_community_subgraph)$weight / 10,  # Width of edges
       edge.color = adjustcolor("gray", alpha.f = transparency), # Transparent edges
       edge.arrow.size = 0.5,          # Size of edge arrows
       main = paste("Top", top_n, "Communities in Network"),
       vertex.label.color = "black",   # Label color
       vertex.label.dist = 1,          # Label distance from node
       edge.curved = 0.3)              # Curved edges
}
# Plot the top 3 communities in the network
plot_top_communities_network(g, top_n = 3, layout_type = "fr", transparency = 0.5)

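The membership table shows 26 communities in the data. Community 13 groups many of the sweets and snacks (candy, chocolate, chocolate marshmallow, waffles, specialty chocolate, etc.), while baby cosmetics falls in community 1. Because the plotting function reruns cluster_louvain(), whose result can vary between runs, the community numbers in the graph may not match the table. The graph shows baby food, baby cosmetics, and cream within communities but more independent of the other items.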
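The step that picks the largest communities can be sketched on a toy membership vector. The function above uses order() on the size table, which returns positions in the table; those positions equal the community labels only when the labels run 1, 2, ..., K, as they do in the Louvain output here. The base R sketch below, with made-up labels, reads the labels from the table names instead, which also works when labels have gaps.

```r
# Made-up membership vector with non-consecutive community labels
membership_toy <- c(a = 2, b = 2, c = 2, d = 5, e = 5, f = 9)

# Size of each community
sizes <- table(membership_toy)

# Positions of the two largest communities in the table (NOT their labels)
top_positions <- order(sizes, decreasing = TRUE)[1:2]

# Labels of the two largest communities, read from the table names
top_labels <- as.integer(names(sort(sizes, decreasing = TRUE))[1:2])

# Nodes that belong to those communities
nodes_in_top <- names(membership_toy)[membership_toy %in% top_labels]

print(top_positions)
print(top_labels)
print(nodes_in_top)
```

Here the positions are 1 and 2 while the actual labels are 2 and 5, so matching membership against the positions would select the wrong nodes.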

Create a plot displaying the top 5 communities and the top 5 items (nodes)

# Function to plot top items and top communities together
plot_top_items_and_communities <- function(g, top_items_n = 5, top_communities_n = 5, layout_type = "fr", transparency = 0.5) {
  
  # Calculate degree centrality for each node (for top items)
  importance_scores <- degree(g)
  top_items <- order(importance_scores, decreasing = TRUE)[1:top_items_n]
  
  # Perform community detection
  communities <- cluster_louvain(g)
  community_membership <- membership(communities)
  
  # Get community sizes and top N communities based on size
  community_sizes <- table(community_membership)
  top_communities <- order(community_sizes, decreasing = TRUE)[1:top_communities_n]
  
  # Nodes that belong to the top communities
  nodes_in_top_communities <- which(community_membership %in% top_communities)
  
  # Combine top items and top-community nodes into one induced subgraph.
  # (union() of two subgraphs would rename their shared "weight" edge attribute
  # to weight_1/weight_2, leaving E(combined_subgraph)$weight NULL.)
  combined_nodes <- sort(unique(c(top_items, nodes_in_top_communities)))
  combined_subgraph <- induced_subgraph(g, combined_nodes)
  
  # Plot combined subgraph
  layout_fr <- layout_with_fr(combined_subgraph)
  plot(combined_subgraph,
       layout = layout_fr,
       vertex.size = 10,               # Size of nodes
       vertex.label.cex = 0.8,         # Size of labels
       vertex.color = community_membership[combined_nodes] + 1,  # Color by community
       vertex.frame.color = "white",   # White frame around nodes
       edge.width = E(combined_subgraph)$weight / 10,  # Width of edges
       edge.color = adjustcolor("gray", alpha.f = transparency), # Transparent edges
       edge.arrow.size = 0.7,          # Size of edge arrows
       main = paste("Top Items and Top Communities"),
       vertex.label.color = "black",   # Label color
       vertex.label.dist = 1,          # Label distance from node
       edge.curved = 0.3)              # Curved edges
}
# Plot the top 5 items and the top 5 communities together
plot_top_items_and_communities(g, top_items_n = 5, top_communities_n = 5, layout_type = "fr", transparency = 0.5)

The graph above is a bit messy; however, we can see items that are more independent, such as soda, whole milk, and rolls/buns.

Create a function to analyze the top 10 items based on degree and other centrality measures

# Function to plot the top N most important items in the network
plot_top_items_network <- function(g, criterion = "degree", top_n = 10, layout_type = "fr", transparency = 0.3) {
  
  # Check criterion and calculate corresponding values (degree, betweenness, etc.)
  if (criterion == "degree") {
    importance_scores <- degree(g)
  } else if (criterion == "betweenness") {
    importance_scores <- betweenness(g)
  } else if (criterion == "closeness") {
    importance_scores <- closeness(g)
  } else {
    stop("Unknown criterion. Please choose 'degree', 'betweenness', or 'closeness'.")
  }
  
  # Get the top N nodes based on the selected criterion
  top_nodes <- order(importance_scores, decreasing = TRUE)[1:top_n]
  
  # Create a subgraph for the top N nodes
  top_subgraph <- induced_subgraph(g, top_nodes)
  
  # Perform community detection on the subgraph
  communities_subgraph <- cluster_louvain(top_subgraph)
  
  # Use specified layout type for the subgraph
  if (layout_type == "fr") {
    layout_fr <- layout_with_fr(top_subgraph)  # Fruchterman-Reingold layout
  } else if (layout_type == "kk") {
    layout_fr <- layout_with_kk(top_subgraph)  # Kamada-Kawai layout
  } else {
    layout_fr <- layout_in_circle(top_subgraph)  # Circle layout as fallback
  }
  
  # Plot the network with selected settings
  plot(top_subgraph,
       layout = layout_fr,
       vertex.size = 10,               # Size of nodes
       vertex.label.cex = 0.8,         # Size of labels
       vertex.color = membership(communities_subgraph) + 1,  # Color by community membership
       vertex.frame.color = "white",   # White frame around nodes
       edge.width = E(top_subgraph)$weight / 10,    # Width of edges
       edge.color = adjustcolor("gray", alpha.f = transparency), # Transparent edges
       edge.arrow.size = 0.5,          # Size of edge arrows
       main = paste("Top", top_n, "Items by", criterion),
       vertex.label.color = "black",   # Label color
       vertex.label.dist = 1,          # Label distance from node
       edge.curved = 0.2)              # Curved edges
}

Graph of top10 items by degree

plot_top_items_network(g, criterion = "degree", top_n = 10, layout_type = "fr", transparency = 0.3)

Plot of top10 items by betweenness

# Plot the top 10 most central items (based on betweenness centrality)
plot_top_items_network(g, criterion = "betweenness", top_n = 10, layout_type = "kk", transparency = 0.2)

Based on the degree analysis, the top items are whole milk and other vegetables. The result is different when I rank the top 10 items by betweenness, where ready soups is the main item. One likely reason is that igraph's betweenness() treats edge weights as distances, so items with low co-occurrence counts end up on many shortest paths. The results vary with the analysis type. I am really glad that I did this portion of the project, because I realized that I can use several approaches to market basket analysis depending on the project's needs.
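The effect of treating co-occurrence counts as path lengths can be seen on a toy graph. The sketch below is base R only: the counts are made up, and shortest_paths_by_hand() is a hypothetical helper (a plain Floyd-Warshall loop), not an igraph function. It compares shortest paths when the raw counts are used as lengths with shortest paths when the inverse counts are used.

```r
# Made-up co-occurrence counts: A and B are bought together often, C rarely
nm <- c("A", "B", "C")
w <- matrix(c( 0, 10, 1,
              10,  0, 1,
               1,  1, 0),
            nrow = 3, byrow = TRUE, dimnames = list(nm, nm))

# Hypothetical helper: all-pairs shortest path lengths (Floyd-Warshall)
shortest_paths_by_hand <- function(len) {
  d <- len
  d[d == 0] <- Inf  # a zero entry means "no edge"
  diag(d) <- 0
  n <- nrow(d)
  for (k in seq_len(n)) for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
  }
  d
}

dist_counts  <- shortest_paths_by_hand(w)      # counts used as lengths
dist_inverse <- shortest_paths_by_hand(1 / w)  # strong links become short

print(dist_counts["A", "B"])   # path A-C-B (length 2) beats the direct edge (10)
print(dist_inverse["A", "B"])  # the direct edge A-B (0.1) is now the shortest
```

With raw counts as lengths, the rarely co-purchased item C sits on the shortest path between A and B, which is the pattern that can push an item like ready soups to the top of a betweenness ranking. Passing inverse weights, for example betweenness(g, weights = 1 / E(g)$weight), is one possible way to make frequent co-purchases count as close; I have not verified how it changes the ranking on this data set.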

Homework10

Victor Torres

2024-11-19
