Imagine 10,000 receipts sitting on your table. Each receipt represents a transaction with the items that were purchased. The receipt is a representation of what went into a customer’s basket, hence the name ‘Market Basket Analysis’.
That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item. The data set is attached.
Your assignment is to use R to mine the data for association rules. You should report support, confidence and lift and your top 10 rules by lift.
Extra credit: do a simple cluster analysis on the data as well. Use whichever packages you like.
library(tidyverse)
library(arules)
library(arulesViz)
library(visNetwork)
library(igraph)
grocerydf <- read.transactions("C:/Users/vitug/OneDrive/Desktop/CUNY Masters/DATA_624/GroceryDataSet.csv", format = "basket", sep=",")
summary(grocerydf)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 169 columns (items) and a density of 0.02609146
##
## most frequent items:
## whole milk other vegetables rolls/buns soda
## 2513 1903 1809 1715
## yogurt (Other)
## 1372 34055
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55 46
## 17 18 19 20 21 22 23 24 26 27 28 29 32
## 29 14 14 9 11 4 6 1 1 1 1 3 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 4.409 6.000 32.000
##
## includes extended item information - examples:
## labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3 baby cosmetics
itemFrequencyPlot(grocerydf, topN=10, type="absolute", main="Top 10 Items")
After plotting the item frequencies of the transactions, I can clearly see that whole milk, other vegetables, rolls/buns and soda were the most frequently purchased items.
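The summary figures can be cross-checked with a little arithmetic. The sketch below uses only the counts printed by summary(grocerydf) above to recover the relative frequency of the top items, the matrix density, and the mean basket size. The variable names are illustrative and not part of the original analysis.

```r
# Counts copied from the summary(grocerydf) output above
n_transactions <- 9835
n_items        <- 169
item_counts <- c("whole milk" = 2513, "other vegetables" = 1903,
                 "rolls/buns" = 1809, "soda" = 1715, "yogurt" = 1372)
other_count <- 34055  # the "(Other)" bucket in the summary

# Relative frequency (support) of each top item
item_support <- item_counts / n_transactions
print(round(item_support, 4))

# Total number of item occurrences across all baskets
total_items <- sum(item_counts) + other_count

# Density = filled cells / all cells of the transaction-by-item matrix
density <- total_items / (n_transactions * n_items)

# Mean basket size = item occurrences per transaction
mean_basket <- total_items / n_transactions

print(c(density = density, mean_basket = mean_basket))
```

Both values agree with the density and the mean reported in the summary, which is a quick sanity check that the file was read as one basket per line.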
rules <- apriori(grocerydf, parameter = list(supp = 0.01, conf = 0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 98
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Inspect the rules
inspect(rules)
## lhs rhs support
## [1] {curd, yogurt} => {whole milk} 0.01006609
## [2] {butter, other vegetables} => {whole milk} 0.01148958
## [3] {domestic eggs, other vegetables} => {whole milk} 0.01230300
## [4] {whipped/sour cream, yogurt} => {whole milk} 0.01087951
## [5] {other vegetables, whipped/sour cream} => {whole milk} 0.01464159
## [6] {other vegetables, pip fruit} => {whole milk} 0.01352313
## [7] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [8] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [9] {root vegetables, tropical fruit} => {whole milk} 0.01199797
## [10] {tropical fruit, yogurt} => {whole milk} 0.01514997
## [11] {root vegetables, yogurt} => {other vegetables} 0.01291307
## [12] {root vegetables, yogurt} => {whole milk} 0.01453991
## [13] {rolls/buns, root vegetables} => {other vegetables} 0.01220132
## [14] {rolls/buns, root vegetables} => {whole milk} 0.01270971
## [15] {other vegetables, yogurt} => {whole milk} 0.02226741
## confidence coverage lift count
## [1] 0.5823529 0.01728521 2.279125 99
## [2] 0.5736041 0.02003050 2.244885 113
## [3] 0.5525114 0.02226741 2.162336 121
## [4] 0.5245098 0.02074225 2.052747 107
## [5] 0.5070423 0.02887646 1.984385 144
## [6] 0.5175097 0.02613116 2.025351 133
## [7] 0.5862069 0.01769192 3.029608 102
## [8] 0.5845411 0.02104728 3.020999 121
## [9] 0.5700483 0.02104728 2.230969 118
## [10] 0.5173611 0.02928317 2.024770 149
## [11] 0.5000000 0.02582613 2.584078 127
## [12] 0.5629921 0.02582613 2.203354 143
## [13] 0.5020921 0.02430097 2.594890 120
## [14] 0.5230126 0.02430097 2.046888 125
## [15] 0.5128806 0.04341637 2.007235 219
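To make the quality columns concrete, here is a small worked check for rule [1], {curd, yogurt} => {whole milk}. It is a sketch that only reuses numbers printed above (the rule's count and coverage, and the whole milk count from the summary); the variable names are made up for illustration.

```r
# Values copied from the printed output for rule [1]
n_transactions <- 9835
rule_count     <- 99          # baskets with curd, yogurt and whole milk
rule_coverage  <- 0.01728521  # share of baskets with curd and yogurt (the lhs)
rhs_count      <- 2513        # baskets with whole milk, from summary(grocerydf)

# support: share of all baskets that contain lhs and rhs together
support <- rule_count / n_transactions

# confidence: of the baskets containing the lhs, the share that also contain the rhs
confidence <- support / rule_coverage

# lift: confidence relative to how common the rhs is on its own
rhs_support <- rhs_count / n_transactions
lift <- confidence / rhs_support

print(round(c(support = support, confidence = confidence, lift = lift), 6))
```

The three values reproduce the support, confidence and lift columns for rule [1] up to rounding.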
inspect(sort(rules, by = "lift")[1:10])
## lhs rhs support
## [1] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [2] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [3] {rolls/buns, root vegetables} => {other vegetables} 0.01220132
## [4] {root vegetables, yogurt} => {other vegetables} 0.01291307
## [5] {curd, yogurt} => {whole milk} 0.01006609
## [6] {butter, other vegetables} => {whole milk} 0.01148958
## [7] {root vegetables, tropical fruit} => {whole milk} 0.01199797
## [8] {root vegetables, yogurt} => {whole milk} 0.01453991
## [9] {domestic eggs, other vegetables} => {whole milk} 0.01230300
## [10] {whipped/sour cream, yogurt} => {whole milk} 0.01087951
## confidence coverage lift count
## [1] 0.5862069 0.01769192 3.029608 102
## [2] 0.5845411 0.02104728 3.020999 121
## [3] 0.5020921 0.02430097 2.594890 120
## [4] 0.5000000 0.02582613 2.584078 127
## [5] 0.5823529 0.01728521 2.279125 99
## [6] 0.5736041 0.02003050 2.244885 113
## [7] 0.5700483 0.02104728 2.230969 118
## [8] 0.5629921 0.02582613 2.203354 143
## [9] 0.5525114 0.02226741 2.162336 121
## [10] 0.5245098 0.02074225 2.052747 107
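Another way to read lift is as observed co-occurrences divided by the co-occurrences expected if the two sides of the rule were independent. The sketch below rebuilds that comparison for three of the rules above, using the printed coverage and count values plus the item counts from the summary. The data frame and its column names are hypothetical helpers, not objects from the analysis.

```r
# Three rules copied from the output above; rhs_count comes from summary(grocerydf)
rule_table <- data.frame(
  rule      = c("{curd, yogurt} => {whole milk}",
                "{citrus fruit, root vegetables} => {other vegetables}",
                "{root vegetables, tropical fruit} => {other vegetables}"),
  coverage  = c(0.01728521, 0.01769192, 0.02104728),
  count     = c(99, 102, 121),
  rhs_count = c(2513, 1903, 1903),
  stringsAsFactors = FALSE
)

# Expected joint count under independence:
# N * P(lhs) * P(rhs) = N * coverage * (rhs_count / N) = coverage * rhs_count
rule_table$expected <- rule_table$coverage * rule_table$rhs_count

# Lift is the ratio of observed to expected joint count
rule_table$lift <- rule_table$count / rule_table$expected

# Rank the rules by lift, strongest first
rule_table <- rule_table[order(rule_table$lift, decreasing = TRUE), ]
print(rule_table[, c("rule", "count", "expected", "lift")], digits = 4)
```

For the top rule, about 34 baskets would be expected by chance against 102 observed, which is where the lift of roughly 3 comes from.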
summary(rules)
## set of 15 rules
##
## rule length distribution (lhs + rhs):sizes
## 3
## 15
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3 3 3 3 3 3
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.01007 Min. :0.5000 Min. :0.01729 Min. :1.984
## 1st Qu.:0.01174 1st Qu.:0.5151 1st Qu.:0.02089 1st Qu.:2.036
## Median :0.01230 Median :0.5245 Median :0.02430 Median :2.203
## Mean :0.01316 Mean :0.5411 Mean :0.02454 Mean :2.299
## 3rd Qu.:0.01403 3rd Qu.:0.5718 3rd Qu.:0.02598 3rd Qu.:2.432
## Max. :0.02227 Max. :0.5862 Max. :0.04342 Max. :3.030
## count
## Min. : 99.0
## 1st Qu.:115.5
## Median :121.0
## Mean :129.4
## 3rd Qu.:138.0
## Max. :219.0
##
## mining info:
## data ntransactions support confidence
## grocerydf 9835 0.01 0.5
## call
## apriori(data = grocerydf, parameter = list(supp = 0.01, conf = 0.5))
# Plot the 10 rules with the highest lift. The rules are sorted first because
# plot() has no `by` argument (it was ignored with a warning).
top_rules <- sort(rules, by = "lift")[1:10]
plot(top_rules, method = "graph", engine = "igraph", layout = igraph::in_circle())
The item frequency plot shows that the most frequently purchased items are whole milk, other vegetables, rolls/buns, soda, and yogurt. With a minimum support of 0.01 and a minimum confidence of 0.5, the apriori function returns 15 association rules for the transaction data. Every rule has either whole milk (11 rules) or “other vegetables” (4 rules) on the right-hand side, and the four rules with the highest lift (2.58 to 3.03) all predict “other vegetables” from baskets that contain root vegetables. These relationships are pictured in the network plot above, where the arrows point toward the whole milk and other vegetables nodes.
co_occurrence_matrix <- crossTable(grocerydf)
# Convert the co-occurrence matrix to a graph
g <- graph_from_adjacency_matrix(co_occurrence_matrix, mode = "undirected", weighted = TRUE)
# Inspect the graph
summary(g)
## IGRAPH 793b7ef UNW- 169 9805 --
## + attr: name (v/c), weight (e/n)
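To show what a co-occurrence matrix holds, the toy example below builds one by hand in base R. It is a sketch on invented data (four baskets, three items), not the Groceries data, and it assumes the matrix from crossTable has the same shape: pair counts off the diagonal and single-item counts on the diagonal.

```r
# Toy incidence matrix: rows are baskets, columns are items (1 = item in basket)
baskets <- matrix(c(1, 1, 0,
                    1, 0, 1,
                    1, 1, 1,
                    0, 1, 0),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(paste0("t", 1:4), c("milk", "bread", "soda")))

# t(M) %*% M counts, for every item pair, the baskets that contain both items
co_occurrence <- crossprod(baskets)
print(co_occurrence)

# The diagonal holds each item's own basket count
self_counts <- diag(co_occurrence)

# Zeroing the diagonal leaves only item-to-item links, so a graph built
# from this matrix would have no self-loops
no_loops <- co_occurrence
diag(no_loops) <- 0
print(no_loops)
```

Because the adjacency conversion keeps the diagonal by default, the edge count reported by summary(g) probably includes one self-loop per item. Zeroing the diagonal first, as above, would be one way to avoid that.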
# Function to plot the top 10 items (nodes) based on degree centrality
plot_top_items_network <- function(g, top_n = 10, layout_type = "fr", transparency = 0.5) {
# Calculate degree centrality for each node
importance_scores <- degree(g)
# Get the top10 nodes based on degree
top_nodes <- order(importance_scores, decreasing = TRUE)[1:top_n]
# Create a sub graph for the top10 nodes
top_subgraph <- induced_subgraph(g, top_nodes)
# Use specified layout type for the network
if (layout_type == "fr") {
layout_fr <- layout_with_fr(top_subgraph) # Fruchterman-Reingold layout
} else if (layout_type == "kk") {
layout_fr <- layout_with_kk(top_subgraph) # Kamada-Kawai layout
} else {
layout_fr <- layout_in_circle(top_subgraph) # Circle layout as fallback
}
# Plot the top10 items (nodes)
plot(top_subgraph,
layout = layout_fr,
vertex.size = 20, # Size of nodes
vertex.label.cex = 0.8, # Size of labels
vertex.color = "lightblue", # Color for all nodes
vertex.frame.color = "white", # White frame around nodes
edge.width = E(top_subgraph)$weight / 10, # Width of edges
edge.color = adjustcolor("gray", alpha.f = transparency), # Transparent edges
edge.arrow.size = 1.5, # Size of edge arrows
main = paste("Top", top_n, "Items in Network"),
vertex.label.color = "black", # Label color
vertex.label.dist = 1, # Label distance from node
edge.curved = 0.2) # Curved edges
}
plot_top_items_network(g, top_n = 10, layout_type = "fr", transparency = 0.3)
Based on the graph above, whole milk and yogurt are directly linked, as are other vegetables and rolls/buns. Shopping bags, pastry and butter are more independent items in the top-10 network.
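Degree only counts how many distinct items an item was ever bought with, so in a dense co-occurrence network many items can tie. A weighted alternative is strength, the sum of the co-occurrence counts on an item's edges (igraph provides this as strength()). The base R sketch below contrasts the two on invented counts; the numbers are not taken from the Groceries data.

```r
# Invented co-occurrence counts for four items (diagonal set to zero)
items <- c("whole milk", "other vegetables", "rolls/buns", "candles")
co_occurrence <- matrix(c( 0, 50, 40, 1,
                          50,  0, 30, 1,
                          40, 30,  0, 1,
                           1,  1,  1, 0),
                        nrow = 4, byrow = TRUE,
                        dimnames = list(items, items))

# Degree: number of distinct items each item co-occurs with at least once
degree_score <- rowSums(co_occurrence > 0)

# Strength: total number of co-occurrences across all of an item's edges
strength_score <- rowSums(co_occurrence)

print(rbind(degree = degree_score, strength = strength_score))
```

Here every item has the same degree, yet strength separates the staple items from the one that is bought alongside others only rarely.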
communities <- cluster_louvain(g)
membership(communities)
## abrasive cleaner artif. sweetener baby cosmetics
## 1 2 1
## baby food bags baking powder
## 3 4 2
## bathroom cleaner beef berries
## 1 5 6
## beverages bottled beer bottled water
## 7 8 8
## brandy brown bread butter
## 9 10 11
## butter milk cake bar candles
## 12 3 2
## candy canned beer canned fish
## 13 9 14
## canned fruit canned vegetables cat food
## 14 15 1
## cereals chewing gum chicken
## 12 9 16
## chocolate chocolate marshmallow citrus fruit
## 13 13 17
## cleaner cling film/bags cocoa drinks
## 1 1 2
## coffee condensed milk cooking chocolate
## 1 1 2
## cookware cream cream cheese
## 13 18 12
## curd curd cheese decalcifier
## 12 1 19
## dental care dessert detergent
## 20 3 19
## dish cleaner dishes dog food
## 1 21 22
## domestic eggs female sanitary products finished products
## 11 1 13
## fish flour flower (seeds)
## 1 2 16
## flower soil/fertilizer frankfurter frozen chicken
## 19 14 3
## frozen dessert frozen fish frozen fruits
## 18 18 6
## frozen meals frozen potato products frozen vegetables
## 18 18 16
## fruit/vegetable juice grapes hair spray
## 23 22 13
## ham hamburger meat hard cheese
## 24 15 24
## herbs honey house keeping products
## 5 25 19
## hygiene articles ice cream instant coffee
## 19 18 16
## Instant food products jam ketchup
## 15 5 14
## kitchen towels kitchen utensil light bulbs
## 1 22 19
## liqueur liquor liquor (appetizer)
## 9 8 9
## liver loaf long life bakery product make up remover
## 12 13 23
## male cosmetics margarine mayonnaise
## 9 2 14
## meat meat spreads misc. beverages
## 10 12 9
## mustard napkins newspapers
## 14 19 20
## nut snack nuts/prunes oil
## 18 20 2
## onions organic products organic sausage
## 5 2 24
## other vegetables packaged fruit/vegetables pasta
## 5 16 15
## pastry pet care photo/film
## 26 1 19
## pickled vegetables pip fruit popcorn
## 14 17 18
## pork pot plants potato products
## 5 22 26
## preservation products processed cheese prosecco
## 13 24 8
## pudding powder ready soups red/blush wine
## 2 22 8
## rice roll products rolls/buns
## 2 2 4
## root vegetables rubbing alcohol rum
## 5 19 8
## salad dressing salt salty snack
## 5 2 18
## sauces sausage seasonal products
## 15 4 3
## semi-finished bread shopping bags skin care
## 22 9 2
## sliced cheese snack products soap
## 24 13 23
## soda soft cheese softener
## 9 24 19
## sound storage medium soups sparkling wine
## 24 2 1
## specialty bar specialty cheese specialty chocolate
## 13 24 13
## specialty fat specialty vegetables spices
## 2 24 16
## spread cheese sugar sweet spreads
## 4 2 15
## syrup tea tidbits
## 1 8 4
## toilet cleaner tropical fruit turkey
## 20 17 22
## UHT-milk vinegar waffles
## 1 2 13
## whipped/sour cream whisky white bread
## 6 9 24
## white wine whole milk yogurt
## 8 25 12
## zwieback
## 21
# Function to plot the top N communities in the network
plot_top_communities_network <- function(g, top_n = 3, layout_type = "fr", transparency = 0.5) {
# Perform community detection using the Louvain method
communities <- cluster_louvain(g)
# Get community membership (which community each node belongs to)
community_membership <- membership(communities)
# Identify the top N communities based on size (number of nodes)
community_sizes <- table(community_membership)
top_communities <- order(community_sizes, decreasing = TRUE)[1:top_n]
# Create a subgraph that includes only the nodes in the top N communities
nodes_in_top_communities <- which(community_membership %in% top_communities)
top_community_subgraph <- induced_subgraph(g, nodes_in_top_communities)
# Use specified layout type for the network
if (layout_type == "fr") {
layout_fr <- layout_with_fr(top_community_subgraph) # Fruchterman-Reingold layout
} else if (layout_type == "kk") {
layout_fr <- layout_with_kk(top_community_subgraph) # Kamada-Kawai layout
} else {
layout_fr <- layout_in_circle(top_community_subgraph) # Circle layout as fallback
}
# Plot the network of the top N communities
plot(top_community_subgraph,
layout = layout_fr,
vertex.size = 10, # Size of nodes
vertex.label.cex = 0.8, # Size of labels
vertex.color = community_membership[nodes_in_top_communities] + 1, # Color by community
vertex.frame.color = "white", # White frame around nodes
edge.width = E(top_community_subgraph)$weight / 10, # Width of edges
edge.color = adjustcolor("gray", alpha.f = transparency), # Transparent edges
edge.arrow.size = 0.5, # Size of edge arrows
main = paste("Top", top_n, "Communities in Network"),
vertex.label.color = "black", # Label color
vertex.label.dist = 1, # Label distance from node
edge.curved = 0.3) # Curved edges
}
# Plot the top 3 communities in the network
plot_top_communities_network(g, top_n = 3, layout_type = "fr", transparency = 0.5)
The membership table shows 26 communities, none of which contains a majority of the 169 items. Sweets such as candy, chocolate, chocolate marshmallow, waffles, and specialty chocolate are grouped together in community 13, while baby cosmetics falls in community 1 alongside cleaning products. In the graph, baby food, baby cosmetics, and cream belong to communities but sit apart from the other items.
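The Louvain method groups nodes by greedily increasing modularity, a score that compares the edge weight inside communities with what a random graph with the same degrees would give. The sketch below computes modularity by hand in base R for a toy graph of two triangles joined by one edge. The graph and the function name modularity_score are illustrative assumptions, not part of the analysis.

```r
# Toy graph: two triangles (a-b-c and d-e-f) joined by one bridge edge c-d
nodes <- c("a", "b", "c", "d", "e", "f")
adj <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
edges <- rbind(c("a", "b"), c("a", "c"), c("b", "c"),
               c("d", "e"), c("d", "f"), c("e", "f"),
               c("c", "d"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Modularity Q = (1 / 2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j]
modularity_score <- function(adj, membership) {
  two_m <- sum(adj)       # twice the total edge weight
  k <- rowSums(adj)       # (weighted) degree of each node
  same <- outer(membership, membership, "==")  # TRUE where nodes share a community
  sum((adj - outer(k, k) / two_m) * same) / two_m
}

good_split <- c(1, 1, 1, 2, 2, 2)  # one community per triangle
bad_split  <- c(1, 2, 1, 2, 1, 2)  # communities that cut across the triangles

print(modularity_score(adj, good_split))
print(modularity_score(adj, bad_split))
```

The split along the triangles scores higher, which is the kind of partition Louvain looks for. Since cluster_louvain() may visit nodes in a random order, community numbers can differ between runs, so calling set.seed() before it is a reasonable way to keep the table and the plots consistent.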
# Function to plot top items and top communities together
plot_top_items_and_communities <- function(g, top_items_n = 5, top_communities_n = 5, layout_type = "fr", transparency = 0.5) {
# Calculate degree centrality for each node (for top items)
importance_scores <- degree(g)
top_items <- order(importance_scores, decreasing = TRUE)[1:top_items_n]
# Perform community detection
communities <- cluster_louvain(g)
community_membership <- membership(communities)
# Get community sizes and top N communities based on size
community_sizes <- table(community_membership)
top_communities <- order(community_sizes, decreasing = TRUE)[1:top_communities_n]
# Vertex ids of the items that belong to the top communities
nodes_in_top_communities <- which(community_membership %in% top_communities)
# Combine top items and top-community items into one sorted, de-duplicated id set.
# A single induced subgraph keeps the "weight" edge attribute and a known vertex
# order; union() of two weighted graphs can rename the attribute (weight_1, weight_2).
combined_nodes <- sort(unique(c(top_items, nodes_in_top_communities)))
combined_subgraph <- induced_subgraph(g, combined_nodes)
# Plot combined subgraph
layout_fr <- layout_with_fr(combined_subgraph)
plot(combined_subgraph,
layout = layout_fr,
vertex.size = 10, # Size of nodes
vertex.label.cex = 0.8, # Size of labels
vertex.color = community_membership[combined_nodes] + 1, # Color by community
vertex.frame.color = "white", # White frame around nodes
edge.width = E(combined_subgraph)$weight / 10, # Width of edges
edge.color = adjustcolor("gray", alpha.f = transparency), # Transparent edges
edge.arrow.size = 0.7, # Size of edge arrows
main = paste("Top Items and Top Communities"),
vertex.label.color = "black", # Label color
vertex.label.dist = 1, # Label distance from node
edge.curved = 0.3) # Curved edges
}
# Plot the top 5 items and the top 5 communities together
plot_top_items_and_communities(g, top_items_n = 5, top_communities_n = 5, layout_type = "fr", transparency = 0.5)
The graph above is a bit messy; however, we can see items that are more independent, such as soda, whole milk, and rolls/buns.
# Function to plot the top N most important items in the network
plot_top_items_network <- function(g, criterion = "degree", top_n = 10, layout_type = "fr", transparency = 0.3) {
# Check criterion and calculate corresponding values (degree, betweenness, etc.)
if (criterion == "degree") {
importance_scores <- degree(g)
} else if (criterion == "betweenness") {
importance_scores <- betweenness(g)
} else if (criterion == "closeness") {
importance_scores <- closeness(g)
} else {
stop("Unknown criterion. Please choose 'degree', 'betweenness', or 'closeness'.")
}
# Get the top N nodes based on the selected criterion
top_nodes <- order(importance_scores, decreasing = TRUE)[1:top_n]
# Create a subgraph for the top N nodes
top_subgraph <- induced_subgraph(g, top_nodes)
# Perform community detection on the subgraph
communities_subgraph <- cluster_louvain(top_subgraph)
# Use specified layout type for the subgraph
if (layout_type == "fr") {
layout_fr <- layout_with_fr(top_subgraph) # Fruchterman-Reingold layout
} else if (layout_type == "kk") {
layout_fr <- layout_with_kk(top_subgraph) # Kamada-Kawai layout
} else {
layout_fr <- layout_in_circle(top_subgraph) # Circle layout as fallback
}
# Plot the network with selected settings
plot(top_subgraph,
layout = layout_fr,
vertex.size = 10, # Size of nodes
vertex.label.cex = 0.8, # Size of labels
vertex.color = membership(communities_subgraph) + 1, # Color by community membership
vertex.frame.color = "white", # White frame around nodes
edge.width = E(top_subgraph)$weight / 10, # Width of edges
edge.color = adjustcolor("gray", alpha.f = transparency), # Transparent edges
edge.arrow.size = 0.5, # Size of edge arrows
main = paste("Top", top_n, "Items by", criterion),
vertex.label.color = "black", # Label color
vertex.label.dist = 1, # Label distance from node
edge.curved = 0.2) # Curved edges
}
plot_top_items_network(g, criterion = "degree", top_n = 10, layout_type = "fr", transparency = 0.3)
# Plot the top 10 most central items (based on betweenness centrality)
plot_top_items_network(g, criterion = "betweenness", top_n = 10, layout_type = "kk", transparency = 0.2)
Based on the last analysis, whole milk and other vegetables are the top items when ranked by degree. Ranking the top 10 items by betweenness gives a different result, with ready soups as the main item. The results vary with the centrality measure used. I am glad I included this portion of the project, because it showed me that market basket analysis can be approached in several ways depending on a project’s needs.
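One possible reason for the ready soups result: igraph's betweenness() and closeness() use the weight edge attribute by default and treat it as a distance, so a pair with few co-occurrences looks "close" and a pair with many looks "far". The base R sketch below shows the effect with a tiny Floyd-Warshall shortest-path routine on invented counts. The function name and the numbers are assumptions for illustration only.

```r
# Invented co-occurrence counts: milk and vegetables are bought together often,
# ready soups only rarely with either
items <- c("whole milk", "other vegetables", "ready soups")
w <- matrix(c(  0, 100, 1,
              100,   0, 1,
                1,   1, 0),
            nrow = 3, byrow = TRUE, dimnames = list(items, items))

# Floyd-Warshall: all-pairs shortest path lengths from a distance matrix
shortest_paths <- function(dist) {
  n <- nrow(dist)
  for (k in seq_len(n)) for (i in seq_len(n)) for (j in seq_len(n)) {
    via_k <- dist[i, k] + dist[k, j]
    if (via_k < dist[i, j]) dist[i, j] <- via_k
  }
  dist
}

# Raw counts used as distances: frequent pairs end up far apart
raw_dist <- w
raw_dist[w == 0] <- Inf
diag(raw_dist) <- 0

# Inverted counts used as distances: frequent pairs end up close together
inv_dist <- 1 / w   # 1 / 0 gives Inf, meaning no edge
diag(inv_dist) <- 0

print(shortest_paths(raw_dist)["whole milk", "other vegetables"])
print(shortest_paths(inv_dist)["whole milk", "other vegetables"])
```

With raw counts, the shortest route between the two staples runs through ready soups (length 2 instead of 100), so the rare item sits on the shortest path and gains betweenness. With inverted counts the direct link wins. If that matches the intent, passing weights = 1 / E(g)$weight to betweenness() would be one option to try.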