library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
This project applies association rule mining to analyze fathers’ knowledge about child nutrition and animal-source foods. The data come from a large survey conducted in Rwanda and focus on foods that are important for child growth and development.
The aim of the analysis is to identify common combinations of food items mentioned by fathers and to explore meaningful relationships between these items using association rules.
This project uses survey data from the Engaging Men study, obtained from Harvard Dataverse. The dataset includes fathers’ responses to questions about child feeding and animal-source foods. For the purpose of association rule mining, six food-related variables are selected. These variables represent key food groups that are important for child nutrition and are suitable for analyzing co-occurrence patterns.
data_raw <- read.delim(
  "Engaging_men_baseend_labeled.tab",
  sep = "\t",
  header = TRUE,
  stringsAsFactors = FALSE
)
head(data_raw)
The following six variables are used in the analysis: Milk, Meat, Fish, Eggs, Porridge with milk, and Fruits. These variables capture both animal-source foods and complementary foods and are therefore relevant for identifying meaningful association rules.
library(dplyr)
# Select and rename variables to match the analysis description
data_food <- data_raw %>%
select(
Milk = q_205_1,
Meat = q_205_2,
Fish = q_205_3,
Eggs = q_205_4,
Porridge = q_204_2, # "Porridge with milk"
Fruits = q_204_6
)
# Preview the first few rows
head(data_food)
## Milk Meat Fish Eggs Porridge Fruits
## 1 1 1 1 1 0 1
## 2 1 1 0 1 0 1
## 3 1 0 0 0 0 0
## 4 1 1 1 0 1 1
## 5 1 1 0 0 0 0
## 6 0 1 1 1 1 1
Convert the selected food variables into a transaction format. Each row represents one father, and each item represents a food mentioned.
The binary food variables are converted into transaction data.
library(arules)
data_food_logical <- data_food == 1
food_transactions <- as(data_food_logical, "transactions")
summary(food_transactions)
## transactions as itemMatrix in sparse format with
## 298 rows (elements/itemsets/transactions) and
## 6 columns (items) and a density of 0.606264
##
## most frequent items:
## Meat Eggs Fruits Milk Fish (Other)
## 271 215 202 192 136 68
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6
## 15 23 83 118 52 7
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 3.000 4.000 3.638 4.000 6.000
##
## includes extended item information - examples:
## labels
## 1 Milk
## 2 Meat
## 3 Fish
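To make the conversion concrete, the following base R sketch rebuilds the six rows printed by head(data_food) and lists the items that each row contributes as a transaction. It does not use arules. The object names toy, toy_logical, and items are illustrative only, and the density computed here applies to these six rows, not to the full data set.

```r
# The six rows printed by head(data_food); 1 = food mentioned, 0 = not mentioned
toy <- data.frame(
  Milk     = c(1, 1, 1, 1, 1, 0),
  Meat     = c(1, 1, 0, 1, 1, 1),
  Fish     = c(1, 0, 0, 1, 0, 1),
  Eggs     = c(1, 1, 0, 0, 0, 1),
  Porridge = c(0, 0, 0, 1, 0, 1),
  Fruits   = c(1, 1, 0, 1, 0, 1)
)

# Same logical coding as data_food == 1
toy_logical <- as.matrix(toy) == 1

# One character vector of item labels per row, i.e. per transaction
items <- lapply(
  seq_len(nrow(toy_logical)),
  function(i) colnames(toy_logical)[toy_logical[i, ]]
)

lengths(items)     # number of items in each transaction
mean(toy_logical)  # density: share of TRUE cells in the matrix
```

The density reported by summary(food_transactions) is the same quantity computed over all 298 rows and 6 items.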
The Apriori algorithm is applied to identify frequent food combinations.
rules <- apriori(
food_transactions,
parameter = list(
support = 0.1,
confidence = 0.6,
minlen = 2
)
)
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.1 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 29
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 298 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [64 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
length(rules)
## [1] 64
The rules are sorted by confidence to highlight the strongest associations.
rules_conf <- sort(rules, by = "confidence", decreasing = TRUE)
inspect(rules_conf[1:10])
## lhs rhs support confidence coverage lift
## [1] {Fish, Porridge} => {Meat} 0.1073826 1.0000000 0.1073826 1.099631
## [2] {Milk, Porridge} => {Meat} 0.1308725 0.9750000 0.1342282 1.072140
## [3] {Eggs, Porridge} => {Meat} 0.1778523 0.9636364 0.1845638 1.059644
## [4] {Porridge} => {Meat} 0.2181208 0.9558824 0.2281879 1.051118
## [5] {Porridge, Fruits} => {Meat} 0.1442953 0.9555556 0.1510067 1.050759
## [6] {Eggs, Porridge, Fruits} => {Meat} 0.1174497 0.9459459 0.1241611 1.040191
## [7] {Eggs} => {Meat} 0.6812081 0.9441860 0.7214765 1.038256
## [8] {Eggs, Fruits} => {Meat} 0.4798658 0.9407895 0.5100671 1.034521
## [9] {Fish, Eggs} => {Meat} 0.3456376 0.9363636 0.3691275 1.029654
## [10] {Fish, Eggs, Fruits} => {Meat} 0.2416107 0.9350649 0.2583893 1.028226
## count
## [1] 32
## [2] 39
## [3] 53
## [4] 65
## [5] 43
## [6] 35
## [7] 203
## [8] 143
## [9] 103
## [10] 72
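As a check on how these measures relate, the values for rule [4], {Porridge} => {Meat}, can be recomputed by hand from counts that appear in the output above: 65 transactions contain both items, 68 contain Porridge (the "(Other)" entry in the transaction summary), 271 contain Meat, and there are 298 transactions in total. This is a minimal base R sketch; the variable names are illustrative and not part of the original analysis.

```r
# Counts taken from the output shown above (variable names are illustrative)
n_total    <- 298  # all transactions
n_porridge <- 68   # transactions containing Porridge (antecedent)
n_meat     <- 271  # transactions containing Meat (consequent)
n_both     <- 65   # transactions containing both (the rule's count)

support    <- n_both / n_total      # share of all transactions with both items
confidence <- n_both / n_porridge   # share of Porridge transactions that also contain Meat
coverage   <- n_porridge / n_total  # support of the antecedent alone
lift       <- confidence / (n_meat / n_total)  # confidence relative to the base rate of Meat

round(c(support = support, confidence = confidence,
        coverage = coverage, lift = lift), 7)
```

The results agree with row [4] of the table to the displayed precision. The lift stays close to 1 because Meat is mentioned in 271 of 298 transactions (about 91%), so even a rule with a confidence of 1 cannot exceed a lift of roughly 1.10 when Meat is the consequent.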
This section presents different visualizations to better understand the association rules from several perspectives.
Before looking at individual rules, it is useful to examine the overall distribution of all generated rules. Support and confidence are the two main quality measures, and their relationship helps to evaluate whether the chosen parameters are reasonable.
plot(
rules,
measure = c("support", "confidence"),
shading = "lift"
)
The plot shows that most rules have relatively low support but moderate
to high confidence. This indicates that many food combinations are not
very frequent, but when they occur, they are relatively reliable. The
use of lift as shading helps to identify stronger rules among them.
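While support and confidence describe frequency and reliability, lift compares the observed co-occurrence of antecedent and consequent with the co-occurrence expected if the two were independent. A lift of 1 indicates independence, and values above 1 indicate that the items appear together more often than expected. Therefore, examining rules with the highest lift allows us to focus on the most informative patterns.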
plot(
sort(rules, by = "lift", decreasing = TRUE)[1:10],
measure = "lift"
)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
The top rules show lift values clearly greater than 1, which suggests
meaningful associations between food items. These rules are stronger
than what would be expected by chance and are therefore more interesting
for interpretation.
To better understand how different food items are connected, a network graph is used. This visualization focuses on the structure of the strongest rules rather than their numerical values.
library(arulesViz)
plot(
rules_conf[1:10],
method = "graph",
engine = "htmlwidget"
)
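The network graph is drawn from the ten rules with the highest confidence, all of which have Meat as the consequent, so Meat forms the central node. Among the antecedents, Eggs and Porridge appear most often (in six of the ten rules each), followed by Fruits, Fish, and Milk. This suggests that these foods often co-occur with meat and may play an important role in child nutrition patterns.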
Rule length shows how many items are included in a rule. Analyzing rule length helps to understand how complex the rules are and whether they are easy to interpret.
# Rule length: number of items in each rule (antecedent plus consequent)
rule_length <- size(rules)
# Plot the distribution of rule length
hist(
rule_length,
breaks = seq(min(rule_length) - 0.5, max(rule_length) + 0.5, by = 1),
xlab = "Rule length (number of items)",
ylab = "Frequency",
main = "Distribution of Rule Length"
)
### 7.5 Support Threshold Sensitivity Analysis
The choice of the support threshold strongly affects the number of generated rules. To understand this effect, a simple sensitivity analysis is conducted by varying the support value.
support_values <- c(0.05, 0.1, 0.15, 0.2)
rule_counts <- sapply(
support_values,
function(s) {
length(
apriori(
food_transactions,
parameter = list(
support = s,
confidence = 0.6
)
)
)
}
)
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 14
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 298 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [92 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 29
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 298 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [68 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.15 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 44
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 298 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [51 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.2 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 59
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 298 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [43 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
plot(
support_values,
rule_counts,
type = "b",
xlab = "Support threshold",
ylab = "Number of rules"
)
As the support threshold increases from 0.05 to 0.2, the number of rules falls from 92 to 43. This shows a clear trade-off between capturing more patterns and keeping only the most frequent ones. Note that this sweep leaves minlen at its default of 1, so its counts include rules with an empty antecedent: at a support of 0.1 it yields 68 rules, compared with 64 in the main analysis, which sets minlen = 2. Based on this sensitivity analysis, a support threshold of 0.1 was selected because it provides a reasonable balance between capturing a sufficient number of rules and maintaining interpretability.
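The "Absolute minimum support count" lines in the log above can be related to the relative thresholds with a little arithmetic. The sketch below assumes, based only on the four logged values, that the reported count is the relative support multiplied by the 298 transactions and rounded down. The rule counts are copied from the log rather than recomputed, and the object name sweep_summary is illustrative.

```r
n_transactions <- 298
support_values <- c(0.05, 0.1, 0.15, 0.2)

sweep_summary <- data.frame(
  support   = support_values,
  # Assumed rounding rule, inferred from the logged counts (14, 29, 44, 59)
  min_count = floor(support_values * n_transactions),
  # Rule counts copied from the Apriori output above (minlen = 1)
  n_rules   = c(92, 68, 51, 43)
)

sweep_summary
```

Reading the table row by row shows how quickly the rule set shrinks as the minimum count rises: moving from 14 to 59 required transactions roughly halves the number of rules.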
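A parallel coordinates plot draws each rule as a line that passes through its antecedent items, one position per item, and ends at the consequent. In arulesViz, line width reflects support and color intensity reflects confidence, so the plot shows which item combinations lead to which consequents and how strong those rules are.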
library(arulesViz)
plot(
rules,
method = "paracoord",
control = list(
reorder = TRUE
)
)
The parallel plot shows that rules with higher confidence often have
lower support. At the same time, rules with higher lift tend to stand
out from the rest. This confirms the trade-off between frequency and
strength in association rule mining.
To further interpret the extracted association rules, we focus on a specific food item and examine which other foods are most likely to co-occur with it. In particular, rules involving meat show that it frequently appears together with eggs and milk-based foods. This suggests that fathers tend to report animal-source foods as part of broader food combinations rather than isolated items. Such patterns indicate that food choices are often structured around common dietary bundles, which may be relevant when interpreting reported child nutrition practices.
Overall, the different visualizations provide a comprehensive view of the generated association rules and highlight clear trade-offs between support, confidence, and lift. The results indicate that stronger rules tend to be less frequent but offer more informative insights into food co-occurrence patterns. The network and parallel coordinate plots further help to reveal the structure and quality of the rules from multiple perspectives, making it easier to identify central food items and dominant associations. In addition, the sensitivity analysis confirms that the chosen parameter values achieve a reasonable balance between the number of extracted rules and their interpretability. Taken together, these findings suggest that fathers tend to report child nutrition not as isolated food items, but as combinations of foods that commonly appear together. Such patterns provide meaningful insights into reported dietary practices and support further interpretation in the context of child nutrition.
This project applies association rule mining to explore patterns in fathers’ reported child nutrition practices. Using survey data from the Engaging Men study, a set of food-related variables was selected and transformed into transaction data suitable for association rule analysis. The Apriori algorithm was used to identify frequent food combinations, and the resulting rules were evaluated using support, confidence, and lift. Multiple visualizations, including scatter plots, network graphs, parallel coordinate plots, and sensitivity analysis, were employed to examine the quality, structure, and robustness of the extracted rules. The analysis highlights clear trade-offs between rule frequency and strength and shows that meaningful rules often involve combinations of animal-source and complementary foods. Overall, the results suggest that reported food choices are structured around common food bundles rather than isolated items, providing useful insights into reported child nutrition patterns.