This report shows the initial finding of the descriptive analysis of child malnutrition in Pakistan,focusing distribution of stunting, wasting and underweight across different wealth groups.
This section prepares the environment by loading necessary R libraries and importing the data-set, ensuring variables are in the correct format for analysis.
library(haven)
## Warning: package 'haven' was built under R version 4.5.1
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(psych)
## Warning: package 'psych' was built under R version 4.5.1
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(tidyr)
library(naniar)
## Warning: package 'naniar' was built under R version 4.5.1
In this section Kids data, Household data and Women data is loading in R.
kids <- read_dta("C:/Users/ameen/OneDrive - Higher Education Commission/Documents/Talha csc scholarship china/open door/itmo docs/introduction to R and more/1 Term/reaserch in ternship/Pakistan reaserch/PKKR71DT kids/PKKR71FL.DTA")
household <- read_dta("C:/Users/ameen/OneDrive - Higher Education Commission/Documents/Talha csc scholarship china/open door/itmo docs/introduction to R and more/1 Term/reaserch in ternship/Pakistan reaserch/PKHR71DT household/PKHR71FL.DTA")
women <- read_dta("C:/Users/ameen/OneDrive - Higher Education Commission/Documents/Talha csc scholarship china/open door/itmo docs/introduction to R and more/1 Term/reaserch in ternship/Pakistan reaserch/PKIR71DT women/PKIR71FL.DTA")
In this step variables that are useful for analysis are seperated from the raw data.
kids_selective <- kids %>%
select(v001, v002, v003, b16, midx,
hw70, hw71, hw72, # nutrition outcomes
hw1, hw13, # child age + reason for missing
v190, h2:h9, m70)
household_selective <- household %>%
select(hv001, hv002, hv201, hv202, hv204, hv205, hv226) %>%
rename(v001 = hv001, v002 = hv002)
women_selective <- women %>%
select(v001, v002, v003, v467a:v467f)
This step check while merging Kids and women data is data loss during merging?
# --- 2.1. Linked check ---
kids_selective$linked <- ifelse(
paste(kids_selective$v001, kids_selective$v002, kids_selective$v003) %in%
paste(women_selective$v001, women_selective$v002, women_selective$v003),
1, 0
)
cat("\nLinked vs Unlinked children (pre-merge):\n")
##
## Linked vs Unlinked children (pre-merge):
print(table(kids_selective$linked))
##
## 1
## 12708
The results show that all 12708 children merged successfully with their respective mothers no data is lost during merging.
In this step all selective data sets are merged.
kids_selective <- kids_selective %>% mutate(b16 = as.numeric(b16))
merged_datasets <- kids_selective %>%
left_join(household_selective, by = c("v001","v002")) %>%
left_join(women_selective, by = c("v001","v002","v003"))
Summarize the variable hw1 (child’s age in months) to understand its distribution and check for missing values.
summary(merged_datasets$hw1)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 14.00 29.00 29.16 44.00 59.00 8209
The variable hw1 (children’s age in months) ranges from 0 to 59 months. The median age is 29 months, close to the mean of 29.16, suggesting a roughly symmetric distribution. The first and third quartiles are 14 and 44 months, respectively. There are 8,209 missing values, indicating a substantial amount of incomplete data.
hist(merged_datasets$hw1, main = "Age distribution of children", xlab = "Age in months")
This histogram shows the number of children at different ages (in months). Overall, the sample is well-distributed across the full 0–60 month range, with slightly more infants (0–5 months) and slightly fewer children in the oldest age group (55–60 months). This indicates a large and fairly even sample, which is suitable for studying age-related changes in malnutrition.
This counts how many children don’t have household information (hv201), showing if the merge missed any matches.
sum(is.na(merged_datasets$hv201))
## [1] 0
The result [1] 0 means that all children in the data-set have matching household information. There are no missing values in hv201, so the merge worked correctly.
In this step, we cleaned and prepared the merged dat-aset. Variables were renamed to meaningful names for easier interpretation. Children were grouped into age categories (0–11 months, 12–23 months, etc.), and anthropometry scores (stunting, underweight, wasting) were converted back to z-scores and cleaned for implausible values. Wealth index was recoded into 5 ordered categories from poorest to richest for analysis.
merged_datasets <- merged_datasets %>%
rename(
cluster_number = v001,
household_number = v002,
child_index = b16,
mother_index = v003,
stunting = hw70,
underweight = hw71,
wasting = hw72,
wealth_index = v190,
water_source = hv201,
time_to_water = hv202,
electricity = hv204,
toilet_facility = hv205,
cooking_fuel = hv226,
postnatal_care = m70,
know_where_to_go = v467a,
getting_permission_to_go = v467b,
money_needed = v467c,
distance_ti_health = v467d,
transport = v467e,
wait_to_go_alone = v467f
) %>%
mutate(
# 5.1 Child age groups
age_group = cut(hw1,
breaks = seq(0, 60, by = 12),
labels = c("0–11m", "12–23m", "24–35m", "36–47m", "48–59m"),
right = FALSE),
#5.2 Convert anthropometry scores back to z-scores
stunting = stunting / 100,
underweight = underweight / 100,
wasting = wasting / 100,
#5.3 Clean implausible values
stunting = ifelse(stunting < -6 | stunting > 6, NA, stunting),
underweight = ifelse(underweight < -6 | underweight > 6, NA, underweight),
wasting = ifelse(wasting < -5 | wasting > 5, NA, wasting),
# Wealth categories (5 groups)
wealth_cat = case_when(
wealth_index == 1 ~ "poorest",
wealth_index == 2 ~ "poorer",
wealth_index == 3 ~ "middle",
wealth_index == 4 ~ "richer",
wealth_index == 5 ~ "richest",
TRUE ~ NA_character_
),
wealth_cat = factor(wealth_cat, levels = c("poorest","poorer","middle","richer","richest"))
)
In this section, we examine how child malnutrition indicators vary across age groups (0–11 months, 12–23 months, etc.). Three key nutrition outcomes are considered: stunting (HAZ), underweight (WAZ), and wasting (WHZ).
ggplot(merged_datasets, aes(x = age_group, y = stunting, fill = age_group)) +
geom_boxplot() + theme_minimal() +
labs(title = "Stunting (HAZ) distribution by child age group", x="Age group (months)", y="HAZ")
## Warning: Removed 8567 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
ggplot(merged_datasets, aes(x = age_group, y = underweight, fill = age_group)) +
geom_boxplot() + theme_minimal() +
labs(title = "Underweight (WAZ) distribution by child age group", x="Age group (months)", y="WAZ")
## Warning: Removed 8443 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
ggplot(merged_datasets, aes(x = age_group, y = wasting, fill = age_group)) +
geom_boxplot() + theme_minimal() +
labs(title = "Wasting (WHZ) distribution by child age group", x="Age group (months)", y="WHZ")
## Warning: Removed 8557 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
Stunting (HAZ) – Chronic Malnutrition: Stunting increases with age. Infants (0–11 months) are close to the healthy range, but from 12–35 months, median HAZ drops below -1.5. For older children (36–59 months), stunting remains low, showing that once it occurs, it persists.
Underweight (WAZ) – Combined Malnutrition: Underweight also worsens with age. Median WAZ is below 0 for all groups, and the lower quartiles are near -2, indicating that many children are underweight across all ages.
Wasting (WHZ) – Acute Malnutrition: Wasting is more stable, with median WHZ near 0. However, some children in all age groups fall below -2, showing ongoing risk of acute malnutrition, often due to short-term illness or food shortage.
make_line_chart <- function(data, var, var_label){
line_data <- data %>%
group_by(age_group) %>%
summarise(mean_val = mean(.data[[var]], na.rm = TRUE),
sd_val = sd(.data[[var]], na.rm = TRUE),
n = n()) %>%
mutate(se_val = sd_val / sqrt(n))
ggplot(line_data, aes(x = age_group, y = mean_val, group = 1)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 3, color = "red") +
geom_errorbar(aes(ymin = mean_val - se_val, ymax = mean_val + se_val),
width = 0.1, color = "black") +
theme_minimal() +
labs(title = paste("Average", var_label, "Z-score by Child Age Group"),
x = "Age group (months)",
y = paste("Mean", var_label, "Z-score")) +
geom_hline(yintercept = -2, linetype = "dashed", color = "darkred") +
annotate("text", x = 2.5, y = -2.1,
label = paste(var_label, "cutoff (Z = -2)"),
color = "darkred", hjust = 0)
}
# Run charts
p_stunting_line <- make_line_chart(merged_datasets, "stunting", "Stunting (HAZ)")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
p_underweight_line <- make_line_chart(merged_datasets, "underweight", "Underweight (WAZ)")
p_wasting_line <- make_line_chart(merged_datasets, "wasting", "Wasting (WHZ)")
print(p_stunting_line)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
print(p_underweight_line)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
print(p_wasting_line)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
Stunting (HAZ): Stunting gets worse with age. The mean HAZ drops sharply from infancy and crosses the −2 cutoff around 36–47 months, meaning older children are on average chronically malnourished.
Underweight (WAZ): Underweight also declines with age but less severely. The mean WAZ stays below the healthy average (0) in all groups but does not cross the severe −2 cutoff.
Wasting (WHZ): Wasting remains fairly stable across age groups, with the mean WHZ close to 0 and well above the −2 cutoff. This shows that acute malnutrition is less of a population-wide problem and occurs more as short-term cases.
We used heatmaps to examine how child nutrition outcomes differ
across wealth categories.
The plots show the percentage distribution of stunting, underweight, and
wasting across different levels of household wealth.
This helps visualize socioeconomic disparities in child
malnutrition.
make_single_heatmap <- function(data, var, var_label){
df <- data %>%
filter(!is.na(!!rlang::sym(var))) %>% # drop NA values
mutate(value100 = round(!!rlang::sym(var) * 100))
# DHS-style binning (-600 to 600)
breaks <- seq(-600, 600, by = 100)
labels <- paste0(breaks[-length(breaks)], "-", (breaks[-1]-1))
df <- df %>%
mutate(value_bin = cut(value100, breaks = breaks, labels = labels,
include.lowest = TRUE, right = TRUE))
heat_data <- df %>%
group_by(wealth_cat, value_bin) %>%
summarise(n = n(), .groups = "drop") %>%
complete(wealth_cat, value_bin, fill = list(n = 0)) %>%
group_by(wealth_cat) %>%
mutate(total_in_row = sum(n),
pct_within_wealth = ifelse(total_in_row==0, NA, 100*n/total_in_row)) %>%
ungroup()
ggplot(heat_data, aes(x=value_bin, y=wealth_cat, fill=pct_within_wealth)) +
geom_tile(color="white", size=0.35) +
geom_text(aes(label = ifelse(is.na(pct_within_wealth), "", sprintf("%.1f%%", pct_within_wealth))),
color="black", size=3.1) +
geom_text(aes(label = ifelse(is.na(pct_within_wealth), "", sprintf("%.1f%%", pct_within_wealth))),
color="white", size=2.6) +
scale_fill_viridis_c(name="% within wealth", option="viridis", na.value="white") +
labs(title=var_label, x=paste(var_label,"(binned)"), y="Wealth") +
theme_minimal() +
theme(axis.text.x = element_text(angle=45, hjust=1),
panel.grid = element_blank()) +
scale_y_discrete(limits = rev(levels(heat_data$wealth_cat)))
}
# Run heatmaps
p_stunting <- make_single_heatmap(merged_datasets, "stunting", "Stunting")
p_underweight <- make_single_heatmap(merged_datasets, "underweight", "Underweight")
p_wasting <- make_single_heatmap(merged_datasets, "wasting", "Wasting")
print(p_stunting)
print(p_underweight)
print(p_wasting)
Stunting (HAZ) – Clear Wealth Gradient
Stunting shows the strongest wealth gradient. Children from poorer
households are much more likely to have severe stunting, while richer
households cluster closer to the healthy range.
Underweight (WAZ) – Strong Wealth Gradient
Underweight also declines as wealth increases. Poorer children are more
concentrated in the severe underweight bins, while richer children are
closer to normal values.
Wasting (WHZ) – Weak Wealth Gradient
Wasting shows the weakest wealth gradient. Most children in all wealth
groups fall near normal values, though the poorest still face slightly
higher risks of severe wasting.
We examined missing data across all variables. The table below highlights the variables with the highest percentage of missing values to identify potential data quality issues.
missing_counts <- sapply(merged_datasets, function(x) sum(is.na(x)))
missing_percent <- round(sapply(merged_datasets, function(x) mean(is.na(x))*100), 2)
missing_table <- data.frame(
Variable = names(missing_counts),
Missing_Count = missing_counts,
Missing_Percent = missing_percent
)
missing_table[order(-missing_table$Missing_Percent), ]
## Variable Missing_Count Missing_Percent
## know_where_to_go know_where_to_go 12708 100.00
## transport transport 12708 100.00
## time_to_water time_to_water 12457 98.02
## h7d h7d 9930 78.14
## h7m h7m 9930 78.14
## h7y h7y 9930 78.14
## h8d h8d 9896 77.87
## h8m h8m 9896 77.87
## h8y h8y 9896 77.87
## h5d h5d 9604 75.57
## h5m h5m 9604 75.57
## h5y h5y 9604 75.57
## h6d h6d 9574 75.34
## h6m h6m 9574 75.34
## h6y h6y 9574 75.34
## h3d h3d 9272 72.96
## h3m h3m 9272 72.96
## h3y h3y 9272 72.96
## h4d h4d 9233 72.66
## h4m h4m 9233 72.66
## h4y h4y 9233 72.66
## h2d h2d 8961 70.51
## h2m h2m 8961 70.51
## h2y h2y 8961 70.51
## stunting stunting 8567 67.41
## wasting wasting 8557 67.34
## underweight underweight 8443 66.44
## hw1 hw1 8209 64.60
## age_group age_group 8209 64.60
## h5 h5 5557 43.73
## h7 h7 5557 43.73
## h4 h4 5556 43.72
## h6 h6 5556 43.72
## h8 h8 5556 43.72
## h2 h2 5555 43.71
## h3 h3 5555 43.71
## h9 h9 5555 43.71
## postnatal_care postnatal_care 4433 34.88
## child_index child_index 719 5.66
## distance_ti_health distance_ti_health 12 0.09
## getting_permission_to_go getting_permission_to_go 10 0.08
## money_needed money_needed 10 0.08
## wait_to_go_alone wait_to_go_alone 10 0.08
## cooking_fuel cooking_fuel 4 0.03
## hw13 hw13 3 0.02
## cluster_number cluster_number 0 0.00
## household_number household_number 0 0.00
## mother_index mother_index 0 0.00
## midx midx 0 0.00
## wealth_index wealth_index 0 0.00
## linked linked 0 0.00
## water_source water_source 0 0.00
## electricity electricity 0 0.00
## toilet_facility toilet_facility 0 0.00
## wealth_cat wealth_cat 0 0.00
In this section we check reason for messingness.
merged_datasets <- merged_datasets %>%
mutate(hw13_reason = case_when(
hw13 == 0 ~ "Measured",
hw13 == 1 ~ "Child died",
hw13 == 2 ~ "Child sick",
hw13 == 3 ~ "Not present",
hw13 == 4 ~ "Refused",
hw13 == 5 ~ "Mother refused",
hw13 == 6 ~ "Other",
hw13 == 7 ~ "No measurement in household",
TRUE ~ NA_character_
))
cat("\nReasons for missing stunting/underweight data (hw13):\n")
##
## Reasons for missing stunting/underweight data (hw13):
print(table(merged_datasets$hw13_reason, useNA="ifany"))
##
## Child died Measured
## 719 4227
## No measurement in household Not present
## 7490 111
## Other Refused
## 72 86
## <NA>
## 3
Most missing cases are due to children not being measured, followed by child deaths, refusals, or absence.
In this section different plot help to visualize missingness pattern.
# Missingness across variables
gg_miss_var(merged_datasets)
# Upset plot for overlapping missingness
gg_miss_upset(merged_datasets)
# Missingness by wealth group
gg_miss_var(merged_datasets, facet = "wealth_cat")
Missingness Across Wealth Groups
The missingness in stunting, underweight, and wasting is roughly uniform across wealth categories. This is expected because:
Survey Design: PDHS/MICS uses standardized data collection protocols, so the likelihood of a measurement being missing (due to absence, refusal, sickness, or other reasons) is generally independent of household wealth.
Wealth Index Construction: The wealth index is internally defined within the survey sample, splitting households into quintiles relative to others in the same country. Therefore, “poorest” and “richest” are relative categories, not absolute wealth levels. Missingness patterns caused by logistical factors (like a child being absent or sick) affect all households similarly.
Field Practicalities: Measurement errors, refusals, or children not present at the time of survey occur randomly across households, making the missingness appear uniform across all wealth groups.
Conclusion: Uniform missingness across wealth groups does not indicate data fabrication; rather, it reflects the design and practical constraints of the survey.
To explore how household and healthcare-related factors cluster together, we calculated Spearman correlations among key structural variables (e.g., water source, electricity, sanitation, cooking fuel). Correlations were also examined separately by wealth groups to assess whether these relationships vary by socioeconomic status.
library(dplyr)
library(ggcorrplot)
## Warning: package 'ggcorrplot' was built under R version 4.5.1
# --- Select required variables including new ones ---
structural_variables <- merged_datasets %>%
select(any_of(c("toilet_facility", "water_source", "postnatal_care", "cooking_fuel",
"distance_ti_health", "wait_to_go_alone",
"getting_permission_to_go", "money_needed")))
# Convert all to numeric
struct_vars_num <- structural_variables %>% mutate(across(everything(), as.numeric))
# --- Overall correlation ---
corr_matrix <- cor(struct_vars_num, use = "pairwise.complete.obs", method = "spearman")
corr_matrix[is.na(corr_matrix)] <- 0 # replace NAs
# Plot overall correlation heatmap
ggcorrplot(corr_matrix,
lab = TRUE, lab_size = 4,
hc.order = TRUE,
type = "lower",
colors = c("red", "white", "blue"),
title = "Overall Correlation of Structural Variables",
ggtheme = theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
axis.text.y = element_text(size = 10)))
# --- Correlation by wealth group ---
for (grp in levels(merged_datasets$wealth_cat)) {
subset_data <- struct_vars_num[merged_datasets$wealth_cat == grp, ]
# Correlation for this wealth group
corr_matrix_grp <- cor(subset_data, use = "pairwise.complete.obs", method = "spearman")
corr_matrix_grp[is.na(corr_matrix_grp)] <- 0 # replace NAs
print(
ggcorrplot(corr_matrix_grp,
lab = TRUE, lab_size = 4,
hc.order = TRUE,
type = "lower",
colors = c("red", "white", "blue"),
title = paste("Correlation by Wealth Group:", grp),
ggtheme = theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
axis.text.y = element_text(size = 10)))
)
}
The overall heatmap shows that access barriers are strongly linked. For example, needing permission and money are closely related (0.64), and distance to a health facility is linked with time or needing to go alone (0.50–0.58). Structural factors like toilet, water, and cooking fuel are largely independent of these access barriers, with weak or negative correlations (-0.21 to -0.23).
Poorest: Barriers are tightly clustered. Permission, money, and distance show strong correlations (0.53–0.62). Postnatal care use is slightly positively linked with overcoming money (0.2) and permission barriers (0.13).
Richest: Correlations among barriers are weaker, often near zero (e.g., Permission–Money: 0.32). Barriers are less consistently linked.
Poorer, Middle, Richer: These groups show moderate clustering among barriers (Permission–Distance: 0.60–0.67). Structural factors and postnatal care remain mostly unlinked (near zero or slightly negative).
Conclusion: Access barriers tend to cluster more in poorer households, while richer households experience barriers more independently. Structural household factors generally do not predict healthcare access.
I created a summary barrier index by combining all reported healthcare access barriers, including getting permission to go, money needed, distance to the health facility, and not wanting to go alone. The index ranges from 0 to 4, where higher values indicate a greater number of barriers faced.
# --- Clean and create Summary Barrier Index ---
library(dplyr)
# Step 1: Check unique values
sapply(merged_datasets[, c("getting_permission_to_go", "money_needed",
"distance_ti_health", "transport",
"wait_to_go_alone")], unique)
## $getting_permission_to_go
## <labelled<double>[3]>: getting medical help for self: getting permission to go
## [1] 2 1 NA
##
## Labels:
## value label
## 0 no problem
## 1 big problem
## 2 not a big problem
##
## $money_needed
## <labelled<double>[3]>: getting medical help for self: getting money needed for treatment
## [1] 2 1 NA
##
## Labels:
## value label
## 0 no problem
## 1 big problem
## 2 not a big problem
##
## $distance_ti_health
## <labelled<double>[3]>: getting medical help for self: distance to health facility
## [1] 2 1 NA
##
## Labels:
## value label
## 0 no problem
## 1 big problem
## 2 not a big problem
##
## $transport
## <labelled<double>[1]>: na - getting medical help for self: having to take transport
## [1] NA
##
## Labels:
## value label
## 0 no problem
## 1 big problem
## 2 not a big problem
##
## $wait_to_go_alone
## <labelled<double>[3]>: getting medical help for self: not wanting to go alone
## [1] 2 1 NA
##
## Labels:
## value label
## 0 no problem
## 1 big problem
## 2 not a big problem
library(dplyr)
# Recode and create the summary barrier index
merged_datasets <- merged_datasets %>%
mutate(across(c(getting_permission_to_go, money_needed, distance_ti_health,
transport, wait_to_go_alone),
~ case_when(
. == 1 ~ 1, # Big problem → barrier present
. %in% c(0, 2) ~ 0, # No or small problem → no barrier
TRUE ~ NA_real_ # Keep NA
))) %>%
mutate(barrier_summary = rowSums(across(c(getting_permission_to_go,
money_needed,
distance_ti_health,
transport,
wait_to_go_alone)),
na.rm = TRUE))
# View summary of the new variable
summary(merged_datasets$barrier_summary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.000 1.833 3.000 4.000
table(merged_datasets$barrier_summary)
##
## 0 1 2 3 4
## 3379 2354 2700 1556 2719
# Average number of barriers by wealth group
merged_datasets %>%
group_by(wealth_cat) %>%
summarise(
mean_barriers = mean(barrier_summary, na.rm = TRUE),
sd_barriers = sd(barrier_summary, na.rm = TRUE),
n = n()
)
## # A tibble: 5 × 4
## wealth_cat mean_barriers sd_barriers n
## <fct> <dbl> <dbl> <int>
## 1 poorest 2.40 1.45 2907
## 2 poorer 2.33 1.41 2905
## 3 middle 1.85 1.44 2510
## 4 richer 1.31 1.36 2222
## 5 richest 0.919 1.10 2164
ggplot(merged_datasets, aes(x = wealth_cat, y = barrier_summary, fill = wealth_cat)) +
geom_boxplot() +
theme_minimal() +
labs(
title = "Distribution of Access Barriers by Wealth Group",
x = "Wealth Group",
y = "Number of Barriers"
) +
scale_fill_brewer(palette = "Blues")
The mean number of healthcare access barriers decreases steadily with higher wealth status — from 2.40 in the poorest to 0.92 in the richest group. This shows that poorer households face more overlapping barriers to healthcare, while wealthier households experience fewer and more isolated ones.
I also created an any barrier index, which shows whether a woman experienced at least one barrier to healthcare access (1 = any barrier, 0 = no barrier). This provides a simpler, binary view of the overall access problems.
# --- Create "Any Barrier" Index ---
merged_datasets <- merged_datasets %>%
mutate(
any_barrier = ifelse(barrier_summary > 0, 1, 0)
)
# View basic results
table(merged_datasets$any_barrier)
##
## 0 1
## 3379 9329
prop.table(table(merged_datasets$any_barrier))
##
## 0 1
## 0.2658955 0.7341045
# by wealth groups
merged_datasets %>%
group_by(wealth_cat) %>%
summarise(
percent_with_any_barrier = mean(any_barrier, na.rm = TRUE) * 100,
n = n()
)
## # A tibble: 5 × 3
## wealth_cat percent_with_any_barrier n
## <fct> <dbl> <int>
## 1 poorest 84.3 2907
## 2 poorer 84.6 2905
## 3 middle 75.5 2510
## 4 richer 61.7 2222
## 5 richest 53.3 2164
library(ggplot2)
# Bar plot of % women with any barrier by wealth group
merged_datasets %>%
group_by(wealth_cat) %>%
summarise(percent_with_any_barrier = mean(any_barrier, na.rm = TRUE) * 100) %>%
ggplot(aes(x = wealth_cat, y = percent_with_any_barrier, fill = wealth_cat)) +
geom_col(width = 0.7) +
geom_text(aes(label = sprintf("%.1f%%", percent_with_any_barrier)),
vjust = -0.5, size = 4) +
scale_fill_brewer(palette = "Blues") +
theme_minimal() +
labs(
title = "Percentage of Women Facing Any Barrier to Healthcare Access",
x = "Wealth Group",
y = "Percent with Any Barrier"
) +
theme(
legend.position = "none",
plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
axis.text.x = element_text(size = 11),
axis.text.y = element_text(size = 11)
) +
ylim(0, 100)
The share of households facing at least one healthcare access barrier declines sharply with wealth — from 84% among the poorest to 53% among the richest. This means that most poor households encounter some difficulty accessing healthcare, while barriers are less common among wealthier groups.
In this section, I test the association between categorical variables such as child malnutrition indicators (stunted, underweight, wasted), household & Enviromental factors using the Chi-square test of independence.
I convert nutritional outcomes into binary categories based on standard cutoffs (Z < -2 indicates malnutrition).
# --- Drop variables with >95% missing ---
vars_to_drop <- missing_table$Variable[missing_table$Missing_Percent > 95]
merged_datasets <- merged_datasets %>% select(-all_of(vars_to_drop))
cat("\nDropped variables due to excessive missingness (>95%):\n")
##
## Dropped variables due to excessive missingness (>95%):
print(vars_to_drop)
## [1] "time_to_water" "know_where_to_go" "transport"
#binary
merged_datasets <- merged_datasets %>%
mutate(
stunted = ifelse(stunting < -2, 1, 0),
underweight_bin = ifelse(underweight < -2, 1, 0),
wasting_bin = ifelse(wasting < -2, 1, 0)
)
# Count number of stunted children (1 = stunted)
table(merged_datasets$stunted)
##
## 0 1
## 2561 1580
# Count number of underweight children
table(merged_datasets$underweight_bin)
##
## 0 1
## 3281 984
# Count number of wasted children
table(merged_datasets$wasting_bin)
##
## 0 1
## 3817 334
# --- Install and load required packages ---
if (!requireNamespace("DescTools", quietly = TRUE)) {
install.packages("DescTools")
}
library(DescTools)
## Warning: package 'DescTools' was built under R version 4.5.1
##
## Attaching package: 'DescTools'
## The following objects are masked from 'package:psych':
##
## AUC, ICC, SD
library(dplyr)
dep_vars <- c("stunted", "underweight_bin", "wasting_bin")
indep_vars <- c("wealth_cat", "wait_to_go_alone", "money_needed",
"getting_permission_to_go", "distance_ti_health",
"postnatal_care", "cooking_fuel", "toilet_facility", "water_source")
# create empty results data frame
chi_results <- data.frame()
for (dv in dep_vars) {
for (iv in indep_vars) {
# check both variables exist
if (!(dv %in% names(merged_datasets)) || !(iv %in% names(merged_datasets))) next
# drop missing data
df_sub <- merged_datasets %>% select(all_of(c(dv, iv))) %>% drop_na()
# skip if empty
if (nrow(df_sub) == 0) next
# contingency table
tbl <- table(df_sub[[dv]], df_sub[[iv]])
if (sum(tbl) == 0) next
# chi-square test
chi <- suppressWarnings(chisq.test(tbl))
cv <- suppressWarnings(CramerV(tbl))
# store results
chi_results <- rbind(chi_results, data.frame(
Dependent = dv,
Independent = iv,
Chi_Square = round(chi$statistic, 3),
df = chi$parameter,
p_value = round(chi$p.value, 5),
Cramers_V = round(cv, 3)
))
}
}
# print results in console
print(chi_results)
## Dependent Independent Chi_Square df p_value
## X-squared stunted wealth_cat 282.733 4 0.00000
## X-squared1 stunted wait_to_go_alone 22.849 1 0.00000
## X-squared2 stunted money_needed 94.337 1 0.00000
## X-squared3 stunted getting_permission_to_go 31.053 1 0.00000
## X-squared4 stunted distance_ti_health 45.459 1 0.00000
## X-squared5 stunted postnatal_care 17.167 2 0.00019
## X-squared6 stunted cooking_fuel 121.319 10 0.00000
## X-squared7 stunted toilet_facility 190.003 12 0.00000
## X-squared8 stunted water_source 88.448 15 0.00000
## X-squared9 underweight_bin wealth_cat 285.524 4 0.00000
## X-squared10 underweight_bin wait_to_go_alone 12.443 1 0.00042
## X-squared11 underweight_bin money_needed 60.703 1 0.00000
## X-squared12 underweight_bin getting_permission_to_go 44.550 1 0.00000
## X-squared13 underweight_bin distance_ti_health 20.993 1 0.00000
## X-squared14 underweight_bin postnatal_care 5.647 2 0.05938
## X-squared15 underweight_bin cooking_fuel 150.952 10 0.00000
## X-squared16 underweight_bin toilet_facility 206.585 12 0.00000
## X-squared17 underweight_bin water_source 110.240 15 0.00000
## X-squared18 wasting_bin wealth_cat 32.504 4 0.00000
## X-squared19 wasting_bin wait_to_go_alone 4.054 1 0.04407
## X-squared20 wasting_bin money_needed 11.803 1 0.00059
## X-squared21 wasting_bin getting_permission_to_go 11.235 1 0.00080
## X-squared22 wasting_bin distance_ti_health 8.481 1 0.00359
## X-squared23 wasting_bin postnatal_care 7.237 2 0.02682
## X-squared24 wasting_bin cooking_fuel 17.070 10 0.07283
## X-squared25 wasting_bin toilet_facility 43.494 12 0.00002
## X-squared26 wasting_bin water_source 73.901 15 0.00000
## Cramers_V
## X-squared 0.261
## X-squared1 0.075
## X-squared2 0.152
## X-squared3 0.087
## X-squared4 0.105
## X-squared5 0.079
## X-squared6 0.171
## X-squared7 0.214
## X-squared8 0.146
## X-squared9 0.259
## X-squared10 0.055
## X-squared11 0.120
## X-squared12 0.103
## X-squared13 0.071
## X-squared14 0.045
## X-squared15 0.188
## X-squared16 0.220
## X-squared17 0.161
## X-squared18 0.088
## X-squared19 0.032
## X-squared20 0.054
## X-squared21 0.053
## X-squared22 0.046
## X-squared23 0.051
## X-squared24 0.064
## X-squared25 0.102
## X-squared26 0.133
Chi-square tests examined whether child malnutrition (stunting, underweight, wasting) is associated with household and environmental factors. Household wealth showed the strongest association with stunting (V = 0.261) and underweight (V = 0.259), while toilet facility was also strongly linked to both outcomes (V = 0.214–0.220). Overall, 25 of 27 tests were statistically significant (p < 0.05), indicating most factors are related to nutritional status. Only underweight vs. postnatal care (p = 0.059) and wasting vs. cooking fuel (p = 0.073) were not significant. Associations with wasting were generally weaker (e.g., water source V = 0.133, toilet V = 0.102), suggesting different factors may drive acute malnutrition. In short, wealth and sanitation are the key determinants of chronic malnutrition, while acute malnutrition shows weaker associations.
Although Chi square show association between different variables but it is not trusworthy because of large sample size so we will more rely on cram V results.
Cramér’s V values indicated that the strength of association between malnutrition indicators and socioeconomic or environmental factors ranged from very weak to moderate. For stunting and underweight, wealth category (V ≈ 0.26) and toilet facility (V ≈ 0.21–0.22) showed moderate associations, highlighting the importance of economic status and sanitation in child nutrition. Other variables such as money needed for treatment, distance to healthcare, cooking fuel, and water source demonstrated weak relationships. In contrast, all associations with wasting were weak or very weak (V ≤ 0.13), suggesting that acute malnutrition is influenced by additional short-term or biological factors. Overall, the findings indicate that while socioeconomic and environmental factors are related to child nutritional outcomes, their effects are generally moderate or weak, reflecting the multifactorial nature of malnutrition.
In this section we remove all missing values in all variables wether its dependent or independent and create a full clean data set which used further in logistic regression models.
# Household independent variable
household_vars <- c("wealth_cat")
# Structural independent variables
structural_vars <- c("toilet_facility", "water_source", "postnatal_care",
"cooking_fuel", "distance_ti_health", "wait_to_go_alone",
"getting_permission_to_go", "money_needed")
# Remove rows with missing household variable
household_clean <- merged_datasets %>%
select(all_of(household_vars)) %>%
drop_na()
# Remove rows with missing structural variables
structural_clean <- merged_datasets %>%
select(all_of(structural_vars)) %>%
drop_na()
# Check the dimensions
cat("Household variable cleaned data dimensions:", dim(household_clean), "\n")
## Household variable cleaned data dimensions: 12708 1
cat("Structural variables cleaned data dimensions:", dim(structural_clean), "\n")
## Structural variables cleaned data dimensions: 8265 8
# Inspect first few rows
head(household_clean)
## # A tibble: 6 × 1
## wealth_cat
## <fct>
## 1 poorest
## 2 poorer
## 3 poorer
## 4 poorest
## 5 poorest
## 6 poorer
head(structural_clean)
## # A tibble: 6 × 8
## toilet_facility water_source postnatal_care cooking_fuel distance_ti_health
## <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl>
## 1 13 [flush to pit … 42 [unprote… 0 [no] 8 [wood] 0
## 2 13 [flush to pit … 43 [river/d… 1 [yes] 8 [wood] 1
## 3 13 [flush to pit … 43 [river/d… 0 [no] 8 [wood] 1
## 4 13 [flush to pit … 42 [unprote… 0 [no] 8 [wood] 1
## 5 13 [flush to pit … 41 [protect… 0 [no] 8 [wood] 1
## 6 13 [flush to pit … 41 [protect… 0 [no] 8 [wood] 1
## # ℹ 3 more variables: wait_to_go_alone <dbl>, getting_permission_to_go <dbl>,
## # money_needed <dbl>
# Number of rows before removing NAs
n_before <- nrow(merged_datasets %>% select(all_of(structural_vars)))
cat("Number of rows with structural variables before removing NAs:", n_before, "\n")
## Number of rows with structural variables before removing NAs: 12708
# Number of rows after removing NAs
n_after <- nrow(structural_clean)
cat("Number of rows with structural variables after removing NAs:", n_after, "\n")
## Number of rows with structural variables after removing NAs: 8265
# Optional: how many rows were removed
cat("Number of rows removed due to missing structural variables:", n_before - n_after, "\n")
## Number of rows removed due to missing structural variables: 4443
# Remove rows with missing values in malnutrition outcomes
malnutrition_clean <- merged_datasets %>%
drop_na(stunting, underweight, wasting)
# Number of rows before removing NAs
n_before <- nrow(merged_datasets)
cat("Number of rows before removing missing malnutrition outcomes:", n_before, "\n")
## Number of rows before removing missing malnutrition outcomes: 12708
# Number of rows after removing NAs
n_after <- nrow(malnutrition_clean)
cat("Number of rows after removing missing malnutrition outcomes:", n_after, "\n")
## Number of rows after removing missing malnutrition outcomes: 4098
# Number of rows removed
cat("Number of rows removed due to missing malnutrition outcomes:", n_before - n_after, "\n")
## Number of rows removed due to missing malnutrition outcomes: 8610
# Number of missing values in each malnutrition outcome
na_stunting <- sum(is.na(merged_datasets$stunting))
na_underweight <- sum(is.na(merged_datasets$underweight))
na_wasting <- sum(is.na(merged_datasets$wasting))
cat("Missing stunting values:", na_stunting, "\n")
## Missing stunting values: 8567
cat("Missing underweight values:", na_underweight, "\n")
## Missing underweight values: 8443
cat("Missing wasting values:", na_wasting, "\n")
## Missing wasting values: 8557
# Check missing values in the cleaned dataset
sum(is.na(malnutrition_clean$stunting)) # Should be 0
## [1] 0
sum(is.na(malnutrition_clean$underweight)) # Should be 0
## [1] 0
sum(is.na(malnutrition_clean$wasting)) # Should be 0
## [1] 0
# Remove rows with missing child age (hw1) in merged_datasets
merged_datasets <- merged_datasets %>%
drop_na(hw1)
# Count missing values in hw1
sum(is.na(merged_datasets$hw1))
## [1] 0
## converting to binary variables;
merged_datasets <- merged_datasets %>%
mutate(
stunted = ifelse(stunting < -2, 1, 0),
underweight_bin = ifelse(underweight < -2, 1, 0),
wasting_bin = ifelse(wasting < -2, 1, 0)
)
# Define all variables to keep
all_vars <- c("hw1",
"stunting", "underweight", "wasting",
"stunted", "underweight_bin", "wasting_bin",
"wealth_cat",
"toilet_facility", "water_source", "postnatal_care",
"cooking_fuel", "distance_ti_health", "wait_to_go_alone",
"getting_permission_to_go", "money_needed")
# Create a fully cleaned dataset with no missing values in any of the variables
full_clean <- merged_datasets %>%
select(all_of(all_vars)) %>%
drop_na()
# Check number of rows after cleaning
cat("Number of rows in fully cleaned dataset:", nrow(full_clean), "\n")
## Number of rows in fully cleaned dataset: 2709
# Check first few rows
head(full_clean)
## # A tibble: 6 × 16
## hw1 stunting underweight wasting stunted underweight_bin wasting_bin
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 42 -3.48 -2.33 -0.41 1 1 0
## 2 26 -2.67 -2.43 -1.24 1 1 0
## 3 10 2.33 -1.35 -3.31 0 0 1
## 4 34 -3.04 -1.65 0.13 1 0 0
## 5 15 -0.71 -2.92 -3.6 0 1 1
## 6 4 3.77 -1.41 -4.61 0 0 1
## # ℹ 9 more variables: wealth_cat <fct>, toilet_facility <dbl+lbl>,
## # water_source <dbl+lbl>, postnatal_care <dbl+lbl>, cooking_fuel <dbl+lbl>,
## # distance_ti_health <dbl>, wait_to_go_alone <dbl>,
## # getting_permission_to_go <dbl>, money_needed <dbl>
# Optional: summary of cleaned dataset
summary(full_clean)
## hw1 stunting underweight wasting
## Min. : 0.0 Min. :-5.99 Min. :-5.900 Min. :-4.9400
## 1st Qu.: 9.0 1st Qu.:-2.50 1st Qu.:-1.810 1st Qu.:-1.0500
## Median :20.0 Median :-1.39 Median :-0.990 Median :-0.2300
## Mean :22.8 Mean :-1.41 Mean :-1.024 Mean :-0.2851
## 3rd Qu.:35.0 3rd Qu.:-0.32 3rd Qu.:-0.150 3rd Qu.: 0.5700
## Max. :59.0 Max. : 5.81 Max. : 3.900 Max. : 4.9900
## stunted underweight_bin wasting_bin wealth_cat
## Min. :0.0000 Min. :0.0000 Min. :0.00000 poorest:555
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 poorer :665
## Median :0.0000 Median :0.0000 Median :0.00000 middle :506
## Mean :0.3536 Mean :0.2097 Mean :0.09413 richer :481
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 richest:502
## Max. :1.0000 Max. :1.0000 Max. :1.00000
## toilet_facility water_source postnatal_care cooking_fuel
## Min. :11.00 Min. :11.00 Min. :0.000 Min. : 1.000
## 1st Qu.:12.00 1st Qu.:12.00 1st Qu.:0.000 1st Qu.: 3.000
## Median :13.00 Median :21.00 Median :0.000 Median : 8.000
## Mean :15.97 Mean :23.45 Mean :0.285 Mean : 5.779
## 3rd Qu.:13.00 3rd Qu.:21.00 3rd Qu.:1.000 3rd Qu.: 8.000
## Max. :96.00 Max. :96.00 Max. :8.000 Max. :11.000
## distance_ti_health wait_to_go_alone getting_permission_to_go money_needed
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :0.0000 Median :0.0000
## Mean :0.5109 Mean :0.6704 Mean :0.2794 Mean :0.3736
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## household factors :
# Load required packages
library(dplyr)
library(broom) # for tidy model output
library(margins) # for marginal effects
## Warning: package 'margins' was built under R version 4.5.1
library(pscl) # for pseudo R-squared
## Warning: package 'pscl' was built under R version 4.5.1
## Classes and Methods for R originally developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.
# 1. Logistic regression models WITHOUT child age (hw1)
model_stunted_wealth <- glm(stunted ~ wealth_cat, data = full_clean, family = binomial)
model_underweight_wealth <- glm(underweight_bin ~ wealth_cat, data = full_clean, family = binomial)
model_wasting_wealth <- glm(wasting_bin ~ wealth_cat, data = full_clean, family = binomial)
# 2. Logistic regression models WITH child age (hw1)
model_stunted_wealth_age <- glm(stunted ~ wealth_cat + hw1, data = full_clean, family = binomial)
model_underweight_wealth_age <- glm(underweight_bin ~ wealth_cat + hw1, data = full_clean, family = binomial)
model_wasting_wealth_age <- glm(wasting_bin ~ wealth_cat + hw1, data = full_clean, family = binomial)
# 3. Model summaries
summary(model_stunted_wealth)
##
## Call:
## glm(formula = stunted ~ wealth_cat, family = binomial, data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1698 0.0852 1.993 0.0463 *
## wealth_catpoorer -0.6702 0.1169 -5.734 9.78e-09 ***
## wealth_catmiddle -0.8600 0.1271 -6.768 1.30e-11 ***
## wealth_catricher -1.1421 0.1330 -8.584 < 2e-16 ***
## wealth_catrichest -1.4997 0.1389 -10.795 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3519.9 on 2708 degrees of freedom
## Residual deviance: 3371.7 on 2704 degrees of freedom
## AIC: 3381.7
##
## Number of Fisher Scoring iterations: 4
summary(model_stunted_wealth_age)
##
## Call:
## glm(formula = stunted ~ wealth_cat + hw1, family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.481891 0.104488 -4.612 3.99e-06 ***
## wealth_catpoorer -0.724487 0.120273 -6.024 1.70e-09 ***
## wealth_catmiddle -0.888319 0.130440 -6.810 9.75e-12 ***
## wealth_catricher -1.233227 0.137104 -8.995 < 2e-16 ***
## wealth_catrichest -1.565836 0.142601 -10.981 < 2e-16 ***
## hw1 0.029355 0.002644 11.103 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3519.9 on 2708 degrees of freedom
## Residual deviance: 3243.4 on 2703 degrees of freedom
## AIC: 3255.4
##
## Number of Fisher Scoring iterations: 4
summary(model_underweight_wealth)
##
## Call:
## glm(formula = underweight_bin ~ wealth_cat, family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.55820 0.08822 -6.327 2.50e-10 ***
## wealth_catpoorer -0.55046 0.12587 -4.373 1.22e-05 ***
## wealth_catmiddle -1.04178 0.14807 -7.036 1.98e-12 ***
## wealth_catricher -1.11495 0.15300 -7.287 3.17e-13 ***
## wealth_catrichest -1.88849 0.18694 -10.102 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2782.3 on 2708 degrees of freedom
## Residual deviance: 2629.9 on 2704 degrees of freedom
## AIC: 2639.9
##
## Number of Fisher Scoring iterations: 5
summary(model_underweight_wealth_age)
##
## Call:
## glm(formula = underweight_bin ~ wealth_cat + hw1, family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.899767 0.113112 -7.955 1.80e-15 ***
## wealth_catpoorer -0.566187 0.126735 -4.467 7.91e-06 ***
## wealth_catmiddle -1.044240 0.148829 -7.016 2.28e-12 ***
## wealth_catricher -1.141753 0.153984 -7.415 1.22e-13 ***
## wealth_catrichest -1.900188 0.187634 -10.127 < 2e-16 ***
## hw1 0.014760 0.002968 4.972 6.62e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2782.3 on 2708 degrees of freedom
## Residual deviance: 2605.3 on 2703 degrees of freedom
## AIC: 2617.3
##
## Number of Fisher Scoring iterations: 5
summary(model_wasting_wealth)
##
## Call:
## glm(formula = wasting_bin ~ wealth_cat, family = binomial, data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.95210 0.12865 -15.174 < 2e-16 ***
## wealth_catpoorer 0.05844 0.17255 0.339 0.73487
## wealth_catmiddle -0.47636 0.20758 -2.295 0.02174 *
## wealth_catricher -0.62410 0.21954 -2.843 0.00447 **
## wealth_catrichest -1.03945 0.24557 -4.233 2.31e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1690.4 on 2708 degrees of freedom
## Residual deviance: 1655.9 on 2704 degrees of freedom
## AIC: 1665.9
##
## Number of Fisher Scoring iterations: 5
summary(model_wasting_wealth_age)
##
## Call:
## glm(formula = wasting_bin ~ wealth_cat + hw1, family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.621652 0.155227 -10.447 < 2e-16 ***
## wealth_catpoorer 0.067392 0.173125 0.389 0.697077
## wealth_catmiddle -0.486002 0.208105 -2.335 0.019524 *
## wealth_catricher -0.614165 0.220071 -2.791 0.005258 **
## wealth_catrichest -1.048078 0.246018 -4.260 2.04e-05 ***
## hw1 -0.015612 0.004374 -3.570 0.000357 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1690.4 on 2708 degrees of freedom
## Residual deviance: 1642.5 on 2703 degrees of freedom
## AIC: 1654.5
##
## Number of Fisher Scoring iterations: 5
# 4. Pseudo R-squared
pR2(model_stunted_wealth)
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML
## -1.685863e+03 -1.759952e+03 1.481777e+02 4.209709e-02 5.322926e-02
## r2CU
## 7.318875e-02
pR2(model_stunted_wealth_age)
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML
## -1.621718e+03 -1.759952e+03 2.764672e+02 7.854397e-02 9.702019e-02
## r2CU
## 1.334001e-01
pR2(model_underweight_wealth)
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML
## -1.314954e+03 -1.391129e+03 1.523484e+02 5.475711e-02 5.468574e-02
## r2CU
## 8.518885e-02
pR2(model_underweight_wealth_age)
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML
## -1.302641e+03 -1.391129e+03 1.769751e+02 6.360845e-02 6.324037e-02
## r2CU
## 9.851515e-02
pR2(model_wasting_wealth)
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML
## -827.93319436 -845.18615278 34.50591683 0.02041321 0.01265673
## r2CU
## 0.02726602
pR2(model_wasting_wealth_age)
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML
## -821.24752102 -845.18615278 47.87726352 0.02832350 0.01751815
## r2CU
## 0.03773882
# 5. AIC/BIC for model comparison
AIC(model_stunted_wealth, model_stunted_wealth_age)
## df AIC
## model_stunted_wealth 5 3381.726
## model_stunted_wealth_age 6 3255.437
BIC(model_stunted_wealth, model_stunted_wealth_age)
## df BIC
## model_stunted_wealth 5 3411.248
## model_stunted_wealth_age 6 3290.863
AIC(model_underweight_wealth, model_underweight_wealth_age)
## df AIC
## model_underweight_wealth 5 2639.909
## model_underweight_wealth_age 6 2617.282
BIC(model_underweight_wealth, model_underweight_wealth_age)
## df BIC
## model_underweight_wealth 5 2669.430
## model_underweight_wealth_age 6 2652.708
AIC(model_wasting_wealth, model_wasting_wealth_age)
## df AIC
## model_wasting_wealth 5 1665.866
## model_wasting_wealth_age 6 1654.495
BIC(model_wasting_wealth, model_wasting_wealth_age)
## df BIC
## model_wasting_wealth 5 1695.388
## model_wasting_wealth_age 6 1689.921
# 6. Marginal effects
margins(model_stunted_wealth)
## Average marginal effects
## glm(formula = stunted ~ wealth_cat, family = binomial, data = full_clean)
## wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
## -0.1649 -0.2084 -0.2679 -0.3332
margins(model_stunted_wealth_age)
## Average marginal effects
## glm(formula = stunted ~ wealth_cat + hw1, family = binomial, data = full_clean)
## hw1 wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
## 0.006038 -0.1689 -0.2045 -0.2738 -0.3318
margins(model_underweight_wealth)
## Average marginal effects
## glm(formula = underweight_bin ~ wealth_cat, family = binomial, data = full_clean)
## wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
## -0.1158 -0.196 -0.206 -0.2843
margins(model_underweight_wealth_age)
## Average marginal effects
## glm(formula = underweight_bin ~ wealth_cat + hw1, family = binomial, data = full_clean)
## hw1 wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
## 0.002287 -0.1176 -0.1949 -0.2081 -0.2842
margins(model_wasting_wealth)
## Average marginal effects
## glm(formula = wasting_bin ~ wealth_cat, family = binomial, data = full_clean)
## wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
## 0.006503 -0.0433 -0.05364 -0.07652
margins(model_wasting_wealth_age)
## Average marginal effects
## glm(formula = wasting_bin ~ wealth_cat + hw1, family = binomial, data = full_clean)
## hw1 wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
## -0.001308 0.007463 -0.04375 -0.05269 -0.07657
# 7. Create tidy tables with coefficients and 95% confidence intervals
tidy(model_stunted_wealth, conf.int = TRUE)
## # A tibble: 5 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.170 0.0852 1.99 4.63e- 2 0.00308 0.337
## 2 wealth_catpoorer -0.670 0.117 -5.73 9.78e- 9 -0.900 -0.442
## 3 wealth_catmiddle -0.860 0.127 -6.77 1.30e-11 -1.11 -0.612
## 4 wealth_catricher -1.14 0.133 -8.58 9.16e-18 -1.40 -0.883
## 5 wealth_catrichest -1.50 0.139 -10.8 3.64e-27 -1.78 -1.23
tidy(model_stunted_wealth_age, conf.int = TRUE)
## # A tibble: 6 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.482 0.104 -4.61 3.99e- 6 -0.687 -0.277
## 2 wealth_catpoorer -0.724 0.120 -6.02 1.70e- 9 -0.961 -0.489
## 3 wealth_catmiddle -0.888 0.130 -6.81 9.75e-12 -1.15 -0.634
## 4 wealth_catricher -1.23 0.137 -8.99 2.37e-19 -1.50 -0.966
## 5 wealth_catrichest -1.57 0.143 -11.0 4.74e-28 -1.85 -1.29
## 6 hw1 0.0294 0.00264 11.1 1.21e-28 0.0242 0.0346
tidy(model_underweight_wealth, conf.int = TRUE)
## # A tibble: 5 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.558 0.0882 -6.33 2.50e-10 -0.733 -0.387
## 2 wealth_catpoorer -0.550 0.126 -4.37 1.22e- 5 -0.798 -0.304
## 3 wealth_catmiddle -1.04 0.148 -7.04 1.98e-12 -1.34 -0.755
## 4 wealth_catricher -1.11 0.153 -7.29 3.17e-13 -1.42 -0.819
## 5 wealth_catrichest -1.89 0.187 -10.1 5.42e-24 -2.27 -1.53
tidy(model_underweight_wealth_age, conf.int = TRUE)
## # A tibble: 6 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.900 0.113 -7.95 1.80e-15 -1.12 -0.680
## 2 wealth_catpoorer -0.566 0.127 -4.47 7.91e- 6 -0.815 -0.318
## 3 wealth_catmiddle -1.04 0.149 -7.02 2.28e-12 -1.34 -0.756
## 4 wealth_catricher -1.14 0.154 -7.41 1.22e-13 -1.45 -0.844
## 5 wealth_catrichest -1.90 0.188 -10.1 4.19e-24 -2.28 -1.54
## 6 hw1 0.0148 0.00297 4.97 6.62e- 7 0.00894 0.0206
tidy(model_wasting_wealth, conf.int = TRUE)
## # A tibble: 5 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -1.95 0.129 -15.2 5.26e-52 -2.21 -1.71
## 2 wealth_catpoorer 0.0584 0.173 0.339 7.35e- 1 -0.279 0.399
## 3 wealth_catmiddle -0.476 0.208 -2.29 2.17e- 2 -0.890 -0.0742
## 4 wealth_catricher -0.624 0.220 -2.84 4.47e- 3 -1.06 -0.201
## 5 wealth_catrichest -1.04 0.246 -4.23 2.31e- 5 -1.54 -0.572
tidy(model_wasting_wealth_age, conf.int = TRUE)
## # A tibble: 6 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -1.62 0.155 -10.4 1.51e-25 -1.93 -1.32
## 2 wealth_catpoorer 0.0674 0.173 0.389 6.97e- 1 -0.271 0.409
## 3 wealth_catmiddle -0.486 0.208 -2.34 1.95e- 2 -0.901 -0.0828
## 4 wealth_catricher -0.614 0.220 -2.79 5.26e- 3 -1.06 -0.190
## 5 wealth_catrichest -1.05 0.246 -4.26 2.04e- 5 -1.55 -0.580
## 6 hw1 -0.0156 0.00437 -3.57 3.57e- 4 -0.0243 -0.00716
## structural only:
# List of structural factors
struct_vars <- c("toilet_facility", "water_source", "postnatal_care",
"cooking_fuel", "distance_ti_health", "wait_to_go_alone",
"getting_permission_to_go", "money_needed")
# Function to create formulas
create_formula <- function(outcome, include_age = FALSE) {
if(include_age) {
as.formula(paste(outcome, "~", paste(struct_vars, collapse = " + "), "+ hw1"))
} else {
as.formula(paste(outcome, "~", paste(struct_vars, collapse = " + ")))
}
}
# Outcomes
outcomes <- c("stunted", "underweight_bin", "wasting_bin")
# Run models
models <- list()
for(outcome in outcomes){
models[[paste0(outcome, "_struct")]] <- glm(create_formula(outcome, FALSE),
family = binomial, data = full_clean)
models[[paste0(outcome, "_struct_age")]] <- glm(create_formula(outcome, TRUE),
family = binomial, data = full_clean)
}
# Check pseudo R² to see model improvement
library(pscl)
lapply(models, pR2)
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## $stunted_struct
## llh llhNull G2 McFadden r2ML
## -1.699649e+03 -1.759952e+03 1.206064e+02 3.426413e-02 4.354415e-02
## r2CU
## 5.987200e-02
##
## $stunted_struct_age
## llh llhNull G2 McFadden r2ML
## -1.637142e+03 -1.759952e+03 2.456200e+02 6.978032e-02 8.667926e-02
## r2CU
## 1.191816e-01
##
## $underweight_bin_struct
## llh llhNull G2 McFadden r2ML
## -1.346337e+03 -1.391129e+03 8.958356e+01 3.219816e-02 3.252807e-02
## r2CU
## 5.067187e-02
##
## $underweight_bin_struct_age
## llh llhNull G2 McFadden r2ML
## -1.334007e+03 -1.391129e+03 1.142427e+02 4.106115e-02 4.129468e-02
## r2CU
## 6.432840e-02
##
## $wasting_bin_struct
## llh llhNull G2 McFadden r2ML
## -831.10604103 -845.18615278 28.16022350 0.01665918 0.01034122
## r2CU
## 0.02227778
##
## $wasting_bin_struct_age
## llh llhNull G2 McFadden r2ML
## -825.27958923 -845.18615278 39.81312710 0.02355287 0.01458915
## r2CU
## 0.03142896
# Compare AIC/BIC
AIC(models$stunted_struct, models$stunted_struct_age)
## df AIC
## models$stunted_struct 9 3417.297
## models$stunted_struct_age 10 3294.284
BIC(models$stunted_struct, models$stunted_struct_age)
## df BIC
## models$stunted_struct 9 3470.436
## models$stunted_struct_age 10 3353.327
AIC(models$underweight_bin_struct, models$underweight_bin_struct_age)
## df AIC
## models$underweight_bin_struct 9 2710.674
## models$underweight_bin_struct_age 10 2688.014
BIC(models$underweight_bin_struct, models$underweight_bin_struct_age)
## df BIC
## models$underweight_bin_struct 9 2763.813
## models$underweight_bin_struct_age 10 2747.058
AIC(models$wasting_bin_struct, models$wasting_bin_struct_age)
## df AIC
## models$wasting_bin_struct 9 1680.212
## models$wasting_bin_struct_age 10 1670.559
BIC(models$wasting_bin_struct, models$wasting_bin_struct_age)
## df BIC
## models$wasting_bin_struct 9 1733.351
## models$wasting_bin_struct_age 10 1729.603
# Check coefficients and 95% CI
library(broom)
lapply(models, tidy, conf.int = TRUE)
## $stunted_struct
## # A tibble: 9 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -1.47 0.138 -10.7 1.67e-26 -1.75 -1.20
## 2 toilet_facility 0.0299 0.00514 5.82 5.90e- 9 0.0200 0.0402
## 3 water_source -0.00479 0.00295 -1.62 1.04e- 1 -0.0106 0.000930
## 4 postnatal_care -0.0193 0.0741 -0.260 7.95e- 1 -0.167 0.125
## 5 cooking_fuel 0.0519 0.0172 3.02 2.53e- 3 0.0182 0.0856
## 6 distance_ti_health -0.0949 0.114 -0.836 4.03e- 1 -0.318 0.128
## 7 wait_to_go_alone 0.129 0.109 1.18 2.38e- 1 -0.0852 0.343
## 8 getting_permission_t… -0.117 0.123 -0.945 3.45e- 1 -0.359 0.125
## 9 money_needed 0.491 0.118 4.17 3.10e- 5 0.260 0.722
##
## $stunted_struct_age
## # A tibble: 10 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -2.21 0.159 -13.8 1.40e-43 -2.52 -1.90
## 2 toilet_facility 0.0291 0.00529 5.51 3.68e- 8 0.0190 0.0397
## 3 water_source -0.00527 0.00302 -1.75 8.08e- 2 -0.0112 0.000585
## 4 postnatal_care -0.0169 0.0754 -0.224 8.23e- 1 -0.166 0.131
## 5 cooking_fuel 0.0543 0.0176 3.08 2.07e- 3 0.0197 0.0888
## 6 distance_ti_health -0.110 0.116 -0.945 3.45e- 1 -0.337 0.118
## 7 wait_to_go_alone 0.210 0.112 1.88 6.05e- 2 -0.00923 0.430
## 8 getting_permission_… -0.133 0.126 -1.05 2.92e- 1 -0.380 0.114
## 9 money_needed 0.527 0.120 4.37 1.23e- 5 0.291 0.763
## 10 hw1 0.0289 0.00263 11.0 4.98e-28 0.0237 0.0340
##
## $underweight_bin_struct
## # A tibble: 9 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -2.39 0.168 -14.2 1.40e-45 -2.72 -2.06
## 2 toilet_facility 0.0149 0.00497 3.00 2.69e- 3 0.00508 0.0247
## 3 water_source -0.00210 0.00349 -0.601 5.48e- 1 -0.00908 0.00463
## 4 postnatal_care 0.0456 0.0809 0.564 5.73e- 1 -0.120 0.201
## 5 cooking_fuel 0.107 0.0203 5.27 1.37e- 7 0.0673 0.147
## 6 distance_ti_health -0.145 0.134 -1.09 2.78e- 1 -0.407 0.117
## 7 wait_to_go_alone 0.139 0.131 1.06 2.87e- 1 -0.117 0.396
## 8 getting_permission_t… 0.279 0.140 2.00 4.59e- 2 0.00566 0.554
## 9 money_needed 0.225 0.137 1.65 9.98e- 2 -0.0443 0.492
##
## $underweight_bin_struct_age
## # A tibble: 10 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -2.75 0.186 -14.8 2.89e-49 -3.12 -2.39
## 2 toilet_facility 0.0140 0.00499 2.80 5.15e- 3 0.00408 0.0237
## 3 water_source -0.00220 0.00351 -0.627 5.30e- 1 -0.00921 0.00456
## 4 postnatal_care 0.0451 0.0804 0.561 5.75e- 1 -0.118 0.200
## 5 cooking_fuel 0.108 0.0204 5.29 1.22e- 7 0.0680 0.148
## 6 distance_ti_health -0.150 0.134 -1.12 2.63e- 1 -0.412 0.113
## 7 wait_to_go_alone 0.177 0.132 1.35 1.78e- 1 -0.0804 0.436
## 8 getting_permission_… 0.276 0.140 1.97 4.90e- 2 0.00177 0.552
## 9 money_needed 0.233 0.137 1.70 8.92e- 2 -0.0370 0.501
## 10 hw1 0.0146 0.00294 4.98 6.40e- 7 0.00888 0.0204
##
## $wasting_bin_struct
## # A tibble: 9 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -2.96 0.235 -12.6 2.25e-36 -3.43 -2.51
## 2 toilet_facility -0.00917 0.00830 -1.11 2.69e- 1 -0.0264 0.00616
## 3 water_source 0.00583 0.00445 1.31 1.90e- 1 -0.00313 0.0143
## 4 postnatal_care 0.0627 0.112 0.557 5.77e- 1 -0.181 0.264
## 5 cooking_fuel 0.0452 0.0279 1.62 1.05e- 1 -0.00924 0.100
## 6 distance_ti_health 0.180 0.185 0.973 3.30e- 1 -0.180 0.546
## 7 wait_to_go_alone 0.235 0.188 1.25 2.12e- 1 -0.132 0.606
## 8 getting_permission_t… 0.203 0.190 1.07 2.84e- 1 -0.167 0.577
## 9 money_needed 0.197 0.189 1.04 2.98e- 1 -0.177 0.564
##
## $wasting_bin_struct_age
## # A tibble: 10 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -2.65 0.251 -10.6 4.77e-26 -3.15 -2.17
## 2 toilet_facility -0.00815 0.00835 -0.977 3.29e- 1 -0.0254 0.00727
## 3 water_source 0.00597 0.00446 1.34 1.80e- 1 -0.00300 0.0145
## 4 postnatal_care 0.0626 0.116 0.541 5.89e- 1 -0.187 0.270
## 5 cooking_fuel 0.0455 0.0280 1.63 1.04e- 1 -0.00912 0.101
## 6 distance_ti_health 0.182 0.186 0.975 3.30e- 1 -0.181 0.550
## 7 wait_to_go_alone 0.201 0.189 1.06 2.87e- 1 -0.167 0.573
## 8 getting_permission_… 0.210 0.190 1.10 2.70e- 1 -0.161 0.586
## 9 money_needed 0.193 0.190 1.01 3.11e- 1 -0.183 0.562
## 10 hw1 -0.0146 0.00438 -3.34 8.46e- 4 -0.0233 -0.00615
# 1. Function to create tidy table with odds ratios
tidy_or <- function(model) {
tidy(model, conf.int = TRUE) %>%
dplyr::mutate(
odds_ratio = exp(estimate),
conf.low.or = exp(conf.low),
conf.high.or = exp(conf.high)
)
}
# 2. Apply to your models
tidy_or(model_stunted_wealth)
## # A tibble: 5 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… 0.170 0.0852 1.99 4.63e- 2 0.00308 0.337 1.19
## 2 wealth_ca… -0.670 0.117 -5.73 9.78e- 9 -0.900 -0.442 0.512
## 3 wealth_ca… -0.860 0.127 -6.77 1.30e-11 -1.11 -0.612 0.423
## 4 wealth_ca… -1.14 0.133 -8.58 9.16e-18 -1.40 -0.883 0.319
## 5 wealth_ca… -1.50 0.139 -10.8 3.64e-27 -1.78 -1.23 0.223
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
tidy_or(model_stunted_wealth_age)
## # A tibble: 6 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -0.482 0.104 -4.61 3.99e- 6 -0.687 -0.277 0.618
## 2 wealth_ca… -0.724 0.120 -6.02 1.70e- 9 -0.961 -0.489 0.485
## 3 wealth_ca… -0.888 0.130 -6.81 9.75e-12 -1.15 -0.634 0.411
## 4 wealth_ca… -1.23 0.137 -8.99 2.37e-19 -1.50 -0.966 0.291
## 5 wealth_ca… -1.57 0.143 -11.0 4.74e-28 -1.85 -1.29 0.209
## 6 hw1 0.0294 0.00264 11.1 1.21e-28 0.0242 0.0346 1.03
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
tidy_or(model_underweight_wealth)
## # A tibble: 5 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -0.558 0.0882 -6.33 2.50e-10 -0.733 -0.387 0.572
## 2 wealth_ca… -0.550 0.126 -4.37 1.22e- 5 -0.798 -0.304 0.577
## 3 wealth_ca… -1.04 0.148 -7.04 1.98e-12 -1.34 -0.755 0.353
## 4 wealth_ca… -1.11 0.153 -7.29 3.17e-13 -1.42 -0.819 0.328
## 5 wealth_ca… -1.89 0.187 -10.1 5.42e-24 -2.27 -1.53 0.151
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
tidy_or(model_underweight_wealth_age)
## # A tibble: 6 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -0.900 0.113 -7.95 1.80e-15 -1.12 -0.680 0.407
## 2 wealth_ca… -0.566 0.127 -4.47 7.91e- 6 -0.815 -0.318 0.568
## 3 wealth_ca… -1.04 0.149 -7.02 2.28e-12 -1.34 -0.756 0.352
## 4 wealth_ca… -1.14 0.154 -7.41 1.22e-13 -1.45 -0.844 0.319
## 5 wealth_ca… -1.90 0.188 -10.1 4.19e-24 -2.28 -1.54 0.150
## 6 hw1 0.0148 0.00297 4.97 6.62e- 7 0.00894 0.0206 1.01
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
tidy_or(model_wasting_wealth)
## # A tibble: 5 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -1.95 0.129 -15.2 5.26e-52 -2.21 -1.71 0.142
## 2 wealth_ca… 0.0584 0.173 0.339 7.35e- 1 -0.279 0.399 1.06
## 3 wealth_ca… -0.476 0.208 -2.29 2.17e- 2 -0.890 -0.0742 0.621
## 4 wealth_ca… -0.624 0.220 -2.84 4.47e- 3 -1.06 -0.201 0.536
## 5 wealth_ca… -1.04 0.246 -4.23 2.31e- 5 -1.54 -0.572 0.354
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
tidy_or(model_wasting_wealth_age)
## # A tibble: 6 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -1.62 0.155 -10.4 1.51e-25 -1.93 -1.32 0.198
## 2 wealth_ca… 0.0674 0.173 0.389 6.97e- 1 -0.271 0.409 1.07
## 3 wealth_ca… -0.486 0.208 -2.34 1.95e- 2 -0.901 -0.0828 0.615
## 4 wealth_ca… -0.614 0.220 -2.79 5.26e- 3 -1.06 -0.190 0.541
## 5 wealth_ca… -1.05 0.246 -4.26 2.04e- 5 -1.55 -0.580 0.351
## 6 hw1 -0.0156 0.00437 -3.57 3.57e- 4 -0.0243 -0.00716 0.985
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
# 3. For structural models
lapply(models, tidy_or)
## $stunted_struct
## # A tibble: 9 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -1.47 0.138 -10.7 1.67e-26 -1.75 -1.20 0.229
## 2 toilet_fa… 0.0299 0.00514 5.82 5.90e- 9 0.0200 0.0402 1.03
## 3 water_sou… -0.00479 0.00295 -1.62 1.04e- 1 -0.0106 0.000930 0.995
## 4 postnatal… -0.0193 0.0741 -0.260 7.95e- 1 -0.167 0.125 0.981
## 5 cooking_f… 0.0519 0.0172 3.02 2.53e- 3 0.0182 0.0856 1.05
## 6 distance_… -0.0949 0.114 -0.836 4.03e- 1 -0.318 0.128 0.909
## 7 wait_to_g… 0.129 0.109 1.18 2.38e- 1 -0.0852 0.343 1.14
## 8 getting_p… -0.117 0.123 -0.945 3.45e- 1 -0.359 0.125 0.890
## 9 money_nee… 0.491 0.118 4.17 3.10e- 5 0.260 0.722 1.63
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $stunted_struct_age
## # A tibble: 10 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Interce… -2.21 0.159 -13.8 1.40e-43 -2.52 -1.90 0.110
## 2 toilet_f… 0.0291 0.00529 5.51 3.68e- 8 0.0190 0.0397 1.03
## 3 water_so… -0.00527 0.00302 -1.75 8.08e- 2 -0.0112 0.000585 0.995
## 4 postnata… -0.0169 0.0754 -0.224 8.23e- 1 -0.166 0.131 0.983
## 5 cooking_… 0.0543 0.0176 3.08 2.07e- 3 0.0197 0.0888 1.06
## 6 distance… -0.110 0.116 -0.945 3.45e- 1 -0.337 0.118 0.896
## 7 wait_to_… 0.210 0.112 1.88 6.05e- 2 -0.00923 0.430 1.23
## 8 getting_… -0.133 0.126 -1.05 2.92e- 1 -0.380 0.114 0.876
## 9 money_ne… 0.527 0.120 4.37 1.23e- 5 0.291 0.763 1.69
## 10 hw1 0.0289 0.00263 11.0 4.98e-28 0.0237 0.0340 1.03
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $underweight_bin_struct
## # A tibble: 9 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -2.39 0.168 -14.2 1.40e-45 -2.72 -2.06 0.0920
## 2 toilet_fa… 0.0149 0.00497 3.00 2.69e- 3 0.00508 0.0247 1.02
## 3 water_sou… -0.00210 0.00349 -0.601 5.48e- 1 -0.00908 0.00463 0.998
## 4 postnatal… 0.0456 0.0809 0.564 5.73e- 1 -0.120 0.201 1.05
## 5 cooking_f… 0.107 0.0203 5.27 1.37e- 7 0.0673 0.147 1.11
## 6 distance_… -0.145 0.134 -1.09 2.78e- 1 -0.407 0.117 0.865
## 7 wait_to_g… 0.139 0.131 1.06 2.87e- 1 -0.117 0.396 1.15
## 8 getting_p… 0.279 0.140 2.00 4.59e- 2 0.00566 0.554 1.32
## 9 money_nee… 0.225 0.137 1.65 9.98e- 2 -0.0443 0.492 1.25
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $underweight_bin_struct_age
## # A tibble: 10 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Interce… -2.75 0.186 -14.8 2.89e-49 -3.12 -2.39 0.0640
## 2 toilet_f… 0.0140 0.00499 2.80 5.15e- 3 0.00408 0.0237 1.01
## 3 water_so… -0.00220 0.00351 -0.627 5.30e- 1 -0.00921 0.00456 0.998
## 4 postnata… 0.0451 0.0804 0.561 5.75e- 1 -0.118 0.200 1.05
## 5 cooking_… 0.108 0.0204 5.29 1.22e- 7 0.0680 0.148 1.11
## 6 distance… -0.150 0.134 -1.12 2.63e- 1 -0.412 0.113 0.861
## 7 wait_to_… 0.177 0.132 1.35 1.78e- 1 -0.0804 0.436 1.19
## 8 getting_… 0.276 0.140 1.97 4.90e- 2 0.00177 0.552 1.32
## 9 money_ne… 0.233 0.137 1.70 8.92e- 2 -0.0370 0.501 1.26
## 10 hw1 0.0146 0.00294 4.98 6.40e- 7 0.00888 0.0204 1.01
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $wasting_bin_struct
## # A tibble: 9 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -2.96 0.235 -12.6 2.25e-36 -3.43 -2.51 0.0517
## 2 toilet_fa… -0.00917 0.00830 -1.11 2.69e- 1 -0.0264 0.00616 0.991
## 3 water_sou… 0.00583 0.00445 1.31 1.90e- 1 -0.00313 0.0143 1.01
## 4 postnatal… 0.0627 0.112 0.557 5.77e- 1 -0.181 0.264 1.06
## 5 cooking_f… 0.0452 0.0279 1.62 1.05e- 1 -0.00924 0.100 1.05
## 6 distance_… 0.180 0.185 0.973 3.30e- 1 -0.180 0.546 1.20
## 7 wait_to_g… 0.235 0.188 1.25 2.12e- 1 -0.132 0.606 1.26
## 8 getting_p… 0.203 0.190 1.07 2.84e- 1 -0.167 0.577 1.23
## 9 money_nee… 0.197 0.189 1.04 2.98e- 1 -0.177 0.564 1.22
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $wasting_bin_struct_age
## # A tibble: 10 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Interce… -2.65 0.251 -10.6 4.77e-26 -3.15 -2.17 0.0706
## 2 toilet_f… -0.00815 0.00835 -0.977 3.29e- 1 -0.0254 0.00727 0.992
## 3 water_so… 0.00597 0.00446 1.34 1.80e- 1 -0.00300 0.0145 1.01
## 4 postnata… 0.0626 0.116 0.541 5.89e- 1 -0.187 0.270 1.06
## 5 cooking_… 0.0455 0.0280 1.63 1.04e- 1 -0.00912 0.101 1.05
## 6 distance… 0.182 0.186 0.975 3.30e- 1 -0.181 0.550 1.20
## 7 wait_to_… 0.201 0.189 1.06 2.87e- 1 -0.167 0.573 1.22
## 8 getting_… 0.210 0.190 1.10 2.70e- 1 -0.161 0.586 1.23
## 9 money_ne… 0.193 0.190 1.01 3.11e- 1 -0.183 0.562 1.21
## 10 hw1 -0.0146 0.00438 -3.34 8.46e- 4 -0.0233 -0.00615 0.985
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
Above code presents the odds ratios (ORs) from the household models examining the relationship between wealth status and child malnutrition. For stunting, children from wealthier households were significantly less likely to be stunted than those from the poorest households. Compared to the poorest, the odds of stunting decreased progressively across the wealth categories—poorer (OR = 0.51, 95% CI: 0.41–0.64), middle (0.42, 0.33–0.54), richer (0.32, 0.25–0.41), and richest (0.22, 0.17–0.29). After adjusting for child age, the protective effect of wealth persisted, and each additional month of age increased the odds of stunting slightly (OR = 1.03, 95% CI: 1.02–1.04). For underweight, the pattern was similar, with children from the richest households showing an 85% lower likelihood of being underweight compared to the poorest (OR = 0.15, 95% CI: 0.10–0.21). The effect of wealth remained significant after controlling for age, which also showed a modest positive association with underweight (OR = 1.01, 95% CI: 1.01–1.02). For wasting, only the middle, richer, and richest categories had significantly lower odds than the poorest (ORs = 0.62, 0.54, and 0.35, respectively). After adding child age, wasting risk declined with age (OR = 0.98, 95% CI: 0.98–0.99). Overall, wealthier households were strongly protected against chronic malnutrition (stunting and underweight), while effects on acute malnutrition (wasting) were weaker but still evident among higher wealth groups.
Above code summarizes odds ratios for environmental and healthcare-access predictors. For stunting, children living in households with improved toilet facilities (OR = 1.03, 95% CI: 1.02–1.04) and cleaner cooking fuel (OR = 1.05, 95% CI: 1.02–1.09) had slightly higher odds of stunting, possibly reflecting confounding with socioeconomic factors ( got correct in combine model below because wealth quantile is a confounding factor here but as we are russing model independtly that why this results occure). Financial difficulty in obtaining medical care significantly increased stunting risk (OR = 1.63, 95% CI: 1.30–2.06), and this effect persisted after age adjustment (OR = 1.69, 95% CI: 1.34–2.15). Child age was again positively associated with stunting (OR = 1.03, 95% CI: 1.02–1.03). For underweight, access to improved sanitation (OR = 1.02, 95% CI: 1.01–1.02) and use of clean fuels (OR = 1.11, 95% CI: 1.07–1.16) were linked to lower odds of underweight, whereas needing permission to seek care slightly increased risk (OR = 1.32, 95% CI: 1.01–1.74). After adjusting for age, these associations remained stable. For wasting, structural predictors were not statistically significant, and all odds ratios were close to unity, suggesting that acute malnutrition was less influenced by environmental or healthcare-access factors in this sample.
# List of structural factors
struct_vars <- c("toilet_facility", "water_source", "postnatal_care",
"cooking_fuel", "distance_ti_health", "wait_to_go_alone",
"getting_permission_to_go", "money_needed")
# List of household factors
household_vars <- c("wealth_cat")
# Function to create formula for combined model
create_combined_formula <- function(outcome, include_age = FALSE) {
all_vars <- c(household_vars, struct_vars)
if(include_age) {
all_vars <- c(all_vars, "hw1")
}
as.formula(paste(outcome, "~", paste(all_vars, collapse = " + ")))
}
# Outcomes
outcomes <- c("stunted", "underweight_bin", "wasting_bin")
# Run combined models
combined_models <- list()
for(outcome in outcomes){
combined_models[[paste0(outcome, "_combined")]] <- glm(
create_combined_formula(outcome, FALSE),
family = binomial, data = full_clean
)
combined_models[[paste0(outcome, "_combined_age")]] <- glm(
create_combined_formula(outcome, TRUE),
family = binomial, data = full_clean
)
}
# Check summary
lapply(combined_models, summary)
## $stunted_combined
##
## Call:
## glm(formula = create_combined_formula(outcome, FALSE), family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.027331 0.252333 0.108 0.913747
## wealth_catpoorer -0.576659 0.126903 -4.544 5.52e-06 ***
## wealth_catmiddle -0.753248 0.152258 -4.947 7.53e-07 ***
## wealth_catricher -1.029594 0.175489 -5.867 4.44e-09 ***
## wealth_catrichest -1.358477 0.193856 -7.008 2.42e-12 ***
## toilet_facility 0.013902 0.005359 2.594 0.009488 **
## water_source -0.005025 0.003047 -1.649 0.099158 .
## postnatal_care -0.008726 0.073687 -0.118 0.905739
## cooking_fuel -0.029068 0.021491 -1.353 0.176193
## distance_ti_health -0.172321 0.115916 -1.487 0.137120
## wait_to_go_alone 0.129734 0.110462 1.174 0.240207
## getting_permission_to_go -0.119481 0.124299 -0.961 0.336432
## money_needed 0.400615 0.119625 3.349 0.000811 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3519.9 on 2708 degrees of freedom
## Residual deviance: 3346.4 on 2696 degrees of freedom
## AIC: 3372.4
##
## Number of Fisher Scoring iterations: 4
##
##
## $stunted_combined_age
##
## Call:
## glm(formula = create_combined_formula(outcome, TRUE), family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.608540 0.264708 -2.299 0.021510 *
## wealth_catpoorer -0.649408 0.130652 -4.971 6.68e-07 ***
## wealth_catmiddle -0.808851 0.156404 -5.172 2.32e-07 ***
## wealth_catricher -1.145143 0.180528 -6.343 2.25e-10 ***
## wealth_catrichest -1.441170 0.198589 -7.257 3.96e-13 ***
## toilet_facility 0.011671 0.005482 2.129 0.033261 *
## water_source -0.005732 0.003125 -1.834 0.066663 .
## postnatal_care -0.007086 0.075848 -0.093 0.925570
## cooking_fuel -0.031134 0.022021 -1.414 0.157405
## distance_ti_health -0.191844 0.118582 -1.618 0.105702
## wait_to_go_alone 0.211544 0.113546 1.863 0.062452 .
## getting_permission_to_go -0.136124 0.127347 -1.069 0.285106
## money_needed 0.432594 0.122558 3.530 0.000416 ***
## hw1 0.029888 0.002671 11.190 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3519.9 on 2708 degrees of freedom
## Residual deviance: 3215.9 on 2695 degrees of freedom
## AIC: 3243.9
##
## Number of Fisher Scoring iterations: 4
##
##
## $underweight_bin_combined
##
## Call:
## glm(formula = create_combined_formula(outcome, FALSE), family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.450846 0.291582 -1.546 0.1221
## wealth_catpoorer -0.587293 0.137867 -4.260 2.05e-05 ***
## wealth_catmiddle -1.113333 0.176938 -6.292 3.13e-10 ***
## wealth_catricher -1.187482 0.202997 -5.850 4.92e-09 ***
## wealth_catrichest -1.950846 0.245037 -7.961 1.70e-15 ***
## toilet_facility -0.002845 0.005856 -0.486 0.6271
## water_source -0.002459 0.003678 -0.668 0.5038
## postnatal_care 0.064758 0.080971 0.800 0.4239
## cooking_fuel -0.011219 0.025569 -0.439 0.6608
## distance_ti_health -0.255728 0.136730 -1.870 0.0614 .
## wait_to_go_alone 0.141789 0.132915 1.067 0.2861
## getting_permission_to_go 0.282221 0.141480 1.995 0.0461 *
## money_needed 0.102038 0.139222 0.733 0.4636
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2782.3 on 2708 degrees of freedom
## Residual deviance: 2617.7 on 2696 degrees of freedom
## AIC: 2643.7
##
## Number of Fisher Scoring iterations: 5
##
##
## $underweight_bin_combined_age
##
## Call:
## glm(formula = create_combined_formula(outcome, TRUE), family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.777528 0.301281 -2.581 0.00986 **
## wealth_catpoorer -0.616734 0.139018 -4.436 9.15e-06 ***
## wealth_catmiddle -1.134373 0.178404 -6.358 2.04e-10 ***
## wealth_catricher -1.233722 0.204802 -6.024 1.70e-09 ***
## wealth_catrichest -1.977885 0.246345 -8.029 9.83e-16 ***
## toilet_facility -0.004429 0.005913 -0.749 0.45380
## water_source -0.002727 0.003699 -0.737 0.46089
## postnatal_care 0.063964 0.080979 0.790 0.42959
## cooking_fuel -0.012069 0.025721 -0.469 0.63891
## distance_ti_health -0.258915 0.137016 -1.890 0.05880 .
## wait_to_go_alone 0.175779 0.133686 1.315 0.18856
## getting_permission_to_go 0.282135 0.142101 1.985 0.04709 *
## money_needed 0.106493 0.139627 0.763 0.44565
## hw1 0.015253 0.002989 5.102 3.36e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2782.3 on 2708 degrees of freedom
## Residual deviance: 2591.8 on 2695 degrees of freedom
## AIC: 2619.8
##
## Number of Fisher Scoring iterations: 5
##
##
## $wasting_bin_combined
##
## Call:
## glm(formula = create_combined_formula(outcome, FALSE), family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.537318 0.408726 -3.761 0.000169 ***
## wealth_catpoorer -0.128137 0.192326 -0.666 0.505254
## wealth_catmiddle -0.774785 0.249877 -3.101 0.001931 **
## wealth_catricher -0.954514 0.287137 -3.324 0.000887 ***
## wealth_catrichest -1.418415 0.328054 -4.324 1.53e-05 ***
## toilet_facility -0.022335 0.009885 -2.260 0.023848 *
## water_source 0.006527 0.004611 1.416 0.156915
## postnatal_care 0.087126 0.110835 0.786 0.431816
## cooking_fuel -0.055517 0.034273 -1.620 0.105268
## distance_ti_health 0.066911 0.187441 0.357 0.721112
## wait_to_go_alone 0.240598 0.188891 1.274 0.202754
## getting_permission_to_go 0.206353 0.190928 1.081 0.279792
## money_needed 0.075113 0.190894 0.393 0.693964
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1690.4 on 2708 degrees of freedom
## Residual deviance: 1635.7 on 2696 degrees of freedom
## AIC: 1661.7
##
## Number of Fisher Scoring iterations: 5
##
##
## $wasting_bin_combined_age
##
## Call:
## glm(formula = create_combined_formula(outcome, TRUE), family = binomial,
## data = full_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.243023 0.416243 -2.986 0.002824 **
## wealth_catpoorer -0.113244 0.192427 -0.589 0.556195
## wealth_catmiddle -0.767300 0.249575 -3.074 0.002109 **
## wealth_catricher -0.933540 0.287107 -3.252 0.001148 **
## wealth_catrichest -1.415741 0.328435 -4.311 1.63e-05 ***
## toilet_facility -0.021014 0.009901 -2.122 0.033798 *
## water_source 0.006779 0.004623 1.466 0.142541
## postnatal_care 0.087624 0.114331 0.766 0.443435
## cooking_fuel -0.054607 0.034302 -1.592 0.111403
## distance_ti_health 0.063642 0.188856 0.337 0.736126
## wait_to_go_alone 0.210495 0.189692 1.110 0.267143
## getting_permission_to_go 0.210957 0.191899 1.099 0.271632
## money_needed 0.072131 0.192070 0.376 0.707253
## hw1 -0.014743 0.004405 -3.347 0.000817 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1690.4 on 2708 degrees of freedom
## Residual deviance: 1624.0 on 2695 degrees of freedom
## AIC: 1652
##
## Number of Fisher Scoring iterations: 5
# Odds ratios with 95% CI
tidy_or <- function(model) {
broom::tidy(model, conf.int = TRUE) %>%
dplyr::mutate(
odds_ratio = exp(estimate),
conf.low.or = exp(conf.low),
conf.high.or = exp(conf.high)
)
}
lapply(combined_models, tidy_or)
## $stunted_combined
## # A tibble: 13 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Interce… 0.0273 0.252 0.108 9.14e- 1 -0.468 0.522 1.03
## 2 wealth_c… -0.577 0.127 -4.54 5.52e- 6 -0.826 -0.328 0.562
## 3 wealth_c… -0.753 0.152 -4.95 7.53e- 7 -1.05 -0.456 0.471
## 4 wealth_c… -1.03 0.175 -5.87 4.44e- 9 -1.38 -0.687 0.357
## 5 wealth_c… -1.36 0.194 -7.01 2.42e-12 -1.74 -0.980 0.257
## 6 toilet_f… 0.0139 0.00536 2.59 9.49e- 3 0.00353 0.0246 1.01
## 7 water_so… -0.00503 0.00305 -1.65 9.92e- 2 -0.0111 0.000885 0.995
## 8 postnata… -0.00873 0.0737 -0.118 9.06e- 1 -0.155 0.135 0.991
## 9 cooking_… -0.0291 0.0215 -1.35 1.76e- 1 -0.0713 0.0130 0.971
## 10 distance… -0.172 0.116 -1.49 1.37e- 1 -0.400 0.0547 0.842
## 11 wait_to_… 0.130 0.110 1.17 2.40e- 1 -0.0868 0.346 1.14
## 12 getting_… -0.119 0.124 -0.961 3.36e- 1 -0.364 0.124 0.887
## 13 money_ne… 0.401 0.120 3.35 8.11e- 4 0.166 0.635 1.49
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $stunted_combined_age
## # A tibble: 14 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Interce… -0.609 0.265 -2.30 2.15e- 2 -1.13 -0.0907 0.544
## 2 wealth_c… -0.649 0.131 -4.97 6.68e- 7 -0.906 -0.394 0.522
## 3 wealth_c… -0.809 0.156 -5.17 2.32e- 7 -1.12 -0.503 0.445
## 4 wealth_c… -1.15 0.181 -6.34 2.25e-10 -1.50 -0.793 0.318
## 5 wealth_c… -1.44 0.199 -7.26 3.96e-13 -1.83 -1.05 0.237
## 6 toilet_f… 0.0117 0.00548 2.13 3.33e- 2 0.00106 0.0226 1.01
## 7 water_so… -0.00573 0.00313 -1.83 6.67e- 2 -0.0119 0.000330 0.994
## 8 postnata… -0.00709 0.0758 -0.0934 9.26e- 1 -0.156 0.143 0.993
## 9 cooking_… -0.0311 0.0220 -1.41 1.57e- 1 -0.0744 0.0120 0.969
## 10 distance… -0.192 0.119 -1.62 1.06e- 1 -0.425 0.0403 0.825
## 11 wait_to_… 0.212 0.114 1.86 6.25e- 2 -0.0109 0.434 1.24
## 12 getting_… -0.136 0.127 -1.07 2.85e- 1 -0.386 0.113 0.873
## 13 money_ne… 0.433 0.123 3.53 4.16e- 4 0.192 0.673 1.54
## 14 hw1 0.0299 0.00267 11.2 4.57e-29 0.0247 0.0352 1.03
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $underweight_bin_combined
## # A tibble: 13 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Interce… -0.451 0.292 -1.55 1.22e- 1 -1.02 0.121 0.637
## 2 wealth_c… -0.587 0.138 -4.26 2.05e- 5 -0.859 -0.318 0.556
## 3 wealth_c… -1.11 0.177 -6.29 3.13e-10 -1.46 -0.770 0.328
## 4 wealth_c… -1.19 0.203 -5.85 4.92e- 9 -1.59 -0.793 0.305
## 5 wealth_c… -1.95 0.245 -7.96 1.70e-15 -2.44 -1.48 0.142
## 6 toilet_f… -0.00284 0.00586 -0.486 6.27e- 1 -0.0147 0.00835 0.997
## 7 water_so… -0.00246 0.00368 -0.668 5.04e- 1 -0.00979 0.00464 0.998
## 8 postnata… 0.0648 0.0810 0.800 4.24e- 1 -0.101 0.221 1.07
## 9 cooking_… -0.0112 0.0256 -0.439 6.61e- 1 -0.0613 0.0389 0.989
## 10 distance… -0.256 0.137 -1.87 6.14e- 2 -0.524 0.0125 0.774
## 11 wait_to_… 0.142 0.133 1.07 2.86e- 1 -0.118 0.403 1.15
## 12 getting_… 0.282 0.141 1.99 4.61e- 2 0.00555 0.560 1.33
## 13 money_ne… 0.102 0.139 0.733 4.64e- 1 -0.172 0.374 1.11
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $underweight_bin_combined_age
## # A tibble: 14 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Interce… -0.778 0.301 -2.58 9.86e- 3 -1.37 -0.187 0.460
## 2 wealth_c… -0.617 0.139 -4.44 9.15e- 6 -0.890 -0.345 0.540
## 3 wealth_c… -1.13 0.178 -6.36 2.04e-10 -1.49 -0.788 0.322
## 4 wealth_c… -1.23 0.205 -6.02 1.70e- 9 -1.64 -0.836 0.291
## 5 wealth_c… -1.98 0.246 -8.03 9.83e-16 -2.47 -1.50 0.138
## 6 toilet_f… -0.00443 0.00591 -0.749 4.54e- 1 -0.0164 0.00687 0.996
## 7 water_so… -0.00273 0.00370 -0.737 4.61e- 1 -0.0101 0.00441 0.997
## 8 postnata… 0.0640 0.0810 0.790 4.30e- 1 -0.100 0.221 1.07
## 9 cooking_… -0.0121 0.0257 -0.469 6.39e- 1 -0.0625 0.0384 0.988
## 10 distance… -0.259 0.137 -1.89 5.88e- 2 -0.528 0.00981 0.772
## 11 wait_to_… 0.176 0.134 1.31 1.89e- 1 -0.0859 0.438 1.19
## 12 getting_… 0.282 0.142 1.99 4.71e- 2 0.00422 0.562 1.33
## 13 money_ne… 0.106 0.140 0.763 4.46e- 1 -0.169 0.379 1.11
## 14 hw1 0.0153 0.00299 5.10 3.36e- 7 0.00939 0.0211 1.02
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $wasting_bin_combined
## # A tibble: 13 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -1.54 0.409 -3.76 1.69e-4 -2.34 -0.738 0.215
## 2 wealth_ca… -0.128 0.192 -0.666 5.05e-1 -0.505 0.250 0.880
## 3 wealth_ca… -0.775 0.250 -3.10 1.93e-3 -1.27 -0.290 0.461
## 4 wealth_ca… -0.955 0.287 -3.32 8.87e-4 -1.52 -0.397 0.385
## 5 wealth_ca… -1.42 0.328 -4.32 1.53e-5 -2.07 -0.783 0.242
## 6 toilet_fa… -0.0223 0.00988 -2.26 2.38e-2 -0.0426 -0.00402 0.978
## 7 water_sou… 0.00653 0.00461 1.42 1.57e-1 -0.00274 0.0154 1.01
## 8 postnatal… 0.0871 0.111 0.786 4.32e-1 -0.153 0.287 1.09
## 9 cooking_f… -0.0555 0.0343 -1.62 1.05e-1 -0.123 0.0119 0.946
## 10 distance_… 0.0669 0.187 0.357 7.21e-1 -0.298 0.437 1.07
## 11 wait_to_g… 0.241 0.189 1.27 2.03e-1 -0.128 0.614 1.27
## 12 getting_p… 0.206 0.191 1.08 2.80e-1 -0.166 0.583 1.23
## 13 money_nee… 0.0751 0.191 0.393 6.94e-1 -0.302 0.446 1.08
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
##
## $wasting_bin_combined_age
## # A tibble: 14 × 10
## term estimate std.error statistic p.value conf.low conf.high odds_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercep… -1.24 0.416 -2.99 2.82e-3 -2.06 -0.429 0.289
## 2 wealth_ca… -0.113 0.192 -0.589 5.56e-1 -0.490 0.265 0.893
## 3 wealth_ca… -0.767 0.250 -3.07 2.11e-3 -1.26 -0.283 0.464
## 4 wealth_ca… -0.934 0.287 -3.25 1.15e-3 -1.50 -0.377 0.393
## 5 wealth_ca… -1.42 0.328 -4.31 1.63e-5 -2.07 -0.780 0.243
## 6 toilet_fa… -0.0210 0.00990 -2.12 3.38e-2 -0.0413 -0.00266 0.979
## 7 water_sou… 0.00678 0.00462 1.47 1.43e-1 -0.00251 0.0156 1.01
## 8 postnatal… 0.0876 0.114 0.766 4.43e-1 -0.159 0.293 1.09
## 9 cooking_f… -0.0546 0.0343 -1.59 1.11e-1 -0.122 0.0128 0.947
## 10 distance_… 0.0636 0.189 0.337 7.36e-1 -0.304 0.437 1.07
## 11 wait_to_g… 0.210 0.190 1.11 2.67e-1 -0.160 0.585 1.23
## 12 getting_p… 0.211 0.192 1.10 2.72e-1 -0.163 0.590 1.23
## 13 money_nee… 0.0721 0.192 0.376 7.07e-1 -0.308 0.446 1.07
## 14 hw1 -0.0147 0.00441 -3.35 8.17e-4 -0.0235 -0.00622 0.985
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
# Pseudo R² to see model fit improvement
library(pscl)
lapply(combined_models, pscl::pR2)
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## $stunted_combined
## llh llhNull G2 McFadden r2ML
## -1.673180e+03 -1.759952e+03 1.735440e+02 4.930362e-02 6.205317e-02
## r2CU
## 8.532138e-02
##
## $stunted_combined_age
## llh llhNull G2 McFadden r2ML
## -1.607927e+03 -1.759952e+03 3.040502e+02 8.638026e-02 1.061677e-01
## r2CU
## 1.459776e-01
##
## $underweight_bin_combined
## llh llhNull G2 McFadden r2ML
## -1.308864e+03 -1.391129e+03 1.645292e+02 5.913517e-02 5.892676e-02
## r2CU
## 9.179545e-02
##
## $underweight_bin_combined_age
## llh llhNull G2 McFadden r2ML
## -1.295894e+03 -1.391129e+03 1.904696e+02 6.845865e-02 6.789510e-02
## r2CU
## 1.057662e-01
##
## $wasting_bin_combined
## llh llhNull G2 McFadden r2ML
## -817.84826069 -845.18615278 54.67578418 0.03234541 0.01998070
## r2CU
## 0.04304382
##
## $wasting_bin_combined_age
## llh llhNull G2 McFadden r2ML
## -811.98966492 -845.18615278 66.39297571 0.03927713 0.02421041
## r2CU
## 0.05215575
# Compare AIC/BIC for models with and without age
AIC(combined_models$stunted_combined, combined_models$stunted_combined_age)
## df AIC
## combined_models$stunted_combined 13 3372.360
## combined_models$stunted_combined_age 14 3243.854
BIC(combined_models$stunted_combined, combined_models$stunted_combined_age)
## df BIC
## combined_models$stunted_combined 13 3449.116
## combined_models$stunted_combined_age 14 3326.514
Across the combined logistic regression models, wealth status consistently showed a strong inverse association with all three malnutrition outcomes—stunting, underweight, and wasting. Children from poorer, middle, richer, and richest households had significantly lower odds of being malnourished compared to those from the poorest households, with odds ratios decreasing progressively with higher wealth categories (e.g., OR = 0.56 to 0.26 for stunting). This pattern indicates that improved household wealth substantially reduces the likelihood of malnutrition. Additionally, limited access to resources, such as difficulties obtaining money for healthcare, increased the odds of stunting (OR = 1.49, 95% CI 1.18–1.89), highlighting economic barriers as a key risk factor. Toilet facilities also showed a small but significant protective effect for wasting (OR = 0.98, 95% CI 0.96–1.00), emphasizing the role of sanitation in child health. When child age (hw1) was added to the models, it became a strong predictor of nutritional outcomes. Each unit increase in age was associated with higher odds of stunting (OR = 1.03, 95% CI 1.02–1.04) and underweight (OR = 1.02, 95% CI 1.01–1.02) but lower odds of wasting (OR = 0.98, 95% CI 0.96–1.00), suggesting that while older children are more prone to chronic forms of malnutrition, acute malnutrition is more common among younger children. Overall, the models demonstrate that socioeconomic status, sanitation, and child age are key determinants of child nutritional outcomes in this population.
In simple terms, the results show that a family’s wealth has the strongest effect on a child’s nutrition — children from richer households are much less likely to be stunted, underweight, or wasted compared to those from poorer families. Financial problems, especially difficulty paying for medical care, also increase the risk of poor nutrition. Better sanitation and cleaner cooking fuels help protect children, but their effects are smaller and sometimes not statistically significant after accounting for wealth. Age also matters: as children grow older, long-term nutrition problems like stunting become more common, while short-term issues like wasting are more frequent among younger children. Overall, poverty and barriers to healthcare are the most important challenges for improving child nutrition.
The logistic regression results largely support the associations observed in the Cramér’s V analysis. Both approaches identified household wealth and toilet facility as the most consistent predictors of child malnutrition. In the Cramér’s V results, these variables showed moderate associations with stunting and underweight, and the regression models confirmed that children from wealthier households had substantially lower odds of being stunted, underweight, or wasted. Similarly, improved sanitation showed a small but significant protective effect, aligning with its moderate Cramér’s V values. Other environmental and access variables—such as cooking fuel, distance to healthcare, and permission to seek care—displayed weak associations in both methods, indicating limited influence after accounting for wealth. The weaker or nonsignificant effects for wasting in both analyses also match, suggesting that acute malnutrition is less dependent on socioeconomic or environmental conditions. Overall, the regression analysis reinforces the Cramér’s V findings, showing that socioeconomic status and sanitation are key but moderately strong determinants of child nutritional outcomes