1.Introduction:

This report shows the initial finding of the descriptive analysis of child malnutrition in Pakistan,focusing distribution of stunting, wasting and underweight across different wealth groups.

2.Setup and data loading:

This section prepares the environment by loading necessary R libraries and importing the data-set, ensuring variables are in the correct format for analysis.

2.1 Load Libraries:

library(haven)

## Warning: package 'haven' was built under R version 4.5.1

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.5.1

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(psych)

## Warning: package 'psych' was built under R version 4.5.1

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

library(tidyr)
library(naniar)

## Warning: package 'naniar' was built under R version 4.5.1

2.2 Load Datasets:

In this section Kids data, Household data and Women data is loading in R.

kids <- read_dta("C:/Users/ameen/OneDrive - Higher Education Commission/Documents/Talha csc scholarship china/open door/itmo docs/introduction to R and more/1 Term/reaserch in ternship/Pakistan reaserch/PKKR71DT kids/PKKR71FL.DTA")
household <- read_dta("C:/Users/ameen/OneDrive - Higher Education Commission/Documents/Talha csc scholarship china/open door/itmo docs/introduction to R and more/1 Term/reaserch in ternship/Pakistan reaserch/PKHR71DT household/PKHR71FL.DTA")
women <- read_dta("C:/Users/ameen/OneDrive - Higher Education Commission/Documents/Talha csc scholarship china/open door/itmo docs/introduction to R and more/1 Term/reaserch in ternship/Pakistan reaserch/PKIR71DT women/PKIR71FL.DTA")

3 Select variables:

In this step variables that are useful for analysis are seperated from the raw data.

kids_selective <- kids %>%
  select(v001, v002, v003, b16, midx, 
         hw70, hw71, hw72,   # nutrition outcomes
         hw1, hw13,          # child age + reason for missing
         v190, h2:h9, m70)

household_selective <- household %>%
  select(hv001, hv002, hv201, hv202, hv204, hv205, hv226) %>%
  rename(v001 = hv001, v002 = hv002)

women_selective <- women %>%
  select(v001, v002, v003, v467a:v467f)

3.1 Linked check:

This step check while merging Kids and women data is data loss during merging?

# --- 2.1. Linked check ---
kids_selective$linked <- ifelse(
  paste(kids_selective$v001, kids_selective$v002, kids_selective$v003) %in%
    paste(women_selective$v001, women_selective$v002, women_selective$v003),
  1, 0
)
cat("\nLinked vs Unlinked children (pre-merge):\n")

## 
## Linked vs Unlinked children (pre-merge):

print(table(kids_selective$linked))

## 
##     1 
## 12708

Interpretation:

The results show that all 12708 children merged successfully with their respective mothers no data is lost during merging.

4 Merge datasets All:

In this step all selective data sets are merged.

kids_selective <- kids_selective %>% mutate(b16 = as.numeric(b16))

merged_datasets <- kids_selective %>%
  left_join(household_selective, by = c("v001","v002")) %>%
  left_join(women_selective, by = c("v001","v002","v003"))

4.1 Descriptive Statistics of Children’s Age:

Summarize the variable hw1 (child’s age in months) to understand its distribution and check for missing values.

summary(merged_datasets$hw1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   14.00   29.00   29.16   44.00   59.00    8209

Interpretation:

The variable hw1 (children’s age in months) ranges from 0 to 59 months. The median age is 29 months, close to the mean of 29.16, suggesting a roughly symmetric distribution. The first and third quartiles are 14 and 44 months, respectively. There are 8,209 missing values, indicating a substantial amount of incomplete data.

4.2 Histogram of Children’s Age:

hist(merged_datasets$hw1, main = "Age distribution of children", xlab = "Age in months")

Interpretation:

This histogram shows the number of children at different ages (in months). Overall, the sample is well-distributed across the full 0–60 month range, with slightly more infants (0–5 months) and slightly fewer children in the oldest age group (55–60 months). This indicates a large and fairly even sample, which is suitable for studying age-related changes in malnutrition.

4.3 Check for kids without a household match:

This counts how many children don’t have household information (hv201), showing if the merge missed any matches.

sum(is.na(merged_datasets$hv201))

## [1] 0

Interpretation:

The result [1] 0 means that all children in the data-set have matching household information. There are no missing values in hv201, so the merge worked correctly.

5 Data Cleaning and Variable Preparation:

In this step, we cleaned and prepared the merged dat-aset. Variables were renamed to meaningful names for easier interpretation. Children were grouped into age categories (0–11 months, 12–23 months, etc.), and anthropometry scores (stunting, underweight, wasting) were converted back to z-scores and cleaned for implausible values. Wealth index was recoded into 5 ordered categories from poorest to richest for analysis.

merged_datasets <- merged_datasets %>%
  rename(
    cluster_number   = v001,
    household_number = v002,
    child_index      = b16,
    mother_index     = v003,
    stunting         = hw70,
    underweight      = hw71,
    wasting          = hw72,
    wealth_index     = v190,
    water_source     = hv201,
    time_to_water    = hv202,   
    electricity      = hv204,
    toilet_facility  = hv205,
    cooking_fuel     = hv226,
    postnatal_care   = m70,
    know_where_to_go = v467a,
    getting_permission_to_go = v467b,
    money_needed = v467c,
    distance_ti_health = v467d,
    transport  = v467e,
    wait_to_go_alone  = v467f
  ) %>%
  mutate(
    # 5.1 Child age groups
    age_group = cut(hw1,
                    breaks = seq(0, 60, by = 12),
                    labels = c("0–11m", "12–23m", "24–35m", "36–47m", "48–59m"),
                    right = FALSE),
    #5.2  Convert anthropometry scores back to z-scores
    stunting    = stunting / 100,
    underweight = underweight / 100,
    wasting     = wasting / 100,
    #5.3  Clean implausible values
    stunting    = ifelse(stunting < -6 | stunting > 6, NA, stunting),
    underweight = ifelse(underweight < -6 | underweight > 6, NA, underweight),
    wasting     = ifelse(wasting < -5 | wasting > 5, NA, wasting),
    # Wealth categories (5 groups)
    wealth_cat = case_when(
      wealth_index == 1 ~ "poorest",
      wealth_index == 2 ~ "poorer",
      wealth_index == 3 ~ "middle",
      wealth_index == 4 ~ "richer",
      wealth_index == 5 ~ "richest",
      TRUE ~ NA_character_
    ),
    wealth_cat = factor(wealth_cat, levels = c("poorest","poorer","middle","richer","richest"))
  )

6 Nutrition outcomes by age group:

In this section, we examine how child malnutrition indicators vary across age groups (0–11 months, 12–23 months, etc.). Three key nutrition outcomes are considered: stunting (HAZ), underweight (WAZ), and wasting (WHZ).

6.1 Boxplots by age group.

ggplot(merged_datasets, aes(x = age_group, y = stunting, fill = age_group)) +
  geom_boxplot() + theme_minimal() +
  labs(title = "Stunting (HAZ) distribution by child age group", x="Age group (months)", y="HAZ")

## Warning: Removed 8567 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

ggplot(merged_datasets, aes(x = age_group, y = underweight, fill = age_group)) +
  geom_boxplot() + theme_minimal() +
  labs(title = "Underweight (WAZ) distribution by child age group", x="Age group (months)", y="WAZ")

## Warning: Removed 8443 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

ggplot(merged_datasets, aes(x = age_group, y = wasting, fill = age_group)) +
  geom_boxplot() + theme_minimal() +
  labs(title = "Wasting (WHZ) distribution by child age group", x="Age group (months)", y="WHZ")

## Warning: Removed 8557 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

Interpretation:

Stunting (HAZ) – Chronic Malnutrition: Stunting increases with age. Infants (0–11 months) are close to the healthy range, but from 12–35 months, median HAZ drops below -1.5. For older children (36–59 months), stunting remains low, showing that once it occurs, it persists.

Underweight (WAZ) – Combined Malnutrition: Underweight also worsens with age. Median WAZ is below 0 for all groups, and the lower quartiles are near -2, indicating that many children are underweight across all ages.

Wasting (WHZ) – Acute Malnutrition: Wasting is more stable, with median WHZ near 0. However, some children in all age groups fall below -2, showing ongoing risk of acute malnutrition, often due to short-term illness or food shortage.

6.2 line chart by age group:

make_line_chart <- function(data, var, var_label){
  line_data <- data %>%
    group_by(age_group) %>%
    summarise(mean_val = mean(.data[[var]], na.rm = TRUE),
              sd_val   = sd(.data[[var]], na.rm = TRUE),
              n        = n()) %>%
    mutate(se_val = sd_val / sqrt(n))
  
  ggplot(line_data, aes(x = age_group, y = mean_val, group = 1)) +
    geom_line(color = "blue", size = 1) +
    geom_point(size = 3, color = "red") +
    geom_errorbar(aes(ymin = mean_val - se_val, ymax = mean_val + se_val),
                  width = 0.1, color = "black") +
    theme_minimal() +
    labs(title = paste("Average", var_label, "Z-score by Child Age Group"),
         x = "Age group (months)",
         y = paste("Mean", var_label, "Z-score")) +
    geom_hline(yintercept = -2, linetype = "dashed", color = "darkred") +
    annotate("text", x = 2.5, y = -2.1,
             label = paste(var_label, "cutoff (Z = -2)"),
             color = "darkred", hjust = 0)
}

# Run charts
p_stunting_line    <- make_line_chart(merged_datasets, "stunting", "Stunting (HAZ)")

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

p_underweight_line <- make_line_chart(merged_datasets, "underweight", "Underweight (WAZ)")
p_wasting_line     <- make_line_chart(merged_datasets, "wasting", "Wasting (WHZ)")

print(p_stunting_line)

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

print(p_underweight_line)

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

print(p_wasting_line)

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Interpretation:

Stunting (HAZ): Stunting gets worse with age. The mean HAZ drops sharply from infancy and crosses the −2 cutoff around 36–47 months, meaning older children are on average chronically malnourished.

Underweight (WAZ): Underweight also declines with age but less severely. The mean WAZ stays below the healthy average (0) in all groups but does not cross the severe −2 cutoff.

Wasting (WHZ): Wasting remains fairly stable across age groups, with the mean WHZ close to 0 and well above the −2 cutoff. This shows that acute malnutrition is less of a population-wide problem and occurs more as short-term cases.

7 Nutritional Outcomes by Wealth Category (Heatmaps):

We used heatmaps to examine how child nutrition outcomes differ across wealth categories.
The plots show the percentage distribution of stunting, underweight, and wasting across different levels of household wealth.
This helps visualize socioeconomic disparities in child malnutrition.

make_single_heatmap <- function(data, var, var_label){
  df <- data %>%
    filter(!is.na(!!rlang::sym(var))) %>%     # drop NA values
    mutate(value100 = round(!!rlang::sym(var) * 100))
  
  # DHS-style binning (-600 to 600)
  breaks <- seq(-600, 600, by = 100)
  labels <- paste0(breaks[-length(breaks)], "-", (breaks[-1]-1))
  
  df <- df %>%
    mutate(value_bin = cut(value100, breaks = breaks, labels = labels,
                           include.lowest = TRUE, right = TRUE))
  
  heat_data <- df %>%
    group_by(wealth_cat, value_bin) %>%
    summarise(n = n(), .groups = "drop") %>%
    complete(wealth_cat, value_bin, fill = list(n = 0)) %>%
    group_by(wealth_cat) %>%
    mutate(total_in_row = sum(n),
           pct_within_wealth = ifelse(total_in_row==0, NA, 100*n/total_in_row)) %>%
    ungroup()
  
  ggplot(heat_data, aes(x=value_bin, y=wealth_cat, fill=pct_within_wealth)) +
    geom_tile(color="white", size=0.35) +
    geom_text(aes(label = ifelse(is.na(pct_within_wealth), "", sprintf("%.1f%%", pct_within_wealth))),
              color="black", size=3.1) +
    geom_text(aes(label = ifelse(is.na(pct_within_wealth), "", sprintf("%.1f%%", pct_within_wealth))),
              color="white", size=2.6) +
    scale_fill_viridis_c(name="% within wealth", option="viridis", na.value="white") +
    labs(title=var_label, x=paste(var_label,"(binned)"), y="Wealth") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle=45, hjust=1),
          panel.grid = element_blank()) +
    scale_y_discrete(limits = rev(levels(heat_data$wealth_cat)))
}

# Run heatmaps
p_stunting <- make_single_heatmap(merged_datasets, "stunting", "Stunting")
p_underweight <- make_single_heatmap(merged_datasets, "underweight", "Underweight")
p_wasting <- make_single_heatmap(merged_datasets, "wasting", "Wasting")

print(p_stunting)

print(p_underweight)

print(p_wasting)

Interpretation:

Stunting (HAZ) – Clear Wealth Gradient
Stunting shows the strongest wealth gradient. Children from poorer households are much more likely to have severe stunting, while richer households cluster closer to the healthy range.
Underweight (WAZ) – Strong Wealth Gradient
Underweight also declines as wealth increases. Poorer children are more concentrated in the severe underweight bins, while richer children are closer to normal values.
Wasting (WHZ) – Weak Wealth Gradient
Wasting shows the weakest wealth gradient. Most children in all wealth groups fall near normal values, though the poorest still face slightly higher risks of severe wasting.

8 Missingness Analysis:

We examined missing data across all variables. The table below highlights the variables with the highest percentage of missing values to identify potential data quality issues.

missing_counts <- sapply(merged_datasets, function(x) sum(is.na(x)))
missing_percent <- round(sapply(merged_datasets, function(x) mean(is.na(x))*100), 2)

missing_table <- data.frame(
  Variable = names(missing_counts),
  Missing_Count = missing_counts,
  Missing_Percent = missing_percent
)
missing_table[order(-missing_table$Missing_Percent), ]

##                                          Variable Missing_Count Missing_Percent
## know_where_to_go                 know_where_to_go         12708          100.00
## transport                               transport         12708          100.00
## time_to_water                       time_to_water         12457           98.02
## h7d                                           h7d          9930           78.14
## h7m                                           h7m          9930           78.14
## h7y                                           h7y          9930           78.14
## h8d                                           h8d          9896           77.87
## h8m                                           h8m          9896           77.87
## h8y                                           h8y          9896           77.87
## h5d                                           h5d          9604           75.57
## h5m                                           h5m          9604           75.57
## h5y                                           h5y          9604           75.57
## h6d                                           h6d          9574           75.34
## h6m                                           h6m          9574           75.34
## h6y                                           h6y          9574           75.34
## h3d                                           h3d          9272           72.96
## h3m                                           h3m          9272           72.96
## h3y                                           h3y          9272           72.96
## h4d                                           h4d          9233           72.66
## h4m                                           h4m          9233           72.66
## h4y                                           h4y          9233           72.66
## h2d                                           h2d          8961           70.51
## h2m                                           h2m          8961           70.51
## h2y                                           h2y          8961           70.51
## stunting                                 stunting          8567           67.41
## wasting                                   wasting          8557           67.34
## underweight                           underweight          8443           66.44
## hw1                                           hw1          8209           64.60
## age_group                               age_group          8209           64.60
## h5                                             h5          5557           43.73
## h7                                             h7          5557           43.73
## h4                                             h4          5556           43.72
## h6                                             h6          5556           43.72
## h8                                             h8          5556           43.72
## h2                                             h2          5555           43.71
## h3                                             h3          5555           43.71
## h9                                             h9          5555           43.71
## postnatal_care                     postnatal_care          4433           34.88
## child_index                           child_index           719            5.66
## distance_ti_health             distance_ti_health            12            0.09
## getting_permission_to_go getting_permission_to_go            10            0.08
## money_needed                         money_needed            10            0.08
## wait_to_go_alone                 wait_to_go_alone            10            0.08
## cooking_fuel                         cooking_fuel             4            0.03
## hw13                                         hw13             3            0.02
## cluster_number                     cluster_number             0            0.00
## household_number                 household_number             0            0.00
## mother_index                         mother_index             0            0.00
## midx                                         midx             0            0.00
## wealth_index                         wealth_index             0            0.00
## linked                                     linked             0            0.00
## water_source                         water_source             0            0.00
## electricity                           electricity             0            0.00
## toilet_facility                   toilet_facility             0            0.00
## wealth_cat                             wealth_cat             0            0.00

9 Reason for Messingness:

In this section we check reason for messingness.

merged_datasets <- merged_datasets %>%
  mutate(hw13_reason = case_when(
    hw13 == 0 ~ "Measured",
    hw13 == 1 ~ "Child died",
    hw13 == 2 ~ "Child sick",
    hw13 == 3 ~ "Not present",
    hw13 == 4 ~ "Refused",
    hw13 == 5 ~ "Mother refused",
    hw13 == 6 ~ "Other",
    hw13 == 7 ~ "No measurement in household",
    TRUE      ~ NA_character_
  ))

cat("\nReasons for missing stunting/underweight data (hw13):\n")

## 
## Reasons for missing stunting/underweight data (hw13):

print(table(merged_datasets$hw13_reason, useNA="ifany"))

## 
##                  Child died                    Measured 
##                         719                        4227 
## No measurement in household                 Not present 
##                        7490                         111 
##                       Other                     Refused 
##                          72                          86 
##                        <NA> 
##                           3

Interpretation:

Most missing cases are due to children not being measured, followed by child deaths, refusals, or absence.

10 Visual Missingness Check (naniar plots):

In this section different plot help to visualize missingness pattern.

# Missingness across variables
gg_miss_var(merged_datasets)

# Upset plot for overlapping missingness
gg_miss_upset(merged_datasets)

# Missingness by wealth group
gg_miss_var(merged_datasets, facet = "wealth_cat")

Interpretation:

Missingness Across Wealth Groups

The missingness in stunting, underweight, and wasting is roughly uniform across wealth categories. This is expected because:

Survey Design: PDHS/MICS uses standardized data collection protocols, so the likelihood of a measurement being missing (due to absence, refusal, sickness, or other reasons) is generally independent of household wealth.

Wealth Index Construction: The wealth index is internally defined within the survey sample, splitting households into quintiles relative to others in the same country. Therefore, “poorest” and “richest” are relative categories, not absolute wealth levels. Missingness patterns caused by logistical factors (like a child being absent or sick) affect all households similarly.

Field Practicalities: Measurement errors, refusals, or children not present at the time of survey occur randomly across households, making the missingness appear uniform across all wealth groups.

Conclusion: Uniform missingness across wealth groups does not indicate data fabrication; rather, it reflects the design and practical constraints of the survey.

11 Structural Variables and Correlations

To explore how household and healthcare-related factors cluster together, we calculated Spearman correlations among key structural variables (e.g., water source, electricity, sanitation, cooking fuel). Correlations were also examined separately by wealth groups to assess whether these relationships vary by socioeconomic status.

library(dplyr)
library(ggcorrplot)

## Warning: package 'ggcorrplot' was built under R version 4.5.1

# --- Select required variables including new ones ---
structural_variables <- merged_datasets %>%
  select(any_of(c("toilet_facility", "water_source", "postnatal_care", "cooking_fuel",
                  "distance_ti_health", "wait_to_go_alone", 
                  "getting_permission_to_go", "money_needed")))

# Convert all to numeric
struct_vars_num <- structural_variables %>% mutate(across(everything(), as.numeric))

# --- Overall correlation ---
corr_matrix <- cor(struct_vars_num, use = "pairwise.complete.obs", method = "spearman")
corr_matrix[is.na(corr_matrix)] <- 0  # replace NAs

# Plot overall correlation heatmap
ggcorrplot(corr_matrix, 
           lab = TRUE, lab_size = 4,
           hc.order = TRUE,
           type = "lower",
           colors = c("red", "white", "blue"),
           title = "Overall Correlation of Structural Variables",
           ggtheme = theme_minimal() +
             theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
                   axis.text.y = element_text(size = 10)))

# --- Correlation by wealth group ---
for (grp in levels(merged_datasets$wealth_cat)) {
  subset_data <- struct_vars_num[merged_datasets$wealth_cat == grp, ]
  
  # Correlation for this wealth group
  corr_matrix_grp <- cor(subset_data, use = "pairwise.complete.obs", method = "spearman")
  corr_matrix_grp[is.na(corr_matrix_grp)] <- 0  # replace NAs
  
  print(
    ggcorrplot(corr_matrix_grp,
               lab = TRUE, lab_size = 4,
               hc.order = TRUE,
               type = "lower",
               colors = c("red", "white", "blue"),
               title = paste("Correlation by Wealth Group:", grp),
               ggtheme = theme_minimal() +
                 theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
                       axis.text.y = element_text(size = 10)))
  )
}

Interpretation:

Overall correlation:

The overall heatmap shows that access barriers are strongly linked. For example, needing permission and money are closely related (0.64), and distance to a health facility is linked with time or needing to go alone (0.50–0.58). Structural factors like toilet, water, and cooking fuel are largely independent of these access barriers, with weak or negative correlations (-0.21 to -0.23).

Correlation by Wealth Group:

Poorest: Barriers are tightly clustered. Permission, money, and distance show strong correlations (0.53–0.62). Postnatal care use is slightly positively linked with overcoming money (0.2) and permission barriers (0.13).

Richest: Correlations among barriers are weaker, often near zero (e.g., Permission–Money: 0.32). Barriers are less consistently linked.

Poorer, Middle, Richer: These groups show moderate clustering among barriers (Permission–Distance: 0.60–0.67). Structural factors and postnatal care remain mostly unlinked (near zero or slightly negative).

Conclusion: Access barriers tend to cluster more in poorer households, while richer households experience barriers more independently. Structural household factors generally do not predict healthcare access.

12 Summary Barrier Index:

I created a summary barrier index by combining all reported healthcare access barriers, including getting permission to go, money needed, distance to the health facility, and not wanting to go alone. The index ranges from 0 to 4, where higher values indicate a greater number of barriers faced.

# --- Clean and create Summary Barrier Index ---
library(dplyr)

# Step 1: Check unique values 
sapply(merged_datasets[, c("getting_permission_to_go", "money_needed", 
                           "distance_ti_health", "transport", 
                           "wait_to_go_alone")], unique)

## $getting_permission_to_go
## <labelled<double>[3]>: getting medical help for self: getting permission to go
## [1]  2  1 NA
## 
## Labels:
##  value             label
##      0        no problem
##      1       big problem
##      2 not a big problem
## 
## $money_needed
## <labelled<double>[3]>: getting medical help for self: getting money needed for treatment
## [1]  2  1 NA
## 
## Labels:
##  value             label
##      0        no problem
##      1       big problem
##      2 not a big problem
## 
## $distance_ti_health
## <labelled<double>[3]>: getting medical help for self: distance to health facility
## [1]  2  1 NA
## 
## Labels:
##  value             label
##      0        no problem
##      1       big problem
##      2 not a big problem
## 
## $transport
## <labelled<double>[1]>: na - getting medical help for self: having to take transport
## [1] NA
## 
## Labels:
##  value             label
##      0        no problem
##      1       big problem
##      2 not a big problem
## 
## $wait_to_go_alone
## <labelled<double>[3]>: getting medical help for self: not wanting to go alone
## [1]  2  1 NA
## 
## Labels:
##  value             label
##      0        no problem
##      1       big problem
##      2 not a big problem

library(dplyr)

# Recode and create the summary barrier index
merged_datasets <- merged_datasets %>%
  mutate(across(c(getting_permission_to_go, money_needed, distance_ti_health,
                  transport, wait_to_go_alone),
                ~ case_when(
                  . == 1 ~ 1,  # Big problem → barrier present
                  . %in% c(0, 2) ~ 0,  # No or small problem → no barrier
                  TRUE ~ NA_real_  # Keep NA
                ))) %>%
  mutate(barrier_summary = rowSums(across(c(getting_permission_to_go, 
                                            money_needed, 
                                            distance_ti_health, 
                                            transport, 
                                            wait_to_go_alone)),
                                   na.rm = TRUE))

# View summary of the new variable
summary(merged_datasets$barrier_summary)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   2.000   1.833   3.000   4.000

table(merged_datasets$barrier_summary)

## 
##    0    1    2    3    4 
## 3379 2354 2700 1556 2719

# Average number of barriers by wealth group
merged_datasets %>%
  group_by(wealth_cat) %>%
  summarise(
    mean_barriers = mean(barrier_summary, na.rm = TRUE),
    sd_barriers   = sd(barrier_summary, na.rm = TRUE),
    n             = n()
  )

## # A tibble: 5 × 4
##   wealth_cat mean_barriers sd_barriers     n
##   <fct>              <dbl>       <dbl> <int>
## 1 poorest            2.40         1.45  2907
## 2 poorer             2.33         1.41  2905
## 3 middle             1.85         1.44  2510
## 4 richer             1.31         1.36  2222
## 5 richest            0.919        1.10  2164

ggplot(merged_datasets, aes(x = wealth_cat, y = barrier_summary, fill = wealth_cat)) +
  geom_boxplot() +
  theme_minimal() +
  labs(
    title = "Distribution of Access Barriers by Wealth Group",
    x = "Wealth Group",
    y = "Number of Barriers"
  ) +
  scale_fill_brewer(palette = "Blues")

Interpretation:

The mean number of healthcare access barriers decreases steadily with higher wealth status — from 2.40 in the poorest to 0.92 in the richest group. This shows that poorer households face more overlapping barriers to healthcare, while wealthier households experience fewer and more isolated ones.

13 Any Barrier Index:

I also created an any barrier index, which shows whether a woman experienced at least one barrier to healthcare access (1 = any barrier, 0 = no barrier). This provides a simpler, binary view of the overall access problems.

# --- Create "Any Barrier" Index ---
merged_datasets <- merged_datasets %>%
  mutate(
    any_barrier = ifelse(barrier_summary > 0, 1, 0)
  )

# View basic results
table(merged_datasets$any_barrier)

## 
##    0    1 
## 3379 9329

prop.table(table(merged_datasets$any_barrier))

## 
##         0         1 
## 0.2658955 0.7341045

# by wealth groups
merged_datasets %>%
  group_by(wealth_cat) %>%
  summarise(
    percent_with_any_barrier = mean(any_barrier, na.rm = TRUE) * 100,
    n = n()
  )

## # A tibble: 5 × 3
##   wealth_cat percent_with_any_barrier     n
##   <fct>                         <dbl> <int>
## 1 poorest                        84.3  2907
## 2 poorer                         84.6  2905
## 3 middle                         75.5  2510
## 4 richer                         61.7  2222
## 5 richest                        53.3  2164

library(ggplot2)

# Bar plot of % women with any barrier by wealth group
merged_datasets %>%
  group_by(wealth_cat) %>%
  summarise(percent_with_any_barrier = mean(any_barrier, na.rm = TRUE) * 100) %>%
  ggplot(aes(x = wealth_cat, y = percent_with_any_barrier, fill = wealth_cat)) +
  geom_col(width = 0.7) +
  geom_text(aes(label = sprintf("%.1f%%", percent_with_any_barrier)),
            vjust = -0.5, size = 4) +
  scale_fill_brewer(palette = "Blues") +
  theme_minimal() +
  labs(
    title = "Percentage of Women Facing Any Barrier to Healthcare Access",
    x = "Wealth Group",
    y = "Percent with Any Barrier"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    axis.text.x = element_text(size = 11),
    axis.text.y = element_text(size = 11)
  ) +
  ylim(0, 100)

Interpretation:

The share of households facing at least one healthcare access barrier declines sharply with wealth — from 84% among the poorest to 53% among the richest. This means that most poor households encounter some difficulty accessing healthcare, while barriers are less common among wealthier groups.

Chi-Square Analysis of Categorical Variables

14. Chi-Square Analysis:

In this section, I test the association between categorical variables such as child malnutrition indicators (stunted, underweight, wasted), household & Enviromental factors using the Chi-square test of independence.

14.1 Prepare categorical variables:

I convert nutritional outcomes into binary categories based on standard cutoffs (Z < -2 indicates malnutrition).

# --- Drop variables with >95% missing ---
vars_to_drop <- missing_table$Variable[missing_table$Missing_Percent > 95]
merged_datasets <- merged_datasets %>% select(-all_of(vars_to_drop))

cat("\nDropped variables due to excessive missingness (>95%):\n")

## 
## Dropped variables due to excessive missingness (>95%):

print(vars_to_drop)

## [1] "time_to_water"    "know_where_to_go" "transport"

 #binary
merged_datasets <- merged_datasets %>%
  mutate(
    stunted = ifelse(stunting < -2, 1, 0),
    underweight_bin = ifelse(underweight < -2, 1, 0),
    wasting_bin = ifelse(wasting < -2, 1, 0)
  )

    
# Count number of stunted children (1 = stunted)
table(merged_datasets$stunted)

## 
##    0    1 
## 2561 1580

# Count number of underweight children
table(merged_datasets$underweight_bin)

## 
##    0    1 
## 3281  984

# Count number of wasted children
table(merged_datasets$wasting_bin)

## 
##    0    1 
## 3817  334

14.2 CHI-SQUARE TESTS BETWEEN NUTRITION OUTCOMES AND ENVIRONMENTAL/HOUSEHOLD FACTORS:

  # --- Install and load required packages ---
    if (!requireNamespace("DescTools", quietly = TRUE)) {
      install.packages("DescTools")
    }
    library(DescTools)

## Warning: package 'DescTools' was built under R version 4.5.1

## 
## Attaching package: 'DescTools'

## The following objects are masked from 'package:psych':
## 
##     AUC, ICC, SD

    library(dplyr)
    
    
    dep_vars <- c("stunted", "underweight_bin", "wasting_bin")
    indep_vars <- c("wealth_cat", "wait_to_go_alone", "money_needed",
                    "getting_permission_to_go", "distance_ti_health",
                    "postnatal_care", "cooking_fuel", "toilet_facility", "water_source")
    
    # create empty results data frame
    chi_results <- data.frame()
    
    for (dv in dep_vars) {
      for (iv in indep_vars) {
        
        # check both variables exist
        if (!(dv %in% names(merged_datasets)) || !(iv %in% names(merged_datasets))) next
        
        # drop missing data
        df_sub <- merged_datasets %>% select(all_of(c(dv, iv))) %>% drop_na()
        
        # skip if empty
        if (nrow(df_sub) == 0) next
        
        # contingency table
        tbl <- table(df_sub[[dv]], df_sub[[iv]])
        if (sum(tbl) == 0) next
        
        # chi-square test
        chi <- suppressWarnings(chisq.test(tbl))
        cv <- suppressWarnings(CramerV(tbl))
        
        # store results
        chi_results <- rbind(chi_results, data.frame(
          Dependent = dv,
          Independent = iv,
          Chi_Square = round(chi$statistic, 3),
          df = chi$parameter,
          p_value = round(chi$p.value, 5),
          Cramers_V = round(cv, 3)
        ))
      }
    }
    
    # print results in console
    print(chi_results)

##                   Dependent              Independent Chi_Square df p_value
## X-squared           stunted               wealth_cat    282.733  4 0.00000
## X-squared1          stunted         wait_to_go_alone     22.849  1 0.00000
## X-squared2          stunted             money_needed     94.337  1 0.00000
## X-squared3          stunted getting_permission_to_go     31.053  1 0.00000
## X-squared4          stunted       distance_ti_health     45.459  1 0.00000
## X-squared5          stunted           postnatal_care     17.167  2 0.00019
## X-squared6          stunted             cooking_fuel    121.319 10 0.00000
## X-squared7          stunted          toilet_facility    190.003 12 0.00000
## X-squared8          stunted             water_source     88.448 15 0.00000
## X-squared9  underweight_bin               wealth_cat    285.524  4 0.00000
## X-squared10 underweight_bin         wait_to_go_alone     12.443  1 0.00042
## X-squared11 underweight_bin             money_needed     60.703  1 0.00000
## X-squared12 underweight_bin getting_permission_to_go     44.550  1 0.00000
## X-squared13 underweight_bin       distance_ti_health     20.993  1 0.00000
## X-squared14 underweight_bin           postnatal_care      5.647  2 0.05938
## X-squared15 underweight_bin             cooking_fuel    150.952 10 0.00000
## X-squared16 underweight_bin          toilet_facility    206.585 12 0.00000
## X-squared17 underweight_bin             water_source    110.240 15 0.00000
## X-squared18     wasting_bin               wealth_cat     32.504  4 0.00000
## X-squared19     wasting_bin         wait_to_go_alone      4.054  1 0.04407
## X-squared20     wasting_bin             money_needed     11.803  1 0.00059
## X-squared21     wasting_bin getting_permission_to_go     11.235  1 0.00080
## X-squared22     wasting_bin       distance_ti_health      8.481  1 0.00359
## X-squared23     wasting_bin           postnatal_care      7.237  2 0.02682
## X-squared24     wasting_bin             cooking_fuel     17.070 10 0.07283
## X-squared25     wasting_bin          toilet_facility     43.494 12 0.00002
## X-squared26     wasting_bin             water_source     73.901 15 0.00000
##             Cramers_V
## X-squared       0.261
## X-squared1      0.075
## X-squared2      0.152
## X-squared3      0.087
## X-squared4      0.105
## X-squared5      0.079
## X-squared6      0.171
## X-squared7      0.214
## X-squared8      0.146
## X-squared9      0.259
## X-squared10     0.055
## X-squared11     0.120
## X-squared12     0.103
## X-squared13     0.071
## X-squared14     0.045
## X-squared15     0.188
## X-squared16     0.220
## X-squared17     0.161
## X-squared18     0.088
## X-squared19     0.032
## X-squared20     0.054
## X-squared21     0.053
## X-squared22     0.046
## X-squared23     0.051
## X-squared24     0.064
## X-squared25     0.102
## X-squared26     0.133

Chi square Interpretation:

Chi-square tests examined whether child malnutrition (stunting, underweight, wasting) is associated with household and environmental factors. Household wealth showed the strongest association with stunting (V = 0.261) and underweight (V = 0.259), while toilet facility was also strongly linked to both outcomes (V = 0.214–0.220). Overall, 25 of 27 tests were statistically significant (p < 0.05), indicating most factors are related to nutritional status. Only underweight vs. postnatal care (p = 0.059) and wasting vs. cooking fuel (p = 0.073) were not significant. Associations with wasting were generally weaker (e.g., water source V = 0.133, toilet V = 0.102), suggesting different factors may drive acute malnutrition. In short, wealth and sanitation are the key determinants of chronic malnutrition, while acute malnutrition shows weaker associations.

Although Chi square show association between different variables but it is not trusworthy because of large sample size so we will more rely on cram V results.

Cram v Interpretation:

Cramér’s V values indicated that the strength of association between malnutrition indicators and socioeconomic or environmental factors ranged from very weak to moderate. For stunting and underweight, wealth category (V ≈ 0.26) and toilet facility (V ≈ 0.21–0.22) showed moderate associations, highlighting the importance of economic status and sanitation in child nutrition. Other variables such as money needed for treatment, distance to healthcare, cooking fuel, and water source demonstrated weak relationships. In contrast, all associations with wasting were weak or very weak (V ≤ 0.13), suggesting that acute malnutrition is influenced by additional short-term or biological factors. Overall, the findings indicate that while socioeconomic and environmental factors are related to child nutritional outcomes, their effects are generally moderate or weak, reflecting the multifactorial nature of malnutrition.

Missing values removal:

In this section we remove all missing values in all variables wether its dependent or independent and create a full clean data set which used further in logistic regression models.

# Household independent variable
household_vars <- c("wealth_cat")

# Structural independent variables
structural_vars <- c("toilet_facility", "water_source", "postnatal_care", 
                     "cooking_fuel", "distance_ti_health", "wait_to_go_alone", 
                     "getting_permission_to_go", "money_needed")

# Remove rows with missing household variable
household_clean <- merged_datasets %>%
  select(all_of(household_vars)) %>%
  drop_na()

# Remove rows with missing structural variables
structural_clean <- merged_datasets %>%
  select(all_of(structural_vars)) %>%
  drop_na()

# Check the dimensions
cat("Household variable cleaned data dimensions:", dim(household_clean), "\n")

## Household variable cleaned data dimensions: 12708 1

cat("Structural variables cleaned data dimensions:", dim(structural_clean), "\n")

## Structural variables cleaned data dimensions: 8265 8

# Inspect first few rows
head(household_clean)

## # A tibble: 6 × 1
##   wealth_cat
##   <fct>     
## 1 poorest   
## 2 poorer    
## 3 poorer    
## 4 poorest   
## 5 poorest   
## 6 poorer

head(structural_clean)

## # A tibble: 6 × 8
##   toilet_facility    water_source postnatal_care cooking_fuel distance_ti_health
##   <dbl+lbl>          <dbl+lbl>    <dbl+lbl>      <dbl+lbl>                 <dbl>
## 1 13 [flush to pit … 42 [unprote… 0 [no]         8 [wood]                      0
## 2 13 [flush to pit … 43 [river/d… 1 [yes]        8 [wood]                      1
## 3 13 [flush to pit … 43 [river/d… 0 [no]         8 [wood]                      1
## 4 13 [flush to pit … 42 [unprote… 0 [no]         8 [wood]                      1
## 5 13 [flush to pit … 41 [protect… 0 [no]         8 [wood]                      1
## 6 13 [flush to pit … 41 [protect… 0 [no]         8 [wood]                      1
## # ℹ 3 more variables: wait_to_go_alone <dbl>, getting_permission_to_go <dbl>,
## #   money_needed <dbl>

# Number of rows before removing NAs
n_before <- nrow(merged_datasets %>% select(all_of(structural_vars)))
cat("Number of rows with structural variables before removing NAs:", n_before, "\n")

## Number of rows with structural variables before removing NAs: 12708

# Number of rows after removing NAs
n_after <- nrow(structural_clean)
cat("Number of rows with structural variables after removing NAs:", n_after, "\n")

## Number of rows with structural variables after removing NAs: 8265

# Optional: how many rows were removed
cat("Number of rows removed due to missing structural variables:", n_before - n_after, "\n")

## Number of rows removed due to missing structural variables: 4443

# Remove rows with missing values in malnutrition outcomes
malnutrition_clean <- merged_datasets %>%
  drop_na(stunting, underweight, wasting)

# Number of rows before removing NAs
n_before <- nrow(merged_datasets)
cat("Number of rows before removing missing malnutrition outcomes:", n_before, "\n")

## Number of rows before removing missing malnutrition outcomes: 12708

# Number of rows after removing NAs
n_after <- nrow(malnutrition_clean)
cat("Number of rows after removing missing malnutrition outcomes:", n_after, "\n")

## Number of rows after removing missing malnutrition outcomes: 4098

# Number of rows removed
cat("Number of rows removed due to missing malnutrition outcomes:", n_before - n_after, "\n")

## Number of rows removed due to missing malnutrition outcomes: 8610

# Number of missing values in each malnutrition outcome
na_stunting <- sum(is.na(merged_datasets$stunting))
na_underweight <- sum(is.na(merged_datasets$underweight))
na_wasting <- sum(is.na(merged_datasets$wasting))

cat("Missing stunting values:", na_stunting, "\n")

## Missing stunting values: 8567

cat("Missing underweight values:", na_underweight, "\n")

## Missing underweight values: 8443

cat("Missing wasting values:", na_wasting, "\n")

## Missing wasting values: 8557

# Check missing values in the cleaned dataset
sum(is.na(malnutrition_clean$stunting))        # Should be 0

## [1] 0

sum(is.na(malnutrition_clean$underweight))     # Should be 0

## [1] 0

sum(is.na(malnutrition_clean$wasting))         # Should be 0

## [1] 0

# Remove rows with missing child age (hw1) in merged_datasets
merged_datasets <- merged_datasets %>%
  drop_na(hw1)
# Count missing values in hw1
sum(is.na(merged_datasets$hw1))

## [1] 0

## converting to binary variables;

merged_datasets <- merged_datasets %>%
  mutate(
    stunted = ifelse(stunting < -2, 1, 0),
    underweight_bin = ifelse(underweight < -2, 1, 0),
    wasting_bin = ifelse(wasting < -2, 1, 0)
  )


# Define all variables to keep
all_vars <- c("hw1", 
              "stunting", "underweight", "wasting", 
              "stunted", "underweight_bin", "wasting_bin",
              "wealth_cat",
              "toilet_facility", "water_source", "postnatal_care", 
              "cooking_fuel", "distance_ti_health", "wait_to_go_alone", 
              "getting_permission_to_go", "money_needed")

# Create a fully cleaned dataset with no missing values in any of the variables
full_clean <- merged_datasets %>%
  select(all_of(all_vars)) %>%
  drop_na()

# Check number of rows after cleaning
cat("Number of rows in fully cleaned dataset:", nrow(full_clean), "\n")

## Number of rows in fully cleaned dataset: 2709

# Check first few rows
head(full_clean)

## # A tibble: 6 × 16
##     hw1 stunting underweight wasting stunted underweight_bin wasting_bin
##   <dbl>    <dbl>       <dbl>   <dbl>   <dbl>           <dbl>       <dbl>
## 1    42    -3.48       -2.33   -0.41       1               1           0
## 2    26    -2.67       -2.43   -1.24       1               1           0
## 3    10     2.33       -1.35   -3.31       0               0           1
## 4    34    -3.04       -1.65    0.13       1               0           0
## 5    15    -0.71       -2.92   -3.6        0               1           1
## 6     4     3.77       -1.41   -4.61       0               0           1
## # ℹ 9 more variables: wealth_cat <fct>, toilet_facility <dbl+lbl>,
## #   water_source <dbl+lbl>, postnatal_care <dbl+lbl>, cooking_fuel <dbl+lbl>,
## #   distance_ti_health <dbl>, wait_to_go_alone <dbl>,
## #   getting_permission_to_go <dbl>, money_needed <dbl>

# Optional: summary of cleaned dataset
summary(full_clean)

##       hw1          stunting      underweight        wasting       
##  Min.   : 0.0   Min.   :-5.99   Min.   :-5.900   Min.   :-4.9400  
##  1st Qu.: 9.0   1st Qu.:-2.50   1st Qu.:-1.810   1st Qu.:-1.0500  
##  Median :20.0   Median :-1.39   Median :-0.990   Median :-0.2300  
##  Mean   :22.8   Mean   :-1.41   Mean   :-1.024   Mean   :-0.2851  
##  3rd Qu.:35.0   3rd Qu.:-0.32   3rd Qu.:-0.150   3rd Qu.: 0.5700  
##  Max.   :59.0   Max.   : 5.81   Max.   : 3.900   Max.   : 4.9900  
##     stunted       underweight_bin   wasting_bin        wealth_cat 
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   poorest:555  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   poorer :665  
##  Median :0.0000   Median :0.0000   Median :0.00000   middle :506  
##  Mean   :0.3536   Mean   :0.2097   Mean   :0.09413   richer :481  
##  3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   richest:502  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000                
##  toilet_facility  water_source   postnatal_care   cooking_fuel   
##  Min.   :11.00   Min.   :11.00   Min.   :0.000   Min.   : 1.000  
##  1st Qu.:12.00   1st Qu.:12.00   1st Qu.:0.000   1st Qu.: 3.000  
##  Median :13.00   Median :21.00   Median :0.000   Median : 8.000  
##  Mean   :15.97   Mean   :23.45   Mean   :0.285   Mean   : 5.779  
##  3rd Qu.:13.00   3rd Qu.:21.00   3rd Qu.:1.000   3rd Qu.: 8.000  
##  Max.   :96.00   Max.   :96.00   Max.   :8.000   Max.   :11.000  
##  distance_ti_health wait_to_go_alone getting_permission_to_go  money_needed   
##  Min.   :0.0000     Min.   :0.0000   Min.   :0.0000           Min.   :0.0000  
##  1st Qu.:0.0000     1st Qu.:0.0000   1st Qu.:0.0000           1st Qu.:0.0000  
##  Median :1.0000     Median :1.0000   Median :0.0000           Median :0.0000  
##  Mean   :0.5109     Mean   :0.6704   Mean   :0.2794           Mean   :0.3736  
##  3rd Qu.:1.0000     3rd Qu.:1.0000   3rd Qu.:1.0000           3rd Qu.:1.0000  
##  Max.   :1.0000     Max.   :1.0000   Max.   :1.0000           Max.   :1.0000

Logistic regression:

Household only:

## household factors :

# Load required packages
library(dplyr)
library(broom)       # for tidy model output
library(margins)     # for marginal effects

## Warning: package 'margins' was built under R version 4.5.1

library(pscl)        # for pseudo R-squared

## Warning: package 'pscl' was built under R version 4.5.1

## Classes and Methods for R originally developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.

# 1. Logistic regression models WITHOUT child age (hw1)
model_stunted_wealth <- glm(stunted ~ wealth_cat, data = full_clean, family = binomial)
model_underweight_wealth <- glm(underweight_bin ~ wealth_cat, data = full_clean, family = binomial)
model_wasting_wealth <- glm(wasting_bin ~ wealth_cat, data = full_clean, family = binomial)

# 2. Logistic regression models WITH child age (hw1)
model_stunted_wealth_age <- glm(stunted ~ wealth_cat + hw1, data = full_clean, family = binomial)
model_underweight_wealth_age <- glm(underweight_bin ~ wealth_cat + hw1, data = full_clean, family = binomial)
model_wasting_wealth_age <- glm(wasting_bin ~ wealth_cat + hw1, data = full_clean, family = binomial)

# 3. Model summaries
summary(model_stunted_wealth)

## 
## Call:
## glm(formula = stunted ~ wealth_cat, family = binomial, data = full_clean)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         0.1698     0.0852   1.993   0.0463 *  
## wealth_catpoorer   -0.6702     0.1169  -5.734 9.78e-09 ***
## wealth_catmiddle   -0.8600     0.1271  -6.768 1.30e-11 ***
## wealth_catricher   -1.1421     0.1330  -8.584  < 2e-16 ***
## wealth_catrichest  -1.4997     0.1389 -10.795  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3519.9  on 2708  degrees of freedom
## Residual deviance: 3371.7  on 2704  degrees of freedom
## AIC: 3381.7
## 
## Number of Fisher Scoring iterations: 4

summary(model_stunted_wealth_age)

## 
## Call:
## glm(formula = stunted ~ wealth_cat + hw1, family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -0.481891   0.104488  -4.612 3.99e-06 ***
## wealth_catpoorer  -0.724487   0.120273  -6.024 1.70e-09 ***
## wealth_catmiddle  -0.888319   0.130440  -6.810 9.75e-12 ***
## wealth_catricher  -1.233227   0.137104  -8.995  < 2e-16 ***
## wealth_catrichest -1.565836   0.142601 -10.981  < 2e-16 ***
## hw1                0.029355   0.002644  11.103  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3519.9  on 2708  degrees of freedom
## Residual deviance: 3243.4  on 2703  degrees of freedom
## AIC: 3255.4
## 
## Number of Fisher Scoring iterations: 4

summary(model_underweight_wealth)

## 
## Call:
## glm(formula = underweight_bin ~ wealth_cat, family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -0.55820    0.08822  -6.327 2.50e-10 ***
## wealth_catpoorer  -0.55046    0.12587  -4.373 1.22e-05 ***
## wealth_catmiddle  -1.04178    0.14807  -7.036 1.98e-12 ***
## wealth_catricher  -1.11495    0.15300  -7.287 3.17e-13 ***
## wealth_catrichest -1.88849    0.18694 -10.102  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2782.3  on 2708  degrees of freedom
## Residual deviance: 2629.9  on 2704  degrees of freedom
## AIC: 2639.9
## 
## Number of Fisher Scoring iterations: 5

summary(model_underweight_wealth_age)

## 
## Call:
## glm(formula = underweight_bin ~ wealth_cat + hw1, family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -0.899767   0.113112  -7.955 1.80e-15 ***
## wealth_catpoorer  -0.566187   0.126735  -4.467 7.91e-06 ***
## wealth_catmiddle  -1.044240   0.148829  -7.016 2.28e-12 ***
## wealth_catricher  -1.141753   0.153984  -7.415 1.22e-13 ***
## wealth_catrichest -1.900188   0.187634 -10.127  < 2e-16 ***
## hw1                0.014760   0.002968   4.972 6.62e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2782.3  on 2708  degrees of freedom
## Residual deviance: 2605.3  on 2703  degrees of freedom
## AIC: 2617.3
## 
## Number of Fisher Scoring iterations: 5

summary(model_wasting_wealth)

## 
## Call:
## glm(formula = wasting_bin ~ wealth_cat, family = binomial, data = full_clean)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.95210    0.12865 -15.174  < 2e-16 ***
## wealth_catpoorer   0.05844    0.17255   0.339  0.73487    
## wealth_catmiddle  -0.47636    0.20758  -2.295  0.02174 *  
## wealth_catricher  -0.62410    0.21954  -2.843  0.00447 ** 
## wealth_catrichest -1.03945    0.24557  -4.233 2.31e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1690.4  on 2708  degrees of freedom
## Residual deviance: 1655.9  on 2704  degrees of freedom
## AIC: 1665.9
## 
## Number of Fisher Scoring iterations: 5

summary(model_wasting_wealth_age)

## 
## Call:
## glm(formula = wasting_bin ~ wealth_cat + hw1, family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.621652   0.155227 -10.447  < 2e-16 ***
## wealth_catpoorer   0.067392   0.173125   0.389 0.697077    
## wealth_catmiddle  -0.486002   0.208105  -2.335 0.019524 *  
## wealth_catricher  -0.614165   0.220071  -2.791 0.005258 ** 
## wealth_catrichest -1.048078   0.246018  -4.260 2.04e-05 ***
## hw1               -0.015612   0.004374  -3.570 0.000357 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1690.4  on 2708  degrees of freedom
## Residual deviance: 1642.5  on 2703  degrees of freedom
## AIC: 1654.5
## 
## Number of Fisher Scoring iterations: 5

# 4. Pseudo R-squared
pR2(model_stunted_wealth)

## fitting null model for pseudo-r2

##           llh       llhNull            G2      McFadden          r2ML 
## -1.685863e+03 -1.759952e+03  1.481777e+02  4.209709e-02  5.322926e-02 
##          r2CU 
##  7.318875e-02

pR2(model_stunted_wealth_age)

## fitting null model for pseudo-r2

##           llh       llhNull            G2      McFadden          r2ML 
## -1.621718e+03 -1.759952e+03  2.764672e+02  7.854397e-02  9.702019e-02 
##          r2CU 
##  1.334001e-01

pR2(model_underweight_wealth)

## fitting null model for pseudo-r2

##           llh       llhNull            G2      McFadden          r2ML 
## -1.314954e+03 -1.391129e+03  1.523484e+02  5.475711e-02  5.468574e-02 
##          r2CU 
##  8.518885e-02

pR2(model_underweight_wealth_age)

## fitting null model for pseudo-r2

##           llh       llhNull            G2      McFadden          r2ML 
## -1.302641e+03 -1.391129e+03  1.769751e+02  6.360845e-02  6.324037e-02 
##          r2CU 
##  9.851515e-02

pR2(model_wasting_wealth)

## fitting null model for pseudo-r2

##           llh       llhNull            G2      McFadden          r2ML 
## -827.93319436 -845.18615278   34.50591683    0.02041321    0.01265673 
##          r2CU 
##    0.02726602

pR2(model_wasting_wealth_age)

## fitting null model for pseudo-r2

##           llh       llhNull            G2      McFadden          r2ML 
## -821.24752102 -845.18615278   47.87726352    0.02832350    0.01751815 
##          r2CU 
##    0.03773882

# 5. AIC/BIC for model comparison
AIC(model_stunted_wealth, model_stunted_wealth_age)

##                          df      AIC
## model_stunted_wealth      5 3381.726
## model_stunted_wealth_age  6 3255.437

BIC(model_stunted_wealth, model_stunted_wealth_age)

##                          df      BIC
## model_stunted_wealth      5 3411.248
## model_stunted_wealth_age  6 3290.863

AIC(model_underweight_wealth, model_underweight_wealth_age)

##                              df      AIC
## model_underweight_wealth      5 2639.909
## model_underweight_wealth_age  6 2617.282

BIC(model_underweight_wealth, model_underweight_wealth_age)

##                              df      BIC
## model_underweight_wealth      5 2669.430
## model_underweight_wealth_age  6 2652.708

AIC(model_wasting_wealth, model_wasting_wealth_age)

##                          df      AIC
## model_wasting_wealth      5 1665.866
## model_wasting_wealth_age  6 1654.495

BIC(model_wasting_wealth, model_wasting_wealth_age)

##                          df      BIC
## model_wasting_wealth      5 1695.388
## model_wasting_wealth_age  6 1689.921

# 6. Marginal effects
margins(model_stunted_wealth)

## Average marginal effects

## glm(formula = stunted ~ wealth_cat, family = binomial, data = full_clean)

##  wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
##           -0.1649          -0.2084          -0.2679           -0.3332

margins(model_stunted_wealth_age)

## Average marginal effects

## glm(formula = stunted ~ wealth_cat + hw1, family = binomial,     data = full_clean)

##       hw1 wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
##  0.006038          -0.1689          -0.2045          -0.2738           -0.3318

margins(model_underweight_wealth)

## Average marginal effects

## glm(formula = underweight_bin ~ wealth_cat, family = binomial,     data = full_clean)

##  wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
##           -0.1158           -0.196           -0.206           -0.2843

margins(model_underweight_wealth_age)

## Average marginal effects

## glm(formula = underweight_bin ~ wealth_cat + hw1, family = binomial,     data = full_clean)

##       hw1 wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
##  0.002287          -0.1176          -0.1949          -0.2081           -0.2842

margins(model_wasting_wealth)

## Average marginal effects

## glm(formula = wasting_bin ~ wealth_cat, family = binomial, data = full_clean)

##  wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
##          0.006503          -0.0433         -0.05364          -0.07652

margins(model_wasting_wealth_age)

## Average marginal effects

## glm(formula = wasting_bin ~ wealth_cat + hw1, family = binomial,     data = full_clean)

##        hw1 wealth_catpoorer wealth_catmiddle wealth_catricher wealth_catrichest
##  -0.001308         0.007463         -0.04375         -0.05269          -0.07657

# 7. Create tidy tables with coefficients and 95% confidence intervals
tidy(model_stunted_wealth, conf.int = TRUE)

## # A tibble: 5 × 7
##   term              estimate std.error statistic  p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)          0.170    0.0852      1.99 4.63e- 2  0.00308     0.337
## 2 wealth_catpoorer    -0.670    0.117      -5.73 9.78e- 9 -0.900      -0.442
## 3 wealth_catmiddle    -0.860    0.127      -6.77 1.30e-11 -1.11       -0.612
## 4 wealth_catricher    -1.14     0.133      -8.58 9.16e-18 -1.40       -0.883
## 5 wealth_catrichest   -1.50     0.139     -10.8  3.64e-27 -1.78       -1.23

tidy(model_stunted_wealth_age, conf.int = TRUE)

## # A tibble: 6 × 7
##   term              estimate std.error statistic  p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)        -0.482    0.104       -4.61 3.99e- 6  -0.687    -0.277 
## 2 wealth_catpoorer   -0.724    0.120       -6.02 1.70e- 9  -0.961    -0.489 
## 3 wealth_catmiddle   -0.888    0.130       -6.81 9.75e-12  -1.15     -0.634 
## 4 wealth_catricher   -1.23     0.137       -8.99 2.37e-19  -1.50     -0.966 
## 5 wealth_catrichest  -1.57     0.143      -11.0  4.74e-28  -1.85     -1.29  
## 6 hw1                 0.0294   0.00264     11.1  1.21e-28   0.0242    0.0346

tidy(model_underweight_wealth, conf.int = TRUE)

## # A tibble: 5 × 7
##   term              estimate std.error statistic  p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)         -0.558    0.0882     -6.33 2.50e-10   -0.733    -0.387
## 2 wealth_catpoorer    -0.550    0.126      -4.37 1.22e- 5   -0.798    -0.304
## 3 wealth_catmiddle    -1.04     0.148      -7.04 1.98e-12   -1.34     -0.755
## 4 wealth_catricher    -1.11     0.153      -7.29 3.17e-13   -1.42     -0.819
## 5 wealth_catrichest   -1.89     0.187     -10.1  5.42e-24   -2.27     -1.53

tidy(model_underweight_wealth_age, conf.int = TRUE)

## # A tibble: 6 × 7
##   term              estimate std.error statistic  p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)        -0.900    0.113       -7.95 1.80e-15 -1.12      -0.680 
## 2 wealth_catpoorer   -0.566    0.127       -4.47 7.91e- 6 -0.815     -0.318 
## 3 wealth_catmiddle   -1.04     0.149       -7.02 2.28e-12 -1.34      -0.756 
## 4 wealth_catricher   -1.14     0.154       -7.41 1.22e-13 -1.45      -0.844 
## 5 wealth_catrichest  -1.90     0.188      -10.1  4.19e-24 -2.28      -1.54  
## 6 hw1                 0.0148   0.00297      4.97 6.62e- 7  0.00894    0.0206

tidy(model_wasting_wealth, conf.int = TRUE)

## # A tibble: 5 × 7
##   term              estimate std.error statistic  p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)        -1.95       0.129   -15.2   5.26e-52   -2.21    -1.71  
## 2 wealth_catpoorer    0.0584     0.173     0.339 7.35e- 1   -0.279    0.399 
## 3 wealth_catmiddle   -0.476      0.208    -2.29  2.17e- 2   -0.890   -0.0742
## 4 wealth_catricher   -0.624      0.220    -2.84  4.47e- 3   -1.06    -0.201 
## 5 wealth_catrichest  -1.04       0.246    -4.23  2.31e- 5   -1.54    -0.572

tidy(model_wasting_wealth_age, conf.int = TRUE)

## # A tibble: 6 × 7
##   term              estimate std.error statistic  p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)        -1.62     0.155     -10.4   1.51e-25  -1.93    -1.32   
## 2 wealth_catpoorer    0.0674   0.173       0.389 6.97e- 1  -0.271    0.409  
## 3 wealth_catmiddle   -0.486    0.208      -2.34  1.95e- 2  -0.901   -0.0828 
## 4 wealth_catricher   -0.614    0.220      -2.79  5.26e- 3  -1.06    -0.190  
## 5 wealth_catrichest  -1.05     0.246      -4.26  2.04e- 5  -1.55    -0.580  
## 6 hw1                -0.0156   0.00437    -3.57  3.57e- 4  -0.0243  -0.00716

structural only:

## structural only:

# List of structural factors
struct_vars <- c("toilet_facility", "water_source", "postnatal_care", 
                 "cooking_fuel", "distance_ti_health", "wait_to_go_alone", 
                 "getting_permission_to_go", "money_needed")

# Function to create formulas
create_formula <- function(outcome, include_age = FALSE) {
  if(include_age) {
    as.formula(paste(outcome, "~", paste(struct_vars, collapse = " + "), "+ hw1"))
  } else {
    as.formula(paste(outcome, "~", paste(struct_vars, collapse = " + ")))
  }
}

# Outcomes
outcomes <- c("stunted", "underweight_bin", "wasting_bin")

# Run models
models <- list()
for(outcome in outcomes){
  models[[paste0(outcome, "_struct")]] <- glm(create_formula(outcome, FALSE), 
                                              family = binomial, data = full_clean)
  models[[paste0(outcome, "_struct_age")]] <- glm(create_formula(outcome, TRUE), 
                                                  family = binomial, data = full_clean)
}

# Check pseudo R² to see model improvement
library(pscl)
lapply(models, pR2)

## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2

## $stunted_struct
##           llh       llhNull            G2      McFadden          r2ML 
## -1.699649e+03 -1.759952e+03  1.206064e+02  3.426413e-02  4.354415e-02 
##          r2CU 
##  5.987200e-02 
## 
## $stunted_struct_age
##           llh       llhNull            G2      McFadden          r2ML 
## -1.637142e+03 -1.759952e+03  2.456200e+02  6.978032e-02  8.667926e-02 
##          r2CU 
##  1.191816e-01 
## 
## $underweight_bin_struct
##           llh       llhNull            G2      McFadden          r2ML 
## -1.346337e+03 -1.391129e+03  8.958356e+01  3.219816e-02  3.252807e-02 
##          r2CU 
##  5.067187e-02 
## 
## $underweight_bin_struct_age
##           llh       llhNull            G2      McFadden          r2ML 
## -1.334007e+03 -1.391129e+03  1.142427e+02  4.106115e-02  4.129468e-02 
##          r2CU 
##  6.432840e-02 
## 
## $wasting_bin_struct
##           llh       llhNull            G2      McFadden          r2ML 
## -831.10604103 -845.18615278   28.16022350    0.01665918    0.01034122 
##          r2CU 
##    0.02227778 
## 
## $wasting_bin_struct_age
##           llh       llhNull            G2      McFadden          r2ML 
## -825.27958923 -845.18615278   39.81312710    0.02355287    0.01458915 
##          r2CU 
##    0.03142896

# Compare AIC/BIC
AIC(models$stunted_struct, models$stunted_struct_age)

##                           df      AIC
## models$stunted_struct      9 3417.297
## models$stunted_struct_age 10 3294.284

BIC(models$stunted_struct, models$stunted_struct_age)

##                           df      BIC
## models$stunted_struct      9 3470.436
## models$stunted_struct_age 10 3353.327

AIC(models$underweight_bin_struct, models$underweight_bin_struct_age)

##                                   df      AIC
## models$underweight_bin_struct      9 2710.674
## models$underweight_bin_struct_age 10 2688.014

BIC(models$underweight_bin_struct, models$underweight_bin_struct_age)

##                                   df      BIC
## models$underweight_bin_struct      9 2763.813
## models$underweight_bin_struct_age 10 2747.058

AIC(models$wasting_bin_struct, models$wasting_bin_struct_age)

##                               df      AIC
## models$wasting_bin_struct      9 1680.212
## models$wasting_bin_struct_age 10 1670.559

BIC(models$wasting_bin_struct, models$wasting_bin_struct_age)

##                               df      BIC
## models$wasting_bin_struct      9 1733.351
## models$wasting_bin_struct_age 10 1729.603

# Check coefficients and 95% CI
library(broom)
lapply(models, tidy, conf.int = TRUE)

## $stunted_struct
## # A tibble: 9 × 7
##   term                  estimate std.error statistic  p.value conf.low conf.high
##   <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)           -1.47      0.138     -10.7   1.67e-26  -1.75   -1.20    
## 2 toilet_facility        0.0299    0.00514     5.82  5.90e- 9   0.0200  0.0402  
## 3 water_source          -0.00479   0.00295    -1.62  1.04e- 1  -0.0106  0.000930
## 4 postnatal_care        -0.0193    0.0741     -0.260 7.95e- 1  -0.167   0.125   
## 5 cooking_fuel           0.0519    0.0172      3.02  2.53e- 3   0.0182  0.0856  
## 6 distance_ti_health    -0.0949    0.114      -0.836 4.03e- 1  -0.318   0.128   
## 7 wait_to_go_alone       0.129     0.109       1.18  2.38e- 1  -0.0852  0.343   
## 8 getting_permission_t… -0.117     0.123      -0.945 3.45e- 1  -0.359   0.125   
## 9 money_needed           0.491     0.118       4.17  3.10e- 5   0.260   0.722   
## 
## $stunted_struct_age
## # A tibble: 10 × 7
##    term                 estimate std.error statistic  p.value conf.low conf.high
##    <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
##  1 (Intercept)          -2.21      0.159     -13.8   1.40e-43 -2.52    -1.90    
##  2 toilet_facility       0.0291    0.00529     5.51  3.68e- 8  0.0190   0.0397  
##  3 water_source         -0.00527   0.00302    -1.75  8.08e- 2 -0.0112   0.000585
##  4 postnatal_care       -0.0169    0.0754     -0.224 8.23e- 1 -0.166    0.131   
##  5 cooking_fuel          0.0543    0.0176      3.08  2.07e- 3  0.0197   0.0888  
##  6 distance_ti_health   -0.110     0.116      -0.945 3.45e- 1 -0.337    0.118   
##  7 wait_to_go_alone      0.210     0.112       1.88  6.05e- 2 -0.00923  0.430   
##  8 getting_permission_… -0.133     0.126      -1.05  2.92e- 1 -0.380    0.114   
##  9 money_needed          0.527     0.120       4.37  1.23e- 5  0.291    0.763   
## 10 hw1                   0.0289    0.00263    11.0   4.98e-28  0.0237   0.0340  
## 
## $underweight_bin_struct
## # A tibble: 9 × 7
##   term                  estimate std.error statistic  p.value conf.low conf.high
##   <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)           -2.39      0.168     -14.2   1.40e-45 -2.72     -2.06   
## 2 toilet_facility        0.0149    0.00497     3.00  2.69e- 3  0.00508   0.0247 
## 3 water_source          -0.00210   0.00349    -0.601 5.48e- 1 -0.00908   0.00463
## 4 postnatal_care         0.0456    0.0809      0.564 5.73e- 1 -0.120     0.201  
## 5 cooking_fuel           0.107     0.0203      5.27  1.37e- 7  0.0673    0.147  
## 6 distance_ti_health    -0.145     0.134      -1.09  2.78e- 1 -0.407     0.117  
## 7 wait_to_go_alone       0.139     0.131       1.06  2.87e- 1 -0.117     0.396  
## 8 getting_permission_t…  0.279     0.140       2.00  4.59e- 2  0.00566   0.554  
## 9 money_needed           0.225     0.137       1.65  9.98e- 2 -0.0443    0.492  
## 
## $underweight_bin_struct_age
## # A tibble: 10 × 7
##    term                 estimate std.error statistic  p.value conf.low conf.high
##    <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
##  1 (Intercept)          -2.75      0.186     -14.8   2.89e-49 -3.12     -2.39   
##  2 toilet_facility       0.0140    0.00499     2.80  5.15e- 3  0.00408   0.0237 
##  3 water_source         -0.00220   0.00351    -0.627 5.30e- 1 -0.00921   0.00456
##  4 postnatal_care        0.0451    0.0804      0.561 5.75e- 1 -0.118     0.200  
##  5 cooking_fuel          0.108     0.0204      5.29  1.22e- 7  0.0680    0.148  
##  6 distance_ti_health   -0.150     0.134      -1.12  2.63e- 1 -0.412     0.113  
##  7 wait_to_go_alone      0.177     0.132       1.35  1.78e- 1 -0.0804    0.436  
##  8 getting_permission_…  0.276     0.140       1.97  4.90e- 2  0.00177   0.552  
##  9 money_needed          0.233     0.137       1.70  8.92e- 2 -0.0370    0.501  
## 10 hw1                   0.0146    0.00294     4.98  6.40e- 7  0.00888   0.0204 
## 
## $wasting_bin_struct
## # A tibble: 9 × 7
##   term                  estimate std.error statistic  p.value conf.low conf.high
##   <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)           -2.96      0.235     -12.6   2.25e-36 -3.43     -2.51   
## 2 toilet_facility       -0.00917   0.00830    -1.11  2.69e- 1 -0.0264    0.00616
## 3 water_source           0.00583   0.00445     1.31  1.90e- 1 -0.00313   0.0143 
## 4 postnatal_care         0.0627    0.112       0.557 5.77e- 1 -0.181     0.264  
## 5 cooking_fuel           0.0452    0.0279      1.62  1.05e- 1 -0.00924   0.100  
## 6 distance_ti_health     0.180     0.185       0.973 3.30e- 1 -0.180     0.546  
## 7 wait_to_go_alone       0.235     0.188       1.25  2.12e- 1 -0.132     0.606  
## 8 getting_permission_t…  0.203     0.190       1.07  2.84e- 1 -0.167     0.577  
## 9 money_needed           0.197     0.189       1.04  2.98e- 1 -0.177     0.564  
## 
## $wasting_bin_struct_age
## # A tibble: 10 × 7
##    term                 estimate std.error statistic  p.value conf.low conf.high
##    <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
##  1 (Intercept)          -2.65      0.251     -10.6   4.77e-26 -3.15     -2.17   
##  2 toilet_facility      -0.00815   0.00835    -0.977 3.29e- 1 -0.0254    0.00727
##  3 water_source          0.00597   0.00446     1.34  1.80e- 1 -0.00300   0.0145 
##  4 postnatal_care        0.0626    0.116       0.541 5.89e- 1 -0.187     0.270  
##  5 cooking_fuel          0.0455    0.0280      1.63  1.04e- 1 -0.00912   0.101  
##  6 distance_ti_health    0.182     0.186       0.975 3.30e- 1 -0.181     0.550  
##  7 wait_to_go_alone      0.201     0.189       1.06  2.87e- 1 -0.167     0.573  
##  8 getting_permission_…  0.210     0.190       1.10  2.70e- 1 -0.161     0.586  
##  9 money_needed          0.193     0.190       1.01  3.11e- 1 -0.183     0.562  
## 10 hw1                  -0.0146    0.00438    -3.34  8.46e- 4 -0.0233   -0.00615

Odd ratios of both Household and Structural models:

# 1. Function to create tidy table with odds ratios
tidy_or <- function(model) {
  tidy(model, conf.int = TRUE) %>%
    dplyr::mutate(
      odds_ratio = exp(estimate),
      conf.low.or = exp(conf.low),
      conf.high.or = exp(conf.high)
    )
}

# 2. Apply to your models
tidy_or(model_stunted_wealth)

## # A tibble: 5 × 10
##   term       estimate std.error statistic  p.value conf.low conf.high odds_ratio
##   <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
## 1 (Intercep…    0.170    0.0852      1.99 4.63e- 2  0.00308     0.337      1.19 
## 2 wealth_ca…   -0.670    0.117      -5.73 9.78e- 9 -0.900      -0.442      0.512
## 3 wealth_ca…   -0.860    0.127      -6.77 1.30e-11 -1.11       -0.612      0.423
## 4 wealth_ca…   -1.14     0.133      -8.58 9.16e-18 -1.40       -0.883      0.319
## 5 wealth_ca…   -1.50     0.139     -10.8  3.64e-27 -1.78       -1.23       0.223
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>

tidy_or(model_stunted_wealth_age)

## # A tibble: 6 × 10
##   term       estimate std.error statistic  p.value conf.low conf.high odds_ratio
##   <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
## 1 (Intercep…  -0.482    0.104       -4.61 3.99e- 6  -0.687    -0.277       0.618
## 2 wealth_ca…  -0.724    0.120       -6.02 1.70e- 9  -0.961    -0.489       0.485
## 3 wealth_ca…  -0.888    0.130       -6.81 9.75e-12  -1.15     -0.634       0.411
## 4 wealth_ca…  -1.23     0.137       -8.99 2.37e-19  -1.50     -0.966       0.291
## 5 wealth_ca…  -1.57     0.143      -11.0  4.74e-28  -1.85     -1.29        0.209
## 6 hw1          0.0294   0.00264     11.1  1.21e-28   0.0242    0.0346      1.03 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>

tidy_or(model_underweight_wealth)

## # A tibble: 5 × 10
##   term       estimate std.error statistic  p.value conf.low conf.high odds_ratio
##   <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
## 1 (Intercep…   -0.558    0.0882     -6.33 2.50e-10   -0.733    -0.387      0.572
## 2 wealth_ca…   -0.550    0.126      -4.37 1.22e- 5   -0.798    -0.304      0.577
## 3 wealth_ca…   -1.04     0.148      -7.04 1.98e-12   -1.34     -0.755      0.353
## 4 wealth_ca…   -1.11     0.153      -7.29 3.17e-13   -1.42     -0.819      0.328
## 5 wealth_ca…   -1.89     0.187     -10.1  5.42e-24   -2.27     -1.53       0.151
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>

tidy_or(model_underweight_wealth_age)

## # A tibble: 6 × 10
##   term       estimate std.error statistic  p.value conf.low conf.high odds_ratio
##   <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
## 1 (Intercep…  -0.900    0.113       -7.95 1.80e-15 -1.12      -0.680       0.407
## 2 wealth_ca…  -0.566    0.127       -4.47 7.91e- 6 -0.815     -0.318       0.568
## 3 wealth_ca…  -1.04     0.149       -7.02 2.28e-12 -1.34      -0.756       0.352
## 4 wealth_ca…  -1.14     0.154       -7.41 1.22e-13 -1.45      -0.844       0.319
## 5 wealth_ca…  -1.90     0.188      -10.1  4.19e-24 -2.28      -1.54        0.150
## 6 hw1          0.0148   0.00297      4.97 6.62e- 7  0.00894    0.0206      1.01 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>

tidy_or(model_wasting_wealth)

## # A tibble: 5 × 10
##   term       estimate std.error statistic  p.value conf.low conf.high odds_ratio
##   <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
## 1 (Intercep…  -1.95       0.129   -15.2   5.26e-52   -2.21    -1.71        0.142
## 2 wealth_ca…   0.0584     0.173     0.339 7.35e- 1   -0.279    0.399       1.06 
## 3 wealth_ca…  -0.476      0.208    -2.29  2.17e- 2   -0.890   -0.0742      0.621
## 4 wealth_ca…  -0.624      0.220    -2.84  4.47e- 3   -1.06    -0.201       0.536
## 5 wealth_ca…  -1.04       0.246    -4.23  2.31e- 5   -1.54    -0.572       0.354
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>

tidy_or(model_wasting_wealth_age)

## # A tibble: 6 × 10
##   term       estimate std.error statistic  p.value conf.low conf.high odds_ratio
##   <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
## 1 (Intercep…  -1.62     0.155     -10.4   1.51e-25  -1.93    -1.32         0.198
## 2 wealth_ca…   0.0674   0.173       0.389 6.97e- 1  -0.271    0.409        1.07 
## 3 wealth_ca…  -0.486    0.208      -2.34  1.95e- 2  -0.901   -0.0828       0.615
## 4 wealth_ca…  -0.614    0.220      -2.79  5.26e- 3  -1.06    -0.190        0.541
## 5 wealth_ca…  -1.05     0.246      -4.26  2.04e- 5  -1.55    -0.580        0.351
## 6 hw1         -0.0156   0.00437    -3.57  3.57e- 4  -0.0243  -0.00716      0.985
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>

# 3. For structural models
lapply(models, tidy_or)

## $stunted_struct
## # A tibble: 9 × 10
##   term       estimate std.error statistic  p.value conf.low conf.high odds_ratio
##   <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
## 1 (Intercep… -1.47      0.138     -10.7   1.67e-26  -1.75   -1.20          0.229
## 2 toilet_fa…  0.0299    0.00514     5.82  5.90e- 9   0.0200  0.0402        1.03 
## 3 water_sou… -0.00479   0.00295    -1.62  1.04e- 1  -0.0106  0.000930      0.995
## 4 postnatal… -0.0193    0.0741     -0.260 7.95e- 1  -0.167   0.125         0.981
## 5 cooking_f…  0.0519    0.0172      3.02  2.53e- 3   0.0182  0.0856        1.05 
## 6 distance_… -0.0949    0.114      -0.836 4.03e- 1  -0.318   0.128         0.909
## 7 wait_to_g…  0.129     0.109       1.18  2.38e- 1  -0.0852  0.343         1.14 
## 8 getting_p… -0.117     0.123      -0.945 3.45e- 1  -0.359   0.125         0.890
## 9 money_nee…  0.491     0.118       4.17  3.10e- 5   0.260   0.722         1.63 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $stunted_struct_age
## # A tibble: 10 × 10
##    term      estimate std.error statistic  p.value conf.low conf.high odds_ratio
##    <chr>        <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
##  1 (Interce… -2.21      0.159     -13.8   1.40e-43 -2.52    -1.90          0.110
##  2 toilet_f…  0.0291    0.00529     5.51  3.68e- 8  0.0190   0.0397        1.03 
##  3 water_so… -0.00527   0.00302    -1.75  8.08e- 2 -0.0112   0.000585      0.995
##  4 postnata… -0.0169    0.0754     -0.224 8.23e- 1 -0.166    0.131         0.983
##  5 cooking_…  0.0543    0.0176      3.08  2.07e- 3  0.0197   0.0888        1.06 
##  6 distance… -0.110     0.116      -0.945 3.45e- 1 -0.337    0.118         0.896
##  7 wait_to_…  0.210     0.112       1.88  6.05e- 2 -0.00923  0.430         1.23 
##  8 getting_… -0.133     0.126      -1.05  2.92e- 1 -0.380    0.114         0.876
##  9 money_ne…  0.527     0.120       4.37  1.23e- 5  0.291    0.763         1.69 
## 10 hw1        0.0289    0.00263    11.0   4.98e-28  0.0237   0.0340        1.03 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $underweight_bin_struct
## # A tibble: 9 × 10
##   term       estimate std.error statistic  p.value conf.low conf.high odds_ratio
##   <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
## 1 (Intercep… -2.39      0.168     -14.2   1.40e-45 -2.72     -2.06        0.0920
## 2 toilet_fa…  0.0149    0.00497     3.00  2.69e- 3  0.00508   0.0247      1.02  
## 3 water_sou… -0.00210   0.00349    -0.601 5.48e- 1 -0.00908   0.00463     0.998 
## 4 postnatal…  0.0456    0.0809      0.564 5.73e- 1 -0.120     0.201       1.05  
## 5 cooking_f…  0.107     0.0203      5.27  1.37e- 7  0.0673    0.147       1.11  
## 6 distance_… -0.145     0.134      -1.09  2.78e- 1 -0.407     0.117       0.865 
## 7 wait_to_g…  0.139     0.131       1.06  2.87e- 1 -0.117     0.396       1.15  
## 8 getting_p…  0.279     0.140       2.00  4.59e- 2  0.00566   0.554       1.32  
## 9 money_nee…  0.225     0.137       1.65  9.98e- 2 -0.0443    0.492       1.25  
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $underweight_bin_struct_age
## # A tibble: 10 × 10
##    term      estimate std.error statistic  p.value conf.low conf.high odds_ratio
##    <chr>        <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
##  1 (Interce… -2.75      0.186     -14.8   2.89e-49 -3.12     -2.39        0.0640
##  2 toilet_f…  0.0140    0.00499     2.80  5.15e- 3  0.00408   0.0237      1.01  
##  3 water_so… -0.00220   0.00351    -0.627 5.30e- 1 -0.00921   0.00456     0.998 
##  4 postnata…  0.0451    0.0804      0.561 5.75e- 1 -0.118     0.200       1.05  
##  5 cooking_…  0.108     0.0204      5.29  1.22e- 7  0.0680    0.148       1.11  
##  6 distance… -0.150     0.134      -1.12  2.63e- 1 -0.412     0.113       0.861 
##  7 wait_to_…  0.177     0.132       1.35  1.78e- 1 -0.0804    0.436       1.19  
##  8 getting_…  0.276     0.140       1.97  4.90e- 2  0.00177   0.552       1.32  
##  9 money_ne…  0.233     0.137       1.70  8.92e- 2 -0.0370    0.501       1.26  
## 10 hw1        0.0146    0.00294     4.98  6.40e- 7  0.00888   0.0204      1.01  
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $wasting_bin_struct
## # A tibble: 9 × 10
##   term       estimate std.error statistic  p.value conf.low conf.high odds_ratio
##   <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
## 1 (Intercep… -2.96      0.235     -12.6   2.25e-36 -3.43     -2.51        0.0517
## 2 toilet_fa… -0.00917   0.00830    -1.11  2.69e- 1 -0.0264    0.00616     0.991 
## 3 water_sou…  0.00583   0.00445     1.31  1.90e- 1 -0.00313   0.0143      1.01  
## 4 postnatal…  0.0627    0.112       0.557 5.77e- 1 -0.181     0.264       1.06  
## 5 cooking_f…  0.0452    0.0279      1.62  1.05e- 1 -0.00924   0.100       1.05  
## 6 distance_…  0.180     0.185       0.973 3.30e- 1 -0.180     0.546       1.20  
## 7 wait_to_g…  0.235     0.188       1.25  2.12e- 1 -0.132     0.606       1.26  
## 8 getting_p…  0.203     0.190       1.07  2.84e- 1 -0.167     0.577       1.23  
## 9 money_nee…  0.197     0.189       1.04  2.98e- 1 -0.177     0.564       1.22  
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $wasting_bin_struct_age
## # A tibble: 10 × 10
##    term      estimate std.error statistic  p.value conf.low conf.high odds_ratio
##    <chr>        <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
##  1 (Interce… -2.65      0.251     -10.6   4.77e-26 -3.15     -2.17        0.0706
##  2 toilet_f… -0.00815   0.00835    -0.977 3.29e- 1 -0.0254    0.00727     0.992 
##  3 water_so…  0.00597   0.00446     1.34  1.80e- 1 -0.00300   0.0145      1.01  
##  4 postnata…  0.0626    0.116       0.541 5.89e- 1 -0.187     0.270       1.06  
##  5 cooking_…  0.0455    0.0280      1.63  1.04e- 1 -0.00912   0.101       1.05  
##  6 distance…  0.182     0.186       0.975 3.30e- 1 -0.181     0.550       1.20  
##  7 wait_to_…  0.201     0.189       1.06  2.87e- 1 -0.167     0.573       1.22  
##  8 getting_…  0.210     0.190       1.10  2.70e- 1 -0.161     0.586       1.23  
##  9 money_ne…  0.193     0.190       1.01  3.11e- 1 -0.183     0.562       1.21  
## 10 hw1       -0.0146    0.00438    -3.34  8.46e- 4 -0.0233   -0.00615     0.985 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>

Interpretation of Houshold only model:

Above code presents the odds ratios (ORs) from the household models examining the relationship between wealth status and child malnutrition. For stunting, children from wealthier households were significantly less likely to be stunted than those from the poorest households. Compared to the poorest, the odds of stunting decreased progressively across the wealth categories—poorer (OR = 0.51, 95% CI: 0.41–0.64), middle (0.42, 0.33–0.54), richer (0.32, 0.25–0.41), and richest (0.22, 0.17–0.29). After adjusting for child age, the protective effect of wealth persisted, and each additional month of age increased the odds of stunting slightly (OR = 1.03, 95% CI: 1.02–1.04). For underweight, the pattern was similar, with children from the richest households showing an 85% lower likelihood of being underweight compared to the poorest (OR = 0.15, 95% CI: 0.10–0.21). The effect of wealth remained significant after controlling for age, which also showed a modest positive association with underweight (OR = 1.01, 95% CI: 1.01–1.02). For wasting, only the middle, richer, and richest categories had significantly lower odds than the poorest (ORs = 0.62, 0.54, and 0.35, respectively). After adding child age, wasting risk declined with age (OR = 0.98, 95% CI: 0.98–0.99). Overall, wealthier households were strongly protected against chronic malnutrition (stunting and underweight), while effects on acute malnutrition (wasting) were weaker but still evident among higher wealth groups.

Interpretation of structural only model:

Above code summarizes odds ratios for environmental and healthcare-access predictors. For stunting, children living in households with improved toilet facilities (OR = 1.03, 95% CI: 1.02–1.04) and cleaner cooking fuel (OR = 1.05, 95% CI: 1.02–1.09) had slightly higher odds of stunting, possibly reflecting confounding with socioeconomic factors ( got correct in combine model below because wealth quantile is a confounding factor here but as we are russing model independtly that why this results occure). Financial difficulty in obtaining medical care significantly increased stunting risk (OR = 1.63, 95% CI: 1.30–2.06), and this effect persisted after age adjustment (OR = 1.69, 95% CI: 1.34–2.15). Child age was again positively associated with stunting (OR = 1.03, 95% CI: 1.02–1.03). For underweight, access to improved sanitation (OR = 1.02, 95% CI: 1.01–1.02) and use of clean fuels (OR = 1.11, 95% CI: 1.07–1.16) were linked to lower odds of underweight, whereas needing permission to seek care slightly increased risk (OR = 1.32, 95% CI: 1.01–1.74). After adjusting for age, these associations remained stable. For wasting, structural predictors were not statistically significant, and all odds ratios were close to unity, suggesting that acute malnutrition was less influenced by environmental or healthcare-access factors in this sample.

Combine model ( household + structural):

# List of structural factors
struct_vars <- c("toilet_facility", "water_source", "postnatal_care", 
                 "cooking_fuel", "distance_ti_health", "wait_to_go_alone", 
                 "getting_permission_to_go", "money_needed")

# List of household factors
household_vars <- c("wealth_cat")

# Function to create formula for combined model
create_combined_formula <- function(outcome, include_age = FALSE) {
  all_vars <- c(household_vars, struct_vars)
  if(include_age) {
    all_vars <- c(all_vars, "hw1")
  }
  as.formula(paste(outcome, "~", paste(all_vars, collapse = " + ")))
}

# Outcomes
outcomes <- c("stunted", "underweight_bin", "wasting_bin")

# Run combined models
combined_models <- list()
for(outcome in outcomes){
  combined_models[[paste0(outcome, "_combined")]] <- glm(
    create_combined_formula(outcome, FALSE), 
    family = binomial, data = full_clean
  )
  combined_models[[paste0(outcome, "_combined_age")]] <- glm(
    create_combined_formula(outcome, TRUE), 
    family = binomial, data = full_clean
  )
}

# Check summary
lapply(combined_models, summary)

## $stunted_combined
## 
## Call:
## glm(formula = create_combined_formula(outcome, FALSE), family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               0.027331   0.252333   0.108 0.913747    
## wealth_catpoorer         -0.576659   0.126903  -4.544 5.52e-06 ***
## wealth_catmiddle         -0.753248   0.152258  -4.947 7.53e-07 ***
## wealth_catricher         -1.029594   0.175489  -5.867 4.44e-09 ***
## wealth_catrichest        -1.358477   0.193856  -7.008 2.42e-12 ***
## toilet_facility           0.013902   0.005359   2.594 0.009488 ** 
## water_source             -0.005025   0.003047  -1.649 0.099158 .  
## postnatal_care           -0.008726   0.073687  -0.118 0.905739    
## cooking_fuel             -0.029068   0.021491  -1.353 0.176193    
## distance_ti_health       -0.172321   0.115916  -1.487 0.137120    
## wait_to_go_alone          0.129734   0.110462   1.174 0.240207    
## getting_permission_to_go -0.119481   0.124299  -0.961 0.336432    
## money_needed              0.400615   0.119625   3.349 0.000811 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3519.9  on 2708  degrees of freedom
## Residual deviance: 3346.4  on 2696  degrees of freedom
## AIC: 3372.4
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## $stunted_combined_age
## 
## Call:
## glm(formula = create_combined_formula(outcome, TRUE), family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -0.608540   0.264708  -2.299 0.021510 *  
## wealth_catpoorer         -0.649408   0.130652  -4.971 6.68e-07 ***
## wealth_catmiddle         -0.808851   0.156404  -5.172 2.32e-07 ***
## wealth_catricher         -1.145143   0.180528  -6.343 2.25e-10 ***
## wealth_catrichest        -1.441170   0.198589  -7.257 3.96e-13 ***
## toilet_facility           0.011671   0.005482   2.129 0.033261 *  
## water_source             -0.005732   0.003125  -1.834 0.066663 .  
## postnatal_care           -0.007086   0.075848  -0.093 0.925570    
## cooking_fuel             -0.031134   0.022021  -1.414 0.157405    
## distance_ti_health       -0.191844   0.118582  -1.618 0.105702    
## wait_to_go_alone          0.211544   0.113546   1.863 0.062452 .  
## getting_permission_to_go -0.136124   0.127347  -1.069 0.285106    
## money_needed              0.432594   0.122558   3.530 0.000416 ***
## hw1                       0.029888   0.002671  11.190  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3519.9  on 2708  degrees of freedom
## Residual deviance: 3215.9  on 2695  degrees of freedom
## AIC: 3243.9
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## $underweight_bin_combined
## 
## Call:
## glm(formula = create_combined_formula(outcome, FALSE), family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -0.450846   0.291582  -1.546   0.1221    
## wealth_catpoorer         -0.587293   0.137867  -4.260 2.05e-05 ***
## wealth_catmiddle         -1.113333   0.176938  -6.292 3.13e-10 ***
## wealth_catricher         -1.187482   0.202997  -5.850 4.92e-09 ***
## wealth_catrichest        -1.950846   0.245037  -7.961 1.70e-15 ***
## toilet_facility          -0.002845   0.005856  -0.486   0.6271    
## water_source             -0.002459   0.003678  -0.668   0.5038    
## postnatal_care            0.064758   0.080971   0.800   0.4239    
## cooking_fuel             -0.011219   0.025569  -0.439   0.6608    
## distance_ti_health       -0.255728   0.136730  -1.870   0.0614 .  
## wait_to_go_alone          0.141789   0.132915   1.067   0.2861    
## getting_permission_to_go  0.282221   0.141480   1.995   0.0461 *  
## money_needed              0.102038   0.139222   0.733   0.4636    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2782.3  on 2708  degrees of freedom
## Residual deviance: 2617.7  on 2696  degrees of freedom
## AIC: 2643.7
## 
## Number of Fisher Scoring iterations: 5
## 
## 
## $underweight_bin_combined_age
## 
## Call:
## glm(formula = create_combined_formula(outcome, TRUE), family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -0.777528   0.301281  -2.581  0.00986 ** 
## wealth_catpoorer         -0.616734   0.139018  -4.436 9.15e-06 ***
## wealth_catmiddle         -1.134373   0.178404  -6.358 2.04e-10 ***
## wealth_catricher         -1.233722   0.204802  -6.024 1.70e-09 ***
## wealth_catrichest        -1.977885   0.246345  -8.029 9.83e-16 ***
## toilet_facility          -0.004429   0.005913  -0.749  0.45380    
## water_source             -0.002727   0.003699  -0.737  0.46089    
## postnatal_care            0.063964   0.080979   0.790  0.42959    
## cooking_fuel             -0.012069   0.025721  -0.469  0.63891    
## distance_ti_health       -0.258915   0.137016  -1.890  0.05880 .  
## wait_to_go_alone          0.175779   0.133686   1.315  0.18856    
## getting_permission_to_go  0.282135   0.142101   1.985  0.04709 *  
## money_needed              0.106493   0.139627   0.763  0.44565    
## hw1                       0.015253   0.002989   5.102 3.36e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2782.3  on 2708  degrees of freedom
## Residual deviance: 2591.8  on 2695  degrees of freedom
## AIC: 2619.8
## 
## Number of Fisher Scoring iterations: 5
## 
## 
## $wasting_bin_combined
## 
## Call:
## glm(formula = create_combined_formula(outcome, FALSE), family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -1.537318   0.408726  -3.761 0.000169 ***
## wealth_catpoorer         -0.128137   0.192326  -0.666 0.505254    
## wealth_catmiddle         -0.774785   0.249877  -3.101 0.001931 ** 
## wealth_catricher         -0.954514   0.287137  -3.324 0.000887 ***
## wealth_catrichest        -1.418415   0.328054  -4.324 1.53e-05 ***
## toilet_facility          -0.022335   0.009885  -2.260 0.023848 *  
## water_source              0.006527   0.004611   1.416 0.156915    
## postnatal_care            0.087126   0.110835   0.786 0.431816    
## cooking_fuel             -0.055517   0.034273  -1.620 0.105268    
## distance_ti_health        0.066911   0.187441   0.357 0.721112    
## wait_to_go_alone          0.240598   0.188891   1.274 0.202754    
## getting_permission_to_go  0.206353   0.190928   1.081 0.279792    
## money_needed              0.075113   0.190894   0.393 0.693964    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1690.4  on 2708  degrees of freedom
## Residual deviance: 1635.7  on 2696  degrees of freedom
## AIC: 1661.7
## 
## Number of Fisher Scoring iterations: 5
## 
## 
## $wasting_bin_combined_age
## 
## Call:
## glm(formula = create_combined_formula(outcome, TRUE), family = binomial, 
##     data = full_clean)
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -1.243023   0.416243  -2.986 0.002824 ** 
## wealth_catpoorer         -0.113244   0.192427  -0.589 0.556195    
## wealth_catmiddle         -0.767300   0.249575  -3.074 0.002109 ** 
## wealth_catricher         -0.933540   0.287107  -3.252 0.001148 ** 
## wealth_catrichest        -1.415741   0.328435  -4.311 1.63e-05 ***
## toilet_facility          -0.021014   0.009901  -2.122 0.033798 *  
## water_source              0.006779   0.004623   1.466 0.142541    
## postnatal_care            0.087624   0.114331   0.766 0.443435    
## cooking_fuel             -0.054607   0.034302  -1.592 0.111403    
## distance_ti_health        0.063642   0.188856   0.337 0.736126    
## wait_to_go_alone          0.210495   0.189692   1.110 0.267143    
## getting_permission_to_go  0.210957   0.191899   1.099 0.271632    
## money_needed              0.072131   0.192070   0.376 0.707253    
## hw1                      -0.014743   0.004405  -3.347 0.000817 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1690.4  on 2708  degrees of freedom
## Residual deviance: 1624.0  on 2695  degrees of freedom
## AIC: 1652
## 
## Number of Fisher Scoring iterations: 5

# Odds ratios with 95% CI
tidy_or <- function(model) {
  broom::tidy(model, conf.int = TRUE) %>%
    dplyr::mutate(
      odds_ratio = exp(estimate),
      conf.low.or = exp(conf.low),
      conf.high.or = exp(conf.high)
    )
}

lapply(combined_models, tidy_or)

## $stunted_combined
## # A tibble: 13 × 10
##    term      estimate std.error statistic  p.value conf.low conf.high odds_ratio
##    <chr>        <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
##  1 (Interce…  0.0273    0.252       0.108 9.14e- 1 -0.468    0.522         1.03 
##  2 wealth_c… -0.577     0.127      -4.54  5.52e- 6 -0.826   -0.328         0.562
##  3 wealth_c… -0.753     0.152      -4.95  7.53e- 7 -1.05    -0.456         0.471
##  4 wealth_c… -1.03      0.175      -5.87  4.44e- 9 -1.38    -0.687         0.357
##  5 wealth_c… -1.36      0.194      -7.01  2.42e-12 -1.74    -0.980         0.257
##  6 toilet_f…  0.0139    0.00536     2.59  9.49e- 3  0.00353  0.0246        1.01 
##  7 water_so… -0.00503   0.00305    -1.65  9.92e- 2 -0.0111   0.000885      0.995
##  8 postnata… -0.00873   0.0737     -0.118 9.06e- 1 -0.155    0.135         0.991
##  9 cooking_… -0.0291    0.0215     -1.35  1.76e- 1 -0.0713   0.0130        0.971
## 10 distance… -0.172     0.116      -1.49  1.37e- 1 -0.400    0.0547        0.842
## 11 wait_to_…  0.130     0.110       1.17  2.40e- 1 -0.0868   0.346         1.14 
## 12 getting_… -0.119     0.124      -0.961 3.36e- 1 -0.364    0.124         0.887
## 13 money_ne…  0.401     0.120       3.35  8.11e- 4  0.166    0.635         1.49 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $stunted_combined_age
## # A tibble: 14 × 10
##    term      estimate std.error statistic  p.value conf.low conf.high odds_ratio
##    <chr>        <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
##  1 (Interce… -0.609     0.265     -2.30   2.15e- 2 -1.13    -0.0907        0.544
##  2 wealth_c… -0.649     0.131     -4.97   6.68e- 7 -0.906   -0.394         0.522
##  3 wealth_c… -0.809     0.156     -5.17   2.32e- 7 -1.12    -0.503         0.445
##  4 wealth_c… -1.15      0.181     -6.34   2.25e-10 -1.50    -0.793         0.318
##  5 wealth_c… -1.44      0.199     -7.26   3.96e-13 -1.83    -1.05          0.237
##  6 toilet_f…  0.0117    0.00548    2.13   3.33e- 2  0.00106  0.0226        1.01 
##  7 water_so… -0.00573   0.00313   -1.83   6.67e- 2 -0.0119   0.000330      0.994
##  8 postnata… -0.00709   0.0758    -0.0934 9.26e- 1 -0.156    0.143         0.993
##  9 cooking_… -0.0311    0.0220    -1.41   1.57e- 1 -0.0744   0.0120        0.969
## 10 distance… -0.192     0.119     -1.62   1.06e- 1 -0.425    0.0403        0.825
## 11 wait_to_…  0.212     0.114      1.86   6.25e- 2 -0.0109   0.434         1.24 
## 12 getting_… -0.136     0.127     -1.07   2.85e- 1 -0.386    0.113         0.873
## 13 money_ne…  0.433     0.123      3.53   4.16e- 4  0.192    0.673         1.54 
## 14 hw1        0.0299    0.00267   11.2    4.57e-29  0.0247   0.0352        1.03 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $underweight_bin_combined
## # A tibble: 13 × 10
##    term      estimate std.error statistic  p.value conf.low conf.high odds_ratio
##    <chr>        <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
##  1 (Interce… -0.451     0.292      -1.55  1.22e- 1 -1.02      0.121        0.637
##  2 wealth_c… -0.587     0.138      -4.26  2.05e- 5 -0.859    -0.318        0.556
##  3 wealth_c… -1.11      0.177      -6.29  3.13e-10 -1.46     -0.770        0.328
##  4 wealth_c… -1.19      0.203      -5.85  4.92e- 9 -1.59     -0.793        0.305
##  5 wealth_c… -1.95      0.245      -7.96  1.70e-15 -2.44     -1.48         0.142
##  6 toilet_f… -0.00284   0.00586    -0.486 6.27e- 1 -0.0147    0.00835      0.997
##  7 water_so… -0.00246   0.00368    -0.668 5.04e- 1 -0.00979   0.00464      0.998
##  8 postnata…  0.0648    0.0810      0.800 4.24e- 1 -0.101     0.221        1.07 
##  9 cooking_… -0.0112    0.0256     -0.439 6.61e- 1 -0.0613    0.0389       0.989
## 10 distance… -0.256     0.137      -1.87  6.14e- 2 -0.524     0.0125       0.774
## 11 wait_to_…  0.142     0.133       1.07  2.86e- 1 -0.118     0.403        1.15 
## 12 getting_…  0.282     0.141       1.99  4.61e- 2  0.00555   0.560        1.33 
## 13 money_ne…  0.102     0.139       0.733 4.64e- 1 -0.172     0.374        1.11 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $underweight_bin_combined_age
## # A tibble: 14 × 10
##    term      estimate std.error statistic  p.value conf.low conf.high odds_ratio
##    <chr>        <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
##  1 (Interce… -0.778     0.301      -2.58  9.86e- 3 -1.37     -0.187        0.460
##  2 wealth_c… -0.617     0.139      -4.44  9.15e- 6 -0.890    -0.345        0.540
##  3 wealth_c… -1.13      0.178      -6.36  2.04e-10 -1.49     -0.788        0.322
##  4 wealth_c… -1.23      0.205      -6.02  1.70e- 9 -1.64     -0.836        0.291
##  5 wealth_c… -1.98      0.246      -8.03  9.83e-16 -2.47     -1.50         0.138
##  6 toilet_f… -0.00443   0.00591    -0.749 4.54e- 1 -0.0164    0.00687      0.996
##  7 water_so… -0.00273   0.00370    -0.737 4.61e- 1 -0.0101    0.00441      0.997
##  8 postnata…  0.0640    0.0810      0.790 4.30e- 1 -0.100     0.221        1.07 
##  9 cooking_… -0.0121    0.0257     -0.469 6.39e- 1 -0.0625    0.0384       0.988
## 10 distance… -0.259     0.137      -1.89  5.88e- 2 -0.528     0.00981      0.772
## 11 wait_to_…  0.176     0.134       1.31  1.89e- 1 -0.0859    0.438        1.19 
## 12 getting_…  0.282     0.142       1.99  4.71e- 2  0.00422   0.562        1.33 
## 13 money_ne…  0.106     0.140       0.763 4.46e- 1 -0.169     0.379        1.11 
## 14 hw1        0.0153    0.00299     5.10  3.36e- 7  0.00939   0.0211       1.02 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $wasting_bin_combined
## # A tibble: 13 × 10
##    term       estimate std.error statistic p.value conf.low conf.high odds_ratio
##    <chr>         <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>      <dbl>
##  1 (Intercep… -1.54      0.409      -3.76  1.69e-4 -2.34     -0.738        0.215
##  2 wealth_ca… -0.128     0.192      -0.666 5.05e-1 -0.505     0.250        0.880
##  3 wealth_ca… -0.775     0.250      -3.10  1.93e-3 -1.27     -0.290        0.461
##  4 wealth_ca… -0.955     0.287      -3.32  8.87e-4 -1.52     -0.397        0.385
##  5 wealth_ca… -1.42      0.328      -4.32  1.53e-5 -2.07     -0.783        0.242
##  6 toilet_fa… -0.0223    0.00988    -2.26  2.38e-2 -0.0426   -0.00402      0.978
##  7 water_sou…  0.00653   0.00461     1.42  1.57e-1 -0.00274   0.0154       1.01 
##  8 postnatal…  0.0871    0.111       0.786 4.32e-1 -0.153     0.287        1.09 
##  9 cooking_f… -0.0555    0.0343     -1.62  1.05e-1 -0.123     0.0119       0.946
## 10 distance_…  0.0669    0.187       0.357 7.21e-1 -0.298     0.437        1.07 
## 11 wait_to_g…  0.241     0.189       1.27  2.03e-1 -0.128     0.614        1.27 
## 12 getting_p…  0.206     0.191       1.08  2.80e-1 -0.166     0.583        1.23 
## 13 money_nee…  0.0751    0.191       0.393 6.94e-1 -0.302     0.446        1.08 
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>
## 
## $wasting_bin_combined_age
## # A tibble: 14 × 10
##    term       estimate std.error statistic p.value conf.low conf.high odds_ratio
##    <chr>         <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>      <dbl>
##  1 (Intercep… -1.24      0.416      -2.99  2.82e-3 -2.06     -0.429        0.289
##  2 wealth_ca… -0.113     0.192      -0.589 5.56e-1 -0.490     0.265        0.893
##  3 wealth_ca… -0.767     0.250      -3.07  2.11e-3 -1.26     -0.283        0.464
##  4 wealth_ca… -0.934     0.287      -3.25  1.15e-3 -1.50     -0.377        0.393
##  5 wealth_ca… -1.42      0.328      -4.31  1.63e-5 -2.07     -0.780        0.243
##  6 toilet_fa… -0.0210    0.00990    -2.12  3.38e-2 -0.0413   -0.00266      0.979
##  7 water_sou…  0.00678   0.00462     1.47  1.43e-1 -0.00251   0.0156       1.01 
##  8 postnatal…  0.0876    0.114       0.766 4.43e-1 -0.159     0.293        1.09 
##  9 cooking_f… -0.0546    0.0343     -1.59  1.11e-1 -0.122     0.0128       0.947
## 10 distance_…  0.0636    0.189       0.337 7.36e-1 -0.304     0.437        1.07 
## 11 wait_to_g…  0.210     0.190       1.11  2.67e-1 -0.160     0.585        1.23 
## 12 getting_p…  0.211     0.192       1.10  2.72e-1 -0.163     0.590        1.23 
## 13 money_nee…  0.0721    0.192       0.376 7.07e-1 -0.308     0.446        1.07 
## 14 hw1        -0.0147    0.00441    -3.35  8.17e-4 -0.0235   -0.00622      0.985
## # ℹ 2 more variables: conf.low.or <dbl>, conf.high.or <dbl>

# Pseudo R² to see model fit improvement
library(pscl)
lapply(combined_models, pscl::pR2)

## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2
## fitting null model for pseudo-r2

## $stunted_combined
##           llh       llhNull            G2      McFadden          r2ML 
## -1.673180e+03 -1.759952e+03  1.735440e+02  4.930362e-02  6.205317e-02 
##          r2CU 
##  8.532138e-02 
## 
## $stunted_combined_age
##           llh       llhNull            G2      McFadden          r2ML 
## -1.607927e+03 -1.759952e+03  3.040502e+02  8.638026e-02  1.061677e-01 
##          r2CU 
##  1.459776e-01 
## 
## $underweight_bin_combined
##           llh       llhNull            G2      McFadden          r2ML 
## -1.308864e+03 -1.391129e+03  1.645292e+02  5.913517e-02  5.892676e-02 
##          r2CU 
##  9.179545e-02 
## 
## $underweight_bin_combined_age
##           llh       llhNull            G2      McFadden          r2ML 
## -1.295894e+03 -1.391129e+03  1.904696e+02  6.845865e-02  6.789510e-02 
##          r2CU 
##  1.057662e-01 
## 
## $wasting_bin_combined
##           llh       llhNull            G2      McFadden          r2ML 
## -817.84826069 -845.18615278   54.67578418    0.03234541    0.01998070 
##          r2CU 
##    0.04304382 
## 
## $wasting_bin_combined_age
##           llh       llhNull            G2      McFadden          r2ML 
## -811.98966492 -845.18615278   66.39297571    0.03927713    0.02421041 
##          r2CU 
##    0.05215575

# Compare AIC/BIC for models with and without age
AIC(combined_models$stunted_combined, combined_models$stunted_combined_age)

##                                      df      AIC
## combined_models$stunted_combined     13 3372.360
## combined_models$stunted_combined_age 14 3243.854

BIC(combined_models$stunted_combined, combined_models$stunted_combined_age)

##                                      df      BIC
## combined_models$stunted_combined     13 3449.116
## combined_models$stunted_combined_age 14 3326.514

Interpretation of combined model:

Across the combined logistic regression models, wealth status consistently showed a strong inverse association with all three malnutrition outcomes—stunting, underweight, and wasting. Children from poorer, middle, richer, and richest households had significantly lower odds of being malnourished compared to those from the poorest households, with odds ratios decreasing progressively with higher wealth categories (e.g., OR = 0.56 to 0.26 for stunting). This pattern indicates that improved household wealth substantially reduces the likelihood of malnutrition. Additionally, limited access to resources, such as difficulties obtaining money for healthcare, increased the odds of stunting (OR = 1.49, 95% CI 1.18–1.89), highlighting economic barriers as a key risk factor. Toilet facilities also showed a small but significant protective effect for wasting (OR = 0.98, 95% CI 0.96–1.00), emphasizing the role of sanitation in child health. When child age (hw1) was added to the models, it became a strong predictor of nutritional outcomes. Each unit increase in age was associated with higher odds of stunting (OR = 1.03, 95% CI 1.02–1.04) and underweight (OR = 1.02, 95% CI 1.01–1.02) but lower odds of wasting (OR = 0.98, 95% CI 0.96–1.00), suggesting that while older children are more prone to chronic forms of malnutrition, acute malnutrition is more common among younger children. Overall, the models demonstrate that socioeconomic status, sanitation, and child age are key determinants of child nutritional outcomes in this population.

Conclusion:

In simple terms, the results show that a family’s wealth has the strongest effect on a child’s nutrition — children from richer households are much less likely to be stunted, underweight, or wasted compared to those from poorer families. Financial problems, especially difficulty paying for medical care, also increase the risk of poor nutrition. Better sanitation and cleaner cooking fuels help protect children, but their effects are smaller and sometimes not statistically significant after accounting for wealth. Age also matters: as children grow older, long-term nutrition problems like stunting become more common, while short-term issues like wasting are more frequent among younger children. Overall, poverty and barriers to healthcare are the most important challenges for improving child nutrition.

Comparison of Logistic Regression and Cramér’s V Findings:

The logistic regression results largely support the associations observed in the Cramér’s V analysis. Both approaches identified household wealth and toilet facility as the most consistent predictors of child malnutrition. In the Cramér’s V results, these variables showed moderate associations with stunting and underweight, and the regression models confirmed that children from wealthier households had substantially lower odds of being stunted, underweight, or wasted. Similarly, improved sanitation showed a small but significant protective effect, aligning with its moderate Cramér’s V values. Other environmental and access variables—such as cooking fuel, distance to healthcare, and permission to seek care—displayed weak associations in both methods, indicating limited influence after accounting for wealth. The weaker or nonsignificant effects for wasting in both analyses also match, suggesting that acute malnutrition is less dependent on socioeconomic or environmental conditions. Overall, the regression analysis reinforces the Cramér’s V findings, showing that socioeconomic status and sanitation are key but moderately strong determinants of child nutritional outcomes

Descriptive Analysis and Analytical Report of PDHS data

Ameen Talha

2025-10-02

1.Introduction:

2.Setup and data loading:

2.1 Load Libraries:

2.2 Load Datasets:

3 Select variables:

3.1 Linked check:

Interpretation:

4 Merge datasets All:

4.1 Descriptive Statistics of Children’s Age:

Interpretation:

4.2 Histogram of Children’s Age:

Interpretation:

4.3 Check for kids without a household match:

Interpretation:

5 Data Cleaning and Variable Preparation:

6 Nutrition outcomes by age group:

6.1 Boxplots by age group.

Interpretation:

6.2 line chart by age group:

Interpretation:

7 Nutritional Outcomes by Wealth Category (Heatmaps):

Interpretation:

8 Missingness Analysis:

9 Reason for Messingness:

Interpretation:

10 Visual Missingness Check (naniar plots):

Interpretation:

11 Structural Variables and Correlations

Interpretation:

Overall correlation:

Correlation by Wealth Group:

12 Summary Barrier Index:

Interpretation:

13 Any Barrier Index:

Interpretation:

Chi-Square Analysis of Categorical Variables

14. Chi-Square Analysis:

14.1 Prepare categorical variables:

14.2 CHI-SQUARE TESTS BETWEEN NUTRITION OUTCOMES AND ENVIRONMENTAL/HOUSEHOLD FACTORS:

Chi square Interpretation:

Cram v Interpretation:

Missing values removal:

Logistic regression:

Household only:

structural only:

Odd ratios of both Household and Structural models:

Interpretation of Houshold only model:

Interpretation of structural only model:

Combine model ( household + structural):

Interpretation of combined model:

Conclusion:

Comparison of Logistic Regression and Cramér’s V Findings: