What are the factors associated with height-for-age Z-score (HAZ) among children aged 6 to 59 months?
To examine factors associated with height-for-age Z-score (HAZ) among children aged 6 to 59 months.
N= 500 Population: Children aged 6 to 59 months. Outcome: Height-for-age Z-score (HAZ) Predictors: - Monthly household income (RM) - Dietary diversity (food group count) - Residence (Urban/Rural) - Maternal education (Low/Secondary/Tertiary)
library(tidyverse)
library(readxl)
library(broom)
library(gtsummary)
library(dplyr)
library(corrplot)
library(knitr)
library(MASS)
library(broom.helpers)
data.child <- read_excel('child_nutrition.xlsx')
glimpse(data.child)
## Rows: 500
## Columns: 7
## $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ age_months <dbl> 9, 55, 12, 8, 41, 9, 27, 50, 6, 55, 44, 6, 34, 8, 1…
## $ household_income <dbl> 1697, 3487, 4519, 2256, 5647, 3869, 1495, 3288, 874…
## $ dietary_diversity <dbl> 4.7, 2.5, 4.9, 3.5, 7.4, 3.6, 3.5, 4.1, 3.6, 3.4, 4…
## $ residence <dbl> 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, …
## $ maternal_education <chr> "Low", "Secondary", "Tertiary", "Low", "Low", "Low"…
## $ haz <dbl> 1.46, -0.31, 2.02, -0.61, 1.54, 0.94, -0.71, -1.80,…
Convert residence and maternal education to factors with ordered levels
data.child <- data.child |>
mutate (
residence = factor(
residence,
levels = c(0, 1),
labels = c("Rural", "Urban")
))
data.child <- data.child |>
mutate(
maternal_education = factor(maternal_education, levels = c("Low", "Secondary", "Tertiary")))
glimpse(data.child)
## Rows: 500
## Columns: 7
## $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ age_months <dbl> 9, 55, 12, 8, 41, 9, 27, 50, 6, 55, 44, 6, 34, 8, 1…
## $ household_income <dbl> 1697, 3487, 4519, 2256, 5647, 3869, 1495, 3288, 874…
## $ dietary_diversity <dbl> 4.7, 2.5, 4.9, 3.5, 7.4, 3.6, 3.5, 4.1, 3.6, 3.4, 4…
## $ residence <fct> Urban, Urban, Urban, Rural, Rural, Rural, Urban, Ur…
## $ maternal_education <fct> Low, Secondary, Tertiary, Low, Low, Low, Secondary,…
## $ haz <dbl> 1.46, -0.31, 2.02, -0.61, 1.54, 0.94, -0.71, -1.80,…
summary(data.child)
## id age_months household_income dietary_diversity residence
## Min. : 1.0 Min. : 6.00 Min. : 500 Min. :1.500 Rural:188
## 1st Qu.:125.8 1st Qu.:18.00 1st Qu.:2131 1st Qu.:3.800 Urban:312
## Median :250.5 Median :32.00 Median :2968 Median :4.600
## Mean :250.5 Mean :32.01 Mean :2977 Mean :4.564
## 3rd Qu.:375.2 3rd Qu.:45.00 3rd Qu.:3773 3rd Qu.:5.300
## Max. :500.0 Max. :59.00 Max. :6776 Max. :8.200
## maternal_education haz
## Low :212 Min. :-2.4600
## Secondary:191 1st Qu.:-0.2700
## Tertiary : 97 Median : 0.4650
## Mean : 0.4793
## 3rd Qu.: 1.1950
## Max. : 4.4600
data.child |>
dplyr::select (-id) |>
gtsummary::tbl_summary(
label = list(age_months ~ 'Age (months)',
household_income ~ 'Household Income (RM)',
dietary_diversity ~ 'Dietary Diversity (Food Group)',
residence ~ 'Residence',
maternal_education ~ 'Maternal Education',
haz ~ 'Height-for-age (Z-score)'),
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)")
)
| Characteristic | N = 5001 |
|---|---|
| Age (months) | 32 (16) |
| Household Income (RM) | 2,977 (1,162) |
| Dietary Diversity (Food Group) | 4.56 (1.15) |
| Residence | |
| Rural | 188 (38%) |
| Urban | 312 (62%) |
| Maternal Education | |
| Low | 212 (42%) |
| Secondary | 191 (38%) |
| Tertiary | 97 (19%) |
| Height-for-age (Z-score) | 0.48 (1.11) |
| 1 Mean (SD); n (%) | |
Histogram for continuous variables
Histograms were used to assess the distribution and identify skewness and outliers in continuous variables.
ggplot(data.child, aes(age_months)) +
geom_histogram()
Interpretation:
Age demonstrated a broad distribution (adequate variability) with no evidence of extreme outliers.
ggplot(data.child, aes(household_income)) +
geom_histogram()
Interpretation:
Household income exhibited a positively skewed distribution, with most households in the lower to middle income range and a small number of high-income households.
ggplot(data.child, aes(dietary_diversity)) +
geom_histogram()
Interpretation:
Dietary diversity scores were approximately normally distributed, with most children consuming foods from four to six food groups.
Boxplots for categorical variables
Boxplots were used to compare the distribution of outcome across categorical variables.
ggplot(data.child, aes(x = residence, y = haz)) +
geom_boxplot(fill = "lightblue") +
geom_hline(yintercept = 0, color = "red")
Interpretation:
HAZ differed by place of residence. Urban children had a higher median HAZ compared with rural children, although greater variability was observed among urban children.
ggplot(data.child, aes(x = maternal_education, y = haz)) +
geom_boxplot(fill = "lightblue") +
geom_hline(yintercept = 0, color = "red")
Interpretation:
HAZ increased with higher maternal education levels. Children of mothers with tertiary education had the highest median HAZ, followed by those with secondary education, while the lowest median HAZ was observed among children of mothers with low education.
Scatter plots between variables
Scatter plots were used to visually assess the linearity of relationships between continuous predictors and the outcome.
data.child2 <- data.child |>
dplyr::select (age_months, household_income, dietary_diversity, haz)
ggplot(data.child2, aes(x = age_months, y = haz)) +
geom_point() +
geom_smooth() +
labs(title = "Height-for-age vs Age",
x = "Age (months)",
y = "Height-for-age (Z-score)") +
theme_minimal()
Interpretation:
HAZ showed a gradual decline with increasing age, indicating cumulative growth faltering over time. Although the association was relatively weak, the overall pattern supports inclusion of age as a covariate in multivariable models.
ggplot(data.child2, aes(x = household_income, y = haz)) +
geom_point() +
geom_smooth() +
labs(title = "Height-for-age vs Household Income",
x = "Household Income (RM)",
y = "Height-for-age (Z-score)") +
theme_minimal()
Interpretation:
HAZ increased with higher household income, indicating better linear growth among children from higher-income households. The relationship appeared approximately linear, supporting the use of linear regression analysis.
ggplot(data.child2, aes(x = dietary_diversity, y = haz)) +
geom_point() +
geom_smooth() +
labs(title = "Height-for-age vs Dietary Diversity",
x = "Dietary Diversity (food group count)",
y = "Height-for-age (Z-score)") +
theme_minimal()
Interpretation:
HAZ increased with greater dietary diversity, indicating improved linear growth among children consuming a more varied diet. The relationship appeared approximately linear, supporting inclusion of dietary diversity as a continuous variable in regression models.
Correlation matrix
Correlation matrix was used to examine bivariate relationships and assess potential multicollinearity. This steps serves as an initial screening, as multicollinearity will be formally assessed using variance inflation factors during model diagnostics.
data.child2 <- data.child |>
dplyr::select (age_months, household_income, dietary_diversity, haz)
cor.data <-
cor(data.child2,
use = "complete.obs",
method = "pearson")
head(round(cor.data,2))
## age_months household_income dietary_diversity haz
## age_months 1.00 0.02 0.01 -0.18
## household_income 0.02 1.00 0.42 0.56
## dietary_diversity 0.01 0.42 1.00 0.52
## haz -0.18 0.56 0.52 1.00
HAZ showed a weak negative correlation with child age (r = -0.18) and moderate positive correlations with both household income (r = 0.56) and dietary diversity (r = 0.52). Household income was also moderately correlated with dietary diversity (r = 0.42), supporting its role as an upstream determinant of child diet quality. No strong correlations were observed between child age and other variables, indicating minimal collinearity. This preliminary correlation support the inclusion of all variables in subsequent multivariable modelling.
This is the correlogram:
cor_matrix <- cor(data.child2 |> select_if(is.numeric), use = "complete.obs")
corrplot(cor_matrix,
type = "upper",
method = "circle",
addCoef.col = "black",
tl.col = "black",
tl.srt = 45)
Univariable linear regression analyses were conducted to examine the association between each independent variable and HAZ. Age in months, household income, dietary diversity, residence and maternal education were analysed individually.
slr.age <- lm(haz ~ age_months, data = data.child)
tidy(slr.age, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))
## # A tibble: 2 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) 0.891 0.110 8.12 <0.001 0.676 1.11
## 2 age_months -0.0129 0.00307 -4.19 <0.001 -0.0189 -0.00684
Interpretation:
Child age was significantly associated with HAZ. Each additional month of age was associated with a 0.013-unit decrease in HAZ (p < 0.001, 95% CI: −0.019, −0.007), indicating progressive growth faltering with increasing age.
slr.income <- lm(haz ~ household_income, data = data.child)
tidy(slr.income, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))
## # A tibble: 2 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -1.12 0.114 -9.86 <0.001 -1.35 -0.898
## 2 household_income 0.000538 0.0000356 15.1 <0.001 0.000468 0.000608
Interpretation:
Household income was positively associated with HAZ. Each RM 100 increase in monthly household income was associated with a 0.054-unit increase in HAZ (p < 0.001, 95% CI: 0.047, 0.061), indicating better linear growth among children from higher-income households.
slr.diet <- lm(haz ~ dietary_diversity, data = data.child)
tidy(slr.diet, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))
## # A tibble: 2 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -1.84 0.174 -10.6 <0.001 -2.19 -1.50
## 2 dietary_diversity 0.509 0.0370 13.8 <0.001 0.436 0.582
Interpretation:
Dietary diversity was positively associated with HAZ. Each additional food group consumed was associated with a 0.51-unit increase in HAZ (p < 0.001, 95% CI: 0.44, 0.58).
slr.residence <- lm(haz ~ residence, data = data.child)
tidy(slr.residence, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))
## # A tibble: 2 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) 0.0432 0.0775 0.557 0.58 -0.109 0.196
## 2 residenceUrban 0.699 0.0982 7.12 <0.001 0.506 0.892
Interpretation:
HAZ differed significantly by place of residence. Urban children had a 0.70-unit higher mean HAZ compared with rural children (p < 0.001, 95% CI: 0.51, 0.89).
slr.mom <- lm(haz ~ maternal_education, data = data.child)
tidy(slr.mom, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))
## # A tibble: 3 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) 0.142 0.0737 1.93 0.054 -0.00227 0.287
## 2 maternal_educationSec… 0.475 0.107 4.44 <0.001 0.265 0.685
## 3 maternal_educationTer… 0.801 0.131 6.09 <0.001 0.543 1.06
Interpretation:
Maternal education was significantly associated with HAZ. Compared with children of mothers with low education, children of mothers with secondary education had a 0.47-unit higher mean HAZ (p< 0.001, 95% CI: 0.26, 0.69 ), while those whose mothers had tertiary education had a 0.80-unit higher mean HAZ (p < 0.001, 95% CI: 0.54, 1.06).
tbl_univ <-
data.child |>
dplyr::select(-id
) |>
gtsummary::tbl_uvregression(
method = lm,
y = haz,
label = list(age_months ~ 'Age (months)',
household_income ~ 'Household Income (RM)',
dietary_diversity ~ 'Dietary Diversity (Food Group)',
residence ~ 'Residence',
maternal_education ~ 'Maternal Education')
) |>
modify_fmt_fun(
estimate ~ function(x) style_number(x, digits = 6),
conf.low ~ function(x) style_number(x, digits = 6),
conf.high ~ function(x) style_number(x, digits = 6)
) |>
gtsummary::bold_p()
tbl_univ
| Characteristic | N | Beta | 95% CI | p-value |
|---|---|---|---|---|
| Age (months) | 500 | -0.012875 | -0.018906, -0.006844 | <0.001 |
| Household Income (RM) | 500 | 0.000538 | 0.000468, 0.000608 | <0.001 |
| Dietary Diversity (Food Group) | 500 | 0.509211 | 0.436475, 0.581947 | <0.001 |
| Residence | 500 | |||
| Rural | — | — | ||
| Urban | 0.698860 | 0.505993, 0.891727 | <0.001 | |
| Maternal Education | 500 | |||
| Low | — | — | ||
| Secondary | 0.474929 | 0.264714, 0.685145 | <0.001 | |
| Tertiary | 0.801052 | 0.542754, 1.059351 | <0.001 | |
| Abbreviation: CI = Confidence Interval | ||||
Multivariable analysis was guided by the purposeful selection approach, which balances statistical criteria with theoretical relevance and confounding assessment. Stepwise selection was additionally performed as a comparative analysis to illustrate differences between theory-driven and automated model-building approaches.
STEP 1: Fit a model with significant variables or clinical importance
Based on the univariable analysis in previous section, we identified all independent variables as statistically significant predictors (p < 0.001). Therefore, all predictors were included in the
m.all <- lm(haz ~ age_months + household_income + dietary_diversity + residence + maternal_education, data = data.child)
tidy(m.all, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))
## # A tibble: 7 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -2.36e+0 0.155 -15.2 <0.001 -2.67e+0 -2.06
## 2 age_months -1.28e-2 0.00200 -6.39 <0.001 -1.67e-2 -0.00884
## 3 household_income 4.26e-4 0.0000306 13.9 <0.001 3.66e-4 0.000486
## 4 dietary_diversity 2.73e-1 0.0322 8.49 <0.001 2.10e-1 0.337
## 5 residenceUrban 7.51e-1 0.0663 11.3 <0.001 6.21e-1 0.882
## 6 maternal_educationSec… 3.29e-1 0.0726 4.53 <0.001 1.86e-1 0.472
## 7 maternal_educationTer… 7.15e-1 0.0922 7.76 <0.001 5.34e-1 0.896
STEP 2: Identify key exposure In this analysis
In this analysis, household income was identified as the key exposure because it is an upstream, theory-driven factor with a central role in influencing child growth (Outcome = Height-for-age (HAZ)). Therefore, household income will be included in all proposed model.
STEP 3: Assess confounding
Decision for confounder/s is based on theory, whether:
1. It is related to household income. 2. It is related to HAZ. 3. It is not caused by household income.
Confounding factors were assessed by observing changes in the exposure effect when a variable is added to the model, not using p-value.
Model 1: household income only
m.income <- lm(haz ~ household_income, data = data.child)
tidy(m.income, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))
## # A tibble: 2 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -1.12 0.114 -9.86 <0.001 -1.35 -0.898
## 2 household_income 0.000538 0.0000356 15.1 <0.001 0.000468 0.000608
Model 2: household income + child age in months
m.income.age <- lm(haz ~ household_income + age_months, data = data.child)
tidy(m.income.age, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))
## # A tibble: 3 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -0.699 0.136 -5.14 <0.001 -0.966 -0.432
## 2 household_income 0.000541 0.0000347 15.6 <0.001 0.000473 0.000609
## 3 age_months -0.0135 0.00252 -5.35 <0.001 -0.0184 -0.00852
Model 3: household income + maternal education
m.income.edu <- lm(haz ~ household_income + maternal_education, data = data.child)
tidy(m.income.edu, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))
## # A tibble: 4 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -1.46e+0 0.116 -12.6 <0.001 -1.69e+0 -1.23
## 2 household_income 5.41e-4 0.0000335 16.1 <0.001 4.75e-4 0.000607
## 3 maternal_educationSec… 4.25e-1 0.0868 4.90 <0.001 2.55e-1 0.596
## 4 maternal_educationTer… 8.45e-1 0.107 7.93 <0.001 6.36e-1 1.05
We compare the values of estimated coefficients in these 3 models and if a significant percentage change indicates that the variables are the confounder.
% change in household income vs age in months
coeff_income <- tidy(m.income)$estimate [2]
coeff_income_age <- tidy(m.income.age)$estimate [2]
coeff_income_edu <- tidy(m.income.edu)$estimate [2]
pct_change_age <- ((coeff_income_age - coeff_income) / coeff_income) * 100
pct_change_age
## [1] 0.5233729
% change in household income vs maternal education
pct_change_edu <- ((coeff_income_edu - coeff_income) / coeff_income) * 100
pct_change_edu
## [1] 0.6511423
Interpretation:
A variable is considered a confounder if adjustment leads to a ≥10% change in the exposure coefficient. Neither age nor maternal education materially changed the income–HAZ association, with percentage changes of 0.52% and 0.65% respectively, indicating minimal confounding. Nevertheless, child age and maternal education were retained in adjusted models based on their theoretical relevance in child growth.
*Residence was not included in confounding assessment because it was considered a potential effect modifier rather than a confounder. The relationship between household income and child growth may differ between urban and rural settings due to differences in food access, cost of living, and caregiving environments. Therefore, residence was evaluated through an interaction term with household income.
Dietary diversity was also excluded from the confounding assessment as it lies on the causal pathway between household income and child growth. Thus dietary diversity was considered as a potential mediator.
STEP 4: Preliminary final model
Since no predictors were excluded, the preliminary final model is the full model from Step 1.
The fact that no predictors were removed, it indicates that the all the predictors independently contribute to the outcome and that the final model appropriately reflects the underlying relationship.
Now we explore the stepwise approach as an alternative for variable selection to compare with the purposeful selection approach
Backward
m.all <- lm(haz ~ age_months + household_income + dietary_diversity + residence + maternal_education, data = data.child)
step.bw <- MASS::stepAIC(m.all, direction = "backward")
## Start: AIC=-331.21
## haz ~ age_months + household_income + dietary_diversity + residence +
## maternal_education
##
## Df Sum of Sq RSS AIC
## <none> 250.68 -331.21
## - age_months 1 20.744 271.43 -293.46
## - maternal_education 2 31.680 282.36 -275.71
## - dietary_diversity 1 36.670 287.35 -264.95
## - residence 1 65.188 315.87 -217.64
## - household_income 1 98.360 349.04 -167.71
summary(step.bw)
##
## Call:
## lm(formula = haz ~ age_months + household_income + dietary_diversity +
## residence + maternal_education, data = data.child)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.07322 -0.47134 0.04735 0.46983 2.30035
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.361e+00 1.550e-01 -15.235 < 2e-16 ***
## age_months -1.277e-02 1.999e-03 -6.387 3.91e-10 ***
## household_income 4.262e-04 3.065e-05 13.908 < 2e-16 ***
## dietary_diversity 2.733e-01 3.218e-02 8.492 2.41e-16 ***
## residenceUrban 7.512e-01 6.635e-02 11.323 < 2e-16 ***
## maternal_educationSecondary 3.289e-01 7.262e-02 4.529 7.46e-06 ***
## maternal_educationTertiary 7.154e-01 9.215e-02 7.763 4.83e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7131 on 493 degrees of freedom
## Multiple R-squared: 0.5958, Adjusted R-squared: 0.5909
## F-statistic: 121.1 on 6 and 493 DF, p-value: < 2.2e-16
Interpretation:
Backward stepwise AIC selection indicated that all included variables contributed to model fit, as removal of any variable increased the AIC. Therefore, the full model was retained.
Forward
haz.only.m <- lm(haz ~ 1, data = data.child)
step.fw <- MASS::stepAIC(haz.only.m,
direction = "forward",
scope = list (upper = m.all,
lower = haz.only.m))
## Start: AIC=109.76
## haz ~ 1
##
## Df Sum of Sq RSS AIC
## + household_income 1 194.891 425.36 -76.831
## + dietary_diversity 1 170.763 449.49 -49.245
## + residence 1 57.296 562.96 63.300
## + maternal_education 2 48.599 571.66 72.965
## + age_months 1 21.163 599.09 94.404
## <none> 620.26 109.762
##
## Step: AIC=-76.83
## haz ~ household_income
##
## Df Sum of Sq RSS AIC
## + dietary_diversity 1 62.802 362.56 -154.705
## + residence 1 54.619 370.75 -143.547
## + maternal_education 2 50.536 374.83 -136.069
## + age_months 1 23.163 402.20 -102.827
## <none> 425.36 -76.831
##
## Step: AIC=-154.71
## haz ~ household_income + dietary_diversity
##
## Df Sum of Sq RSS AIC
## + residence 1 58.063 304.50 -239.97
## + age_months 1 23.410 339.15 -186.08
## + maternal_education 2 24.230 338.33 -185.29
## <none> 362.56 -154.71
##
## Step: AIC=-239.97
## haz ~ household_income + dietary_diversity + residence
##
## Df Sum of Sq RSS AIC
## + maternal_education 2 33.072 271.43 -293.46
## + age_months 1 22.136 282.36 -275.71
## <none> 304.50 -239.97
##
## Step: AIC=-293.46
## haz ~ household_income + dietary_diversity + residence + maternal_education
##
## Df Sum of Sq RSS AIC
## + age_months 1 20.744 250.68 -331.21
## <none> 271.43 -293.46
##
## Step: AIC=-331.21
## haz ~ household_income + dietary_diversity + residence + maternal_education +
## age_months
summary(step.fw)
##
## Call:
## lm(formula = haz ~ household_income + dietary_diversity + residence +
## maternal_education + age_months, data = data.child)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.07322 -0.47134 0.04735 0.46983 2.30035
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.361e+00 1.550e-01 -15.235 < 2e-16 ***
## household_income 4.262e-04 3.065e-05 13.908 < 2e-16 ***
## dietary_diversity 2.733e-01 3.218e-02 8.492 2.41e-16 ***
## residenceUrban 7.512e-01 6.635e-02 11.323 < 2e-16 ***
## maternal_educationSecondary 3.289e-01 7.262e-02 4.529 7.46e-06 ***
## maternal_educationTertiary 7.154e-01 9.215e-02 7.763 4.83e-14 ***
## age_months -1.277e-02 1.999e-03 -6.387 3.91e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7131 on 493 degrees of freedom
## Multiple R-squared: 0.5958, Adjusted R-squared: 0.5909
## F-statistic: 121.1 on 6 and 493 DF, p-value: < 2.2e-16
Interpretation:
Forward stepwise selection sequentially added predictors based on their contribution to model fit. The algorithm selected all predictors. This confirms that all predictors were all independently associated with higher HAZ. Thus, all predictors were retained in the model.
Both direction
step.both <- MASS::stepAIC(haz.only.m,
direction = "both",
scope = list (upper = m.all,
lower = haz.only.m))
## Start: AIC=109.76
## haz ~ 1
##
## Df Sum of Sq RSS AIC
## + household_income 1 194.891 425.36 -76.831
## + dietary_diversity 1 170.763 449.49 -49.245
## + residence 1 57.296 562.96 63.300
## + maternal_education 2 48.599 571.66 72.965
## + age_months 1 21.163 599.09 94.404
## <none> 620.26 109.762
##
## Step: AIC=-76.83
## haz ~ household_income
##
## Df Sum of Sq RSS AIC
## + dietary_diversity 1 62.802 362.56 -154.705
## + residence 1 54.619 370.75 -143.547
## + maternal_education 2 50.536 374.83 -136.069
## + age_months 1 23.163 402.20 -102.827
## <none> 425.36 -76.831
## - household_income 1 194.891 620.26 109.762
##
## Step: AIC=-154.71
## haz ~ household_income + dietary_diversity
##
## Df Sum of Sq RSS AIC
## + residence 1 58.063 304.50 -239.969
## + age_months 1 23.410 339.15 -186.079
## + maternal_education 2 24.230 338.33 -185.289
## <none> 362.56 -154.705
## - dietary_diversity 1 62.802 425.36 -76.831
## - household_income 1 86.929 449.49 -49.245
##
## Step: AIC=-239.97
## haz ~ household_income + dietary_diversity + residence
##
## Df Sum of Sq RSS AIC
## + maternal_education 2 33.072 271.43 -293.46
## + age_months 1 22.136 282.36 -275.71
## <none> 304.50 -239.97
## - residence 1 58.063 362.56 -154.71
## - dietary_diversity 1 66.245 370.75 -143.55
## - household_income 1 83.547 388.05 -120.74
##
## Step: AIC=-293.46
## haz ~ household_income + dietary_diversity + residence + maternal_education
##
## Df Sum of Sq RSS AIC
## + age_months 1 20.744 250.68 -331.21
## <none> 271.43 -293.46
## - maternal_education 2 33.072 304.50 -239.97
## - dietary_diversity 1 36.172 307.60 -232.90
## - residence 1 66.905 338.33 -185.29
## - household_income 1 97.709 369.14 -141.72
##
## Step: AIC=-331.21
## haz ~ household_income + dietary_diversity + residence + maternal_education +
## age_months
##
## Df Sum of Sq RSS AIC
## <none> 250.68 -331.21
## - age_months 1 20.744 271.43 -293.46
## - maternal_education 2 31.680 282.36 -275.71
## - dietary_diversity 1 36.670 287.35 -264.95
## - residence 1 65.188 315.87 -217.64
## - household_income 1 98.360 349.04 -167.71
summary(step.both)
##
## Call:
## lm(formula = haz ~ household_income + dietary_diversity + residence +
## maternal_education + age_months, data = data.child)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.07322 -0.47134 0.04735 0.46983 2.30035
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.361e+00 1.550e-01 -15.235 < 2e-16 ***
## household_income 4.262e-04 3.065e-05 13.908 < 2e-16 ***
## dietary_diversity 2.733e-01 3.218e-02 8.492 2.41e-16 ***
## residenceUrban 7.512e-01 6.635e-02 11.323 < 2e-16 ***
## maternal_educationSecondary 3.289e-01 7.262e-02 4.529 7.46e-06 ***
## maternal_educationTertiary 7.154e-01 9.215e-02 7.763 4.83e-14 ***
## age_months -1.277e-02 1.999e-03 -6.387 3.91e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7131 on 493 degrees of freedom
## Multiple R-squared: 0.5958, Adjusted R-squared: 0.5909
## F-statistic: 121.1 on 6 and 493 DF, p-value: < 2.2e-16
Interpretation:
Using bidirectional (stepwise) selection, all predictors were retained in the final multivariable model, as their inclusion improved model fit. The consistency across all three stepwise approaches (backward, forward, and both) provides strong evidence for this model specification.
anova_combined <- bind_rows(
as.data.frame(step.bw$anova) |> mutate (Method = "Backward"),
as.data.frame(step.fw$anova) |> mutate (Method = "Forward"),
as.data.frame(step.both$anova) |> mutate (Method = "Both"),
) |>
relocate(Method, before = 1)
anova_combined
## Method before Df Deviance Resid. Df Resid. Dev AIC
## 1 Backward NA NA 493 250.6835 -331.20844
## 2 Forward NA NA 499 620.2555 109.76173
## 3 Forward + household_income 1 194.89098 498 425.3646 -76.83075
## 4 Forward + dietary_diversity 1 62.80182 497 362.5627 -154.70528
## 5 Forward + residence 1 58.06290 496 304.4998 -239.96877
## 6 Forward + maternal_education 2 33.07244 494 271.4274 -293.45670
## 7 Forward + age_months 1 20.74389 493 250.6835 -331.20844
## 8 Both NA NA 499 620.2555 109.76173
## 9 Both + household_income 1 194.89098 498 425.3646 -76.83075
## 10 Both + dietary_diversity 1 62.80182 497 362.5627 -154.70528
## 11 Both + residence 1 58.06290 496 304.4998 -239.96877
## 12 Both + maternal_education 2 33.07244 494 271.4274 -293.45670
## 13 Both + age_months 1 20.74389 493 250.6835 -331.20844
Interpretation:
All three selection methods (forward, backward, and both) consistently identified household income, child age in months, dietary diversity, residence and maternal education as significant predictors of HAZ.
We compare the values of estimated coefficient change between model with key exposures only, and key exposures with the respective potential confounder.
In this analysis, potential confounders are:
so, model to be compared were:
coeff_compare_table <- data.frame(
Model = c("Income only", "Income + Age in months", "Income + Maternal education"),
Estimated_coefficients = c(coeff_income, coeff_income_age, coeff_income_edu),
Percent_change = c(NA, pct_change_age, pct_change_edu)
)
coeff_compare_table
## Model Estimated_coefficients Percent_change
## 1 Income only 0.0005377869 NA
## 2 Income + Age in months 0.0005406015 0.5233729
## 3 Income + Maternal education 0.0005412886 0.6511423
Interpretation:
A variable is considered a confounder if adjustment leads to a ≥10% change in the exposure coefficient. Neither age nor maternal education materially changed the income–HAZ association, with percentage changes of 0.52% and 0.65% respectively, indicating minimal confounding. Nevertheless, child age and maternal education were retained in adjusted models based on their theoretical relevance in child growth.
Potential effect modification was assessed by testing the interaction terms in the model. Interaction terms were examined selectively based on theoretical plausibility rather than exhaustively testing all possible combinations, to avoid overfitting and maintain interpretability.
There were 3 potential interactions selected in this analysis.
Selection relevance:
ia.income.res <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + household_income*residence, data = data.child)
tidy(ia.income.res, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))
## # A tibble: 8 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -1.94e+0 0.191 -10.1 <0.001 -2.31e+0 -1.56
## 2 household_income 2.90e-4 0.0000476 6.10 <0.001 1.97e-4 0.000384
## 3 age_months -1.31e-2 0.00198 -6.63 <0.001 -1.70e-2 -0.00923
## 4 dietary_diversity 2.74e-1 0.0318 8.61 <0.001 2.11e-1 0.336
## 5 residenceUrban 1.26e-1 0.182 0.691 0.49 -2.31e-1 0.483
## 6 maternal_educationSec… 3.09e-1 0.0719 4.30 <0.001 1.68e-1 0.451
## 7 maternal_educationTer… 6.97e-1 0.0911 7.65 <0.001 5.18e-1 0.876
## 8 household_income:resi… 2.10e-4 0.0000570 3.69 <0.001 9.83e-5 0.000322
Interpretation:
A significant interaction between household income and residence was observed (p < 0.001), indicating that the positive association between income and HAZ was stronger in urban than rural children.
ia.income.age <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + household_income*age_months, data = data.child)
tidy(ia.income.age, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))
## # A tibble: 8 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -2.37e+0 2.27e-1 -10.4 <0.001 -2.82e+0 -1.92e+0
## 2 household_income 4.29e-4 6.31e-5 6.81 <0.001 3.05e-4 5.53e-4
## 3 age_months -1.25e-2 5.71e-3 -2.18 0.0294 -2.37e-2 -1.26e-3
## 4 dietary_diversity 2.73e-1 3.22e-2 8.48 <0.001 2.10e-1 3.37e-1
## 5 residenceUrban 7.51e-1 6.65e-2 11.3 <0.001 6.21e-1 8.82e-1
## 6 maternal_educationSec… 3.29e-1 7.27e-2 4.52 <0.001 1.86e-1 4.72e-1
## 7 maternal_educationTer… 7.15e-1 9.23e-2 7.75 <0.001 5.34e-1 8.97e-1
## 8 household_income:age_… -1.01e-7 1.77e-6 -0.0569 0.9546 -3.58e-6 3.38e-6
Interpretation:
No evidence of effect modification by child age was observed (p = 0.95); therefore, the interaction term was excluded from the final model.
ia.income.edu <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + household_income*maternal_education, data = data.child)
tidy(ia.income.edu, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))
## # A tibble: 9 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -2.53e+0 0.193 -13.1 <0.001 -2.91e+0 -2.15e+0
## 2 household_income 4.78e-4 0.0000494 9.69 <0.001 3.81e-4 5.75e-4
## 3 age_months -1.27e-2 0.00199 -6.39 <0.001 -1.67e-2 -8.83e-3
## 4 dietary_diversity 2.76e-1 0.0321 8.58 <0.001 2.13e-1 3.39e-1
## 5 residenceUrban 7.56e-1 0.0664 11.4 <0.001 6.26e-1 8.87e-1
## 6 maternal_educationSec… 4.33e-1 0.204 2.12 0.0341 3.26e-2 8.34e-1
## 7 maternal_educationTer… 1.18e+0 0.234 5.04 <0.001 7.17e-1 1.64e+0
## 8 household_income:mate… -3.62e-5 0.0000634 -0.571 0.5679 -1.61e-4 8.83e-5
## 9 household_income:mate… -1.59e-4 0.0000738 -2.16 0.0315 -3.04e-4 -1.42e-5
Interpretation:
A significant interaction between household income and maternal education was observed for tertiary education (p = 0.032), indicating that the positive association between income and HAZ was weaker among children of mothers with tertiary education. This finding suggests that higher maternal education may mitigate the dependence of child growth on household income, possibly through improved caregiving practices and nutrition knowledge.
Although the interaction between household income and maternal education was explored, it was not retained in the final model because evidence of effect modification was limited to the tertiary education category and the magnitude of interaction was modest, as indicated by the small estimated coefficient for the income × maternal education (tertiary) term, despite statistical significance.
Based on the three interactions tested, the final model should include the interaction between household income and residence, while excluding the income × age and the income × maternal education interaction.
prelim_model <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + household_income*residence, data = data.child)
tidy(prelim_model, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))
## # A tibble: 8 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -1.94e+0 0.191 -10.1 <0.001 -2.31e+0 -1.56
## 2 household_income 2.90e-4 0.0000476 6.10 <0.001 1.97e-4 0.000384
## 3 age_months -1.31e-2 0.00198 -6.63 <0.001 -1.70e-2 -0.00923
## 4 dietary_diversity 2.74e-1 0.0318 8.61 <0.001 2.11e-1 0.336
## 5 residenceUrban 1.26e-1 0.182 0.691 0.49 -2.31e-1 0.483
## 6 maternal_educationSec… 3.09e-1 0.0719 4.30 <0.001 1.68e-1 0.451
## 7 maternal_educationTer… 6.97e-1 0.0911 7.65 <0.001 5.18e-1 0.876
## 8 household_income:resi… 2.10e-4 0.0000570 3.69 <0.001 9.83e-5 0.000322
#Residuals vs Fitted plot
plot(prelim_model, which = 1)
Interpretation:
The residuals versus fitted plot shows a random scatter around zero with no clear pattern, indicating that the assumptions of linearity and homoscedasticity are reasonably met.
library(gridExtra)
# Residuals vs Child age in month
p1 <- augment(prelim_model) |>
ggplot(aes(x = age_months, y = .resid)) +
geom_point() +
geom_hline(yintercept = 0, color = "red")
# Residuals vs Child household income
p2 <- augment(prelim_model) |>
ggplot(aes(x = household_income, y = .resid)) +
geom_point() +
geom_hline(yintercept = 0, color = "red")
# Residuals vs Child dietary diveristy
p3 <- augment(prelim_model) |>
ggplot(aes(x = dietary_diversity, y = .resid)) +
geom_point() +
geom_hline(yintercept = 0, color = "red")
grid.arrange (p1,p2, p3, ncol = 3)
Interpretation:
Residual plots against child age, household income, and dietary diversity showed random scatter around zero with no evident systematic patterns or curvature. This suggests that the linearity assumption for continuous predictors was reasonably met. The spread of residuals was relatively constant across the range of predictors, indicating no major violation of homoscedasticity.
par(mfrow = c(1, 2))
data.res <- augment(prelim_model)
plot(prelim_model, which = 2)
hist(data.res$.resid)
shapiro.test(prelim_model$residuals)
##
## Shapiro-Wilk normality test
##
## data: prelim_model$residuals
## W = 0.997, p-value = 0.4912
Interpretation:
Homoscedasticity was assessed by visual inspection of residuals versus fitted values and scale–location plots.
par(mfrow = c(1, 2))
plot(prelim_model, which = c(1, 3))
Interpretation:
Visual inspection of the residuals versus fitted values and scale–location plots showed a random scatter of residuals around zero with no obvious systematic pattern. The smooth lines in both plots were approximately horizontal, and the spread of residuals was relatively constant across fitted values. These findings indicate that the assumptions of linearity and homoscedasticity were reasonably met.
library(lmtest)
bptest(prelim_model)
##
## studentized Breusch-Pagan test
##
## data: prelim_model
## BP = 2.588, df = 7, p-value = 0.9203
Interpretation:
The Breusch–Pagan test did not indicate evidence of heteroscedasticity (BP = 2.59, df = 7, p = 0.92), supporting the assumption of constant variance. This finding was consistent with visual inspection of residual and scale–location plots.
A Box–Cox transformation was considered to improve normality and variance homogeneity of the outcome. If lamda is outside the range of -1 to 1, transformation is needed due to validity of normality and homoscedasticity assumptions.
In this analysis, Box–Cox transformation was not applied because the outcome variable included negative values, violating the requirement for strictly positive responses. Model adequacy was instead assessed using visual diagnostics.
library(car)
vif(prelim_model)
## GVIF Df GVIF^(1/(2*Df))
## household_income 3.084380 1 1.756240
## age_months 1.005642 1 1.002817
## dietary_diversity 1.341233 1 1.158116
## residence 7.814222 1 2.795393
## maternal_education 1.132323 2 1.031556
## household_income:residence 9.709263 1 3.115969
Interpretation:
Multicollinearity was assessed using variance inflation factors. Adjusted GVIF values (GVIF^(1/(2·Df))) were examined due to the inclusion of categorical variables and interaction terms. All predictors had adjusted GVIF values below 5, indicating no evidence of problematic multicollinearity.
We used three diagnostic criteria to identify potentially influential observations:
Where: \(n\) = number of observations \(k\) = number of predictors in the model
These thresholds help flag data points that may disproportionately influence model estimates or distort inference.
# Add diagnostics columns
child.inf <- augment(prelim_model, data.child)
# Define thresholds
n_obs <- nrow(data.child)
p_preds <- length(coef(prelim_model))
cooks_cutoff <- 4 / n_obs
leverage_cutoff <- (2 * p_preds + 2) / n_obs
# Filter Influential Observations
influential_obs <- child.inf %>%
filter(.cooksd > cooks_cutoff | abs(.std.resid) > 2 | .hat > leverage_cutoff)
# Print count
cat("Number of influential observation:", nrow(influential_obs), "\n")
## Number of influential observation: 43
cat("Cook's distance cutoff:", round(cooks_cutoff, 4), "\n")
## Cook's distance cutoff: 0.008
plot(prelim_model, which = 4)
library(DT)
child.inf |>
filter(.std.resid > 2 | .std.resid < -2 | .hat > 0.038) |>
datatable()
plot(prelim_model, which = 5)
Interpretation:
Using a Cook’s distance cutoff of 4/n, several observations were flagged as potentially influential. However, none exhibited Cook’s distance values approaching 1, and residuals versus leverage plots did not reveal observations with both high leverage and large residuals. Sensitivity analyses excluding the most influential observations did not materially alter model estimates, indicating that no individual observation exerted undue influence on the results.
final.m <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + (household_income*residence), data = data.child)
tidy(final.m, conf.int = TRUE) |>
mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))
## # A tibble: 8 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (Intercept) -1.94e+0 0.191 -10.1 <0.001 -2.31e+0 -1.56
## 2 household_income 2.90e-4 0.0000476 6.10 <0.001 1.97e-4 0.000384
## 3 age_months -1.31e-2 0.00198 -6.63 <0.001 -1.70e-2 -0.00923
## 4 dietary_diversity 2.74e-1 0.0318 8.61 <0.001 2.11e-1 0.336
## 5 residenceUrban 1.26e-1 0.182 0.691 0.49 -2.31e-1 0.483
## 6 maternal_educationSec… 3.09e-1 0.0719 4.30 <0.001 1.68e-1 0.451
## 7 maternal_educationTer… 6.97e-1 0.0911 7.65 <0.001 5.18e-1 0.876
## 8 household_income:resi… 2.10e-4 0.0000570 3.69 <0.001 9.83e-5 0.000322
library(gt)
tbl_regression(final.m,
intercept = TRUE,
label = list(household_income ~ "Household Income (RM)",
age_months ~ "Age (months)",
dietary_diversity ~ "Dietary Diversity (Food group count)",
residence ~ "Residence",
maternal_education ~ "Maternal Education")) |>
bold_p(t = 0.05) |>
bold_labels() |>
add_glance_table(include = c(r.squared, adj.r.squared, AIC, BIC)) |>
modify_fmt_fun(
estimate ~ function(x) style_number(x, digits = 6),
conf.low ~ function(x) style_number(x, digits = 6),
conf.high ~ function(x) style_number(x, digits = 6)) |>
modify_header(label = "**Variables**",
estimate = "**Adj. B**",
statistic = "**t-stat**") |>
modify_caption("**Final Multivariable Linear Regression Model for Height-For-Age (Z-score)**")
| Variables | Adj. B | t-stat | 95% CI | p-value |
|---|---|---|---|---|
| (Intercept) | -1.939003 | -10.1 | -2.314524, -1.563481 | <0.001 |
| Household Income (RM) | 0.000290 | 6.10 | 0.000197, 0.000384 | <0.001 |
| Age (months) | -0.013111 | -6.63 | -0.016993, -0.009228 | <0.001 |
| Dietary Diversity (Food group count) | 0.273727 | 8.61 | 0.211293, 0.336161 | <0.001 |
| Residence | ||||
| Rural | — | — | — | |
| Urban | 0.125640 | 0.691 | -0.231419, 0.482699 | 0.5 |
| Maternal Education | ||||
| Low | — | — | — | |
| Secondary | 0.309225 | 4.30 | 0.167944, 0.450507 | <0.001 |
| Tertiary | 0.696868 | 7.65 | 0.517805, 0.875931 | <0.001 |
| Household Income (RM) * Residence | ||||
| Household Income (RM) * Urban | 0.000210 | 3.69 | 0.000098, 0.000322 | <0.001 |
| R² | 0.606725 | |||
| Adjusted R² | 0.601130 | |||
| AIC | 1,078.076916 | |||
| BIC | 1,116.008389 | |||
| Abbreviation: CI = Confidence Interval | ||||
\(\widehat{income} = (-1.9390) + 0.0003(\text{Income}) −0.0131(\text{Age}) + 0.2737(\text{Dietary Diversity}) + 0.1256 (\text{Urban}) + 0.3092 (\text{Secondary}) + 0.6969 (\text{Tertiary}) + 0.0002 (\text{Income × Urban})\)
These values indicate the effect of each predictor on HAZ after adjusting for the other variables.
a.Household income (RM) - For each additional RM 100 in household income associated with an increase of 0.029 HAZ, indicate better growth among children from higher-income households.
The final model was validated to ensure the stability and reliability of the estimates.
Homoscedasticity: The studentized Breusch-Pagan test (\(\text{p = 0.92}\)) indicates we fail to reject the null hypothesis of constant variance, meaning the model is homoscedastic.
Multicollinearity: All predictors showed VIF values below 5, indicating no evidence of problematic multicollinearity.
Influence and outliers: 43 influential observations were identified (Cook’s Distance cutoff = 0.008). Observation 300 is the most influential, but leverage plots confirm it does not cross critical thresholds to be considered a harmful outlier.
# Create the Forest Plot
sjPlot::plot_model(final.m,
type = "est",
show.values = TRUE,
value.offset = .3,
title = "Forest Plot of Height-for-age (Z-score",
axis.labels = c(
"household_income:residence" = "Interaction: Income x Residence",
"maternal_education" = "Maternal Education",
"residence" = "Residence",
"dietary_diversity" = "Dietary Diversity",
"age_months" = "Child age in month",
"household_income" = "Household Income"
),
vline.color = "red")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the sjPlot package.
## Please report the issue at <https://github.com/strengejacke/sjPlot/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Interpretation:
Child HAZ is strongly associated with socioeconomic and nutritional factors. Dietary diversity and maternal education show the largest positive effects, while child age is negatively associated with HAZ. The impact of household income on child growth is significantly modified by urban residence.
# Actual vs Predicted
augmented_data <- augment(final.m)
ggplot(augmented_data, aes(x = haz, y = .fitted)) +
geom_point(alpha = 0.4, color = "#667eea", size = 2) +
geom_abline(intercept = 0, slope = 1, color = "#F57C00",
linewidth = 1.2, linetype = "dashed") +
geom_smooth(method = "lm", se = FALSE, color = "#D32F2F", linewidth = 1) +
labs(title = "Actual vs Predicted HAZ",
x = "Actual HAZ (z-score)",
y = "Predicted HAZ (z-score)") +
theme_minimal(base_size = 13) +
theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
annotate("text", x = (augmented_data$sbp) + 5,
y = max(augmented_data$.fitted) - 5,
size = 5, fontface = "bold", color = "#D32F2F")
## Warning: Unknown or uninitialised column: `sbp`.
Interpretation:
The Actual versus Predicted HAZ plot shows that predicted values closely align with observed HAZ, indicating good model fit, an appropriate linear specification, and no major systematic prediction bias.
new_data <- expand.grid(household_income = c(750, 1000, 1050),
age_months = c(8, 12, 25, 40, 55),
dietary_diversity = c(2.0, 3.2, 5.0, 7.9),
residence = c('Urban', 'Rural'),
maternal_education = c('Low', 'Secondary', 'Tertiary'))
new_data
## household_income age_months dietary_diversity residence maternal_education
## 1 750 8 2.0 Urban Low
## 2 1000 8 2.0 Urban Low
## 3 1050 8 2.0 Urban Low
## 4 750 12 2.0 Urban Low
## 5 1000 12 2.0 Urban Low
## 6 1050 12 2.0 Urban Low
## 7 750 25 2.0 Urban Low
## 8 1000 25 2.0 Urban Low
## 9 1050 25 2.0 Urban Low
## 10 750 40 2.0 Urban Low
## 11 1000 40 2.0 Urban Low
## 12 1050 40 2.0 Urban Low
## 13 750 55 2.0 Urban Low
## 14 1000 55 2.0 Urban Low
## 15 1050 55 2.0 Urban Low
## 16 750 8 3.2 Urban Low
## 17 1000 8 3.2 Urban Low
## 18 1050 8 3.2 Urban Low
## 19 750 12 3.2 Urban Low
## 20 1000 12 3.2 Urban Low
## 21 1050 12 3.2 Urban Low
## 22 750 25 3.2 Urban Low
## 23 1000 25 3.2 Urban Low
## 24 1050 25 3.2 Urban Low
## 25 750 40 3.2 Urban Low
## 26 1000 40 3.2 Urban Low
## 27 1050 40 3.2 Urban Low
## 28 750 55 3.2 Urban Low
## 29 1000 55 3.2 Urban Low
## 30 1050 55 3.2 Urban Low
## 31 750 8 5.0 Urban Low
## 32 1000 8 5.0 Urban Low
## 33 1050 8 5.0 Urban Low
## 34 750 12 5.0 Urban Low
## 35 1000 12 5.0 Urban Low
## 36 1050 12 5.0 Urban Low
## 37 750 25 5.0 Urban Low
## 38 1000 25 5.0 Urban Low
## 39 1050 25 5.0 Urban Low
## 40 750 40 5.0 Urban Low
## 41 1000 40 5.0 Urban Low
## 42 1050 40 5.0 Urban Low
## 43 750 55 5.0 Urban Low
## 44 1000 55 5.0 Urban Low
## 45 1050 55 5.0 Urban Low
## 46 750 8 7.9 Urban Low
## 47 1000 8 7.9 Urban Low
## 48 1050 8 7.9 Urban Low
## 49 750 12 7.9 Urban Low
## 50 1000 12 7.9 Urban Low
## 51 1050 12 7.9 Urban Low
## 52 750 25 7.9 Urban Low
## 53 1000 25 7.9 Urban Low
## 54 1050 25 7.9 Urban Low
## 55 750 40 7.9 Urban Low
## 56 1000 40 7.9 Urban Low
## 57 1050 40 7.9 Urban Low
## 58 750 55 7.9 Urban Low
## 59 1000 55 7.9 Urban Low
## 60 1050 55 7.9 Urban Low
## 61 750 8 2.0 Rural Low
## 62 1000 8 2.0 Rural Low
## 63 1050 8 2.0 Rural Low
## 64 750 12 2.0 Rural Low
## 65 1000 12 2.0 Rural Low
## 66 1050 12 2.0 Rural Low
## 67 750 25 2.0 Rural Low
## 68 1000 25 2.0 Rural Low
## 69 1050 25 2.0 Rural Low
## 70 750 40 2.0 Rural Low
## 71 1000 40 2.0 Rural Low
## 72 1050 40 2.0 Rural Low
## 73 750 55 2.0 Rural Low
## 74 1000 55 2.0 Rural Low
## 75 1050 55 2.0 Rural Low
## 76 750 8 3.2 Rural Low
## 77 1000 8 3.2 Rural Low
## 78 1050 8 3.2 Rural Low
## 79 750 12 3.2 Rural Low
## 80 1000 12 3.2 Rural Low
## 81 1050 12 3.2 Rural Low
## 82 750 25 3.2 Rural Low
## 83 1000 25 3.2 Rural Low
## 84 1050 25 3.2 Rural Low
## 85 750 40 3.2 Rural Low
## 86 1000 40 3.2 Rural Low
## 87 1050 40 3.2 Rural Low
## 88 750 55 3.2 Rural Low
## 89 1000 55 3.2 Rural Low
## 90 1050 55 3.2 Rural Low
## 91 750 8 5.0 Rural Low
## 92 1000 8 5.0 Rural Low
## 93 1050 8 5.0 Rural Low
## 94 750 12 5.0 Rural Low
## 95 1000 12 5.0 Rural Low
## 96 1050 12 5.0 Rural Low
## 97 750 25 5.0 Rural Low
## 98 1000 25 5.0 Rural Low
## 99 1050 25 5.0 Rural Low
## 100 750 40 5.0 Rural Low
## 101 1000 40 5.0 Rural Low
## 102 1050 40 5.0 Rural Low
## 103 750 55 5.0 Rural Low
## 104 1000 55 5.0 Rural Low
## 105 1050 55 5.0 Rural Low
## 106 750 8 7.9 Rural Low
## 107 1000 8 7.9 Rural Low
## 108 1050 8 7.9 Rural Low
## 109 750 12 7.9 Rural Low
## 110 1000 12 7.9 Rural Low
## 111 1050 12 7.9 Rural Low
## 112 750 25 7.9 Rural Low
## 113 1000 25 7.9 Rural Low
## 114 1050 25 7.9 Rural Low
## 115 750 40 7.9 Rural Low
## 116 1000 40 7.9 Rural Low
## 117 1050 40 7.9 Rural Low
## 118 750 55 7.9 Rural Low
## 119 1000 55 7.9 Rural Low
## 120 1050 55 7.9 Rural Low
## 121 750 8 2.0 Urban Secondary
## 122 1000 8 2.0 Urban Secondary
## 123 1050 8 2.0 Urban Secondary
## 124 750 12 2.0 Urban Secondary
## 125 1000 12 2.0 Urban Secondary
## 126 1050 12 2.0 Urban Secondary
## 127 750 25 2.0 Urban Secondary
## 128 1000 25 2.0 Urban Secondary
## 129 1050 25 2.0 Urban Secondary
## 130 750 40 2.0 Urban Secondary
## 131 1000 40 2.0 Urban Secondary
## 132 1050 40 2.0 Urban Secondary
## 133 750 55 2.0 Urban Secondary
## 134 1000 55 2.0 Urban Secondary
## 135 1050 55 2.0 Urban Secondary
## 136 750 8 3.2 Urban Secondary
## 137 1000 8 3.2 Urban Secondary
## 138 1050 8 3.2 Urban Secondary
## 139 750 12 3.2 Urban Secondary
## 140 1000 12 3.2 Urban Secondary
## 141 1050 12 3.2 Urban Secondary
## 142 750 25 3.2 Urban Secondary
## 143 1000 25 3.2 Urban Secondary
## 144 1050 25 3.2 Urban Secondary
## 145 750 40 3.2 Urban Secondary
## 146 1000 40 3.2 Urban Secondary
## 147 1050 40 3.2 Urban Secondary
## 148 750 55 3.2 Urban Secondary
## 149 1000 55 3.2 Urban Secondary
## 150 1050 55 3.2 Urban Secondary
## 151 750 8 5.0 Urban Secondary
## 152 1000 8 5.0 Urban Secondary
## 153 1050 8 5.0 Urban Secondary
## 154 750 12 5.0 Urban Secondary
## 155 1000 12 5.0 Urban Secondary
## 156 1050 12 5.0 Urban Secondary
## 157 750 25 5.0 Urban Secondary
## 158 1000 25 5.0 Urban Secondary
## 159 1050 25 5.0 Urban Secondary
## 160 750 40 5.0 Urban Secondary
## 161 1000 40 5.0 Urban Secondary
## 162 1050 40 5.0 Urban Secondary
## 163 750 55 5.0 Urban Secondary
## 164 1000 55 5.0 Urban Secondary
## 165 1050 55 5.0 Urban Secondary
## 166 750 8 7.9 Urban Secondary
## 167 1000 8 7.9 Urban Secondary
## 168 1050 8 7.9 Urban Secondary
## 169 750 12 7.9 Urban Secondary
## 170 1000 12 7.9 Urban Secondary
## 171 1050 12 7.9 Urban Secondary
## 172 750 25 7.9 Urban Secondary
## 173 1000 25 7.9 Urban Secondary
## 174 1050 25 7.9 Urban Secondary
## 175 750 40 7.9 Urban Secondary
## 176 1000 40 7.9 Urban Secondary
## 177 1050 40 7.9 Urban Secondary
## 178 750 55 7.9 Urban Secondary
## 179 1000 55 7.9 Urban Secondary
## 180 1050 55 7.9 Urban Secondary
## 181 750 8 2.0 Rural Secondary
## 182 1000 8 2.0 Rural Secondary
## 183 1050 8 2.0 Rural Secondary
## 184 750 12 2.0 Rural Secondary
## 185 1000 12 2.0 Rural Secondary
## 186 1050 12 2.0 Rural Secondary
## 187 750 25 2.0 Rural Secondary
## 188 1000 25 2.0 Rural Secondary
## 189 1050 25 2.0 Rural Secondary
## 190 750 40 2.0 Rural Secondary
## 191 1000 40 2.0 Rural Secondary
## 192 1050 40 2.0 Rural Secondary
## 193 750 55 2.0 Rural Secondary
## 194 1000 55 2.0 Rural Secondary
## 195 1050 55 2.0 Rural Secondary
## 196 750 8 3.2 Rural Secondary
## 197 1000 8 3.2 Rural Secondary
## 198 1050 8 3.2 Rural Secondary
## 199 750 12 3.2 Rural Secondary
## 200 1000 12 3.2 Rural Secondary
## 201 1050 12 3.2 Rural Secondary
## 202 750 25 3.2 Rural Secondary
## 203 1000 25 3.2 Rural Secondary
## 204 1050 25 3.2 Rural Secondary
## 205 750 40 3.2 Rural Secondary
## 206 1000 40 3.2 Rural Secondary
## 207 1050 40 3.2 Rural Secondary
## 208 750 55 3.2 Rural Secondary
## 209 1000 55 3.2 Rural Secondary
## 210 1050 55 3.2 Rural Secondary
## 211 750 8 5.0 Rural Secondary
## 212 1000 8 5.0 Rural Secondary
## 213 1050 8 5.0 Rural Secondary
## 214 750 12 5.0 Rural Secondary
## 215 1000 12 5.0 Rural Secondary
## 216 1050 12 5.0 Rural Secondary
## 217 750 25 5.0 Rural Secondary
## 218 1000 25 5.0 Rural Secondary
## 219 1050 25 5.0 Rural Secondary
## 220 750 40 5.0 Rural Secondary
## 221 1000 40 5.0 Rural Secondary
## 222 1050 40 5.0 Rural Secondary
## 223 750 55 5.0 Rural Secondary
## 224 1000 55 5.0 Rural Secondary
## 225 1050 55 5.0 Rural Secondary
## 226 750 8 7.9 Rural Secondary
## 227 1000 8 7.9 Rural Secondary
## 228 1050 8 7.9 Rural Secondary
## 229 750 12 7.9 Rural Secondary
## 230 1000 12 7.9 Rural Secondary
## 231 1050 12 7.9 Rural Secondary
## 232 750 25 7.9 Rural Secondary
## 233 1000 25 7.9 Rural Secondary
## 234 1050 25 7.9 Rural Secondary
## 235 750 40 7.9 Rural Secondary
## 236 1000 40 7.9 Rural Secondary
## 237 1050 40 7.9 Rural Secondary
## 238 750 55 7.9 Rural Secondary
## 239 1000 55 7.9 Rural Secondary
## 240 1050 55 7.9 Rural Secondary
## 241 750 8 2.0 Urban Tertiary
## 242 1000 8 2.0 Urban Tertiary
## 243 1050 8 2.0 Urban Tertiary
## 244 750 12 2.0 Urban Tertiary
## 245 1000 12 2.0 Urban Tertiary
## 246 1050 12 2.0 Urban Tertiary
## 247 750 25 2.0 Urban Tertiary
## 248 1000 25 2.0 Urban Tertiary
## 249 1050 25 2.0 Urban Tertiary
## 250 750 40 2.0 Urban Tertiary
## 251 1000 40 2.0 Urban Tertiary
## 252 1050 40 2.0 Urban Tertiary
## 253 750 55 2.0 Urban Tertiary
## 254 1000 55 2.0 Urban Tertiary
## 255 1050 55 2.0 Urban Tertiary
## 256 750 8 3.2 Urban Tertiary
## 257 1000 8 3.2 Urban Tertiary
## 258 1050 8 3.2 Urban Tertiary
## 259 750 12 3.2 Urban Tertiary
## 260 1000 12 3.2 Urban Tertiary
## 261 1050 12 3.2 Urban Tertiary
## 262 750 25 3.2 Urban Tertiary
## 263 1000 25 3.2 Urban Tertiary
## 264 1050 25 3.2 Urban Tertiary
## 265 750 40 3.2 Urban Tertiary
## 266 1000 40 3.2 Urban Tertiary
## 267 1050 40 3.2 Urban Tertiary
## 268 750 55 3.2 Urban Tertiary
## 269 1000 55 3.2 Urban Tertiary
## 270 1050 55 3.2 Urban Tertiary
## 271 750 8 5.0 Urban Tertiary
## 272 1000 8 5.0 Urban Tertiary
## 273 1050 8 5.0 Urban Tertiary
## 274 750 12 5.0 Urban Tertiary
## 275 1000 12 5.0 Urban Tertiary
## 276 1050 12 5.0 Urban Tertiary
## 277 750 25 5.0 Urban Tertiary
## 278 1000 25 5.0 Urban Tertiary
## 279 1050 25 5.0 Urban Tertiary
## 280 750 40 5.0 Urban Tertiary
## 281 1000 40 5.0 Urban Tertiary
## 282 1050 40 5.0 Urban Tertiary
## 283 750 55 5.0 Urban Tertiary
## 284 1000 55 5.0 Urban Tertiary
## 285 1050 55 5.0 Urban Tertiary
## 286 750 8 7.9 Urban Tertiary
## 287 1000 8 7.9 Urban Tertiary
## 288 1050 8 7.9 Urban Tertiary
## 289 750 12 7.9 Urban Tertiary
## 290 1000 12 7.9 Urban Tertiary
## 291 1050 12 7.9 Urban Tertiary
## 292 750 25 7.9 Urban Tertiary
## 293 1000 25 7.9 Urban Tertiary
## 294 1050 25 7.9 Urban Tertiary
## 295 750 40 7.9 Urban Tertiary
## 296 1000 40 7.9 Urban Tertiary
## 297 1050 40 7.9 Urban Tertiary
## 298 750 55 7.9 Urban Tertiary
## 299 1000 55 7.9 Urban Tertiary
## 300 1050 55 7.9 Urban Tertiary
## 301 750 8 2.0 Rural Tertiary
## 302 1000 8 2.0 Rural Tertiary
## 303 1050 8 2.0 Rural Tertiary
## 304 750 12 2.0 Rural Tertiary
## 305 1000 12 2.0 Rural Tertiary
## 306 1050 12 2.0 Rural Tertiary
## 307 750 25 2.0 Rural Tertiary
## 308 1000 25 2.0 Rural Tertiary
## 309 1050 25 2.0 Rural Tertiary
## 310 750 40 2.0 Rural Tertiary
## 311 1000 40 2.0 Rural Tertiary
## 312 1050 40 2.0 Rural Tertiary
## 313 750 55 2.0 Rural Tertiary
## 314 1000 55 2.0 Rural Tertiary
## 315 1050 55 2.0 Rural Tertiary
## 316 750 8 3.2 Rural Tertiary
## 317 1000 8 3.2 Rural Tertiary
## 318 1050 8 3.2 Rural Tertiary
## 319 750 12 3.2 Rural Tertiary
## 320 1000 12 3.2 Rural Tertiary
## 321 1050 12 3.2 Rural Tertiary
## 322 750 25 3.2 Rural Tertiary
## 323 1000 25 3.2 Rural Tertiary
## 324 1050 25 3.2 Rural Tertiary
## 325 750 40 3.2 Rural Tertiary
## 326 1000 40 3.2 Rural Tertiary
## 327 1050 40 3.2 Rural Tertiary
## 328 750 55 3.2 Rural Tertiary
## 329 1000 55 3.2 Rural Tertiary
## 330 1050 55 3.2 Rural Tertiary
## 331 750 8 5.0 Rural Tertiary
## 332 1000 8 5.0 Rural Tertiary
## 333 1050 8 5.0 Rural Tertiary
## 334 750 12 5.0 Rural Tertiary
## 335 1000 12 5.0 Rural Tertiary
## 336 1050 12 5.0 Rural Tertiary
## 337 750 25 5.0 Rural Tertiary
## 338 1000 25 5.0 Rural Tertiary
## 339 1050 25 5.0 Rural Tertiary
## 340 750 40 5.0 Rural Tertiary
## 341 1000 40 5.0 Rural Tertiary
## 342 1050 40 5.0 Rural Tertiary
## 343 750 55 5.0 Rural Tertiary
## 344 1000 55 5.0 Rural Tertiary
## 345 1050 55 5.0 Rural Tertiary
## 346 750 8 7.9 Rural Tertiary
## 347 1000 8 7.9 Rural Tertiary
## 348 1050 8 7.9 Rural Tertiary
## 349 750 12 7.9 Rural Tertiary
## 350 1000 12 7.9 Rural Tertiary
## 351 1050 12 7.9 Rural Tertiary
## 352 750 25 7.9 Rural Tertiary
## 353 1000 25 7.9 Rural Tertiary
## 354 1050 25 7.9 Rural Tertiary
## 355 750 40 7.9 Rural Tertiary
## 356 1000 40 7.9 Rural Tertiary
## 357 1050 40 7.9 Rural Tertiary
## 358 750 55 7.9 Rural Tertiary
## 359 1000 55 7.9 Rural Tertiary
## 360 1050 55 7.9 Rural Tertiary
library(DT)
pred.haz <- augment(final.m, newdata = new_data)
pred.haz |> datatable()
income.effect <- expand.grid(
household_income = seq(1000, 6000, by = 500),
residence = factor(c("Urban", "Rural"),
levels = c("Urban", "Rural"))
)
income.effect$age_months <- 24
income.effect$dietary_diversity <- 4
income.effect$maternal_education <- factor ("Secondary",
levels = c("Secondary", "Tertiary"))
income.effect$Predicted_score <- predict(final.m, newdata = income.effect)
ggplot(income.effect,
aes(x = household_income,
y = Predicted_score,
color = residence,
group = residence)) +
geom_line(size = 1.2) +
geom_point(size = 2) +
scale_color_manual(
values = c("Urban" = "green", "Rural" = "red")
)
labs(
title = "Effect of Household Income on Height-for-age (Z-score)",
x = "Household Income (RM)",
y = "Predicted HAZ",
color = "residence"
) +
theme_minimal()
## NULL
Interpretation: The graph shows how HAZ changes across levels of household income by place of residence. The prediction indicate a positive association between household income and child linear growth in urban and rural settings. However, the predicted gains in HAZ associated with increasing income greater in urban areas, resulting in consistently higher predicted HAZ among urban children and suggesting persistent contextual differences in rural settings.
This study examined the determinants of linear growth (HAZ) among children aged 6–59 months using a multivariable linear regression model. The final model demonstrated strong explanatory power, accounting for approximately 60% of the variation in HAZ (adjusted R² = 0.601), indicating a good model fit.
Child age was significantly and negatively associated with HAZ, with each additional month of age associated with a small but meaningful decline in HAZ, suggesting progressive growth faltering over early childhood. This finding is consistent with the cumulative effects of nutritional inadequacy, recurrent infections, and environmental exposures as children grow older.
Socioeconomic factors played a critical role in child linear growth. Household income showed a positive association with HAZ, indicating that higher economic resources are linked to better growth outcomes. Importantly, the significant interaction between household income and urban residence suggests that the beneficial effect of income on child growth is stronger in urban settings, highlighting contextual differences in how economic advantages translate into nutritional gains.
Dietary quality was also a key determinant. Higher dietary diversity was strongly associated with improved HAZ, reinforcing the importance of access to a variety of food groups for optimal child growth beyond caloric sufficiency alone.
Maternal education emerged as one of the strongest predictors of child linear growth. Compared with low maternal education, children of mothers with secondary and tertiary education had substantially higher HAZ scores, demonstrating a clear dose–response relationship. This underscores the role of maternal knowledge, caregiving practices, and health-seeking behaviour in shaping child nutrition outcomes.
From a public health perspective, the results underscore the need for integrated interventions that go beyond single factor approaches. Efforts to improve child linear growth should therefore combine nutrition-specific actions (such as improving dietary diversity and appropriate feeding practices) with nutrition-sensitive strategies (including maternal education support, poverty alleviation, and improved access to health and sanitation). Such a holistic approach is fundamental for addressing the underlying determinants of poor linear growth and advancing child health outcomes.
In conclusion, addressing child growth faltering among children aged 6–59 months requires multisectoral policies and targeted interventions that are responsive to the complex set of determinants identified in this analysis. Tailoring public health programming to the most salient predictors in the study population will be essential to promote optimal growth and reduce the burden of growth deficits during early childhood.