1. Introduction

Research question

What are the factors associated with height-for-age Z-score (HAZ) among children aged 6 to 59 months?

Objective

To examine factors associated with height-for-age Z-score (HAZ) among children aged 6 to 59 months.

Variables in dataset

N= 500 Population: Children aged 6 to 59 months. Outcome: Height-for-age Z-score (HAZ) Predictors: - Monthly household income (RM) - Dietary diversity (food group count) - Residence (Urban/Rural) - Maternal education (Low/Secondary/Tertiary)

2. Prepare environment and read data

2.1 Load packages

library(tidyverse)
library(readxl)
library(broom)
library(gtsummary)
library(dplyr)
library(corrplot)
library(knitr)
library(MASS)
library(broom.helpers)

2.2 Read data

data.child <- read_excel('child_nutrition.xlsx')

glimpse(data.child)

## Rows: 500
## Columns: 7
## $ id                 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ age_months         <dbl> 9, 55, 12, 8, 41, 9, 27, 50, 6, 55, 44, 6, 34, 8, 1…
## $ household_income   <dbl> 1697, 3487, 4519, 2256, 5647, 3869, 1495, 3288, 874…
## $ dietary_diversity  <dbl> 4.7, 2.5, 4.9, 3.5, 7.4, 3.6, 3.5, 4.1, 3.6, 3.4, 4…
## $ residence          <dbl> 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, …
## $ maternal_education <chr> "Low", "Secondary", "Tertiary", "Low", "Low", "Low"…
## $ haz                <dbl> 1.46, -0.31, 2.02, -0.61, 1.54, 0.94, -0.71, -1.80,…

2.3 Wrangling data

Convert residence and maternal education to factors with ordered levels

data.child <- data.child |> 
  mutate (
    residence = factor(
    residence,
    levels = c(0, 1),
    labels = c("Rural", "Urban")
  ))
  
data.child <- data.child |>
  mutate(
    maternal_education = factor(maternal_education, levels = c("Low", "Secondary", "Tertiary")))

glimpse(data.child)

## Rows: 500
## Columns: 7
## $ id                 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ age_months         <dbl> 9, 55, 12, 8, 41, 9, 27, 50, 6, 55, 44, 6, 34, 8, 1…
## $ household_income   <dbl> 1697, 3487, 4519, 2256, 5647, 3869, 1495, 3288, 874…
## $ dietary_diversity  <dbl> 4.7, 2.5, 4.9, 3.5, 7.4, 3.6, 3.5, 4.1, 3.6, 3.4, 4…
## $ residence          <fct> Urban, Urban, Urban, Rural, Rural, Rural, Urban, Ur…
## $ maternal_education <fct> Low, Secondary, Tertiary, Low, Low, Low, Secondary,…
## $ haz                <dbl> 1.46, -0.31, 2.02, -0.61, 1.54, 0.94, -0.71, -1.80,…

3 Explore data analysis (EDA)

3.1 Summary Statistics

summary(data.child)

##        id          age_months    household_income dietary_diversity residence  
##  Min.   :  1.0   Min.   : 6.00   Min.   : 500     Min.   :1.500     Rural:188  
##  1st Qu.:125.8   1st Qu.:18.00   1st Qu.:2131     1st Qu.:3.800     Urban:312  
##  Median :250.5   Median :32.00   Median :2968     Median :4.600                
##  Mean   :250.5   Mean   :32.01   Mean   :2977     Mean   :4.564                
##  3rd Qu.:375.2   3rd Qu.:45.00   3rd Qu.:3773     3rd Qu.:5.300                
##  Max.   :500.0   Max.   :59.00   Max.   :6776     Max.   :8.200                
##  maternal_education      haz         
##  Low      :212      Min.   :-2.4600  
##  Secondary:191      1st Qu.:-0.2700  
##  Tertiary : 97      Median : 0.4650  
##                     Mean   : 0.4793  
##                     3rd Qu.: 1.1950  
##                     Max.   : 4.4600

3.2 Descriptive analysis

data.child |> 
  dplyr::select (-id) |>
  gtsummary::tbl_summary(
    label = list(age_months ~ 'Age (months)',
                 household_income ~ 'Household Income (RM)',
                 dietary_diversity ~ 'Dietary Diversity (Food Group)',
                 residence ~ 'Residence',
                 maternal_education ~ 'Maternal Education',
                 haz ~ 'Height-for-age (Z-score)'),
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                      all_categorical() ~ "{n} ({p}%)")
  )

Characteristic	N = 500¹
Age (months)	32 (16)
Household Income (RM)	2,977 (1,162)
Dietary Diversity (Food Group)	4.56 (1.15)
Residence
Rural	188 (38%)
Urban	312 (62%)
Maternal Education
Low	212 (42%)
Secondary	191 (38%)
Tertiary	97 (19%)
Height-for-age (Z-score)	0.48 (1.11)
¹ Mean (SD); n (%)

3.3 Dataset distribution

Histogram for continuous variables

Histograms were used to assess the distribution and identify skewness and outliers in continuous variables.

Histogram for age

ggplot(data.child, aes(age_months)) +
  geom_histogram()

Interpretation:

Age demonstrated a broad distribution (adequate variability) with no evidence of extreme outliers.

Histogram for household income

ggplot(data.child, aes(household_income)) +
  geom_histogram()

Interpretation:

Household income exhibited a positively skewed distribution, with most households in the lower to middle income range and a small number of high-income households.

Histogram for dietary diversity

ggplot(data.child, aes(dietary_diversity)) +
  geom_histogram()

Interpretation:

Dietary diversity scores were approximately normally distributed, with most children consuming foods from four to six food groups.

Boxplots for categorical variables

Boxplots were used to compare the distribution of outcome across categorical variables.

Boxplot for residence

ggplot(data.child, aes(x = residence, y = haz)) +
  geom_boxplot(fill = "lightblue") +
  geom_hline(yintercept = 0, color = "red")

Interpretation:

HAZ differed by place of residence. Urban children had a higher median HAZ compared with rural children, although greater variability was observed among urban children.

Boxplot for maternal education

ggplot(data.child, aes(x = maternal_education, y = haz)) +
  geom_boxplot(fill = "lightblue") +
  geom_hline(yintercept = 0, color = "red")

Interpretation:

HAZ increased with higher maternal education levels. Children of mothers with tertiary education had the highest median HAZ, followed by those with secondary education, while the lowest median HAZ was observed among children of mothers with low education.

3.4 Dataset relationships

Scatter plots between variables

Scatter plots were used to visually assess the linearity of relationships between continuous predictors and the outcome.

HAZ vs age

data.child2 <- data.child |> 
  dplyr::select (age_months, household_income, dietary_diversity, haz)

ggplot(data.child2, aes(x = age_months, y = haz)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Height-for-age vs Age",
       x = "Age (months)",
       y = "Height-for-age (Z-score)") +
  theme_minimal()

Interpretation:

HAZ showed a gradual decline with increasing age, indicating cumulative growth faltering over time. Although the association was relatively weak, the overall pattern supports inclusion of age as a covariate in multivariable models.

HAZ vs household income

ggplot(data.child2, aes(x = household_income, y = haz)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Height-for-age vs Household Income",
       x = "Household Income (RM)",
       y = "Height-for-age (Z-score)") +
  theme_minimal()

Interpretation:

HAZ increased with higher household income, indicating better linear growth among children from higher-income households. The relationship appeared approximately linear, supporting the use of linear regression analysis.

HAZ vs dietary diversity

ggplot(data.child2, aes(x = dietary_diversity, y = haz)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Height-for-age vs Dietary Diversity",
       x = "Dietary Diversity (food group count)",
       y = "Height-for-age (Z-score)") +
  theme_minimal()

Interpretation:

HAZ increased with greater dietary diversity, indicating improved linear growth among children consuming a more varied diet. The relationship appeared approximately linear, supporting inclusion of dietary diversity as a continuous variable in regression models.

Correlation matrix

Correlation matrix was used to examine bivariate relationships and assess potential multicollinearity. This steps serves as an initial screening, as multicollinearity will be formally assessed using variance inflation factors during model diagnostics.

data.child2 <- data.child |> 
  dplyr::select (age_months, household_income, dietary_diversity, haz)

cor.data <- 
  cor(data.child2,
      use = "complete.obs", 
      method = "pearson")
head(round(cor.data,2))

##                   age_months household_income dietary_diversity   haz
## age_months              1.00             0.02              0.01 -0.18
## household_income        0.02             1.00              0.42  0.56
## dietary_diversity       0.01             0.42              1.00  0.52
## haz                    -0.18             0.56              0.52  1.00

HAZ showed a weak negative correlation with child age (r = -0.18) and moderate positive correlations with both household income (r = 0.56) and dietary diversity (r = 0.52). Household income was also moderately correlated with dietary diversity (r = 0.42), supporting its role as an upstream determinant of child diet quality. No strong correlations were observed between child age and other variables, indicating minimal collinearity. This preliminary correlation support the inclusion of all variables in subsequent multivariable modelling.

This is the correlogram:

cor_matrix <- cor(data.child2 |> select_if(is.numeric), use = "complete.obs")
corrplot(cor_matrix,
         type = "upper",
         method = "circle",
         addCoef.col = "black",
         tl.col = "black",
         tl.srt = 45)

4 Analysis data

4.1 Univariable analysis

Univariable linear regression analyses were conducted to examine the association between each independent variable and HAZ. Age in months, household income, dietary diversity, residence and maternal education were analysed individually.

4.1.1 Age (months)

slr.age <- lm(haz ~ age_months, data = data.child)

tidy(slr.age, conf.int = TRUE) |> 
  mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))

## # A tibble: 2 × 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)   0.891    0.110        8.12 <0.001    0.676    1.11   
## 2 age_months   -0.0129   0.00307     -4.19 <0.001   -0.0189  -0.00684

Interpretation:

Child age was significantly associated with HAZ. Each additional month of age was associated with a 0.013-unit decrease in HAZ (p < 0.001, 95% CI: −0.019, −0.007), indicating progressive growth faltering with increasing age.

4.1.2 Household income (RM)

slr.income <- lm(haz ~ household_income, data = data.child)

tidy(slr.income, conf.int = TRUE) |> 
  mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))

## # A tibble: 2 × 7
##   term              estimate std.error statistic p.value  conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl> <chr>       <dbl>     <dbl>
## 1 (Intercept)      -1.12     0.114         -9.86 <0.001  -1.35     -0.898   
## 2 household_income  0.000538 0.0000356     15.1  <0.001   0.000468  0.000608

Interpretation:

Household income was positively associated with HAZ. Each RM 100 increase in monthly household income was associated with a 0.054-unit increase in HAZ (p < 0.001, 95% CI: 0.047, 0.061), indicating better linear growth among children from higher-income households.

4.1.3 Dietary diversity (Food group count)

slr.diet <- lm(haz ~ dietary_diversity, data = data.child)

tidy(slr.diet, conf.int = TRUE) |> 
  mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))

## # A tibble: 2 × 7
##   term              estimate std.error statistic p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)         -1.84     0.174      -10.6 <0.001    -2.19     -1.50 
## 2 dietary_diversity    0.509    0.0370      13.8 <0.001     0.436     0.582

Interpretation:

Dietary diversity was positively associated with HAZ. Each additional food group consumed was associated with a 0.51-unit increase in HAZ (p < 0.001, 95% CI: 0.44, 0.58).

4.1.4 Residence

slr.residence <- lm(haz ~ residence, data = data.child)

tidy(slr.residence, conf.int = TRUE) |> 
  mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))

## # A tibble: 2 × 7
##   term           estimate std.error statistic p.value conf.low conf.high
##   <chr>             <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)      0.0432    0.0775     0.557 0.58      -0.109     0.196
## 2 residenceUrban   0.699     0.0982     7.12  <0.001     0.506     0.892

Interpretation:

HAZ differed significantly by place of residence. Urban children had a 0.70-unit higher mean HAZ compared with rural children (p < 0.001, 95% CI: 0.51, 0.89).

4.1.5 Maternal education

slr.mom <- lm(haz ~ maternal_education, data = data.child)

tidy(slr.mom, conf.int = TRUE) |> 
  mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))

## # A tibble: 3 × 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)               0.142    0.0737      1.93 0.054   -0.00227     0.287
## 2 maternal_educationSec…    0.475    0.107       4.44 <0.001   0.265       0.685
## 3 maternal_educationTer…    0.801    0.131       6.09 <0.001   0.543       1.06

Interpretation:

Maternal education was significantly associated with HAZ. Compared with children of mothers with low education, children of mothers with secondary education had a 0.47-unit higher mean HAZ (p< 0.001, 95% CI: 0.26, 0.69 ), while those whose mothers had tertiary education had a 0.80-unit higher mean HAZ (p < 0.001, 95% CI: 0.54, 1.06).

4.1.6 Univariable analysis summary table

tbl_univ <- 
  data.child |> 
  dplyr::select(-id
                ) |>
  gtsummary::tbl_uvregression(
    method = lm,
    y = haz,
    label = list(age_months ~ 'Age (months)',
                 household_income ~ 'Household Income (RM)',
                 dietary_diversity ~ 'Dietary Diversity (Food Group)',
                 residence ~ 'Residence',
                 maternal_education ~ 'Maternal Education')
    ) |>
  modify_fmt_fun(
    estimate ~ function(x) style_number(x, digits = 6),
    conf.low ~ function(x) style_number(x, digits = 6),
    conf.high ~ function(x) style_number(x, digits = 6)
  ) |>
  gtsummary::bold_p() 

tbl_univ

Characteristic	N	Beta	95% CI	p-value
Age (months)	500	-0.012875	-0.018906, -0.006844	<0.001
Household Income (RM)	500	0.000538	0.000468, 0.000608	<0.001
Dietary Diversity (Food Group)	500	0.509211	0.436475, 0.581947	<0.001
Residence	500
Rural		—	—
Urban		0.698860	0.505993, 0.891727	<0.001
Maternal Education	500
Low		—	—
Secondary		0.474929	0.264714, 0.685145	<0.001
Tertiary		0.801052	0.542754, 1.059351	<0.001
Abbreviation: CI = Confidence Interval

4.2 Multivariable analysis

Multivariable analysis was guided by the purposeful selection approach, which balances statistical criteria with theoretical relevance and confounding assessment. Stepwise selection was additionally performed as a comparative analysis to illustrate differences between theory-driven and automated model-building approaches.

4.2.1 Purposeful selection approach

STEP 1: Fit a model with significant variables or clinical importance

Based on the univariable analysis in previous section, we identified all independent variables as statistically significant predictors (p < 0.001). Therefore, all predictors were included in the

m.all <- lm(haz ~ age_months + household_income + dietary_diversity + residence + maternal_education, data = data.child)

tidy(m.all, conf.int = TRUE) |>
  mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))

## # A tibble: 7 × 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)            -2.36e+0 0.155        -15.2  <0.001  -2.67e+0 -2.06    
## 2 age_months             -1.28e-2 0.00200       -6.39 <0.001  -1.67e-2 -0.00884 
## 3 household_income        4.26e-4 0.0000306     13.9  <0.001   3.66e-4  0.000486
## 4 dietary_diversity       2.73e-1 0.0322         8.49 <0.001   2.10e-1  0.337   
## 5 residenceUrban          7.51e-1 0.0663        11.3  <0.001   6.21e-1  0.882   
## 6 maternal_educationSec…  3.29e-1 0.0726         4.53 <0.001   1.86e-1  0.472   
## 7 maternal_educationTer…  7.15e-1 0.0922         7.76 <0.001   5.34e-1  0.896

STEP 2: Identify key exposure In this analysis

In this analysis, household income was identified as the key exposure because it is an upstream, theory-driven factor with a central role in influencing child growth (Outcome = Height-for-age (HAZ)). Therefore, household income will be included in all proposed model.

STEP 3: Assess confounding

Decision for confounder/s is based on theory, whether:

1. It is related to household income. 2. It is related to HAZ. 3. It is not caused by household income.

Confounding factors were assessed by observing changes in the exposure effect when a variable is added to the model, not using p-value.

Model 1: household income only

m.income <- lm(haz ~ household_income, data = data.child)

tidy(m.income, conf.int = TRUE) |> 
  mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))

## # A tibble: 2 × 7
##   term              estimate std.error statistic p.value  conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl> <chr>       <dbl>     <dbl>
## 1 (Intercept)      -1.12     0.114         -9.86 <0.001  -1.35     -0.898   
## 2 household_income  0.000538 0.0000356     15.1  <0.001   0.000468  0.000608

Model 2: household income + child age in months

m.income.age <- lm(haz ~ household_income + age_months, data = data.child)

tidy(m.income.age, conf.int = TRUE) |> 
  mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))

## # A tibble: 3 × 7
##   term              estimate std.error statistic p.value  conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl> <chr>       <dbl>     <dbl>
## 1 (Intercept)      -0.699    0.136         -5.14 <0.001  -0.966    -0.432   
## 2 household_income  0.000541 0.0000347     15.6  <0.001   0.000473  0.000609
## 3 age_months       -0.0135   0.00252       -5.35 <0.001  -0.0184   -0.00852

Model 3: household income + maternal education

m.income.edu <- lm(haz ~ household_income  + maternal_education, data = data.child)

tidy(m.income.edu, conf.int = TRUE) |> 
  mutate(p.value = format.pval(p.value, digits = 2, eps = 0.001))

## # A tibble: 4 × 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)            -1.46e+0 0.116        -12.6  <0.001  -1.69e+0 -1.23    
## 2 household_income        5.41e-4 0.0000335     16.1  <0.001   4.75e-4  0.000607
## 3 maternal_educationSec…  4.25e-1 0.0868         4.90 <0.001   2.55e-1  0.596   
## 4 maternal_educationTer…  8.45e-1 0.107          7.93 <0.001   6.36e-1  1.05

We compare the values of estimated coefficients in these 3 models and if a significant percentage change indicates that the variables are the confounder.

% change in household income vs age in months

coeff_income <- tidy(m.income)$estimate [2]

coeff_income_age <- tidy(m.income.age)$estimate [2]

coeff_income_edu <- tidy(m.income.edu)$estimate [2]
 
pct_change_age <- ((coeff_income_age - coeff_income) / coeff_income) * 100

pct_change_age

## [1] 0.5233729

% change in household income vs maternal education

pct_change_edu <- ((coeff_income_edu - coeff_income) / coeff_income) * 100

pct_change_edu

## [1] 0.6511423

Interpretation:

A variable is considered a confounder if adjustment leads to a ≥10% change in the exposure coefficient. Neither age nor maternal education materially changed the income–HAZ association, with percentage changes of 0.52% and 0.65% respectively, indicating minimal confounding. Nevertheless, child age and maternal education were retained in adjusted models based on their theoretical relevance in child growth.

*Residence was not included in confounding assessment because it was considered a potential effect modifier rather than a confounder. The relationship between household income and child growth may differ between urban and rural settings due to differences in food access, cost of living, and caregiving environments. Therefore, residence was evaluated through an interaction term with household income.

Dietary diversity was also excluded from the confounding assessment as it lies on the causal pathway between household income and child growth. Thus dietary diversity was considered as a potential mediator.

STEP 4: Preliminary final model

Since no predictors were excluded, the preliminary final model is the full model from Step 1.

The fact that no predictors were removed, it indicates that the all the predictors independently contribute to the outcome and that the final model appropriately reflects the underlying relationship.

4.2.2 Stepwise selection approach

Now we explore the stepwise approach as an alternative for variable selection to compare with the purposeful selection approach

Backward

m.all <- lm(haz ~ age_months + household_income + dietary_diversity + residence + maternal_education, data = data.child)
step.bw <- MASS::stepAIC(m.all, direction = "backward")

## Start:  AIC=-331.21
## haz ~ age_months + household_income + dietary_diversity + residence + 
##     maternal_education
## 
##                      Df Sum of Sq    RSS     AIC
## <none>                            250.68 -331.21
## - age_months          1    20.744 271.43 -293.46
## - maternal_education  2    31.680 282.36 -275.71
## - dietary_diversity   1    36.670 287.35 -264.95
## - residence           1    65.188 315.87 -217.64
## - household_income    1    98.360 349.04 -167.71

summary(step.bw)

## 
## Call:
## lm(formula = haz ~ age_months + household_income + dietary_diversity + 
##     residence + maternal_education, data = data.child)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.07322 -0.47134  0.04735  0.46983  2.30035 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -2.361e+00  1.550e-01 -15.235  < 2e-16 ***
## age_months                  -1.277e-02  1.999e-03  -6.387 3.91e-10 ***
## household_income             4.262e-04  3.065e-05  13.908  < 2e-16 ***
## dietary_diversity            2.733e-01  3.218e-02   8.492 2.41e-16 ***
## residenceUrban               7.512e-01  6.635e-02  11.323  < 2e-16 ***
## maternal_educationSecondary  3.289e-01  7.262e-02   4.529 7.46e-06 ***
## maternal_educationTertiary   7.154e-01  9.215e-02   7.763 4.83e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7131 on 493 degrees of freedom
## Multiple R-squared:  0.5958, Adjusted R-squared:  0.5909 
## F-statistic: 121.1 on 6 and 493 DF,  p-value: < 2.2e-16

Interpretation:

Backward stepwise AIC selection indicated that all included variables contributed to model fit, as removal of any variable increased the AIC. Therefore, the full model was retained.

Forward

haz.only.m <- lm(haz ~ 1, data = data.child)

step.fw <- MASS::stepAIC(haz.only.m, 
                         direction = "forward",
                         scope = list (upper = m.all,
                                       lower = haz.only.m))

## Start:  AIC=109.76
## haz ~ 1
## 
##                      Df Sum of Sq    RSS     AIC
## + household_income    1   194.891 425.36 -76.831
## + dietary_diversity   1   170.763 449.49 -49.245
## + residence           1    57.296 562.96  63.300
## + maternal_education  2    48.599 571.66  72.965
## + age_months          1    21.163 599.09  94.404
## <none>                            620.26 109.762
## 
## Step:  AIC=-76.83
## haz ~ household_income
## 
##                      Df Sum of Sq    RSS      AIC
## + dietary_diversity   1    62.802 362.56 -154.705
## + residence           1    54.619 370.75 -143.547
## + maternal_education  2    50.536 374.83 -136.069
## + age_months          1    23.163 402.20 -102.827
## <none>                            425.36  -76.831
## 
## Step:  AIC=-154.71
## haz ~ household_income + dietary_diversity
## 
##                      Df Sum of Sq    RSS     AIC
## + residence           1    58.063 304.50 -239.97
## + age_months          1    23.410 339.15 -186.08
## + maternal_education  2    24.230 338.33 -185.29
## <none>                            362.56 -154.71
## 
## Step:  AIC=-239.97
## haz ~ household_income + dietary_diversity + residence
## 
##                      Df Sum of Sq    RSS     AIC
## + maternal_education  2    33.072 271.43 -293.46
## + age_months          1    22.136 282.36 -275.71
## <none>                            304.50 -239.97
## 
## Step:  AIC=-293.46
## haz ~ household_income + dietary_diversity + residence + maternal_education
## 
##              Df Sum of Sq    RSS     AIC
## + age_months  1    20.744 250.68 -331.21
## <none>                    271.43 -293.46
## 
## Step:  AIC=-331.21
## haz ~ household_income + dietary_diversity + residence + maternal_education + 
##     age_months

summary(step.fw)

## 
## Call:
## lm(formula = haz ~ household_income + dietary_diversity + residence + 
##     maternal_education + age_months, data = data.child)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.07322 -0.47134  0.04735  0.46983  2.30035 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -2.361e+00  1.550e-01 -15.235  < 2e-16 ***
## household_income             4.262e-04  3.065e-05  13.908  < 2e-16 ***
## dietary_diversity            2.733e-01  3.218e-02   8.492 2.41e-16 ***
## residenceUrban               7.512e-01  6.635e-02  11.323  < 2e-16 ***
## maternal_educationSecondary  3.289e-01  7.262e-02   4.529 7.46e-06 ***
## maternal_educationTertiary   7.154e-01  9.215e-02   7.763 4.83e-14 ***
## age_months                  -1.277e-02  1.999e-03  -6.387 3.91e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7131 on 493 degrees of freedom
## Multiple R-squared:  0.5958, Adjusted R-squared:  0.5909 
## F-statistic: 121.1 on 6 and 493 DF,  p-value: < 2.2e-16

Interpretation:

Forward stepwise selection sequentially added predictors based on their contribution to model fit. The algorithm selected all predictors. This confirms that all predictors were all independently associated with higher HAZ. Thus, all predictors were retained in the model.

Both direction

step.both <- MASS::stepAIC(haz.only.m, 
                           direction = "both",
                           scope = list (upper = m.all,
                                         lower = haz.only.m))

## Start:  AIC=109.76
## haz ~ 1
## 
##                      Df Sum of Sq    RSS     AIC
## + household_income    1   194.891 425.36 -76.831
## + dietary_diversity   1   170.763 449.49 -49.245
## + residence           1    57.296 562.96  63.300
## + maternal_education  2    48.599 571.66  72.965
## + age_months          1    21.163 599.09  94.404
## <none>                            620.26 109.762
## 
## Step:  AIC=-76.83
## haz ~ household_income
## 
##                      Df Sum of Sq    RSS      AIC
## + dietary_diversity   1    62.802 362.56 -154.705
## + residence           1    54.619 370.75 -143.547
## + maternal_education  2    50.536 374.83 -136.069
## + age_months          1    23.163 402.20 -102.827
## <none>                            425.36  -76.831
## - household_income    1   194.891 620.26  109.762
## 
## Step:  AIC=-154.71
## haz ~ household_income + dietary_diversity
## 
##                      Df Sum of Sq    RSS      AIC
## + residence           1    58.063 304.50 -239.969
## + age_months          1    23.410 339.15 -186.079
## + maternal_education  2    24.230 338.33 -185.289
## <none>                            362.56 -154.705
## - dietary_diversity   1    62.802 425.36  -76.831
## - household_income    1    86.929 449.49  -49.245
## 
## Step:  AIC=-239.97
## haz ~ household_income + dietary_diversity + residence
## 
##                      Df Sum of Sq    RSS     AIC
## + maternal_education  2    33.072 271.43 -293.46
## + age_months          1    22.136 282.36 -275.71
## <none>                            304.50 -239.97
## - residence           1    58.063 362.56 -154.71
## - dietary_diversity   1    66.245 370.75 -143.55
## - household_income    1    83.547 388.05 -120.74
## 
## Step:  AIC=-293.46
## haz ~ household_income + dietary_diversity + residence + maternal_education
## 
##                      Df Sum of Sq    RSS     AIC
## + age_months          1    20.744 250.68 -331.21
## <none>                            271.43 -293.46
## - maternal_education  2    33.072 304.50 -239.97
## - dietary_diversity   1    36.172 307.60 -232.90
## - residence           1    66.905 338.33 -185.29
## - household_income    1    97.709 369.14 -141.72
## 
## Step:  AIC=-331.21
## haz ~ household_income + dietary_diversity + residence + maternal_education + 
##     age_months
## 
##                      Df Sum of Sq    RSS     AIC
## <none>                            250.68 -331.21
## - age_months          1    20.744 271.43 -293.46
## - maternal_education  2    31.680 282.36 -275.71
## - dietary_diversity   1    36.670 287.35 -264.95
## - residence           1    65.188 315.87 -217.64
## - household_income    1    98.360 349.04 -167.71

summary(step.both)

## 
## Call:
## lm(formula = haz ~ household_income + dietary_diversity + residence + 
##     maternal_education + age_months, data = data.child)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.07322 -0.47134  0.04735  0.46983  2.30035 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -2.361e+00  1.550e-01 -15.235  < 2e-16 ***
## household_income             4.262e-04  3.065e-05  13.908  < 2e-16 ***
## dietary_diversity            2.733e-01  3.218e-02   8.492 2.41e-16 ***
## residenceUrban               7.512e-01  6.635e-02  11.323  < 2e-16 ***
## maternal_educationSecondary  3.289e-01  7.262e-02   4.529 7.46e-06 ***
## maternal_educationTertiary   7.154e-01  9.215e-02   7.763 4.83e-14 ***
## age_months                  -1.277e-02  1.999e-03  -6.387 3.91e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7131 on 493 degrees of freedom
## Multiple R-squared:  0.5958, Adjusted R-squared:  0.5909 
## F-statistic: 121.1 on 6 and 493 DF,  p-value: < 2.2e-16

Interpretation:

Using bidirectional (stepwise) selection, all predictors were retained in the final multivariable model, as their inclusion improved model fit. The consistency across all three stepwise approaches (backward, forward, and both) provides strong evidence for this model specification.

Compare results (ANOVA table)

anova_combined <- bind_rows(
  as.data.frame(step.bw$anova) |> mutate (Method = "Backward"),
  as.data.frame(step.fw$anova) |> mutate (Method = "Forward"),
  as.data.frame(step.both$anova) |> mutate (Method = "Both"),
) |>
relocate(Method, before = 1)

anova_combined

##      Method               before Df  Deviance Resid. Df Resid. Dev        AIC
## 1  Backward                      NA        NA       493   250.6835 -331.20844
## 2   Forward                      NA        NA       499   620.2555  109.76173
## 3   Forward   + household_income  1 194.89098       498   425.3646  -76.83075
## 4   Forward  + dietary_diversity  1  62.80182       497   362.5627 -154.70528
## 5   Forward          + residence  1  58.06290       496   304.4998 -239.96877
## 6   Forward + maternal_education  2  33.07244       494   271.4274 -293.45670
## 7   Forward         + age_months  1  20.74389       493   250.6835 -331.20844
## 8      Both                      NA        NA       499   620.2555  109.76173
## 9      Both   + household_income  1 194.89098       498   425.3646  -76.83075
## 10     Both  + dietary_diversity  1  62.80182       497   362.5627 -154.70528
## 11     Both          + residence  1  58.06290       496   304.4998 -239.96877
## 12     Both + maternal_education  2  33.07244       494   271.4274 -293.45670
## 13     Both         + age_months  1  20.74389       493   250.6835 -331.20844

Interpretation:

All three selection methods (forward, backward, and both) consistently identified household income, child age in months, dietary diversity, residence and maternal education as significant predictors of HAZ.

5 Model assessment

5.1 Model building

5.1.1 Confounder assessment

We compare the values of estimated coefficient change between model with key exposures only, and key exposures with the respective potential confounder.

In this analysis, potential confounders are:

Age in months
Maternal education

so, model to be compared were:

Household income only
Household income + age in months
Household income + maternal education

coeff_compare_table <- data.frame(
  Model = c("Income only", "Income + Age in months", "Income + Maternal education"), 
  Estimated_coefficients = c(coeff_income, coeff_income_age, coeff_income_edu),
  Percent_change = c(NA, pct_change_age, pct_change_edu)
)

coeff_compare_table

##                         Model Estimated_coefficients Percent_change
## 1                 Income only           0.0005377869             NA
## 2      Income + Age in months           0.0005406015      0.5233729
## 3 Income + Maternal education           0.0005412886      0.6511423

Interpretation:

5.1.2 Interaction assessment

Potential effect modification was assessed by testing the interaction terms in the model. Interaction terms were examined selectively based on theoretical plausibility rather than exhaustively testing all possible combinations, to avoid overfitting and maintain interpretability.

There were 3 potential interactions selected in this analysis.

Household income x residence (Continuous x categorical)
- Justification: The effect of income on child growth may differ between urban and rural settings
Household income x child age in months (Continuous x continuous)
- Justification: The effect of household income on HAZ differs by child age, because children at different ages have different nutritional needs.
Household income x maternal education (Continuous x categorical)
- Justification: Higher income may translate into better child growth only when mother have sufficient education to utilise household resources effectively.

Selection relevance:

Residence have been identified as potential effect modifier in the first place.
Since child age in months and maternal education have no effect as confounder, both were tested as interaction.
Dietary diversity is a mediator, thus not examined in interaction.

Interaction between household income and residence

ia.income.res <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + household_income*residence, data = data.child)

tidy(ia.income.res, conf.int = TRUE) |> 
       mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))

## # A tibble: 8 × 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)            -1.94e+0 0.191       -10.1   <0.001  -2.31e+0 -1.56    
## 2 household_income        2.90e-4 0.0000476     6.10  <0.001   1.97e-4  0.000384
## 3 age_months             -1.31e-2 0.00198      -6.63  <0.001  -1.70e-2 -0.00923 
## 4 dietary_diversity       2.74e-1 0.0318        8.61  <0.001   2.11e-1  0.336   
## 5 residenceUrban          1.26e-1 0.182         0.691 0.49    -2.31e-1  0.483   
## 6 maternal_educationSec…  3.09e-1 0.0719        4.30  <0.001   1.68e-1  0.451   
## 7 maternal_educationTer…  6.97e-1 0.0911        7.65  <0.001   5.18e-1  0.876   
## 8 household_income:resi…  2.10e-4 0.0000570     3.69  <0.001   9.83e-5  0.000322

Interpretation:

A significant interaction between household income and residence was observed (p < 0.001), indicating that the positive association between income and HAZ was stronger in urban than rural children.

Interaction between household income and child age in months

ia.income.age <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + household_income*age_months, data = data.child)

tidy(ia.income.age, conf.int = TRUE) |> 
       mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))

## # A tibble: 8 × 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)            -2.37e+0   2.27e-1  -10.4    <0.001  -2.82e+0  -1.92e+0
## 2 household_income        4.29e-4   6.31e-5    6.81   <0.001   3.05e-4   5.53e-4
## 3 age_months             -1.25e-2   5.71e-3   -2.18   0.0294  -2.37e-2  -1.26e-3
## 4 dietary_diversity       2.73e-1   3.22e-2    8.48   <0.001   2.10e-1   3.37e-1
## 5 residenceUrban          7.51e-1   6.65e-2   11.3    <0.001   6.21e-1   8.82e-1
## 6 maternal_educationSec…  3.29e-1   7.27e-2    4.52   <0.001   1.86e-1   4.72e-1
## 7 maternal_educationTer…  7.15e-1   9.23e-2    7.75   <0.001   5.34e-1   8.97e-1
## 8 household_income:age_… -1.01e-7   1.77e-6   -0.0569 0.9546  -3.58e-6   3.38e-6

Interpretation:

No evidence of effect modification by child age was observed (p = 0.95); therefore, the interaction term was excluded from the final model.

Interaction between household income and maternal education

ia.income.edu <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + household_income*maternal_education, data = data.child)

tidy(ia.income.edu, conf.int = TRUE) |> 
       mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))

## # A tibble: 9 × 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)            -2.53e+0 0.193       -13.1   <0.001  -2.91e+0  -2.15e+0
## 2 household_income        4.78e-4 0.0000494     9.69  <0.001   3.81e-4   5.75e-4
## 3 age_months             -1.27e-2 0.00199      -6.39  <0.001  -1.67e-2  -8.83e-3
## 4 dietary_diversity       2.76e-1 0.0321        8.58  <0.001   2.13e-1   3.39e-1
## 5 residenceUrban          7.56e-1 0.0664       11.4   <0.001   6.26e-1   8.87e-1
## 6 maternal_educationSec…  4.33e-1 0.204         2.12  0.0341   3.26e-2   8.34e-1
## 7 maternal_educationTer…  1.18e+0 0.234         5.04  <0.001   7.17e-1   1.64e+0
## 8 household_income:mate… -3.62e-5 0.0000634    -0.571 0.5679  -1.61e-4   8.83e-5
## 9 household_income:mate… -1.59e-4 0.0000738    -2.16  0.0315  -3.04e-4  -1.42e-5

Interpretation:

A significant interaction between household income and maternal education was observed for tertiary education (p = 0.032), indicating that the positive association between income and HAZ was weaker among children of mothers with tertiary education. This finding suggests that higher maternal education may mitigate the dependence of child growth on household income, possibly through improved caregiving practices and nutrition knowledge.

Although the interaction between household income and maternal education was explored, it was not retained in the final model because evidence of effect modification was limited to the tertiary education category and the magnitude of interaction was modest, as indicated by the small estimated coefficient for the income × maternal education (tertiary) term, despite statistical significance.

5.2 Model diagnostics

5.2.1 Preliminary final model

Based on the three interactions tested, the final model should include the interaction between household income and residence, while excluding the income × age and the income × maternal education interaction.

prelim_model <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + household_income*residence, data = data.child)

tidy(prelim_model, conf.int = TRUE) |> 
       mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))

## # A tibble: 8 × 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)            -1.94e+0 0.191       -10.1   <0.001  -2.31e+0 -1.56    
## 2 household_income        2.90e-4 0.0000476     6.10  <0.001   1.97e-4  0.000384
## 3 age_months             -1.31e-2 0.00198      -6.63  <0.001  -1.70e-2 -0.00923 
## 4 dietary_diversity       2.74e-1 0.0318        8.61  <0.001   2.11e-1  0.336   
## 5 residenceUrban          1.26e-1 0.182         0.691 0.49    -2.31e-1  0.483   
## 6 maternal_educationSec…  3.09e-1 0.0719        4.30  <0.001   1.68e-1  0.451   
## 7 maternal_educationTer…  6.97e-1 0.0911        7.65  <0.001   5.18e-1  0.876   
## 8 household_income:resi…  2.10e-4 0.0000570     3.69  <0.001   9.83e-5  0.000322

5.2.2 Assumption 1: Linearity of the relationship between predictors and residuals

Overall linearity

#Residuals vs Fitted plot
plot(prelim_model, which = 1)

Interpretation:

The residuals versus fitted plot shows a random scatter around zero with no clear pattern, indicating that the assumptions of linearity and homoscedasticity are reasonably met.

Linearity of each continuous predictor

library(gridExtra)

# Residuals vs Child age in month
p1 <- augment(prelim_model) |>
  ggplot(aes(x = age_months, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red")

# Residuals vs Child household income
p2 <- augment(prelim_model) |>
  ggplot(aes(x = household_income, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red")

# Residuals vs Child dietary diveristy
p3 <- augment(prelim_model) |>
  ggplot(aes(x = dietary_diversity, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red")

grid.arrange (p1,p2, p3, ncol = 3)

Interpretation:

Residual plots against child age, household income, and dietary diversity showed random scatter around zero with no evident systematic patterns or curvature. This suggests that the linearity assumption for continuous predictors was reasonably met. The spread of residuals was relatively constant across the range of predictors, indicating no major violation of homoscedasticity.

5.2.3 Assumption 2: Normality of residuals

par(mfrow = c(1, 2))

data.res <- augment(prelim_model)

plot(prelim_model, which = 2)

hist(data.res$.resid)

shapiro.test(prelim_model$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  prelim_model$residuals
## W = 0.997, p-value = 0.4912

Interpretation:

The Q-Q plot shows that the residuals generally follow a straight line, indicating that they are approximately normally distributed.
The histogram of residuals appears roughly bell-shaped, further supporting the assumption of normality
The Shapiro-Wilk test was not statistically significant (p = 0.171), indicating that the residuals are approximately normally distributed.
Therefore, the normality assumption of linear regression is satisfied.

5.2.4 Assumption 3: Homoscedasticity (constant variance of residuals)

Visual diagnostic

Homoscedasticity was assessed by visual inspection of residuals versus fitted values and scale–location plots.

par(mfrow = c(1, 2))

plot(prelim_model, which = c(1, 3))

Interpretation:

Visual inspection of the residuals versus fitted values and scale–location plots showed a random scatter of residuals around zero with no obvious systematic pattern. The smooth lines in both plots were approximately horizontal, and the spread of residuals was relatively constant across fitted values. These findings indicate that the assumptions of linearity and homoscedasticity were reasonably met.

Breusch-Pagan test

library(lmtest)

bptest(prelim_model)

## 
##  studentized Breusch-Pagan test
## 
## data:  prelim_model
## BP = 2.588, df = 7, p-value = 0.9203

Interpretation:

The Breusch–Pagan test did not indicate evidence of heteroscedasticity (BP = 2.59, df = 7, p = 0.92), supporting the assumption of constant variance. This finding was consistent with visual inspection of residual and scale–location plots.

Box-Cox transformation

A Box–Cox transformation was considered to improve normality and variance homogeneity of the outcome. If lamda is outside the range of -1 to 1, transformation is needed due to validity of normality and homoscedasticity assumptions.

In this analysis, Box–Cox transformation was not applied because the outcome variable included negative values, violating the requirement for strictly positive responses. Model adequacy was instead assessed using visual diagnostics.

5.2.5 Check for multicollinearity

library(car)
vif(prelim_model)

##                                GVIF Df GVIF^(1/(2*Df))
## household_income           3.084380  1        1.756240
## age_months                 1.005642  1        1.002817
## dietary_diversity          1.341233  1        1.158116
## residence                  7.814222  1        2.795393
## maternal_education         1.132323  2        1.031556
## household_income:residence 9.709263  1        3.115969

Interpretation:

Multicollinearity was assessed using variance inflation factors. Adjusted GVIF values (GVIF^(1/(2·Df))) were examined due to the inclusion of categorical variables and interaction terms. All predictors had adjusted GVIF values below 5, indicating no evidence of problematic multicollinearity.

6 Check influentials

We used three diagnostic criteria to identify potentially influential observations:

Cook’s Distance.

Threshold: values above 1, or above \[\frac{4}{n}\]

Standardized Residuals

Threshold: values greater than 2 or less than −2

Leverage Threshold: values above \[\frac{2k+2}{n}\]

Where: \(n\) = number of observations \(k\) = number of predictors in the model

These thresholds help flag data points that may disproportionately influence model estimates or distort inference.

6.1 Identify influential observations and cook’s cutoff point

# Add diagnostics columns
child.inf <- augment(prelim_model, data.child)

# Define thresholds
n_obs <- nrow(data.child)
p_preds <- length(coef(prelim_model))
cooks_cutoff <- 4 / n_obs
leverage_cutoff <- (2 * p_preds + 2) / n_obs

# Filter Influential Observations
influential_obs <- child.inf %>%
  filter(.cooksd > cooks_cutoff | abs(.std.resid) > 2 | .hat > leverage_cutoff) 

# Print count
cat("Number of influential observation:", nrow(influential_obs), "\n")

## Number of influential observation: 43

cat("Cook's distance cutoff:", round(cooks_cutoff, 4), "\n")

## Cook's distance cutoff: 0.008

6.2 Cook’s distance plot

plot(prelim_model, which = 4)

library(DT)

child.inf |> 
  filter(.std.resid > 2 | .std.resid < -2 | .hat > 0.038) |> 
  datatable()

6.3 Leverage and standardized residuals

Residuals vs leverage plot

plot(prelim_model, which = 5)

Interpretation:

Using a Cook’s distance cutoff of 4/n, several observations were flagged as potentially influential. However, none exhibited Cook’s distance values approaching 1, and residuals versus leverage plots did not reveal observations with both high leverage and large residuals. Sensitivity analyses excluding the most influential observations did not materially alter model estimates, indicating that no individual observation exerted undue influence on the results.

final.m <- lm(haz ~ household_income + age_months + dietary_diversity + residence + maternal_education + (household_income*residence), data = data.child)

tidy(final.m, conf.int = TRUE) |>
  mutate(p.value = format.pval(p.value, digits = 3, eps = 0.001))

## # A tibble: 8 × 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
## 1 (Intercept)            -1.94e+0 0.191       -10.1   <0.001  -2.31e+0 -1.56    
## 2 household_income        2.90e-4 0.0000476     6.10  <0.001   1.97e-4  0.000384
## 3 age_months             -1.31e-2 0.00198      -6.63  <0.001  -1.70e-2 -0.00923 
## 4 dietary_diversity       2.74e-1 0.0318        8.61  <0.001   2.11e-1  0.336   
## 5 residenceUrban          1.26e-1 0.182         0.691 0.49    -2.31e-1  0.483   
## 6 maternal_educationSec…  3.09e-1 0.0719        4.30  <0.001   1.68e-1  0.451   
## 7 maternal_educationTer…  6.97e-1 0.0911        7.65  <0.001   5.18e-1  0.876   
## 8 household_income:resi…  2.10e-4 0.0000570     3.69  <0.001   9.83e-5  0.000322

7 Model presentation (Summary and interpretation)

library(gt)

tbl_regression(final.m,
               intercept = TRUE,
               label = list(household_income ~ "Household Income (RM)",
                            age_months ~ "Age (months)",
                            dietary_diversity ~ "Dietary Diversity (Food group count)", 
                            residence ~ "Residence", 
                            maternal_education ~ "Maternal Education")) |>
              bold_p(t = 0.05) |>
              bold_labels() |>
              add_glance_table(include = c(r.squared, adj.r.squared, AIC, BIC)) |>
              modify_fmt_fun(
                estimate ~ function(x) style_number(x, digits = 6),
                conf.low ~ function(x) style_number(x, digits = 6),
                conf.high ~ function(x) style_number(x, digits = 6)) |>
              modify_header(label = "**Variables**",
                            estimate = "**Adj. B**",
                            statistic = "**t-stat**") |>
              modify_caption("**Final Multivariable Linear Regression Model for Height-For-Age (Z-score)**")

**Final Multivariable Linear Regression Model for Height-For-Age (Z-score)**
Variables	Adj. B	t-stat	95% CI	p-value
(Intercept)	-1.939003	-10.1	-2.314524, -1.563481	<0.001
Household Income (RM)	0.000290	6.10	0.000197, 0.000384	<0.001
Age (months)	-0.013111	-6.63	-0.016993, -0.009228	<0.001
Dietary Diversity (Food group count)	0.273727	8.61	0.211293, 0.336161	<0.001
Residence
Rural	—	—	—
Urban	0.125640	0.691	-0.231419, 0.482699	0.5
Maternal Education
Low	—	—	—
Secondary	0.309225	4.30	0.167944, 0.450507	<0.001
Tertiary	0.696868	7.65	0.517805, 0.875931	<0.001
Household Income (RM) * Residence
Household Income (RM) * Urban	0.000210	3.69	0.000098, 0.000322	<0.001
R²	0.606725
Adjusted R²	0.601130
AIC	1,078.076916
BIC	1,116.008389
Abbreviation: CI = Confidence Interval

7.1 Final model and interpretation

7.1.1 Final Linear Regression Model equation:

\(\widehat{income} = (-1.9390) + 0.0003(\text{Income}) −0.0131(\text{Age}) + 0.2737(\text{Dietary Diversity}) + 0.1256 (\text{Urban}) + 0.3092 (\text{Secondary}) + 0.6969 (\text{Tertiary}) + 0.0002 (\text{Income × Urban})\)

Interpretation:

Model fit and prediction:

The model explained 60.7% of the variation in HAZ (R\(^2\) = 0.607), with a comparable adjusted R\(^2\) of 0.601, indicating strong explanatory performance after adjustment.
AIC and BIC indicate acceptable model parsimony.

Analysis of coefficents:

These values indicate the effect of each predictor on HAZ after adjusting for the other variables.

a.Household income (RM) - For each additional RM 100 in household income associated with an increase of 0.029 HAZ, indicate better growth among children from higher-income households.

Child age in months

Among children aged 6–59 months, increasing age was associated with a modest decline in HAZ. Each additional month of age corresponded to a 0.013-unit reduction in HAZ, suggesting age-related growth faltering consistent with the cumulative effects of nutritional and environmental exposures in early childhood.

Diet diversity (food group count)

Each additional food group consumed is associated with 0.27 unit increased in HAZ, reflecting the strong role of diet quality in supporting linear growth.

Residence (Urban vs Rural)

After adjustment, urban residence alone is not significantly associated with HAZ, suggesting that urban advantage is not uniform and depends on other factors (notably income).

Maternal Education

When compared children of mothers with low education level, those whose mothers had secondary education level had HAZ that were 0.31 units higher, while children of mothers with tertiary education level had z-scores that were 0.70 units higher. This demonstrates a dose-response relationship, highlighting education as a key determinant of child linear growth.

Household income x urban residence

The positive and highly significant interaction between household income and urban residence (β ≈ 0.00021; p < 0.001) indicates that the association between household income and child HAZ is stronger among children living in urban areas than those In rural areas.
This suggests that income is more effective in supporting child growth in urban settings, likely due to better access to food variety, healthcare, sanitation, and services.

Model assessment and diagnostics

The final model was validated to ensure the stability and reliability of the estimates.

Homoscedasticity: The studentized Breusch-Pagan test (\(\text{p = 0.92}\)) indicates we fail to reject the null hypothesis of constant variance, meaning the model is homoscedastic.

Multicollinearity: All predictors showed VIF values below 5, indicating no evidence of problematic multicollinearity.

Influence and outliers: 43 influential observations were identified (Cook’s Distance cutoff = 0.008). Observation 300 is the most influential, but leverage plots confirm it does not cross critical thresholds to be considered a harmful outlier.

7.2 Forest plot of final model coefficients

# Create the Forest Plot
sjPlot::plot_model(final.m, 
           type = "est", 
           show.values = TRUE, 
           value.offset = .3,
           title = "Forest Plot of Height-for-age (Z-score",
           axis.labels = c(
             "household_income:residence" = "Interaction: Income x Residence",
             "maternal_education" = "Maternal Education",
             "residence" = "Residence",
             "dietary_diversity" = "Dietary Diversity", 
             "age_months" = "Child age in month",
             "household_income" = "Household Income"
           ),
           vline.color = "red")

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the sjPlot package.
##   Please report the issue at <https://github.com/strengejacke/sjPlot/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Interpretation:

Child HAZ is strongly associated with socioeconomic and nutritional factors. Dietary diversity and maternal education show the largest positive effects, while child age is negatively associated with HAZ. The impact of household income on child growth is significantly modified by urban residence.

7.3 Model performance visualization

#  Actual vs Predicted
augmented_data <- augment(final.m)
ggplot(augmented_data, aes(x = haz, y = .fitted)) +
  geom_point(alpha = 0.4, color = "#667eea", size = 2) +
  geom_abline(intercept = 0, slope = 1, color = "#F57C00", 
              linewidth = 1.2, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE, color = "#D32F2F", linewidth = 1) +
  labs(title = "Actual vs Predicted HAZ",
       x = "Actual HAZ (z-score)",
       y = "Predicted HAZ (z-score)") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
  annotate("text", x = (augmented_data$sbp) + 5, 
           y = max(augmented_data$.fitted) - 5,
           size = 5, fontface = "bold", color = "#D32F2F")

## Warning: Unknown or uninitialised column: `sbp`.

Interpretation:

The Actual versus Predicted HAZ plot shows that predicted values closely align with observed HAZ, indicating good model fit, an appropriate linear specification, and no major systematic prediction bias.

8 Prediction

8.1 Create new data for prediction

new_data <- expand.grid(household_income = c(750, 1000, 1050),
                        age_months = c(8, 12, 25, 40, 55),
                        dietary_diversity = c(2.0, 3.2, 5.0, 7.9), 
                        residence = c('Urban', 'Rural'),
                        maternal_education = c('Low', 'Secondary', 'Tertiary'))


new_data

##     household_income age_months dietary_diversity residence maternal_education
## 1                750          8               2.0     Urban                Low
## 2               1000          8               2.0     Urban                Low
## 3               1050          8               2.0     Urban                Low
## 4                750         12               2.0     Urban                Low
## 5               1000         12               2.0     Urban                Low
## 6               1050         12               2.0     Urban                Low
## 7                750         25               2.0     Urban                Low
## 8               1000         25               2.0     Urban                Low
## 9               1050         25               2.0     Urban                Low
## 10               750         40               2.0     Urban                Low
## 11              1000         40               2.0     Urban                Low
## 12              1050         40               2.0     Urban                Low
## 13               750         55               2.0     Urban                Low
## 14              1000         55               2.0     Urban                Low
## 15              1050         55               2.0     Urban                Low
## 16               750          8               3.2     Urban                Low
## 17              1000          8               3.2     Urban                Low
## 18              1050          8               3.2     Urban                Low
## 19               750         12               3.2     Urban                Low
## 20              1000         12               3.2     Urban                Low
## 21              1050         12               3.2     Urban                Low
## 22               750         25               3.2     Urban                Low
## 23              1000         25               3.2     Urban                Low
## 24              1050         25               3.2     Urban                Low
## 25               750         40               3.2     Urban                Low
## 26              1000         40               3.2     Urban                Low
## 27              1050         40               3.2     Urban                Low
## 28               750         55               3.2     Urban                Low
## 29              1000         55               3.2     Urban                Low
## 30              1050         55               3.2     Urban                Low
## 31               750          8               5.0     Urban                Low
## 32              1000          8               5.0     Urban                Low
## 33              1050          8               5.0     Urban                Low
## 34               750         12               5.0     Urban                Low
## 35              1000         12               5.0     Urban                Low
## 36              1050         12               5.0     Urban                Low
## 37               750         25               5.0     Urban                Low
## 38              1000         25               5.0     Urban                Low
## 39              1050         25               5.0     Urban                Low
## 40               750         40               5.0     Urban                Low
## 41              1000         40               5.0     Urban                Low
## 42              1050         40               5.0     Urban                Low
## 43               750         55               5.0     Urban                Low
## 44              1000         55               5.0     Urban                Low
## 45              1050         55               5.0     Urban                Low
## 46               750          8               7.9     Urban                Low
## 47              1000          8               7.9     Urban                Low
## 48              1050          8               7.9     Urban                Low
## 49               750         12               7.9     Urban                Low
## 50              1000         12               7.9     Urban                Low
## 51              1050         12               7.9     Urban                Low
## 52               750         25               7.9     Urban                Low
## 53              1000         25               7.9     Urban                Low
## 54              1050         25               7.9     Urban                Low
## 55               750         40               7.9     Urban                Low
## 56              1000         40               7.9     Urban                Low
## 57              1050         40               7.9     Urban                Low
## 58               750         55               7.9     Urban                Low
## 59              1000         55               7.9     Urban                Low
## 60              1050         55               7.9     Urban                Low
## 61               750          8               2.0     Rural                Low
## 62              1000          8               2.0     Rural                Low
## 63              1050          8               2.0     Rural                Low
## 64               750         12               2.0     Rural                Low
## 65              1000         12               2.0     Rural                Low
## 66              1050         12               2.0     Rural                Low
## 67               750         25               2.0     Rural                Low
## 68              1000         25               2.0     Rural                Low
## 69              1050         25               2.0     Rural                Low
## 70               750         40               2.0     Rural                Low
## 71              1000         40               2.0     Rural                Low
## 72              1050         40               2.0     Rural                Low
## 73               750         55               2.0     Rural                Low
## 74              1000         55               2.0     Rural                Low
## 75              1050         55               2.0     Rural                Low
## 76               750          8               3.2     Rural                Low
## 77              1000          8               3.2     Rural                Low
## 78              1050          8               3.2     Rural                Low
## 79               750         12               3.2     Rural                Low
## 80              1000         12               3.2     Rural                Low
## 81              1050         12               3.2     Rural                Low
## 82               750         25               3.2     Rural                Low
## 83              1000         25               3.2     Rural                Low
## 84              1050         25               3.2     Rural                Low
## 85               750         40               3.2     Rural                Low
## 86              1000         40               3.2     Rural                Low
## 87              1050         40               3.2     Rural                Low
## 88               750         55               3.2     Rural                Low
## 89              1000         55               3.2     Rural                Low
## 90              1050         55               3.2     Rural                Low
## 91               750          8               5.0     Rural                Low
## 92              1000          8               5.0     Rural                Low
## 93              1050          8               5.0     Rural                Low
## 94               750         12               5.0     Rural                Low
## 95              1000         12               5.0     Rural                Low
## 96              1050         12               5.0     Rural                Low
## 97               750         25               5.0     Rural                Low
## 98              1000         25               5.0     Rural                Low
## 99              1050         25               5.0     Rural                Low
## 100              750         40               5.0     Rural                Low
## 101             1000         40               5.0     Rural                Low
## 102             1050         40               5.0     Rural                Low
## 103              750         55               5.0     Rural                Low
## 104             1000         55               5.0     Rural                Low
## 105             1050         55               5.0     Rural                Low
## 106              750          8               7.9     Rural                Low
## 107             1000          8               7.9     Rural                Low
## 108             1050          8               7.9     Rural                Low
## 109              750         12               7.9     Rural                Low
## 110             1000         12               7.9     Rural                Low
## 111             1050         12               7.9     Rural                Low
## 112              750         25               7.9     Rural                Low
## 113             1000         25               7.9     Rural                Low
## 114             1050         25               7.9     Rural                Low
## 115              750         40               7.9     Rural                Low
## 116             1000         40               7.9     Rural                Low
## 117             1050         40               7.9     Rural                Low
## 118              750         55               7.9     Rural                Low
## 119             1000         55               7.9     Rural                Low
## 120             1050         55               7.9     Rural                Low
## 121              750          8               2.0     Urban          Secondary
## 122             1000          8               2.0     Urban          Secondary
## 123             1050          8               2.0     Urban          Secondary
## 124              750         12               2.0     Urban          Secondary
## 125             1000         12               2.0     Urban          Secondary
## 126             1050         12               2.0     Urban          Secondary
## 127              750         25               2.0     Urban          Secondary
## 128             1000         25               2.0     Urban          Secondary
## 129             1050         25               2.0     Urban          Secondary
## 130              750         40               2.0     Urban          Secondary
## 131             1000         40               2.0     Urban          Secondary
## 132             1050         40               2.0     Urban          Secondary
## 133              750         55               2.0     Urban          Secondary
## 134             1000         55               2.0     Urban          Secondary
## 135             1050         55               2.0     Urban          Secondary
## 136              750          8               3.2     Urban          Secondary
## 137             1000          8               3.2     Urban          Secondary
## 138             1050          8               3.2     Urban          Secondary
## 139              750         12               3.2     Urban          Secondary
## 140             1000         12               3.2     Urban          Secondary
## 141             1050         12               3.2     Urban          Secondary
## 142              750         25               3.2     Urban          Secondary
## 143             1000         25               3.2     Urban          Secondary
## 144             1050         25               3.2     Urban          Secondary
## 145              750         40               3.2     Urban          Secondary
## 146             1000         40               3.2     Urban          Secondary
## 147             1050         40               3.2     Urban          Secondary
## 148              750         55               3.2     Urban          Secondary
## 149             1000         55               3.2     Urban          Secondary
## 150             1050         55               3.2     Urban          Secondary
## 151              750          8               5.0     Urban          Secondary
## 152             1000          8               5.0     Urban          Secondary
## 153             1050          8               5.0     Urban          Secondary
## 154              750         12               5.0     Urban          Secondary
## 155             1000         12               5.0     Urban          Secondary
## 156             1050         12               5.0     Urban          Secondary
## 157              750         25               5.0     Urban          Secondary
## 158             1000         25               5.0     Urban          Secondary
## 159             1050         25               5.0     Urban          Secondary
## 160              750         40               5.0     Urban          Secondary
## 161             1000         40               5.0     Urban          Secondary
## 162             1050         40               5.0     Urban          Secondary
## 163              750         55               5.0     Urban          Secondary
## 164             1000         55               5.0     Urban          Secondary
## 165             1050         55               5.0     Urban          Secondary
## 166              750          8               7.9     Urban          Secondary
## 167             1000          8               7.9     Urban          Secondary
## 168             1050          8               7.9     Urban          Secondary
## 169              750         12               7.9     Urban          Secondary
## 170             1000         12               7.9     Urban          Secondary
## 171             1050         12               7.9     Urban          Secondary
## 172              750         25               7.9     Urban          Secondary
## 173             1000         25               7.9     Urban          Secondary
## 174             1050         25               7.9     Urban          Secondary
## 175              750         40               7.9     Urban          Secondary
## 176             1000         40               7.9     Urban          Secondary
## 177             1050         40               7.9     Urban          Secondary
## 178              750         55               7.9     Urban          Secondary
## 179             1000         55               7.9     Urban          Secondary
## 180             1050         55               7.9     Urban          Secondary
## 181              750          8               2.0     Rural          Secondary
## 182             1000          8               2.0     Rural          Secondary
## 183             1050          8               2.0     Rural          Secondary
## 184              750         12               2.0     Rural          Secondary
## 185             1000         12               2.0     Rural          Secondary
## 186             1050         12               2.0     Rural          Secondary
## 187              750         25               2.0     Rural          Secondary
## 188             1000         25               2.0     Rural          Secondary
## 189             1050         25               2.0     Rural          Secondary
## 190              750         40               2.0     Rural          Secondary
## 191             1000         40               2.0     Rural          Secondary
## 192             1050         40               2.0     Rural          Secondary
## 193              750         55               2.0     Rural          Secondary
## 194             1000         55               2.0     Rural          Secondary
## 195             1050         55               2.0     Rural          Secondary
## 196              750          8               3.2     Rural          Secondary
## 197             1000          8               3.2     Rural          Secondary
## 198             1050          8               3.2     Rural          Secondary
## 199              750         12               3.2     Rural          Secondary
## 200             1000         12               3.2     Rural          Secondary
## 201             1050         12               3.2     Rural          Secondary
## 202              750         25               3.2     Rural          Secondary
## 203             1000         25               3.2     Rural          Secondary
## 204             1050         25               3.2     Rural          Secondary
## 205              750         40               3.2     Rural          Secondary
## 206             1000         40               3.2     Rural          Secondary
## 207             1050         40               3.2     Rural          Secondary
## 208              750         55               3.2     Rural          Secondary
## 209             1000         55               3.2     Rural          Secondary
## 210             1050         55               3.2     Rural          Secondary
## 211              750          8               5.0     Rural          Secondary
## 212             1000          8               5.0     Rural          Secondary
## 213             1050          8               5.0     Rural          Secondary
## 214              750         12               5.0     Rural          Secondary
## 215             1000         12               5.0     Rural          Secondary
## 216             1050         12               5.0     Rural          Secondary
## 217              750         25               5.0     Rural          Secondary
## 218             1000         25               5.0     Rural          Secondary
## 219             1050         25               5.0     Rural          Secondary
## 220              750         40               5.0     Rural          Secondary
## 221             1000         40               5.0     Rural          Secondary
## 222             1050         40               5.0     Rural          Secondary
## 223              750         55               5.0     Rural          Secondary
## 224             1000         55               5.0     Rural          Secondary
## 225             1050         55               5.0     Rural          Secondary
## 226              750          8               7.9     Rural          Secondary
## 227             1000          8               7.9     Rural          Secondary
## 228             1050          8               7.9     Rural          Secondary
## 229              750         12               7.9     Rural          Secondary
## 230             1000         12               7.9     Rural          Secondary
## 231             1050         12               7.9     Rural          Secondary
## 232              750         25               7.9     Rural          Secondary
## 233             1000         25               7.9     Rural          Secondary
## 234             1050         25               7.9     Rural          Secondary
## 235              750         40               7.9     Rural          Secondary
## 236             1000         40               7.9     Rural          Secondary
## 237             1050         40               7.9     Rural          Secondary
## 238              750         55               7.9     Rural          Secondary
## 239             1000         55               7.9     Rural          Secondary
## 240             1050         55               7.9     Rural          Secondary
## 241              750          8               2.0     Urban           Tertiary
## 242             1000          8               2.0     Urban           Tertiary
## 243             1050          8               2.0     Urban           Tertiary
## 244              750         12               2.0     Urban           Tertiary
## 245             1000         12               2.0     Urban           Tertiary
## 246             1050         12               2.0     Urban           Tertiary
## 247              750         25               2.0     Urban           Tertiary
## 248             1000         25               2.0     Urban           Tertiary
## 249             1050         25               2.0     Urban           Tertiary
## 250              750         40               2.0     Urban           Tertiary
## 251             1000         40               2.0     Urban           Tertiary
## 252             1050         40               2.0     Urban           Tertiary
## 253              750         55               2.0     Urban           Tertiary
## 254             1000         55               2.0     Urban           Tertiary
## 255             1050         55               2.0     Urban           Tertiary
## 256              750          8               3.2     Urban           Tertiary
## 257             1000          8               3.2     Urban           Tertiary
## 258             1050          8               3.2     Urban           Tertiary
## 259              750         12               3.2     Urban           Tertiary
## 260             1000         12               3.2     Urban           Tertiary
## 261             1050         12               3.2     Urban           Tertiary
## 262              750         25               3.2     Urban           Tertiary
## 263             1000         25               3.2     Urban           Tertiary
## 264             1050         25               3.2     Urban           Tertiary
## 265              750         40               3.2     Urban           Tertiary
## 266             1000         40               3.2     Urban           Tertiary
## 267             1050         40               3.2     Urban           Tertiary
## 268              750         55               3.2     Urban           Tertiary
## 269             1000         55               3.2     Urban           Tertiary
## 270             1050         55               3.2     Urban           Tertiary
## 271              750          8               5.0     Urban           Tertiary
## 272             1000          8               5.0     Urban           Tertiary
## 273             1050          8               5.0     Urban           Tertiary
## 274              750         12               5.0     Urban           Tertiary
## 275             1000         12               5.0     Urban           Tertiary
## 276             1050         12               5.0     Urban           Tertiary
## 277              750         25               5.0     Urban           Tertiary
## 278             1000         25               5.0     Urban           Tertiary
## 279             1050         25               5.0     Urban           Tertiary
## 280              750         40               5.0     Urban           Tertiary
## 281             1000         40               5.0     Urban           Tertiary
## 282             1050         40               5.0     Urban           Tertiary
## 283              750         55               5.0     Urban           Tertiary
## 284             1000         55               5.0     Urban           Tertiary
## 285             1050         55               5.0     Urban           Tertiary
## 286              750          8               7.9     Urban           Tertiary
## 287             1000          8               7.9     Urban           Tertiary
## 288             1050          8               7.9     Urban           Tertiary
## 289              750         12               7.9     Urban           Tertiary
## 290             1000         12               7.9     Urban           Tertiary
## 291             1050         12               7.9     Urban           Tertiary
## 292              750         25               7.9     Urban           Tertiary
## 293             1000         25               7.9     Urban           Tertiary
## 294             1050         25               7.9     Urban           Tertiary
## 295              750         40               7.9     Urban           Tertiary
## 296             1000         40               7.9     Urban           Tertiary
## 297             1050         40               7.9     Urban           Tertiary
## 298              750         55               7.9     Urban           Tertiary
## 299             1000         55               7.9     Urban           Tertiary
## 300             1050         55               7.9     Urban           Tertiary
## 301              750          8               2.0     Rural           Tertiary
## 302             1000          8               2.0     Rural           Tertiary
## 303             1050          8               2.0     Rural           Tertiary
## 304              750         12               2.0     Rural           Tertiary
## 305             1000         12               2.0     Rural           Tertiary
## 306             1050         12               2.0     Rural           Tertiary
## 307              750         25               2.0     Rural           Tertiary
## 308             1000         25               2.0     Rural           Tertiary
## 309             1050         25               2.0     Rural           Tertiary
## 310              750         40               2.0     Rural           Tertiary
## 311             1000         40               2.0     Rural           Tertiary
## 312             1050         40               2.0     Rural           Tertiary
## 313              750         55               2.0     Rural           Tertiary
## 314             1000         55               2.0     Rural           Tertiary
## 315             1050         55               2.0     Rural           Tertiary
## 316              750          8               3.2     Rural           Tertiary
## 317             1000          8               3.2     Rural           Tertiary
## 318             1050          8               3.2     Rural           Tertiary
## 319              750         12               3.2     Rural           Tertiary
## 320             1000         12               3.2     Rural           Tertiary
## 321             1050         12               3.2     Rural           Tertiary
## 322              750         25               3.2     Rural           Tertiary
## 323             1000         25               3.2     Rural           Tertiary
## 324             1050         25               3.2     Rural           Tertiary
## 325              750         40               3.2     Rural           Tertiary
## 326             1000         40               3.2     Rural           Tertiary
## 327             1050         40               3.2     Rural           Tertiary
## 328              750         55               3.2     Rural           Tertiary
## 329             1000         55               3.2     Rural           Tertiary
## 330             1050         55               3.2     Rural           Tertiary
## 331              750          8               5.0     Rural           Tertiary
## 332             1000          8               5.0     Rural           Tertiary
## 333             1050          8               5.0     Rural           Tertiary
## 334              750         12               5.0     Rural           Tertiary
## 335             1000         12               5.0     Rural           Tertiary
## 336             1050         12               5.0     Rural           Tertiary
## 337              750         25               5.0     Rural           Tertiary
## 338             1000         25               5.0     Rural           Tertiary
## 339             1050         25               5.0     Rural           Tertiary
## 340              750         40               5.0     Rural           Tertiary
## 341             1000         40               5.0     Rural           Tertiary
## 342             1050         40               5.0     Rural           Tertiary
## 343              750         55               5.0     Rural           Tertiary
## 344             1000         55               5.0     Rural           Tertiary
## 345             1050         55               5.0     Rural           Tertiary
## 346              750          8               7.9     Rural           Tertiary
## 347             1000          8               7.9     Rural           Tertiary
## 348             1050          8               7.9     Rural           Tertiary
## 349              750         12               7.9     Rural           Tertiary
## 350             1000         12               7.9     Rural           Tertiary
## 351             1050         12               7.9     Rural           Tertiary
## 352              750         25               7.9     Rural           Tertiary
## 353             1000         25               7.9     Rural           Tertiary
## 354             1050         25               7.9     Rural           Tertiary
## 355              750         40               7.9     Rural           Tertiary
## 356             1000         40               7.9     Rural           Tertiary
## 357             1050         40               7.9     Rural           Tertiary
## 358              750         55               7.9     Rural           Tertiary
## 359             1000         55               7.9     Rural           Tertiary
## 360             1050         55               7.9     Rural           Tertiary

8.2 Generate predictions with confidence intervals using final model

library(DT)

pred.haz <- augment(final.m, newdata = new_data)

pred.haz |> datatable()

8.3 Predication visualization

income.effect <- expand.grid(
  household_income = seq(1000, 6000, by = 500),
  residence = factor(c("Urban", "Rural"),
                     levels = c("Urban", "Rural"))
  ) 

income.effect$age_months <- 24
income.effect$dietary_diversity <- 4
income.effect$maternal_education <- factor ("Secondary", 
                                     levels = c("Secondary", "Tertiary"))
         

income.effect$Predicted_score <- predict(final.m, newdata = income.effect)
         

ggplot(income.effect, 
       aes(x = household_income, 
           y = Predicted_score,
           color = residence, 
           group = residence)) +
  geom_line(size = 1.2) +
  geom_point(size = 2) +
  scale_color_manual(
    values = c("Urban" = "green", "Rural" = "red")
  )

  labs(
    title = "Effect of Household Income on Height-for-age (Z-score)",
    x = "Household Income (RM)",
    y = "Predicted HAZ",
    color = "residence"
    ) +
  theme_minimal()

## NULL

Interpretation: The graph shows how HAZ changes across levels of household income by place of residence. The prediction indicate a positive association between household income and child linear growth in urban and rural settings. However, the predicted gains in HAZ associated with increasing income greater in urban areas, resulting in consistently higher predicted HAZ among urban children and suggesting persistent contextual differences in rural settings.

9 Conclusion

This study examined the determinants of linear growth (HAZ) among children aged 6–59 months using a multivariable linear regression model. The final model demonstrated strong explanatory power, accounting for approximately 60% of the variation in HAZ (adjusted R² = 0.601), indicating a good model fit.

Child age was significantly and negatively associated with HAZ, with each additional month of age associated with a small but meaningful decline in HAZ, suggesting progressive growth faltering over early childhood. This finding is consistent with the cumulative effects of nutritional inadequacy, recurrent infections, and environmental exposures as children grow older.

Socioeconomic factors played a critical role in child linear growth. Household income showed a positive association with HAZ, indicating that higher economic resources are linked to better growth outcomes. Importantly, the significant interaction between household income and urban residence suggests that the beneficial effect of income on child growth is stronger in urban settings, highlighting contextual differences in how economic advantages translate into nutritional gains.

Dietary quality was also a key determinant. Higher dietary diversity was strongly associated with improved HAZ, reinforcing the importance of access to a variety of food groups for optimal child growth beyond caloric sufficiency alone.

Maternal education emerged as one of the strongest predictors of child linear growth. Compared with low maternal education, children of mothers with secondary and tertiary education had substantially higher HAZ scores, demonstrating a clear dose–response relationship. This underscores the role of maternal knowledge, caregiving practices, and health-seeking behaviour in shaping child nutrition outcomes.

From a public health perspective, the results underscore the need for integrated interventions that go beyond single factor approaches. Efforts to improve child linear growth should therefore combine nutrition-specific actions (such as improving dietary diversity and appropriate feeding practices) with nutrition-sensitive strategies (including maternal education support, poverty alleviation, and improved access to health and sanitation). Such a holistic approach is fundamental for addressing the underlying determinants of poor linear growth and advancing child health outcomes.

In conclusion, addressing child growth faltering among children aged 6–59 months requires multisectoral policies and targeted interventions that are responsive to the complex set of determinants identified in this analysis. Tailoring public health programming to the most salient predictors in the study population will be essential to promote optimal growth and reduce the burden of growth deficits during early childhood.

Factors associated with child linear growth among children aged 6 to 59 months old

Hafizah, Iskandar, Fazlina & Yoges

2025-12-20

1. Introduction

Research question

Objective

Variables in dataset

2. Prepare environment and read data

2.1 Load packages

2.2 Read data

2.3 Wrangling data

3 Explore data analysis (EDA)

3.1 Summary Statistics

3.2 Descriptive analysis

3.3 Dataset distribution

3.4 Dataset relationships

4 Analysis data

4.1 Univariable analysis

4.1.1 Age (months)

4.1.2 Household income (RM)

4.1.3 Dietary diversity (Food group count)

4.1.4 Residence

4.1.5 Maternal education

4.1.6 Univariable analysis summary table

4.2 Multivariable analysis

4.2.1 Purposeful selection approach

4.2.2 Stepwise selection approach

Compare results (ANOVA table)

5 Model assessment

5.1 Model building

5.1.1 Confounder assessment

5.1.2 Interaction assessment

Interaction between household income and residence

Interaction between household income and child age in months

Interaction between household income and maternal education

5.2 Model diagnostics

5.2.1 Preliminary final model

5.2.2 Assumption 1: Linearity of the relationship between predictors and residuals

Overall linearity

Linearity of each continuous predictor

5.2.3 Assumption 2: Normality of residuals

5.2.4 Assumption 3: Homoscedasticity (constant variance of residuals)

Visual diagnostic

Breusch-Pagan test

Box-Cox transformation

5.2.5 Check for multicollinearity

6 Check influentials

6.1 Identify influential observations and cook’s cutoff point

6.2 Cook’s distance plot

6.3 Leverage and standardized residuals

Residuals vs leverage plot

7 Model presentation (Summary and interpretation)

7.1 Final model and interpretation

7.1.1 Final Linear Regression Model equation:

Interpretation:

Model fit and prediction:

Analysis of coefficents:

Model assessment and diagnostics

7.2 Forest plot of final model coefficients

7.3 Model performance visualization

8 Prediction

8.1 Create new data for prediction

8.2 Generate predictions with confidence intervals using final model

8.3 Predication visualization

9 Conclusion