Introduction

This analysis explores the Maternal and Child Health dataset, which contains information on 200 respondents. The objective of the study is to understand factors associated with maternal and child health outcomes. Using the gtsummary and dplyr packages in R, several statistical techniques were applied: univariate and bivariate summaries, hypothesis testing with p-values, linear regression to model household income, and logistic regression to study factors associated with child survival. The final step converted the logistic regression coefficients into odds ratios to interpret the strength and direction of each factor’s effect.

Loading Packages and Data

# Load required packages
library(gtsummary)
library(dplyr)

Dataset Load

data <- read.csv("C:\\Users\\USER\\Downloads\\maternal & child health.csv")

Dataset Print

head(data)
##   respondent_id age education_level residence parity antenatal_visits
## 1             1  37       Secondary     Rural      3                8
## 2             2  32    No education     Rural      5                3
## 3             3  48       Secondary     Rural      4                8
## 4             4  36    No education     Urban      2                6
## 5             5  19       Secondary     Urban      4                3
## 6             6  30         Primary     Rural      1                1
##    delivery_place child_alive contraceptive_use household_income
## 1 Public Facility         Yes               Yes            70150
## 2 Public Facility         Yes                No            63494
## 3            Home         Yes               Yes            67083
## 4            Home         Yes               Yes            72443
## 5 Public Facility         Yes               Yes            71171
## 6 Public Facility          No                No            70278

Interpretation: The dataset was loaded from a CSV file with the read.csv() function. It contains a respondent identifier and variables for maternal age, education level, residence, parity, number of antenatal visits, delivery place, child survival (child_alive), contraceptive use, and household income. The first six rows show plausible values in every column, which suggests the file was imported correctly. The data mix numeric variables (age, parity, antenatal visits, household income) with categorical variables stored as text (education level, residence, delivery place, child survival, contraceptive use). This mix supports both summary tables and regression models. Together, these variables describe the mothers’ demographic, socioeconomic, and health-related characteristics.
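The import can also be checked by inspecting column types and missing values directly. The sketch below rebuilds only the six rows printed by head() as a small data frame so that it runs without the CSV. The name `preview` is a hypothetical stand-in; on the full dataset the same two calls would be applied to `data`.

```r
# Rebuild the six rows shown by head(data); `preview` is a hypothetical
# stand-in for the full `data` object so the check runs without the CSV file.
preview <- data.frame(
  respondent_id     = 1:6,
  age               = c(37, 32, 48, 36, 19, 30),
  education_level   = c("Secondary", "No education", "Secondary",
                        "No education", "Secondary", "Primary"),
  residence         = c("Rural", "Rural", "Rural", "Urban", "Urban", "Rural"),
  parity            = c(3, 5, 4, 2, 4, 1),
  antenatal_visits  = c(8, 3, 8, 6, 3, 1),
  delivery_place    = c("Public Facility", "Public Facility", "Home",
                        "Home", "Public Facility", "Public Facility"),
  child_alive       = c("Yes", "Yes", "Yes", "Yes", "Yes", "No"),
  contraceptive_use = c("Yes", "No", "Yes", "Yes", "Yes", "No"),
  household_income  = c(70150, 63494, 67083, 72443, 71171, 70278)
)

# Storage class of every column: numeric/integer versus character (categorical)
col_types <- sapply(preview, class)
print(col_types)

# Number of missing values in each column
print(colSums(is.na(preview)))
```

On the full data, read.csv() stores whole-number columns as "integer" rather than "numeric". Either class is treated as numeric by tbl_summary() and the regression functions.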

Univariate Analysis

data %>% tbl_summary()
Characteristic        N = 200¹
respondent_id         101 (51, 151)
age                   33 (24, 42)
education_level
    Higher            21 (11%)
    No education      29 (15%)
    Primary           72 (36%)
    Secondary         78 (39%)
residence
    Rural             114 (57%)
    Urban             86 (43%)
parity
    0                 24 (12%)
    1                 34 (17%)
    2                 34 (17%)
    3                 37 (19%)
    4                 32 (16%)
    5                 39 (20%)
antenatal_visits      5 (2, 7)
delivery_place
    Home              59 (30%)
    Private Facility  42 (21%)
    Public Facility   99 (50%)
child_alive           182 (91%)
contraceptive_use     120 (60%)
household_income      46,360 (25,560, 62,864)
¹ Median (Q1, Q3); n (%)

Interpretation: The univariate summary describes the overall sample. The median age of the respondents is 33 years, and the middle half of the mothers are between 24 and 42 years old.

- Education: 39% have secondary education, 36% have primary education, 15% have no formal education, and 11% have higher education.
- Residence: 57% of the respondents live in rural areas and 43% in urban areas.
- Parity: values range from 0 to 5, with each level accounting for 12% to 20% of mothers.
- Antenatal care: the median number of visits is 5, with an interquartile range of 2 to 7.
- Delivery place: public facilities account for the largest share of deliveries (50%), followed by home deliveries (30%) and private facilities (21%).
- Outcomes: at the time of data collection, 91% of the children were alive, and 60% of mothers reported using contraceptives.
- Income: the median household income was 46,360 units, with an interquartile range of 25,560 to 62,864, indicating considerable variability between households.

Overall, the sample is predominantly rural, with wide variation in income and a high rate of child survival.
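The n (%) entries are simple proportions of all respondents. As a worked check, the education counts from the table sum to 200 and reproduce the reported percentages, which the table shows rounded to whole numbers. The minimal sketch below uses only counts copied from the table above.

```r
# Education counts copied from the univariate table
edu_counts <- c("Higher" = 21, "No education" = 29, "Primary" = 72, "Secondary" = 78)

n_total   <- sum(edu_counts)              # total respondents
edu_pct   <- 100 * edu_counts / n_total   # unrounded percentages
alive_pct <- 100 * 182 / n_total          # share of children alive (child_alive = Yes)

print(n_total)    # 200
print(edu_pct)    # 10.5 14.5 36.0 39.0
print(alive_pct)  # 91
```

On the full data, prop.table(table(data$education_level)) * 100 gives the same unrounded percentages.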

Bivariate Analysis

data %>% tbl_summary(by = residence)
Characteristic        Rural                    Urban
                      N = 114¹                 N = 86¹
respondent_id         101 (53, 155)            100 (43, 144)
age                   34 (25, 42)              33 (24, 42)
education_level
    Higher            16 (14%)                 5 (5.8%)
    No education      17 (15%)                 12 (14%)
    Primary           45 (39%)                 27 (31%)
    Secondary         36 (32%)                 42 (49%)
parity
    0                 16 (14%)                 8 (9.3%)
    1                 20 (18%)                 14 (16%)
    2                 18 (16%)                 16 (19%)
    3                 19 (17%)                 18 (21%)
    4                 17 (15%)                 15 (17%)
    5                 24 (21%)                 15 (17%)
antenatal_visits      5 (2, 7)                 6 (3, 8)
delivery_place
    Home              33 (29%)                 26 (30%)
    Private Facility  22 (19%)                 20 (23%)
    Public Facility   59 (52%)                 40 (47%)
child_alive           103 (90%)                79 (92%)
contraceptive_use     72 (63%)                 48 (56%)
household_income      47,346 (26,209, 60,360)  45,025 (25,407, 66,833)
¹ Median (Q1, Q3); n (%)

Interpretation: The bivariate analysis compares maternal and household characteristics between rural and urban mothers. Education is distributed differently in the two groups:

- Secondary education is more common among urban mothers (49% vs. 32%).
- Primary education (39% vs. 31%) and higher education (14% vs. 5.8%) are more common among rural mothers.

Urban mothers have a slightly higher median number of antenatal visits (6 vs. 5), which may reflect better access to healthcare in urban areas. Median household income is similar in both groups (47,346 rural vs. 45,025 urban), as is the proportion of children alive (90% vs. 92%). Descriptively, urban residence goes with a larger share of secondary education and slightly more antenatal care, while income and child survival differ little between the two groups. Whether these differences are statistically significant is tested in the next step.
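The percentages in a table stratified with `by = residence` are column percentages: each count is divided by its own group size (114 rural, 86 urban), not by 200. The sketch below reproduces the education percentages with base R's prop.table(), using only counts copied from the table above.

```r
# Education-by-residence counts copied from the bivariate table
edu_by_res <- matrix(
  c(16, 17, 45, 36,   # Rural
     5, 12, 27, 42),  # Urban
  ncol = 2,
  dimnames = list(
    education_level = c("Higher", "No education", "Primary", "Secondary"),
    residence = c("Rural", "Urban")
  )
)

# Column percentages: margin = 2 divides each count by its column (group) total
col_pct <- round(100 * prop.table(edu_by_res, margin = 2), 1)
print(colSums(edu_by_res))  # 114 86
print(col_pct)
```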

Summary with P-values

data %>% tbl_summary(by = residence) %>% add_p()
Characteristic        Rural                    Urban                    p-value²
                      N = 114¹                 N = 86¹
respondent_id         101 (53, 155)            100 (43, 144)            0.5
age                   34 (25, 42)              33 (24, 42)              0.5
education_level                                                         0.050
    Higher            16 (14%)                 5 (5.8%)
    No education      17 (15%)                 12 (14%)
    Primary           45 (39%)                 27 (31%)
    Secondary         36 (32%)                 42 (49%)
parity                                                                  0.8
    0                 16 (14%)                 8 (9.3%)
    1                 20 (18%)                 14 (16%)
    2                 18 (16%)                 16 (19%)
    3                 19 (17%)                 18 (21%)
    4                 17 (15%)                 15 (17%)
    5                 24 (21%)                 15 (17%)
antenatal_visits      5 (2, 7)                 6 (3, 8)                 0.081
delivery_place                                                          0.7
    Home              33 (29%)                 26 (30%)
    Private Facility  22 (19%)                 20 (23%)
    Public Facility   59 (52%)                 40 (47%)
child_alive           103 (90%)                79 (92%)                 0.7
contraceptive_use     72 (63%)                 48 (56%)                 0.3
household_income      47,346 (26,209, 60,360)  45,025 (25,407, 66,833)  0.6
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

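Interpretation: Adding p-values shows whether the rural and urban distributions of each variable differ by more than would be expected by chance. Only education level reaches the conventional 0.05 threshold, and only just (p = 0.050). This difference reflects mainly two patterns: a larger share of urban mothers have secondary education, and a larger share of rural mothers have higher education. Because education can shape healthcare awareness and decision-making, the difference may matter for maternal health. Antenatal visits show a non-significant tendency toward more visits among urban mothers (p = 0.081). All other variables, including age, parity, delivery place, child survival, contraceptive use, and household income, show no significant difference between the two residence types (all p-values 0.3 or higher). The p-value for respondent_id carries no meaning, since it is an identifier rather than a characteristic. The absence of significant differences for most factors suggests that health outcomes and service utilization are broadly similar in rural and urban areas. One possible reason is similar access to basic maternal health services, although these data cannot establish the cause.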
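The borderline education result can be reproduced from the counts alone. The table footnote states that a Pearson chi-squared test was used for the categorical variables. Assuming that applies here, the minimal base R sketch below runs chisq.test() on the rural/urban education counts. The statistic of about 7.82 on 3 degrees of freedom sits just above the 5% critical value of about 7.81, which is why the p-value lands right at 0.050.

```r
# Education-by-residence counts copied from the table above
edu_by_res <- matrix(
  c(16, 17, 45, 36,   # Rural
     5, 12, 27, 42),  # Urban
  ncol = 2,
  dimnames = list(
    education_level = c("Higher", "No education", "Primary", "Secondary"),
    residence = c("Rural", "Urban")
  )
)

# Pearson's chi-squared test of independence
# (Yates' continuity correction is only applied to 2 x 2 tables)
res <- chisq.test(edu_by_res)

print(res$statistic)         # X-squared, about 7.82
print(res$parameter)         # df = 3
print(res$p.value)           # just under 0.05
print(qchisq(0.95, df = 3))  # 5% critical value, about 7.81

# The chi-squared approximation is reasonable when all expected counts are at least 5
print(min(res$expected))
```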

Linear Regression

model <- lm(household_income ~ age + education_level + residence, data = data)
tbl_regression(model)
Characteristic        Beta        95% CI              p-value
age                   -158        -484, 168           0.3
education_level
    Higher            (reference)
    No education      -1,708      -13,909, 10,493     0.8
    Primary           2,829       -7,720, 13,379      0.6
    Secondary         1,534       -9,041, 12,110      0.8
residence
    Rural             (reference)
    Urban             931         -5,243, 7,106       0.8
Abbreviation: CI = Confidence Interval

Interpretation: A linear regression model was fitted to evaluate the association of age, education level, and residence with household income. None of these variables is significantly associated with income (all p-values > 0.05).

- Age has a small negative coefficient (about 158 units less income per additional year), but its confidence interval (-484 to 168) includes zero.
- Education level, compared with the reference category of higher education, shows no clear pattern. The no-education coefficient is negative, while the primary and secondary coefficients are positive, and all confidence intervals are wide and span zero.
- Urban residence is associated with 931 units more income than rural residence, but this difference is also not significant.

Within this dataset, therefore, age, education, and residence do not appear to explain variation in household income. Variables not included in the model, such as employment type or family size, may play a larger role.
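R converts text categories to factors with alphabetically ordered levels. As a result, "Higher" became the reference level, and every education coefficient is a contrast against the most educated group. If comparisons against mothers with no education are preferred, the factor levels can be set explicitly before refitting. The sketch below uses a hypothetical four-value vector (`edu_chr`) to show how the level order changes the dummy variables created by model.matrix(), which is what lm() uses internally. The commented lines show how the same idea would apply to `data`; they are not run here.

```r
# Hypothetical education values (one of each category) for illustration
edu_chr <- c("Secondary", "No education", "Primary", "Higher")

# Default: character values become a factor with alphabetical levels,
# so "Higher" is the first level and serves as the reference category
default_dummies <- colnames(model.matrix(~ edu_chr))
print(default_dummies)

# Explicit order by attainment makes "No education" the reference category
edu <- factor(edu_chr, levels = c("No education", "Primary", "Secondary", "Higher"))
dummies <- colnames(model.matrix(~ edu))
print(dummies)

# Applying the same idea to the analysis data (not run here):
# data$education_level <- factor(data$education_level,
#   levels = c("No education", "Primary", "Secondary", "Higher"))
# model <- lm(household_income ~ age + education_level + residence, data = data)
```

Changing the reference level does not change the model's fit or predictions. It only changes which comparisons the coefficients report.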

Convert Binary Categorical Values to 0 and 1

data$child_alive <- ifelse(data$child_alive == "Yes", 1, 0)

Interpretation: Before fitting the logistic regression, child_alive was recoded as numeric, with 1 for “Yes” and 0 for “No.” With family = "binomial", glm() expects the outcome as 0/1 values (or as a factor) and cannot use the original text values directly. Coding “Yes” as 1 also makes explicit that the model estimates the log-odds of a child being alive, so positive coefficients indicate higher odds of survival.
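A more defensive version of the recoding first checks that the column holds only the expected labels. This matters because ifelse() silently codes any value other than "Yes" (for example, a stray "yes") as 0. The sketch below is illustrative and uses only the six child_alive values printed by head(data).

```r
# child_alive values from the six rows printed by head(data)
child_alive_chr <- c("Yes", "Yes", "Yes", "Yes", "Yes", "No")

# Stop if any value is not "Yes", "No", or missing; an unexpected spelling
# such as "yes" would otherwise be coded as 0 without warning
stopifnot(all(child_alive_chr %in% c("Yes", "No", NA)))

# The recoding used above: 1 = "Yes" (alive), 0 = "No"
alive01 <- ifelse(child_alive_chr == "Yes", 1, 0)
print(alive01)  # 1 1 1 1 1 0

# Missing values stay missing rather than being coded as 0
print(ifelse(c("Yes", NA, "No") == "Yes", 1, 0))  # 1 NA 0
```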

Logistic Regression

model <- glm(child_alive ~ age + education_level + residence, data = data, family = "binomial")
tbl_regression(model)
Characteristic        log(OR)     95% CI              p-value
age                   0.02        -0.03, 0.08         0.4
education_level
    Higher            (reference)
    No education      1.0         -1.4, 4.1           0.4
    Primary           -0.49       -2.4, 0.97          0.6
    Secondary         0.41        -1.6, 2.1           0.6
residence
    Rural             (reference)
    Urban             0.12        -0.89, 1.2          0.8
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Interpretation: The logistic regression model assesses how maternal age, education level, and residence are associated with the odds of a child being alive. None of the predictors is statistically significant (all p-values > 0.05).

- Age: the coefficient (log(OR) = 0.02 per year) suggests a small positive but non-significant association between maternal age and child survival.
- Education: there is no consistent pattern relative to higher education. The “No education” and “Secondary” categories have positive coefficients and “Primary” a negative one, all with confidence intervals that include zero.
- Residence: urban residence has a small positive coefficient (log(OR) = 0.12), pointing to slightly higher odds of survival in urban areas, but it is not significant either.

In summary, the analyzed factors show no significant association with child survival. One likely reason is limited statistical power. Because the survival rate is high (91%), only 18 of the 200 children had died, which leaves few outcome events to distinguish between subgroups.
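The power argument can be made concrete with the events-per-variable ratio. This is a commonly cited rule of thumb for logistic regression that asks for roughly 10 outcome events per estimated coefficient; it is a heuristic, not a strict requirement. The sketch below counts the rarer outcome (child deaths) from the univariate table and the five non-intercept coefficients in this model: age, three education dummies, and urban residence.

```r
n_total  <- 200
n_alive  <- 182                  # child_alive = Yes, from the univariate table
n_deaths <- n_total - n_alive    # the rarer outcome: child deaths

# Coefficients besides the intercept: age + 3 education dummies + urban residence
n_coef <- 1 + 3 + 1

epv <- n_deaths / n_coef
print(n_deaths)  # 18
print(epv)       # 3.6, versus the rule-of-thumb target of about 10
```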

Odds Ratios

exp(cbind(OR = coef(model), confint(model)))
##                                    OR      2.5 %    97.5 %
## (Intercept)                 4.2600563 0.52842736 48.352688
## age                         1.0248217 0.97201138  1.081941
## education_levelNo education 2.7541471 0.24425460 62.160538
## education_levelPrimary      0.6151942 0.08854587  2.639987
## education_levelSecondary    1.5027635 0.20016932  7.833858
## residenceUrban              1.1325468 0.41203617  3.293093

Interpretation: Exponentiating the logistic regression coefficients gives odds ratios (OR), which are easier to interpret. All comparisons are relative to higher education (for education) and to rural mothers (for residence).

- Age: the OR is 1.02, so each additional year of maternal age is associated with roughly 2.5% higher odds of child survival. The 95% confidence interval (0.97 to 1.08) includes 1, so the effect is not significant.
- No education: the OR of 2.75 suggests higher odds of child survival than among mothers with higher education. However, the very wide confidence interval (0.24 to 62.16) shows that this estimate is highly imprecise, reflecting the small size of this group (29 mothers) and the few child deaths overall.
- Primary and secondary education: the ORs are 0.62 and 1.50 respectively, and both intervals include 1.
- Urban residence: the OR of 1.13 means urban mothers have slightly higher estimated odds of their child being alive than rural mothers, again with an interval (0.41 to 3.29) that includes 1.

Overall, none of the odds ratios is statistically significant, so the data provide no evidence of a relationship between these factors and child survival.
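The percentage interpretation follows from (OR - 1) × 100. Because the model is linear on the log-odds scale, the odds ratio for a k-year age difference is OR^k. The sketch below applies both calculations to the printed estimates and flags which 95% intervals exclude 1. It reuses the numbers from the output above rather than refitting the model. In gtsummary, tbl_regression(model, exponentiate = TRUE) would display the odds ratios directly as a table.

```r
# Odds ratios and 95% CI limits copied from the output above
or_tab <- data.frame(
  term  = c("age", "education_levelNo education", "education_levelPrimary",
            "education_levelSecondary", "residenceUrban"),
  OR    = c(1.0248217, 2.7541471, 0.6151942, 1.5027635, 1.1325468),
  lower = c(0.97201138, 0.24425460, 0.08854587, 0.20016932, 0.41203617),
  upper = c(1.081941, 62.160538, 2.639987, 7.833858, 3.293093)
)

# Percent change in the odds per year of age, or for a category vs. its reference
or_tab$pct_change <- round((or_tab$OR - 1) * 100, 1)

# Odds ratio for a 10-year difference in maternal age: OR^10 = exp(10 * log(OR))
or_age_10 <- or_tab$OR[or_tab$term == "age"]^10

# An interval that excludes 1 corresponds (approximately) to p < 0.05
or_tab$excludes_1 <- or_tab$lower > 1 | or_tab$upper < 1

print(or_tab)
print(or_age_10)  # about 1.28
```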

Conclusion

In conclusion, the Maternal and Child Health dataset reflects generally positive outcomes: 91% of the children were alive, and half of all deliveries took place in public facilities. Education level is the only variable whose distribution differs between rural and urban mothers, and only at borderline significance (p = 0.050). Income, antenatal visits, child survival, and the other factors are similar across the two groups. The regression analyses show no significant association of maternal age, education, or residence with either household income or child survival in this dataset. Although no strong associations were found, improving education access could still enhance maternal awareness, healthcare utilization, and long-term socioeconomic conditions. This applies particularly to secondary schooling in rural areas, where it was less common (32% vs. 49% in urban areas), although these data alone cannot confirm such effects.