This analysis explores the Maternal and Child Health dataset, which contains information on 200 respondents. The objective is to understand factors that influence maternal and child health outcomes. Using the gtsummary and dplyr packages in R, the analysis applies univariate and bivariate summaries, hypothesis tests with p-values, a linear regression of household income, and a logistic regression of child survival. The final step converts the logistic coefficients to odds ratios to interpret the strength and direction of each factor's association with the outcome.
# Load required packages
library(gtsummary)
library(dplyr)
data <- read.csv("C:\\Users\\USER\\Downloads\\maternal & child health.csv")
head(data)
## respondent_id age education_level residence parity antenatal_visits
## 1 1 37 Secondary Rural 3 8
## 2 2 32 No education Rural 5 3
## 3 3 48 Secondary Rural 4 8
## 4 4 36 No education Urban 2 6
## 5 5 19 Secondary Urban 4 3
## 6 6 30 Primary Rural 1 1
## delivery_place child_alive contraceptive_use household_income
## 1 Public Facility Yes Yes 70150
## 2 Public Facility Yes No 63494
## 3 Home Yes Yes 67083
## 4 Home Yes Yes 72443
## 5 Public Facility Yes Yes 71171
## 6 Public Facility No No 70278
Interpretation: The dataset was successfully loaded from a CSV file using the read.csv() function. It includes key variables such as age, education level, residence, parity, number of antenatal visits, delivery place, contraceptive use, and household income. The first few rows of data confirm that it was imported correctly, with appropriate values for each column. The data contains a mix of numeric and categorical variables, making it suitable for both regression and summary analyses. This dataset provides a broad overview of mothers’ demographic, socioeconomic, and health-related information.
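A quick structural check after loading can confirm the column types and the absence of missing values. The sketch below is illustrative only: `first_rows` is a hypothetical data frame retyped by hand from the six rows printed by head(data), with a subset of the columns, so it does not depend on the CSV file.

```r
# Hypothetical data frame: the six rows shown by head(data), retyped by hand
first_rows <- data.frame(
  age = c(37, 32, 48, 36, 19, 30),
  education_level = c("Secondary", "No education", "Secondary",
                      "No education", "Secondary", "Primary"),
  residence = c("Rural", "Rural", "Rural", "Urban", "Urban", "Rural"),
  antenatal_visits = c(8, 3, 8, 6, 3, 1),
  child_alive = c("Yes", "Yes", "Yes", "Yes", "Yes", "No"),
  household_income = c(70150, 63494, 67083, 72443, 71171, 70278),
  stringsAsFactors = FALSE
)

# Class of each column: numeric for measurements, character for categories
column_types <- sapply(first_rows, class)
print(column_types)

# Number of missing values per column
missing_counts <- colSums(is.na(first_rows))
print(missing_counts)
```

Running the same two calls on the full `data` object would show whether any column was read with an unexpected type.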
data %>% tbl_summary()
| Characteristic | N = 200¹ |
|---|---|
| respondent_id | 101 (51, 151) |
| age | 33 (24, 42) |
| education_level | |
| Higher | 21 (11%) |
| No education | 29 (15%) |
| Primary | 72 (36%) |
| Secondary | 78 (39%) |
| residence | |
| Rural | 114 (57%) |
| Urban | 86 (43%) |
| parity | |
| 0 | 24 (12%) |
| 1 | 34 (17%) |
| 2 | 34 (17%) |
| 3 | 37 (19%) |
| 4 | 32 (16%) |
| 5 | 39 (20%) |
| antenatal_visits | 5 (2, 7) |
| delivery_place | |
| Home | 59 (30%) |
| Private Facility | 42 (21%) |
| Public Facility | 99 (50%) |
| child_alive | 182 (91%) |
| contraceptive_use | 120 (60%) |
| household_income | 46,360 (25,560, 62,864) |
| ¹ Median (Q1, Q3); n (%) | |
Interpretation: The univariate summary describes the 200 respondents. The median age is 33 years (Q1 to Q3: 24 to 42), so the middle half of the mothers are between their mid-twenties and early forties. In terms of education, 39% have secondary education, 36% have primary education, 15% have no formal education, and 11% have higher education. The residence distribution shows that 57% of respondents live in rural areas and 43% in urban areas. Half of the deliveries occurred in public facilities (50%), followed by home deliveries (30%) and private facilities (21%); because the percentages are rounded, a column can sum to 101%. In total, 91% of the children were alive at the time of data collection, and 60% of mothers reported using contraceptives. The median household income was 46,360 (Q1 to Q3: 25,560 to 62,864; the currency is not stated), which indicates wide variation between households. The respondent_id row is an identifier, so its summary carries no meaning. Overall, the sample is majority rural, with widely varying household income and a high proportion of surviving children.
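The n (%) entries can be reproduced by hand from the counts in the table. The sketch below uses the education_level counts as printed above and shows why the displayed percentages add up to 101%: two of the exact values end in .5 and are both displayed rounded up.

```r
# Counts copied from the education_level rows of the summary table
education_counts <- c("Higher" = 21, "No education" = 29,
                      "Primary" = 72, "Secondary" = 78)

# The counts add up to the sample size shown in the table header
total_respondents <- sum(education_counts)
print(total_respondents)

# Exact percentages before rounding: 10.5, 14.5, 36, 39
education_percent <- education_counts / total_respondents * 100
print(education_percent)

# The exact percentages sum to 100; the table displays 11% and 15%
# for the two .5 values, so the displayed column adds up to 101%
print(sum(education_percent))
```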
data %>% tbl_summary(by = residence)
| Characteristic | Rural N = 114¹ | Urban N = 86¹ |
|---|---|---|
| respondent_id | 101 (53, 155) | 100 (43, 144) |
| age | 34 (25, 42) | 33 (24, 42) |
| education_level | ||
| Higher | 16 (14%) | 5 (5.8%) |
| No education | 17 (15%) | 12 (14%) |
| Primary | 45 (39%) | 27 (31%) |
| Secondary | 36 (32%) | 42 (49%) |
| parity | ||
| 0 | 16 (14%) | 8 (9.3%) |
| 1 | 20 (18%) | 14 (16%) |
| 2 | 18 (16%) | 16 (19%) |
| 3 | 19 (17%) | 18 (21%) |
| 4 | 17 (15%) | 15 (17%) |
| 5 | 24 (21%) | 15 (17%) |
| antenatal_visits | 5 (2, 7) | 6 (3, 8) |
| delivery_place | ||
| Home | 33 (29%) | 26 (30%) |
| Private Facility | 22 (19%) | 20 (23%) |
| Public Facility | 59 (52%) | 40 (47%) |
| child_alive | 103 (90%) | 79 (92%) |
| contraceptive_use | 72 (63%) | 48 (56%) |
| household_income | 47,346 (26,209, 60,360) | 45,025 (25,407, 66,833) |
| ¹ Median (Q1, Q3); n (%) | | |
Interpretation: The bivariate analysis compares maternal and household characteristics across the rural (N = 114) and urban (N = 86) groups. The education profiles differ, but not in a single direction: secondary education is more common among urban mothers (49% vs. 32%), while both primary education (39% vs. 31%) and higher education (14% vs. 5.8%) are more common among rural mothers. The median number of antenatal visits is slightly higher for urban mothers (6) than for rural mothers (5), which may reflect better access to healthcare in urban areas. Median household income is similar in the two groups (47,346 rural vs. 45,025 urban), and the proportion of children alive is nearly equal (90% rural vs. 92% urban). Contraceptive use is somewhat higher among rural mothers (63% vs. 56%). These comparisons are descriptive only: they suggest a different education mix and slightly higher antenatal care use in urban areas, with minimal differences in income and child survival.
data %>% tbl_summary(by = residence) %>% add_p()
| Characteristic | Rural N = 114¹ | Urban N = 86¹ | p-value² |
|---|---|---|---|
| respondent_id | 101 (53, 155) | 100 (43, 144) | 0.5 |
| age | 34 (25, 42) | 33 (24, 42) | 0.5 |
| education_level | | | 0.050 |
| Higher | 16 (14%) | 5 (5.8%) | |
| No education | 17 (15%) | 12 (14%) | |
| Primary | 45 (39%) | 27 (31%) | |
| Secondary | 36 (32%) | 42 (49%) | |
| parity | | | 0.8 |
| 0 | 16 (14%) | 8 (9.3%) | |
| 1 | 20 (18%) | 14 (16%) | |
| 2 | 18 (16%) | 16 (19%) | |
| 3 | 19 (17%) | 18 (21%) | |
| 4 | 17 (15%) | 15 (17%) | |
| 5 | 24 (21%) | 15 (17%) | |
| antenatal_visits | 5 (2, 7) | 6 (3, 8) | 0.081 |
| delivery_place | | | 0.7 |
| Home | 33 (29%) | 26 (30%) | |
| Private Facility | 22 (19%) | 20 (23%) | |
| Public Facility | 59 (52%) | 40 (47%) | |
| child_alive | 103 (90%) | 79 (92%) | 0.7 |
| contraceptive_use | 72 (63%) | 48 (56%) | 0.3 |
| household_income | 47,346 (26,209, 60,360) | 45,025 (25,407, 66,833) | 0.6 |
| ¹ Median (Q1, Q3); n (%) | | | |
| ² Wilcoxon rank sum test; Pearson’s Chi-squared test | | | |
Interpretation: Adding p-values helps identify which group differences are statistically significant. As the table footnote indicates, continuous variables are compared with the Wilcoxon rank sum test and categorical variables with Pearson's Chi-squared test. Only education level reaches the conventional 5% threshold, and only marginally (p = 0.050): the distribution of education differs between rural and urban mothers, with secondary education more common in urban areas and primary and higher education more common in rural areas. Antenatal visits show a weaker, non-significant difference (p = 0.081). All other variables, including age, parity, delivery place, child survival, contraceptive use, and income, show no significant difference between the two residence types (p ≥ 0.3). This suggests broadly similar health outcomes and service utilization in rural and urban areas in this sample, possibly because access to basic maternal health services is similar, although a sample of 200 may also be too small to detect modest differences.
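The education_level p-value can be checked directly from the counts in the table. The sketch below is a hand-built cross-tabulation, not the author's code; it assumes the reported value comes from an uncorrected Pearson Chi-squared test, as the table footnote suggests.

```r
# Counts copied from the education_level rows of the table above
education_by_residence <- matrix(
  c(16, 17, 45, 36,   # Rural
     5, 12, 27, 42),  # Urban
  nrow = 2, byrow = TRUE,
  dimnames = list(
    residence = c("Rural", "Urban"),
    education_level = c("Higher", "No education", "Primary", "Secondary")
  )
)

# Pearson Chi-squared test of independence
# (the continuity correction only applies to 2 x 2 tables)
test_result <- chisq.test(education_by_residence)

print(round(test_result$expected, 1))  # expected counts under independence
print(test_result$statistic)           # Chi-squared statistic, 3 degrees of freedom
print(test_result$p.value)             # rounds to 0.050
```

The statistic sits almost exactly on the 5% critical value for 3 degrees of freedom (about 7.81), which is why the result is best described as borderline rather than clearly significant.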
model <- lm(household_income ~ age + education_level + residence, data = data)
tbl_regression(model)
| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| age | -158 | -484, 168 | 0.3 |
| education_level | |||
| Higher | — | — | |
| No education | -1,708 | -13,909, 10,493 | 0.8 |
| Primary | 2,829 | -7,720, 13,379 | 0.6 |
| Secondary | 1,534 | -9,041, 12,110 | 0.8 |
| residence | |||
| Rural | — | — | |
| Urban | 931 | -5,243, 7,106 | 0.8 |
| Abbreviation: CI = Confidence Interval | |||
Interpretation: A linear regression model was fitted to evaluate the association of age, education level, and residence with household income. None of these variables is statistically significant: every p-value is 0.3 or larger, and every 95% confidence interval includes zero. The coefficient for age is -158, meaning each additional year of age is associated with a household income about 158 units lower, but the interval (-484 to 168) is compatible with no relationship. The education coefficients are differences from the reference category, Higher; none of the other levels differs significantly from it, and the estimates show no consistent pattern. Urban residence is associated with an income 931 units higher than rural residence, again with a wide interval (-5,243 to 7,106). Within this dataset, age, education, and residence therefore explain little of the variation in household income; variables not included in the model, such as employment type or family size, may play a stronger role.
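The reference categories in the table (Higher, Rural) follow from R's default of using the first factor level in alphabetical order. The sketch below uses a small hypothetical vector, not the dataset, to show that default and how relevel() could choose a different baseline such as "No education".

```r
# Hypothetical vector containing the four education categories
education <- factor(c("Secondary", "No education", "Primary", "Higher"))

# Levels are sorted alphabetically, so "Higher" is the reference level
print(levels(education))

# Dummy columns created for a model: one per non-reference level
default_columns <- colnames(model.matrix(~ education))
print(default_columns)

# relevel() moves a chosen level to the front, making it the reference
education_releveled <- relevel(education, ref = "No education")
print(levels(education_releveled))
```

Changing the reference level changes which comparisons the coefficients express, but not the fitted values of the model.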
data$child_alive <- ifelse(data$child_alive == "Yes", 1, 0)
Interpretation: Before conducting logistic regression, the variable child_alive was recoded to numeric form, with 1 for "Yes" and 0 for any other value. A binomial glm() expects a binary outcome, such as a 0/1 numeric vector or a two-level factor, and the explicit coding makes clear that the model estimates the odds of child_alive = 1, that is, of the child being alive.
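One caveat of the ifelse() recode is that anything other than the exact string "Yes" becomes 0. The sketch below uses a hypothetical vector (the lowercase "yes" and the NA are invented for illustration and are not known to occur in the dataset) to show the behaviour and a simple cross-check.

```r
# Hypothetical raw values, including a misspelling and a missing value
child_alive_raw <- c("Yes", "Yes", "No", NA, "yes")

# Same recode as in the analysis: exact "Yes" -> 1, anything else -> 0
child_alive_num <- ifelse(child_alive_raw == "Yes", 1, 0)
print(child_alive_num)

# Cross-tabulate raw against recoded values to spot unexpected categories
recode_check <- table(child_alive_raw, child_alive_num, useNA = "ifany")
print(recode_check)
```

Missing values stay missing, but the lowercase "yes" is silently coded as 0, so a cross-tabulation of this kind is a cheap safeguard before modelling.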
model <- glm(child_alive ~ age + education_level + residence, data = data, family = "binomial")
tbl_regression(model)
| Characteristic | log(OR) | 95% CI | p-value |
|---|---|---|---|
| age | 0.02 | -0.03, 0.08 | 0.4 |
| education_level | |||
| Higher | — | — | |
| No education | 1.0 | -1.4, 4.1 | 0.4 |
| Primary | -0.49 | -2.4, 0.97 | 0.6 |
| Secondary | 0.41 | -1.6, 2.1 | 0.6 |
| residence | |||
| Rural | — | — | |
| Urban | 0.12 | -0.89, 1.2 | 0.8 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
Interpretation: The logistic regression model assesses how maternal age, education level, and residence relate to the probability of a child being alive. None of the predictors is statistically significant: all p-values are 0.4 or larger, and every confidence interval includes zero. The coefficient for age (log odds = 0.02) suggests a small positive but non-significant relationship between maternal age and child survival. Education level shows mixed results relative to the reference category, Higher: "No education" (1.0) and "Secondary" (0.41) have positive coefficients and "Primary" (-0.49) a negative one, none of them significant. Urban residence has a small positive coefficient (log odds = 0.12), indicating slightly higher odds of child survival in urban areas, though not significantly so. In summary, the model does not show that these factors influence child survival. With 91% of children alive, only 18 of the 200 children were not alive, which leaves little information to detect differences between subgroups.
exp(cbind(OR = coef(model), confint(model)))
## OR 2.5 % 97.5 %
## (Intercept) 4.2600563 0.52842736 48.352688
## age 1.0248217 0.97201138 1.081941
## education_levelNo education 2.7541471 0.24425460 62.160538
## education_levelPrimary 0.6151942 0.08854587 2.639987
## education_levelSecondary 1.5027635 0.20016932 7.833858
## residenceUrban 1.1325468 0.41203617 3.293093
Interpretation: Exponentiating the coefficients gives odds ratios (OR), which are easier to interpret than log odds. The odds ratio for age is 1.02, meaning each additional year of maternal age is associated with about 2% higher odds of child survival, although the confidence interval (0.97 to 1.08) includes 1, so the effect is not significant. Mothers with no education have an OR of 2.75 relative to those with higher education, but the very wide confidence interval (0.24 to 62.16) shows that the estimate is imprecise, which is expected when a category contains few respondents and very few deaths. Primary education has an OR of 0.62 (0.09 to 2.64), indicating lower odds of survival compared to higher education, while secondary education has an OR of 1.50 (0.20 to 7.83). Urban residence has an OR of 1.13 (0.41 to 3.29), meaning urban mothers have slightly higher odds of their child being alive than rural mothers. Every confidence interval includes 1, so none of the odds ratios is statistically significant and the data do not confirm a relationship.
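The odds ratios and the log(OR) column of the regression table are the same estimates on two scales. As a check, the sketch below takes the OR values from the console output above, converts them back with log(), and expresses each as a percentage change in odds; the vector names are shortened labels chosen for this illustration.

```r
# Odds ratios copied from the exp(cbind(...)) output above
odds_ratios <- c(
  age = 1.0248217,
  no_education = 2.7541471,
  primary = 0.6151942,
  secondary = 1.5027635,
  urban = 1.1325468
)

# Back-transform to the log-odds scale used in the regression table
log_odds <- log(odds_ratios)
print(round(log_odds, 2))

# Percentage change in the odds relative to the reference group
# (or per additional year, for age)
percent_change <- (odds_ratios - 1) * 100
print(round(percent_change, 1))
```

gtsummary can also report odds ratios directly with tbl_regression(model, exponentiate = TRUE), which avoids the manual conversion.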
In conclusion, the Maternal and Child Health dataset reflects generally positive outcomes, with a high rate of child survival (91%) and half of all deliveries taking place in public facilities. Education level is the only variable that differs between rural and urban mothers at the 5% level, and only marginally (p = 0.050), while other factors such as income, antenatal visits, and child survival are similar across groups. The regression analyses indicate that maternal age, education, and residence are not significantly associated with either household income or child survival in this dataset. Although no strong associations were found, improving education access, particularly in rural areas, may still support maternal awareness, healthcare utilization, and long-term socioeconomic conditions; confirming that would require a larger sample with more adverse outcomes to analyze.