In this analysis I focus on how socioeconomic and health factors relate to birth weight in newborn babies. The dataset contains information on 3.8M births in the United States in 2018. It was made available by the CDC / National Center for Health Statistics (NCHS) and is publicly available through Kaggle. The dataset can be found here: https://www.kaggle.com/datasets/des137/us-births-2018
It is well known that smoking is TERRIBLE for a pregnancy, and this analysis is just another example of that! Across the health and socioeconomic factors examined, babies of mothers who smoke weigh less on average. The pattern is strongest when smoking, infection and known risk factors occur together in the same pregnancy. Each of the three on its own was associated with lower birth weight across the variables of interest. For example, smoking was associated with lower baby weight in every maternal age group, whether the mother was younger or older. The combination of all three with a low BMI is rare (357 births), but it is associated with a 1.34 lb lower average birth weight compared with healthy-BMI nonsmokers who have no infection or risk factor.
library(data.table)
library(tidyverse)
library(knitr)
library(dplyr)
library(kableExtra)
dt = fread("C:/Users/Alexei/Desktop/Projects/births/births_2018.csv")
head(dt) %>% kable("html") %>% kable_styling("striped") %>% scroll_box(width = "100%")
| ATTEND | BFACIL | BMI | CIG_0 | DBWT | DLMP_MM | DLMP_YY | DMAR | DOB_MM | DOB_TT | DOB_WK | DOB_YY | DWgt_R | FAGECOMB | FEDUC | FHISPX | FRACE15 | FRACE31 | FRACE6 | ILLB_R | ILOP_R | ILP_R | IMP_SEX | IP_GON | LD_INDL | MAGER | MAGE_IMPFLG | MAR_IMP | MBSTATE_REC | MEDUC | MHISPX | MM_AICU | MRACE15 | MRACE31 | MRACEIMP | MRAVE6 | MTRAN | M_Ht_In | NO_INFEC | NO_MMORB | NO_RISKS | PAY | PAY_REC | PRECARE | PREVIS | PRIORDEAD | PRIORLIVE | PRIORTERM | PWgt_R | RDMETH_REC | RESTATUS | RF_CESAR | RF_CESARN | SEX | WTGAIN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 30.7 | 0 | 3657 | 4 | 2017 | 1 | 1 | 1227 | 2 | 2018 | 231 | 31 | 3 | 1 | 1 | 1 | 1 | 16 | 33 | 16 | NA | N | N | 30 | NA | NA | 1 | 6 | 0 | N | 1 | 1 | NA | 1 | N | 66 | 1 | 1 | 1 | 2 | 2 | 3 | 8 | 0 | 1 | 2 | 190 | 1 | 2 | N | 0 | M | 41 |
| 1 | 1 | 33.3 | 2 | 3242 | 99 | 9999 | 2 | 1 | 1704 | 2 | 2018 | 185 | 35 | 4 | 0 | 3 | 3 | 3 | 180 | 888 | 180 | NA | N | N | 35 | NA | NA | 1 | 9 | 0 | N | 3 | 3 | NA | 3 | N | 63 | 1 | 1 | 0 | 1 | 1 | 3 | 9 | 0 | 2 | 0 | 188 | 4 | 2 | Y | 2 | F | 0 |
| 1 | 1 | 30.0 | 0 | 3470 | 4 | 2017 | 1 | 1 | 336 | 2 | 2018 | 273 | 31 | 4 | 0 | 1 | 1 | 1 | 999 | 888 | 999 | NA | N | N | 28 | NA | NA | 1 | 6 | 0 | N | 1 | 1 | NA | 1 | N | 71 | 1 | 1 | 0 | 5 | 4 | 5 | 17 | 0 | 1 | 0 | 215 | 1 | 1 | N | 0 | M | 58 |
| 3 | 1 | 23.7 | 0 | 3140 | 5 | 2017 | 2 | 1 | 938 | 2 | 2018 | 138 | 26 | 2 | 0 | 3 | 3 | 3 | 43 | 888 | 43 | NA | N | N | 23 | NA | NA | 1 | 2 | 0 | N | 3 | 3 | NA | 3 | N | 64 | 1 | 1 | 1 | 1 | 1 | 5 | 6 | 0 | 2 | 0 | 138 | 1 | 2 | N | 0 | F | 0 |
| 1 | 1 | 35.5 | 0 | 2125 | 99 | 9999 | 1 | 1 | 830 | 3 | 2018 | 219 | 35 | 3 | 0 | 2 | 2 | 2 | 999 | 999 | 999 | NA | N | N | 37 | NA | NA | 1 | 4 | 0 | N | 1 | 1 | NA | 1 | N | 66 | 1 | 1 | 1 | 1 | 1 | 5 | 15 | 0 | 1 | 4 | 220 | 3 | 1 | N | 0 | M | 0 |
| 4 | 2 | 31.3 | 0 | 4082 | 3 | 2017 | 1 | 1 | 28 | 2 | 2018 | 247 | 28 | 6 | 6 | 1 | 1 | 1 | 39 | 888 | 39 | NA | N | N | 26 | NA | NA | 1 | 6 | 0 | N | 1 | 1 | NA | 1 | N | 67 | 1 | 1 | 1 | 2 | 2 | 2 | 13 | 0 | 1 | 0 | 200 | 1 | 1 | N | 0 | F | 47 |
There are quite a few variables in the dataset that one could investigate. I focused the analysis on six main variables (mother's education, marital status, smoking, BMI, known risk factors and known infections), plus mother's age. I looked at how these interact with each other and how they relate to newborn weight.
All of the columns are coded as numbers. For instance, the column MEDUC (mother's education) is coded 1-9, and each number stands for a different level of education. 1 is the lowest level (8th grade or less), 8 is the highest, and 9 means unknown. To make the columns easier to work with, I recoded them using the column key in the user guide that accompanies the dataset: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/DVS/natality/UserGuide2018-508.pdf
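The recoding can also be sketched in base R with a named lookup vector, in place of one assignment per code. `recode_meduc` is a hypothetical helper, not part of the original code. It uses the same education labels as the cleaning step below, but it returns a real `NA` for code 9 instead of the `"g. NA"` label.

```r
# Hypothetical lookup: MEDUC code -> education label (codes 5/6 and 7/8 are merged)
meduc_labels <- c(
  "1" = "a. <= 8th grade",
  "2" = "b. high school no diploma",
  "3" = "c. high school",
  "4" = "d. college credit",
  "5" = "e. undergraduate degree",
  "6" = "e. undergraduate degree",
  "7" = "f. graduate degree",
  "8" = "f. graduate degree"
)

recode_meduc <- function(code) {
  # Indexing a named vector by a name it does not contain gives NA,
  # so code 9 (unknown) falls through to NA automatically
  unname(meduc_labels[as.character(code)])
}

recode_meduc(c(1, 6, 9))
```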
cols = c("MAGER", "DMAR", "MEDUC", "CIG_0", "BMI", "DBWT", "NO_RISKS", "NO_INFEC")
dt = dt[ , cols, with = FALSE]
colnames(dt) = tolower(colnames(dt))
# dbwt = birth weight, coded in grams (9999 = not stated).
# Recode the sentinel first, then convert grams -> oz (28.35 g per oz) -> lbs (16 oz per lb).
dt[dbwt == 9999, dbwt := NA]
dt[ , babyweight_oz := dbwt/28.35]
dt[ , dbwt := NULL]
dt[ , babyweight_lbs := babyweight_oz/16]
# Remove outliers for baby weight. Babies weighing in at > 15 lbs are very, very rare. https://www.nationwidechildrens.org/conditions/health-library/newborn-measurements
# This filter also drops rows with an unknown (NA) weight.
dt = dt[babyweight_lbs < 15.0]
dt[babyweight_lbs > 8.8, babyweight_category := "c.macrosomia"]
dt[babyweight_lbs >= 5.5 & babyweight_lbs <= 8.8, babyweight_category := "b.normal birthweight"]
dt[babyweight_lbs < 5.5, babyweight_category := "a.low birthweight"]
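# smoker = cig_0: cigarettes per day before pregnancy (99 = unknown).
# Assign NA last: 99 > 0, so an NA assigned first would be overwritten by the TRUE rule.
# Note: the smoker summaries in this post were produced with the NA rule first, so unknowns (99) are counted as smokers there.
dt[cig_0 == 0, smoker := FALSE]
dt[cig_0 > 0, smoker := TRUE]
dt[cig_0 == 99, smoker := NA]
dt[ , cig_0 := NULL]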
# no_risks = 1 if no risk factors were reported (0 = at least one reported, 9 = unknown). This summary flag does not say which risk factor was reported.
dt[no_risks == 1, has_risk := FALSE]
dt[no_risks == 0, has_risk := TRUE]
dt[no_risks == 9, has_risk := NA]
# no_infec = 1 if no infection was reported (0 = at least one reported, 9 = unknown). This summary flag does not say which infection was reported.
dt[no_infec == 1, has_infection := FALSE]
dt[no_infec == 0, has_infection := TRUE]
dt[no_infec == 9, has_infection := NA]
# bmi = mother's BMI (99.9 = unknown)
dt[ , bmi := ifelse(bmi == 99.9, NA, bmi)]
dt[bmi < 18.5, bmi_category := "a. mom underweight"]
dt[bmi >= 18.5 & bmi < 25.0, bmi_category := "b. mom healthy"]
dt[bmi >= 25.0 & bmi < 30.0, bmi_category := "c. mom overweight"]
dt[bmi >= 30, bmi_category := "e. mom obese"]
# mager = mother's age
dt[mager >= 13 & mager < 19, mother_age_category := "a. teen mom"]
dt[mager >= 19 & mager < 25, mother_age_category := "b. young aged mom"]
dt[mager >= 25 & mager < 35, mother_age_category := "c. standard aged mom"]
dt[mager >= 35, mother_age_category := "d. advanced aged mom"]
# meduc = mother's education level
dt[meduc == 1, mother_ed := "a. <= 8th grade"]
dt[meduc == 2, mother_ed := "b. high school no diploma"]
dt[meduc == 3, mother_ed := "c. high school"]
dt[meduc == 4, mother_ed := "d. college credit"]
dt[meduc %in% c(5,6), mother_ed := "e. undergraduate degree"]
dt[meduc %in% c(7,8), mother_ed := "f. graduate degree"]
dt[meduc == 9, mother_ed := "g. NA"]
dt[ , meduc := NULL]
# dmar = marital status (1 = married, 2 = unmarried)
dt[dmar == 1, married := TRUE]
dt[dmar == 2, married := FALSE]
dt[ , dmar := NULL]
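The birth-weight conversion can be sketched as a standalone function. This assumes DBWT is recorded in grams, with 9999 meaning "not stated" (the factor of 28.35 in the code is grams per ounce). `grams_to_lbs` is a hypothetical name. The order of operations matters: the sentinel has to be recoded before converting, because once 9999 g has been divided by 453.6 it no longer looks like a missing-value code.

```r
# Hypothetical helper: convert birth weight in grams to pounds.
# Assumes 9999 marks an unknown weight, as in the NCHS coding.
grams_to_lbs <- function(grams, unknown_code = 9999) {
  grams <- as.numeric(grams)
  grams[which(grams == unknown_code)] <- NA  # recode the sentinel first
  grams / 28.35 / 16                         # g -> oz -> lb
}

grams_to_lbs(c(3657, 2125, 9999))
# about 8.06, 4.68 and NA
```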
Now that the data is nice and tidy, I wanted to understand how each variable relates to birth weight: first the health factors, then the socioeconomic factors.
ggplot(dt, aes(x = babyweight_lbs)) +
geom_density(fill = "light blue", alpha = 0.7)
Birth weight is roughly normally distributed, with a mean of around 7 lbs. The left tail is slightly longer, so there are more unusually light babies than unusually heavy ones.
# smoking and BW
ggplot(dt, aes(x = smoker, y = babyweight_lbs, fill = smoker)) +
geom_violin()
This violin plot compares the birth-weight distributions of babies born to smokers and nonsmokers. The bulk of the smokers' distribution sits lower, which suggests a lower median birth weight than for nonsmokers.
dt[ , .(.N, avg_weight = mean(babyweight_lbs, na.rm = TRUE)), by = smoker]
## smoker N avg_weight
## 1: FALSE 3462936 7.221064
## 2: TRUE 335587 6.874621
Babies of mothers who smoke weigh on average 0.35 lb less (6.87 vs 7.22 lbs).
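The 0.35 lb figure is simply the difference between two group means. The sketch below shows that calculation in base R on made-up numbers (not from the CDC file), followed by Welch's two-sample t-test from R's built-in `stats` package. With 3.8 million births, even very small gaps will come out statistically significant, so the size of the difference is the more informative number.

```r
# Made-up birth weights (lbs) for illustration only
weights <- c(7.4, 7.1, 7.6, 6.9, 7.3, 6.8, 6.5, 7.0, 6.6, 6.9)
smoker  <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)

# Mean weight per group; the groups are named "FALSE" and "TRUE"
group_means <- tapply(weights, smoker, mean)
diff_lbs <- unname(group_means["FALSE"] - group_means["TRUE"])
diff_lbs  # 0.5

# Welch two-sample t-test (t.test's default, which allows unequal variances)
t.test(weights[!smoker], weights[smoker])
```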
#risk factors
ggplot(dt, aes(x= has_risk, y = babyweight_lbs, fill = has_risk)) +
geom_violin()
This violin plot compares mothers with and without a known risk factor. The distribution for mothers with a known risk sits slightly lower. Its thicker lower tail also shows that there are more low-birth-weight cases when a risk factor is present.
ggplot(dt, aes(x= has_infection, y = babyweight_lbs, fill = has_infection)) +
geom_violin()
This violin plot compares mothers with a known infection to mothers without one. Birth weights for mothers with a known infection appear lower than for mothers without an infection.
#mothers BMI
ggplot(dt, aes(x = bmi)) +
geom_density(fill = "light blue", alpha = 0.7)
## Warning: Removed 85199 rows containing non-finite values (`stat_density()`).
The density plot shows how BMI is distributed across the dataset. A healthy BMI (18.5-24.9) is the single most common category. However, overweight and obese mothers together outnumber healthy-weight mothers, and there is a spike around a BMI of 30, the threshold for obesity.
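The BMI ranges used here, and the `bmi_category` bins built during cleaning, can also be written as a single call to base R's `cut()`. This sketch assumes unknown BMIs (99.9) have already been set to `NA`. The short labels are hypothetical and differ from the document's labels.

```r
# Sketch: BMI bins with cut(); right = FALSE makes each interval closed on the left,
# i.e. [-Inf, 18.5), [18.5, 25), [25, 30), [30, Inf)
bmi_to_category <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 18.5, 25, 30, Inf),
      labels = c("underweight", "healthy", "overweight", "obese"),
      right = FALSE)
}

bmi_to_category(c(17.9, 18.5, 24.9, 25, 30, NA))
```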
ggplot(dt, aes(x = bmi, y = babyweight_lbs)) +
geom_bin2d(bins = 100) +
scale_fill_continuous(type = "viridis")
## Warning: Removed 85199 rows containing non-finite values (`stat_bin2d()`).
The heat map shows that many women have a BMI above the healthy range, but there is no obvious relationship between BMI and low birth weight.
dt[ , mean(babyweight_lbs), by = bmi_category][order(bmi_category)]
## bmi_category V1
## 1: a. mom underweight 6.737027
## 2: b. mom healthy 7.138015
## 3: c. mom overweight 7.254280
## 4: e. mom obese 7.280933
## 5: <NA> 6.936613
Birth weight does appear lower when the mother herself is underweight: 6.74 lbs on average, 0.4 lb less than for mothers with a healthy BMI (7.14 lbs). As the mother's BMI category increases, so does the average newborn weight.
# mothers age
ggplot(dt, aes(x = mager, y = babyweight_lbs)) +
geom_bin2d(bins = 100) +
scale_fill_continuous(type = "viridis")
This heat map of mother's age against birth weight shows no strong correlation either. The density for birth weight stays centered around 7 lbs and does not shift noticeably as the mother's age increases.
ggplot(dt, aes(x = mother_age_category, y = babyweight_lbs, fill = mother_age_category)) +
geom_violin() +
geom_boxplot(width = 0.1, fill = "white", color = "black") +
theme(axis.text.x = element_text(angle=45))
Across age groups, baby weight tends to rise with the mother's age. Teen mothers have a lower median baby weight than older mothers.
# mothers education
dt_mother_ed = dt[ , .N, by = mother_ed][order(mother_ed)]
dt_mother_ed[ , pct := N / sum(N)] %>% kable("html") %>% kable_styling("striped")
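| mother_ed | N | pct |
|---|---|---|
| a. <= 8th grade | 118024 | 0.0310710 |
| b. high school no diploma | 358451 | 0.0943659 |
| c. high school | 967710 | 0.2547595 |
| d. college credit | 751781 | 0.1979140 |
| e. undergraduate degree | 1090837 | 0.2871740 |
| f. graduate degree | 463153 | 0.1219298 |
| g. NA | 48567 | 0.0127858 |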
The table shows the share of mothers at each education level. The majority (about 74%) have between a high school diploma and an undergraduate degree. A surprise in the data is how many women have less education: about 12.5% have no high school diploma (3.1% with 8th grade or less and 9.4% with some high school).
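Each `pct` value is a count divided by the total number of births. For example, 118,024 / 3,798,523 ≈ 0.0311 for mothers with an 8th-grade education or less. In base R, `prop.table()` does the same division over a frequency table. Here is a small sketch with made-up codes:

```r
# Made-up MEDUC codes for illustration (9 = unknown)
meduc <- c(3, 3, 4, 6, 6, 6, 2, 9, 3, 7)

counts <- table(meduc)          # frequency of each code
shares <- prop.table(counts)    # each count divided by the total (10)
round(shares, 2)
```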
ggplot(dt, aes(x = mother_ed, y = babyweight_lbs, fill = mother_ed)) +
geom_violin() +
geom_boxplot(width = 0.1, fill = "white", color = "black") +
theme(axis.text.x = element_text(angle=45))
The median birth weight for mothers with a high school diploma or less appears lower than for mothers with more education.
dt[ , mean(babyweight_lbs), by = mother_ed][order(mother_ed)]
## mother_ed V1
## 1: a. <= 8th grade 7.208248
## 2: b. high school no diploma 6.981034
## 3: c. high school 7.086086
## 4: d. college credit 7.174980
## 5: e. undergraduate degree 7.315141
## 6: f. graduate degree 7.313910
## 7: g. NA 7.034306
Mothers with some high school but no diploma have the lowest average baby weight (6.98 lbs), and averages rise with education up through the undergraduate level. Interestingly, mothers with an 8th-grade education or less have babies of about average weight (7.21 lbs).
#married
ggplot(dt, aes(x = married, y = babyweight_lbs, fill = married)) +
geom_violin() +
geom_boxplot(width = 0.1, fill = "white", color = "black")
Marriage is associated with healthier birth weights. It may be a social tie that supports healthier pregnancies, although this comparison alone cannot show cause and effect.
Next, let us compare combinations of health factors (smoking + risk, smoking + infection, and all three together) across BMI, education, age and marital status.
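Each of the combined labels in the code that follows is just two flags pasted together. The sketch below shows the same idea in base R, using a hypothetical `combo_label()` helper and made-up flags. If either flag is unknown, the label stays `NA`, and those rows make up the NA groups in the summary tables. In the three-way `smoke_i_r_category`, mixed combinations such as infection without risk are also left `NA`.

```r
# Made-up flags; NA = not stated
smoker   <- c(FALSE, FALSE, TRUE, TRUE, NA)
has_risk <- c(FALSE, TRUE, FALSE, TRUE, TRUE)

combo_label <- function(smoker, has_risk) {
  label <- paste(ifelse(smoker, "smoker", "Not a smoker"),
                 ifelse(has_risk, "has risk", "no risk"),
                 sep = " and ")
  # paste() turns NA into the string "NA", so restore real NAs
  label[is.na(smoker) | is.na(has_risk)] <- NA
  label
}

combo_label(smoker, has_risk)
```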
dt[smoker == FALSE & has_risk == FALSE, smoke_risk_category := "Not a smoker and no risk"]
dt[smoker == FALSE & has_risk == TRUE, smoke_risk_category := "Not a smoker and has risk"]
dt[smoker == TRUE & has_risk == FALSE, smoke_risk_category := "smoker and no risk"]
dt[smoker == TRUE & has_risk == TRUE, smoke_risk_category := "smoker and has risk"]
dt[smoker == FALSE & has_infection == FALSE, smoke_infect_category := "Not a smoker and no infection"]
dt[smoker == FALSE & has_infection == TRUE, smoke_infect_category := "Not a smoker and has infection"]
dt[smoker == TRUE & has_infection == FALSE, smoke_infect_category := "smoker and no infection"]
dt[smoker == TRUE & has_infection == TRUE, smoke_infect_category := "smoker and has infection"]
# Only the "none" and "all three" combinations get a label; mixed combinations
# (e.g. infection without risk) are left NA.
dt[smoker == FALSE & has_infection == FALSE & has_risk == FALSE, smoke_i_r_category := "Nonsmoker:no infection or risk"]
dt[smoker == FALSE & has_infection == TRUE & has_risk == TRUE, smoke_i_r_category := "Nonsmoker:has infection & risk"]
dt[smoker == TRUE & has_infection == FALSE & has_risk == FALSE, smoke_i_r_category := "Smoker:no infection or risk"]
dt[smoker == TRUE & has_infection == TRUE & has_risk == TRUE, smoke_i_r_category := "Smoker: has infection & risk"]
dt[ , .(mean(babyweight_lbs)), by = .(bmi_category, smoke_risk_category)][order(smoke_risk_category, bmi_category)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| bmi_category | smoke_risk_category | V1 |
|---|---|---|
| a. mom underweight | Not a smoker and has risk | 6.557711 |
| b. mom healthy | Not a smoker and has risk | 6.968372 |
| c. mom overweight | Not a smoker and has risk | 7.101688 |
| e. mom obese | Not a smoker and has risk | 7.185557 |
| NA | Not a smoker and has risk | 6.829529 |
| a. mom underweight | Not a smoker and no risk | 6.846959 |
| b. mom healthy | Not a smoker and no risk | 7.234404 |
| c. mom overweight | Not a smoker and no risk | 7.362274 |
| e. mom obese | Not a smoker and no risk | 7.395239 |
| NA | Not a smoker and no risk | 7.072394 |
| a. mom underweight | smoker and has risk | 6.074468 |
| b. mom healthy | smoker and has risk | 6.507460 |
| c. mom overweight | smoker and has risk | 6.773441 |
| e. mom obese | smoker and has risk | 6.985766 |
| NA | smoker and has risk | 6.339734 |
| a. mom underweight | smoker and no risk | 6.484968 |
| b. mom healthy | smoker and no risk | 6.834372 |
| c. mom overweight | smoker and no risk | 7.051636 |
| e. mom obese | smoker and no risk | 7.159840 |
| NA | smoker and no risk | 6.537244 |
| a. mom underweight | NA | 6.445976 |
| b. mom healthy | NA | 6.781874 |
| c. mom overweight | NA | 7.069293 |
| e. mom obese | NA | 7.186677 |
| NA | NA | 6.504703 |
WOW! Underweight mothers who smoke and have a known risk have babies averaging 6.07 lbs. That is 1.16 lb less than healthy-BMI mothers who neither smoke nor have a known risk (7.23 lbs).
dt[ , .(mean(babyweight_lbs)), by = .(bmi_category, smoke_infect_category)][order(smoke_infect_category, bmi_category)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| bmi_category | smoke_infect_category | V1 |
|---|---|---|
| a. mom underweight | Not a smoker and has infection | 6.557840 |
| b. mom healthy | Not a smoker and has infection | 6.879859 |
| c. mom overweight | Not a smoker and has infection | 7.037077 |
| e. mom obese | Not a smoker and has infection | 7.112889 |
| NA | Not a smoker and has infection | 6.625991 |
| a. mom underweight | Not a smoker and no infection | 6.802793 |
| b. mom healthy | Not a smoker and no infection | 7.179467 |
| c. mom overweight | Not a smoker and no infection | 7.286381 |
| e. mom obese | Not a smoker and no infection | 7.308549 |
| NA | Not a smoker and no infection | 7.012842 |
| a. mom underweight | smoker and has infection | 6.267857 |
| b. mom healthy | smoker and has infection | 6.567159 |
| c. mom overweight | smoker and has infection | 6.737807 |
| e. mom obese | smoker and has infection | 6.931946 |
| NA | smoker and has infection | 6.208082 |
| a. mom underweight | smoker and no infection | 6.409434 |
| b. mom healthy | smoker and no infection | 6.770930 |
| c. mom overweight | smoker and no infection | 6.980036 |
| e. mom obese | smoker and no infection | 7.090450 |
| NA | smoker and no infection | 6.508547 |
| a. mom underweight | NA | 6.053150 |
| b. mom healthy | NA | 6.542826 |
| c. mom overweight | NA | 6.732803 |
| e. mom obese | NA | 6.736082 |
| NA | NA | 6.409067 |
An underweight mother who smokes and has a known infection has a baby that weighs on average 0.91 lb less than a healthy-BMI mother who neither smokes nor has an infection (6.27 vs 7.18 lbs).
dt[ , .(n = .N, babyweight_lbs = mean(babyweight_lbs)), by = .(bmi_category, smoke_i_r_category)][order(smoke_i_r_category, bmi_category)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| bmi_category | smoke_i_r_category | n | babyweight_lbs |
|---|---|---|---|
| a. mom underweight | Nonsmoker:has infection & risk | 650 | 6.302164 |
| b. mom healthy | Nonsmoker:has infection & risk | 7251 | 6.640815 |
| c. mom overweight | Nonsmoker:has infection & risk | 5867 | 6.812269 |
| e. mom obese | Nonsmoker:has infection & risk | 8923 | 6.987495 |
| NA | Nonsmoker:has infection & risk | 574 | 6.450014 |
| a. mom underweight | Nonsmoker:no infection or risk | 79910 | 6.855686 |
| b. mom healthy | Nonsmoker:no infection or risk | 1076981 | 7.241397 |
| c. mom overweight | Nonsmoker:no infection or risk | 607300 | 7.368130 |
| e. mom obese | Nonsmoker:no infection or risk | 512472 | 7.400858 |
| NA | Nonsmoker:no infection or risk | 50664 | 7.086106 |
| a. mom underweight | Smoker: has infection & risk | 357 | 5.902515 |
| b. mom healthy | Smoker: has infection & risk | 3519 | 6.354822 |
| c. mom overweight | Smoker: has infection & risk | 2186 | 6.575385 |
| e. mom obese | Smoker: has infection & risk | 2573 | 6.827571 |
| NA | Smoker: has infection & risk | 318 | 6.086084 |
| a. mom underweight | Smoker:no infection or risk | 11774 | 6.497483 |
| b. mom healthy | Smoker:no infection or risk | 83970 | 6.855321 |
| c. mom overweight | Smoker:no infection or risk | 48894 | 7.072248 |
| e. mom obese | Smoker:no infection or risk | 52647 | 7.170103 |
| NA | Smoker:no infection or risk | 5719 | 6.571920 |
| a. mom underweight | NA | 25529 | 6.498823 |
| b. mom healthy | NA | 393543 | 6.931580 |
| c. mom overweight | NA | 323837 | 7.080848 |
| e. mom obese | NA | 465141 | 7.169487 |
| NA | NA | 27924 | 6.759759 |
This is the lowest average weight seen so far in the analysis. Underweight mothers who smoke and have both a known risk and an infection have babies averaging 5.9 lbs. That is 1.34 lb less than healthy-BMI nonsmokers with no infection or risk (7.24 lbs). This group is small, though: only 357 births.
dt[ , .(mean(babyweight_lbs)), by = .(smoke_risk_category, mother_age_category)][order(mother_age_category, smoke_risk_category)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| smoke_risk_category | mother_age_category | V1 |
|---|---|---|
| Not a smoker and has risk | a. teen mom | 6.643661 |
| Not a smoker and no risk | a. teen mom | 6.946099 |
| smoker and has risk | a. teen mom | 6.629075 |
| smoker and no risk | a. teen mom | 6.886098 |
| NA | a. teen mom | 6.443932 |
| Not a smoker and has risk | b. young aged mom | 6.910346 |
| Not a smoker and no risk | b. young aged mom | 7.146110 |
| smoker and has risk | b. young aged mom | 6.802486 |
| smoker and no risk | b. young aged mom | 6.961263 |
| NA | b. young aged mom | 6.784995 |
| Not a smoker and has risk | c. standard aged mom | 7.118856 |
| Not a smoker and no risk | c. standard aged mom | 7.348293 |
| smoker and has risk | c. standard aged mom | 6.774697 |
| smoker and no risk | c. standard aged mom | 6.951784 |
| NA | c. standard aged mom | 6.901465 |
| Not a smoker and has risk | d. advanced aged mom | 7.087011 |
| Not a smoker and no risk | d. advanced aged mom | 7.338563 |
| smoker and has risk | d. advanced aged mom | 6.602384 |
| smoker and no risk | d. advanced aged mom | 6.810126 |
| NA | d. advanced aged mom | 6.850426 |
| Not a smoker and has risk | NA | 6.560112 |
| Not a smoker and no risk | NA | 6.224721 |
Regardless of age, mothers who smoke and have a known risk have lower average birth weights than nonsmokers without a risk. The lowest averages for this group fall at the two ends of the age range: 6.63 lbs for teen mothers and 6.60 lbs for mothers of advanced age.
Advanced maternal age combined with smoking and a known infection also appears to be a key factor for lower birth weight, as the next table shows.
dt[ , .(mean(babyweight_lbs)), by = .(smoke_infect_category, mother_age_category)][order(smoke_infect_category, mother_age_category)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| smoke_infect_category | mother_age_category | V1 |
|---|---|---|
| Not a smoker and has infection | a. teen mom | 6.825696 |
| Not a smoker and has infection | b. young aged mom | 6.932546 |
| Not a smoker and has infection | c. standard aged mom | 7.024985 |
| Not a smoker and has infection | d. advanced aged mom | 6.998519 |
| Not a smoker and has infection | NA | 5.936949 |
| Not a smoker and no infection | a. teen mom | 6.918317 |
| Not a smoker and no infection | b. young aged mom | 7.106121 |
| Not a smoker and no infection | c. standard aged mom | 7.281290 |
| Not a smoker and no infection | d. advanced aged mom | 7.230882 |
| Not a smoker and no infection | NA | 6.287104 |
| smoker and has infection | a. teen mom | 6.871506 |
| smoker and has infection | b. young aged mom | 6.781263 |
| smoker and has infection | c. standard aged mom | 6.611574 |
| smoker and has infection | d. advanced aged mom | 6.424698 |
| smoker and no infection | a. teen mom | 6.854834 |
| smoker and no infection | b. young aged mom | 6.939886 |
| smoker and no infection | c. standard aged mom | 6.910951 |
| smoker and no infection | d. advanced aged mom | 6.730659 |
| NA | a. teen mom | 6.330203 |
| NA | b. young aged mom | 6.477386 |
| NA | c. standard aged mom | 6.658244 |
| NA | d. advanced aged mom | 6.577888 |
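Smoking combined with a known infection is associated with lower average birth weight in every age group, and the gap widens with age. For teen mothers the gap is small (6.87 vs 6.92 lbs for nonsmokers without an infection). Mothers of advanced age who smoke and have an infection have the lowest average, at 6.42 lbs.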
dt[ , .(mean(babyweight_lbs)), by = .(smoke_i_r_category, mother_age_category)][order(smoke_i_r_category, mother_age_category)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| smoke_i_r_category | mother_age_category | V1 |
|---|---|---|
| Nonsmoker:has infection & risk | a. teen mom | 6.565531 |
| Nonsmoker:has infection & risk | b. young aged mom | 6.731292 |
| Nonsmoker:has infection & risk | c. standard aged mom | 6.865472 |
| Nonsmoker:has infection & risk | d. advanced aged mom | 6.835796 |
| Nonsmoker:no infection or risk | a. teen mom | 6.953718 |
| Nonsmoker:no infection or risk | b. young aged mom | 7.154507 |
| Nonsmoker:no infection or risk | c. standard aged mom | 7.352264 |
| Nonsmoker:no infection or risk | d. advanced aged mom | 7.341065 |
| Nonsmoker:no infection or risk | NA | 6.231261 |
| Smoker: has infection & risk | a. teen mom | 6.558642 |
| Smoker: has infection & risk | b. young aged mom | 6.662261 |
| Smoker: has infection & risk | c. standard aged mom | 6.497953 |
| Smoker: has infection & risk | d. advanced aged mom | 6.349676 |
| Smoker:no infection or risk | a. teen mom | 6.883792 |
| Smoker:no infection or risk | b. young aged mom | 6.977089 |
| Smoker:no infection or risk | c. standard aged mom | 6.976632 |
| Smoker:no infection or risk | d. advanced aged mom | 6.834178 |
| NA | a. teen mom | 6.737533 |
| NA | b. young aged mom | 6.911568 |
| NA | c. standard aged mom | 7.087237 |
| NA | d. advanced aged mom | 7.059787 |
| NA | NA | 6.497795 |
Among the four age groups, mothers of advanced age who smoke and have both a known risk and an infection have the lowest average birth weight (6.35 lbs).
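# Average per cell first: geom_tile() draws one tile per row of its data
dt[ , .(avg_weight = mean(babyweight_lbs)), by = .(mother_ed, smoke_infect_category)] %>%
ggplot(aes(x = mother_ed, y = smoke_infect_category, fill = avg_weight)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "blue") +
theme(axis.text.x = element_text(angle = 45))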
A mother's level of education also seems to play a role. Mothers who smoke or have a known infection and have a high school education or less tend to have lower-than-average birth weights.
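# Average per cell first: geom_tile() draws one tile per row of its data
dt[ , .(avg_weight = mean(babyweight_lbs)), by = .(mother_ed, smoke_risk_category)] %>%
ggplot(aes(x = mother_ed, y = smoke_risk_category, fill = avg_weight)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "blue") +
theme(axis.text.x = element_text(angle = 45))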
Risk combined with smoking also appears to lower birth weight for mothers with a high school education or less.
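`geom_tile()` draws one tile per row of the data it is given, so it should receive one row per (education, group) cell. If all 3.8 million births are passed in, many tiles are stacked in each cell, and the visible colour is simply whichever row was drawn last. The per-cell averages a heat map should show are a two-way group mean. Here is a base-R sketch of that with made-up rows (not from the CDC file):

```r
# Made-up rows: birth weight (lbs), education group, smoking group
weight <- c(7.2, 7.0, 6.4, 6.6, 7.4, 7.3, 6.9, 7.1)
ed     <- c("high school", "high school", "high school", "high school",
            "graduate", "graduate", "graduate", "graduate")
group  <- c("nonsmoker", "nonsmoker", "smoker", "smoker",
            "nonsmoker", "nonsmoker", "smoker", "smoker")

# One mean per (education, group) cell: rows = education, columns = group
cell_means <- tapply(weight, list(ed, group), mean)
cell_means
```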
dt[ , .(mean(babyweight_lbs)), by = .(mother_ed, smoke_risk_category)][order(smoke_risk_category, mother_ed)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
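| mother_ed | smoke_risk_category | V1 |
|---|---|---|
| a. <= 8th grade | Not a smoker and has risk | 7.128094 |
| b. high school no diploma | Not a smoker and has risk | 6.942638 |
| c. high school | Not a smoker and has risk | 7.000002 |
| d. college credit | Not a smoker and has risk | 7.059247 |
| e. undergraduate degree | Not a smoker and has risk | 7.152244 |
| f. graduate degree | Not a smoker and has risk | 7.148772 |
| g. NA | Not a smoker and has risk | 6.925055 |
| a. <= 8th grade | Not a smoker and no risk | 7.282870 |
| b. high school no diploma | Not a smoker and no risk | 7.083236 |
| c. high school | Not a smoker and no risk | 7.175663 |
| d. college credit | Not a smoker and no risk | 7.271049 |
| e. undergraduate degree | Not a smoker and no risk | 7.400901 |
| f. graduate degree | Not a smoker and no risk | 7.391967 |
| g. NA | Not a smoker and no risk | 7.164486 |
| a. <= 8th grade | smoker and has risk | 6.546461 |
| b. high school no diploma | smoker and has risk | 6.576046 |
| c. high school | smoker and has risk | 6.752725 |
| d. college credit | smoker and has risk | 6.810587 |
| e. undergraduate degree | smoker and has risk | 6.940088 |
| f. graduate degree | smoker and has risk | 7.003810 |
| g. NA | smoker and has risk | 6.313459 |
| a. <= 8th grade | smoker and no risk | 6.773087 |
| b. high school no diploma | smoker and no risk | 6.780626 |
| c. high school | smoker and no risk | 6.924282 |
| d. college credit | smoker and no risk | 7.016880 |
| e. undergraduate degree | smoker and no risk | 7.154075 |
| f. graduate degree | smoker and no risk | 7.306444 |
| g. NA | smoker and no risk | 6.595227 |
| a. <= 8th grade | NA | 7.430649 |
| b. high school no diploma | NA | 6.659285 |
| c. high school | NA | 6.594177 |
| d. college credit | NA | 6.864671 |
| e. undergraduate degree | NA | 7.258410 |
| f. graduate degree | NA | 7.236790 |
| g. NA | NA | 6.292635 |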
At every education level, smokers with a known risk have the lowest average birth weight of the four smoking/risk groups.
dt[ , .(mean(babyweight_lbs)), by = .(mother_ed, smoke_infect_category)][order(smoke_infect_category, mother_ed)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
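| mother_ed | smoke_infect_category | V1 |
|---|---|---|
| a. <= 8th grade | Not a smoker and has infection | 7.085872 |
| b. high school no diploma | Not a smoker and has infection | 6.895973 |
| c. high school | Not a smoker and has infection | 6.928241 |
| d. college credit | Not a smoker and has infection | 6.987702 |
| e. undergraduate degree | Not a smoker and has infection | 7.099854 |
| f. graduate degree | Not a smoker and has infection | 7.180005 |
| g. NA | Not a smoker and has infection | 6.776868 |
| a. <= 8th grade | Not a smoker and no infection | 7.238241 |
| b. high school no diploma | Not a smoker and no infection | 7.050682 |
| c. high school | Not a smoker and no infection | 7.131319 |
| d. college credit | Not a smoker and no infection | 7.209835 |
| e. undergraduate degree | Not a smoker and no infection | 7.325956 |
| f. graduate degree | Not a smoker and no infection | 7.316162 |
| g. NA | Not a smoker and no infection | 7.096050 |
| a. <= 8th grade | smoker and has infection | 6.489807 |
| b. high school no diploma | smoker and has infection | 6.570212 |
| c. high school | smoker and has infection | 6.697327 |
| d. college credit | smoker and has infection | 6.727684 |
| e. undergraduate degree | smoker and has infection | 6.737024 |
| f. graduate degree | smoker and has infection | 6.910538 |
| g. NA | smoker and has infection | 6.138921 |
| a. <= 8th grade | smoker and no infection | 6.719711 |
| b. high school no diploma | smoker and no infection | 6.733619 |
| c. high school | smoker and no infection | 6.885887 |
| d. college credit | smoker and no infection | 6.963012 |
| e. undergraduate degree | smoker and no infection | 7.089085 |
| f. graduate degree | smoker and no infection | 7.203003 |
| g. NA | smoker and no infection | 6.548371 |
| a. <= 8th grade | NA | 7.101600 |
| b. high school no diploma | NA | 6.388365 |
| c. high school | NA | 6.424440 |
| d. college credit | NA | 6.577856 |
| e. undergraduate degree | NA | 6.837060 |
| f. graduate degree | NA | 6.805563 |
| g. NA | NA | 6.270521 |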
Infection is also associated with lower birth weight among mothers with less education. The trend still holds: at every education level, smoking combined with infection gives the lowest average weight.
dt[ , .(mean(babyweight_lbs)), by = .(smoke_i_r_category, mother_ed)][order(smoke_i_r_category, mother_ed)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| smoke_i_r_category | mother_ed | V1 |
|---|---|---|
| Nonsmoker:has infection & risk | a. <= 8th grade | 7.010168 |
| Nonsmoker:has infection & risk | b. high school no diploma | 6.738782 |
| Nonsmoker:has infection & risk | c. high school | 6.767663 |
| Nonsmoker:has infection & risk | d. college credit | 6.819223 |
| Nonsmoker:has infection & risk | e. undergraduate degree | 6.891021 |
| Nonsmoker:has infection & risk | f. graduate degree | 7.030520 |
| Nonsmoker:has infection & risk | g. NA | 6.541675 |
| Nonsmoker:no infection or risk | a. <= 8th grade | 7.287876 |
| Nonsmoker:no infection or risk | b. high school no diploma | 7.091954 |
| Nonsmoker:no infection or risk | c. high school | 7.183927 |
| Nonsmoker:no infection or risk | d. college credit | 7.277111 |
| Nonsmoker:no infection or risk | e. undergraduate degree | 7.403139 |
| Nonsmoker:no infection or risk | f. graduate degree | 7.393043 |
| Nonsmoker:no infection or risk | g. NA | 7.170110 |
| Smoker: has infection & risk | a. <= 8th grade | 6.440378 |
| Smoker: has infection & risk | b. high school no diploma | 6.407769 |
| Smoker: has infection & risk | c. high school | 6.552478 |
| Smoker: has infection & risk | d. college credit | 6.586868 |
| Smoker: has infection & risk | e. undergraduate degree | 6.650195 |
| Smoker: has infection & risk | f. graduate degree | 6.842458 |
| Smoker: has infection & risk | g. NA | 6.000490 |
| Smoker:no infection or risk | a. <= 8th grade | 6.807455 |
| Smoker:no infection or risk | b. high school no diploma | 6.798876 |
| Smoker:no infection or risk | c. high school | 6.941398 |
| Smoker:no infection or risk | d. college credit | 7.034957 |
| Smoker:no infection or risk | e. undergraduate degree | 7.168269 |
| Smoker:no infection or risk | f. graduate degree | 7.309864 |
| Smoker:no infection or risk | g. NA | 6.634552 |
| NA | a. <= 8th grade | 7.098757 |
| NA | b. high school no diploma | 6.881254 |
| NA | c. high school | 6.967949 |
| NA | d. college credit | 7.033591 |
| NA | e. undergraduate degree | 7.146604 |
| NA | f. graduate degree | 7.148474 |
| NA | g. NA | 6.864992 |
dt[ , .(mean(babyweight_lbs)), by = .(married, smoke_risk_category)][order(smoke_risk_category, married)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| married | smoke_risk_category | V1 |
|---|---|---|
| FALSE | Not a smoker and has risk | 6.883207 |
| TRUE | Not a smoker and has risk | 7.159108 |
| NA | Not a smoker and has risk | 7.163161 |
| FALSE | Not a smoker and no risk | 7.091426 |
| TRUE | Not a smoker and no risk | 7.404219 |
| NA | Not a smoker and no risk | 7.291443 |
| FALSE | smoker and has risk | 6.675995 |
| TRUE | smoker and has risk | 6.879813 |
| NA | smoker and has risk | 6.952230 |
| FALSE | smoker and no risk | 6.882694 |
| TRUE | smoker and no risk | 7.079143 |
| NA | smoker and no risk | 7.023344 |
| FALSE | NA | 6.561099 |
| TRUE | NA | 7.212676 |
| NA | NA | 6.985597 |
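Within each smoking/risk group, babies of unmarried mothers weigh less on average than babies of married mothers. Unmarried smokers with a known risk have the lowest average (6.68 lbs).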
dt[ , .(mean(babyweight_lbs)), by = .(married, smoke_infect_category)][order(smoke_infect_category, married)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| married | smoke_infect_category | V1 |
|---|---|---|
| FALSE | Not a smoker and has infection | 6.894787 |
| TRUE | Not a smoker and has infection | 7.139267 |
| NA | Not a smoker and has infection | 7.137239 |
| FALSE | Not a smoker and no infection | 7.039423 |
| TRUE | Not a smoker and no infection | 7.325896 |
| NA | Not a smoker and no infection | 7.256991 |
| FALSE | smoker and has infection | 6.646922 |
| TRUE | smoker and has infection | 6.710516 |
| NA | smoker and has infection | 6.812935 |
| FALSE | smoker and no infection | 6.837096 |
| TRUE | smoker and no infection | 7.017957 |
| NA | smoker and no infection | 7.010407 |
| FALSE | NA | 6.367787 |
| TRUE | NA | 6.848663 |
| NA | NA | 6.672506 |
Unmarried mothers who smoke and have a known infection also have the lowest average weight (6.65 lbs). Regardless of marital status, however, birth weight is lower when the mother smokes and has a known infection.
dt[ , .(mean(babyweight_lbs)), by = .(married, smoke_i_r_category)][order(smoke_i_r_category, married)] %>% kable("html") %>% kable_styling("striped") %>% scroll_box(height = "400px")
| married | smoke_i_r_category | V1 |
|---|---|---|
| FALSE | Nonsmoker:has infection & risk | 6.713503 |
| TRUE | Nonsmoker:has infection & risk | 6.977176 |
| NA | Nonsmoker:has infection & risk | 7.004074 |
| FALSE | Nonsmoker:no infection or risk | 7.099507 |
| TRUE | Nonsmoker:no infection or risk | 7.406384 |
| NA | Nonsmoker:no infection or risk | 7.292522 |
| FALSE | Smoker: has infection & risk | 6.499502 |
| TRUE | Smoker: has infection & risk | 6.580918 |
| NA | Smoker: has infection & risk | 6.615853 |
| FALSE | Smoker:no infection or risk | 6.903229 |
| TRUE | Smoker:no infection or risk | 7.092864 |
| NA | Smoker:no infection or risk | 7.029073 |
| FALSE | NA | 6.859020 |
| TRUE | NA | 7.146295 |
| NA | NA | 7.160098 |
A clear pattern emerges when smoking, infection and known risk factors occur together in a pregnancy. Each of the three on its own was associated with lower birth weight across the variables of interest. For example, smoking was associated with lower baby weight in every maternal age group, whether the mother was younger or older. The combination of all three with a low BMI is rare, but it is associated with a 1.34 lb lower average birth weight than for healthy-BMI nonsmokers with no infection or risk factor.
dt[ , .(n = .N, babyweight_lbs = mean(babyweight_lbs)), by = smoker]
## smoker n babyweight_lbs
## 1: FALSE 3462936 7.221064
## 2: TRUE 335587 6.874621
Babies of smokers weigh on average 0.35 lb less.
dt[ , .(n = .N, babyweight_lbs = mean(babyweight_lbs)), by = has_risk]
## has_risk n babyweight_lbs
## 1: FALSE 2606982 7.256789
## 2: TRUE 1189034 7.045747
## 3: NA 2507 6.846969
At-risk pregnancies produce babies who weigh on average 0.21 lb less.
dt[ , .(n = .N, babyweight_lbs = mean(babyweight_lbs)), by = has_infection]
## has_infection n babyweight_lbs
## 1: FALSE 3685781 7.200389
## 2: TRUE 104663 6.886994
## 3: NA 8079 6.590603
Infections during pregnancy are associated with babies who weigh on average 0.31 lb less.
dt[ , unhealthy_category := "all other"]
dt[smoker == TRUE & has_infection == TRUE & has_risk == TRUE & bmi_category == "a. mom underweight", unhealthy_category := "underweight smoker w/ risk & infec"]
dt[ , .(mean(babyweight_lbs)), by = unhealthy_category] %>% kable("html") %>% kable_styling("striped")
| unhealthy_category | V1 |
|---|---|
| all other | 7.190578 |
| underweight smoker w/ risk & infec | 5.902515 |
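The combination of all three factors with a low BMI is rare, but it is associated with a 1.29 lb lower average birth weight than all other births (5.90 vs 7.19 lbs). Compared with healthy-BMI nonsmokers with no infection or risk, the gap is 1.34 lb.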
ggplot(dt, aes(x = unhealthy_category, y = babyweight_lbs, fill = unhealthy_category)) +
geom_violin() +
geom_boxplot(width = 0.1, fill = "white", color = "black") +
theme(axis.text.x = element_text(angle=45))
A next step would be to fit a linear regression model to see whether infection, risk or smoking is the bigger driver of birth weight. Other factors included in the data could also be considered.
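Below is a minimal sketch of that regression. It runs on a small, noise-free toy grid (made up, not the CDC data), so the fitted coefficients can be checked by eye against the values used to build it. Each coefficient estimates the change in average birth weight associated with one factor while the other two are held fixed.

```r
# Noise-free toy data: every combination of the three flags (made-up effects)
grid <- expand.grid(smoker = c(FALSE, TRUE),
                    has_risk = c(FALSE, TRUE),
                    has_infection = c(FALSE, TRUE))
grid$babyweight_lbs <- 7.3 - 0.35 * grid$smoker - 0.2 * grid$has_risk -
  0.3 * grid$has_infection

# Ordinary least squares with the three flags as predictors
fit <- lm(babyweight_lbs ~ smoker + has_risk + has_infection, data = grid)
round(coef(fit), 3)
```

On the cleaned data, the equivalent call would be `lm(babyweight_lbs ~ smoker + has_risk + has_infection, data = dt)`. By default, `lm()` drops rows with an `NA` in any of these columns. Replacing `+` with `*` in the formula adds interaction terms, which would test whether the combinations lower weight by more than the sum of their individual effects.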