What is the difference in weight between babies born to mothers who smoked during pregnancy and those who did not? Is this difference important to the health of the baby?

library(ggplot2)
library(dplyr)
babies <- read.csv("babies.csv")

1. Summarize numerically the two distributions of birth weight for babies born to women who smoked during their pregnancy and for babies born to women who did not smoke during their pregnancy. That is, compute the five number summary, and mean and sd for each group of babies.

The five-number summary consists of the minimum, the first quartile, the median, the third quartile, and the maximum.

# Preview the first rows of the data
head(babies)
##   X bwt smoke
## 1 1 120     0
## 2 2 113     0
## 3 3 128     1
## 4 4 123     0
## 5 5 108     1
## 6 6 136     0
# Babies born to non-smoking mothers (smoke == 0)
nosmoke_babies <- babies %>%
  filter(smoke == 0)

summary(nosmoke_babies$bwt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      55     113     123     123     134     176
mean(nosmoke_babies$bwt)
## [1] 123.0472
# Babies born to smoking mothers (smoke == 1)
smoke_babies <- babies %>%
  filter(smoke == 1)

summary(smoke_babies$bwt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    58.0   102.0   115.0   114.1   126.0   163.0
mean(smoke_babies$bwt)
## [1] 114.1095
# Babies born to mothers whose smoking status is unknown (smoke == 9)
unknown_babies <- babies %>%
  filter(smoke == 9)

summary(unknown_babies$bwt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    90.0   109.8   128.0   126.7   141.8   158.0
mean(unknown_babies$bwt)
## [1] 126.7

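Calculate the standard deviation. R's sd() divides by n - 1 (the sample standard deviation); get_sd rescales it to the population form, which divides by n: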

get_sd <- function(x) {
  n <- length(x)
  result <- sd(x) * sqrt((n-1)/n)
  return(result)
}

get_sd(nosmoke_babies$bwt)
## [1] 17.38696
get_sd(smoke_babies$bwt)
## [1] 18.08024
get_sd(unknown_babies$bwt)
## [1] 20.69324
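
The per-group steps above can also be condensed into one grouped summary. The following is a minimal sketch, not the method used in this write-up. To stay self-contained it uses only the six example rows printed near the top of this section as a stand-in for the full file, and the names babies_demo, pop_sd, and group_stats are hypothetical.

```r
library(dplyr)

# Stand-in data: the six example rows printed near the top of this section.
# With the real file, start from babies <- read.csv("babies.csv") instead.
babies_demo <- data.frame(
  bwt   = c(120, 113, 128, 123, 108, 136),
  smoke = c(0, 0, 1, 0, 1, 0)
)

# Population SD (divides by n), the same rescaling that get_sd() applies.
pop_sd <- function(x) {
  n <- length(x)
  sd(x) * sqrt((n - 1) / n)
}

# One row per smoking group: five-number summary, mean, and SD.
group_stats <- babies_demo %>%
  filter(smoke != 9) %>%            # drop unknown smoking status
  group_by(smoke) %>%
  summarise(
    n      = n(),
    min    = min(bwt),
    q1     = quantile(bwt, 0.25, names = FALSE),
    median = median(bwt),
    q3     = quantile(bwt, 0.75, names = FALSE),
    max    = max(bwt),
    mean   = mean(bwt),
    sd     = pop_sd(bwt)
  )

print(group_stats)
```

Keeping both groups in one table makes it easier to compare them side by side, and the difference in means can be read off as diff(group_stats$mean).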

2. Use graphical methods to compare the two distributions of birth weight. That is, plot histograms and box plots for both groups. Make sure that the scales are the same.

# Keep only babies whose mothers' smoking status is known (drop smoke == 9).
babies1 <- filter(babies, smoke != 9)
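
One way to draw the requested histograms and box plots on matching scales is sketched below. This is an illustration under assumptions rather than the plots originally produced. It runs on the six example rows printed near the top of this section instead of the full data, and babies_demo, group, p_hist, p_box, and the bin width of 5 are all hypothetical choices.

```r
library(ggplot2)

# Stand-in data: the six example rows printed near the top of this section.
# With the real data, use the filtered data frame (smoke != 9) instead.
babies_demo <- data.frame(
  bwt   = c(120, 113, 128, 123, 108, 136),
  smoke = c(0, 0, 1, 0, 1, 0)
)

# Label the groups so the plots are readable.
babies_demo$group <- factor(babies_demo$smoke,
                            levels = c(0, 1),
                            labels = c("Non-smoker", "Smoker"))

# Histograms stacked in one column; facet_wrap() uses fixed (shared)
# axis scales by default, so the two panels are directly comparable.
p_hist <- ggplot(babies_demo, aes(x = bwt)) +
  geom_histogram(binwidth = 5, colour = "white") +
  facet_wrap(~ group, ncol = 1) +
  labs(x = "Birth weight (bwt)", y = "Count")

# Side-by-side box plots drawn in one panel share a single y-axis.
p_box <- ggplot(babies_demo, aes(x = group, y = bwt)) +
  geom_boxplot() +
  labs(x = "Mother's smoking status", y = "Birth weight (bwt)")

print(p_hist)
print(p_box)
```

Because the group sizes differ, mapping y to after_stat(density) in the histogram may make the shapes easier to compare than raw counts.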

3. What do you conclude? Is maternal smoking harmful to fetal health (as measured by low birthweight)? Discuss what might be confounding factors here.

Babies born to mothers who smoked weighed less, on average, than babies born to mothers who did not smoke: the median birth weight is 115 versus 123, and the mean is 114.1 versus 123.0, a gap of roughly 8 to 9 in the units of bwt. This is consistent with the US Surgeon General’s health warning that smoking may result in low birth weight. Still, without a hypothesis test we cannot say whether the difference between the two groups is statistically significant. Both distributions are roughly symmetric. Even a significant difference would not by itself show that smoking is the cause, because confounding variables may be involved, such as other maternal habits that affect a newborn’s health, including drug and alcohol use.
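
The hypothesis test mentioned above could take the form of a Welch two-sample t-test on the group means. That choice of test is an assumption, and no test was run for this write-up. The sketch below only shows the shape of the call. It runs on the six example rows printed near the top of this section, which are far too few to support any conclusion, and babies_demo and tt are hypothetical names.

```r
# Stand-in data: the six example rows printed near the top of this section.
# With the real data, use the filtered data frame (smoke != 9) instead.
babies_demo <- data.frame(
  bwt   = c(120, 113, 128, 123, 108, 136),
  smoke = c(0, 0, 1, 0, 1, 0)
)

# Two-sample t-test of mean bwt between smoke == 0 and smoke == 1.
# t.test() uses the unequal-variance (Welch) form by default.
tt <- t.test(bwt ~ smoke, data = babies_demo)

print(tt$estimate)  # the two group means
print(tt$conf.int)  # 95% confidence interval for the difference in means
print(tt$p.value)   # two-sided p-value
```

A small p-value would indicate that a gap this large is unlikely under equal population means, but it would not address confounding.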