Credit: https://th.bing.com/th/id/OIP.ioZHGJfyP7MT-rbIg7KnsgHaER?w=294&h=180&c=7&r=0&o=5&dpr=1.3&pid=1.7

library(tidyverse)
## Warning: package 'dplyr' was built under R version 4.4.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/dylan/OneDrive/Desktop/Data 101")
babies <- read.csv("babies.csv")
head(babies)
##   case bwt gestation parity age height weight smoke
## 1    1 120       284      0  27     62    100     0
## 2    2 113       282      0  33     64    135     0
## 3    3 128       279      0  28     64    115     1
## 4    4 123        NA      0  36     69    190     0
## 5    5 108       282      0  23     67    125     1
## 6    6 136       286      0  25     62     93     0

Introduction:

Maternal smoking during pregnancy is a significant public health concern due to its potential impact on birth outcomes, particularly birth weight. Low birth weight is associated with various health complications in newborns, including increased risk of neonatal mortality, developmental delays, and chronic health conditions later in life. Understanding the relationship between smoking and birth weight is crucial for developing effective interventions to improve maternal and child health outcomes.

This study aims to investigate the effect of maternal smoking on birth weight using a dataset that includes variables such as birth weight, smoking status, and other relevant maternal characteristics. By analyzing the data, we seek to determine whether smoking during pregnancy has a statistically significant impact on the birth weight of newborns, which could inform public health strategies to reduce smoking among pregnant women and improve infant health outcomes.

Approach:

To analyze the effect of maternal smoking on birth weight, we will use simple linear regression. First, we will clean the dataset by removing any rows with missing or inconsistent data to ensure accurate results. We will then create a new data frame containing only the birth weight and smoking status of the mothers. A jittered dot plot will be used for initial visual exploration, allowing us to compare the distribution of birth weights between the smoking and non-smoking groups. Finally, a linear regression model will be fitted, with birth weight (in ounces) as the dependent variable and smoking status (coded 0 for non-smokers and 1 for smokers) as the independent variable, to quantify the relationship and assess its statistical significance.
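
The cleaning step described above could look roughly like the following. This is a minimal sketch in base R, not the code used in this report: the `babies_toy` data frame holds made-up values that only mimic the `bwt` and `smoke` columns, and the names `bwt_smoke` and `smoke_status` are hypothetical.

```r
# Toy stand-in for the babies data: same column names, made-up values.
babies_toy <- data.frame(
  bwt   = c(120, 113, 128, NA, 108, 136),
  smoke = c(0, 0, 1, 0, NA, 1)
)

# Keep only the two columns of interest and drop rows missing either one.
keep_cols <- c("bwt", "smoke")
bwt_smoke <- babies_toy[complete.cases(babies_toy[, keep_cols]), keep_cols]

# Label the 0/1 indicator so plots and tables are easier to read.
bwt_smoke$smoke_status <- factor(bwt_smoke$smoke,
                                 levels = c(0, 1),
                                 labels = c("Non-smoker", "Smoker"))

print(bwt_smoke)
```

Dropping rows explicitly, rather than relying on each function's default handling of missing values, keeps the plot and the model on the same set of observations.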

Analysis:

summary(babies)
##       case             bwt          gestation         parity      
##  Min.   :   1.0   Min.   : 55.0   Min.   :148.0   Min.   :0.0000  
##  1st Qu.: 309.8   1st Qu.:108.8   1st Qu.:272.0   1st Qu.:0.0000  
##  Median : 618.5   Median :120.0   Median :280.0   Median :0.0000  
##  Mean   : 618.5   Mean   :119.6   Mean   :279.3   Mean   :0.2549  
##  3rd Qu.: 927.2   3rd Qu.:131.0   3rd Qu.:288.0   3rd Qu.:1.0000  
##  Max.   :1236.0   Max.   :176.0   Max.   :353.0   Max.   :1.0000  
##                                   NA's   :13                      
##       age            height          weight          smoke       
##  Min.   :15.00   Min.   :53.00   Min.   : 87.0   Min.   :0.0000  
##  1st Qu.:23.00   1st Qu.:62.00   1st Qu.:114.8   1st Qu.:0.0000  
##  Median :26.00   Median :64.00   Median :125.0   Median :0.0000  
##  Mean   :27.26   Mean   :64.05   Mean   :128.6   Mean   :0.3948  
##  3rd Qu.:31.00   3rd Qu.:66.00   3rd Qu.:139.0   3rd Qu.:1.0000  
##  Max.   :45.00   Max.   :72.00   Max.   :250.0   Max.   :1.0000  
##  NA's   :2       NA's   :22      NA's   :36      NA's   :10
str(babies)
## 'data.frame':    1236 obs. of  8 variables:
##  $ case     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ bwt      : int  120 113 128 123 108 136 138 132 120 143 ...
##  $ gestation: int  284 282 279 NA 282 286 244 245 289 299 ...
##  $ parity   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ age      : int  27 33 28 36 23 25 33 23 25 30 ...
##  $ height   : int  62 64 64 69 67 62 62 65 62 66 ...
##  $ weight   : int  100 135 115 190 125 93 178 140 125 136 ...
##  $ smoke    : int  0 0 1 0 1 0 0 0 0 1 ...
ggplot(babies, aes(x = smoke, y = bwt)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.6, color = "blue") +
  labs(title = "Birthweight by Smoking Status",
       x = "Smoking Status (0 = Non-smoker, 1 = Smoker)",
       y = "Birthweight (oz)") +
  theme_minimal()
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_point()`).

model <- lm(bwt ~ smoke, data = babies)
summary(model)
## 
## Call:
## lm(formula = bwt ~ smoke, data = babies)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -68.05 -11.05   0.89  10.95  52.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  123.047      0.649 189.597   <2e-16 ***
## smoke         -8.938      1.033  -8.653   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.68 on 1224 degrees of freedom
##   (10 observations deleted due to missingness)
## Multiple R-squared:  0.05764,    Adjusted R-squared:  0.05687 
## F-statistic: 74.87 on 1 and 1224 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(model)
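
Because smoke is a 0/1 indicator, the regression coefficients can be read as group means: the intercept is the mean birth weight of the non-smoking group, and the intercept plus the smoke coefficient is the mean of the smoking group. In the summary output above, that reading gives about 123.047 oz for non-smokers and 123.047 - 8.938 = 114.109 oz for smokers. The sketch below checks this identity on a small made-up data frame (`toy` is hypothetical and not taken from babies.csv).

```r
# Toy data (made-up values), using the same 0/1 coding as the smoke column.
toy <- data.frame(
  bwt   = c(120, 124, 126, 110, 114, 118),
  smoke = c(0, 0, 0, 1, 1, 1)
)

fit <- lm(bwt ~ smoke, data = toy)

# Group means computed directly from the data.
group_means <- tapply(toy$bwt, toy$smoke, mean)

# Intercept = mean of the smoke == 0 group;
# intercept + slope = mean of the smoke == 1 group.
b <- coef(fit)
model_means <- c(non_smoker = unname(b["(Intercept)"]),
                 smoker     = unname(b["(Intercept)"] + b["smoke"]))

print(group_means)
print(model_means)
```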

Discussion:

The analysis found a statistically significant association between maternal smoking and birth weight. The dot plot provided an initial visualization of the data, indicating that babies born to smoking mothers generally had lower birth weights than those born to non-smoking mothers. The linear regression confirmed this visual trend: the coefficient for smoking status was -8.938 oz (standard error 1.033, p < 2e-16), so babies born to smokers weighed about 8.9 oz less on average than the 123.0 oz mean for babies born to non-smokers. At the same time, the R-squared of 0.058 shows that smoking status alone explains only about 6% of the variation in birth weight. The negative association aligns with previous research findings on this topic.

However, while the regression model shows a significant association, it is important to consider potential confounding factors that might also influence birth weight. Variables such as maternal age, weight, height, parity, and length of gestation can play critical roles in determining birth weight and should be accounted for in a more comprehensive analysis. Future studies could extend this analysis by incorporating these additional variables into a multivariable regression model to control for their effects and better isolate the impact of smoking.
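
A multivariable extension of this kind could be sketched as follows. The data here are simulated, not drawn from babies.csv, and the data-generating process, sample size, and choice of covariates are assumptions made purely for illustration.

```r
set.seed(1)  # simulated data only; values are not from the babies dataset
n <- 200
sim <- data.frame(
  smoke     = rbinom(n, 1, 0.4),
  gestation = rnorm(n, 280, 15),
  age       = rnorm(n, 27, 6)
)

# Assumed data-generating process, chosen purely for illustration.
sim$bwt <- 120 - 9 * sim$smoke + 0.45 * (sim$gestation - 280) +
  rnorm(n, 0, 15)

simple   <- lm(bwt ~ smoke, data = sim)
adjusted <- lm(bwt ~ smoke + gestation + age, data = sim)

# Compare the smoking coefficient before and after adjustment.
print(coef(simple)["smoke"])
print(coef(adjusted)["smoke"])

# Nested-model F test: do the extra covariates improve the fit?
print(anova(simple, adjusted))
```

With the real data, lm() drops any row that has a missing value in a variable used by the model, so the two models would need to be fitted to the same complete-case subset before anova() can compare them.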

In conclusion, the results of this study are consistent with an adverse effect of maternal smoking on birth weight, reinforcing the case for targeted public health interventions. Because the data are observational and the model includes no other covariates, the estimate should be read as an association rather than a proven causal effect. Smoking cessation programs aimed at pregnant women could help reduce the incidence of low birth weight and improve neonatal health outcomes. Further research with larger datasets and more comprehensive models is recommended to clarify how smoking and other factors interact to influence birth weight, which could make public health strategies more effective.