Introduction

In this project, I analyze a data set on the birth weights of babies and the impact that certain variables may have on them. The report focuses on the age and weight of the mother specifically, while also introducing hypotheses about the other variables. This matters to medical professionals because the best way to deal with an underweight baby is to be prepared for one. Understanding which factors increase the chance of a baby being underweight lets a clinician prepare, and also lets them advise the mother on how she might prevent it.

Basic Data Analysis

In this section, I carry out the preliminary analysis and preparation of the data. The data set is read in, tidied, cleaned, and wrangled, and then early hypotheses and tests are used to build an understanding of the data.

Read In Data

library(readxl)  # provides read_excel()
Birth_Rates <- read_excel("Project2_BirthWeight.xlsx")

# Treat smoking status as a categorical variable
Birth_Rates$Smoke <- as.factor(Birth_Rates$Smoke)

Summary of Data

summary(Birth_Rates)
##   Birth_weight      LowBW             Age            Weight         Smoke    
##  Min.   : 709   Min.   :0.0000   Min.   :14.00   Min.   :-999.0   FALSE:115  
##  1st Qu.:2414   1st Qu.:0.0000   1st Qu.:19.00   1st Qu.: 110.0   TRUE : 74  
##  Median :2977   Median :0.0000   Median :23.00   Median : 121.0              
##  Mean   :2945   Mean   :0.3122   Mean   :23.24   Mean   : 123.8              
##  3rd Qu.:3487   3rd Qu.:1.0000   3rd Qu.:26.00   3rd Qu.: 140.0              
##  Max.   :4990   Max.   :1.0000   Max.   :45.00   Max.   : 250.0              
##    Prev_labor         Heart          Num_DrVisits     
##  Min.   :0.0000   Min.   :0.00000   Min.   :-999.000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:   0.000  
##  Median :0.0000   Median :0.00000   Median :   0.000  
##  Mean   :0.1958   Mean   :0.06349   Mean   :  -4.492  
##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:   1.000  
##  Max.   :3.0000   Max.   :1.00000   Max.   :   6.000

Based on the summary of the data, there are 3 categorical and 5 quantitative variables in the data set. The categorical variables are Smoke, Heart, and LowBW. Smoke records whether the mother is a smoker; Heart is 1 if the mother has a heart condition and 0 if she does not; and LowBW is derived from Birth_weight, taking the value 1 if the birth weight is below 2500 grams and 0 if it is above 2500 grams. The quantitative variables are Birth_weight, Age, Weight, Prev_labor, and Num_DrVisits. These represent the weight of the baby, the age of the mother, the weight of the mother, the number of times the mother was previously pregnant, and the number of times the mother visited the doctor, respectively. Both Weight and Num_DrVisits have a minimum of -999, a placeholder code for an unknown value. The -999 in Num_DrVisits does not need to be changed or deleted, because Num_DrVisits is not used in the analysis of the mother’s age and weight. The case with the unknown weight, however, must be removed because it cannot be used in the analysis.

# Drop row 130, the record whose Weight is coded as -999 (unknown)
Birth_Rates <- Birth_Rates[-130, ]
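
Removing a row by its position works only as long as the row order never changes. As a minimal sketch of an alternative, the unknown weight can be dropped by filtering on the -999 code itself. The `toy` data frame below is hypothetical and only stands in for Birth_Rates; its values are made up for illustration.

```r
# Hypothetical toy data standing in for Birth_Rates; -999 marks an unknown value.
toy <- data.frame(
  Birth_weight = c(2523, 2551, 2557, 2594),
  Weight       = c(182, -999, 105, 108)
)

# Keep only the rows whose Weight is a real measurement.
toy_clean <- toy[toy$Weight != -999, ]

# Number of rows that remain after the filter.
nrow(toy_clean)
```

The same condition applied to Birth_Rates would remove every row with an unknown weight, however many there are and wherever they sit in the file.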

Hypothesis

Age

I think that the age of the mother will have an impact on the baby’s weight. I believe that an older mother is likely to be more mature, to take the pregnancy more seriously, and to know how her body will respond and how to react.

Weight

I predict that the mother’s weight will have a noticeable impact on the birth weight of the baby. If the mother weighs less, she is more likely to not be eating enough for herself and her baby, which may lead to a low birth weight. It is possible that the baby takes much of the nutrition away from the mother, or that the mother has a fast metabolism and does not put on much weight, but I predict that these cases will not be the trend and that a low maternal weight will ultimately lead to a low birth weight.

Smoke

I think that smoking will have the largest impact on babies being born underweight. According to an article published by Science, smoking can lead to a decreased appetite, and a decreased appetite can lead to a malnourished baby.

Previous labor

I believe that this is similar to age: a mother who has been through labor before will be more mature and experienced, and more likely to know how to behave during a pregnancy to achieve the best outcome, such as a normal birth weight.

Heart

I don’t predict that this will impact the weight of the babies. I think that any heart conditions that may arise will not directly cause a baby to be underweight.

Number of Doctor visits

I don’t predict that this will impact the baby’s weight. There are many reasons to go to the doctor during pregnancy, whether a condition is already known or the mother is simply being cautious. I predict that cautious mothers and mothers who are already aware of an issue will both visit the doctor often, so no strong correlation will appear.

Primary Data Analysis

Linear regression vs mother’s weight

The first analysis compares the mother’s weight to the baby’s birth weight. First, I create a linear model with the birth weight of the baby as the response and the weight of the mother as the predictor. I then use the ols_regress() and summary() functions on it to see whether there is a relationship between the two variables.

Fit linear model

library(olsrr)  # provides ols_regress()
Weight_model <- lm(Birth_weight ~ Weight, data = Birth_Rates)

ols_regress(Weight_model)
##                            Model Summary                             
## --------------------------------------------------------------------
## R                         0.186       RMSE                  716.526 
## R-Squared                 0.034       MSE                513409.588 
## Adj. R-Squared            0.029       Coef. Var              24.466 
## Pred R-Squared            0.016       AIC                  3011.501 
## MAE                     574.559       SBC                  3021.210 
## --------------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
##  AIC: Akaike Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
## 
##                                  ANOVA                                   
## ------------------------------------------------------------------------
##                     Sum of                                              
##                    Squares         DF    Mean Square      F        Sig. 
## ------------------------------------------------------------------------
## Regression     3447597.177          1    3447597.177    6.644    0.0107 
## Residual      96521002.461        186     518930.121                    
## Total         99968599.638        187                                   
## ------------------------------------------------------------------------
## 
##                                      Parameter Estimates                                       
## ----------------------------------------------------------------------------------------------
##       model        Beta    Std. Error    Std. Beta      t        Sig        lower       upper 
## ----------------------------------------------------------------------------------------------
## (Intercept)    2369.621       229.107                 10.343    0.000    1917.638    2821.603 
##      Weight       4.429         1.718        0.186     2.578    0.011       1.039       7.819 
## ----------------------------------------------------------------------------------------------
summary(Weight_model)
## 
## Call:
## lm(formula = Birth_weight ~ Weight, data = Birth_Rates)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2192.14  -499.39    -0.42   508.76  2075.58 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2369.621    229.107  10.343   <2e-16 ***
## Weight         4.429      1.718   2.578   0.0107 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 720.4 on 186 degrees of freedom
## Multiple R-squared:  0.03449,    Adjusted R-squared:  0.0293 
## F-statistic: 6.644 on 1 and 186 DF,  p-value: 0.01073

Model Equation

For this model equation, we use b0 and b1, as opposed to B0 and B1. b0 and b1 are the values in the sample equation, while B0 and B1 are the values in the population equation. Since there are more births than the 189 recorded here, this data set is a sample, meaning a selected group from the larger population. Based on the summary table above, b0 = 2369.62 and b1 = 4.43. This means that the model predicts a birth weight of 2,369.62 grams for a mother who weighs 0 pounds, which is an extrapolation with no practical meaning, and that each additional pound of the mother’s weight is associated with an average increase of 4.43 grams in birth weight.

Birth_weight = 2369.62 + 4.43(Weight)
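
As a small worked example of the equation, the sketch below plugs a few mother weights into the fitted line. The coefficients are the rounded values reported above, the three weights are taken from the quartiles in the summary output, and the helper name `predict_birth_weight` is an assumption made for illustration.

```r
# Rounded coefficients from the fitted model output above.
b0 <- 2369.62  # intercept, in grams
b1 <- 4.43     # slope, in grams per pound of mother's weight

# Hypothetical helper: predicted birth weight (grams) for a mother's weight (pounds).
predict_birth_weight <- function(weight_lb) {
  b0 + b1 * weight_lb
}

# Predictions at the first quartile, median, and third quartile of Weight.
predictions <- predict_birth_weight(c(110, 121, 140))
predictions
```

With the fitted model object available, `predict(Weight_model, newdata = data.frame(Weight = c(110, 121, 140)))` should give nearly the same values, differing only through the rounding of the coefficients.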

Regression significance test

# Load tibble(), mutate(), if_else(), and %>%
library(tidyverse)

# Define models
lm_weight     <- lm(Birth_weight ~ Weight, data = Birth_Rates)

alpha <- 0.05

# Build summary table
summary_tbl <- tibble(
  Analysis = c(
    "Birth Weight vs Weight"
  ),
  
  Types = c(
    rep("Quantitative vs Quantitative", 1)
  ),
  Test = c(
    rep("Linear regression", 1)
  ),
  Pvalue = c(
    broom::glance(lm_weight)$p.value
  )
) %>%
  mutate(
    H0 = if_else(Pvalue < alpha, "Reject", "Do not Reject"),
    Investigation = if_else(
      H0 == "Reject",
        if_else(grepl("ANOVA", Test),
          "Conduct Tukey HSD",
          "Examine regression coefficients"),
      "None"
    )
  )

# Output table
knitr::kable(
  summary_tbl,
  caption = "Summary of preliminary statistical tests",
  digits = 4
)
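Summary of preliminary statistical tests

|Analysis               |Types                        |Test              | Pvalue|H0     |Investigation                   |
|:----------------------|:----------------------------|:-----------------|------:|:------|:-------------------------------|
|Birth Weight vs Weight |Quantitative vs Quantitative |Linear regression | 0.0107|Reject |Examine regression coefficients |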

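This test assesses whether there is a linear relationship between a mother’s weight and her baby’s birth weight. Since the p-value (0.0107) is below 0.05, we reject the null hypothesis and conclude that the two variables are associated. The relationship is weak, however: with an R-squared of 0.034, the mother’s weight explains only about 3% of the variation in birth weight.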
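
The figures reported for this model are all linked through the ANOVA table. As a sketch, the block below recomputes R-squared, the F statistic, and the p-value from the sums of squares and degrees of freedom printed in the ols_regress() output. The numbers are copied from that output and the variable names are chosen for illustration.

```r
# Sums of squares and degrees of freedom from the ANOVA table for Weight_model.
ss_regression <- 3447597.177
ss_residual   <- 96521002.461
df_regression <- 1
df_residual   <- 186

# R-squared: share of the total variation explained by the regression.
r_squared <- ss_regression / (ss_regression + ss_residual)

# F statistic: mean square for regression divided by mean square for residuals.
f_stat <- (ss_regression / df_regression) / (ss_residual / df_residual)

# p-value: upper tail of the F distribution at the observed statistic.
p_value <- pf(f_stat, df_regression, df_residual, lower.tail = FALSE)

round(c(r_squared = r_squared, f_stat = f_stat, p_value = p_value), 4)
```

These should agree with the printed output (R-squared 0.03449, F 6.644, p-value 0.0107) up to rounding.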

Linear regression vs Age

Birth_model <- lm(Birth_weight ~ Age, data = Birth_Rates)

ols_regress(Birth_model)
##                            Model Summary                             
## --------------------------------------------------------------------
## R                         0.091       RMSE                  726.207 
## R-Squared                 0.008       MSE                527376.961 
## Adj. R-Squared            0.003       Coef. Var              24.796 
## Pred R-Squared           -0.019       AIC                  3016.547 
## MAE                     591.969       SBC                  3026.256 
## --------------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
##  AIC: Akaike Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
## 
##                                  ANOVA                                   
## ------------------------------------------------------------------------
##                     Sum of                                              
##                    Squares         DF    Mean Square      F        Sig. 
## ------------------------------------------------------------------------
## Regression      821731.027          1     821731.027    1.542    0.2159 
## Residual      99146868.611        186     533047.681                    
## Total         99968599.638        187                                   
## ------------------------------------------------------------------------
## 
##                                      Parameter Estimates                                       
## ----------------------------------------------------------------------------------------------
##       model        Beta    Std. Error    Std. Beta      t        Sig        lower       upper 
## ----------------------------------------------------------------------------------------------
## (Intercept)    2653.689       240.133                 11.051    0.000    2179.955    3127.422 
##         Age      12.499        10.067        0.091     1.242    0.216      -7.361      32.358 
## ----------------------------------------------------------------------------------------------
summary(Birth_model)
## 
## Call:
## lm(formula = Birth_weight ~ Age, data = Birth_Rates)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2294.65  -517.53    10.85   533.59  1773.87 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2653.69     240.13  11.051   <2e-16 ***
## Age            12.50      10.07   1.242    0.216    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 730.1 on 186 degrees of freedom
## Multiple R-squared:  0.00822,    Adjusted R-squared:  0.002888 
## F-statistic: 1.542 on 1 and 186 DF,  p-value: 0.2159
# Define models
lm_age     <- lm(Birth_weight ~ Age, data = Birth_Rates)

alpha <- 0.05

# Build summary table
summary_tbl <- tibble(
  Analysis = c(
    "Birth Weight vs Age"
  ),
  
  Types = c(
    rep("Quantitative vs Quantitative", 1)
  ),
  Test = c(
    rep("Linear regression", 1)
  ),
  Pvalue = c(
    broom::glance(lm_age)$p.value
  )
) %>%
  mutate(
    H0 = if_else(Pvalue < alpha, "Reject", "Do not Reject"),
    Investigation = if_else(
      H0 == "Reject",
        if_else(grepl("ANOVA", Test),
          "Conduct Tukey HSD",
          "Examine regression coefficients"),
      "None"
    )
  )

# Output table
knitr::kable(
  summary_tbl,
  caption = "Summary of preliminary statistical tests",
  digits = 4
)
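Summary of preliminary statistical tests

|Analysis            |Types                        |Test              | Pvalue|H0            |Investigation |
|:-------------------|:----------------------------|:-----------------|------:|:-------------|:-------------|
|Birth Weight vs Age |Quantitative vs Quantitative |Linear regression | 0.2159|Do not Reject |None          |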

For age, we carry out the same steps. The fitted linear regression line is Birth_weight = 2653.69 + 12.50(Age). Based on the regression test, the p-value of 0.2159 is above 0.05, so we do not reject the null hypothesis: the data show no statistically significant relationship between the mother’s age and birth weight, in contrast to the mother’s weight.
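
The Parameter Estimates table for the Age model reports a 95% interval of -7.361 to 32.358 for the slope. As a sketch, that interval can be rebuilt from the estimate, its standard error, and the residual degrees of freedom. The inputs are copied from the output above and the variable names are chosen for illustration.

```r
# Slope estimate, standard error, and residual degrees of freedom for Age.
estimate  <- 12.499
std_error <- 10.067
df_resid  <- 186

# Critical t value for a two-sided 95% interval.
t_crit <- qt(0.975, df_resid)

# Confidence interval: estimate plus or minus t_crit standard errors.
ci <- estimate + c(-1, 1) * t_crit * std_error
ci

# TRUE when the interval contains zero.
ci[1] < 0 && ci[2] > 0
```

An interval that contains zero is consistent with the p-value of 0.216 in the same table: a slope of zero cannot be ruled out at the 0.05 level.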

Reflection

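Overall, there is a statistically significant but weak relationship between a mother’s weight and her baby’s weight, while the data show no significant relationship between the mother’s age and the baby’s weight.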

Citations

https://www.science.org/content/article/why-smokers-are-skinny