Primary Data Analysis
Linear regression vs mother’s weight
The first analysis compares the mother’s weight to the baby’s birth weight. I first fit a linear model with the baby’s birth weight as the response and the mother’s weight as the predictor. I then run the ols_regress() and summary() functions on the model to see whether there is a linear relationship between the two variables.
Fit linear model
library(olsrr) # provides ols_regress()
# Regress the baby's birth weight on the mother's weight
Weight_model <- lm(Birth_weight ~ Weight, data = Birth_Rates)
ols_regress(Weight_model)
## Model Summary
## --------------------------------------------------------------------
## R 0.186 RMSE 716.526
## R-Squared 0.034 MSE 513409.588
## Adj. R-Squared 0.029 Coef. Var 24.466
## Pred R-Squared 0.016 AIC 3011.501
## MAE 574.559 SBC 3021.210
## --------------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
## AIC: Akaike Information Criteria
## SBC: Schwarz Bayesian Criteria
##
## ANOVA
## ------------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ------------------------------------------------------------------------
## Regression 3447597.177 1 3447597.177 6.644 0.0107
## Residual 96521002.461 186 518930.121
## Total 99968599.638 187
## ------------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------------
## (Intercept) 2369.621 229.107 10.343 0.000 1917.638 2821.603
## Weight 4.429 1.718 0.186 2.578 0.011 1.039 7.819
## ----------------------------------------------------------------------------------------------
summary(Weight_model)
##
## Call:
## lm(formula = Birth_weight ~ Weight, data = Birth_Rates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2192.14 -499.39 -0.42 508.76 2075.58
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2369.621 229.107 10.343 <2e-16 ***
## Weight 4.429 1.718 2.578 0.0107 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 720.4 on 186 degrees of freedom
## Multiple R-squared: 0.03449, Adjusted R-squared: 0.0293
## F-statistic: 6.644 on 1 and 186 DF, p-value: 0.01073
Model Equation
For this model equation, we use b0 and b1 rather than B0 and B1. b0 and b1 are the coefficients of the sample equation, while B0 and B1 are the corresponding population values. Since more than 189 births occur in the population, this dataset is a sample, meaning a selected group from the larger population. Based on the summary table above, b0 = 2369.62 and b1 = 4.43. This means that the predicted birth weight for a mother who weighs 0 pounds is 2,369.62 grams, which is an extrapolation outside the observed data rather than a realistic case, and that for every one-pound increase in the mother’s weight, the predicted birth weight increases by 4.43 grams on average.
Birth_weight = 2369.62 + 4.43(Weight)
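As a quick illustration of how the fitted equation is used, the sketch below plugs a mother's weight into the rounded coefficients from the summary table. The helper name predict_birth_weight and the example weight of 130 pounds are assumptions made for illustration; neither comes from the analysis itself.

```r
# Rounded coefficients taken from the summary table above
b0 <- 2369.62  # intercept, in grams
b1 <- 4.43     # slope, in grams per pound of mother's weight

# Hypothetical helper: predicted birth weight (grams) for a mother's weight (pounds)
predict_birth_weight <- function(weight_lbs) {
  b0 + b1 * weight_lbs
}

# Example input (assumed value, chosen only for illustration)
predicted <- predict_birth_weight(130)
print(predicted)
```

With the fitted model object available, predict(Weight_model, newdata = data.frame(Weight = 130)) would give the same kind of prediction using the unrounded coefficients.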
Overall F-test summary
# Load packages that provide tibble(), mutate(), if_else(), and %>%
library(tibble)
library(dplyr)
# Define model and significance level
lm_weight <- lm(Birth_weight ~ Weight, data = Birth_Rates)
alpha <- 0.05
# Build summary table; Pvalue is the overall F-test p-value of the model
summary_tbl <- tibble(
  Analysis = "Birth Weight vs Weight",
  Types = "Quantitative vs Quantitative",
  Test = "Linear regression",
  Pvalue = broom::glance(lm_weight)$p.value
) %>%
  mutate(
    # Decision on the null hypothesis of no linear relationship
    H0 = if_else(Pvalue < alpha, "Reject", "Do not Reject"),
    # Suggested follow-up when the null hypothesis is rejected
    Investigation = if_else(
      H0 == "Reject",
      if_else(grepl("ANOVA", Test),
              "Conduct Tukey HSD",
              "Examine regression coefficients"),
      "None"
    )
  )
# Output table
knitr::kable(
  summary_tbl,
  caption = "Summary of preliminary statistical tests",
  digits = 4
)
| Analysis | Types | Test | Pvalue | H0 | Investigation |
|---|---|---|---|---|---|
| Birth Weight vs Weight | Quantitative vs Quantitative | Linear regression | 0.0107 | Reject | Examine regression coefficients |
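For this test, we decide whether there is a linear relationship between a mother’s weight and her baby’s birth weight. Since the p-value (0.0107) is below 0.05, we reject the null hypothesis and conclude that there is a statistically significant linear relationship between the two variables. The relationship is weak, however: the R-squared of 0.034 means the mother’s weight explains only about 3.4% of the variation in birth weight.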
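The p-value in the table can be traced back to the ANOVA output. The sketch below recomputes the F statistic, its p-value, and R-squared from the sums of squares printed earlier for the mother's weight model; the variable names are chosen here for illustration only.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above
ss_regression <- 3447597.177
ss_residual   <- 96521002.461
df_regression <- 1
df_residual   <- 186

# F statistic = mean square regression / mean square residual
f_stat <- (ss_regression / df_regression) / (ss_residual / df_residual)

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_regression, df_residual, lower.tail = FALSE)

# R-squared = share of total variation explained by the model
r_squared <- ss_regression / (ss_regression + ss_residual)

print(round(c(F = f_stat, p = p_value, R2 = r_squared), 4))
```

These values should agree with the reported F of 6.644, p-value of 0.0107, and R-squared of 0.034, up to rounding.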
Linear regression vs Age
Birth_model <- lm(Birth_weight ~ Age, data = Birth_Rates)
ols_regress(Birth_model)
## Model Summary
## --------------------------------------------------------------------
## R 0.091 RMSE 726.207
## R-Squared 0.008 MSE 527376.961
## Adj. R-Squared 0.003 Coef. Var 24.796
## Pred R-Squared -0.019 AIC 3016.547
## MAE 591.969 SBC 3026.256
## --------------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
## AIC: Akaike Information Criteria
## SBC: Schwarz Bayesian Criteria
##
## ANOVA
## ------------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ------------------------------------------------------------------------
## Regression 821731.027 1 821731.027 1.542 0.2159
## Residual 99146868.611 186 533047.681
## Total 99968599.638 187
## ------------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------------
## (Intercept) 2653.689 240.133 11.051 0.000 2179.955 3127.422
## Age 12.499 10.067 0.091 1.242 0.216 -7.361 32.358
## ----------------------------------------------------------------------------------------------
summary(Birth_model)
##
## Call:
## lm(formula = Birth_weight ~ Age, data = Birth_Rates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2294.65 -517.53 10.85 533.59 1773.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2653.69 240.13 11.051 <2e-16 ***
## Age 12.50 10.07 1.242 0.216
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 730.1 on 186 degrees of freedom
## Multiple R-squared: 0.00822, Adjusted R-squared: 0.002888
## F-statistic: 1.542 on 1 and 186 DF, p-value: 0.2159
# Define model and significance level
lm_age <- lm(Birth_weight ~ Age, data = Birth_Rates)
alpha <- 0.05
# Build summary table
summary_tbl <- tibble(
  Analysis = "Birth Weight vs Age",
  Types = "Quantitative vs Quantitative",
  Test = "Linear regression",
  Pvalue = broom::glance(lm_age)$p.value
) %>%
  mutate(
    H0 = if_else(Pvalue < alpha, "Reject", "Do not Reject"),
    Investigation = if_else(
      H0 == "Reject",
      if_else(grepl("ANOVA", Test),
              "Conduct Tukey HSD",
              "Examine regression coefficients"),
      "None"
    )
  )
# Output table
knitr::kable(
  summary_tbl,
  caption = "Summary of preliminary statistical tests",
  digits = 4
)
| Analysis | Types | Test | Pvalue | H0 | Investigation |
|---|---|---|---|---|---|
| Birth Weight vs Age | Quantitative vs Quantitative | Linear regression | 0.2159 | Do not Reject | None |
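For age, we follow the same steps. The fitted linear regression line is Birth_weight = 2653.69 + 12.50(Age). Based on the overall F-test, the p-value (0.2159) is above 0.05, so we do not reject the null hypothesis: there is no statistically significant linear relationship between the mother’s age and the baby’s birth weight. The relationship is also weaker than the one with the mother’s weight (R-squared of 0.008 versus 0.034).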
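The same conclusion can be read off the 95% confidence interval for the Age slope. The sketch below rebuilds that interval from the estimate and standard error in the coefficient table; the variable names are illustrative, and the inputs are the rounded values from the output, so the last digit may differ slightly.

```r
# Estimate and standard error for Age, copied from the coefficient table above
b1_age <- 12.499
se_age <- 10.067
df_residual <- 186

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df_residual)

# Confidence interval: estimate plus or minus t_crit * standard error
ci_age <- c(lower = b1_age - t_crit * se_age,
            upper = b1_age + t_crit * se_age)

print(round(ci_age, 3))

# TRUE when the interval contains zero, i.e. the slope is not significant at 0.05
contains_zero <- ci_age[["lower"]] < 0 && ci_age[["upper"]] > 0
print(contains_zero)
```

An interval that spans zero is consistent with a p-value above 0.05 for the same coefficient.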