The data set will be used to create logistic regression models based on a few selected predictors. The dependent variable, “low” is an indicator of birth weight less than 2.5 kg (5 lbs 8 ounces) in which “1” an unhealthy birth weight less than 2.5 kg and “0” represented a healthy birth weight above 2.5 kg. The independent variables used in this study are “smoke”, “race”, “ht” and “ui”. Smoke measured whether the mother smoked during pregnancy, race was recoded for “1” to represent white and “0” to represent all other variables and “ht” measures whether the mother had a history of hypertension or not.
library(tidyverse)
library(magrittr)
library(dplyr)
library(Zelig)
library(pander)
library(texreg)
library(visreg)
library(lmtest)
library(visreg)
library(sjmisc)
library(MASS)
data("birthwt")
birthwt$race <- as.factor(birthwt$race)
birthwt$smoke <- as.factor(birthwt$smoke)
birthwt$ui <- as.factor(birthwt$ui)
birthwt$low <- as.factor(birthwt$low)
head(birthwt)
summary(birthwt)
low age lwt race smoke ptl ht
0:130 Min. :14.00 Min. : 80.0 1:96 0:115 Min. :0.0000 Min. :0.00000
1: 59 1st Qu.:19.00 1st Qu.:110.0 2:26 1: 74 1st Qu.:0.0000 1st Qu.:0.00000
Median :23.00 Median :121.0 3:67 Median :0.0000 Median :0.00000
Mean :23.24 Mean :129.8 Mean :0.1958 Mean :0.06349
3rd Qu.:26.00 3rd Qu.:140.0 3rd Qu.:0.0000 3rd Qu.:0.00000
Max. :45.00 Max. :250.0 Max. :3.0000 Max. :1.00000
ui ftv bwt
0:161 Min. :0.0000 Min. : 709
1: 28 1st Qu.:0.0000 1st Qu.:2414
Median :0.0000 Median :2977
Mean :0.7937 Mean :2945
3rd Qu.:1.0000 3rd Qu.:3487
Max. :6.0000 Max. :4990
dim(birthwt)
[1] 189 10
birthwt2 <- birthwt%>%
rename(hypertension = ht)%>%
mutate(race = ifelse(race == 1, "white", "non-white"))
head(birthwt2)
In this first model i looked at whether a child is born with a below normal birthweight is affected by race. Looking at the results from this first model we can see that being white had a negative affect on whether a child is born below normal birth weight (-0.6954) as opposed to other races. Depending on race one may get better care than other races because of income, health, etc. Based on the p value we can see the results are statistically significant.
m0 <- glm(low ~ race, family = binomial, data = birthwt2)
summary(m0)
Call:
glm(formula = low ~ race, family = binomial, data = birthwt2)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.9895 -0.9895 -0.7401 1.3777 1.6905
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.4595 0.2129 -2.159 0.0309 *
racewhite -0.6954 0.3202 -2.172 0.0298 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 229.86 on 187 degrees of freedom
AIC: 233.86
Number of Fisher Scoring iterations: 4
In this second model i added the variable smoke which indicated whether the mother smoked during pregnancy or not. In this model we can see that mothers who smoked during pregnancy had a higher chance (1.1130) of giving birth to a baby that would be below normal birth weight of 2.5 kg. This relationship is significant at a .01 confidence level.
m1 <- glm(low ~ race + smoke, family = binomial, data = birthwt2)
summary(m1)
Call:
glm(formula = low ~ race + smoke, family = binomial, data = birthwt2)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3402 -0.8840 -0.5433 1.4968 1.9930
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.7382 0.2379 -3.103 0.00191 **
racewhite -1.1003 0.3645 -3.019 0.00254 **
smoke1 1.1130 0.3643 3.056 0.00225 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 219.98 on 186 degrees of freedom
AIC: 225.98
Number of Fisher Scoring iterations: 4
In this third model we added another variable called hypertension which recorded whether the mother had a history of high blood pressure. Having a history of smoking and high blood pressure can add to the chances of a child birth weight being below normal. In this model we see that hypertension does increase the likelihood that a childs birth weight will be below 2.5 kg (1.1725). However, this indicator is not statistically significant with a p-value of >.05.
m2 <- glm(low ~ race + smoke + hypertension, family = binomial, data = birthwt2)
summary(m2)
Call:
glm(formula = low ~ race + smoke + hypertension, family = binomial,
data = birthwt2)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3371 -0.8496 -0.5222 1.0587 2.0297
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.8331 0.2460 -3.386 0.000708 ***
racewhite -1.0904 0.3686 -2.958 0.003095 **
smoke1 1.1190 0.3681 3.040 0.002367 **
hypertension 1.1725 0.6225 1.883 0.059633 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 216.38 on 185 degrees of freedom
AIC: 224.38
Number of Fisher Scoring iterations: 4
In model 4 we examine race, smoking, hypertension and whether the mother had a UI or not to analyze the impact it has on the likelihood a child being born below normal child birth of 2.5 kg. The model shows that people of white race are still more likely to have a child that is of normal birth weight or higher also that smoking negatively affect that as does hypertension and whether the mother had a UI or not. When adding UI to the model we see that UI does increase the likelihood that a childs birth weight will be below 2.5 kg (1.0067).
m3 <- glm(low ~ race + smoke + hypertension + ui, family = binomial, data = birthwt2)
summary(m3)
Call:
glm(formula = low ~ race + smoke + hypertension + ui, family = binomial,
data = birthwt2)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6588 -0.7920 -0.4829 1.1445 2.1009
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0136 0.2637 -3.844 0.000121 ***
racewhite -1.0766 0.3749 -2.872 0.004082 **
smoke1 1.0915 0.3742 2.917 0.003532 **
hypertension 1.3582 0.6290 2.159 0.030822 *
ui1 1.0067 0.4378 2.299 0.021485 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 211.17 on 184 degrees of freedom
AIC: 221.17
Number of Fisher Scoring iterations: 4
In model 4 we examine the interaction between race and hypertension while including smoke and UI to analyze the impact it has on the likelihood a child being born below normal child birth of 2.5 kg. The model shows that people of white race are still more likely to have a child that is of normal birth weight or higher also that smoking negatively affect that but there is no relationship between UI and having a child of lower birth weight.
m4 <- glm(low ~ race * hypertension + smoke + ui, family = binomial, data = birthwt2)
summary(m4)
Call:
glm(formula = low ~ race * hypertension + smoke + ui, family = binomial,
data = birthwt2)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6521 -0.8115 -0.4885 1.1502 2.0905
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0597 0.2711 -3.909 9.26e-05 ***
racewhite -1.0061 0.3829 -2.628 0.00860 **
hypertension 1.8508 0.8846 2.092 0.03642 *
smoke1 1.1241 0.3765 2.986 0.00283 **
ui1 1.0052 0.4375 2.298 0.02157 *
racewhite:hypertension -1.1185 1.3127 -0.852 0.39419
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 210.42 on 183 degrees of freedom
AIC: 222.42
Number of Fisher Scoring iterations: 4
The likelihood ratio test illustrates that Model 4 is the best fit because it has the smallest deviance. Lower deviance in Model 4 means that it is a better fit for the data.
anova(m0, m1, m2, m3, m4, test = "Chisq")
Analysis of Deviance Table
Model 1: low ~ race
Model 2: low ~ race + smoke
Model 3: low ~ race + smoke + hypertension
Model 4: low ~ race + smoke + hypertension + ui
Model 5: low ~ race * hypertension + smoke + ui
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 187 229.86
2 186 219.98 1 9.8802 0.001671 **
3 185 216.38 1 3.5949 0.057956 .
4 184 211.17 1 5.2148 0.022396 *
5 183 210.42 1 0.7515 0.385993
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
From this test we see that model 4 is the best fit because it contained the lowest AIC(221.17) even though model 3 had the lowest BIC(237.35) this may be due to the fact that we found hypertension was not a serious indicator of whether a child was born with a normal birth weight above 2.5 kg.
htmlreg(list(m0, m1, m2, m3, m4))
| Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | ||
|---|---|---|---|---|---|---|
| (Intercept) | -0.46* | -0.74** | -0.83*** | -1.01*** | -1.06*** | |
| (0.21) | (0.24) | (0.25) | (0.26) | (0.27) | ||
| racewhite | -0.70* | -1.10** | -1.09** | -1.08** | -1.01** | |
| (0.32) | (0.36) | (0.37) | (0.37) | (0.38) | ||
| smoke1 | 1.11** | 1.12** | 1.09** | 1.12** | ||
| (0.36) | (0.37) | (0.37) | (0.38) | |||
| hypertension | 1.17 | 1.36* | 1.85* | |||
| (0.62) | (0.63) | (0.88) | ||||
| ui1 | 1.01* | 1.01* | ||||
| (0.44) | (0.44) | |||||
| racewhite:hypertension | -1.12 | |||||
| (1.31) | ||||||
| AIC | 233.86 | 225.98 | 224.38 | 221.17 | 222.42 | |
| BIC | 240.34 | 235.70 | 237.35 | 237.38 | 241.87 | |
| Log Likelihood | -114.93 | -109.99 | -108.19 | -105.58 | -105.21 | |
| Deviance | 229.86 | 219.98 | 216.38 | 211.17 | 210.42 | |
| Num. obs. | 189 | 189 | 189 | 189 | 189 | |
| p < 0.001, p < 0.01, p < 0.05 | ||||||
Likelihood of child being born below normal birth weight by race. The graph shows us that whites have a lower chance of their child being born with low normal birth weight than non-whites.
library(visreg)
visreg(m1, "race", scale = "response")
The graphs clearly shows the relationship by which if mothers who responded yes to smoking during pregnancy had a higher chance of being born with a child that was below normal birth weight.
library(visreg)
visreg(m1, "smoke", scale = "response")