This dataset contains 10 variables. The data were collected at the Baystate Medical Center, Springfield, Massachusetts, during 1986 and describe risk factors associated with low infant birth weight. I will conduct a regression analysis with birth weight ("bwt") as the dependent variable, regressing it on the mother's age, the mother's race, and whether or not the mother smoked during pregnancy.
library(tidyverse)
library(radiant.data)
library(magrittr)
library(dplyr)
library(Zelig)
library(pander)
library(texreg)
library(visreg)
library(lmtest)
library(sjmisc)
library(MASS)
Previewing The Dataset
data("birthwt")
head(birthwt)
Summary of birthwt
summary(birthwt)
      low               age             lwt             race           smoke             ptl
 Min.   :0.0000   Min.   :14.00   Min.   : 80.0   Min.   :1.000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.0000   1st Qu.:19.00   1st Qu.:110.0   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000
 Median :0.0000   Median :23.00   Median :121.0   Median :1.000   Median :0.0000   Median :0.0000
 Mean   :0.3122   Mean   :23.24   Mean   :129.8   Mean   :1.847   Mean   :0.3915   Mean   :0.1958
 3rd Qu.:1.0000   3rd Qu.:26.00   3rd Qu.:140.0   3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:0.0000
 Max.   :1.0000   Max.   :45.00   Max.   :250.0   Max.   :3.000   Max.   :1.0000   Max.   :3.0000
       ht                ui               ftv              bwt
 Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   : 709
 1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:2414
 Median :0.00000   Median :0.0000   Median :0.0000   Median :2977
 Mean   :0.06349   Mean   :0.1481   Mean   :0.7937   Mean   :2945
 3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:3487
 Max.   :1.00000   Max.   :1.0000   Max.   :6.0000   Max.   :4990
dim(birthwt)
[1] 189 10
Renaming and recoding the variables used in the analysis
birthwt2 <- birthwt %>%
  rename(hypertension = ht,
         mothers_age = age,
         previous_premature_labours = ptl,
         dr_visits_first_trimester = ftv,
         birth_weight_ingrams = bwt) %>%
  mutate(race = factor(ifelse(race == 1, "white",
                       ifelse(race == 2, "black",
                       ifelse(race == 3, "other", "error")))))
head(birthwt2)
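Because `factor()` orders character levels alphabetically, the recode above makes "black" the reference category in the regressions. The following is a minimal sketch of an alternative recode that sets the reference level explicitly. It assumes the coding 1 = white, 2 = black, 3 = other used in the recode above; the object names `race_labelled` and `race_ref_black` are illustrative and not part of the original analysis.

```r
library(MASS)  # provides the birthwt data set

# Map the numeric codes to labels in one step
# (assumed coding: 1 = white, 2 = black, 3 = other).
race_labelled <- factor(birthwt$race,
                        levels = c(1, 2, 3),
                        labels = c("white", "black", "other"))
levels(race_labelled)   # level order follows `labels`, so "white" comes first

# Choose the reference category explicitly instead of relying on alphabetical order.
race_ref_black <- relevel(race_labelled, ref = "black")
levels(race_ref_black)  # "black" is now the first (reference) level
```

Setting the reference level explicitly keeps the interpretation of the race coefficients stable if the labels are later renamed.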
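Linear regression model 1
In Model 1, holding the mother's age and smoking status constant, babies born to white mothers weigh on average 444.07 grams more than babies born to black mothers (black is the reference category, so it is absorbed into the intercept). This difference is significant at the .01 level (p = 0.005). Model 1 also shows that, holding race and age constant, babies of mothers who smoked during pregnancy weigh on average 426.09 grams less than babies of mothers who did not smoke. This difference is significant at the .001 level (p = 0.00015). Neither the mother's age nor the "other" race category is statistically significant.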
lm0 <- lm(birth_weight_ingrams ~ race + smoke + mothers_age, data = birthwt2)
summary(lm0)
Call:
lm(formula = birth_weight_ingrams ~ race + smoke + mothers_age,
data = birthwt2)
Residuals:
    Min      1Q  Median      3Q     Max
-2322.6  -447.3    28.4   502.2  1612.3
Coefficients:
            Estimate Std. Error t value             Pr(>|t|)
(Intercept) 2837.604    257.573  11.017 < 0.0000000000000002 ***
raceother     -3.789    161.115  -0.024             0.981264
racewhite    444.069    156.194   2.843             0.004973 **
smoke       -426.093    109.988  -3.874             0.000149 ***
mothers_age    2.134      9.771   0.218             0.827326
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 690 on 184 degrees of freedom
Multiple R-squared: 0.1236, Adjusted R-squared: 0.1046
F-statistic: 6.49 on 4 and 184 DF, p-value: 0.00006592
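The point estimates in Model 1 can be read alongside confidence intervals, which show how precisely each difference in grams is estimated. The sketch below refits the same formula on a data frame rebuilt from `MASS::birthwt`; the names `dat`, `fit1` and `ci` are illustrative, and the race coding 1 = white, 2 = black, 3 = other is assumed from the recode used earlier.

```r
library(MASS)  # provides the birthwt data set

# Rebuild the analysis variables with the names used in this document.
dat <- data.frame(
  birth_weight_ingrams = birthwt$bwt,
  mothers_age = birthwt$age,
  smoke = birthwt$smoke,
  # Character labels give alphabetical factor levels, so "black" is the reference.
  race = factor(c("white", "black", "other")[birthwt$race])
)

fit1 <- lm(birth_weight_ingrams ~ race + smoke + mothers_age, data = dat)

# 95% confidence intervals for each coefficient, in grams.
ci <- confint(fit1, level = 0.95)
print(round(ci, 1))
```

An interval that excludes zero corresponds to a coefficient that is significant at the 5% level in the summary table.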
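In Model 2 we add an interaction between smoking during pregnancy and the mother's race. Among non-smokers of the same age, infants of white mothers weigh on average about 580 grams more than infants of black mothers; this difference is significant at the .01 level (p = 0.006). The smoke:racewhite coefficient of -259.31 means that the reduction in birth weight associated with smoking is estimated to be about 259 grams larger for white mothers than for black mothers. However, neither interaction term is statistically significant (p = 0.42 for both), and the main effect of smoking, which in this model applies to black mothers only, is no longer significant either (p = 0.22).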
lm1 <- lm(birth_weight_ingrams ~ smoke * race + mothers_age, data = birthwt2)
summary(lm1)
Call:
lm(formula = birth_weight_ingrams ~ smoke * race + mothers_age,
data = birthwt2)
Residuals:
     Min       1Q   Median       3Q      Max
-2404.55  -417.73    29.57   464.72  1581.63
Coefficients:
                Estimate Std. Error t value             Pr(>|t|)
(Intercept)     2875.916    262.925  10.938 < 0.0000000000000002 ***
smoke           -346.029    279.453  -1.238              0.21722
raceother        -36.112    196.220  -0.184              0.85419
racewhite        580.786    209.173   2.777              0.00607 **
mothers_age       -1.074     10.001  -0.107              0.91459
smoke:raceother  287.560    354.521   0.811              0.41835
smoke:racewhite -259.308    318.580  -0.814              0.41674
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 685.5 on 182 degrees of freedom
Multiple R-squared: 0.1445, Adjusted R-squared: 0.1163
F-statistic: 5.123 on 6 and 182 DF, p-value: 0.00006899
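With an interaction in the model, the smoking effect for each race is the sum of the smoke coefficient and the matching interaction term. From the output above, that is -346.03 grams for black mothers, -346.03 + 287.56 = -58.47 grams for mothers of other races, and -346.03 - 259.31 = -605.34 grams for white mothers. The sketch below reproduces that arithmetic and adds an F-test comparing the two nested models. The names `dat`, `fit_main`, `fit_int` and `smoke_effect` are illustrative, and the data frame is rebuilt from `MASS::birthwt` under the same coding assumed in the recode.

```r
library(MASS)  # provides the birthwt data set

# Rebuild the analysis variables with the names used in this document.
dat <- data.frame(
  birth_weight_ingrams = birthwt$bwt,
  mothers_age = birthwt$age,
  smoke = birthwt$smoke,
  # Character labels give alphabetical factor levels, so "black" is the reference.
  race = factor(c("white", "black", "other")[birthwt$race])
)

fit_main <- lm(birth_weight_ingrams ~ race + smoke + mothers_age, data = dat)
fit_int  <- lm(birth_weight_ingrams ~ smoke * race + mothers_age, data = dat)

# Smoking effect within each race: main effect plus the relevant interaction term.
b <- coef(fit_int)
smoke_effect <- c(
  black = unname(b["smoke"]),
  other = unname(b["smoke"] + b["smoke:raceother"]),
  white = unname(b["smoke"] + b["smoke:racewhite"])
)
print(round(smoke_effect, 2))

# F-test of the two interaction terms jointly (nested model comparison).
interaction_test <- anova(fit_main, fit_int)
print(interaction_test)
```

The joint test asks whether allowing the smoking effect to differ by race improves the fit beyond the additive model, which is a more direct check than reading the two interaction p-values separately.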
library(texreg)
screenreg(list(lm0, lm1))
=========================================
                 Model 1      Model 2
-----------------------------------------
(Intercept)      2837.60 ***  2875.92 ***
                 (257.57)     (262.93)
raceother          -3.79       -36.11
                 (161.11)     (196.22)
racewhite         444.07 **    580.79 **
                 (156.19)     (209.17)
smoke            -426.09 ***  -346.03
                 (109.99)     (279.45)
mothers_age         2.13        -1.07
                   (9.77)      (10.00)
smoke:raceother                287.56
                              (354.52)
smoke:racewhite               -259.31
                              (318.58)
-----------------------------------------
R^2                 0.12         0.14
Adj. R^2            0.10         0.12
Num. obs.         189          189
RMSE              690.02       685.50
=========================================
*** p < 0.001, ** p < 0.01, * p < 0.05
library(texreg)
htmlreg(list(lm0, lm1))
Group by summary of the dependent variable
The summary below is consistent with the regression models above: regardless of the mother's race, smoking during pregnancy is associated with a lower average birth weight. In all three race groups, infants of mothers who did not smoke had a higher average birth weight than infants of mothers who smoked. The summary also reflects the pattern described for Model 2: white mothers show the largest difference in average infant birth weight between smokers and non-smokers.
library(dplyr)
Low_WT <- birthwt2 %>%
  dplyr::select(birth_weight_ingrams, smoke, race) %>%
  group_by(race, smoke) %>%
  summarise(mean = mean(birth_weight_ingrams))
print(Low_WT)
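Group means are easier to judge next to the number of mothers in each cell, since a mean based on a handful of observations is less stable. The following base R sketch tabulates cell counts, cell means, and the gap between non-smokers and smokers within each race. The names `cell_n`, `cell_mean` and `gap` are illustrative, and the race coding 1 = white, 2 = black, 3 = other is assumed from the recode used earlier.

```r
library(MASS)  # provides the birthwt data set

race  <- factor(c("white", "black", "other")[birthwt$race])
smoke <- factor(birthwt$smoke, levels = c(0, 1),
                labels = c("non_smoker", "smoker"))

# Number of mothers in each race-by-smoking cell.
cell_n <- table(race, smoke)

# Mean birth weight (grams) in each cell, as a race-by-smoking matrix.
cell_mean <- tapply(birthwt$bwt, list(race, smoke), mean)

# Gap in mean birth weight: non-smokers minus smokers, within each race.
gap <- cell_mean[, "non_smoker"] - cell_mean[, "smoker"]

print(cell_n)
print(round(cell_mean))
print(round(gap))
```

These are raw, unadjusted differences, so they will not match the Model 2 coefficients exactly, because the regression also holds the mother's age constant.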
visreg(lm1, "birth_weight_ingrams", by = "race", scale = "response")
visreg(lm1, "birth_weight_ingrams", by = "smoke", scale = "response")
visreg(lm1, "race", by = "smoke", scale = "response")
The plots are consistent with the findings above. In the third plot, among non-smokers, infants of white mothers have the highest predicted birth weight, and white mothers show the largest gap in average birth weight between non-smokers and smokers. In plot 1 we see how race relates to birth weight in grams: the model predicts a higher average birth weight for infants of white mothers than for infants of black mothers or mothers of other races.