Viewing all variables
library(MASS)
data("birthwt")
str(birthwt)
'data.frame': 189 obs. of 10 variables:
$ low : int 0 0 0 0 0 0 0 0 0 0 ...
$ age : int 19 33 20 21 18 21 22 17 29 26 ...
$ lwt : int 182 155 105 108 107 124 118 103 123 113 ...
$ race : int 2 3 1 1 1 3 1 3 1 1 ...
$ smoke: int 0 0 1 1 1 0 0 0 1 1 ...
$ ptl : int 0 0 0 0 0 0 0 0 0 0 ...
$ ht : int 0 0 0 0 0 0 0 0 0 0 ...
$ ui : int 1 0 0 1 1 0 0 0 0 0 ...
$ ftv : int 0 3 1 2 0 0 1 1 1 0 ...
$ bwt : int 2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...
library(ggplot2)
ggplot(birthwt, aes(x=bwt))+ geom_histogram()
The birth weight distribution is roughly bell-shaped, resembling a normal curve, and is centered at around 3,000 grams.
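The visual impression from the histogram can be checked with numbers. The sketch below is a minimal base-R check that needs only MASS, not ggplot2. The object name bwt_summary is illustrative. The Shapiro-Wilk test is an added, rough normality check and was not part of the original analysis.

```r
library(MASS)          # provides the birthwt data frame
data("birthwt")

# Numerical summary to accompany the histogram of bwt (grams)
bwt_summary <- c(mean   = mean(birthwt$bwt),
                 median = median(birthwt$bwt),
                 sd     = sd(birthwt$bwt))
print(round(bwt_summary, 1))

# Shapiro-Wilk test: a small p-value would indicate departure from normality
print(shapiro.test(birthwt$bwt))
```

For the regression models that follow, the normality assumption applies to the residuals rather than the raw outcome, so this is only a first look.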
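Birth weight (bwt) is the dependent variable and is continuous. Race and smoking status are the independent variables. Both are stored as integer codes in the dataset, so they are converted to factor variables before modeling.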
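The factor conversion matters because race is stored as the integers 1, 2, and 3. Passed to lm() unchanged, it would be fitted as a single slope, which treats the three codes as equally spaced points on a numeric scale. The sketch below compares the two codings. The model names m_numeric and m_factor are hypothetical.

```r
library(MASS)
data("birthwt")

# race as a number: one slope, assumes codes 1 -> 2 -> 3 are equally spaced
m_numeric <- lm(bwt ~ race, data = birthwt)

# race as a factor: an intercept (code 1, white) plus one dummy per other group
m_factor  <- lm(bwt ~ factor(race), data = birthwt)

print(coef(m_numeric))
print(coef(m_factor))
# The design matrix shows the 0/1 dummy columns R builds for the factor
print(head(model.matrix(~ factor(race), data = birthwt)))
```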
library(dplyr)  # dplyr re-exports the %>% pipe, so magrittr is not needed
# Recode race (1 = white, 2 = black, 3 = other) as a factor;
# "white" is the first level, so it is the reference group in lm()
birthwt2 <- birthwt %>%
  mutate(race2 = factor(race, levels = 1:3,
                        labels = c("white", "black", "other")))
head(birthwt2)
# Recode smoke (0 = no, 1 = yes) as a factor with "nonsmoker" as reference
birthwt3 <- birthwt2 %>%
  mutate(smoke2 = factor(smoke, levels = 0:1,
                         labels = c("nonsmoker", "smoker")))
head(birthwt3)
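The same recoding can be done without dplyr by using base R's transform() and factor(). The cross-tabulations then confirm that each original code maps to the intended label. This sketch assumes that any code outside 1-3 (race) or 0-1 (smoke) should become NA. The assertions check that no such codes occur.

```r
library(MASS)
data("birthwt")

# Base-R equivalent of the dplyr recoding: factor() with explicit levels
# and labels; the first label becomes the reference group in lm()
birthwt3 <- transform(birthwt,
  race2  = factor(race,  levels = 1:3, labels = c("white", "black", "other")),
  smoke2 = factor(smoke, levels = 0:1, labels = c("nonsmoker", "smoker")))

# Cross-tabulate original codes against new labels to confirm the mapping
print(table(original = birthwt3$race,  recoded = birthwt3$race2))
print(table(original = birthwt3$smoke, recoded = birthwt3$smoke2))
```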
m1<-lm(bwt ~ race2, data=birthwt3)
summary(m1)
Call:
lm(formula = bwt ~ race2, data = birthwt3)
Residuals:
Min 1Q Median 3Q Max
-2096.28 -502.72 -12.72 526.28 1887.28
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3102.72 72.92 42.548 < 2e-16 ***
race2black -383.03 157.96 -2.425 0.01627 *
race2other -297.44 113.74 -2.615 0.00965 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 714.5 on 186 degrees of freedom
Multiple R-squared: 0.05017, Adjusted R-squared: 0.03996
F-statistic: 4.913 on 2 and 186 DF, p-value: 0.008336
Model 1 estimates the mean birth weight of infants born to white mothers (the reference group) at 3,103 grams. Infants of black mothers weigh on average 383 grams less than those of white mothers (p = 0.016). Infants in the other race group weigh on average 297 grams less (p = 0.010). Both differences are statistically significant at the 0.05 level. Race alone explains only about 5% of the variation in birth weight (R-squared = 0.050).
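With a single categorical predictor, the regression coefficients are simply group means and differences between group means. The minimal sketch below recomputes the Model 1 estimates directly with tapply(). It refits m1 on a locally recoded copy of birthwt.

```r
library(MASS)
data("birthwt")
birthwt$race2 <- factor(birthwt$race, levels = 1:3,
                        labels = c("white", "black", "other"))

m1 <- lm(bwt ~ race2, data = birthwt)
group_means <- tapply(birthwt$bwt, birthwt$race2, mean)

# With one categorical predictor, the intercept is the reference-group mean
# and each dummy coefficient is that group's mean minus the reference mean
print(round(group_means, 2))
print(round(coef(m1), 2))
print(round(group_means[-1] - group_means[["white"]], 2))
```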
m2<-lm(bwt ~ smoke2, data=birthwt3)
summary(m2)
Call:
lm(formula = bwt ~ smoke2, data = birthwt3)
Residuals:
Min 1Q Median 3Q Max
-2062.9 -475.9 34.3 545.1 1934.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3055.70 66.93 45.653 < 2e-16 ***
smoke2smoker -283.78 106.97 -2.653 0.00867 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 717.8 on 187 degrees of freedom
Multiple R-squared: 0.03627, Adjusted R-squared: 0.03112
F-statistic: 7.038 on 1 and 187 DF, p-value: 0.008667
Model 2 estimates the mean birth weight for infants of nonsmokers (the reference group) at 3,056 grams. Infants of mothers who smoked during pregnancy weigh on average 284 grams less, and this difference is statistically significant (p = 0.009). As expected, the average infant birth weight for smokers is lower than for nonsmokers.
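A regression on a single two-level factor is equivalent to a two-sample t-test with pooled variance. The sketch below is an illustrative check and was not part of the original analysis. It fits Model 2 and runs t.test(..., var.equal = TRUE) on the same data. The two t statistics agree in magnitude, and the p-values are identical.

```r
library(MASS)
data("birthwt")
birthwt$smoke2 <- factor(birthwt$smoke, levels = 0:1,
                         labels = c("nonsmoker", "smoker"))

m2 <- lm(bwt ~ smoke2, data = birthwt)
# Pooled-variance two-sample t-test comparing the two group means
tt <- t.test(bwt ~ smoke2, data = birthwt, var.equal = TRUE)

# Same magnitude; the sign differs because t.test() uses nonsmoker - smoker
reg_t <- summary(m2)$coefficients["smoke2smoker", "t value"]
print(c(regression_t = reg_t, t_test_t = unname(tt$statistic)))
print(c(regression_p = summary(m2)$coefficients["smoke2smoker", "Pr(>|t|)"],
        t_test_p = tt$p.value))
```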
m3 <- lm(bwt ~ race2+smoke2, data=birthwt3)
summary(m3)
Call:
lm(formula = bwt ~ race2 + smoke2, data = birthwt3)
Residuals:
Min 1Q Median 3Q Max
-2313.95 -440.22 15.78 492.14 1655.05
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3334.95 91.78 36.338 < 2e-16 ***
race2black -450.36 153.12 -2.941 0.003687 **
race2other -452.88 116.48 -3.888 0.000141 ***
smoke2smoker -428.73 109.04 -3.932 0.000119 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 688.2 on 185 degrees of freedom
Multiple R-squared: 0.1234, Adjusted R-squared: 0.1092
F-statistic: 8.683 on 3 and 185 DF, p-value: 2.027e-05
Model 3 estimates the mean birth weight for infants of white nonsmokers (the reference category for both variables) at 3,335 grams. Holding smoking status constant, infants of black mothers weigh on average 450 grams less than those of white mothers, and infants in the other race group weigh 453 grams less. Holding race constant, infants of smokers weigh on average 429 grams less than those of nonsmokers. All three coefficients are statistically significant (p < 0.01), as is the overall F-test (p < 0.001). Both race differences are larger than in Model 1 once smoking is taken into account. R-squared rises from 0.05 to 0.12, so race and smoking together explain more of the variation in birth weight than either does alone. Both are associated with lower infant birth weight.
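Because Model 3 is additive, each predicted group mean is the intercept plus the relevant coefficients. The smoker-nonsmoker gap is therefore the same (about -429 grams) in every race group. The sketch below tabulates the six predicted means with predict(). It also runs a nested F-test of Model 1 against Model 3 with anova(). The grid object is a hypothetical helper.

```r
library(MASS)
data("birthwt")
birthwt$race2  <- factor(birthwt$race,  levels = 1:3,
                         labels = c("white", "black", "other"))
birthwt$smoke2 <- factor(birthwt$smoke, levels = 0:1,
                         labels = c("nonsmoker", "smoker"))

m3 <- lm(bwt ~ race2 + smoke2, data = birthwt)

# Every race-by-smoking combination, using the same factor levels as the data
grid <- expand.grid(race2  = levels(birthwt$race2),
                    smoke2 = levels(birthwt$smoke2))
grid$predicted_bwt <- predict(m3, newdata = grid)
print(grid)

# Nested F-test: does adding smoking improve on the race-only model?
m1 <- lm(bwt ~ race2, data = birthwt)
print(anova(m1, m3))
```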
m4<-lm(bwt ~ race2+smoke2+as.factor(ht),data=birthwt3)
summary(m4)
Call:
lm(formula = bwt ~ race2 + smoke2 + as.factor(ht), data = birthwt3)
Residuals:
Min 1Q Median 3Q Max
-2331.70 -462.03 -6.03 474.30 1637.30
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3352.70 91.65 36.580 < 2e-16 ***
race2black -425.47 152.68 -2.787 0.005882 **
race2other -448.49 115.72 -3.876 0.000148 ***
smoke2smoker -424.68 108.33 -3.920 0.000125 ***
as.factor(ht)1 -383.06 204.73 -1.871 0.062932 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 683.6 on 184 degrees of freedom
Multiple R-squared: 0.1398, Adjusted R-squared: 0.1211
F-statistic: 7.475 on 4 and 184 DF, p-value: 1.335e-05
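Model 4 adds a third variable, the mother's history of hypertension (ht). I converted ht to a factor because it is stored as a 0/1 integer in the dataset. The resulting dummy variable indicates whether or not the mother had a history of hypertension. For a 0/1 variable, this gives the same estimate as entering it as a number, but it makes the coding explicit. I wanted to see whether race, smoking, and hypertension were each associated with birth weight when considered together. The intercept, 3,353 grams, is the estimated mean birth weight for infants of white nonsmoking mothers without a history of hypertension. Mothers with a history of hypertension had infants weighing on average 383 grams less. However, this estimate is not statistically significant at the 0.05 level (p = 0.063), so the data provide only weak evidence that hypertension is associated with lower birth weight. The race and smoking estimates change little from Model 3.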
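Two standard ways to check the hypertension term are its 95% confidence interval and a nested F-test comparing Model 3 with Model 4. The sketch below assumes the same recoding as above. Because only one term is added, the F statistic equals the square of the t value for as.factor(ht)1. The interval includes zero, which is consistent with p = 0.063.

```r
library(MASS)
data("birthwt")
birthwt$race2  <- factor(birthwt$race,  levels = 1:3,
                         labels = c("white", "black", "other"))
birthwt$smoke2 <- factor(birthwt$smoke, levels = 0:1,
                         labels = c("nonsmoker", "smoker"))

m3 <- lm(bwt ~ race2 + smoke2, data = birthwt)
m4 <- lm(bwt ~ race2 + smoke2 + as.factor(ht), data = birthwt)

# 95% confidence interval for the hypertension coefficient
ci_ht <- confint(m4)["as.factor(ht)1", ]
print(round(ci_ht, 1))

# Nested F-test for adding ht; with one extra term, F equals t squared
cmp <- anova(m3, m4)
print(cmp)
```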
library(texreg)
screenreg(list(m1,m2,m3,m4))
==================================================================
                Model 1      Model 2      Model 3      Model 4
------------------------------------------------------------------
(Intercept)     3102.72 ***  3055.70 ***  3334.95 ***  3352.70 ***
                (72.92)      (66.93)      (91.78)      (91.65)
race2black      -383.03 *                 -450.36 **   -425.47 **
                (157.96)                  (153.12)     (152.68)
race2other      -297.44 **                -452.88 ***  -448.49 ***
                (113.74)                  (116.48)     (115.72)
smoke2smoker                 -283.78 **   -428.73 ***  -424.68 ***
                             (106.97)     (109.04)     (108.33)
as.factor(ht)1                                         -383.06
                                                       (204.73)
------------------------------------------------------------------
R^2             0.05         0.04         0.12         0.14
Adj. R^2        0.04         0.03         0.11         0.12
Num. obs.       189          189          189          189
RMSE            714.50       717.78       688.25       683.64
==================================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
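If texreg is not installed, a similar side-by-side comparison of fit statistics can be built in base R. In this sketch, the sigma column is the residual standard error, which matches the RMSE row in the screenreg table above for these lm fits. AIC is added as an extra comparison that was not in the original table. The names models and fit_table are hypothetical.

```r
library(MASS)
data("birthwt")
birthwt$race2  <- factor(birthwt$race,  levels = 1:3,
                         labels = c("white", "black", "other"))
birthwt$smoke2 <- factor(birthwt$smoke, levels = 0:1,
                         labels = c("nonsmoker", "smoker"))

models <- list(
  m1 = lm(bwt ~ race2, data = birthwt),
  m2 = lm(bwt ~ smoke2, data = birthwt),
  m3 = lm(bwt ~ race2 + smoke2, data = birthwt),
  m4 = lm(bwt ~ race2 + smoke2 + as.factor(ht), data = birthwt))

# One row per model: fit statistics comparable to the screenreg footer
fit_table <- t(sapply(models, function(m) {
  s <- summary(m)
  c(r2 = s$r.squared, adj_r2 = s$adj.r.squared, sigma = s$sigma, AIC = AIC(m))
}))
print(round(fit_table, 3))
```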
m5<- lm(bwt ~ race2*smoke2, data=birthwt3)
summary(m5)
Call:
lm(formula = bwt ~ race2 * smoke2, data = birthwt3)
Residuals:
Min 1Q Median 3Q Max
-2407.75 -416.50 31.25 462.50 1561.25
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3428.7 103.1 33.268 < 2e-16 ***
race2black -574.3 199.6 -2.877 0.00449 **
race2other -613.0 138.3 -4.433 1.60e-05 ***
smoke2smoker -601.9 140.0 -4.298 2.79e-05 ***
race2black:smoke2smoker 251.4 309.1 0.813 0.41712
race2other:smoke2smoker 543.3 259.0 2.098 0.03728 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 683.6 on 183 degrees of freedom
Multiple R-squared: 0.1444, Adjusted R-squared: 0.1211
F-statistic: 6.179 on 5 and 183 DF, p-value: 2.562e-05
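Model 5 adds an interaction between race and smoking. The intercept, 3,429 grams, is the estimated mean birth weight for infants of white nonsmokers. The smoke2smoker coefficient (-601.9 grams) is the smoking difference among white mothers only. The interaction terms show how much the smoking difference changes in the other groups. Among black mothers, the difference is 251 grams smaller in magnitude (-601.9 + 251.4 = -350.5 grams). Among the other race group, it is 543 grams smaller (-601.9 + 543.3 = -58.6 grams). Only the other-race interaction is statistically significant (p = 0.037); the black interaction is not (p = 0.42). The association between smoking and birth weight therefore appears to differ by race, mainly between white mothers and the other race group. In other words, race modifies the smoking association; the model does not show that smoking itself depends on race.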
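The within-group smoking differences described above can be rebuilt from the Model 5 coefficients. Because race2 * smoke2 has one parameter for each of the six race-by-smoking cells, the model is saturated. The same numbers therefore equal the differences in raw cell means. This is a sketch that reuses the document's variable names; smoke_effect and cell_means are hypothetical helpers.

```r
library(MASS)
data("birthwt")
birthwt$race2  <- factor(birthwt$race,  levels = 1:3,
                         labels = c("white", "black", "other"))
birthwt$smoke2 <- factor(birthwt$smoke, levels = 0:1,
                         labels = c("nonsmoker", "smoker"))

m5 <- lm(bwt ~ race2 * smoke2, data = birthwt)
b  <- coef(m5)

# Smoking effect within each race group, rebuilt from the coefficients:
# white = main effect; the other groups add their interaction term
smoke_effect <- c(
  white = b[["smoke2smoker"]],
  black = b[["smoke2smoker"]] + b[["race2black:smoke2smoker"]],
  other = b[["smoke2smoker"]] + b[["race2other:smoke2smoker"]])
print(round(smoke_effect, 1))

# Saturated model: the same values are smoker-minus-nonsmoker cell mean gaps
cell_means <- tapply(birthwt$bwt, list(birthwt$race2, birthwt$smoke2), mean)
print(round(cell_means[, "smoker"] - cell_means[, "nonsmoker"], 1))
```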
m6<- lm(bwt ~ race2+smoke2*as.factor(ht), data=birthwt3)
summary(m6)
Call:
lm(formula = bwt ~ race2 + smoke2 * as.factor(ht), data = birthwt3)
Residuals:
Min 1Q Median 3Q Max
-2333.75 -460.78 19.56 468.86 1635.25
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3354.75 91.97 36.478 < 2e-16 ***
race2black -421.62 153.25 -2.751 0.006536 **
race2other -443.97 116.41 -3.814 0.000187 ***
smoke2smoker -435.34 111.14 -3.917 0.000126 ***
as.factor(ht)1 -461.02 268.86 -1.715 0.088088 .
smoke2smoker:as.factor(ht)1 186.93 416.58 0.449 0.654156
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 685.1 on 183 degrees of freedom
Multiple R-squared: 0.1407, Adjusted R-squared: 0.1173
F-statistic: 5.994 on 5 and 183 DF, p-value: 3.672e-05
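Model 6 adds an interaction between smoking and hypertension while controlling for race. The interaction coefficient (187 grams) is far from statistically significant (p = 0.65), so there is no evidence that the association between smoking and birth weight differs by hypertension history. The standard error is large (417 grams), however, so the estimate is very imprecise, and the data cannot rule out an interaction either.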
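In Model 6, the smoking effect among mothers with a history of hypertension is the sum of the main smoking coefficient and the interaction coefficient. Its standard error requires the covariance between the two estimates. The sketch below computes this linear combination from vcov(). It is an illustrative extension and was not part of the original analysis. The names terms_used, w, est, and se are hypothetical. Given the large standard errors on the hypertension terms, the result should be read as imprecise.

```r
library(MASS)
data("birthwt")
birthwt$race2  <- factor(birthwt$race,  levels = 1:3,
                         labels = c("white", "black", "other"))
birthwt$smoke2 <- factor(birthwt$smoke, levels = 0:1,
                         labels = c("nonsmoker", "smoker"))

m6 <- lm(bwt ~ race2 + smoke2 * as.factor(ht), data = birthwt)
b  <- coef(m6)
V  <- vcov(m6)

# Smoking effect among mothers WITH a hypertension history:
# main smoking effect + interaction; variance uses w' V w with w = (1, 1)
terms_used <- c("smoke2smoker", "smoke2smoker:as.factor(ht)1")
w      <- c(1, 1)
est    <- sum(w * b[terms_used])
se     <- sqrt(drop(t(w) %*% V[terms_used, terms_used] %*% w))
t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df = df.residual(m6))
print(round(c(estimate = est, se = se, t = t_stat, p = p_val), 3))
```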
library(texreg)
screenreg(list(m5,m6))
For this assignment, I used the birthwt data frame from the MASS package, which was collected at Baystate Medical Center, Springfield, Mass. during 1986. I fitted a series of linear models to see which risk factors are associated with low infant birth weight.
I started with a simple model with one independent variable, race, to see its relationship with birth weight. The first model shows the average infant birth weight for each race group in the dataset: white, black, and other. In the second model, I used a different independent variable, smoking. Research has shown that smoking during pregnancy is harmful to the baby's health, and this model compares the average infant birth weight of smokers and nonsmokers. The third model examines the effects of race and smoking together. In the fourth model, I examined whether three variables (race, smoking, and the mother's history of hypertension) were associated with infant birth weight. The fifth model adds an interaction between race and smoking, to see whether the association between smoking and birth weight differs by race. The sixth and last model adds an interaction between smoking and hypertension while controlling for race. The output and interpretation are given with each model.
Hosmer, D. W. and Lemeshow, S. (1989) Applied Logistic Regression. New York: Wiley.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. New York: Springer.