What is the relationship between low birth rates and smoking and alcohol consumption in the U.S.? It is important to note that is aggregate data and not individual cases.
library(nlme)
library(dplyr)
library(magrittr)
library(tidyr)
library(haven)
library(lmerTest)
library(ggplot2)
library(texreg)
library(radiant.data)
library(readr)
lbr1 <- read_csv("/Users/Cruz/Desktop/lbr.csv", col_names = TRUE)
Parsed with column specification:
cols(
Geo_FIPS = col_character(),
Geo_NAME = col_character(),
Geo_QNAME = col_character(),
Geo_STATE = col_character(),
Geo_COUNTY = col_character(),
SE_T003_001 = col_double(),
SE_NV002_001 = col_integer(),
SE_T011_001 = col_double(),
SE_T011_002 = col_double()
)
head(lbr1)
lbr1 <- rename(lbr1, state = Geo_STATE, county = Geo_COUNTY, lowbirth = SE_NV002_001,smokers = SE_T011_001,alcohol = SE_T011_002)
Originally the data contained 3141 observations. After running na.omit the leftover cases were 3041 observations.
lbr1<-na.omit(lbr1)
head(lbr1)
ggplot(lbr1, aes(x = lowbirth, y = state)) + geom_jitter(alpha = 0.5)
ggplot(lbr1, aes(x = lowbirth, y = smokers*alcohol)) + geom_jitter(alpha = 0.5)
ggplot(lbr1, aes(x = lowbirth, y = smokers, color = "Respondents")) + geom_jitter(alpha = 0.5)
ggplot(lbr1, aes(x = lowbirth, y = alcohol, color = "Respondents")) + geom_jitter(alpha = 0.5)
We see that Texas and California have the highest low birth rates in the United States.
library(radiant.data)
library(readr)
library(stringr)
X<-data.frame(str_locate(lbr1$Geo_QNAME,"County,"))
X2<-X%>%select(end)
lbr1$loc <- X2$end
lbr1<-lbr1%>%
mutate(statename = substr(Geo_QNAME, loc+1,length(Geo_QNAME)))
lbr1
ggplot(data=lbr1, aes(x=statename, y=lowbirth))+
geom_col(color ="orange", fill = "black")+coord_flip()
In understanding low birth rates in the U.S I am using state as higher level observation.
Model 1 displays that the lowbirth intercept is 2416.17. Smokers have less of an impact on low birth rate than alcohol consumers with a value of (-95.11). In contrast alcohol consumers have more of an impact on low birth rates in the U.S with a value of 6.19.
lbreg <- lm(lowbirth ~ smokers + alcohol, data = lbr1)
screenreg(lbreg)
========================
Model 1
------------------------
(Intercept) 2416.17 ***
(451.70)
smokers -95.11 ***
(13.93)
alcohol 6.19
(15.69)
------------------------
R^2 0.02
Adj. R^2 0.02
Num. obs. 3041
RMSE 2499.42
========================
*** p < 0.001, ** p < 0.01, * p < 0.05
visreg::visreg(lbreg)
htmlreg(list(lbreg))
Model 1 | ||
---|---|---|
(Intercept) | 2416.17*** | |
(451.70) | ||
smokers | -95.11*** | |
(13.93) | ||
alcohol | 6.19 | |
(15.69) | ||
R2 | 0.02 | |
Adj. R2 | 0.02 | |
Num. obs. | 3041 | |
RMSE | 2499.42 | |
p < 0.001, p < 0.01, p < 0.05 |
As we see this model displays a low birth intercept of 1497.37. Smokers have less impact on low birth weight than alcohol consumers with a value of (-41.50) and alcohol consumers with a value of (70.46). When the interacton term is introduced we see that smokers and alcohol cosumers have a slightly negative impact on low birth weights in the U.S.
lbr3 <- gls(lowbirth ~ smokers*alcohol, data = lbr1, method = "ML")
screenreg(lbr3)
===========================
Model 1
---------------------------
(Intercept) 1397.37
(1129.86)
smokers -41.50
(56.25)
alcohol 70.46
(67.19)
smokers:alcohol -3.46
(3.52)
---------------------------
AIC 56220.45
BIC 56250.55
Log Likelihood -28105.22
Num. obs. 3041
===========================
*** p < 0.001, ** p < 0.01, * p < 0.05
plot(lbr3)
htmlreg(list(lbr3))
Model 1 | ||
---|---|---|
(Intercept) | 1397.37 | |
(1129.86) | ||
smokers | -41.50 | |
(56.25) | ||
alcohol | 70.46 | |
(67.19) | ||
smokers:alcohol | -3.46 | |
(3.52) | ||
AIC | 56220.45 | |
BIC | 56250.55 | |
Log Likelihood | -28105.22 | |
Num. obs. | 3041 | |
p < 0.001, p < 0.01, p < 0.05 |
Intercept of low birth rates in the U.S range from 1,000 to 5,000 by states when looking at smokers.
dcoef <- lbr1 %>%
group_by(state) %>%
do(mod = lm(lowbirth ~ smokers, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram(color = "black", fill = "aqua marine1")
Alcohol Coefficient range is from 100 -500.
dcoef <- lbr1 %>%
group_by(state) %>%
do(mod = lm(lowbirth ~ smokers + alcohol, data = .))
coef2 <- dcoef %>%
do(data.frame(alcohol = coef(.$mod)[3]))
ggplot(coef2, aes(x = alcohol)) + geom_histogram(color = "black", fill = "orange")
The relationship between smokers and low birth weight will vary by state. Some will display a more a negative relationship and others will display a negative relationship however, it seems in this distribution it will more likely display a negative intercept.
dcoef <- lbr1 %>%
group_by(state) %>%
do(mod = lm(lowbirth ~ smokers, data = .))
coef <- dcoef %>% do(data.frame(smokers = coef(.$mod)[2]))
ggplot(coef, aes(x = smokers)) + geom_histogram(color = "orange", fill = "purple")
Accounting for lowbirth weight variations between states. We see consistent relationships with the models above.
m1_lme <- lme(lowbirth ~ smokers, data = lbr1, random = ~1|state, method = "ML")
summary(m1_lme)
Linear mixed-effects model fit by maximum likelihood
Data: lbr1
AIC BIC logLik
56118.83 56142.91 -28055.41
Random effects:
Formula: ~1 | state
(Intercept) Residual
StdDev: 690.6763 2425.356
Fixed effects: lowbirth ~ smokers
Value Std.Error DF t-value p-value
(Intercept) 2636.0208 316.6104 2989 8.325756 0
smokers -95.9151 16.2930 2989 -5.886878 0
Correlation:
(Intr)
smokers -0.934
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.60377699 -0.24778134 -0.13633179 -0.00628854 26.93777387
Number of Observations: 3041
Number of Groups: 51
screenreg(m1_lme)
=============================
Model 1
-----------------------------
(Intercept) 2636.02 ***
(316.61)
smokers -95.92 ***
(16.29)
-----------------------------
AIC 56118.83
BIC 56142.91
Log Likelihood -28055.41
Num. obs. 3041
Num. groups 51
=============================
*** p < 0.001, ** p < 0.01, * p < 0.05
m2_lme <- lme(lowbirth ~ alcohol, data = lbr1, random = ~ alcohol|state, method = "ML")
summary(m2_lme)
Linear mixed-effects model fit by maximum likelihood
Data: lbr1
AIC BIC logLik
56128.35 56164.47 -28058.18
Random effects:
Formula: ~alcohol | state
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 4390.1381 (Intr)
alcohol 210.0591 -0.989
Residual 2406.2980
Fixed effects: lowbirth ~ alcohol
Value Std.Error DF t-value p-value
(Intercept) 293.97498 796.7246 2989 0.3689794 0.7122
alcohol 42.85553 40.3693 2989 1.0615858 0.2885
Correlation:
(Intr)
alcohol -0.986
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.12606737 -0.21985387 -0.11572398 -0.02299664 26.75782851
Number of Observations: 3041
Number of Groups: 51
screenreg(m2_lme)
==========================
Model 1
--------------------------
(Intercept) 293.97
(796.72)
alcohol 42.86
(40.37)
--------------------------
AIC 56128.35
BIC 56164.47
Log Likelihood -28058.18
Num. obs. 3041
Num. groups 51
==========================
*** p < 0.001, ** p < 0.01, * p < 0.05
AIC(lbr3, m1_lme, m2_lme)
m3_lme <- lme(lowbirth ~ smokers + alcohol, data = lbr1, random = ~ alcohol|state, method = "ML")
summary(m3_lme)
Linear mixed-effects model fit by maximum likelihood
Data: lbr1
AIC BIC logLik
56113.34 56155.48 -28049.67
Random effects:
Formula: ~alcohol | state
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 3797.2395 (Intr)
alcohol 178.5651 -0.989
Residual 2405.1609
Fixed effects: lowbirth ~ smokers + alcohol
Value Std.Error DF t-value p-value
(Intercept) 2279.2826 857.7396 2988 2.657313 0.0079
smokers -77.7976 18.4066 2988 -4.226610 0.0000
alcohol 6.4955 37.2628 2988 0.174315 0.8616
Correlation:
(Intr) smokrs
smokers -0.546
alcohol -0.926 0.220
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.04670211 -0.23772084 -0.12566771 -0.00367731 26.78138179
Number of Observations: 3041
Number of Groups: 51
screenreg(m3_lme)
=============================
Model 1
-----------------------------
(Intercept) 2279.28 **
(857.74)
smokers -77.80 ***
(18.41)
alcohol 6.50
(37.26)
-----------------------------
AIC 56113.34
BIC 56155.48
Log Likelihood -28049.67
Num. obs. 3041
Num. groups 51
=============================
*** p < 0.001, ** p < 0.01, * p < 0.05
m4_lme <- lme(lowbirth ~ smokers*alcohol, data = lbr1, random = ~ alcohol|state, method = "ML")
summary(m4_lme)
Linear mixed-effects model fit by maximum likelihood
Data: lbr1
AIC BIC logLik
56115.31 56163.47 -28049.66
Random effects:
Formula: ~alcohol | state
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 3818.8961 (Intr)
alcohol 179.9422 -0.989
Residual 2404.9818
Fixed effects: lowbirth ~ smokers * alcohol
Value Std.Error DF t-value p-value
(Intercept) 2045.6769 1693.5496 2987 1.2079226 0.2272
smokers -64.9799 82.2736 2987 -0.7898031 0.4297
alcohol 20.3688 94.4881 2987 0.2155697 0.8293
smokers:alcohol -0.7636 4.8070 2987 -0.1588553 0.8738
Correlation:
(Intr) smokrs alcohl
smokers -0.901
alcohol -0.977 0.914
smokers:alcohol 0.861 -0.975 -0.918
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.047041515 -0.238595209 -0.125259165 -0.003600587 26.783126836
Number of Observations: 3041
Number of Groups: 51
plot(m4_lme)
screenreg(m4_lme)
===========================
Model 1
---------------------------
(Intercept) 2045.68
(1693.55)
smokers -64.98
(82.27)
alcohol 20.37
(94.49)
smokers:alcohol -0.76
(4.81)
---------------------------
AIC 56115.31
BIC 56163.47
Log Likelihood -28049.66
Num. obs. 3041
Num. groups 51
===========================
*** p < 0.001, ** p < 0.01, * p < 0.05
Strongest model that fits this data seems to be m3_lme.
AIC(lbr3, m1_lme, m2_lme, m3_lme, m4_lme)