What are the facorts associated with preterm birth in Texas counties?
Preterm birth is always higher compared to national preterm birth rate.On top of that preterm birth rate in Texas has been increase and now it 10.8%. 0.8% higher than total United States preterm birth. Literature says, absence of prenatal care has a impact over preterm birth as well as the race/ethnic backgroud. As most of the preterm birth occures among Black birhs compared to Hispanics and whites. However, for this study I have analysed whether there is a association between preterm birth and counties with shortage of hospitals because hospitals give the prenatal care.
I found in possion and in alternative models that counties which have shortage of hospitals are more likely to have higher preterm birth rates.
#Load Data
In the ARF datatset, preterm birth is a number of preterm deliveries which are before 37 weeks of pregnancy.
What are the facorts associated with preterm birth in Texas counties?
All the county population or number of birth are not same. The number of birth of each county is necessary because we need to find the percentage of preterm birth in each county. Hence we need use off-set term.
library(dplyr)
files2 <- files%>%
mutate(cofips=f00004,
coname=f00010,
state = f00011,
births1416=f1254614,
births0608=f1254608,
lowbw1416=f1255314,
lowbw0608=f1255308,
preterm = f1360814,
childpov10= f1332210,
rucc= as.factor(f0002013),
hpsa10= as.factor(f0978710),
obgyn10_pc= 1000*(f1168410/ f0453010) )%>%
dplyr::select(births1416, lowbw1416,births0608, lowbw0608, preterm, state, cofips, coname, childpov10, rucc, hpsa10, obgyn10_pc)%>%
filter(complete.cases(.))%>%
as.data.frame()
head(files2)
## births1416 lowbw1416 births0608 lowbw0608 preterm state cofips coname
## 1 657 62 701 65 92 01 01001 Autauga
## 2 2278 198 2194 189 291 01 01003 Baldwin
## 3 276 33 321 41 50 01 01005 Barbour
## 4 271 28 247 31 41 01 01007 Bibb
## 5 686 55 705 52 81 01 01009 Blount
## 6 136 22 173 24 28 01 01011 Bullock
## childpov10 rucc hpsa10 obgyn10_pc
## 1 17.5 02 1 0.00000000
## 2 20.2 03 1 0.11521685
## 3 36.2 06 2 0.00000000
## 4 28.6 01 1 0.00000000
## 5 24.1 01 1 0.00000000
## 6 40.3 06 1 0.09162544
summary(files2)
## births1416 lowbw1416 births0608 lowbw0608
## Min. : 67 Min. : 11.00 Min. : 84 Min. : 11.0
## 1st Qu.: 268 1st Qu.: 22.00 1st Qu.: 287 1st Qu.: 23.0
## Median : 499 Median : 39.00 Median : 521 Median : 43.0
## Mean : 1742 Mean : 141.01 Mean : 1810 Mean : 147.9
## 3rd Qu.: 1254 3rd Qu.: 99.75 3rd Qu.: 1300 3rd Qu.: 103.0
## Max. :126007 Max. :8969.00 Max. :140251 Max. :10211.0
##
## preterm state cofips coname
## Min. : 11.0 Length:2238 Length:2238 Length:2238
## 1st Qu.: 32.0 Class :character Class :character Class :character
## Median : 58.0 Mode :character Mode :character Mode :character
## Mean : 197.6
## 3rd Qu.: 142.8
## Max. :11494.0
##
## childpov10 rucc hpsa10 obgyn10_pc
## Min. : 2.70 06 :483 : 0 Min. :0.00000
## 1st Qu.:17.70 01 :405 0:433 1st Qu.:0.00000
## Median :24.10 02 :355 1:834 Median :0.05731
## Mean :24.42 03 :301 2:971 Mean :0.06926
## 3rd Qu.:30.10 07 :275 3rd Qu.:0.10382
## Max. :61.10 04 :214 Max. :0.82115
## (Other):205
library(tigris)
## To enable
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
##
## Attaching package: 'tigris'
## The following object is masked from 'package:graphics':
##
## plot
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(ggplot2)
options(tigris_class="sf")
usco<-counties(cb=T, year=2015, state = "TX")
usco$cofips<-usco$GEOID
sts<-states(cb = T, year=2015)
sts<-st_boundary(sts)%>%
filter(STATEFP==48)
files2_m<-geo_join(usco, files2, by_sp="cofips", by_df="cofips",how="left" )
## Warning: Column `cofips` has different attributes on LHS and RHS of join
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
files2_m%>%
filter(STATEFP==48)%>%
mutate(ppb=preterm/births1416)%>%
mutate(pb_group = cut(ppb, breaks=quantile(ppb, p=seq(0,1,length.out = 6), na.rm=T ), include.lowest=T ))%>%
ggplot()+
geom_sf(aes(fill=pb_group, color=NA))+
scale_color_brewer(palette = "Blues")+
scale_fill_brewer(palette = "Blues",na.value = "grey50")+
geom_sf(data=sts["STATEFP"=="48",], color="black")+
coord_sf(crs = 102008)+
ggtitle(label = "Proportion of births that were preterm, 2014-2016")
We can see from the map is, end of the southern part of the Texas is under high proportion of preterm birth compared to the total number of birth of 2014-2016.The Coastal parts are mostly has highest preterm birth. In addition we can also see that rural parts in the north west of the Texas counties like Pecos and Reeves are in between the highest rate of preterm births since 2014-2016. Important part to notice here is, majority of the border counties are slightly lower than the higher range.
files3<-filter(files2_m, is.na(preterm)==F)
fit_pois<- glm(preterm ~ offset(log(births1416)) + hpsa10,
family=poisson,
data=files3)
summary(fit_pois)
##
## Call:
## glm(formula = preterm ~ offset(log(births1416)) + hpsa10, family = poisson,
## data = files3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.2377 -0.5404 0.1685 0.9559 10.4849
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.15211 0.01514 -142.154 < 2e-16 ***
## hpsa101 0.05352 0.01616 3.312 0.000928 ***
## hpsa102 0.05903 0.01756 3.361 0.000775 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 607.13 on 160 degrees of freedom
## Residual deviance: 594.63 on 158 degrees of freedom
## AIC: 1609.3
##
## Number of Fisher Scoring iterations: 4
To get the risk ratio we will exponentiate the coefficients.
exp(coef(fit_pois))
## (Intercept) hpsa101 hpsa102
## 0.1162382 1.0549800 1.0608022
Counties which has shortage of hospitals, have a mean preterm birth is 5.4% which is higher than no shortage counties.Whereas counties which have partial shortage of hospitals have mean preterm birth rate of 6% which is higher than no shortage area.
files3$est_pois<- fitted(fit_pois)
files3$est_rate<- files3$est_pois/ files3$births1416
aggregate(est_rate ~ hpsa10, data=files3, FUN = mean)
## hpsa10 est_rate
## 1 0 0.1162382
## 2 1 0.1226290
## 3 2 0.1233057
Difference between the mean of preterm birth rate of no shortage area and whole shortage area is 5.4%.
Difference between the mean of preterm birth rate of no shortage area and partially shortage area is 6%.
Doing arithmatic to see the same result.
(0.1226290-0.1162382)/0.1162382
## [1] 0.0549802
(0.1233057-0.1162382)/0.1162382
## [1] 0.06080187
scale<-sqrt(fit_pois$deviance/fit_pois$df.residual)
scale
## [1] 1.939966
If there were no difference between mean and variance then it was to be 1. This tell that there is overdispersion in this model. We can also check it by godness of fit. This will also tell us that by larger that this model is fit for the model. However, if it yields lower p-value, it is not appropriate for this model.
1-pchisq(fit_pois$deviance, df = fit_pois$df.residual)
## [1] 0
The value is lower than the significant level. Therefore we can conclude that poisson is not a appropriate model for this fit.
Let us see whether quasi-poisson model yields p-value more than 1 or not. I am using quasi-poisson model because it includes overdisperson parameter.
fit_q_pois<- glm(preterm ~ offset(log(births1416)) + hpsa10,
family=quasipoisson,
data=files3)
summary(fit_q_pois)
##
## Call:
## glm(formula = preterm ~ offset(log(births1416)) + hpsa10, family = quasipoisson,
## data = files3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.2377 -0.5404 0.1685 0.9559 10.4849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.15211 0.02950 -72.960 <2e-16 ***
## hpsa101 0.05352 0.03149 1.700 0.0912 .
## hpsa102 0.05903 0.03421 1.725 0.0864 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 3.796133)
##
## Null deviance: 607.13 on 160 degrees of freedom
## Residual deviance: 594.63 on 158 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
Here we see disperson parameter for quasipoissan is 3.7 and substantive difference is similar. I would use this model.
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
fit_nb<- glm.nb(preterm ~ offset(log(births1416)) + hpsa10,
data=files3)
summary(fit_nb)
##
## Call:
## glm.nb(formula = preterm ~ offset(log(births1416)) + hpsa10,
## data = files3, init.theta = 119.2885051, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1721 -0.4639 0.0305 0.6846 2.4871
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.13528 0.02620 -81.491 <2e-16 ***
## hpsa101 0.06136 0.03155 1.945 0.0518 .
## hpsa102 0.03740 0.03165 1.181 0.2374
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(119.2885) family taken to be 1)
##
## Null deviance: 127.26 on 160 degrees of freedom
## Residual deviance: 123.44 on 158 degrees of freedom
## AIC: 1254.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 119.3
## Std. Err.: 23.5
##
## 2 x log-likelihood: -1246.861
Risk fators are different compared to poisson model.
exp(coef(fit_nb))
## (Intercept) hpsa101 hpsa102
## 0.1182118 1.0632823 1.0381035
6.3% increased risk of preterm birth for counties with shortage of hospitals.
3.8% increased risk of preterm birth for countires with partial shortage of hospitals.
If we look at the AIC value, compared to poisson model, negetive binomial model has much lower AIC value.
pbrate<-(sum(files3$preterm, na.rm=T)/sum(files3$births1416, na.rm=T))
pbrate
## [1] 0.122197
Which estimates that on average, 12% of births coulb be preterm. We can apply this rate to each county’s births.
files3$E <- pbrate* files3$births1416
head(files3[, c("coname", "births1416", "preterm", "E")])
## Simple feature collection with 6 features and 4 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -98.64623 ymin: 27.83954 xmax: -94.04272 ymax: 33.31201
## epsg (SRID): 4269
## proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## coname births1416 preterm E geometry
## 1 Aransas 263 32 32.13782 MULTIPOLYGON (((-96.8229 28...
## 2 Bee 397 48 48.51222 MULTIPOLYGON (((-98.08976 2...
## 3 Cass 360 52 43.99093 MULTIPOLYGON (((-94.65384 3...
## 4 Comal 1510 195 184.51752 MULTIPOLYGON (((-98.64612 2...
## 5 Ellis 2171 237 265.28976 MULTIPOLYGON (((-97.08743 3...
## 6 Galveston 4174 574 510.05042 MULTIPOLYGON (((-94.52566 2...
This shows us the observed number of preterm births, and the expected number.
files3%>%
ggplot( aes(preterm))+geom_histogram()+ggtitle("Distribution of preterm births", "y")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
files3%>%
ggplot( aes(preterm/E))+geom_histogram()+ggtitle("Distribution of standardized preterm births", "y/E")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In order to estimate the model with the expected counts, changed the offset term in the model, otherwise everything is the same:
fit_pois_E<-glm(preterm ~ offset(log(E+.000001)) + hpsa10,
family=poisson,
data=files3)
summary(fit_pois_E)
##
## Call:
## glm(formula = preterm ~ offset(log(E + 1e-06)) + hpsa10, family = poisson,
## data = files3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.2377 -0.5404 0.1685 0.9559 10.4849
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.04999 0.01514 -3.302 0.000959 ***
## hpsa101 0.05352 0.01616 3.312 0.000928 ***
## hpsa102 0.05903 0.01756 3.361 0.000775 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 607.13 on 160 degrees of freedom
## Residual deviance: 594.63 on 158 degrees of freedom
## AIC: 1609.3
##
## Number of Fisher Scoring iterations: 4
In fact, these results are identical to those from the Poisson model with the births as the offset term.
fit_bin<-glm(cbind(preterm, births1416)~ hpsa10,
family=binomial,
data=files3)
summary(fit_bin)
##
## Call:
## glm(formula = cbind(preterm, births1416) ~ hpsa10, family = binomial,
## data = files3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.8548 -0.5108 0.1594 0.9005 9.8501
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.15211 0.01600 -134.549 < 2e-16 ***
## hpsa101 0.05352 0.01708 3.133 0.00173 **
## hpsa102 0.05903 0.01857 3.179 0.00148 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 540.39 on 160 degrees of freedom
## Residual deviance: 529.23 on 158 degrees of freedom
## AIC: 1525.1
##
## Number of Fisher Scoring iterations: 4
In this model, we see the exact same results compared to the Poisson model. The only difference is the AIC for the binomial model is substantially lower, suggesting that that distribution is preferred for this analysis.