I will be looking at suicide rates per 100,000 by counties. I will be looking at the percent of people in poverty within a county. I will be using an offset term in my analysis because the populations within different counties are not the same. I will be using the log of the population size for the county as the offset term.
Do counties with higher percentages of people in poverty have higher suicide rates?
options(tigris_class="sf")
library(tigris)
## To enable
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
library(sf)
## Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
usco<-counties(cb = T, year= 2019)
usco$cofips<-usco$GEOID
sts<-states(cb = T, year = 2019)
sts<-st_boundary(sts)%>%
filter(!STATEFP %in% c("02", "15", "60", "66", "69", "72", "78"))%>%
st_transform(crs = 2163)
ahrf_m<-left_join(usco, ahrf2,
by = "cofips")%>%
filter(is.na(suiciderate)==F,
!STATEFP %in% c("02", "15", "60", "66", "69", "72", "78"))%>%
st_transform(crs = 2163)
glimpse(ahrf_m)
ahrf_m$povgroup = cut(ahrf_m$perpov, c(0, 10, 20, 30, 40), labels = c("1", "2", "3", "4"))
ahrf_m%>%
ggplot()+
geom_histogram(aes(x = suiciderate))+
labs(title = "Distribution of Suicide Rate US Counties",
subtitle = "2016 - 2018")+
xlab("Rate per 100000 people")+
ylab ("Frequency")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
library(tmap)
tm_shape(ahrf_m)+
tm_polygons(col = "suiciderate",
border.col = NULL,
title="Suicide Rt",
palette="Blues",
style="quantile",
n=5,
showNA=T, colorNA = "grey50")+
tm_format(format= "World",
main.title="US Suicide Rate by County",
legend.position = c("left", "bottom"),
main.title.position =c("center"))+
tm_scale_bar(position = c(.1,0))+
tm_compass()+
tm_shape(sts)+
tm_lines( col = "black")
| Characteristic | IRR1 | 95% CI1 | p-value |
|---|---|---|---|
| factor(povgroup) | |||
| 1 | — | — | |
| 2 | 1.28 | 1.24, 1.33 | <0.001 |
| 3 | 1.29 | 1.21, 1.38 | <0.001 |
| 4 | 7.32 | 5.86, 9.01 | <0.001 |
|
1
IRR = Incidence Rate Ratio, CI = Confidence Interval
|
|||
## [1] 5.525691
There is a fair amount of dispersion for the negative binomial. So, the poisson regression model is not a good choice.
| Characteristic | IRR1 | 95% CI1 | p-value |
|---|---|---|---|
| factor(povgroup) | |||
| 1 | — | — | |
| 2 | 1.74 | 1.50, 2.02 | <0.001 |
| 3 | 1.70 | 1.30, 2.26 | <0.001 |
| 4 | 3.13 | 1.19, 12.3 | 0.048 |
|
1
IRR = Incidence Rate Ratio, CI = Confidence Interval
|
|||
## [1] 1.068082
The dispersion for the negative binomial is lower than for the poisson model.
AIC(glmnb, glmp_s)
## df AIC
## glmnb 5 8236.869
## glmp_s 4 Inf
Therefore the negative binomial is a better model fit.