I will be looking at suicide rates per 100,000 by counties. I will be looking at the percent of people in poverty within a county. I will be using an offset term in my analysis because the populations within different counties are not the same. I will be using the log of the population size for the county as the offset term.

Do counties with higher percentages of people in poverty have higher suicide rates?

options(tigris_class="sf")
library(tigris)
## To enable 
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
library(sf)
## Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
usco<-counties(cb = T, year= 2019)
usco$cofips<-usco$GEOID
sts<-states(cb = T, year = 2019)
sts<-st_boundary(sts)%>%
  filter(!STATEFP %in% c("02", "15", "60", "66", "69", "72", "78"))%>%
  st_transform(crs = 2163)
ahrf_m<-left_join(usco, ahrf2,
                    by = "cofips")%>%
  filter(is.na(suiciderate)==F, 
         !STATEFP %in% c("02", "15", "60", "66", "69", "72", "78"))%>%
  st_transform(crs = 2163)
glimpse(ahrf_m)


ahrf_m$povgroup = cut(ahrf_m$perpov, c(0, 10, 20, 30, 40), labels = c("1", "2", "3", "4"))
ahrf_m%>%
  ggplot()+
  geom_histogram(aes(x = suiciderate))+
  labs(title = "Distribution of Suicide Rate US Counties",
       subtitle = "2016 - 2018")+
       xlab("Rate per 100000 people")+
  ylab ("Frequency")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

library(tmap)
tm_shape(ahrf_m)+
  tm_polygons(col = "suiciderate",
              border.col = NULL,
              title="Suicide Rt",
              palette="Blues",
              style="quantile",
              n=5,
              showNA=T, colorNA = "grey50")+
   tm_format(format= "World",
             main.title="US Suicide Rate by County",
            legend.position =  c("left", "bottom"),
            main.title.position =c("center"))+
  tm_scale_bar(position = c(.1,0))+
  tm_compass()+
tm_shape(sts)+
  tm_lines( col = "black")

Characteristic IRR1 95% CI1 p-value
factor(povgroup)
1
2 1.28 1.24, 1.33 <0.001
3 1.29 1.21, 1.38 <0.001
4 7.32 5.86, 9.01 <0.001

1 IRR = Incidence Rate Ratio, CI = Confidence Interval

## [1] 5.525691

There is a fair amount of dispersion for the negative binomial. So, the poisson regression model is not a good choice.

Characteristic IRR1 95% CI1 p-value
factor(povgroup)
1
2 1.74 1.50, 2.02 <0.001
3 1.70 1.30, 2.26 <0.001
4 3.13 1.19, 12.3 0.048

1 IRR = Incidence Rate Ratio, CI = Confidence Interval

## [1] 1.068082

The dispersion for the negative binomial is lower than for the poisson model.

AIC(glmnb, glmp_s)
##        df      AIC
## glmnb   5 8236.869
## glmp_s  4      Inf

Therefore the negative binomial is a better model fit.