HW6

I will be looking at suicide rates per 100,000 by counties. I will be looking at the percent of people in poverty within a county. I will be using an offset term in my analysis because the populations within different counties are not the same. I will be using the log of the population size for the county as the offset term.

Do counties with higher percentages of people in poverty have higher suicide rates?

options(tigris_class="sf")
library(tigris)

## To enable 
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.

library(sf)

## Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE

usco<-counties(cb = T, year= 2019)
usco$cofips<-usco$GEOID
sts<-states(cb = T, year = 2019)
sts<-st_boundary(sts)%>%
  filter(!STATEFP %in% c("02", "15", "60", "66", "69", "72", "78"))%>%
  st_transform(crs = 2163)
ahrf_m<-left_join(usco, ahrf2,
                    by = "cofips")%>%
  filter(is.na(suiciderate)==F, 
         !STATEFP %in% c("02", "15", "60", "66", "69", "72", "78"))%>%
  st_transform(crs = 2163)
glimpse(ahrf_m)


ahrf_m$povgroup = cut(ahrf_m$perpov, c(0, 10, 20, 30, 40), labels = c("1", "2", "3", "4"))

ahrf_m%>%
  ggplot()+
  geom_histogram(aes(x = suiciderate))+
  labs(title = "Distribution of Suicide Rate US Counties",
       subtitle = "2016 - 2018")+
       xlab("Rate per 100000 people")+
  ylab ("Frequency")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

library(tmap)
tm_shape(ahrf_m)+
  tm_polygons(col = "suiciderate",
              border.col = NULL,
              title="Suicide Rt",
              palette="Blues",
              style="quantile",
              n=5,
              showNA=T, colorNA = "grey50")+
   tm_format(format= "World",
             main.title="US Suicide Rate by County",
            legend.position =  c("left", "bottom"),
            main.title.position =c("center"))+
  tm_scale_bar(position = c(.1,0))+
  tm_compass()+
tm_shape(sts)+
  tm_lines( col = "black")

Characteristic	IRR¹	95% CI¹	p-value
factor(povgroup)
1	—	—
2	1.28	1.24, 1.33	<0.001
3	1.29	1.21, 1.38	<0.001
4	7.32	5.86, 9.01	<0.001
¹ IRR = Incidence Rate Ratio, CI = Confidence Interval

## [1] 5.525691

There is a fair amount of dispersion for the negative binomial. So, the poisson regression model is not a good choice.

Characteristic	IRR¹	95% CI¹	p-value
factor(povgroup)
1	—	—
2	1.74	1.50, 2.02	<0.001
3	1.70	1.30, 2.26	<0.001
4	3.13	1.19, 12.3	0.048
¹ IRR = Incidence Rate Ratio, CI = Confidence Interval

## [1] 1.068082

The dispersion for the negative binomial is lower than for the poisson model.

AIC(glmnb, glmp_s)

##        df      AIC
## glmnb   5 8236.869
## glmp_s  4      Inf

Therefore the negative binomial is a better model fit.

HW6

Drew Schaefer

3/20/2022