::opts_chunk$set(comment = NA)
knitr
library(broom)
library(conflicted)
library(gt)
library(janitor)
library(mosaic)
library(sessioninfo)
library(topmodels) ## for drawing rootograms
library(mgcv) ## for fitting generalized additive models (gams)
library(gratia) ## for drawing rootograms
library(tidyverse)
theme_set(theme_light()) # set theme for ggplots
conflicts_prefer(dplyr::count, base::sum)
Rootograms for Poisson Regression Models
Using the topmodels
and gratia
packages
1 R Setup
2 Finding This
You can find this document posted at https://rpubs.com/TELOVE/poisson-rootograms-432.
The data are available through our 432-data page at https://github.com/THOMASELOVE/432-data.
3 The Data
We have a data set, called ohc_2022.Rds
that describes some information about each of the 88 counties of Ohio using the 2022 County Health Rankings report. For each county, we created (see below) a variable called goodcount
which is a count of the number (out of 4) of standards met by the county. The standards are:
- the county has a
sroh_fairpoor
value below the Ohio-wide mean of 18.1 - the county has an
obese_pct
value below the Ohio-wide average of 34.6 - the county has an
exer_access
value above the Ohio-wide average of 77.2 - the county has NOT had a water violation in the past year (as shown by
h2oviol
= No)
3.1 Creating the ohc_2022
data
The data we’re using here start with the same oh_counties_2022.csv
file that we used in Lab 6.
<-
ohc_2022 read_csv("https://raw.githubusercontent.com/THOMASELOVE/432-data/master/data/oh_counties_2022.csv",
show_col_types = FALSE) |>
clean_names() |>
mutate(fips = as.character(fips))
Here we create the new outcome variable, goodcount
for the ohc_2022
data.
<- ohc_2022 |>
ohc_2022 mutate(goodcount = 0 + (sroh_fairpoor < 18.1) +
< 34.6) + (exer_access > 77.2) +
(obese_pct == "No"))
(h2oviol |> count(goodcount) ohc_2022
# A tibble: 5 × 2
goodcount n
<dbl> <int>
1 0 16
2 1 46
3 2 17
4 3 5
5 4 4
Among the 88 counties we find 16 counties which meet 0 of these standards, 46 which meet 1, 17 which meet 2, 5 which meet 3 and 4 which meet all 4.
3.2 Selecting Predictors
We will use the following four predictors to model our goodcount
variable.
age_adj_mortality
, the age-adjusted mortality rate for the countysmoker_pct
, the percentage of adults who smoke tobacco in the countyhsgrads
, the percentage of adult county residents who graduated high school, andinactive_pct
, the percentage of adult county residents who are largely inactive.
Below, we’ll create new variables which center each of these predictors on their mean. The scale()
function can both center the observations (by subtracting the mean) and scale them to have standard deviation 1 (by dividing by the standard deviation) if we like, but here, we’ll skip the division step using the parameter scale = FALSE
within the scale()
function.
So, scale(variablename, scale = FALSE)
does the same thing as variablename - mean(variablename)
<- ohc_2022 |>
ohc_2022 mutate(mortality_c = scale(age_adj_mortality, scale = FALSE),
smoker_c = scale(smoker_pct, scale = FALSE),
hsgrad_c = scale(hsgrads, scale = FALSE),
inactive_c = scale(inactive_pct, scale = FALSE))
## sanity check to make sure this worked properly
favstats(~ age_adj_mortality, data = ohc_2022) |>
gt() |> fmt_number(mean:sd, decimals = 2)
min | Q1 | median | Q3 | max | mean | sd | n | missing |
---|---|---|---|---|---|---|---|---|
215.2 | 373.825 | 419.05 | 476.525 | 706.3 | 429.02 | 88.97 | 88 | 0 |
favstats(~ mortality_c, data = ohc_2022) |>
gt() |> fmt_number(mean:sd, decimals = 2)
min | Q1 | median | Q3 | max | mean | sd | n | missing |
---|---|---|---|---|---|---|---|---|
-213.825 | -55.2 | -9.975 | 47.5 | 277.275 | 0.00 | 88.97 | 88 | 0 |
<- ohc_2022 |>
ohc_2022 select(fips, county, goodcount, mortality_c, smoker_c, hsgrad_c, inactive_c)
write_rds(ohc_2022, "ohc_2022.Rds")
I’ve placed this ohc_2022.Rds
file on our 432 Data page.
4 Rootogram for a Poisson regression model
4.1 Fitting the Poisson model for goodcount
with glm()
<-
m1_Poisson glm(goodcount ~ mortality_c + smoker_c + hsgrad_c + inactive_c,
family = poisson(), data = ohc_2022)
4.2 Summarizing the Poisson model fit with glm()
summary(m1_Poisson)
Call:
glm(formula = goodcount ~ mortality_c + smoker_c + hsgrad_c +
inactive_c, family = poisson(), data = ohc_2022)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.130237 0.103653 1.256 0.2089
mortality_c 0.002688 0.001881 1.429 0.1529
smoker_c -0.012870 0.093976 -0.137 0.8911
hsgrad_c -0.033788 0.028522 -1.185 0.2362
inactive_c -0.209768 0.084996 -2.468 0.0136 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 72.907 on 87 degrees of freedom
Residual deviance: 48.884 on 83 degrees of freedom
AIC: 223.34
Number of Fisher Scoring iterations: 5
glance(m1_Poisson) |> gt()
null.deviance | df.null | logLik | AIC | BIC | deviance | df.residual | nobs |
---|---|---|---|---|---|---|---|
72.90686 | 87 | -106.6698 | 223.3396 | 235.7262 | 48.88433 | 83 | 88 |
4.3 Rootogram for m1_Poisson
using the topmodels
package
::rootogram(m1_Poisson) topmodels
5 Alternative Rootogram for same Poisson model
5.1 Fitting the same Poisson model with gam()
from the mgcv
package
<-
m2_Poisson gam(goodcount ~ mortality_c + smoker_c + hsgrad_c + inactive_c,
family = poisson(), data = ohc_2022)
5.2 Summarizing the Poisson model fit with gam()
summary(m2_Poisson)
Family: poisson
Link function: log
Formula:
goodcount ~ mortality_c + smoker_c + hsgrad_c + inactive_c
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.130237 0.103653 1.256 0.2089
mortality_c 0.002688 0.001881 1.429 0.1529
smoker_c -0.012870 0.093976 -0.137 0.8911
hsgrad_c -0.033788 0.028522 -1.185 0.2362
inactive_c -0.209768 0.084996 -2.468 0.0136 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.452 Deviance explained = 32.9%
UBRE = -0.33086 Scale est. = 1 n = 88
We can see that the same model is fit here with gam()
as we built with glm()
.
The tidy()
function applied to m2_Poisson
produces nothing of value. However, the glance()
function works well…
glance(m2_Poisson) |> gt()
df | logLik | AIC | BIC | deviance | df.residual | nobs |
---|---|---|---|---|---|---|
5 | -106.6698 | 223.3396 | 235.7262 | 48.88433 | 83 | 88 |
5.3 Rootogram for m2_Poisson
using the gratia
package
Here’s what’s contained in the rootogram for m2_Poisson
, which we obtain here with the rootogram()
function from the gratia
package:
<- gratia::rootogram(m2_Poisson)
rg2 rg2
A `rootogram` object without mandatory attributes `scale` and `style`
# A tibble: 21 × 3
.bin .observed .fitted
<dbl> <int> <dbl>
1 0 16 28.8
2 1 46 29.1
3 2 17 17.1
4 3 5 7.79
5 4 4 3.15
6 5 0 1.22
7 6 0 0.488
8 7 0 0.205
9 8 0 0.0878
10 9 0 0.0370
# ℹ 11 more rows
To plot the information from this rootogram, we use the draw()
function from the gratia
package:
draw(rg2) + labs(title = "Rootogram using gratia")
6 References
- Get started with gratia at https://gavinsimpson.github.io/gratia/articles/gratia.html
- Rootograms in gratia at https://gavinsimpson.github.io/gratia/reference/rootogram.html
- gams in the mgcv package at https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/gam.html
- Kleiber, C., Zeileis, A., (2016) Visualizing Count Data Regressions Using Rootograms. The American Statistician, 70, 296–303. doi:10.1080/00031305.2016.1173590
7 Session Information
Note that the stars below indicate the packages that I’ve attached in this work.
session_info() ## from sessioninfo package
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.3 (2024-02-29 ucrt)
os Windows 11 x64 (build 22631)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_United States.utf8
ctype English_United States.utf8
tz America/New_York
date 2024-04-01
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.1)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1)
broom * 1.0.5 2023-06-09 [1] CRAN (R 4.3.3)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1)
conflicted * 1.2.0 2023-02-01 [1] CRAN (R 4.3.1)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1)
curl 5.2.1 2024-03-01 [1] CRAN (R 4.3.3)
digest 0.6.35 2024-03-11 [1] CRAN (R 4.3.3)
distributions3 0.2.1 2022-09-07 [1] CRAN (R 4.3.3)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.1)
evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.1)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1)
ggformula * 0.12.0 2023-11-09 [1] CRAN (R 4.3.2)
ggokabeito 0.1.0 2021-10-18 [1] CRAN (R 4.3.3)
ggplot2 * 3.5.0 2024-02-23 [1] CRAN (R 4.3.3)
ggridges 0.5.6 2024-01-23 [1] CRAN (R 4.3.3)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.3)
gratia * 0.9.0 2024-03-27 [1] CRAN (R 4.3.3)
gt * 0.10.1 2024-01-17 [1] CRAN (R 4.3.3)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1)
haven 2.5.4 2023-11-30 [1] CRAN (R 4.3.2)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1)
htmltools 0.5.8 2024-03-25 [1] CRAN (R 4.3.3)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.2)
janitor * 2.2.0 2023-02-02 [1] CRAN (R 4.3.3)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.2)
knitr 1.45 2023-10-30 [1] CRAN (R 4.3.3)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1)
labelled 2.12.0 2023-06-21 [1] CRAN (R 4.3.1)
lattice * 0.22-6 2024-03-20 [1] CRAN (R 4.3.3)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.3)
MASS 7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.3)
Matrix * 1.6-5 2024-01-11 [1] CRAN (R 4.3.2)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1)
mgcv * 1.9-1 2023-12-21 [1] CRAN (R 4.3.3)
mosaic * 1.9.1 2024-02-23 [1] CRAN (R 4.3.3)
mosaicCore 0.9.4.0 2023-11-05 [1] CRAN (R 4.3.2)
mosaicData * 0.20.4 2023-11-05 [1] CRAN (R 4.3.2)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.3.2)
mvnfast 0.2.8 2023-02-23 [1] CRAN (R 4.3.3)
nlme * 3.1-164 2023-11-27 [2] CRAN (R 4.3.3)
patchwork 1.2.0 2024-01-08 [1] CRAN (R 4.3.3)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.2)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.2)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2)
rmarkdown 2.26 2024-03-05 [1] CRAN (R 4.3.3)
rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.3.3)
sass 0.4.9 2024-03-15 [1] CRAN (R 4.3.3)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.2)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
snakecase 0.11.1 2023-08-27 [1] CRAN (R 4.3.1)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.2)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.1)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.3)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.3)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.3)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.3)
topmodels * 0.3-0 2024-03-15 [1] R-Forge (R 4.3.3)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.2)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.2)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.3)
xfun 0.43 2024-03-25 [1] CRAN (R 4.3.3)
xml2 1.3.6 2023-12-04 [1] CRAN (R 4.3.2)
yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2)
[1] C:/Users/thoma/AppData/Local/R/win-library/4.3
[2] C:/Program Files/R/R-4.3.3/library
──────────────────────────────────────────────────────────────────────────────