Rootograms for Poisson Regression Models

Using the topmodels and gratia packages

Author

Thomas E. Love

Published

2024-04-01

1 R Setup

knitr::opts_chunk$set(comment = NA)

library(broom)
library(conflicted)
library(gt)
library(janitor)
library(mosaic)
library(sessioninfo)

library(topmodels) ## for drawing rootograms

library(mgcv) ## for fitting generalized additive models (gams)
library(gratia) ## for drawing rootograms

library(tidyverse)

theme_set(theme_light()) # set theme for ggplots

conflicts_prefer(dplyr::count, base::sum)

2 Finding This

You can find this document posted at https://rpubs.com/TELOVE/poisson-rootograms-432.

The data are available through our 432-data page at https://github.com/THOMASELOVE/432-data.

3 The Data

We have a data set, called ohc_2022.Rds that describes some information about each of the 88 counties of Ohio using the 2022 County Health Rankings report. For each county, we created (see below) a variable called goodcount which is a count of the number (out of 4) of standards met by the county. The standards are:

the county has a sroh_fairpoor value below the Ohio-wide mean of 18.1
the county has an obese_pct value below the Ohio-wide average of 34.6
the county has an exer_access value above the Ohio-wide average of 77.2
the county has NOT had a water violation in the past year (as shown by h2oviol = No)

3.1 Creating the `ohc_2022` data

The data we’re using here start with the same oh_counties_2022.csv file that we used in Lab 6.

ohc_2022 <- 
  read_csv("https://raw.githubusercontent.com/THOMASELOVE/432-data/master/data/oh_counties_2022.csv", 
                 show_col_types = FALSE) |>
  clean_names() |>
  mutate(fips = as.character(fips))

Here we create the new outcome variable, goodcount for the ohc_2022 data.

ohc_2022 <- ohc_2022 |>
    mutate(goodcount = 0 + (sroh_fairpoor < 18.1) + 
               (obese_pct < 34.6) + (exer_access > 77.2) + 
               (h2oviol == "No"))
ohc_2022 |> count(goodcount)

# A tibble: 5 × 2
  goodcount     n
      <dbl> <int>
1         0    16
2         1    46
3         2    17
4         3     5
5         4     4

Among the 88 counties we find 16 counties which meet 0 of these standards, 46 which meet 1, 17 which meet 2, 5 which meet 3 and 4 which meet all 4.

3.2 Selecting Predictors

We will use the following four predictors to model our goodcount variable.

age_adj_mortality, the age-adjusted mortality rate for the county
smoker_pct, the percentage of adults who smoke tobacco in the county
hsgrads, the percentage of adult county residents who graduated high school, and
inactive_pct, the percentage of adult county residents who are largely inactive.

Below, we’ll create new variables which center each of these predictors on their mean. The scale() function can both center the observations (by subtracting the mean) and scale them to have standard deviation 1 (by dividing by the standard deviation) if we like, but here, we’ll skip the division step using the parameter scale = FALSE within the scale() function.

So, scale(variablename, scale = FALSE) does the same thing as variablename - mean(variablename)

ohc_2022 <- ohc_2022 |>
  mutate(mortality_c = scale(age_adj_mortality, scale = FALSE),
         smoker_c = scale(smoker_pct, scale = FALSE),
         hsgrad_c = scale(hsgrads, scale = FALSE),
         inactive_c = scale(inactive_pct, scale = FALSE))

## sanity check to make sure this worked properly

favstats(~ age_adj_mortality, data = ohc_2022) |> 
  gt() |> fmt_number(mean:sd, decimals = 2)

min	Q1	median	Q3	max	mean	sd	n	missing
215.2	373.825	419.05	476.525	706.3	429.02	88.97	88	0

favstats(~ mortality_c, data = ohc_2022) |> 
  gt() |> fmt_number(mean:sd, decimals = 2)

min	Q1	median	Q3	max	mean	sd	n	missing
-213.825	-55.2	-9.975	47.5	277.275	0.00	88.97	88	0

ohc_2022 <- ohc_2022 |> 
  select(fips, county, goodcount, mortality_c, smoker_c, hsgrad_c, inactive_c)

write_rds(ohc_2022, "ohc_2022.Rds")

I’ve placed this ohc_2022.Rds file on our 432 Data page.

4 Rootogram for a Poisson regression model

4.1 Fitting the Poisson model for `goodcount` with `glm()`

m1_Poisson <- 
  glm(goodcount ~ mortality_c + smoker_c + hsgrad_c + inactive_c,
      family = poisson(), data = ohc_2022)

4.2 Summarizing the Poisson model fit with `glm()`

summary(m1_Poisson)


Call:
glm(formula = goodcount ~ mortality_c + smoker_c + hsgrad_c + 
    inactive_c, family = poisson(), data = ohc_2022)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.130237   0.103653   1.256   0.2089  
mortality_c  0.002688   0.001881   1.429   0.1529  
smoker_c    -0.012870   0.093976  -0.137   0.8911  
hsgrad_c    -0.033788   0.028522  -1.185   0.2362  
inactive_c  -0.209768   0.084996  -2.468   0.0136 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 72.907  on 87  degrees of freedom
Residual deviance: 48.884  on 83  degrees of freedom
AIC: 223.34

Number of Fisher Scoring iterations: 5

glance(m1_Poisson) |> gt()

null.deviance	df.null	logLik	AIC	BIC	deviance	df.residual	nobs
72.90686	87	-106.6698	223.3396	235.7262	48.88433	83	88

4.3 Rootogram for `m1_Poisson` using the `topmodels` package

topmodels::rootogram(m1_Poisson)

5 Alternative Rootogram for same Poisson model

5.1 Fitting the same Poisson model with `gam()` from the `mgcv` package

m2_Poisson <- 
  gam(goodcount ~ mortality_c + smoker_c + hsgrad_c + inactive_c,
      family = poisson(), data = ohc_2022)

5.2 Summarizing the Poisson model fit with `gam()`

summary(m2_Poisson)


Family: poisson 
Link function: log 

Formula:
goodcount ~ mortality_c + smoker_c + hsgrad_c + inactive_c

Parametric coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.130237   0.103653   1.256   0.2089  
mortality_c  0.002688   0.001881   1.429   0.1529  
smoker_c    -0.012870   0.093976  -0.137   0.8911  
hsgrad_c    -0.033788   0.028522  -1.185   0.2362  
inactive_c  -0.209768   0.084996  -2.468   0.0136 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


R-sq.(adj) =  0.452   Deviance explained = 32.9%
UBRE = -0.33086  Scale est. = 1         n = 88

We can see that the same model is fit here with gam() as we built with glm().

The tidy() function applied to m2_Poisson produces nothing of value. However, the glance() function works well…

glance(m2_Poisson) |> gt()

df	logLik	AIC	BIC	deviance	df.residual	nobs
5	-106.6698	223.3396	235.7262	48.88433	83	88

5.3 Rootogram for `m2_Poisson` using the `gratia` package

Here’s what’s contained in the rootogram for m2_Poisson, which we obtain here with the rootogram() function from the gratia package:

rg2 <- gratia::rootogram(m2_Poisson)
rg2

A `rootogram` object without mandatory attributes `scale` and `style`

# A tibble: 21 × 3
    .bin .observed .fitted
   <dbl>     <int>   <dbl>
 1     0        16 28.8   
 2     1        46 29.1   
 3     2        17 17.1   
 4     3         5  7.79  
 5     4         4  3.15  
 6     5         0  1.22  
 7     6         0  0.488 
 8     7         0  0.205 
 9     8         0  0.0878
10     9         0  0.0370
# ℹ 11 more rows

To plot the information from this rootogram, we use the draw() function from the gratia package:

draw(rg2) + labs(title = "Rootogram using gratia")

6 References

Get started with gratia at https://gavinsimpson.github.io/gratia/articles/gratia.html
Rootograms in gratia at https://gavinsimpson.github.io/gratia/reference/rootogram.html
gams in the mgcv package at https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/gam.html
Kleiber, C., Zeileis, A., (2016) Visualizing Count Data Regressions Using Rootograms. The American Statistician, 70, 296–303. doi:10.1080/00031305.2016.1173590

7 Session Information

Note that the stars below indicate the packages that I’ve attached in this work.

session_info()  ## from sessioninfo package

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.3 (2024-02-29 ucrt)
 os       Windows 11 x64 (build 22631)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/New_York
 date     2024-04-01
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package        * version    date (UTC) lib source
 backports        1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
 bit              4.0.5      2022-11-15 [1] CRAN (R 4.3.1)
 bit64            4.0.5      2020-08-30 [1] CRAN (R 4.3.1)
 broom          * 1.0.5      2023-06-09 [1] CRAN (R 4.3.3)
 cachem           1.0.8      2023-05-01 [1] CRAN (R 4.3.1)
 cli              3.6.2      2023-12-11 [1] CRAN (R 4.3.2)
 colorspace       2.1-0      2023-01-23 [1] CRAN (R 4.3.1)
 conflicted     * 1.2.0      2023-02-01 [1] CRAN (R 4.3.1)
 crayon           1.5.2      2022-09-29 [1] CRAN (R 4.3.1)
 curl             5.2.1      2024-03-01 [1] CRAN (R 4.3.3)
 digest           0.6.35     2024-03-11 [1] CRAN (R 4.3.3)
 distributions3   0.2.1      2022-09-07 [1] CRAN (R 4.3.3)
 dplyr          * 1.1.4      2023-11-17 [1] CRAN (R 4.3.2)
 ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.3.1)
 evaluate         0.23       2023-11-01 [1] CRAN (R 4.3.2)
 fansi            1.0.6      2023-12-08 [1] CRAN (R 4.3.2)
 farver           2.1.1      2022-07-06 [1] CRAN (R 4.3.1)
 fastmap          1.1.1      2023-02-24 [1] CRAN (R 4.3.1)
 forcats        * 1.0.0      2023-01-29 [1] CRAN (R 4.3.1)
 generics         0.1.3      2022-07-05 [1] CRAN (R 4.3.1)
 ggformula      * 0.12.0     2023-11-09 [1] CRAN (R 4.3.2)
 ggokabeito       0.1.0      2021-10-18 [1] CRAN (R 4.3.3)
 ggplot2        * 3.5.0      2024-02-23 [1] CRAN (R 4.3.3)
 ggridges         0.5.6      2024-01-23 [1] CRAN (R 4.3.3)
 glue             1.7.0      2024-01-09 [1] CRAN (R 4.3.3)
 gratia         * 0.9.0      2024-03-27 [1] CRAN (R 4.3.3)
 gt             * 0.10.1     2024-01-17 [1] CRAN (R 4.3.3)
 gtable           0.3.4      2023-08-21 [1] CRAN (R 4.3.1)
 haven            2.5.4      2023-11-30 [1] CRAN (R 4.3.2)
 hms              1.1.3      2023-03-21 [1] CRAN (R 4.3.1)
 htmltools        0.5.8      2024-03-25 [1] CRAN (R 4.3.3)
 htmlwidgets      1.6.4      2023-12-06 [1] CRAN (R 4.3.2)
 janitor        * 2.2.0      2023-02-02 [1] CRAN (R 4.3.3)
 jsonlite         1.8.8      2023-12-04 [1] CRAN (R 4.3.2)
 knitr            1.45       2023-10-30 [1] CRAN (R 4.3.3)
 labeling         0.4.3      2023-08-29 [1] CRAN (R 4.3.1)
 labelled         2.12.0     2023-06-21 [1] CRAN (R 4.3.1)
 lattice        * 0.22-6     2024-03-20 [1] CRAN (R 4.3.3)
 lifecycle        1.0.4      2023-11-07 [1] CRAN (R 4.3.2)
 lubridate      * 1.9.3      2023-09-27 [1] CRAN (R 4.3.1)
 magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.3.3)
 MASS             7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.3)
 Matrix         * 1.6-5      2024-01-11 [1] CRAN (R 4.3.2)
 memoise          2.0.1      2021-11-26 [1] CRAN (R 4.3.1)
 mgcv           * 1.9-1      2023-12-21 [1] CRAN (R 4.3.3)
 mosaic         * 1.9.1      2024-02-23 [1] CRAN (R 4.3.3)
 mosaicCore       0.9.4.0    2023-11-05 [1] CRAN (R 4.3.2)
 mosaicData     * 0.20.4     2023-11-05 [1] CRAN (R 4.3.2)
 munsell          0.5.1      2024-04-01 [1] CRAN (R 4.3.2)
 mvnfast          0.2.8      2023-02-23 [1] CRAN (R 4.3.3)
 nlme           * 3.1-164    2023-11-27 [2] CRAN (R 4.3.3)
 patchwork        1.2.0      2024-01-08 [1] CRAN (R 4.3.3)
 pillar           1.9.0      2023-03-22 [1] CRAN (R 4.3.1)
 pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.3.1)
 purrr          * 1.0.2      2023-08-10 [1] CRAN (R 4.3.1)
 R6               2.5.1      2021-08-19 [1] CRAN (R 4.3.1)
 Rcpp             1.0.12     2024-01-09 [1] CRAN (R 4.3.2)
 readr          * 2.1.5      2024-01-10 [1] CRAN (R 4.3.2)
 rlang            1.1.3      2024-01-10 [1] CRAN (R 4.3.2)
 rmarkdown        2.26       2024-03-05 [1] CRAN (R 4.3.3)
 rstudioapi       0.16.0     2024-03-24 [1] CRAN (R 4.3.3)
 sass             0.4.9      2024-03-15 [1] CRAN (R 4.3.3)
 scales           1.3.0      2023-11-28 [1] CRAN (R 4.3.2)
 sessioninfo    * 1.2.2      2021-12-06 [1] CRAN (R 4.3.2)
 snakecase        0.11.1     2023-08-27 [1] CRAN (R 4.3.1)
 stringi          1.8.3      2023-12-11 [1] CRAN (R 4.3.2)
 stringr        * 1.5.1      2023-11-14 [1] CRAN (R 4.3.2)
 tibble         * 3.2.1      2023-03-20 [1] CRAN (R 4.3.1)
 tidyr          * 1.3.1      2024-01-24 [1] CRAN (R 4.3.3)
 tidyselect       1.2.1      2024-03-11 [1] CRAN (R 4.3.3)
 tidyverse      * 2.0.0      2023-02-22 [1] CRAN (R 4.3.3)
 timechange       0.3.0      2024-01-18 [1] CRAN (R 4.3.3)
 topmodels      * 0.3-0      2024-03-15 [1] R-Forge (R 4.3.3)
 tzdb             0.4.0      2023-05-12 [1] CRAN (R 4.3.1)
 utf8             1.2.4      2023-10-22 [1] CRAN (R 4.3.2)
 vctrs            0.6.5      2023-12-01 [1] CRAN (R 4.3.2)
 vroom            1.6.5      2023-12-05 [1] CRAN (R 4.3.2)
 withr            3.0.0      2024-01-16 [1] CRAN (R 4.3.3)
 xfun             0.43       2024-03-25 [1] CRAN (R 4.3.3)
 xml2             1.3.6      2023-12-04 [1] CRAN (R 4.3.2)
 yaml             2.3.8      2023-12-11 [1] CRAN (R 4.3.2)

 [1] C:/Users/thoma/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.3/library

──────────────────────────────────────────────────────────────────────────────

1 R Setup

2 Finding This

3 The Data

3.1 Creating the ohc_2022 data

3.2 Selecting Predictors

4 Rootogram for a Poisson regression model

4.1 Fitting the Poisson model for goodcount with glm()

4.2 Summarizing the Poisson model fit with glm()

4.3 Rootogram for m1_Poisson using the topmodels package