Rootograms for Poisson Regression Models

Using the topmodels and gratia packages

Author

Thomas E. Love

Published

2024-04-01

1 R Setup

knitr::opts_chunk$set(comment = NA)

library(broom)
library(conflicted)
library(gt)
library(janitor)
library(mosaic)
library(sessioninfo)

library(topmodels) ## for drawing rootograms

library(mgcv) ## for fitting generalized additive models (gams)
library(gratia) ## for drawing rootograms

library(tidyverse)

theme_set(theme_light()) # set theme for ggplots

conflicts_prefer(dplyr::count, base::sum)

2 Finding This

You can find this document posted at https://rpubs.com/TELOVE/poisson-rootograms-432.

The data are available through our 432-data page at https://github.com/THOMASELOVE/432-data.

3 The Data

We have a data set, called ohc_2022.Rds that describes some information about each of the 88 counties of Ohio using the 2022 County Health Rankings report. For each county, we created (see below) a variable called goodcount which is a count of the number (out of 4) of standards met by the county. The standards are:

  • the county has a sroh_fairpoor value below the Ohio-wide mean of 18.1
  • the county has an obese_pct value below the Ohio-wide average of 34.6
  • the county has an exer_access value above the Ohio-wide average of 77.2
  • the county has NOT had a water violation in the past year (as shown by h2oviol = No)

3.1 Creating the ohc_2022 data

The data we’re using here start with the same oh_counties_2022.csv file that we used in Lab 6.

ohc_2022 <- 
  read_csv("https://raw.githubusercontent.com/THOMASELOVE/432-data/master/data/oh_counties_2022.csv", 
                 show_col_types = FALSE) |>
  clean_names() |>
  mutate(fips = as.character(fips))

Here we create the new outcome variable, goodcount for the ohc_2022 data.

ohc_2022 <- ohc_2022 |>
    mutate(goodcount = 0 + (sroh_fairpoor < 18.1) + 
               (obese_pct < 34.6) + (exer_access > 77.2) + 
               (h2oviol == "No"))
ohc_2022 |> count(goodcount)
# A tibble: 5 × 2
  goodcount     n
      <dbl> <int>
1         0    16
2         1    46
3         2    17
4         3     5
5         4     4

Among the 88 counties we find 16 counties which meet 0 of these standards, 46 which meet 1, 17 which meet 2, 5 which meet 3 and 4 which meet all 4.

3.2 Selecting Predictors

We will use the following four predictors to model our goodcount variable.

  • age_adj_mortality, the age-adjusted mortality rate for the county
  • smoker_pct, the percentage of adults who smoke tobacco in the county
  • hsgrads, the percentage of adult county residents who graduated high school, and
  • inactive_pct, the percentage of adult county residents who are largely inactive.

Below, we’ll create new variables which center each of these predictors on their mean. The scale() function can both center the observations (by subtracting the mean) and scale them to have standard deviation 1 (by dividing by the standard deviation) if we like, but here, we’ll skip the division step using the parameter scale = FALSE within the scale() function.

So, scale(variablename, scale = FALSE) does the same thing as variablename - mean(variablename)

ohc_2022 <- ohc_2022 |>
  mutate(mortality_c = scale(age_adj_mortality, scale = FALSE),
         smoker_c = scale(smoker_pct, scale = FALSE),
         hsgrad_c = scale(hsgrads, scale = FALSE),
         inactive_c = scale(inactive_pct, scale = FALSE))

## sanity check to make sure this worked properly

favstats(~ age_adj_mortality, data = ohc_2022) |> 
  gt() |> fmt_number(mean:sd, decimals = 2)
min Q1 median Q3 max mean sd n missing
215.2 373.825 419.05 476.525 706.3 429.02 88.97 88 0
favstats(~ mortality_c, data = ohc_2022) |> 
  gt() |> fmt_number(mean:sd, decimals = 2)
min Q1 median Q3 max mean sd n missing
-213.825 -55.2 -9.975 47.5 277.275 0.00 88.97 88 0
ohc_2022 <- ohc_2022 |> 
  select(fips, county, goodcount, mortality_c, smoker_c, hsgrad_c, inactive_c)

write_rds(ohc_2022, "ohc_2022.Rds")

I’ve placed this ohc_2022.Rds file on our 432 Data page.

4 Rootogram for a Poisson regression model

4.1 Fitting the Poisson model for goodcount with glm()

m1_Poisson <- 
  glm(goodcount ~ mortality_c + smoker_c + hsgrad_c + inactive_c,
      family = poisson(), data = ohc_2022)

4.2 Summarizing the Poisson model fit with glm()

summary(m1_Poisson)

Call:
glm(formula = goodcount ~ mortality_c + smoker_c + hsgrad_c + 
    inactive_c, family = poisson(), data = ohc_2022)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.130237   0.103653   1.256   0.2089  
mortality_c  0.002688   0.001881   1.429   0.1529  
smoker_c    -0.012870   0.093976  -0.137   0.8911  
hsgrad_c    -0.033788   0.028522  -1.185   0.2362  
inactive_c  -0.209768   0.084996  -2.468   0.0136 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 72.907  on 87  degrees of freedom
Residual deviance: 48.884  on 83  degrees of freedom
AIC: 223.34

Number of Fisher Scoring iterations: 5
glance(m1_Poisson) |> gt()
null.deviance df.null logLik AIC BIC deviance df.residual nobs
72.90686 87 -106.6698 223.3396 235.7262 48.88433 83 88

4.3 Rootogram for m1_Poisson using the topmodels package

topmodels::rootogram(m1_Poisson)

5 Alternative Rootogram for same Poisson model

5.1 Fitting the same Poisson model with gam() from the mgcv package

m2_Poisson <- 
  gam(goodcount ~ mortality_c + smoker_c + hsgrad_c + inactive_c,
      family = poisson(), data = ohc_2022)

5.2 Summarizing the Poisson model fit with gam()

summary(m2_Poisson)

Family: poisson 
Link function: log 

Formula:
goodcount ~ mortality_c + smoker_c + hsgrad_c + inactive_c

Parametric coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.130237   0.103653   1.256   0.2089  
mortality_c  0.002688   0.001881   1.429   0.1529  
smoker_c    -0.012870   0.093976  -0.137   0.8911  
hsgrad_c    -0.033788   0.028522  -1.185   0.2362  
inactive_c  -0.209768   0.084996  -2.468   0.0136 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


R-sq.(adj) =  0.452   Deviance explained = 32.9%
UBRE = -0.33086  Scale est. = 1         n = 88

We can see that the same model is fit here with gam() as we built with glm().

The tidy() function applied to m2_Poisson produces nothing of value. However, the glance() function works well…

glance(m2_Poisson) |> gt()
df logLik AIC BIC deviance df.residual nobs
5 -106.6698 223.3396 235.7262 48.88433 83 88

5.3 Rootogram for m2_Poisson using the gratia package

Here’s what’s contained in the rootogram for m2_Poisson, which we obtain here with the rootogram() function from the gratia package:

rg2 <- gratia::rootogram(m2_Poisson)
rg2
A `rootogram` object without mandatory attributes `scale` and `style`

# A tibble: 21 × 3
    .bin .observed .fitted
   <dbl>     <int>   <dbl>
 1     0        16 28.8   
 2     1        46 29.1   
 3     2        17 17.1   
 4     3         5  7.79  
 5     4         4  3.15  
 6     5         0  1.22  
 7     6         0  0.488 
 8     7         0  0.205 
 9     8         0  0.0878
10     9         0  0.0370
# ℹ 11 more rows

To plot the information from this rootogram, we use the draw() function from the gratia package:

draw(rg2) + labs(title = "Rootogram using gratia")

6 References

  1. Get started with gratia at https://gavinsimpson.github.io/gratia/articles/gratia.html
  2. Rootograms in gratia at https://gavinsimpson.github.io/gratia/reference/rootogram.html
  3. gams in the mgcv package at https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/gam.html
  4. Kleiber, C., Zeileis, A., (2016) Visualizing Count Data Regressions Using Rootograms. The American Statistician, 70, 296–303. doi:10.1080/00031305.2016.1173590

7 Session Information

Note that the stars below indicate the packages that I’ve attached in this work.

session_info()  ## from sessioninfo package
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.3 (2024-02-29 ucrt)
 os       Windows 11 x64 (build 22631)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/New_York
 date     2024-04-01
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package        * version    date (UTC) lib source
 backports        1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
 bit              4.0.5      2022-11-15 [1] CRAN (R 4.3.1)
 bit64            4.0.5      2020-08-30 [1] CRAN (R 4.3.1)
 broom          * 1.0.5      2023-06-09 [1] CRAN (R 4.3.3)
 cachem           1.0.8      2023-05-01 [1] CRAN (R 4.3.1)
 cli              3.6.2      2023-12-11 [1] CRAN (R 4.3.2)
 colorspace       2.1-0      2023-01-23 [1] CRAN (R 4.3.1)
 conflicted     * 1.2.0      2023-02-01 [1] CRAN (R 4.3.1)
 crayon           1.5.2      2022-09-29 [1] CRAN (R 4.3.1)
 curl             5.2.1      2024-03-01 [1] CRAN (R 4.3.3)
 digest           0.6.35     2024-03-11 [1] CRAN (R 4.3.3)
 distributions3   0.2.1      2022-09-07 [1] CRAN (R 4.3.3)
 dplyr          * 1.1.4      2023-11-17 [1] CRAN (R 4.3.2)
 ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.3.1)
 evaluate         0.23       2023-11-01 [1] CRAN (R 4.3.2)
 fansi            1.0.6      2023-12-08 [1] CRAN (R 4.3.2)
 farver           2.1.1      2022-07-06 [1] CRAN (R 4.3.1)
 fastmap          1.1.1      2023-02-24 [1] CRAN (R 4.3.1)
 forcats        * 1.0.0      2023-01-29 [1] CRAN (R 4.3.1)
 generics         0.1.3      2022-07-05 [1] CRAN (R 4.3.1)
 ggformula      * 0.12.0     2023-11-09 [1] CRAN (R 4.3.2)
 ggokabeito       0.1.0      2021-10-18 [1] CRAN (R 4.3.3)
 ggplot2        * 3.5.0      2024-02-23 [1] CRAN (R 4.3.3)
 ggridges         0.5.6      2024-01-23 [1] CRAN (R 4.3.3)
 glue             1.7.0      2024-01-09 [1] CRAN (R 4.3.3)
 gratia         * 0.9.0      2024-03-27 [1] CRAN (R 4.3.3)
 gt             * 0.10.1     2024-01-17 [1] CRAN (R 4.3.3)
 gtable           0.3.4      2023-08-21 [1] CRAN (R 4.3.1)
 haven            2.5.4      2023-11-30 [1] CRAN (R 4.3.2)
 hms              1.1.3      2023-03-21 [1] CRAN (R 4.3.1)
 htmltools        0.5.8      2024-03-25 [1] CRAN (R 4.3.3)
 htmlwidgets      1.6.4      2023-12-06 [1] CRAN (R 4.3.2)
 janitor        * 2.2.0      2023-02-02 [1] CRAN (R 4.3.3)
 jsonlite         1.8.8      2023-12-04 [1] CRAN (R 4.3.2)
 knitr            1.45       2023-10-30 [1] CRAN (R 4.3.3)
 labeling         0.4.3      2023-08-29 [1] CRAN (R 4.3.1)
 labelled         2.12.0     2023-06-21 [1] CRAN (R 4.3.1)
 lattice        * 0.22-6     2024-03-20 [1] CRAN (R 4.3.3)
 lifecycle        1.0.4      2023-11-07 [1] CRAN (R 4.3.2)
 lubridate      * 1.9.3      2023-09-27 [1] CRAN (R 4.3.1)
 magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.3.3)
 MASS             7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.3)
 Matrix         * 1.6-5      2024-01-11 [1] CRAN (R 4.3.2)
 memoise          2.0.1      2021-11-26 [1] CRAN (R 4.3.1)
 mgcv           * 1.9-1      2023-12-21 [1] CRAN (R 4.3.3)
 mosaic         * 1.9.1      2024-02-23 [1] CRAN (R 4.3.3)
 mosaicCore       0.9.4.0    2023-11-05 [1] CRAN (R 4.3.2)
 mosaicData     * 0.20.4     2023-11-05 [1] CRAN (R 4.3.2)
 munsell          0.5.1      2024-04-01 [1] CRAN (R 4.3.2)
 mvnfast          0.2.8      2023-02-23 [1] CRAN (R 4.3.3)
 nlme           * 3.1-164    2023-11-27 [2] CRAN (R 4.3.3)
 patchwork        1.2.0      2024-01-08 [1] CRAN (R 4.3.3)
 pillar           1.9.0      2023-03-22 [1] CRAN (R 4.3.1)
 pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.3.1)
 purrr          * 1.0.2      2023-08-10 [1] CRAN (R 4.3.1)
 R6               2.5.1      2021-08-19 [1] CRAN (R 4.3.1)
 Rcpp             1.0.12     2024-01-09 [1] CRAN (R 4.3.2)
 readr          * 2.1.5      2024-01-10 [1] CRAN (R 4.3.2)
 rlang            1.1.3      2024-01-10 [1] CRAN (R 4.3.2)
 rmarkdown        2.26       2024-03-05 [1] CRAN (R 4.3.3)
 rstudioapi       0.16.0     2024-03-24 [1] CRAN (R 4.3.3)
 sass             0.4.9      2024-03-15 [1] CRAN (R 4.3.3)
 scales           1.3.0      2023-11-28 [1] CRAN (R 4.3.2)
 sessioninfo    * 1.2.2      2021-12-06 [1] CRAN (R 4.3.2)
 snakecase        0.11.1     2023-08-27 [1] CRAN (R 4.3.1)
 stringi          1.8.3      2023-12-11 [1] CRAN (R 4.3.2)
 stringr        * 1.5.1      2023-11-14 [1] CRAN (R 4.3.2)
 tibble         * 3.2.1      2023-03-20 [1] CRAN (R 4.3.1)
 tidyr          * 1.3.1      2024-01-24 [1] CRAN (R 4.3.3)
 tidyselect       1.2.1      2024-03-11 [1] CRAN (R 4.3.3)
 tidyverse      * 2.0.0      2023-02-22 [1] CRAN (R 4.3.3)
 timechange       0.3.0      2024-01-18 [1] CRAN (R 4.3.3)
 topmodels      * 0.3-0      2024-03-15 [1] R-Forge (R 4.3.3)
 tzdb             0.4.0      2023-05-12 [1] CRAN (R 4.3.1)
 utf8             1.2.4      2023-10-22 [1] CRAN (R 4.3.2)
 vctrs            0.6.5      2023-12-01 [1] CRAN (R 4.3.2)
 vroom            1.6.5      2023-12-05 [1] CRAN (R 4.3.2)
 withr            3.0.0      2024-01-16 [1] CRAN (R 4.3.3)
 xfun             0.43       2024-03-25 [1] CRAN (R 4.3.3)
 xml2             1.3.6      2023-12-04 [1] CRAN (R 4.3.2)
 yaml             2.3.8      2023-12-11 [1] CRAN (R 4.3.2)

 [1] C:/Users/thoma/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.3/library

──────────────────────────────────────────────────────────────────────────────