Count Dependent Variable Tutorials

library(descr)

## Warning: package 'descr' was built under R version 4.2.3

library(MASS)#glm modeling

## Warning: package 'MASS' was built under R version 4.2.3

library(rstatix) #Summary Tables

## Warning: package 'rstatix' was built under R version 4.2.3

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:MASS':
## 
##     select

## The following object is masked from 'package:stats':
## 
##     filter

library(tidyverse) #General coding

## Warning: package 'tidyverse' was built under R version 4.2.3

## Warning: package 'ggplot2' was built under R version 4.2.3

## Warning: package 'tibble' was built under R version 4.2.3

## Warning: package 'tidyr' was built under R version 4.2.3

## Warning: package 'readr' was built under R version 4.2.3

## Warning: package 'purrr' was built under R version 4.2.3

## Warning: package 'dplyr' was built under R version 4.2.3

## Warning: package 'stringr' was built under R version 4.2.3

## Warning: package 'forcats' was built under R version 4.2.3

## Warning: package 'lubridate' was built under R version 4.2.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks rstatix::filter(), stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ dplyr::select() masks rstatix::select(), MASS::select()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors

library(patchwork) #Merge GGPlots togeter

## Warning: package 'patchwork' was built under R version 4.2.3

## 
## Attaching package: 'patchwork'
## 
## The following object is masked from 'package:MASS':
## 
##     area

library(ggplot2) #Graphing
library(stargazer) #Tabular Regression Results

## 
## Please cite as: 
## 
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

library(jtools) #Tabular Regression Results

## Warning: package 'jtools' was built under R version 4.2.3

library(descr) #Easy Frequency Tables  
library(haven) #Imports survey data

## Warning: package 'haven' was built under R version 4.2.3

library(stats) #Various statistical calculations 
library(ggeffects) #Predicted Probabilities from Regressions

## Warning: package 'ggeffects' was built under R version 4.2.3

library(pscl) #Zero Inflated Negative Binomial

## Warning: package 'pscl' was built under R version 4.2.3

## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis

anes <- read_dta("C:/Users/Carey/Dropbox/UMass/2024-2025/Spring/AQM/Data/anes_2020_count.dta")
anes <- anes %>% 
  mutate(pid_x = labelled(
    case_when(V201228 == 1 & V201229 == 1 ~ 1,  
      V201228 == 1 & V201229 == 2 ~ 2,  
      V201230 == 3 ~ 3,                 
      V201230 == 2 ~ 4,                
      V201230 == 1 ~ 5,                
      V201228 == 2 & V201229 == 2 ~ 6, 
      V201228 == 2 & V201229 == 1 ~ 7),
    labels = c("Strong Dem" = 1,
      "Weak Dem" = 2,
      "Lean Dem" = 3,
      "Ind" = 4,
      "Lean Rep" = 5,
      "Weak Rep" = 6,
      "Strong Rep" = 7)))

anes <- anes %>% # 
  mutate(college = case_when(
    V201511x==1 ~ 0,
    V201511x==2  ~ 0,
    V201511x==3  ~ 0,
    V201511x==4  ~ 1,
    V201511x==5  ~ 1))

anes <- anes %>% 
  mutate(poc = case_when(
    V201549x ==1 ~ 0,
    V201549x >=2 ~ 1,
    V201549x <=-1 ~ NA_real_))  

anes <- anes %>% 
  mutate(female = case_when(
    V201600 ==1 ~ 0,
    V201600 ==2 ~ 1,
    V201600 <=-1 ~ NA_real_)) 

anes <- anes %>%
  mutate(libcon = replace(V201200, (V201200 <= -1 | V201200 >= 99), NA))
freq(anes$libcon)

## PRE: 7pt scale liberal-conservative self-placement 
##       Frequency Percent Valid Percent
## 1           163   4.656         5.181
## 2           566  16.167        17.991
## 3           404  11.540        12.842
## 4           739  21.108        23.490
## 5           379  10.825        12.047
## 6           704  20.109        22.378
## 7           191   5.456         6.071
## NA's        355  10.140              
## Total      3501 100.000       100.000

value_labels <- c("Ext Liberal", "Liberal", 
                  "Slightly Liberal", "Moderate", "Slightly Conserv",
                  "Conserv", "Ext Conserv") #Add value labels to your new measure. Only use this if you need to as it changes the variable type to 'factor' which influences the types of analysis you can do with it. Previously, it was 'numeric'. 

# Assign value labels to the Response variable
anes$libcon_f <- factor(anes$libcon, levels = 1:7, labels = value_labels)

anes$libcon_f <- relevel(anes$libcon_f, ref = "Moderate") #Change reference group

Count Dependent Variables

In this tutorial, we will be reviewing how to estimate, evaluate and interpret count regression models. The dependent variable in these models are counts of something occurring. In survey data, it is oftentimes the count of how many times something is done in a specified time period, like going to the gym in the past 30 days or the number of different participatory acts someone did during a recent election.

While it is tempting to consider this variable type to be continuous, it is technically drawn from a count distribution rather than a Gaussian one. For instance, counts must be an integer and cannot be negative. That type of data structure does not apply to a Gaussian distribution.

Here, we review several different count model types including Poisson, Negative-Binomial, Zero-Inflated Poisson (Zips), and Zero-Inflated Negative-Binomial (ZINB). This includes covering when to use each type of model and why.

We will be working out of the 2020 American National Election Survey in this tutorial. The dependent variable will be a count of the number of rightwing news sources an individual consumed during the 2020 presidential election. There will be two specific DVs:

Number of Fox News Opinion TV Programs Watched (fox_shows)
- Consists of 4 possible Fox TV opinion shows (Carlson, Hannity, The Five, Ingraham)
Number of broader right-wing news media programs consumed (all_rw_shows)
- Consists of the above 4 plus other right-wing shows from 2020 like Rush Limbaugh, Glen Beck, and Breitbart News for 12 total right wing media sources that could have been consumed

First, examine the distribution of each count variable.

freq(anes$fox_shows)

## anes$fox_shows 
##       Frequency  Percent
## 0          2879  82.2336
## 1           361  10.3113
## 2           170   4.8558
## 3            84   2.3993
## 4             7   0.1999
## Total      3501 100.0000

freq(anes$all_rw_shows)

## anes$all_rw_shows 
##       Frequency   Percent Valid Percent
## 0          2325  66.40960      76.15460
## 1           281   8.02628       9.20406
## 2           145   4.14167       4.74943
## 3           117   3.34190       3.83230
## 4            75   2.14225       2.45660
## 5            56   1.59954       1.83426
## 6            20   0.57127       0.65509
## 7            27   0.77121       0.88438
## 8             3   0.08569       0.09826
## 9             4   0.11425       0.13102
## NA's        448  12.79634              
## Total      3501 100.00000     100.00000

For the Fox News program count, there are a lot of zeroes, ~82% meaning of the sample watched 0 of the Fox News opinion shows listed, with 1’s being the most commonly watched number of shows. The full right wing media variable had fewer 0’s, ~66%, with a few respondents consuming 9 of the 12 possible shows. We will return to this distribution shortly, but first let’s estimate the most basic of the count models - the Poisson.

Basic Poisson Model

The Poisson count model assumes the dependent variable is drawn from the Poisson distribution. What is unique about the Poisson distribution is that it assumes the mean of the count variable equals its variance. When this is violated, the estimated standard errors are the wrong size leading to Type I and Type II errors. However, this does nothing to bias the coefficients, which remain the same regardless of the issues with the standard errors.

We start with a simple model with one predictor - political ideology - measured on a 7-point scale from extremely liberal (1) to extremely conservative (7) with moderates (4) in the middle.

pois_fox <- glm(fox_shows ~ libcon_f, family = poisson, data = anes, weights = anes$V200010b)
summary(pois_fox)

## 
## Call:
## glm(formula = fox_shows ~ libcon_f, family = poisson, data = anes, 
##     weights = anes$V200010b)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0296  -0.6216  -0.2701  -0.0930   6.6264  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -1.89079    0.09819 -19.257  < 2e-16 ***
## libcon_fExt Liberal      -15.41179  304.76458  -0.051     0.96    
## libcon_fLiberal           -1.52599    0.27287  -5.592 2.24e-08 ***
## libcon_fSlightly Liberal  -1.48588    0.31356  -4.739 2.15e-06 ***
## libcon_fSlightly Conserv   0.76332    0.13603   5.611 2.01e-08 ***
## libcon_fConserv            1.51977    0.10934  13.900  < 2e-16 ***
## libcon_fExt Conserv        1.84095    0.12423  14.819  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 2970.9  on 2898  degrees of freedom
## Residual deviance: 2127.6  on 2892  degrees of freedom
##   (602 observations deleted due to missingness)
## AIC: 3383.3
## 
## Number of Fisher Scoring iterations: 15

pois_all <- glm(all_rw_shows ~ libcon_f, family = poisson, data = anes, weights = anes$V200010b)
summary(pois_all)

## 
## Call:
## glm(formula = all_rw_shows ~ libcon_f, family = poisson, data = anes, 
##     weights = anes$V200010b)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.7584  -0.7615  -0.3587  -0.1587   7.4819  
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -1.21260    0.07547 -16.067  < 2e-16 ***
## libcon_fExt Liberal      -2.88802    0.73057  -3.953 7.71e-05 ***
## libcon_fLiberal          -1.78814    0.22465  -7.960 1.72e-15 ***
## libcon_fSlightly Liberal -1.13007    0.19953  -5.664 1.48e-08 ***
## libcon_fSlightly Conserv  0.79642    0.10332   7.708 1.28e-14 ***
## libcon_fConserv           1.60536    0.08341  19.246  < 2e-16 ***
## libcon_fExt Conserv       2.14999    0.09097  23.634  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 5126.2  on 2543  degrees of freedom
## Residual deviance: 3304.7  on 2537  degrees of freedom
##   (957 observations deleted due to missingness)
## AIC: 4966.5
## 
## Number of Fisher Scoring iterations: 6

stargazer(pois_fox, pois_all, type="text", digits=3)

## 
## =====================================================
##                              Dependent variable:     
##                          ----------------------------
##                            fox_shows    all_rw_shows 
##                               (1)            (2)     
## -----------------------------------------------------
## libcon_fExt Liberal         -15.412       -2.888***  
##                            (304.765)       (0.731)   
##                                                      
## libcon_fLiberal            -1.526***      -1.788***  
##                             (0.273)        (0.225)   
##                                                      
## libcon_fSlightly Liberal   -1.486***      -1.130***  
##                             (0.314)        (0.200)   
##                                                      
## libcon_fSlightly Conserv    0.763***      0.796***   
##                             (0.136)        (0.103)   
##                                                      
## libcon_fConserv             1.520***      1.605***   
##                             (0.109)        (0.083)   
##                                                      
## libcon_fExt Conserv         1.841***      2.150***   
##                             (0.124)        (0.091)   
##                                                      
## Constant                   -1.891***      -1.213***  
##                             (0.098)        (0.075)   
##                                                      
## -----------------------------------------------------
## Observations                 2,899          2,544    
## Log Likelihood             -1,684.661    -2,476.275  
## Akaike Inf. Crit.          3,383.322      4,966.549  
## =====================================================
## Note:                     *p<0.1; **p<0.05; ***p<0.01

Results from our two simple model reveal the relationship between political ideology and right wing media consumption. The coefficients are on the logarithmic scale so are not easily interpretable in terms of the substantive effect We can interpret the direction and significance of the relationship directly from a regression table. Not surprisingly, conservatives consume more right wing media than moderates while liberals consume less.

fox_pred<-ggpredict(pois_fox, terms="libcon_f")
print(fox_pred)

## # Predicted counts of fox_shows
## 
## libcon_f         | Predicted |            95% CI
## ------------------------------------------------
## Moderate         |      0.15 | [0.12,      0.18]
## Ext Liberal      |      0.00 | [0.00, 7.97e+251]
## Liberal          |      0.03 | [0.02,      0.05]
## Slightly Liberal |      0.03 | [0.02,      0.06]
## Slightly Conserv |      0.32 | [0.27,      0.39]
## Conserv          |      0.69 | [0.63,      0.76]
## Ext Conserv      |      0.95 | [0.82,      1.10]

all_pred<-ggpredict(pois_all, terms="libcon_f")
print(all_pred)

## # Predicted counts of all_rw_shows
## 
## libcon_f         | Predicted |       95% CI
## -------------------------------------------
## Moderate         |      0.30 | [0.26, 0.34]
## Ext Liberal      |      0.02 | [0.00, 0.07]
## Liberal          |      0.05 | [0.03, 0.08]
## Slightly Liberal |      0.10 | [0.07, 0.14]
## Slightly Conserv |      0.66 | [0.57, 0.76]
## Conserv          |      1.48 | [1.38, 1.59]
## Ext Conserv      |      2.55 | [2.31, 2.82]

One thing we have not done yet is to test the assumption that the means of the DVs are equal to their variance. If they are not, we risk misestimating the size of our standard errors which can lead to incorrect conclusions.

Testing Variance = Mean Assumption of the Poisson Distribution

There are multiple ways to test the assumption that the mean equals the variance of the dependent variables. We will cover a few of them including…

Comparing the simple mean and variance of the count DVs
Estimating a Negative-Binomial model and evaluating the theta value (1/theta in actuality)

var_fox_shows <- var(anes$fox_shows, na.rm = TRUE)
var_all_rw_shows <- var(anes$all_rw_shows, na.rm = TRUE)
mean_fox_shows <- mean(anes$fox_shows, na.rm = TRUE)
mean_all_rw_shows <- mean(anes$all_rw_shows, na.rm = TRUE)
ratio_fox<-var_fox_shows/mean_fox_shows
ratio_all<-var_all_rw_shows/mean_all_rw_shows

# Combine means and variances for comparison 
count_variance <- data.frame(
  Variable = c("fox_shows", "all_rw_shows"),
  Mean = c(mean_fox_shows, mean_all_rw_shows),
  Variance = c(var_fox_shows, var_all_rw_shows),
  Ratio=c(ratio_fox, ratio_all)
)

# Print the variance results for evaluation 
print(count_variance)

##       Variable      Mean  Variance    Ratio
## 1    fox_shows 0.2802057 0.4668909 1.666244
## 2 all_rw_shows 0.6128398 1.9418010 3.168529

Using the tidyverse, we calculate the mean, variance, and ratio of the two for both of the count DVs. The ratio value is the easiest way to evaluate the relationship, also called dispersion. If mean = variance then the ratio will equal 1. The larger the value the worse the overdispersion is in the count variable. Overdispersion leads to standard errors that are too small, which leads us to be more certain about the relationship between our Xs and Y then we should be. The smaller the value, the worse the underdispersion is which leads the standard errors to be too big. This is less common than overdispersion but can happen.

These results suggest that we definitely have an overdispersion problem with the full right-wing media variable (ratio > 3) while the Fox TV shows variable shows some, but not as overwhelming, evidence of overdispersion (ratio of 1.7).

Quasi-Poisson Model

With the dispersion issues seen in variance/mean calculation, we model the count DVs with a technique that explicitly accounts for this issue. First, we will use the quasi-Poisson model before turning our attention to the Negative-Binomial model.

The quasi-Poisson model uses the same distribution as the Poisson model but adjusts for the variance/mean issue. One thing to note is that the quasi-Poisson modeling approach will not return basic fit statistics like the AIC, BIC or likelihood ratio. In the summary command for the regression results, it does return a dispersion parameter. Here we have a dispersion parameter of 1.23 for the Fox News shows - indicating slight overdispersion like we saw above - and 2.027 for all Right Wing media sources indicating problematic overdispersion. When calculating the negative binomial model, we will review the theta value which will gives us statistical tests to know if we need to worry about the dispersion parameter or not.

quasi_fox <- glm(fox_shows ~ libcon_f, family = quasipoisson, data = anes, weights = anes$V200010b)
summary(quasi_fox)

## 
## Call:
## glm(formula = fox_shows ~ libcon_f, family = quasipoisson, data = anes, 
##     weights = anes$V200010b)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0296  -0.6216  -0.2701  -0.0930   6.6264  
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -1.8908     0.1088 -17.374  < 2e-16 ***
## libcon_fExt Liberal      -15.4118   337.7893  -0.046    0.964    
## libcon_fLiberal           -1.5260     0.3024  -5.046 4.80e-07 ***
## libcon_fSlightly Liberal  -1.4859     0.3475  -4.275 1.97e-05 ***
## libcon_fSlightly Conserv   0.7633     0.1508   5.063 4.39e-07 ***
## libcon_fConserv            1.5198     0.1212  12.541  < 2e-16 ***
## libcon_fExt Conserv        1.8409     0.1377  13.370  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 1.228465)
## 
##     Null deviance: 2970.9  on 2898  degrees of freedom
## Residual deviance: 2127.6  on 2892  degrees of freedom
##   (602 observations deleted due to missingness)
## AIC: NA
## 
## Number of Fisher Scoring iterations: 15

quasi_all <- glm(all_rw_shows ~ libcon_f, family = quasipoisson, data = anes, weights = anes$V200010b)
summary(quasi_all)

## 
## Call:
## glm(formula = all_rw_shows ~ libcon_f, family = quasipoisson, 
##     data = anes, weights = anes$V200010b)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.7584  -0.7615  -0.3587  -0.1587   7.4819  
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -1.2126     0.1075 -11.284  < 2e-16 ***
## libcon_fExt Liberal       -2.8880     1.0402  -2.776  0.00554 ** 
## libcon_fLiberal           -1.7881     0.3199  -5.590 2.51e-08 ***
## libcon_fSlightly Liberal  -1.1301     0.2841  -3.978 7.16e-05 ***
## libcon_fSlightly Conserv   0.7964     0.1471   5.413 6.76e-08 ***
## libcon_fConserv            1.6054     0.1188  13.517  < 2e-16 ***
## libcon_fExt Conserv        2.1500     0.1295  16.598  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 2.027423)
## 
##     Null deviance: 5126.2  on 2543  degrees of freedom
## Residual deviance: 3304.7  on 2537  degrees of freedom
##   (957 observations deleted due to missingness)
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

stargazer(quasi_fox, quasi_all, type="text", digits=3)

## 
## =====================================================
##                              Dependent variable:     
##                          ----------------------------
##                            fox_shows    all_rw_shows 
##                               (1)            (2)     
## -----------------------------------------------------
## libcon_fExt Liberal         -15.412       -2.888***  
##                            (337.789)       (1.040)   
##                                                      
## libcon_fLiberal            -1.526***      -1.788***  
##                             (0.302)        (0.320)   
##                                                      
## libcon_fSlightly Liberal   -1.486***      -1.130***  
##                             (0.348)        (0.284)   
##                                                      
## libcon_fSlightly Conserv    0.763***      0.796***   
##                             (0.151)        (0.147)   
##                                                      
## libcon_fConserv             1.520***      1.605***   
##                             (0.121)        (0.119)   
##                                                      
## libcon_fExt Conserv         1.841***      2.150***   
##                             (0.138)        (0.130)   
##                                                      
## Constant                   -1.891***      -1.213***  
##                             (0.109)        (0.107)   
##                                                      
## -----------------------------------------------------
## Observations                 2,899          2,544    
## =====================================================
## Note:                     *p<0.1; **p<0.05; ***p<0.01

stargazer(pois_fox, quasi_fox, pois_all, quasi_all, type="text", digits=3)

## 
## ==================================================================================
##                                             Dependent variable:                   
##                          ---------------------------------------------------------
##                                   fox_shows                   all_rw_shows        
##                           Poisson   glm: quasipoisson  Poisson   glm: quasipoisson
##                                        link = log                   link = log    
##                             (1)            (2)           (3)            (4)       
## ----------------------------------------------------------------------------------
## libcon_fExt Liberal       -15.412        -15.412      -2.888***      -2.888***    
##                          (304.765)      (337.789)      (0.731)        (1.040)     
##                                                                                   
## libcon_fLiberal          -1.526***      -1.526***     -1.788***      -1.788***    
##                           (0.273)        (0.302)       (0.225)        (0.320)     
##                                                                                   
## libcon_fSlightly Liberal -1.486***      -1.486***     -1.130***      -1.130***    
##                           (0.314)        (0.348)       (0.200)        (0.284)     
##                                                                                   
## libcon_fSlightly Conserv  0.763***      0.763***       0.796***      0.796***     
##                           (0.136)        (0.151)       (0.103)        (0.147)     
##                                                                                   
## libcon_fConserv           1.520***      1.520***       1.605***      1.605***     
##                           (0.109)        (0.121)       (0.083)        (0.119)     
##                                                                                   
## libcon_fExt Conserv       1.841***      1.841***       2.150***      2.150***     
##                           (0.124)        (0.138)       (0.091)        (0.130)     
##                                                                                   
## Constant                 -1.891***      -1.891***     -1.213***      -1.213***    
##                           (0.098)        (0.109)       (0.075)        (0.107)     
##                                                                                   
## ----------------------------------------------------------------------------------
## Observations               2,899          2,899         2,544          2,544      
## Log Likelihood           -1,684.661                   -2,476.275                  
## Akaike Inf. Crit.        3,383.322                    4,966.549                   
## ==================================================================================
## Note:                                                  *p<0.1; **p<0.05; ***p<0.01

The second stargazer table compares the quasi-Poisson with the basic Poisson. Notice that the coefficients remain identical, as they should. Think of the quasi-Poisson in a similar way to using robust standard errors in OLS - the coefficients remain the same but the standard errors are adjusted. Because we have overdispersion problem in the data, the standard errors are larger in the quasi-Poisson compared to the Poisson. If we had an underdispersion problem, they would be smaller.

Negative-Binomial Model

Now, we will estimate both models using a Negative-Binomial approach. This modeling approach assumes the count variable is drawn from a slightly different probability distribution, the Negative-Binomial distribution. Importantly, this distribution does not assume that the mean=variance like the Poisson distribution. The dispersion parameter in the output is called theta. The closer to zero theta is the more a problem overdispersion is in the data.

We will also use the fact that a Poisson model is technically a nested model of the Negative-Binomial model (it simply is a form of the NB when mean = variance) to compare the model fit statistics we have covered previously. All things being equal, we should use the model that better fit the data.

# Checking Dispersion Parameter from a Negative Binomial 
nb_fox <- glm.nb(fox_shows ~ libcon_f , anes, weights = anes$V200010b)
summary(nb_fox)

## 
## Call:
## glm.nb(formula = fox_shows ~ libcon_f, data = anes, weights = anes$V200010b, 
##     init.theta = 1.147262873, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6807  -0.5852  -0.2667  -0.0924   5.6124  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                -1.8908     0.1044 -18.102  < 2e-16 ***
## libcon_fExt Liberal       -18.4118  1365.8600  -0.013    0.989    
## libcon_fLiberal            -1.5260     0.2785  -5.479 4.28e-08 ***
## libcon_fSlightly Liberal   -1.4859     0.3197  -4.647 3.36e-06 ***
## libcon_fSlightly Conserv    0.7633     0.1492   5.115 3.15e-07 ***
## libcon_fConserv             1.5198     0.1209  12.571  < 2e-16 ***
## libcon_fExt Conserv         1.8409     0.1466  12.553  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.1473) family taken to be 1)
## 
##     Null deviance: 2230.8  on 2898  degrees of freedom
## Residual deviance: 1556.4  on 2892  degrees of freedom
##   (602 observations deleted due to missingness)
## AIC: 3301.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.147 
##           Std. Err.:  0.179 
## 
##  2 x log-likelihood:  -3285.055

nb_all <- glm.nb(all_rw_shows ~ libcon_f , anes, weights = anes$V200010b)
summary(nb_all)

## 
## Call:
## glm.nb(formula = all_rw_shows ~ libcon_f, data = anes, weights = anes$V200010b, 
##     init.theta = 0.5601640797, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.1040  -0.6530  -0.3235  -0.1427   4.9686  
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -1.21260    0.09338 -12.985  < 2e-16 ***
## libcon_fExt Liberal      -2.88802    0.74323  -3.886 0.000102 ***
## libcon_fLiberal          -1.78814    0.23973  -7.459 8.71e-14 ***
## libcon_fSlightly Liberal -1.13007    0.22065  -5.121 3.03e-07 ***
## libcon_fSlightly Conserv  0.79642    0.13987   5.694 1.24e-08 ***
## libcon_fConserv           1.60536    0.11540  13.911  < 2e-16 ***
## libcon_fExt Conserv       2.14999    0.15185  14.159  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.5602) family taken to be 1)
## 
##     Null deviance: 2414.5  on 2543  degrees of freedom
## Residual deviance: 1532.5  on 2537  degrees of freedom
##   (957 observations deleted due to missingness)
## AIC: 4306.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.5602 
##           Std. Err.:  0.0467 
## 
##  2 x log-likelihood:  -4290.1410

stargazer(nb_fox, nb_all, type="text", diits=3)

## 
## ==========================================================
##                                 Dependent variable:       
##                          ---------------------------------
##                             fox_shows       all_rw_shows  
##                                (1)              (2)       
## ----------------------------------------------------------
## libcon_fExt Liberal          -18.412         -2.888***    
##                            (1,365.860)        (0.743)     
##                                                           
## libcon_fLiberal             -1.526***        -1.788***    
##                              (0.279)          (0.240)     
##                                                           
## libcon_fSlightly Liberal    -1.486***        -1.130***    
##                              (0.320)          (0.221)     
##                                                           
## libcon_fSlightly Conserv     0.763***         0.796***    
##                              (0.149)          (0.140)     
##                                                           
## libcon_fConserv              1.520***         1.605***    
##                              (0.121)          (0.115)     
##                                                           
## libcon_fExt Conserv          1.841***         2.150***    
##                              (0.147)          (0.152)     
##                                                           
## Constant                    -1.891***        -1.213***    
##                              (0.104)          (0.093)     
##                                                           
## ----------------------------------------------------------
## Observations                  2,899            2,544      
## Log Likelihood              -1,643.528       -2,146.071   
## theta                    1.147*** (0.179) 0.560*** (0.047)
## Akaike Inf. Crit.           3,301.055        4,306.141    
## ==========================================================
## Note:                          *p<0.1; **p<0.05; ***p<0.01
## 
## =
## 3
## -

The first thing we will examine from the Negative-Binomial models is the theta parameter. The benefit of running the negative binomial model instead of just looking at the ratio of variance to mean is the associated significance test that accompanies theta. Here, we see significant theta values in both models indicating that overdispersion is a problem so the Poisson model will have standard errors that are too small potentially leading to Type I error.

Next, let’s take advantage of the fact that the Poisson model is nested within the NB and compare model fit statistics. We can use AIC and BIC to compare but because these models are nested we can also use the Likelihood Ratio and deviance tests for comparisons.

#Create function to compare model fit stats
glm_fit <- function(model) {
  # Calculate Likelihood Ratio
  lr <- logLik(model)
  
  # Calculate AIC
  aic <- AIC(model)
  
  # Calculate BIC
  n <- nobs(model)
  p <- length(coef(model))
  bic <- -2 * logLik(model) + p * log(n)
  
  # Calculate Deviance
  deviance <- summary(model)$deviance
  
  # Return the metrics as a list
  metrics <- data.frame(Likelihood_Ratio = lr, AIC = aic, BIC = bic, Deviance = deviance, Coefficients = p)
  return(metrics)
}

glm_fit(pois_fox)

glm_fit(nb_fox)

glm_fit(pois_all)

glm_fit(nb_all)

If the NB distribution fits the DV data structure better, we will see better fit statistics compared to the Poisson model. That is exactly what we see for both sets of models. The AIC and BIC fir stats are smaller for both NB models and both the LR and Deviance tests are closer to zero as well (indicting a better fit). This is more evidence that we should be reporting the NB models rather than the Poisson.

Next, we will compare the Poisson and NB model results side-by-side. Despite the two models assuming a slightly different distribution of the DV, both models’ coefficients are in the logarithmic scale.

stargazer(pois_fox, nb_fox, pois_all, nb_all, type="text", digits=3)

## 
## ================================================================================
##                                            Dependent variable:                  
##                          -------------------------------------------------------
##                                   fox_shows                 all_rw_shows        
##                           Poisson       negative      Poisson       negative    
##                                         binomial                    binomial    
##                             (1)           (2)           (3)           (4)       
## --------------------------------------------------------------------------------
## libcon_fExt Liberal       -15.412       -18.412      -2.888***     -2.888***    
##                          (304.765)    (1,365.860)     (0.731)       (0.743)     
##                                                                                 
## libcon_fLiberal          -1.526***     -1.526***     -1.788***     -1.788***    
##                           (0.273)       (0.279)       (0.225)       (0.240)     
##                                                                                 
## libcon_fSlightly Liberal -1.486***     -1.486***     -1.130***     -1.130***    
##                           (0.314)       (0.320)       (0.200)       (0.221)     
##                                                                                 
## libcon_fSlightly Conserv  0.763***      0.763***      0.796***      0.796***    
##                           (0.136)       (0.149)       (0.103)       (0.140)     
##                                                                                 
## libcon_fConserv           1.520***      1.520***      1.605***      1.605***    
##                           (0.109)       (0.121)       (0.083)       (0.115)     
##                                                                                 
## libcon_fExt Conserv       1.841***      1.841***      2.150***      2.150***    
##                           (0.124)       (0.147)       (0.091)       (0.152)     
##                                                                                 
## Constant                 -1.891***     -1.891***     -1.213***     -1.213***    
##                           (0.098)       (0.104)       (0.075)       (0.093)     
##                                                                                 
## --------------------------------------------------------------------------------
## Observations               2,899         2,899         2,544         2,544      
## Log Likelihood           -1,684.661    -1,643.528    -2,476.275    -2,146.071   
## theta                               1.147*** (0.179)            0.560*** (0.047)
## Akaike Inf. Crit.        3,383.322     3,301.055     4,966.549     4,306.141    
## ================================================================================
## Note:                                                *p<0.1; **p<0.05; ***p<0.01

Results are largely identical between the two types of models. This will oftentimes be the case, but it does not have to be. The coefficients will sometimes be slightly different between the two models. Generally, the main difference is in the size of the standard errors since the NB model calculates SE differently. This typically increases the size of the SE, which it does here in both models, but sometimes can make them smaller as well (just like Robust SE).

The odd findings for the Extreme Liberals on the Fox Shows DV is because only three respondents in this group watched at least 1 show. This low occurrence rate made the standard errors more difficult to estimate.

Now that we have decided to use the NB model, we can calculate predicted number of shows watched for each level of the IV, political ideology, using the same general format from previous tutorials. We will be using the more efficient ggpredict package here.

fox_pred<-ggpredict(nb_fox, terms="libcon_f")
print(fox_pred)

## # Predicted counts of fox_shows
## 
## libcon_f         | Predicted |           95% CI
## -----------------------------------------------
## Moderate         |      0.15 | [0.12,     0.19]
## Ext Liberal      |      0.00 | [0.00,      Inf]
## Liberal          |      0.03 | [0.02,     0.05]
## Slightly Liberal |      0.03 | [0.02,     0.06]
## Slightly Conserv |      0.32 | [0.26,     0.40]
## Conserv          |      0.69 | [0.61,     0.78]
## Ext Conserv      |      0.95 | [0.78,     1.16]

all_pred<-ggpredict(nb_all, terms="libcon_f")
print(all_pred)

## # Predicted counts of all_rw_shows
## 
## libcon_f         | Predicted |       95% CI
## -------------------------------------------
## Moderate         |      0.30 | [0.25, 0.36]
## Ext Liberal      |      0.02 | [0.00, 0.07]
## Liberal          |      0.05 | [0.03, 0.08]
## Slightly Liberal |      0.10 | [0.06, 0.14]
## Slightly Conserv |      0.66 | [0.54, 0.81]
## Conserv          |      1.48 | [1.30, 1.69]
## Ext Conserv      |      2.55 | [2.02, 3.23]

As expected from reviewing the regression results, conservatives consume more right-wing media than moderates or liberals. From the Fox only model, our basic model expects moderates to watch about .15 shows on average whereas an extremely conservative person is likely watch about .95 shows on average. With the broader right-winder media environment model, there are bigger differences with extremely conservative respondents consume about 2.5 shows on average compared to just .3 shows for moderates.

We can pass these values into ggplot for easy visualization as well.

custom_order <- c('Ext Liberal', 'Liberal', 'Slightly Liberal', 'Moderate', 'Slightly Conserv', 'Conserv', 'Ext Conserv')

ggplot(all_pred, aes(x = x, y = predicted)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7, fill="grey") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4, position = position_dodge(width = 0.7)) +
  theme_minimal(base_size = 13) +
  labs(x = "# of Right-wing Media Shows Consumed", y = "Predicted Count", 
       title = "Predicted Count of Right-wing Media Shows Consumed", 
       subtitle = "Results from Negative-Binomial Model")+
  scale_x_discrete(limits = custom_order)

Zero-Inflated Count Models

If you remember all the way back at the beginning of this tutorial, we examined the frequency of distribution of both count DVs and saw that both contained a high number of zeros. If this is the case, it can be appropriate to model the count variable using a different approach called Zero-Inflated Poisson/Negative Binomial. In this modeling approach, we actually assume there are two separate data generating processes: 1. DGP that describes why a person would be a 0 or > 0 on the count DV 2. DGP that describes why a person would be a count of 1, 2, 3, etc. on the count DV

You can think of it conceptually as two models. Model 1 is a binomial model predicting if a respondent is a 0 or not on the Count DV. Model 2 is a count model predicting the count of the DV for the people we predict are not a zero from the first stage.

In R, we will be estimating two separate models in one line of code. Because we already have decided to use the NB model on the count data, we will use the Zero Inflated NB model. You need to evaluate the variance/mean relationship first before deciding on using zero inflated model.

Also only use a zero inflated model if you have theory to suggest that the DGP is different between doing it at all and the count of doing it. For the example in this tutorial, we could argue something like the following.

Political ideology will directly influence whether a person chooses to consume right-wing opinion media. For people who do consume right-wing opinion media, being more conservative should be directly related to consuming a higher number of shows since it matches their political beliefs.

fox_zinb <- zeroinfl(fox_shows ~ libcon_f, data = anes)
summary(fox_zinb)

## 
## Call:
## zeroinfl(formula = fox_shows ~ libcon_f, data = anes)
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7776 -0.4571 -0.1803 -0.1286 13.1113 
## 
## Count model coefficients (poisson with log link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -0.4428     0.1839  -2.407  0.01607 *  
## libcon_fExt Liberal       -4.6347     1.8122  -2.557  0.01054 *  
## libcon_fLiberal           -1.1986     1.0018  -1.196  0.23153    
## libcon_fSlightly Liberal  -0.2645     0.5161  -0.512  0.60830    
## libcon_fSlightly Conserv   0.3299     0.2345   1.407  0.15947    
## libcon_fConserv            0.4940     0.1961   2.520  0.01175 *  
## libcon_fExt Conserv        0.7919     0.2106   3.760  0.00017 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                1.2830     0.2175   5.898 3.68e-09 ***
## libcon_fExt Liberal       -5.0121    63.8681  -0.078   0.9374    
## libcon_fLiberal            0.9108     1.0736   0.848   0.3962    
## libcon_fSlightly Liberal   0.9664     0.5440   1.777   0.0756 .  
## libcon_fSlightly Conserv  -0.7348     0.3022  -2.432   0.0150 *  
## libcon_fConserv           -2.0007     0.2792  -7.165 7.76e-13 ***
## libcon_fExt Conserv       -1.8696     0.3278  -5.704 1.17e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 59 
## Log-likelihood: -1760 on 14 Df

all_zinb <- zeroinfl(all_rw_shows ~ libcon_f, data = anes)
summary(all_zinb)

## 
## Call:
## zeroinfl(formula = all_rw_shows ~ libcon_f, data = anes)
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0353 -0.3669 -0.2193 -0.1228 11.8598 
## 
## Count model coefficients (poisson with log link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                0.3181     0.1008   3.155 0.001603 ** 
## libcon_fExt Liberal        0.4846     0.5325   0.910 0.362782    
## libcon_fLiberal           -1.7437     0.7012  -2.487 0.012892 *  
## libcon_fSlightly Liberal  -0.4112     0.3014  -1.364 0.172476    
## libcon_fSlightly Conserv   0.4486     0.1258   3.565 0.000364 ***
## libcon_fConserv            0.6136     0.1074   5.715 1.09e-08 ***
## libcon_fExt Conserv        0.9016     0.1156   7.797 6.34e-15 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                1.3554     0.1299  10.431  < 2e-16 ***
## libcon_fExt Liberal        2.7849     0.7389   3.769 0.000164 ***
## libcon_fLiberal            0.4337     0.7827   0.554 0.579498    
## libcon_fSlightly Liberal   0.8845     0.3300   2.680 0.007354 ** 
## libcon_fSlightly Conserv  -0.7200     0.1845  -3.903 9.48e-05 ***
## libcon_fConserv           -1.8679     0.1619 -11.539  < 2e-16 ***
## libcon_fExt Conserv       -2.0641     0.2222  -9.289  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 21 
## Log-likelihood: -2314 on 14 Df

stargazer(fox_zinb, all_zinb, type='text', digits=3)

## 
## =====================================================
##                              Dependent variable:     
##                          ----------------------------
##                            fox_shows    all_rw_shows 
##                               (1)            (2)     
## -----------------------------------------------------
## libcon_fExt Liberal         -4.635**        0.485    
##                             (1.812)        (0.532)   
##                                                      
## libcon_fLiberal              -1.199       -1.744**   
##                             (1.002)        (0.701)   
##                                                      
## libcon_fSlightly Liberal     -0.265        -0.411    
##                             (0.516)        (0.301)   
##                                                      
## libcon_fSlightly Conserv     0.330        0.449***   
##                             (0.235)        (0.126)   
##                                                      
## libcon_fConserv             0.494**       0.614***   
##                             (0.196)        (0.107)   
##                                                      
## libcon_fExt Conserv         0.792***      0.902***   
##                             (0.211)        (0.116)   
##                                                      
## Constant                    -0.443**      0.318***   
##                             (0.184)        (0.101)   
##                                                      
## -----------------------------------------------------
## Observations                 3,146          2,762    
## Log Likelihood             -1,760.496    -2,313.545  
## =====================================================
## Note:                     *p<0.1; **p<0.05; ***p<0.01

With the zero-inflated modeling approaches, we see two sets of results for each DV:

Count Model which is modeling the count variable for cases that are > 0 on the count DV
Binomial Logit model which is modeling if the respondent consumed any Fox or right-wing media shows or not (1= consumed any while 0 = consumed zero shows)

Above, we estimate the model and in this instance we estimate the same model for the count and the binomial portion of the zero-inflated approach. This does not have to be the case. Oftentimes, there is theoretical reason to think that an X influences one part of the data generating process but not the other. In that instance, the models would be different, which is perfectly acceptable.

We can change the variables included in the models using the coding approach below. In the zeroinfl code, we specify two separate models splitting them with the | operator. Here, we include political ideology, college education and biological sex as predictors for the count portion of the model while we only include political ideology in the binomial logit portion. This is because the code starts with the count model before the | and then moves to the binomial logit model past that operator. We only have libcon_f specified past the | operator so that will be the only IV in the binomial logit portion of the model.

# Estimate the zero-inflated negative binomial model with separate predictors
zinb_model <- zeroinfl(all_rw_shows ~ libcon_f + college + female | libcon_f + college , data = anes)

summary(zinb_model)

## 
## Call:
## zeroinfl(formula = all_rw_shows ~ libcon_f + college + female | libcon_f + 
##     college, data = anes)
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0519 -0.3766 -0.2198 -0.1629 12.3646 
## 
## Count model coefficients (poisson with log link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               0.35934    0.10887   3.301 0.000965 ***
## libcon_fExt Liberal       0.52072    0.53244   0.978 0.328083    
## libcon_fLiberal          -1.79900    0.70799  -2.541 0.011054 *  
## libcon_fSlightly Liberal -0.45763    0.30427  -1.504 0.132573    
## libcon_fSlightly Conserv  0.44848    0.12777   3.510 0.000448 ***
## libcon_fConserv           0.61665    0.10932   5.641 1.69e-08 ***
## libcon_fExt Conserv       0.86424    0.11876   7.277 3.40e-13 ***
## college                   0.04087    0.05590   0.731 0.464668    
## female                   -0.15682    0.05528  -2.837 0.004558 ** 
## 
## Zero-inflation model coefficients (binomial with logit link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                1.3193     0.1424   9.262  < 2e-16 ***
## libcon_fExt Liberal        2.7691     0.7392   3.746  0.00018 ***
## libcon_fLiberal            0.3389     0.8082   0.419  0.67492    
## libcon_fSlightly Liberal   0.8172     0.3380   2.418  0.01562 *  
## libcon_fSlightly Conserv  -0.7264     0.1864  -3.897 9.75e-05 ***
## libcon_fConserv           -1.8522     0.1639 -11.302  < 2e-16 ***
## libcon_fExt Conserv       -2.0722     0.2291  -9.047  < 2e-16 ***
## college                    0.0487     0.1195   0.407  0.68366    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 23 
## Log-likelihood: -2273 on 17 Df

Let’s calculated the predicted counts from the basic ZINB model that only has political ideology included. Once again, we will use the ggpredict function. We can then easily graph the results using ggplot if we so choose.

# Summarize the model & calculate predicted counts
summary(fox_zinb)

## 
## Call:
## zeroinfl(formula = fox_shows ~ libcon_f, data = anes)
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7776 -0.4571 -0.1803 -0.1286 13.1113 
## 
## Count model coefficients (poisson with log link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -0.4428     0.1839  -2.407  0.01607 *  
## libcon_fExt Liberal       -4.6347     1.8122  -2.557  0.01054 *  
## libcon_fLiberal           -1.1986     1.0018  -1.196  0.23153    
## libcon_fSlightly Liberal  -0.2645     0.5161  -0.512  0.60830    
## libcon_fSlightly Conserv   0.3299     0.2345   1.407  0.15947    
## libcon_fConserv            0.4940     0.1961   2.520  0.01175 *  
## libcon_fExt Conserv        0.7919     0.2106   3.760  0.00017 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                1.2830     0.2175   5.898 3.68e-09 ***
## libcon_fExt Liberal       -5.0121    63.8681  -0.078   0.9374    
## libcon_fLiberal            0.9108     1.0736   0.848   0.3962    
## libcon_fSlightly Liberal   0.9664     0.5440   1.777   0.0756 .  
## libcon_fSlightly Conserv  -0.7348     0.3022  -2.432   0.0150 *  
## libcon_fConserv           -2.0007     0.2792  -7.165 7.76e-13 ***
## libcon_fExt Conserv       -1.8696     0.3278  -5.704 1.17e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 59 
## Log-likelihood: -1760 on 14 Df

fz<-ggpredict(fox_zinb, terms="libcon_f")
print(fz)

## # Predicted counts of fox_shows
## 
## libcon_f         | Predicted |       95% CI
## -------------------------------------------
## Moderate         |      0.64 | [0.45, 0.92]
## Ext Liberal      |      0.01 | [0.00, 0.21]
## Liberal          |      0.19 | [0.03, 1.33]
## Slightly Liberal |      0.49 | [0.19, 1.27]
## Slightly Conserv |      0.89 | [0.67, 1.19]
## Conserv          |      1.05 | [0.92, 1.20]
## Ext Conserv      |      1.42 | [1.16, 1.73]

summary(all_zinb)

## 
## Call:
## zeroinfl(formula = all_rw_shows ~ libcon_f, data = anes)
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0353 -0.3669 -0.2193 -0.1228 11.8598 
## 
## Count model coefficients (poisson with log link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                0.3181     0.1008   3.155 0.001603 ** 
## libcon_fExt Liberal        0.4846     0.5325   0.910 0.362782    
## libcon_fLiberal           -1.7437     0.7012  -2.487 0.012892 *  
## libcon_fSlightly Liberal  -0.4112     0.3014  -1.364 0.172476    
## libcon_fSlightly Conserv   0.4486     0.1258   3.565 0.000364 ***
## libcon_fConserv            0.6136     0.1074   5.715 1.09e-08 ***
## libcon_fExt Conserv        0.9016     0.1156   7.797 6.34e-15 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                1.3554     0.1299  10.431  < 2e-16 ***
## libcon_fExt Liberal        2.7849     0.7389   3.769 0.000164 ***
## libcon_fLiberal            0.4337     0.7827   0.554 0.579498    
## libcon_fSlightly Liberal   0.8845     0.3300   2.680 0.007354 ** 
## libcon_fSlightly Conserv  -0.7200     0.1845  -3.903 9.48e-05 ***
## libcon_fConserv           -1.8679     0.1619 -11.539  < 2e-16 ***
## libcon_fExt Conserv       -2.0641     0.2222  -9.289  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 21 
## Log-likelihood: -2314 on 14 Df

az<-ggpredict(all_zinb, terms="libcon_f")
print(az)

## # Predicted counts of all_rw_shows
## 
## libcon_f         | Predicted |       95% CI
## -------------------------------------------
## Moderate         |      1.37 | [1.13, 1.67]
## Ext Liberal      |      2.23 | [0.80, 6.22]
## Liberal          |      0.24 | [0.06, 0.94]
## Slightly Liberal |      0.91 | [0.52, 1.59]
## Slightly Conserv |      2.15 | [1.86, 2.50]
## Conserv          |      2.54 | [2.36, 2.73]
## Ext Conserv      |      3.39 | [3.03, 3.78]

Using the zero-inflated models, we calculated the predicted counts of right-wing media shows consumed. If you were paying close attention to the predicted values from the previous count modeling approaches, you might recognize that these predicted values are considerably higher than before. These values are closer to being the predicted number of right wing media shows consumed for people who consumed at least 1 show or not. While that is not entirely true since the math is a bit more complicated in the combination of the two models, but conceptually that is a decent way to think about these numbers.

Let’s look at the mean number of right wing shows consumed by ideology both overall and for respondents who consumed at least 1 right-wing show.

anes %>%
    group_by(libcon_f) %>%
  summarize(mean = mean(all_rw_shows, na.rm = TRUE))

anes %>%
  filter(all_rw_shows > 0) %>%
  group_by(libcon_f) %>%
  summarize(mean = mean(all_rw_shows))

These means largely track with the predicted count means from both the Negative-Binomial model (first set of means above) and the Zero-Inflated Negative-Binomial model (second set of means).