Poisson Regression and Negative Binomial Regression
Introduction
Businesses always want to maximize their profits and diminish the risk of external factors that could impair their operations, and so they implement predictive models (regression or classification) of quantitative dependent variables (also called response variables). You are likely already familiar with the Bayesian Normal linear regression model, which has found broad applicability in nearly every domain. However, the Normal assumption is occasionally unsatisfactory, and that’s why we will expand the regression toolbox in light of the evolution of data.
Our example for this post relates to politics. The United States is considering the pending Equality Act. If passed, the act would guarantee elementary LGBTQ+ rights at the national level by forbidding discrimination in major areas such as education, employment, and housing. In the meantime, each of the fifty states has its own set of anti-discrimination laws, covering areas from anti-bullying to social welfare coverage (e.g., healthcare, insurance). We aim to capture the relationship between the number of laws in a state and its demographic features and political environment. For the first concern, we restrict attention to the proportion of a state’s residents that live in urban areas. For the second, we utilize historical voting patterns in presidential elections, indicating a state’s consistent preference for the Democratic candidate, the Republican candidate, or its status as a swing state that flips between candidates. It is worth noting that our analysis does not imply that the number of laws is a perfect proxy for the quality of a state’s laws. It solely provides an initial point for understanding the discrepancies among states’ laws.
An Intriguing Example: Equality Movement
As discussed above, we will introduce some notation for our case study. For each state \(i \in \{1,2, \dots, 50\}\), let \(Y_{i}\) denote the number of anti-discrimination laws and let the explanatory variable \(X_{i1}\) denote the percentage of the state’s residents that reside in metropolitan areas. Moreover, our historical political climate predictor is categorical with three levels: Democrat, Republican, and Swing. Thus, to set up an appropriate regression form without committing to the “dummy variable trap”, we let one level of the categorical predictor (Democrat) serve as the baseline/reference level for our model. The other levels, Republican and Swing, enter the model as indicator variables. That is, let \(X_{i2}\) and \(X_{i3}\) be the former and the latter variables, respectively:
\[ X_{i 2}=\left\{\begin{array}{ll} 1 & \text { Republican } \\ 0 & \text { Otherwise } \end{array} \quad \text { and } \quad X_{i 3}= \begin{cases} 1 & \text { Swing } \\ 0 & \text { Otherwise }\end{cases}\right. \] Suppose we only knew Normal regression. Our first approach to understanding the relationship between our quantitative response variable \(Y\) and our predictors \(X\) might be to build a regression model with a Normal data structure: \(Y_{i} \mid \beta_{0}, \beta_{1}, \beta_{2}, \beta_{3}, \sigma \stackrel{i n d}{\sim} N\left(\mu_{i}, \sigma^{2}\right) \quad\) with \(\quad \mu_{i}=\beta_{0}+\beta_{1} X_{i 1}+\beta_{2} X_{i 2}+\beta_{3} X_{i 3}\)
Now it’s time to incorporate our understanding, or prior belief, into the modeling. It is reasonable to believe that a state that is typical in both its urban population and its historical voting patterns has approximately 7 laws; thereby we will set a \(N\left(7, 1.5^2\right)\) prior for the intercept \(\beta_{0}\) (which captures the average number of laws in such a typical state). Concerning the other three parameters, we will utilize weakly informative priors since we do not have any firm prior information about them.
In this example, we will use the equality_index dataset from the bayesrules package. The dataset is part of the “State Equality Index” released by the Human Rights Campaign, which monitors the number of LGBTQ+ rights laws in each state, and was compiled by Sarah Warbelow, Courtnay Avant, and Colin Kutney (2019). Let’s load the necessary packages into our RStudio session!
# Load packages
library(bayesrules)
library(rstanarm)
library(bayesplot)
library(tidyverse)
library(tidybayes)
library(broom.mixed)
library(patchwork)
library(tidymodels)
library(glmnet)
# Load data
data(equality_index)
equality <- equality_index
Next, we extract some insights that will serve as ingredients for the model. The histogram below indicates that the number of laws ranges from as low as 1 to as high as 155, yet the majority of states have fewer than ten laws:
# Number of laws distribution
ggplot(equality, aes(x = laws)) +
geom_histogram(color = "darkorange", breaks = seq(0, 160, by = 10))
A histogram of the number of laws in each of 50 states
Next, examining the states that have over 10 laws, we notice that while the other states do not vary dramatically in their counts, California is the only state with over 150 laws (155, to be exact).
# All states with 10 or more laws
p1 <- equality %>%
  filter(laws >= 10) %>%
  select(state, laws) %>%
  ggplot(aes(x = state, y = laws)) +
  geom_bar(stat = "identity", color = "darkorange") +
  labs(title = "All states with 10+ laws") +
  coord_flip()

# All states with fewer than 10 laws
p2 <- equality %>%
  filter(laws < 10) %>%
  select(state, laws) %>%
  ggplot(aes(x = state, y = laws)) +
  geom_bar(stat = "identity", color = "darkorange") +
  labs(title = "All states with fewer than 10 laws") +
  coord_flip()

p1 + p2 + plot_layout(ncol = 2)
Since it is a clear outlier, we will remove this state from our analysis.
equality %>%
  filter(laws == max(laws))
## # A tibble: 1 x 6
## state region gop_2016 laws historical percent_urban
## <fct> <fct> <dbl> <dbl> <fct> <dbl>
## 1 california west 31.6 155 dem 95
# Remove the outlier (California)
equality <- equality %>%
  filter(state != "california")
Next, a scatterplot of the number of state laws versus percent_urban and historical voting patterns shows that historically dem states, and states with greater urban populations, tend to have more LGBTQ+ anti-discrimination laws in place:
ggplot(equality, aes(y = laws, x = percent_urban, color = historical)) +
geom_point(size = 4.5)
A scatterplot of the number of anti-discrimination laws in a state versus its urban population percentage and historical voting trends.
At this step, let’s jump into our model. Utilizing stan_glm(), we consolidate the data with our weakly informative prior understanding to simulate the posterior Normal regression model of laws by percent_urban and historical voting trends. Quickly examining the posterior predictive performance of this simulated model, we compare a histogram of the observed anti-discrimination laws against five datasets simulated from the posterior (shown below); histograms are justified here because our response is a non-negative count rather than a continuous variable. The results are clearly unsatisfactory: the posterior predictions do not align well with the features of the observed data. This may not be so astounding. Taking a closer look at the observed values of our response variable, they are clearly right-skewed, which contradicts the rough symmetry of the simulated posterior Normal regression datasets. Additionally, these simulated datasets produce negative predictions, which is impossible since the number of laws cannot be less than zero.
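The fitting code isn’t shown here, so below is a sketch of what the stan_glm() call might look like. The intercept prior follows the \(N(7, 1.5^2)\) discussion above; the coefficient prior, chain, iteration, and seed settings are assumptions rather than the post’s exact call.
# A sketch of the Normal regression fit (settings are assumptions)
equality_normal_sim <- stan_glm(
  laws ~ percent_urban + historical,
  data = equality, family = gaussian,
  prior_intercept = normal(7, 1.5),
  prior = normal(0, 2.5, autoscale = TRUE),
  chains = 5, iter = 5000*2, seed = 84735)

# Posterior predictive check: observed laws vs. five simulated datasets
pp_check(equality_normal_sim, plotfun = "hist", nreps = 5) +
  xlab("laws")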
A posterior predictive check of the Normal regression model of anti-discrimination laws. A histogram of the observed laws (y) is plotted alongside five posterior simulated datasets (\(y_{rep}\))
The example clearly reveals the hurdles of Normal regression. Fortunately, we have all the resources to handle such problems when the data does not match the Normal distribution well. That is, our goal in this post is to > Extend the Normal regression model of a quantitative response variable \(Y\) to the situation in which \(Y\) is a count variable whose dependence on explanatory variables \(X\) is better captured by the Poisson or Negative Binomial models >
Poisson Regression Model
The Poisson model is appropriate for modeling a discrete count of events (in our example, the enactment of anti-discrimination laws) that occur in a fixed interval of space or time and that, theoretically, have no upper bound. The Poisson is especially handy in cases like ours in which counts are right-skewed. Moving forward, assume a Poisson model for the number of LGBTQ+ anti-discrimination laws \(Y_{i}\) in each state \(i\), where the rate \(\lambda_{i}\) of anti-discrimination laws depends upon the demographic patterns and voting trends \(\left(X_{i1}, X_{i2}, X_{i3}\right)\): \[ Y_{i} \mid \lambda_{i} \stackrel{i n d}{\sim} \operatorname{Pois}\left(\lambda_{i}\right). \]
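To see why the Poisson suits right-skewed counts, consider a small illustrative sketch (not part of the original analysis) comparing its probability mass function at two rates; the rates 2 and 7 are arbitrary choices for illustration:
# Illustrative only: the Poisson pmf is right-skewed for small rates
tibble(y = 0:20) %>%
  mutate(`lambda = 2` = dpois(y, lambda = 2),
         `lambda = 7` = dpois(y, lambda = 7)) %>%
  pivot_longer(-y, names_to = "rate", values_to = "probability") %>%
  ggplot(aes(x = y, y = probability)) +
  geom_col(fill = "darkorange") +
  facet_wrap(~ rate)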
Under the Poisson structure, the expected number of laws \(Y_{i}\) in states with identical predictor values \(X\) is captured by \(\lambda_{i}\) \[ E\left(Y_{i} \mid \lambda_{i}\right) = \lambda_{i} \]
If we proceeded as in Normal regression, we might assume that the average number of laws is a linear combination of our predictors, \(\lambda_{i} = \beta_0 + \beta_{1} X_{i1} + \beta_{2} X_{i2} + \beta_{3} X_{i3}\). This assumption is depicted in the figure below, with one line per historical voting trend. Yet the assumption is flawed: each line eventually dips below zero, whereas a rate \(\lambda_{i}\) must be positive. Poisson regression therefore models the logged average instead, \(\log(\lambda_{i}) = \beta_0 + \beta_{1} X_{i1} + \beta_{2} X_{i2} + \beta_{3} X_{i3}\), or equivalently \(\lambda_{i} = e^{\beta_0 + \beta_{1} X_{i1} + \beta_{2} X_{i2} + \beta_{3} X_{i3}}\), as illustrated in the second figure below.
A graphical depiction of the (incorrect) assumption that, no matter the historical politics, the typical number of state laws is linearly related to urban population. The dashed horizontal line represents the x-axis.
An example relationship between the logged number of laws (left) or number of laws (right) in a state with its urban percentage and historical voting trends.
The model curves on both the \(\log(\lambda)\) and \(\lambda\) scales are defined by the set of \(\beta\) parameters. When depicting the linear model of the logged number of laws in a state, these parameters take on the usual meanings of intercepts and slopes. However, \(\left(\beta_0, \beta_1, \beta_2, \beta_3\right)\) take on new meanings when describing the nonlinear model that underlies the unlogged number of laws in a state.
Interpreting Poisson regression coefficients
Consider two equivalent formulations of the Poisson regression models: \(Y \mid \beta_0, \beta_1, \dots, \beta_p \sim \operatorname{Pois}(\lambda)\) with \[\begin{align} \log(\lambda) &= \beta_0 + \beta_1X_1 + \dots + \beta_p X_{p} \\ \lambda & = e^{\beta_0 + \beta_1X_1 + \dots + \beta_p X_{p}} \end{align}\]
Interpreting \(\beta_0\)
When \(X_1, X_2, \dots, X_{p}\) are all \(0\), \(\beta_0\) is the logged average \(Y\) value and \(e^{\beta_0}\) is the average \(Y\) value.
Interpreting \(\beta_i\), for \(i = 1,2, \dots, p\)
Suppose \(\lambda_{x}\) represents the average \(Y\) value when \(X_1 = x\) and \(\lambda_{x+1}\) represents the average \(Y\) value when \(X_1 = x + 1\). If we control all other predictors \(X_2, \dots, X_{p}\) and increase \(X_1\) by \(1\), from \(x\) to \(x+1\), then \(\beta_1\) is the change in the logged average \(Y\) value and \(e^{\beta_1}\) is the multiplicative change in the (unlogged) average \(Y\) value. That is, \[ \beta_1 = \log(\lambda_{x+1}) - \log(\lambda_{x}) \quad \text{ and } \quad e^{\beta_1} = \frac{\lambda_{x+1}}{\lambda_{x}}. \]
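For a concrete illustration (the coefficient value here is hypothetical, not an estimate from our model): \[ \beta_1 = 0.03 \implies e^{\beta_1} = e^{0.03} \approx 1.030, \] meaning that, holding all other predictors constant, each one-unit increase in \(X_1\) multiplies the average \(Y\) value by about \(1.03\), i.e., a \(3\%\) increase.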
To conclude this section, we recapitulate the assumptions of the Poisson regression model.
Consider the Bayesian Poisson regression model with \(Y_{i} \mid \beta_0, \beta_1, \dots, \beta_p \stackrel{\text{ind}}{\sim} \operatorname{Pois}\left(\lambda_{i}\right)\) where
\[ \log(\lambda_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip}. \] Then the model should satisfy the following assumptions.
- Structure of the data
- Conditioned on predictors \(X\), the observed data \(Y_{i}\) on case \(i\) is independent of the observed data on any other case \(j\)
- Structure of variable \(Y\)
- Response variable \(Y\) has a Poisson structure, i.e., it is a discrete count of events that happen in a fixed interval of space or time
- Structure of the relationship
- The logged average \(Y\) value can be written as a linear combination of the predictors, \[ \log(\lambda_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip}. \]
- Structure of the variability in Y
- A Poisson random variable \(Y\) with rate \(\lambda\) is equi-dispersed, i.e., its mean and variance are equal: \(E(Y) = \operatorname{Var}(Y) = \lambda\). Hence, conditioned on predictors \(X\), the typical value of \(Y\) should be approximately equivalent to the variability in \(Y\). As such, the variability in \(Y\) increases as its mean increases. (A rough empirical check on our own data appears right after this list.)
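One quick, informal way to eyeball the equi-dispersion assumption in our data is to compare the sample mean and variance of laws within each historical voting group. This is only a rough check, not a formal test:
# Rough equi-dispersion check: compare the mean and variance of laws
# within each historical voting group
equality %>%
  group_by(historical) %>%
  summarize(mean_laws = mean(laws), var_laws = var(laws))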
Model Components and Specification
To conduct a Bayesian analysis, we must express our prior beliefs about the unknown parameters of interest, which in our case are the regression coefficients \(\left(\beta_0, \beta_1, \beta_2, \beta_3\right)\). As these coefficients can take on any value on the real line, it is reasonable to assume Normal priors. Moreover, we impose independence among these priors in our model hierarchy. Then, the complete representation of our Poisson regression model of \(Y_i\) is
data: \(\quad Y_{i} \mid \beta_{0}, \beta_{1}, \beta_{2}, \beta_{3} \stackrel{\text { ind }}{\sim} \operatorname{Pois}\left(\lambda_{i}\right)\)
priors: \[
\begin{aligned}
\beta_{0 c} & \sim N\left(2,0.5^{2}\right) \\
\beta_{1} & \sim N\left(0,0.17^{2}\right) \\
\beta_{2} & \sim N\left(0,4.97^{2}\right) \\
\beta_{3} & \sim N\left(0,5.60^{2}\right)
\end{aligned}
\] The first thing to consider is the centered intercept \(\beta_{0c}\). Recall our prior assumption that the average number of laws in a “typical” state is around \(\lambda = 7\). Hence, we set the Normal prior mean for \(\beta_{0c}\) to 2 on the log scale: \[
\log(\lambda) = \log(7) \approx 1.95.
\] Additionally, the scale of this Normal prior expresses our relative uncertainty about this baseline. Though the logged average number of laws is most likely around \(2\), we suspect it may range from \(1\) to \(3\) \((2 \pm 2 \times 0.5)\). Or, more intuitively, we think that the average number of laws in a typical state might be somewhere between \(3\) and \(20\): \[
\left(e^1, e^3\right) \approx (3, 20).
\] Beyond this baseline, we again use weakly informative default priors for \(\left(\beta_1, \beta_2, \beta_3\right)\), tuned by the stan_glm() function. Centered at zero with relatively large standard deviations on the scale of our variables, these priors reflect a general uncertainty about whether and how the number of anti-discrimination laws is associated with a state’s urban population and voting trends.
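As a quick arithmetic sanity check of the numbers above (plain base-R arithmetic, nothing model-specific):
# Check the intercept prior's intuition on the log scale
log(7)        # ~1.95, the prior mean for the centered intercept
exp(c(1, 3))  # ~(2.7, 20.1), a plausible range for the average law count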
To examine whether these combined priors accurately reflect our uncertain understanding of state laws, we’ll simulate 20,000 draws from the prior models of \(\{\beta_{k}\}_{k=0}^{3}\). Specifically, we run the same stan_glm() function that we will later use to simulate the posterior, with two new arguments: prior_PD = TRUE specifies that we wish to simulate the prior, and family = poisson specifies that we are using a Poisson data model (in contrast to the Normal, or gaussian, model).
equality_model_prior <- stan_glm(
  laws ~ percent_urban + historical,
  data = equality,
  family = poisson,
  prior_intercept = normal(2, 0.5),
  prior = normal(0, 2.5, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735,
  prior_PD = TRUE)
A call to prior_summary
confirms the specification of our weakly informative priors above:
prior_summary(equality_model_prior)
## Priors for model 'equality_model_prior'
## ------
## Intercept (after predictors centered)
## ~ normal(location = 2, scale = 0.5)
##
## Coefficients
## Specified prior:
## ~ normal(location = [0,0,0], scale = [2.5,2.5,2.5])
## Adjusted prior:
## ~ normal(location = [0,0,0], scale = [0.17,4.97,5.60])
## ------
## See help('prior_summary.stanreg') for more details
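As a side note on where those adjusted scales come from: with autoscale = TRUE, rstanarm rescales the base scale of 2.5 by each predictor’s variability. A rough reconstruction is sketched below; the gop and swing level names of historical are assumptions based on the dataset’s conventions.
# Rough reconstruction of the autoscaling: the base scale of 2.5 is
# divided by each predictor's sample standard deviation
equality %>%
  summarize(urban_scale = 2.5 / sd(percent_urban),
            gop_scale   = 2.5 / sd(historical == "gop"),
            swing_scale = 2.5 / sd(historical == "swing"))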
Next, we plot 100 prior plausible models, \(e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}\); a sketch of the plotting code appears below. The mess of curves certainly reflects our prior uncertainty! These are all over the map, indicating that the number of laws might increase or decrease with urban population and might or might not differ by historical voting trends.
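The original plotting code isn’t shown, so here is a plausible reconstruction using tidybayes; the 0–100 axis limit and the grouping aesthetics are assumptions.
# A sketch of plotting 100 prior plausible models
set.seed(84735)
equality %>%
  add_epred_draws(equality_model_prior, ndraws = 100) %>%
  ggplot(aes(x = percent_urban, y = laws, color = historical)) +
  geom_line(aes(y = .epred, group = paste(historical, .draw))) +
  ylim(0, 100)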
100 prior plausible models of state anti-discrimination laws
Posterior Simulation
Given the prior and the likelihood (our belief and the evidence collected from the data), we are ready to simulate the Poisson posterior regression model of anti-discrimination laws versus urban population and voting trends. Instead of starting from scratch with the stan_glm() function, we take advantage of the update() function applied to the equality_model_prior simulation. This reuses the same arguments we specified when simulating the prior, except that the prior_PD argument changes to FALSE (i.e., we now wish to simulate the posterior, not the prior).
equality_model <- update(equality_model_prior, prior_PD = FALSE)
Furthermore, we can examine diagnostics of the MCMC simulation. The trace, density, and autocorrelation plots below confirm that our simulation has stabilized:
mcmc_trace(equality_model)
mcmc_dens_overlay(equality_model)
mcmc_acf(equality_model)
Before diving deeper into our analysis of these simulation results, a quick posterior predictive check confirms that we are on the right track. First, histograms of just five posterior simulations of state law data exhibit similar skew, range, and trends as the observed law data. Second, though density plots aren’t the best display of count data, they tell a similar story. Even though these simulations aren’t perfect, they do reasonably capture the features of the observed law data.
set.seed(1)
pp_check(equality_model, plotfun = "hist", nreps = 5) +
xlab("laws")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
A posterior predictive check of the Poisson regression model of anti-discrimination laws: the observed laws \((y)\) compared to five posterior simulated datasets \((y_{rep})\) via histograms.
pp_check(equality_model) +
xlab("laws")
A posterior predictive check of the Poisson regression model of anti-discrimination laws: the observed laws \((y)\) compared to 50 posterior simulated datasets \((y_{rep})\) via density plots.
TO BE CONTINUED !!!