Censoring, Truncation, and Panels

Christopher Weber

2025-11-17

Introduction

  • The Poisson Regression Model (PRM)
  • A strong assumption: \(E(y)=var(y)\)
  • Overdispersion
  • The negative binomial
  • Zero counts
  • Truncated regression
  • Zero inflation and hurdle models

Introduction

  • Truncation means that observations falling above (or below) some particular value(s) are ignored; they are missing from the data

  • E.g., The impact of ideology on dollars spent during an election cycle, among general election candidates

  • Truncation at zero, for instance

  • Censoring means that observations falling above (or below) some particular value(s) are recorded at a threshold value

  • With truncated or censored counts, the standard approach of estimating a PRM or ZINB regression is incorrect

  • For truncation, the wrong PDF is used

  • For censoring, the expected value is biased
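A minimal sketch of the distinction, using made-up values and an assumed threshold of zero. Truncation drops the affected observations, censoring keeps them at the threshold, and in both cases the sample mean moves away from the mean of the underlying variable.

```python
# Hypothetical latent outcomes; the threshold of zero is an assumption
# chosen to match the truncation-at-zero example.
latent = [-3.0, -1.0, 0.0, 2.0, 4.0, 6.0]
threshold = 0.0

# Truncation from below: observations at or below the threshold are missing.
truncated = [y for y in latent if y > threshold]

# Censoring from below: observations below the threshold are recorded at it.
censored = [max(y, threshold) for y in latent]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

print("latent mean:   ", mean(latent))
print("truncated mean:", mean(truncated))
print("censored mean: ", mean(censored))
```

Truncation shrinks the sample, while censoring keeps every row but piles observations up at the threshold.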

For instance

  • A zero count. The probability of a zero count under the PRM is not zero; it depends on \(x_i\) through \(\mu_i\), \[p(y_i=0|x_i)=exp(-\mu_i)\]
  • A nonzero count, \[p(y_i>0|x_i)=1-exp(-\mu_i)\]
  • And a Poisson distribution truncated at zero, which conditions on \(y_i > 0\), \[p(y_i|y_i>0, x_i)=\frac{exp(-\mu_i)\mu_i^{y_i}}{y_i!\,(1-exp(-\mu_i))}\]
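The three expressions above can be checked numerically. This is a small sketch with an assumed mean of 1.5; the function names are illustrative. Dividing by \(1-exp(-\mu)\) rescales the Poisson probabilities so that they sum to one over the positive counts.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y given mean mu."""
    return math.exp(-mu) * mu**y / math.factorial(y)

def zero_truncated_pmf(y, mu):
    """Poisson probability of y, conditional on y > 0."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, mu) / (1.0 - math.exp(-mu))

mu = 1.5  # hypothetical mean, chosen only for illustration
p_zero = poisson_pmf(0, mu)
total = sum(zero_truncated_pmf(y, mu) for y in range(1, 60))
print("Pr(y = 0) under the PRM:", round(p_zero, 4))
print("Sum of truncated probabilities over y >= 1:", round(total, 6))
```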

When Zeros are Observed

  • Imagine you are completing a project on casualties in military conflict
  • Your data is panel data, which includes a large sample of countries over many years
  • You have a dataset with a lot of zeros
  • What is a zero and why do we observe it?
  • Superior defenses, nature of war, no boots on the ground, and/or no conflict

Mixed Processes

\[ Pr(y = 0 | p, \lambda) = Pr(\text{Not at War}) + Pr(\text{At War}) \times Pr(y = 0 | \lambda) \]

  • The probability of observing zero casualties is a function of two processes:
    1. The probability of not being at war (structural zero)
    2. The probability of being at war and observing zero casualties (sampling zero)
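A short numerical sketch of the two sources of zeros. The probability of not being at war (0.6) and the Poisson rate (2.0) are illustrative assumptions, not estimates.

```python
import math

def prob_zero(p_not_at_war, lam):
    """Pr(y = 0): structural zeros plus sampling zeros from a Poisson(lam)."""
    structural = p_not_at_war
    sampling = (1.0 - p_not_at_war) * math.exp(-lam)
    return structural + sampling

p_zero = prob_zero(p_not_at_war=0.6, lam=2.0)  # illustrative values
poisson_only = math.exp(-2.0)                  # zeros if every country were at war
print(round(p_zero, 4), round(poisson_only, 4))
```

With these values, most zeros are structural, and a Poisson model alone would badly understate how often zero casualties are observed.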

Zero Generating

  • Imagine a two-stage process, a zero-stage and a count-stage. This allows us to model excess zeros, in that we can consider the probability of a count, weighted by the likelihood of being in the count stage. It’s useful to think of this as a hurdle process.

Zero stage

  • \(\theta_i\) is the probability that \(y_i=0\) and \(1-\theta_i\) is the probability that \(y_i>0\)
  • Model 0/1 using a logit or probit regression \[\theta_i=F(z_i\gamma)\]

Count Generating

Count Stage

\[pr(y_i=0|x_i)=\theta_i+(1-\theta_i)exp(-\mu_i)\]

Note

  • The zero count is a composite of \(\theta_i\), from the zero process (e.g., lack of conflict), as well as the count process itself, \[(1-\theta_i)exp(-\mu_i)\]

  • Nonzero values are,

\[pr(y_i|x_i)=(1-\theta_i){{exp(-\mu_i)\mu_i^{y_i}}\over{y_i!}}\]

Dual Processes

  • The count process weighted by the probability of a non-zero. \[ L(\theta, \mu \mid y) = \prod_{i=1}^{n} \left[ \theta_i \mathbb{I}(y_i = 0) + (1 - \theta_i) {{exp(-\mu_i)\mu_i^{y_i}}\over{y_i!}} \right] \]
  • \(\theta_i\) is the probability of a zero for the ith observation.
  • \(\mathbb{I}(y_i = 0)\) is an indicator function; it is just a binary indicator that equals 1 when \(y_i = 0\), and 0 otherwise
  • So, when \(y_i = 0\), the contribution to the likelihood is \(\theta_i + (1-\theta_i)pr(\text{Poisson}=0)\)
  • But when \(y_i > 0\) the contribution to the likelihood is \((1-\theta_i)pr(\text{Poisson}=y_i)\)

\[\log L(\theta, \mu \mid y) = \sum_{i=1}^{n} \log \left[ \theta_i \mathbb{I}(y_i = 0) + (1 - \theta_i) {{exp(-\mu_i)\mu_i^{y_i}}\over{y_i!}} \right]\]

  • Each data row’s contribution to the (log) likelihood is a mixture of the zero process and the count process.
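The log-likelihood can be written almost line for line as code. This is a sketch, not a fitting routine: the counts are made up, and \(\theta_i\) and \(\mu_i\) are held constant for simplicity rather than being computed from covariates.

```python
import math

def zip_loglik(y, theta, mu):
    """Zero-inflated Poisson log-likelihood.

    y, theta, mu are equal-length lists: counts, zero-stage probabilities,
    and Poisson means, one entry per observation.
    """
    total = 0.0
    for y_i, theta_i, mu_i in zip(y, theta, mu):
        poisson = math.exp(-mu_i) * mu_i**y_i / math.factorial(y_i)
        indicator = 1.0 if y_i == 0 else 0.0
        total += math.log(theta_i * indicator + (1.0 - theta_i) * poisson)
    return total

y = [0, 0, 3, 1, 0]        # made-up counts
theta = [0.4] * len(y)     # constant zero-stage probability (assumption)
mu = [1.5] * len(y)        # constant Poisson mean (assumption)
print(zip_loglik(y, theta, mu))
```

In a full model, \(\theta_i=F(z_i\gamma)\) and \(\mu_i\) would be functions of the covariates, and an optimizer would search over the coefficients to maximize this quantity.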

The Hurdle Model

  • The zero-altered regression model

  • Predict a zero count

\[\theta_i=F(z_i\gamma)\]

  • Model the nonzero equation with a truncated Poisson (or negative binomial)

    \[pr(y_i|x_i)=(1-\theta_i)\frac{exp(-\mu_i)\mu_i^{y_i}}{y_i!\,(1-exp(-\mu_i))}, \quad y_i>0\]
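A sketch comparing the hurdle probabilities with the zero-inflated ones, using illustrative values for \(\theta\) and \(\mu\). In the hurdle model every zero comes from the zero stage, so \(pr(y=0)=\theta\) exactly; in the zero-inflated model the count stage contributes additional zeros.

```python
import math

def hurdle_pmf(y, theta, mu):
    """Hurdle Poisson: theta is Pr(y = 0); positives follow a zero-truncated Poisson."""
    if y == 0:
        return theta
    truncated = math.exp(-mu) * mu**y / (math.factorial(y) * (1.0 - math.exp(-mu)))
    return (1.0 - theta) * truncated

def zip_pmf(y, theta, mu):
    """Zero-inflated Poisson, for comparison: zeros come from both stages."""
    poisson = math.exp(-mu) * mu**y / math.factorial(y)
    return (theta if y == 0 else 0.0) + (1.0 - theta) * poisson

theta, mu = 0.3, 2.0  # illustrative values
print("Pr(y = 0): hurdle", hurdle_pmf(0, theta, mu), "ZIP", round(zip_pmf(0, theta, mu), 4))
print("hurdle probabilities sum to", round(sum(hurdle_pmf(y, theta, mu) for y in range(60)), 6))
```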

Zero Inflation

Zero Counts

  • Zero counts are common in count data
  • They may arise from a Poisson or negative binomial process
  • Or they may be observed for entirely separate reasons
  • Theory should guide the decision to model zero counts

Multilevel Structures and the Panel Design

  • Panel data are common in political science

  • Unlike cross sectional data, units are repeatedly observed

  • Some designs are “cross sectionally” dominant, others are “time series” dominant

  • The Time-Series Cross-Section (TSCS) design

\[y_{it}=\beta_0+\beta_1 x_{it}+e_{it}\]

Fixed Effects

\[ y_{it}=\beta_0+\beta_1 x_{it}+e_{it}\]

  • \(y_{it}\) is the observation for the ith unit at time t

  • If there is a lot of variation across units, we should account for this variation.

Fixed Effects

  • The intercepts vary across units (e.g., countries, states, individuals)

\[ y_{it}=\beta_0+\beta_1 x_{it}+\sum_{i=1}^{N-1} \gamma_{i} d_{i}+ e_{it}\]

  • \(d_i\) denotes a dummy variable for the “unit”

  • This is the fixed effects estimator, and it captures the extent to which heterogeneity in the “units” (the \(i\) intercepts) influences \(y\) alongside \(x\).

  • The fixed effects estimator is also called the least squares dummy variable (LSDV) estimator (Hsiao 2022), and the within effects estimator.

  • An equivalent approach is to remove the unit means from both \(y\) and \(x\) (the within transformation), which sweeps out the unit intercepts.

\[ y_{it} - \bar{y}_i=\beta_1 (x_{it}-\bar{x}_i)+ (e_{it}-\bar{e}_i)\]
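A small sketch of the within transformation on a hypothetical, noise-free panel (three units, three periods, true slope of 2). The unit intercepts are constructed to be correlated with \(x\), so the pooled slope is distorted while the demeaned regression recovers the slope. The helper names are illustrative.

```python
def ols_slope(x, y):
    """Slope from a bivariate least squares regression of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

def demean_by_unit(values, units):
    """Subtract each unit's mean from its own observations."""
    sums, counts = {}, {}
    for v, u in zip(values, units):
        sums[u] = sums.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - sums[u] / counts[u] for v, u in zip(values, units)]

# Hypothetical data: unit intercepts 0, 5, 10 rise with the level of x.
units = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
intercepts = {0: 0.0, 1: 5.0, 2: 10.0}
y = [intercepts[u] + 2.0 * x_it for u, x_it in zip(units, x)]

pooled = ols_slope(x, y)  # ignores the unit intercepts
within = ols_slope(demean_by_unit(x, units), demean_by_unit(y, units))
print("pooled slope:", pooled, "within slope:", within)
```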

Fixed Effects and Lags

  • Panel data are often leveraged to examine change over time, i.e., autoregressive effects.

  • This model includes lagged dependent variables as predictors

  • If there is substantial heterogeneity across units, we should account for this variation.

  • Ignoring it will bias parameter estimates.

Fixed Effects and Lags

\[ \begin{matrix} y_{it} & = \alpha + \beta_{y}y_{it-1} + e_{it}\\ e_{it} & = s_i + u_{y,it}\\ \end{matrix} \]

In this model, \(\widehat\beta\) corresponds to

\[ \begin{matrix} \widehat\beta_y &=& {cov(y_{it}, y_{it-1}) \over var(y_{it-1})}\\ &=& {cov(\beta_y y_{it-1} + s_i + u_{y,it}, y_{it-1}) \over var(y_{it-1})}\\ &=& \frac{cov(\beta_y y_{it-1}, y_{it-1}) + cov(s_i, y_{it-1}) + cov(u_{y,it}, y_{it-1})}{var(y_{it-1})}\\ (\text{Exogeneity}) &=& \frac{\beta_y \cdot var(y_{it-1}) + cov(s_i, y_{it-1})}{var(y_{it-1})}\\ &=& \beta_y + \frac{cov(s_i, y_{it-1})}{var(y_{it-1})}\\ \end{matrix} \]
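The bias term can be illustrated with a simulation. This is a sketch under stated assumptions: a true coefficient of 0.5, unit effects and shocks that are standard normal, and a burn-in so the series is approximately stationary. The "implied" value plugs the stationary variances into the final line of the derivation.

```python
import random

random.seed(42)
beta_y = 0.5                              # true autoregressive coefficient (assumed)
n_units, n_periods, burn_in = 500, 20, 50

lagged, current = [], []
for _ in range(n_units):
    s_i = random.gauss(0.0, 1.0)          # time-invariant unit effect
    y_prev = s_i / (1.0 - beta_y)         # start at the unit's long-run mean
    for t in range(burn_in + n_periods):
        y_now = beta_y * y_prev + s_i + random.gauss(0.0, 1.0)
        if t >= burn_in:                  # keep only post burn-in periods
            lagged.append(y_prev)
            current.append(y_now)
        y_prev = y_now

def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((p - a_bar) * (q - b_bar) for p, q in zip(a, b)) / len(a)

# Pooled estimate that ignores the unit effects s_i.
beta_hat = cov(current, lagged) / cov(lagged, lagged)

# Large-sample value from the derivation with var(s) = var(u) = 1.
var_y = 1.0 / (1.0 - beta_y) ** 2 + 1.0 / (1.0 - beta_y ** 2)
implied = beta_y + (1.0 / (1.0 - beta_y)) / var_y
print("pooled estimate:", round(beta_hat, 3), "implied by derivation:", round(implied, 3))
```

The pooled estimate should land well above the true 0.5, because the unit effect is positively correlated with the lagged outcome.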

Fixed Effects and Lags

\[ \begin{matrix} \widehat\beta_y &=& {cov(y_{it}, y_{it-1}) \over var(y_{it-1})}\\ &=& {cov(\beta_y y_{it-1} + s_i + u_{y,it}, y_{it-1}) \over var(y_{it-1})}\\ \end{matrix} \]

  • The result is even more general in that the bias will exist for \(x\) variables that are correlated with the unit effects, \(s_i\).

  • What are “unit effects”? They are just the unit means averaged over time (e.g., country means)

  • Important Note: The fixed effects estimator does not account for time varying unobserved heterogeneity. It only accounts for time invariant unobserved heterogeneity.

  • Another Important Note: The fixed effects estimator with a lagged dependent variable – the dynamic panel design – produces biased estimates of the lagged dependent variable coefficient (Nickell 1981). The bias decreases as \(T\) increases.

Random Effects

  • An alternative to the fixed effects estimator is the random effects estimator

  • In the random effects model, the intercepts are drawn from a probability distribution, rather than estimated with \(J-1\) dummy variables (i.e., unit averages)

  • The nested logic is the same; the unit of observation is the ith observation nested within the jth higher level unit (e.g., country-time nested in countries)

The Random Intercept Model

Level 1 (within-group): \[ y_{ij} = \beta_{0j} + \beta_1 x_{ij} + \epsilon_{ij} \] Level 2 (between-group): \[ \beta_{0j} = \gamma_{0} + u_{0j} \] where:

\[ \begin{aligned} \epsilon_{ij} &\sim N(0, \sigma^2) \\ u_{0j} &\sim N(0, \tau_{0}) \end{aligned} \]

Reduced form: \[ y_{ij} = \gamma_{0} + \beta_1 x_{ij} + (u_{0j} + \epsilon_{ij}) \quad \text{(composite error)} \]

The Random Intercept Model

  • The random intercept model captures variation at two levels: within groups (level 1) and between groups (level 2)

  • It’s an Analysis of Variance (ANOVA), partitioning between unit and within unit variation

  • The intraclass correlation coefficient (ICC) measures the proportion of variance at the group level

The Random Intercept Model

\[ y_{it} = \gamma_{0} + \beta_1 x_{it} + (u_{0i} + \epsilon_{it}) \quad \text{(composite error)} \]

Variance components:

\[ \begin{aligned} u_{0i} &\sim N(0, \tau_{0}) \quad \text{(between-group variance)} \\ \epsilon_{it} &\sim N(0, \sigma^2) \quad \text{(within-group variance)} \end{aligned} \]

Total variance: \[ \text{Var}(y_{it}) = \sigma^2 + \tau_{0} = var(\text{Within}) + var(\text{Between}) \]

Intraclass correlation:

\[ \rho = \frac{\tau_{0}}{\tau_{0} + \sigma^2} \]
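A sketch of why \(\rho\) is called a correlation. The variance components (1 and 3) are illustrative assumptions. Two observations that share a random intercept should be correlated at roughly \(\tau_0/(\tau_0+\sigma^2)\), which the simulation checks.

```python
import random

def icc(tau0, sigma2):
    """Share of total variance that lies between groups."""
    return tau0 / (tau0 + sigma2)

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((p - a_bar) * (q - b_bar) for p, q in zip(a, b))
    s_aa = sum((p - a_bar) ** 2 for p in a)
    s_bb = sum((q - b_bar) ** 2 for q in b)
    return s_ab / (s_aa * s_bb) ** 0.5

tau0, sigma2 = 1.0, 3.0  # illustrative between- and within-group variances
rho = icc(tau0, sigma2)

# Simulate pairs of observations that share a group intercept.
random.seed(7)
first, second = [], []
for _ in range(20000):
    u_0 = random.gauss(0.0, tau0 ** 0.5)  # random intercept for this group
    first.append(u_0 + random.gauss(0.0, sigma2 ** 0.5))
    second.append(u_0 + random.gauss(0.0, sigma2 ** 0.5))

print("ICC from variance components:", rho)
print("simulated within-group correlation:", round(correlation(first, second), 3))
```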

Adding Predictors

  • Predictors can be added at both levels of the model
  • Differentiate time variant (\(x_{it}\)) and time invariant (\(x_{i}\)) predictors

\[\begin{aligned} y_{it}&=b_{0,i}+b_{1} x_{it}+e_{1,it}\\ b_{0,i}&=\omega_0+\omega_1 x_{i}+e_{2,i}\\ e_{1,it} &\sim N(0, \sigma_1^2)\\ e_{2,i} &\sim N(0, \sigma_2^2) \end{aligned}\]

  • \(x_{it}\) consists of variables that vary between units and waves; \(x_{i}\) consists of variables that only vary between units.

Adding Random Coefficients

  • Predictors can be added at both levels of the model
  • Differentiate time variant (\(x_{it}\)) and time invariant (\(x_{i}\)) predictors

\[\begin{aligned} y_{it}&=b_{0,i}+b_{1,i} x_{it}+e_{1,it}\\ b_{0,i}&=\omega_0+\omega_1 x_{i}+e_{2,i}\\ b_{1,i}&=\omega_2+\omega_3 x_{i}+e_{3,i}\\ e_{1,it} &\sim N(0, \sigma_1^2)\\ e_{2,i} &\sim N(0, \sigma_2^2)\\ e_{3,i} &\sim N(0, \sigma_3^2) \end{aligned}\]

  • \(x_{it}\) consists of variables that vary between units and waves; \(x_{i}\) consists of variables that only vary between units.

Variance components:

\[ \begin{aligned} e_{1,it} &\sim N(0, \sigma_1^2) \quad \text{(level-1 residual variance)} \\ e_{2,i} &\sim N(0, \sigma_2^2) \quad \text{(random intercept variance)} \\ e_{3,i} &\sim N(0, \sigma_3^2) \quad \text{(random slope variance)} \end{aligned} \]

Pooling: A Continuum

  • Let’s situate the random intercepts/coefficients in a broader structure.
  • No pooling model. This is the fixed effects model above, in which each level-2 unit has a unique mean value.
  • Complete pooling. This is the regression model with no level 2 estimated means. Instead, we assume the level-2 units completely pool around a common intercept (and perhaps slope). Formally, compare

\[y_{j,i}=\beta_0+\sum_{j=1}^{J-1} \gamma_{j} d_j+ e_{j,i}\]

\[y_{j,i}=\beta_0+ e_{j,i}\]

Partial Pooling

\[ y_{j[i]}=b_{0,j[i]}+e_{1,i} \]

\[ b_{0,j}=\frac{\bar{y}_j\times n_j/\sigma^2_y+\bar{y}_{all}\times 1/\sigma^2_{b_0}}{n_j/\sigma^2_y+ 1/\sigma^2_{b_0}} \]

  • The first part of the numerator represents the movement away from a common mean. Note that as \(n_j\) (the group size) increases, the estimate is pulled further from the common mean (the second term in the numerator) and toward the group mean.

  • As \(n_j\) increases, the estimated group mean is influenced more by the data from group \(j\) than by the common mean.

  • As \(n_j\) decreases (small groups), the estimate is pulled more strongly toward a single pooled value.

Partial Pooling

\[ b_{0,j}=\frac{\bar{y}_j\times n_j/\sigma^2_y+\bar{y}_{all}\times 1/\sigma^2_{b_0}}{n_j/\sigma^2_y+ 1/\sigma^2_{b_0}} \]

  • As the within group variance increases, the group mean is pulled towards the pooled mean

  • As the between group variance increases, the common mean exerts a smaller impact

  • The values in the numerator are then weighted by the variation between and within level-2 units
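The weighting can be sketched as a small function. The inputs are illustrative assumptions (a group mean of 10, an overall mean of 0), and the function name is hypothetical. The three calls vary the group size and the within group variance to show the direction of the pull.

```python
def partial_pool(ybar_j, n_j, ybar_all, sigma2_y, sigma2_b0):
    """Precision-weighted compromise between a group mean and the overall mean."""
    w_group = n_j / sigma2_y   # weight on the group mean
    w_all = 1.0 / sigma2_b0    # weight on the overall mean
    return (ybar_j * w_group + ybar_all * w_all) / (w_group + w_all)

# Small group, large group, and a small group with more within group variance.
small = partial_pool(10.0, n_j=2, ybar_all=0.0, sigma2_y=4.0, sigma2_b0=1.0)
large = partial_pool(10.0, n_j=200, ybar_all=0.0, sigma2_y=4.0, sigma2_b0=1.0)
noisy = partial_pool(10.0, n_j=2, ybar_all=0.0, sigma2_y=16.0, sigma2_b0=1.0)
print(round(small, 3), round(large, 3), round(noisy, 3))
```

The large group stays close to its own mean, the small group is shrunk toward the overall mean, and the noisy small group is shrunk even further.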

Partial Pooling

  • The intraclass correlation (ICC)

\[ICC=\sigma^2_{b_0}/[\sigma^2_{b_0}+\sigma^2_{y}]\]

Recall,

\[\sigma^2_{all}=\sigma^2_{b_0}+\sigma^2_{y}\]

  • Thus, the ICC estimates how much of the total variation in \(y\) is due to variation between level-2 units, relative to variation within them (across level-1 units)

Some Practical Advice

  • The ICC should decrease as you include level-2 predictors; compare to a model without predictors

  • Interpretation of the level-2 expected values (i.e., the group means) is based on a compromise between the pooled and no pooling models

  • If we estimate a regression model with a dummy for every level-2 unit and level-2 predictors, the model is not identified because the variables will be collinear (Gelman and Hill 2009, 269)