2025-11-17
Truncation means that observations falling above (or below) some threshold value are excluded from the data entirely; they are missing
E.g., The impact of ideology on dollars spent during an election cycle, among general election candidates
Truncation at zero, for instance
Censoring means that observations falling above (or below) some threshold value are recorded at the threshold value
The standard approach of estimating a PRM or ZINB regression is incorrect
For truncation, the wrong PDF is used
For censoring, the expected value is biased
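To make the "wrong PDF" point concrete, here is a minimal sketch, assuming a Poisson count that is truncated at zero (cases with \(y = 0\) never enter the data). The function names are illustrative and not taken from any library. The truncated PMF rescales the ordinary Poisson PMF by \(1 - Pr(y = 0)\), so that it sums to one over the values that can actually be observed.

```python
import math

def poisson_pmf(y, mu):
    """Ordinary Poisson probability Pr(Y = y) with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zero_truncated_poisson_pmf(y, mu):
    """Pr(Y = y | Y > 0): the Poisson PMF rescaled to exclude zero."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, mu) / (1.0 - math.exp(-mu))

mu = 1.5  # assumed mean, for illustration only
# Mass the untruncated PMF assigns to the observable values y >= 1.
untruncated_mass = sum(poisson_pmf(y, mu) for y in range(1, 60))
# Mass the truncated PMF assigns to the same values.
truncated_mass = sum(zero_truncated_poisson_pmf(y, mu) for y in range(1, 60))
```

The untruncated PMF leaves part of its probability on a value that cannot occur in the sample, which is why fitting it to truncated data uses the wrong density.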
\[ \Pr(y = 0 \mid p, \lambda) = \Pr(\text{Not at War}) + \Pr(\text{At War}) \times \Pr(y = 0 \mid \lambda) \]
Zero stage
Count Stage
\[pr(y_i=0|x_i)=\theta_i+(1-\theta_i)\exp(-\mu_i)\]
Note
The zero count is a composite of \(\theta_i\), from the zero process (e.g., lack of conflict), and the probability of a zero from the count process itself, \[(1-\theta_i)\exp(-\mu_i)\]
Non zero values are,
\[pr(y_i|x_i)=(1-\theta_i)\frac{\exp(-\mu_i)\mu_i^{y_i}}{y_i!}, \quad y_i > 0\]
\[\log L(\theta, \mu \mid y) = \sum_{i=1}^{n} \log \left[ \theta_i \mathbb{I}(y_i = 0) + (1 - \theta_i) \frac{\exp(-\mu_i)\mu_i^{y_i}}{y_i!} \right]\]
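The log-likelihood above can be sketched in a few lines of Python. This is a minimal illustration, not an estimation routine: it takes the zero-stage probabilities and count-stage means as given lists, whereas in a fitted model they would be functions of covariates. The function name and the toy data are assumptions made for the example.

```python
import math

def zip_loglik(y, theta, mu):
    """Zero-inflated Poisson log-likelihood.

    y     : list of observed counts
    theta : list of zero-stage probabilities (the always-zero regime)
    mu    : list of Poisson means from the count stage
    """
    total = 0.0
    for y_i, th_i, mu_i in zip(y, theta, mu):
        poisson = math.exp(-mu_i) * mu_i ** y_i / math.factorial(y_i)
        # The indicator adds theta_i only when the observed count is zero.
        inflate = th_i if y_i == 0 else 0.0
        total += math.log(inflate + (1.0 - th_i) * poisson)
    return total

y = [0, 0, 3, 1]  # made-up counts
ll_zip = zip_loglik(y, theta=[0.3] * 4, mu=[2.0] * 4)
# With theta = 0 the expression collapses to the ordinary Poisson log-likelihood.
ll_poisson = zip_loglik(y, theta=[0.0] * 4, mu=[2.0] * 4)
```

Setting every \(\theta_i\) to zero recovers the plain Poisson model, which is a quick check that the two stages are combined correctly.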
The zero-altered (hurdle) regression model
Predict a zero count
\[\theta_i=F(z_i\gamma)\]
Model the nonzero counts with a truncated Poisson (or negative binomial)
\[pr(y_i|x_i)=(1-\theta_i)\frac{\exp(-\mu_i)\mu_i^{y_i}}{y_i!\,(1-\exp(-\mu_i))}, \quad y_i > 0\]
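A short sketch of the two-part model above (often called a hurdle model), assuming a Poisson count stage. All zeros come from the binary stage, and positive counts follow a zero-truncated Poisson scaled by \(1-\theta_i\). The function name and parameter values are hypothetical.

```python
import math

def hurdle_poisson_pmf(y, theta, mu):
    """Pr(Y = y) under a Poisson hurdle model.

    theta : probability of a zero (from the binary stage)
    mu    : mean parameter of the untruncated Poisson
    """
    if y == 0:
        return theta
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    # Positive counts follow a zero-truncated Poisson, scaled by 1 - theta.
    return (1.0 - theta) * poisson / (1.0 - math.exp(-mu))

theta, mu = 0.4, 2.0  # assumed values, for illustration only
total_mass = sum(hurdle_poisson_pmf(y, theta, mu) for y in range(0, 80))
```

Dividing by \(1-\exp(-\mu_i)\) is what keeps the probabilities summing to one; without that term the positive counts would carry too little mass.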
Panel data are common in political science
Unlike cross sectional data, units are repeatedly observed
Some designs are “cross sectionally” dominant, others are “time series” dominant
The Time-Series Cross-Section (TSCS) design
\[y_{it}=\beta_0+\beta_1 x_{it}+e_{it}\]
Fixed Effects
\[ y_{it}=\beta_0+\beta_1 x_{it}+e_{it}\]
\(y_{it}\) is the observation for the \(i\)th unit at time \(t\)
If there is a lot of variation across units, we should account for this variation.
\[ y_{it}=\beta_0+\beta_1 x_{it}+\sum_{i=1}^{N-1} \gamma_{i} d_{i}+ e_{it}\]
\(d_i\) denotes a dummy variable for the “unit”
This is the fixed effects estimator, and it captures the extent to which heterogeneity in the “units” – the \(i\) intercepts – influences \(y\) alongside \(x\).
The fixed effects estimator is also called the least squares dummy variable (LSDV) estimator (Hsiao 2022), and the within effects estimator.
An equivalent approach is to remove the unit means from \(y\) and \(x\); the intercept and any time-invariant unit effects drop out.
\[ y_{it} - \bar{y}_i=\beta_1 (x_{it} - \bar{x}_i)+ (e_{it} - \bar{e}_i)\]
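The demeaning approach can be sketched as follows. This is a toy illustration with made-up, noise-free data in which the unit intercepts are correlated with \(x\); the function names are hypothetical. The within slope uses only deviations from each unit's own means, while the pooled slope ignores the unit structure.

```python
def within_slope(units, x, y):
    """Slope from regressing unit-demeaned y on unit-demeaned x."""
    groups = {}
    for u, x_it, y_it in zip(units, x, y):
        groups.setdefault(u, []).append((x_it, y_it))
    num = den = 0.0
    for obs in groups.values():
        x_bar = sum(p[0] for p in obs) / len(obs)
        y_bar = sum(p[1] for p in obs) / len(obs)
        for x_it, y_it in obs:
            num += (x_it - x_bar) * (y_it - y_bar)
            den += (x_it - x_bar) ** 2
    return num / den

def pooled_slope(x, y):
    """Ordinary least squares slope that ignores the unit structure."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = sum((a - x_bar) ** 2 for a in x)
    return num / den

# Toy panel: y = unit intercept + 2 * x, with no noise.
# The unit with larger x has the smaller intercept, so x correlates with the unit effect.
units = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
intercepts = {"A": 10.0, "B": 0.0}
y = [intercepts[u] + 2.0 * x_it for u, x_it in zip(units, x)]

b_within = within_slope(units, x, y)  # recovers the slope of 2
b_pooled = pooled_slope(x, y)         # negative here: the wrong sign
```

In this constructed example the pooled slope has the wrong sign because differences between units swamp the within-unit relationship, which is the situation the fixed effects estimator is meant to handle.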
Panel data are often leveraged to examine change over time, i.e., autoregressive effects.
This model includes lagged dependent variables as predictors
If there is substantial heterogeneity across units, we should account for this variation.
Ignoring it will bias parameter estimates.
\[ \begin{aligned} y_{it} &= \alpha + \beta_{y}y_{it-1} + e_{it}\\ e_{it} &= s_i + u_{y,it} \end{aligned} \]
In this model, \(\widehat\beta\) corresponds to
\[ \begin{aligned} \widehat\beta_y &= \frac{cov(y_{it}, y_{it-1})}{var(y_{it-1})}\\ &= \frac{cov(\beta_y y_{it-1} + s_i + u_{y,it},\, y_{it-1})}{var(y_{it-1})}\\ &= \frac{cov(\beta_y y_{it-1}, y_{it-1}) + cov(s_i, y_{it-1}) + cov(u_{y,it}, y_{it-1})}{var(y_{it-1})}\\ &= \frac{\beta_y \cdot var(y_{it-1}) + cov(s_i, y_{it-1})}{var(y_{it-1})} \quad \text{(exogeneity: } cov(u_{y,it}, y_{it-1}) = 0\text{)}\\ &= \beta_y + \frac{cov(s_i, y_{it-1})}{var(y_{it-1})} \end{aligned} \]
The result is even more general in that the bias will exist for \(x\) variables that are correlated with the unit effects, \(s_i\).
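A small simulation can illustrate the bias term \(cov(s_i, y_{it-1})/var(y_{it-1})\). This is a sketch under assumed values (true coefficient 0.5, unit effects and shocks both standard normal, 500 units, 10 periods); the function names, the seed, and the burn-in length are choices made for the example.

```python
import random

def simulate_panel(n_units, n_periods, beta, seed=0, burn_in=50):
    """Simulate y_it = beta * y_it-1 + s_i + u_it and return one series per unit."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        s_i = rng.gauss(0.0, 1.0)  # time-invariant unit effect
        y_prev = 0.0
        series = []
        for t in range(burn_in + n_periods + 1):
            y_now = beta * y_prev + s_i + rng.gauss(0.0, 1.0)
            if t >= burn_in:       # discard the start-up periods
                series.append(y_now)
            y_prev = y_now
        panel.append(series)
    return panel

def pooled_lag_slope(panel):
    """cov(y_it, y_it-1) / var(y_it-1), ignoring the unit effects."""
    lag = [s[t - 1] for s in panel for t in range(1, len(s))]
    cur = [s[t] for s in panel for t in range(1, len(s))]
    lag_bar, cur_bar = sum(lag) / len(lag), sum(cur) / len(cur)
    cov = sum((a - lag_bar) * (b - cur_bar) for a, b in zip(lag, cur))
    var = sum((a - lag_bar) ** 2 for a in lag)
    return cov / var

true_beta = 0.5
panel = simulate_panel(n_units=500, n_periods=10, beta=true_beta)
beta_hat = pooled_lag_slope(panel)
```

Under these assumed variances the derivation implies an asymptotic bias of 0.375, so the pooled estimate should land well above the true value of 0.5.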
What are “unit effects”? They are just the unit means averaged over time (e.g., country means)
Important Note: The fixed effects estimator does not account for time varying unobserved heterogeneity. It only accounts for time invariant unobserved heterogeneity.
Another Important Note: The fixed effects estimator with a lagged dependent variable – the dynamic panel design – produces biased estimates of the lagged dependent variable coefficient (Nickell 1981). The bias decreases as \(T\) increases.
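The dependence of this bias on \(T\) can be illustrated with a simulation. This is a sketch under assumed values (true coefficient 0.5, standard normal unit effects and shocks, 1,000 units); the function names, seed, and burn-in are choices made for the example. It applies the within (fixed effects) estimator to a short panel and a long panel.

```python
import random

def simulate_panel(n_units, n_periods, beta, seed=1, burn_in=50):
    """Simulate y_it = beta * y_it-1 + s_i + u_it and return one series per unit."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        s_i = rng.gauss(0.0, 1.0)  # time-invariant unit effect
        y_prev = 0.0
        series = []
        for t in range(burn_in + n_periods + 1):
            y_now = beta * y_prev + s_i + rng.gauss(0.0, 1.0)
            if t >= burn_in:       # discard the start-up periods
                series.append(y_now)
            y_prev = y_now
        panel.append(series)
    return panel

def within_lag_slope(panel):
    """Fixed effects (within) estimate of the lagged-y coefficient."""
    num = den = 0.0
    for s in panel:
        lag, cur = s[:-1], s[1:]
        lag_bar = sum(lag) / len(lag)
        cur_bar = sum(cur) / len(cur)
        for a, b in zip(lag, cur):
            num += (a - lag_bar) * (b - cur_bar)
            den += (a - lag_bar) ** 2
    return num / den

true_beta = 0.5
b_short_T = within_lag_slope(simulate_panel(1000, 5, true_beta))   # T = 5
b_long_T = within_lag_slope(simulate_panel(1000, 50, true_beta))   # T = 50
```

With only five periods the within estimate should fall far below 0.5, while with fifty periods it should sit much closer to the true value.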
An alternative to the fixed effects estimator is the random effects estimator
In the random effects model, the intercepts are drawn from a probability distribution, rather than estimated with \(J-1\) dummy variables (i.e., unit averages)
The nested logic is the same; the unit of observation is the \(i\)th observation nested within the \(j\)th higher level unit (e.g., country-time nested in countries)
Level 1 (within-group): \[ y_{ij} = \beta_{0j} + \beta_1 x_{ij} + \epsilon_{ij} \] Level 2 (between-group): \[ \beta_{0j} = \gamma_{0} + u_{0j} \] where:
\[ \begin{aligned} \epsilon_{ij} &\sim N(0, \sigma^2) \\ u_{0j} &\sim N(0, \tau_{0}) \end{aligned} \]
Reduced form: \[ y_{ij} = \gamma_{0} + \beta_1 x_{ij} + (u_{0j} + \epsilon_{ij}) \quad \text{(composite error)} \]
The random intercept model captures variation at two levels: within groups (level 1) and between groups (level 2)
It’s an Analysis of Variance (ANOVA), partitioning between unit and within unit variation
The intraclass correlation coefficient (ICC) measures the proportion of variance at the group level
\[ y_{it} = \gamma_{0} + \beta_1 x_{it} + (u_{0i} + \epsilon_{it}) \quad \text{(composite error)} \]
Variance components:
\[ \begin{aligned} u_{0i} &\sim N(0, \tau_{0}) \quad \text{(between-group variance)} \\ \epsilon_{it} &\sim N(0, \sigma^2) \quad \text{(within-group variance)} \end{aligned} \]
Total variance: \[ \text{Var}(y_{it}) = \sigma^2 + \tau_{0} = var(\text{Within}) + var(\text{Between}) \]
Intraclass correlation:
\[ \rho = \frac{\tau_{0}}{\tau_{0} + \sigma^2} \]
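As a quick numerical illustration of the formula, assuming hypothetical variance components (these numbers are invented for the example, not estimates from any data):

```python
def intraclass_correlation(tau0, sigma2):
    """Share of the total variance that lies between groups."""
    return tau0 / (tau0 + sigma2)

# Between-group variance of 1 and within-group variance of 3.
rho_low = intraclass_correlation(tau0=1.0, sigma2=3.0)   # 0.25
# Reversing the two components makes the grouping matter much more.
rho_high = intraclass_correlation(tau0=3.0, sigma2=1.0)  # 0.75
```

A value near zero means the groups barely differ; a value near one means most of the variation is between groups rather than within them.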
\[\begin{aligned} y_{it}&=b_{0,i}+b_{1} x_{it}+e_{1,it}\\ b_{0,i}&=\omega_0+\omega_1 x_{i}+e_{2,i}\\ e_{1,it} &\sim N(0, \sigma_1^2)\\ e_{2,i} &\sim N(0, \sigma_2^2) \end{aligned}\]
\[\begin{aligned} y_{it}&=b_{0,i}+b_{1,i} x_{it}+e_{1,it}\\ b_{0,i}&=\omega_0+\omega_1 x_{i}+e_{2,i}\\ b_{1,i}&=\omega_2+\omega_3 x_{i}+e_{3,i}\\ e_{1,it} &\sim N(0, \sigma_1^2)\\ e_{2,i} &\sim N(0, \sigma_2^2)\\ e_{3,i} &\sim N(0, \sigma_3^2) \end{aligned}\]
Variance components:
\[ \begin{aligned} e_{1,it} &\sim N(0, \sigma_1^2) \quad \text{(level-1 residual variance)} \\ e_{2,i} &\sim N(0, \sigma_2^2) \quad \text{(random intercept variance)} \\ e_{3,i} &\sim N(0, \sigma_3^2) \quad \text{(random slope variance)} \end{aligned} \]
No pooling: \[y_{j,i}=\beta_0+\sum_{j=1}^{J-1} \gamma_{j} d_j+ e_{j,i}\]
Complete pooling: \[y_{j,i}=\beta_0+ e_{j,i}\]
Partial pooling: \[ y_{j[i]}=b_{0,j[i]}+e_{1,i} \]
\[ b_{0,j}=\frac{\bar{y}_j\times n_j/\sigma^2_y+\bar{y}_{all}\times 1/\sigma^2_{b_0}}{n_j/\sigma^2_y+ 1/\sigma^2_{b_0}} \]
The first term in the numerator represents movement away from the common mean, which is the second term in the numerator. As the group size \(n_j\) increases, the estimate is pulled further from the common mean and toward the group mean.
As \(n_j\) increases, the estimated group mean is influenced more by the group's own data than by the common mean.
As \(n_j\) decreases (small groups), the estimates are pulled more strongly toward a single pooled value.
\[ b_{0,j}=\frac{\bar{y}_j\times n_j/\sigma^2_y+\bar{y}_{all}\times 1/\sigma^2_{b_0}}{n_j/\sigma^2_y+ 1/\sigma^2_{b_0}} \]
As the within group variance increases, the group mean is pulled towards the pooled mean
As the between group variance increases, the common mean exerts a smaller impact
The values in the numerator are then weighted by the variation between and within level-2 units
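The weighting can be worked through numerically. This is a minimal sketch with hypothetical values (group mean 10, common mean 0, within-group variance 4, between-group variance 1); the function name is illustrative. It evaluates the partial pooling formula at three group sizes.

```python
def partial_pooled_mean(y_bar_j, n_j, y_bar_all, sigma2_y, sigma2_b0):
    """Precision-weighted compromise between a group mean and the common mean."""
    w_group = n_j / sigma2_y   # weight on the group mean grows with group size
    w_all = 1.0 / sigma2_b0    # weight on the common mean shrinks as between-group variance grows
    return (y_bar_j * w_group + y_bar_all * w_all) / (w_group + w_all)

# Group mean 10, common mean 0, sigma2_y = 4, sigma2_b0 = 1.
estimates = {n: partial_pooled_mean(10.0, n, 0.0, 4.0, 1.0) for n in (1, 4, 36)}
# n_j = 1 gives 2.0, n_j = 4 gives 5.0, n_j = 36 gives 9.0
```

A group with a single observation is pulled most of the way to the common mean, while a group of 36 stays close to its own mean of 10.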
\[ICC=\sigma^2_{b_0}/[\sigma^2_{b_0}+\sigma^2_{y}]\]
Recall,
\[\sigma^2_{all}=\sigma^2_{b_0}+\sigma^2_{y}\]
The ICC should decrease as you include level-2 predictors; compare to a model without predictors
Interpretation of the level-2 expected values (i.e., the group means) is based on a compromise between the pooled and no pooling models
If we estimate a regression model with a dummy for every level-2 unit and level-2 predictors, the model is not identified because the variables will be collinear (Gelman and Hill 2009, 269)