Panel Data

Q1 Data Set

library(AER)  # provides the USSeatBelts data
library(plm)  # pdim(), pdata.frame(), plm()

data("USSeatBelts")
pdim(pdata.frame(USSeatBelts, index = c("state", "year")))
Balanced Panel: n = 51, T = 15, N = 765

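The entity component is the US state (i): 51 units, the 50 states plus the District of Columbia. The time component is the year (t), spanning 15 years from 1983 to 1997. Each state is observed once per year, giving 765 total observations in a balanced panel. The regressions below drop 209 state-years with missing values, so they are estimated on 556 observations.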
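A quick way to check the panel structure and the missingness behind the smaller regression sample is sketched below. It assumes the AER package is installed; the helper name `vars` is introduced here only for illustration.

```r
library(AER)   # provides the USSeatBelts data
data("USSeatBelts")

# Observations per state: in a balanced panel every state has the same count
table(table(USSeatBelts$state))

# Missing values per column; any NA in a model variable drops that row from lm()
colSums(is.na(USSeatBelts))

# Rows with complete data on the variables used in the regressions
vars <- c("fatalities", "seatbelt", "speed65", "speed70",
          "drinkage", "alcohol", "income", "age")
n_complete <- sum(complete.cases(USSeatBelts[, vars]))
n_complete
```

The last count should match the number of observations actually used by the models (residual degrees of freedom plus the number of estimated coefficients).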

Q2 Estimation Equation

fatalities_it = β₀ + β₁·seatbelt_it + β₂·speed65_it + β₃·speed70_it + β₄·drinkage_it + β₅·alcohol_it + β₆·income_it + β₇·age_it + ε_it

The key variable of interest is seatbelt — we expect β₁ < 0 (more seatbelt usage → fewer fatalities).

ols <- lm(fatalities ~ seatbelt + speed65 + speed70 + drinkage + alcohol + income + age, data = USSeatBelts)
summary(ols)

Call:
lm(formula = fatalities ~ seatbelt + speed65 + speed70 + drinkage + 
    alcohol + income + age, data = USSeatBelts)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0093668 -0.0022860 -0.0002033  0.0020267  0.0145960 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.772e-02  3.822e-03   9.867  < 2e-16 ***
seatbelt     2.045e-03  1.230e-03   1.662   0.0971 .  
speed65yes  -5.539e-05  4.221e-04  -0.131   0.8956    
speed70yes   2.220e-03  5.322e-04   4.171 3.52e-05 ***
drinkageyes -9.306e-04  9.060e-04  -1.027   0.3048    
alcoholyes  -1.985e-03  4.636e-04  -4.281 2.20e-05 ***
income      -8.218e-07  4.700e-08 -17.487  < 2e-16 ***
age         -5.507e-05  1.135e-04  -0.485   0.6277    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.003542 on 548 degrees of freedom
  (209 observations deleted due to missingness)
Multiple R-squared:  0.5102,    Adjusted R-squared:  0.504 
F-statistic: 81.56 on 7 and 548 DF,  p-value: < 2.2e-16

seatbelt (β₁ = 0.00205, p = 0.097): This is the most concerning result. The sign is positive, meaning higher seatbelt usage is associated with more fatalities, although the estimate is significant only at the 10% level. This is counterintuitive and most likely reflects omitted variable bias: states with historically bad driving conditions or cultures may have both higher fatality rates and higher seatbelt adoption (in response). This is the kind of bias that state fixed effects are designed to remove.

speed70 (β₃ = 0.00222, p < 0.001): Makes sense, since a higher speed limit is associated with more fatalities. Significant and in the expected direction.

alcohol (β₅ = -0.00199, p < 0.001): Stricter BAC laws are associated with fewer fatalities. Expected sign, highly significant.

income (β₆ = -8.2e-7, p < 0.001): Wealthier states have fewer fatalities, possibly due to safer cars and better roads.

speed65, drinkage, age: Not statistically significant.

Omitted variable bias concern: Yes — time-invariant state characteristics like road quality, rural vs. urban geography, and driving culture are all left out of pooled OLS. These likely correlate with both seatbelt usage and fatalities, biasing our coefficients. This is exactly what fixed effects are designed to address.
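The mechanism can be illustrated with a small simulation. This is only a sketch on made-up data, not the USSeatBelts data: `danger` is a hypothetical unobserved state trait, and the coefficients 0.8, -1 and 5 are arbitrary choices made for the illustration.

```r
set.seed(1)
n_states <- 50
n_years  <- 15
state <- rep(seq_len(n_states), each = n_years)

# Hypothetical time-invariant state trait, unobserved by the analyst
danger <- rnorm(n_states)[state]

# The regressor is positively correlated with the unobserved trait
x <- 0.8 * danger + rnorm(length(state))

# The true effect of x is -1; the trait pushes the outcome up
y <- -1 * x + 5 * danger + rnorm(length(state))

pooled <- unname(coef(lm(y ~ x))["x"])                  # omits the trait
within <- unname(coef(lm(y ~ x + factor(state)))["x"])  # state dummies absorb it

c(pooled = pooled, within = within)
```

With these settings the pooled slope should come out positive even though the true effect is negative, while the slope with state dummies should land close to -1. That is the same pattern the pooled OLS result above hints at.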

Q3 Fixed Effects Model

Method 1: LSDV

lsdv <- lm(fatalities ~ seatbelt + speed65 + speed70 + drinkage + alcohol + income + age + factor(state), data = USSeatBelts)
summary(lsdv)

Call:
lm(formula = fatalities ~ seatbelt + speed65 + speed70 + drinkage + 
    alcohol + income + age + factor(state), data = USSeatBelts)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0059989 -0.0010549 -0.0001455  0.0010132  0.0071260 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      3.187e-02  1.007e-02   3.166 0.001643 ** 
seatbelt        -7.428e-03  1.143e-03  -6.497 1.99e-10 ***
speed65yes      -7.294e-04  3.394e-04  -2.149 0.032103 *  
speed70yes       9.725e-04  3.346e-04   2.906 0.003822 ** 
drinkageyes      4.426e-04  5.298e-04   0.835 0.403877    
alcoholyes      -1.417e-03  3.834e-04  -3.696 0.000243 ***
income          -5.014e-07  6.500e-08  -7.713 6.74e-14 ***
age              2.601e-04  3.674e-04   0.708 0.479241    
factor(state)AL -4.993e-03  2.353e-03  -2.122 0.034362 *  
factor(state)AR -3.302e-03  2.657e-03  -1.243 0.214549    
factor(state)AZ -3.649e-03  2.133e-03  -1.711 0.087747 .  
factor(state)CA -5.694e-03  1.561e-03  -3.648 0.000292 ***
factor(state)CO -7.202e-03  1.776e-03  -4.054 5.84e-05 ***
factor(state)CT -9.740e-03  2.302e-03  -4.231 2.77e-05 ***
factor(state)DC -4.595e-03  2.403e-03  -1.912 0.056387 .  
factor(state)DE -7.041e-03  2.115e-03  -3.328 0.000939 ***
factor(state)FL -2.657e-03  3.314e-03  -0.802 0.422991    
factor(state)GA -7.735e-03  1.709e-03  -4.527 7.49e-06 ***
factor(state)HI -7.293e-03  1.859e-03  -3.923 9.96e-05 ***
factor(state)IA -7.211e-03  2.639e-03  -2.733 0.006503 ** 
factor(state)ID -3.800e-03  1.757e-03  -2.164 0.030975 *  
factor(state)IL -7.862e-03  2.006e-03  -3.919 0.000102 ***
factor(state)IN -9.931e-03  2.197e-03  -4.521 7.71e-06 ***
factor(state)KS -7.484e-03  2.239e-03  -3.342 0.000894 ***
factor(state)KY -7.307e-03  2.314e-03  -3.157 0.001690 ** 
factor(state)LA -3.886e-03  1.778e-03  -2.185 0.029349 *  
factor(state)MA -1.405e-02  2.394e-03  -5.869 7.99e-09 ***
factor(state)MD -7.440e-03  1.886e-03  -3.944 9.17e-05 ***
factor(state)ME -1.127e-02  2.665e-03  -4.229 2.79e-05 ***
factor(state)MI -8.197e-03  1.984e-03  -4.132 4.22e-05 ***
factor(state)MN -1.228e-02  1.995e-03  -6.154 1.55e-09 ***
factor(state)MO -6.777e-03  2.426e-03  -2.793 0.005421 ** 
factor(state)MS -1.858e-03  2.126e-03  -0.874 0.382491    
factor(state)MT -3.737e-03  2.295e-03  -1.628 0.104159    
factor(state)NC -3.830e-03  2.246e-03  -1.705 0.088752 .  
factor(state)ND -1.496e-02  2.496e-03  -5.993 3.96e-09 ***
factor(state)NE -9.778e-03  2.298e-03  -4.254 2.51e-05 ***
factor(state)NH -1.088e-02  1.970e-03  -5.523 5.37e-08 ***
factor(state)NJ -1.076e-02  2.326e-03  -4.626 4.77e-06 ***
factor(state)NM -1.510e-03  1.799e-03  -0.839 0.401798    
factor(state)NV  1.690e-04  1.966e-03   0.086 0.931517    
factor(state)NY -6.660e-03  2.241e-03  -2.971 0.003111 ** 
factor(state)OH -1.037e-02  2.286e-03  -4.538 7.12e-06 ***
factor(state)OK -1.080e-02  2.374e-03  -4.552 6.69e-06 ***
factor(state)OR -5.057e-03  2.468e-03  -2.049 0.040994 *  
factor(state)PA -8.827e-03  2.805e-03  -3.147 0.001749 ** 
factor(state)RI -1.680e-02  2.689e-03  -6.247 8.96e-10 ***
factor(state)SC -3.083e-03  2.146e-03  -1.436 0.151517    
factor(state)SD -8.538e-03  2.336e-03  -3.656 0.000284 ***
factor(state)TN -4.386e-03  2.392e-03  -1.834 0.067295 .  
factor(state)TX -6.607e-03  1.537e-03  -4.300 2.05e-05 ***
factor(state)UT -8.508e-03  1.094e-03  -7.777 4.29e-14 ***
factor(state)VA -9.707e-03  1.936e-03  -5.013 7.47e-07 ***
factor(state)VT -9.134e-03  2.193e-03  -4.166 3.66e-05 ***
factor(state)WA -9.888e-03  1.938e-03  -5.102 4.79e-07 ***
factor(state)WI -1.101e-02  2.234e-03  -4.929 1.13e-06 ***
factor(state)WV -5.479e-03  3.089e-03  -1.774 0.076724 .  
factor(state)WY -5.991e-03  1.796e-03  -3.336 0.000912 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.001836 on 498 degrees of freedom
  (209 observations deleted due to missingness)
Multiple R-squared:  0.8804,    Adjusted R-squared:  0.8667 
F-statistic:  64.3 on 57 and 498 DF,  p-value: < 2.2e-16

Method 2: plm within estimator

pdata <- pdata.frame(USSeatBelts, index = c("state", "year"))

fe <- plm(fatalities ~ seatbelt + speed65 + speed70 + drinkage + alcohol + income + age, data = pdata, model = "within", effect = "individual")
summary(fe)

Oneway (individual) effect Within Model

Call:
plm(formula = fatalities ~ seatbelt + speed65 + speed70 + drinkage + 
    alcohol + income + age, data = pdata, effect = "individual", 
    model = "within")

Unbalanced Panel: n = 51, T = 8-15, N = 556

Residuals:
       Min.     1st Qu.      Median     3rd Qu.        Max. 
-0.00599893 -0.00105491 -0.00014547  0.00101325  0.00712605 

Coefficients:
               Estimate  Std. Error t-value  Pr(>|t|)    
seatbelt    -7.4279e-03  1.1433e-03 -6.4967 1.994e-10 ***
speed65yes  -7.2937e-04  3.3937e-04 -2.1492  0.032103 *  
speed70yes   9.7250e-04  3.3463e-04  2.9062  0.003822 ** 
drinkageyes  4.4265e-04  5.2984e-04  0.8354  0.403877    
alcoholyes  -1.4171e-03  3.8338e-04 -3.6963  0.000243 ***
income      -5.0138e-07  6.5004e-08 -7.7130 6.743e-14 ***
age          2.6012e-04  3.6737e-04  0.7081  0.479241    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    0.005078
Residual Sum of Squares: 0.0016795
R-Squared:      0.66927
Adj. R-Squared: 0.63141
F-statistic: 143.963 on 7 and 498 DF, p-value: < 2.22e-16

Method 3: De-meaned manually

USSeatBelts$fat_dm <- USSeatBelts$fatalities - ave(USSeatBelts$fatalities, USSeatBelts$state, na.rm=TRUE)
USSeatBelts$sb_dm <- USSeatBelts$seatbelt - ave(USSeatBelts$seatbelt, USSeatBelts$state, na.rm=TRUE)

demeaned <- lm(fat_dm ~ sb_dm - 1, data = USSeatBelts)
summary(demeaned)

Call:
lm(formula = fat_dm ~ sb_dm - 1, data = USSeatBelts)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0023809 -0.0015055 -0.0002406  0.0010775  0.0035358 

Coefficients:
       Estimate Std. Error t value Pr(>|t|)    
sb_dm -0.013907   0.001841  -7.552 2.52e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.001733 on 29 degrees of freedom
  (735 observations deleted due to missingness)
Multiple R-squared:  0.6629,    Adjusted R-squared:  0.6513 
F-statistic: 57.04 on 1 and 29 DF,  p-value: 2.519e-08
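The output above uses only 30 observations. The reason is that `ave()` has no `na.rm` argument: extra arguments are treated as additional grouping variables, and the default `FUN = mean` returns NA for any state with a missing seatbelt value. A possible correction is sketched below, assuming the goal is to reproduce the within estimator: it restricts the data to the complete cases first and de-means every variable, not only seatbelt. The names `vars`, `d`, `X`, `dm` and `demeaned_all` are introduced here for illustration.

```r
library(AER)   # provides the USSeatBelts data
data("USSeatBelts")

vars <- c("fatalities", "seatbelt", "speed65", "speed70",
          "drinkage", "alcohol", "income", "age")

# Keep only the rows the regressions can use, so state means match the estimation sample
d <- USSeatBelts[complete.cases(USSeatBelts[, vars]), ]

# Convert the yes/no factors to 0/1 so they can be de-meaned
X <- data.frame(lapply(d[, vars], function(v) {
  if (is.factor(v)) as.numeric(v == "yes") else v
}))

# Subtract the state mean from every column (ave() uses FUN = mean by default)
dm <- data.frame(lapply(X, function(v) v - ave(v, d$state)))

# No intercept: de-meaned variables have mean zero within each state
demeaned_all <- lm(fatalities ~ . - 1, data = dm)
coef(demeaned_all)
```

By the Frisch-Waugh-Lovell theorem, the coefficients from this regression should match the LSDV and plm within estimates. The standard errors reported by `lm()` will be slightly too small, because `lm()` does not subtract the 51 estimated state means from the degrees of freedom.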

Once we add state fixed effects, the seatbelt coefficient flips from +0.00205 in pooled OLS to -0.00743 in both the LSDV and plm within models, where it is highly significant. The manual de-meaned regression also gives a negative, significant estimate (-0.01391), but it is not the same model: it includes no control variables and, as its output shows, it uses only 30 observations, so its magnitude is not comparable with the other two. The sign reversal is consistent with pooled OLS being biased upward by omitted state-level characteristics, and it indicates that higher seatbelt usage is associated with fewer traffic fatalities once time-invariant differences across states are held fixed.

Do the Coefficients Change?

Yes, dramatically. The seatbelt coefficient flips sign from positive to negative once we add state fixed effects. This supports the omitted variable bias concern from Q2. In pooled OLS, time-invariant state characteristics (rural geography, driving culture, road quality) were confounding the estimate. States with dangerous driving conditions may have raised seatbelt use in response to high fatalities, making seatbelts look harmful. Fixed effects remove that bias by using only within-state variation over time. Methods 1 and 2 give identical coefficients (seatbelt: -0.00743), as expected, because LSDV and the within estimator are algebraically equivalent.
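The equivalence of LSDV and the within estimator can be checked numerically rather than by eye. The sketch below refits both models so it runs on its own; the formula object `f`, the flag `same_coefs` and the tolerance of 1e-6 are choices made here for illustration.

```r
library(AER)   # provides the USSeatBelts data
library(plm)   # plm(), pdata.frame()
data("USSeatBelts")

f <- fatalities ~ seatbelt + speed65 + speed70 + drinkage + alcohol + income + age

# Method 1: state dummies added to the pooled specification
lsdv <- lm(update(f, . ~ . + factor(state)), data = USSeatBelts)

# Method 2: within transformation via plm
fe <- plm(f, data = pdata.frame(USSeatBelts, index = c("state", "year")),
          model = "within", effect = "individual")

# Compare only the slope coefficients that both models report
same_coefs <- isTRUE(all.equal(coef(lsdv)[names(coef(fe))], coef(fe),
                               tolerance = 1e-6))
same_coefs
```

The intercept and the state dummies are left out of the comparison because the within estimator does not report them.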

What Are the Fixed Effects Controlling For?

In our specification with factor(state), we have entity fixed effects only, meaning we control for all time-invariant characteristics of each state such as geography, baseline driving culture, permanent infrastructure quality, and anything else that doesn’t change over time within a state. We are not controlling for time fixed effects here, meaning national-level shocks (e.g., a federal highway safety campaign in a given year) are still in the error term.
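If year-specific national shocks are a concern, one option is to add time fixed effects alongside the state effects. A minimal sketch using plm's `effect = "twoways"` option follows; the object name `fe_twoway` is introduced here, and no results are claimed for it.

```r
library(AER)   # provides the USSeatBelts data
library(plm)   # plm(), pdata.frame()
data("USSeatBelts")

pdata <- pdata.frame(USSeatBelts, index = c("state", "year"))

# State and year fixed effects: identification comes from within-state changes
# measured relative to the common movement in each year
fe_twoway <- plm(fatalities ~ seatbelt + speed65 + speed70 + drinkage +
                   alcohol + income + age,
                 data = pdata, model = "within", effect = "twoways")
summary(fe_twoway)
```

Comparing the seatbelt coefficient here with the state-only model would show how much of the estimate depends on common time trends.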

Conclusion

Using the USSeatBelts panel dataset, this analysis examined whether seatbelt usage reduces traffic fatalities across 51 units (the 50 US states plus the District of Columbia) over 15 years. Pooled OLS produced a positive, marginally significant seatbelt coefficient (p = 0.097), a sign of omitted variable bias: states with dangerous driving conditions may have raised seatbelt use in response to high fatalities, making seatbelts appear harmful. Once state fixed effects were introduced, the coefficient turned negative and highly significant in the LSDV and plm within models (-0.00743). The manual de-meaned regression, although estimated on only 30 observations and without controls, points the same way. This exercise highlights the importance of fixed effects in panel data: without them, a genuinely effective policy can appear counterproductive.