Discussion W4

1. Panel Data Structure

library(plm)
data("Produc", package = "plm")
head(Produc)
    state year region     pcap     hwy   water    util       pc   gsp    emp unemp
1 ALABAMA 1970      6 15032.67 7325.80 1655.68 6051.20 35793.80 28418 1010.5   4.7
2 ALABAMA 1971      6 15501.94 7525.94 1721.02 6254.98 37299.91 29375 1021.9   5.2
3 ALABAMA 1972      6 15972.41 7765.42 1764.75 6442.23 38670.30 31303 1072.3   4.7
4 ALABAMA 1973      6 16406.26 7907.66 1742.41 6756.19 40084.01 33430 1135.5   3.9
5 ALABAMA 1974      6 16762.67 8025.52 1734.85 7002.29 42057.31 33749 1169.8   5.5
6 ALABAMA 1975      6 17316.26 8158.23 1752.27 7405.76 43971.71 33604 1155.4   7.7
table(table(Produc$state))

17 
48 

The dataset records yearly data for 48 US states, from 1970 to 1986.

  • Entity component: state

  • Time component: year

Since all states have the same number of yearly observations (17), the panel is balanced.
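The balance check can also be done programmatically. This is a minimal sketch, assuming the plm package is installed; the object names (obs_per_state, is_balanced, pd) are my own and not part of the assignment.

```r
library(plm)
data("Produc", package = "plm")

# Count observations per state; a balanced panel has one distinct count
obs_per_state <- table(Produc$state)
n_states      <- length(obs_per_state)
is_balanced   <- length(unique(as.vector(obs_per_state))) == 1
year_range    <- range(Produc$year)

# plm's own summary of the panel dimensions (n, T, N) and balance flag
pd <- pdim(pdata.frame(Produc, index = c("state", "year")))
pd
pd$balanced
```

pdim() reports the same n, T, and N that appear in the header of the plm summary in section 3.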

2. OLS Regression

Estimating Equation

We model how gross state product (gsp) depends on employment (emp), the unemployment rate (unemp), and the stock of highway and street capital (hwy):

\[ gsp_{it} = \beta_0 + \beta_1\,emp_{it} + \beta_2\,unemp_{it} + \beta_3\,hwy_{it} + \epsilon_{it} \]

ols <- lm(gsp ~ emp + unemp + hwy, data = Produc)
summary(ols)

Call:
lm(formula = gsp ~ emp + unemp + hwy, data = Produc)

Residuals:
   Min     1Q Median     3Q    Max 
-28643  -5580   1075   3605  61194 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -4646.9944  1039.8829  -4.469 8.98e-06 ***
emp            32.4206     0.7370  43.988  < 2e-16 ***
unemp        -222.8670   148.2528  -1.503    0.133    
hwy             1.0266     0.1483   6.921 9.10e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9283 on 812 degrees of freedom
Multiple R-squared:  0.9825,    Adjusted R-squared:  0.9824 
F-statistic: 1.516e+04 on 3 and 812 DF,  p-value: < 2.2e-16

Do the estimated coefficients make sense (direction, magnitude, statistical significance)?

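Mostly yes. The coefficients on emp (32.42) and hwy (1.03) are positive and highly significant, which is what I'd expect since both should move with state output. The coefficient on unemp is negative (-222.87), which is also the direction I'd expect, but it is not statistically significant (p = 0.133).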

Could there be omitted variable bias that could potentially be reduced by adding fixed effects?

Yes. Unobserved state-specific factors such as geography, policies, or demographics could affect both GSP and the predictors. If they do, the pooled OLS estimates are biased. To the extent these factors are constant over time, state fixed effects would remove that bias.
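To make the omitted variable concern concrete, here is a small simulation on made-up data (not Produc). Everything in it is an assumption chosen for illustration: a unit effect alpha that is correlated with the regressor x, and a true slope of 2. It uses base R only.

```r
set.seed(1)
n_units   <- 50
n_periods <- 10
unit <- rep(seq_len(n_units), each = n_periods)

# Time-invariant unit effect, repeated for every period of the unit
alpha <- rnorm(n_units)[unit]

# The regressor is correlated with the unit effect; the true slope on x is 2
x <- alpha + rnorm(n_units * n_periods)
y <- 2 * x + 3 * alpha + rnorm(n_units * n_periods)

# Pooled OLS omits alpha; unit dummies absorb it
pooled <- unname(coef(lm(y ~ x))["x"])
within <- unname(coef(lm(y ~ x + factor(unit)))["x"])

c(pooled = pooled, within = within)
```

In this setup the pooled slope is pushed away from 2 because x picks up part of the effect of alpha, while the slope from the regression with unit dummies should land close to 2.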

3. Fixed Effects Models

a) Within Estimator (plm)

Estimating equation:

\[ gsp_{it} = \alpha_i + \beta_1\,emp_{it} + \beta_2\,unemp_{it} + \beta_3\,hwy_{it} + u_{it} \]

where \(\alpha_i\) captures unobserved, time-invariant state characteristics.
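As a short derivation of why the within estimator removes \(\alpha_i\): average the estimating equation over the \(T = 17\) years for each state, writing \(\overline{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}\) for a state mean (this bar notation is my own shorthand):

\[ \overline{gsp}_i = \alpha_i + \beta_1\,\overline{emp}_i + \beta_2\,\overline{unemp}_i + \beta_3\,\overline{hwy}_i + \overline{u}_i \]

Subtracting this from the original equation, \(\alpha_i\) cancels because it does not vary over time:

\[ gsp_{it} - \overline{gsp}_i = \beta_1\,(emp_{it} - \overline{emp}_i) + \beta_2\,(unemp_{it} - \overline{unemp}_i) + \beta_3\,(hwy_{it} - \overline{hwy}_i) + (u_{it} - \overline{u}_i) \]

The slopes are therefore estimated only from deviations of each state from its own average.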

pdata <- pdata.frame(Produc, index = c("state", "year"))
fe <- plm(gsp ~ emp + unemp + hwy, data = pdata, model = "within")
summary(fe)
Oneway (individual) effect Within Model

Call:
plm(formula = gsp ~ emp + unemp + hwy, data = pdata, model = "within")

Balanced Panel: n = 48, T = 17, N = 816

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-16456.409   -712.464    -53.765    717.332  20705.483 

Coefficients:
      Estimate Std. Error  t-value  Pr(>|t|)    
emp   41.18126    0.34386 119.7632 < 2.2e-16 ***
unemp 59.30546   56.74199   1.0452    0.2963    
hwy   -1.08679    0.15668  -6.9365 8.567e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    1.3289e+11
Residual Sum of Squares: 5566400000
R-Squared:      0.95811
Adj. R-Squared: 0.95538
F-statistic: 5832.8 on 3 and 765 DF, p-value: < 2.22e-16

b) OLS with Entity Dummies

ols_dummies <- lm(gsp ~ emp + unemp + hwy + factor(state), data = Produc)
summary(ols_dummies)

Call:
lm(formula = gsp ~ emp + unemp + hwy + factor(state), data = Produc)

Residuals:
     Min       1Q   Median       3Q      Max 
-16456.4   -712.5    -53.8    717.3  20705.5 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                 -5.055e+03  1.243e+03  -4.068 5.23e-05 ***
emp                          4.118e+01  3.439e-01 119.763  < 2e-16 ***
unemp                        5.931e+01  5.674e+01   1.045 0.296271    
hwy                         -1.087e+00  1.567e-01  -6.936 8.57e-12 ***
factor(state)ARIZONA         4.431e+03  9.978e+02   4.440 1.03e-05 ***
factor(state)ARKANSAS        2.761e+03  1.075e+03   2.567 0.010455 *  
factor(state)CALIFORNIA      3.486e+04  5.257e+03   6.632 6.25e-11 ***
factor(state)COLORADO        3.762e+03  1.004e+03   3.746 0.000193 ***
factor(state)CONNECTICUT     2.945e+03  9.407e+02   3.130 0.001812 ** 
factor(state)DELAWARE        4.934e+03  1.251e+03   3.943 8.79e-05 ***
factor(state)FLORIDA        -7.197e+03  1.421e+03  -5.063 5.17e-07 ***
factor(state)GEORGIA        -6.845e+03  9.781e+02  -6.998 5.66e-12 ***
factor(state)IDAHO           5.250e+03  1.205e+03   4.356 1.50e-05 ***
factor(state)ILLINOIS        8.605e+03  2.721e+03   3.162 0.001627 ** 
factor(state)INDIANA        -2.512e+03  9.761e+02  -2.573 0.010255 *  
factor(state)IOWA            9.196e+03  9.971e+02   9.222  < 2e-16 ***
factor(state)KANSAS          8.474e+03  9.530e+02   8.892  < 2e-16 ***
factor(state)KENTUCKY        1.074e+04  1.031e+03  10.424  < 2e-16 ***
factor(state)LOUISIANA       2.997e+04  1.052e+03  28.481  < 2e-16 ***
factor(state)MAINE           2.387e+03  1.224e+03   1.950 0.051494 .  
factor(state)MARYLAND        5.726e+02  9.613e+02   0.596 0.551627    
factor(state)MASSACHUSETTS  -1.340e+04  1.011e+03 -13.255  < 2e-16 ***
factor(state)MICHIGAN        8.554e+03  1.764e+03   4.849 1.50e-06 ***
factor(state)MINNESOTA       4.079e+03  1.090e+03   3.741 0.000197 ***
factor(state)MISSISSIPPI     3.449e+03  9.763e+02   3.533 0.000435 ***
factor(state)MISSOURI       -5.475e+02  1.015e+03  -0.539 0.589725    
factor(state)MONTANA         8.278e+03  1.097e+03   7.547 1.27e-13 ***
factor(state)NEBRASKA        5.840e+03  1.024e+03   5.705 1.67e-08 ***
factor(state)NEVADA          4.979e+03  1.245e+03   3.999 6.98e-05 ***
factor(state)NEW_HAMPSHIRE   2.890e+03  1.221e+03   2.368 0.018130 *  
factor(state)NEW_JERSEY      3.771e+02  1.183e+03   0.319 0.749887    
factor(state)NEW_MEXICO      8.806e+03  1.130e+03   7.795 2.10e-14 ***
factor(state)NEW_YORK        7.010e+03  3.986e+03   1.759 0.079017 .  
factor(state)NORTH_CAROLINA -1.139e+04  9.874e+02 -11.531  < 2e-16 ***
factor(state)NORTH_DAKOTA    7.634e+03  1.153e+03   6.621 6.71e-11 ***
factor(state)OHIO           -2.880e+03  2.436e+03  -1.182 0.237433    
factor(state)OKLAHOMA        1.078e+04  9.717e+02  11.092  < 2e-16 ***
factor(state)OREGON          3.981e+03  9.682e+02   4.111 4.36e-05 ***
factor(state)PENNSYLVANIA   -1.343e+04  2.480e+03  -5.416 8.16e-08 ***
factor(state)RHODE_ISLAND    1.715e+03  1.265e+03   1.356 0.175552    
factor(state)SOUTH_CAROLINA -6.359e+03  1.073e+03  -5.928 4.64e-09 ***
factor(state)SOUTH_DAKOTA    6.606e+03  1.130e+03   5.847 7.41e-09 ***
factor(state)TENNESSE       -3.107e+03  9.707e+02  -3.201 0.001428 ** 
factor(state)TEXAS           4.592e+04  3.215e+03  14.284  < 2e-16 ***
factor(state)UTAH            4.010e+03  1.108e+03   3.619 0.000315 ***
factor(state)VERMONT         4.591e+03  1.260e+03   3.644 0.000287 ***
factor(state)VIRGINIA        2.528e+03  1.226e+03   2.061 0.039594 *  
factor(state)WASHINGTON      1.036e+04  9.687e+02  10.694  < 2e-16 ***
factor(state)WEST_VIRGINIA   7.964e+03  9.715e+02   8.198 1.03e-15 ***
factor(state)WISCONSIN      -8.131e+02  9.927e+02  -0.819 0.412983    
factor(state)WYOMING         1.135e+04  1.153e+03   9.839  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2697 on 765 degrees of freedom
Multiple R-squared:  0.9986,    Adjusted R-squared:  0.9985 
F-statistic: 1.095e+04 on 50 and 765 DF,  p-value: < 2.2e-16
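The dummy coefficients above are state intercepts measured relative to the baseline state, ALABAMA, whose intercept is the (Intercept) row. As a sketch of how they relate to the within model, plm's fixef() should return the same state intercepts in levels. The helper names (alpha_plm, alpha_lm, dummy_names) are my own, and the models are refit here so the block runs on its own.

```r
library(plm)
data("Produc", package = "plm")

fe <- plm(gsp ~ emp + unemp + hwy, data = Produc,
          index = c("state", "year"), model = "within")
ols_dummies <- lm(gsp ~ emp + unemp + hwy + factor(state), data = Produc)

# State intercepts implied by the within model, as a plain named vector
alpha_plm <- fixef(fe)
alpha_plm <- setNames(as.numeric(alpha_plm), names(alpha_plm))

# Same quantities from the dummy regression:
# the baseline state is the intercept, every other state is intercept + dummy
b <- coef(ols_dummies)
dummy_names <- grep("^factor\\(state\\)", names(b), value = TRUE)
alpha_lm <- c(b["(Intercept)"], b["(Intercept)"] + b[dummy_names])
names(alpha_lm) <- c("ALABAMA", sub("^factor\\(state\\)", "", dummy_names))

# Should be TRUE up to numerical tolerance
all.equal(alpha_plm[names(alpha_lm)], alpha_lm, tolerance = 1e-6)
```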

c) Demeaning (manual)

# Demean variables by state
demeaned <- within(Produc, {
  gsp_dm   <- ave(gsp, state, FUN = function(x) x - mean(x))
  emp_dm   <- ave(emp, state, FUN = function(x) x - mean(x))
  unemp_dm <- ave(unemp, state, FUN = function(x) x - mean(x))
  hwy_dm   <- ave(hwy, state, FUN = function(x) x - mean(x))
})
manual_fe <- lm(gsp_dm ~ emp_dm + unemp_dm + hwy_dm - 1, data = demeaned)
summary(manual_fe)

Call:
lm(formula = gsp_dm ~ emp_dm + unemp_dm + hwy_dm - 1, data = demeaned)

Residuals:
     Min       1Q   Median       3Q      Max 
-16456.4   -712.5    -53.8    717.3  20705.5 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
emp_dm    41.1813     0.3336 123.463  < 2e-16 ***
unemp_dm  59.3055    55.0415   1.077    0.282    
hwy_dm    -1.0868     0.1520  -7.151 1.92e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2617 on 813 degrees of freedom
Multiple R-squared:  0.9581,    Adjusted R-squared:  0.958 
F-statistic:  6199 on 3 and 813 DF,  p-value: < 2.2e-16
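One detail in this output: the standard errors are slightly smaller than in (a) and (b), even though the coefficients match. lm() reports 813 residual degrees of freedom because it does not know that 48 state means were removed, while the within model uses 765. A rough check, using the rounded values printed above (the variable names are my own), is to rescale the manual standard errors by the square root of the ratio of the two degrees of freedom:

```r
# Standard errors as printed in the manual demeaning output (rounded)
se_manual <- c(emp = 0.3336, unemp = 55.0415, hwy = 0.1520)

df_manual <- 813  # 816 observations - 3 slopes
df_within <- 765  # 816 observations - 48 state effects - 3 slopes

# Degrees-of-freedom correction
se_corrected <- se_manual * sqrt(df_manual / df_within)
round(se_corrected, 4)
```

After the correction the values should agree with the plm standard errors (0.34386, 56.74199, 0.15668) up to rounding.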

Do your coefficients change? Why or why not?

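Yes, the coefficients change. Relative to pooled OLS, emp rises from 32.42 to 41.18, hwy flips sign from 1.03 to -1.09, and unemp moves from -222.87 to 59.31 (insignificant in both models). Fixed effects control for all state-level characteristics that stay constant over time, which removes the omitted variable bias coming from those characteristics. The estimates now rely only on variation within each state over time.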

What are the fixed effects controlling for?

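The fixed effects control for all time-invariant characteristics of each state, observed or unobserved, such as geography or long-standing demographics and institutions. They do not control for factors that change over time within a state.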

Do you get the same coefficient if you specify the Fixed Effect in an alternative way?

Yes. The three specifications above (within estimator, OLS with state dummies, and manual demeaning) give the same coefficients: emp = 41.18, unemp = 59.31, and hwy = -1.09 in all three models.
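The comparison can also be checked in code instead of by eye. This is a minimal sketch, assuming plm is installed; it refits the three models so it runs on its own, and the helper dm() and the object comparison are my own names.

```r
library(plm)
data("Produc", package = "plm")

# (a) within estimator
fe <- plm(gsp ~ emp + unemp + hwy, data = Produc,
          index = c("state", "year"), model = "within")

# (b) OLS with state dummies
ols_dummies <- lm(gsp ~ emp + unemp + hwy + factor(state), data = Produc)

# (c) manual demeaning: subtract each state's mean (ave() defaults to mean)
dm <- function(x) x - ave(x, Produc$state)
manual_fe <- lm(dm(gsp) ~ dm(emp) + dm(unemp) + dm(hwy) - 1, data = Produc)

# Put the three sets of slopes side by side
comparison <- cbind(
  within  = coef(fe),
  dummies = coef(ols_dummies)[c("emp", "unemp", "hwy")],
  manual  = unname(coef(manual_fe))
)
round(comparison, 4)

# Largest gap between any column and the within estimates
max(abs(comparison - comparison[, "within"]))
```

The largest gap should be zero up to floating-point error, which is the numerical version of the answer above.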