Panel Data Discussion

Author

Gina Occhipinti

Weekly Discussion - Panel Data

1. Choose and Analyze Data

Please choose any panel data and show/tell if data is balanced or not. What is the time component and the entity component in the data?

For this discussion, I’ve chosen the Guns dataset from the AER package in R. Guns is a balanced panel of data on the 50 US states plus the District of Columbia (51 entities in total, all referred to as states here), by year for 1977–1999. There are 23 years of data per state. The data contains 1,173 observations on 13 variables. The variables are:

state - factor indicating state.

year - factor indicating year.

violent - violent crime rate (incidents per 100,000 members of the population).

murder - murder rate (incidents per 100,000).

robbery - robbery rate (incidents per 100,000).

prisoners - incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents; value for the previous year).

afam - percent of state population that is African-American, ages 10 to 64.

cauc - percent of state population that is Caucasian, ages 10 to 64.

male - percent of state population that is male, ages 10 to 29.

population - state population, in millions of people.

income - real per capita personal income in the state (US dollars).

density - population per square mile of land area, divided by 1,000.

law - factor. Does the state have a shall carry law in effect in that year?

The time component is year, which covers 1977 to 1999 (23 years). The entity component is state, which covers all 50 states plus Washington, D.C., for 51 entities total. In a balanced panel, every entity is observed in every time period, so the number of entities multiplied by the number of years equals the total number of observations in the dataset. Here \(51 \times 23 = 1173\), which matches the number of observations, and every variable has 1,173 non-missing values. This is consistent with a balanced panel, and the code below supports it. The plm output in section 3 also reports the panel as balanced.

Code

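# load the packages used in this post
library(AER)        # documents the Guns data (?Guns)
library(dplyr)      # n_distinct()
library(psych)      # describe()
library(plm)        # plm()
library(stargazer)  # stargazer()

# load and name the data
guns_data <- read.csv("/Users/ginaocchipinti/Documents/Econometrics Course - BC/Guns-1.csv")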

# learn about the Guns dataset
?Guns

# view the data
View(guns_data)

# look at all the variables, class, and examples
str(guns_data)

'data.frame':   1173 obs. of  13 variables:
 $ year      : int  1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 ...
 $ violent   : num  414 419 413 448 470 ...
 $ murder    : num  14.2 13.3 13.2 13.2 11.9 10.6 9.2 9.4 9.8 10.1 ...
 $ robbery   : num  96.8 99.1 109.5 132.1 126.5 ...
 $ prisoners : int  83 94 144 141 149 183 215 243 256 267 ...
 $ afam      : num  8.38 8.35 8.33 8.41 8.48 ...
 $ cauc      : num  55.1 55.1 55.1 54.9 54.9 ...
 $ male      : num  18.2 18 17.8 17.7 17.7 ...
 $ population: num  3.78 3.83 3.87 3.9 3.92 ...
 $ income    : num  9563 9932 9877 9541 9548 ...
 $ density   : num  0.0746 0.0756 0.0762 0.0768 0.0772 ...
 $ state     : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
 $ law       : chr  "no" "no" "no" "no" ...

Code

#count how many unique values are in year
n_distinct(guns_data$year)

[1] 23

Code

#count how many unique values are in state
n_distinct(guns_data$state)

[1] 51

Code

# understand number of observations for each variable
describe(guns_data)

           vars    n     mean      sd   median  trimmed     mad     min
year          1 1173  1988.00    6.64  1988.00  1988.00    8.90 1977.00
violent       2 1173   503.07  334.28   443.00   464.50  266.42   47.00
murder        3 1173     7.67    7.52     6.40     6.72    4.30    0.20
robbery       4 1173   161.82  170.51   124.10   133.84   91.18    6.40
prisoners     5 1173   226.58  178.89   187.00   202.38  123.06   19.00
afam          6 1173     5.34    4.89     4.03     4.53    3.14    0.25
cauc          7 1173    62.95    9.76    65.06    64.48    6.68   21.78
male          8 1173    16.08    1.73    15.90    16.04    2.09   12.21
population    9 1173     4.82    5.25     3.27     3.78    3.20    0.40
income       10 1173 13724.80 2554.54 13401.55 13549.54 2406.96 8554.88
density      11 1173     0.35    1.36     0.08     0.12    0.09    0.00
state*       12 1173    26.00   14.73    26.00    26.00   19.27    1.00
law*         13 1173     1.24    0.43     1.00     1.18    0.00    1.00
                max    range  skew kurtosis    se
year        1999.00    22.00  0.00    -1.21  0.19
violent     2921.80  2874.80  2.54    11.85  9.76
murder        80.60    80.40  5.78    45.41  0.22
robbery     1635.10  1628.70  3.88    21.28  4.98
prisoners   1913.00  1894.00  3.88    25.99  5.22
afam          26.98    26.73  2.35     6.69  0.14
cauc          76.53    54.75 -2.22     6.05  0.29
male          22.35    10.14  0.27    -0.57  0.05
population    33.15    32.74  2.43     7.31  0.15
income     23646.71 15091.83  0.73     0.64 74.59
density       11.10    11.10  6.69    44.21  0.04
state*        51.00    50.00  0.00    -1.20  0.43
law*           2.00     1.00  1.20    -0.57  0.01

Code

# number of entities times number of years 
print(paste("The number of states times years for each state is", 23*51))

[1] "The number of states times years for each state is 1173"
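The product check above confirms the row count, but a matching count alone would not rule out one state-year pair being duplicated while another is missing. A stricter check is that every state-year pair appears exactly once. The following is a minimal base R sketch of that idea. The helper name `is_balanced()` and the toy panel are hypothetical, made up for illustration, and are not part of the Guns data.

```r
# Hypothetical helper: TRUE when every entity-period pair appears exactly once.
is_balanced <- function(df, entity, time) {
  counts <- table(df[[entity]], df[[time]])
  all(counts == 1)
}

# Toy panel (made-up values): 3 entities observed in 2 periods each.
toy <- expand.grid(state = c("A", "B", "C"), year = c(1977, 1978),
                   stringsAsFactors = FALSE)
balanced_ok <- is_balanced(toy, "state", "year")

# Same row count (6), but A-1978 is missing and B-1978 appears twice.
toy_bad <- toy
toy_bad[toy_bad$state == "A" & toy_bad$year == 1978, "state"] <- "B"
same_rows    <- nrow(toy_bad) == 3 * 2
balanced_bad <- is_balanced(toy_bad, "state", "year")

print(c(balanced_ok = balanced_ok, same_rows = same_rows,
        balanced_bad = balanced_bad))
```

Applied to the data in this post, the analogous call would be `is_balanced(guns_data, "state", "year")`.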

2. OLS Regression

Type out meaningful estimating equation and run the OLS regression/estimate the coefficients.

Do the estimated coefficients make sense (direction, magnitude, statistical significance)? Could there be omitted variable bias that could potentially be reduced by throwing in fixed effects?

Our model below takes the natural log of violent (the violent crime rate in state i) as the dependent (Y) variable, so that the coefficient can be read as an approximate percent change, with a single independent variable, law (whether the state has a shall carry law in effect that year). A shall carry (or "shall issue") law requires the authorities to issue a permit to carry a concealed firearm to any applicant who meets the legal criteria. If yes, qualified citizens cannot be refused a concealed carry permit. If no, the state typically either leaves the decision to the discretion of the authorities or restricts concealed carry more tightly. There is a controversial debate about whether and to what extent this law influences crime. For the sake of this example, we assume that states that have shall carry laws have less crime. This is because such laws could bring a greater prevalence of armed citizens, which might deter criminals from committing violent crime. Also, since carriers must hold a permit, they may be less likely to commit violent crime, as they can be more easily identified and prosecuted. States without shall carry laws, by contrast, might have higher violent crime rates because fewer law-abiding citizens carry, which weakens that deterrent.

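In our model below, \(\beta_0\) represents the expected log violent crime rate when there is no carry law, so \(e^{\beta_0}\) is the implied crime rate. \(\beta_1\) represents the difference in log violent crime when there is a carry law, which approximates the proportional change when it is small (the exact change is \(e^{\beta_1} - 1\)), and \(u_{it}\) is the error term representing other factors not included in the model. Because this pooled regression uses every state-year observation, the variables are indexed by state \(i\) and year \(t\).

\(log(violent_{it})=\beta_0 + \beta_1law_{it} + u_{it}\)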

The results below show that when no carry law is in effect (law is “no”), the violent crime rate is about 462 per 100,000 (after exponentiating the intercept). This estimate is statistically significant at the 0.01 level. If the state does have the law, log violent crime is lower by 0.443, which corresponds to a decrease of roughly 36% (\(e^{-0.443} - 1 \approx -0.358\)) rather than 44.3%, because a log difference of this size is too large to read directly as a percentage. This generally aligns with expectations: if more citizens are armed, this might deter criminals from carrying out violent crime because they aren’t sure who is armed. However, the R-squared is only 8.7%, which is quite low, and the estimated reduction in the violent crime rate is implausibly large. The regression also leaves out state characteristics that plausibly affect both crime and the decision to adopt a carry law, so there is likely omitted variable bias that could be reduced by using a fixed effects approach.

Code

# create a model for our example

model_ols <- lm(log(violent) ~ law, data = guns_data)

# run summary() to view results
summary(model_ols)


Call:
lm(formula = log(violent) ~ law, data = guns_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.28477 -0.42748  0.04655  0.42172  1.84504 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.13492    0.02072  296.13   <2e-16 ***
lawyes      -0.44296    0.04203  -10.54   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6174 on 1171 degrees of freedom
Multiple R-squared:  0.08664,   Adjusted R-squared:  0.08586 
F-statistic: 111.1 on 1 and 1171 DF,  p-value: < 2.2e-16

Code

stargazer(model_ols, type = "text")


===============================================
                        Dependent variable:    
                    ---------------------------
                           log(violent)        
-----------------------------------------------
lawyes                       -0.443***         
                              (0.042)          
                                               
Constant                     6.135***          
                              (0.021)          
                                               
-----------------------------------------------
Observations                   1,173           
R2                             0.087           
Adjusted R2                    0.086           
Residual Std. Error      0.617 (df = 1171)     
F Statistic          111.079*** (df = 1; 1171) 
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
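Because the dependent variable is in logs, the coefficient on the law dummy is a difference in log points, and the exact percent difference is \(100 \times (e^{\beta_1} - 1)\). The short sketch below applies that formula to the two coefficients copied from the OLS output above. The variable names are only illustrative.

```r
# Coefficients copied from the OLS output above.
b0 <- 6.13492   # intercept: expected log violent crime rate when law = "no"
b1 <- -0.44296  # coefficient on lawyes, in log points

rate_no_law <- exp(b0)              # implied rate without the law, about 462
pct_change  <- 100 * (exp(b1) - 1)  # exact percent difference, about -35.8

print(round(c(rate_no_law = rate_no_law, pct_change = pct_change), 1))
```

Reading the coefficient directly as "44.3%" overstates the size of the difference. The log-point approximation is only close when the coefficient is near zero.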

3. Fixed Effects Models

There are three different ways to do so. Type out the estimating equation. Do your coefficients change? Why or why not? Tell us what the fixed effects controlling for are (time-invariant characteristics of the entity, or time-varying characteristics affecting all entities, or both - based on your specification). It is common to include both time and entity fixed effects in many applications in Economics. Do you get the same coefficient if you specify the Fixed Effect in an alternative way?

In the new model below, I use a fixed effects model, specifically state fixed effects, to control for time-invariant differences between states. These are factors that differ from state to state but change little or not at all over the sample period, such as political climate or long-run poverty levels. Interestingly, this model produces a positive relationship between shall carry laws and violent crime: if a state has a carry law, log violent crime is higher by 0.114, or roughly 12% (\(e^{0.114} - 1 \approx 0.12\)). This is statistically significant at the 0.01 level. The coefficient changes sign relative to OLS because it is now identified only from changes within each state over time, rather than from comparisons between states with and without the law. It’s possible that more guns in a state could lead to escalated or accidental shootings, thus the increase in violent crime. The within R-squared shows that the model explains only 3.9% of the within-state variation in the natural log of the violent crime rate. Because factors that change over time are still not controlled for, we may still be suffering from OVB.

\(log(violent_{it})=\beta_0 + \beta_1law_{it} + \delta_2 state \ 2_i + \delta_3 state \ 3_i + ... + \delta_n state \ n_i + u_{it}\)

Code

# create a model for our example
model_fe <- plm(log(violent) ~ law, data = guns_data, index = c("state", "year"), model = "within")

# run summary() to view results
summary(model_fe)

Oneway (individual) effect Within Model

Call:
plm(formula = log(violent) ~ law, data = guns_data, model = "within", 
    index = c("state", "year"))

Balanced Panel: n = 51, T = 23, N = 1173

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.5155092 -0.1234488 -0.0049341  0.1204905  0.5780460 

Coefficients:
       Estimate Std. Error t-value  Pr(>|t|)    
lawyes 0.113663   0.016929  6.7142 2.999e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    36.789
Residual Sum of Squares: 35.367
R-Squared:      0.038659
Adj. R-Squared: -0.0050769
F-statistic: 45.08 on 1 and 1121 DF, p-value: 2.9991e-11

Code

stargazer(model_fe, type = "text")


========================================
                 Dependent variable:    
             ---------------------------
                    log(violent)        
----------------------------------------
lawyes                0.114***          
                       (0.017)          
                                        
----------------------------------------
Observations            1,173           
R2                      0.039           
Adjusted R2            -0.005           
F Statistic   45.080*** (df = 1; 1121)  
========================================
Note:        *p<0.1; **p<0.05; ***p<0.01
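The state fixed effects estimate can also be obtained in two other ways: by adding one dummy per state to an OLS regression (LSDV, which is what the dummy-variable equation above describes), or by subtracting each state's mean from the outcome and the regressor before running OLS (the within transformation). The sketch below illustrates that the two give the same slope. It uses a small simulated panel with made-up values rather than the Guns data, and all variable names are hypothetical.

```r
set.seed(1)  # simulated data for illustration only, not the Guns data

# Simulated balanced panel: 5 entities x 10 periods, with entity effects
# that are correlated with the regressor.
n <- 5
t_len <- 10
panel <- data.frame(id = rep(1:n, each = t_len),
                    time = rep(1:t_len, times = n))
alpha <- rnorm(n, sd = 2)[panel$id]
panel$x <- 0.5 * alpha + rnorm(n * t_len)
panel$y <- 1 + 0.5 * panel$x + alpha + rnorm(n * t_len)

# (1) LSDV: OLS with one dummy per entity.
b_lsdv <- coef(lm(y ~ x + factor(id), data = panel))["x"]

# (2) Within transformation: subtract each entity's mean from y and x.
panel$y_dm <- panel$y - ave(panel$y, panel$id)
panel$x_dm <- panel$x - ave(panel$x, panel$id)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = panel))["x_dm"]

# The two slope estimates agree up to floating point error.
print(all.equal(unname(b_lsdv), unname(b_within)))
```

For the data in this post, the analogous LSDV call would be `lm(log(violent) ~ law + factor(state), data = guns_data)`, which should reproduce the `lawyes` coefficient reported by plm. Standard errors from the manually demeaned regression are not directly comparable, because that regression does not account for the degrees of freedom used by the state means.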

I’ve tried a third model using the fixed effects technique, this time controlling for both time-invariant state effects and time-varying effects that are common to all states. So, for example, it controls for factors like political climate, but also for national gun policies that affect all states in a given year. This results in a very small positive coefficient (0.002) that is not statistically significant (p = 0.91). The R-squared also falls to essentially zero, showing that once state and year effects are removed, the law explains almost none of the remaining variation in log violent crime. It appears that after controlling for both state and time effects, carry laws have little detectable effect on state violent crime rates.

\(log(violent_{it})=\beta_0 + \beta_1law_{it} + \alpha_i + \lambda_t + u_{it}\)

Code

# create a model for our example
model_fe_2 <- plm(log(violent) ~ law, data = guns_data, index = c("state", "year"), model = "within", effect = "twoways")

# run summary() to view results
summary(model_fe_2)

Twoways effects Within Model

Call:
plm(formula = log(violent) ~ law, data = guns_data, effect = "twoways", 
    model = "within", index = c("state", "year"))

Balanced Panel: n = 51, T = 23, N = 1173

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.4606932 -0.0852717  0.0021864  0.0863644  0.7018717 

Coefficients:
       Estimate Std. Error t-value Pr(>|t|)
lawyes 0.001885   0.016613  0.1135   0.9097

Total Sum of Squares:    22.69
Residual Sum of Squares: 22.69
R-Squared:      1.1714e-05
Adj. R-Squared: -0.066412
F-statistic: 0.0128737 on 1 and 1099 DF, p-value: 0.90968

Code

stargazer(model_fe_2, type = "text")


========================================
                 Dependent variable:    
             ---------------------------
                    log(violent)        
----------------------------------------
lawyes                  0.002           
                       (0.017)          
                                        
----------------------------------------
Observations            1,173           
R2                     0.00001          
Adjusted R2            -0.066           
F Statistic     0.013 (df = 1; 1099)    
========================================
Note:        *p<0.1; **p<0.05; ***p<0.01
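The two-way specification also has an alternative form. In a balanced panel, including both entity and period dummies gives the same slope as regressing the doubly demeaned outcome on the doubly demeaned regressor, where each variable has its entity mean and its period mean subtracted and the overall mean added back. The sketch below checks this on a small simulated panel. The data are made up, and the helper `demean2()` is a hypothetical name.

```r
set.seed(2)  # simulated data for illustration only, not the Guns data

# Simulated balanced panel: 6 entities x 8 periods.
n <- 6
t_len <- 8
panel <- data.frame(id = rep(1:n, each = t_len),
                    time = rep(1:t_len, times = n))
alpha  <- rnorm(n)[panel$id]        # entity effects
lambda <- rnorm(t_len)[panel$time]  # period effects
panel$x <- 0.5 * alpha + 0.5 * lambda + rnorm(n * t_len)
panel$y <- 0.3 * panel$x + alpha + lambda + rnorm(n * t_len)

# Two-way demeaning (this simple formula assumes a balanced panel).
demean2 <- function(v, id, time) v - ave(v, id) - ave(v, time) + mean(v)
panel$y_dd <- demean2(panel$y, panel$id, panel$time)
panel$x_dd <- demean2(panel$x, panel$id, panel$time)

b_twoway_dm   <- coef(lm(y_dd ~ x_dd - 1, data = panel))["x_dd"]
b_twoway_lsdv <- coef(lm(y ~ x + factor(id) + factor(time), data = panel))["x"]

# The two slope estimates agree up to floating point error.
print(all.equal(unname(b_twoway_dm), unname(b_twoway_lsdv)))
```

For the data in this post, the dummy-variable version would be `lm(log(violent) ~ law + factor(state) + factor(year), data = guns_data)`, which should match the `lawyes` coefficient from the two-way plm model.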