Please choose any panel data set and show/tell whether the data are balanced or not. What are the time component and the entity component in the data?
For this discussion, I've chosen the Guns dataset from the AER package in R. Guns is a balanced panel of annual data from 1977 to 1999. It covers the 50 US states plus the District of Columbia, which gives 51 entities in total (referred to below as states). There are 23 years of data per state, for a total of 1,173 observations on 13 variables. The variables are:
state - factor indicating state.
year - factor indicating year.
violent - violent crime rate (incidents per 100,000 members of the population).
murder - murder rate (incidents per 100,000).
robbery - robbery rate (incidents per 100,000).
prisoners - incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents).
afam - percent of state population that is African-American, ages 10 to 64.
cauc - percent of state population that is Caucasian, ages 10 to 64.
male - percent of state population that is male, ages 10 to 29.
population - state population, in millions of people.
income - real per capita personal income in the state (US dollars).
density - population per square mile of land area, divided by 1,000.
law - factor indicating whether the state has a shall-carry law in effect in that year.
The time component is year, which covers 1977 to 1999 (23 years). The entity component is state, which covers all 50 states plus Washington, D.C. (51 entities). In a balanced panel, every entity is observed in every time period. As a result, the number of entities multiplied by the number of periods equals the total number of observations, and every state has the same number of rows. Here 51 × 23 = 1,173, which is exactly the number of observations in the dataset, so the panel is balanced. The code below confirms this.
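```r
# packages used in this analysis
library(AER)    # provides the Guns dataset and its help page
library(dplyr)  # n_distinct()
library(plm)    # panel-data models

# load and name the data
guns_data <- read.csv("/Users/ginaocchipinti/Documents/Econometrics Course - BC/Guns-1.csv")

# learn about the Guns dataset
?Guns

# view the data
View(guns_data)

# look at all the variables, class, and examples
str(guns_data)
```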
```
'data.frame':	1173 obs. of  13 variables:
 $ year      : int  1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 ...
 $ violent   : num  414 419 413 448 470 ...
 $ murder    : num  14.2 13.3 13.2 13.2 11.9 10.6 9.2 9.4 9.8 10.1 ...
 $ robbery   : num  96.8 99.1 109.5 132.1 126.5 ...
 $ prisoners : int  83 94 144 141 149 183 215 243 256 267 ...
 $ afam      : num  8.38 8.35 8.33 8.41 8.48 ...
 $ cauc      : num  55.1 55.1 55.1 54.9 54.9 ...
 $ male      : num  18.2 18 17.8 17.7 17.7 ...
 $ population: num  3.78 3.83 3.87 3.9 3.92 ...
 $ income    : num  9563 9932 9877 9541 9548 ...
 $ density   : num  0.0746 0.0756 0.0762 0.0768 0.0772 ...
 $ state     : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
 $ law       : chr  "no" "no" "no" "no" ...
```
```r
# count how many unique values are in year
n_distinct(guns_data$year)
```

```
[1] 23
```
```r
# count how many unique values are in state
n_distinct(guns_data$state)
```

```
[1] 51
```
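```r
# understand the number of observations for each variable
# (describe() is not in base R; this assumes the psych package has been loaded)
describe(guns_data)

# number of entities times number of years, compared with the number of rows
n_entities_times_years <- n_distinct(guns_data$state) * n_distinct(guns_data$year)
print(paste("Number of states times number of years:", n_entities_times_years))
n_entities_times_years == nrow(guns_data)
```

```
[1] "Number of states times number of years: 1173"
[1] TRUE
```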
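Matching N × T to the row count is necessary but not sufficient for balance. A panel could still have 1,173 rows if one state were missing a year and another had a duplicated year. A stricter check, sketched here in base R, cross-tabulates entity by period and requires every cell to equal exactly one. `is_balanced()` is a hypothetical helper written for this discussion. It is not a function from plm or dplyr.

```r
# Hypothetical helper: a panel is balanced (with no duplicate rows) when
# every entity-period pair appears exactly once.
is_balanced <- function(df, entity, time) {
  counts <- table(df[[entity]], df[[time]])  # rows = entities, columns = periods
  all(counts == 1)
}

# Usage with the data loaded above:
# is_balanced(guns_data, "state", "year")
```

Applied to `guns_data`, this should return `TRUE`. That would be consistent with the "Balanced Panel: n = 51, T = 23, N = 1173" line that plm prints in the fixed effects output.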
2. OLS Regression
Type out a meaningful estimating equation and run the OLS regression/estimate the coefficients.
Do the estimated coefficients make sense (direction, magnitude, statistical significance)? Could there be omitted variable bias that could potentially be reduced by throwing in fixed effects?
Our model below takes the natural log of violent (the violent crime rate in state i) as the dependent (Y) variable, so that its coefficient can be read as an approximate percentage change. The one independent variable is law (whether the state has a shall-carry law in effect that year). A shall-carry (often called "shall-issue") law requires state authorities to issue a permit to carry a concealed firearm to any applicant who meets basic legal requirements, such as having no felony convictions. Without such a law, officials typically have discretion over whether to issue permits, or concealed carry may be prohibited. Either way, it is generally harder for citizens to carry a concealed gun legally. There is a controversial debate over whether, and to what extent, these laws influence crime. For the sake of this example, we hypothesize that states with shall-carry laws have less violent crime. If more law-abiding citizens carry concealed firearms, criminals may be deterred from committing violent crime because they cannot tell who is armed. Also, permit holders are screened and registered, so they can be more easily identified and prosecuted, which may make them less likely to commit violent crime themselves.
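In our model below, \(\text{law}_i\) equals 1 if the state has a carry law and 0 otherwise. \(\beta_0\) is the expected natural log of the violent crime rate in states without a carry law, so \(e^{\beta_0}\) is the typical (geometric-mean) violent crime rate in those states. \(\beta_1\) is the difference in the expected log violent crime rate between states with and without the law. Multiplying \(\beta_1\) by 100 gives an approximate percentage difference, and the exact percentage difference is \(100(e^{\beta_1}-1)\). \(u_i\) is the error term representing other factors not included in the model.
\(\log(\text{violent}_i)=\beta_0 + \beta_1 \text{law}_i + u_i\)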
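The results below show that in states without a carry law (law = "no"), the intercept of 6.135 implies a typical violent crime rate of about 462 incidents per 100,000 (\(e^{6.135}\)). The coefficient on law is -0.443, and both coefficients are statistically significant at the 0.001 level. States with a carry law have about 44 log points less violent crime, which corresponds to roughly 36% less (\(100(e^{-0.443}-1) \approx -35.8\%\)). The negative sign generally aligns with expectations: if more citizens are armed, criminals might be deterred from violent crime because they cannot tell who is armed. However, the R-squared is only 8.7%, and the estimated reduction is quite large for a single law. More importantly, states that adopted carry laws may differ systematically from those that did not, for example in urbanization or historical crime levels. If those differences also affect violent crime, the OLS estimate suffers from omitted variable bias. A low R-squared alone does not signal OVB. However, state-specific omitted factors that do not change over time are exactly what a fixed effects approach can absorb.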
```r
# create a model for our example: pooled OLS of log violent crime on the law dummy
model_ols <- lm(log(violent) ~ law, data = guns_data)

# run summary() to view results
summary(model_ols)
```
```
Call:
lm(formula = log(violent) ~ law, data = guns_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.28477 -0.42748  0.04655  0.42172  1.84504 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.13492    0.02072  296.13   <2e-16 ***
lawyes      -0.44296    0.04203  -10.54   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6174 on 1171 degrees of freedom
Multiple R-squared:  0.08664,	Adjusted R-squared:  0.08586 
F-statistic: 111.1 on 1 and 1171 DF,  p-value: < 2.2e-16
```
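The dependent variable is in logs and `law` is a 0/1 dummy, so the coefficients in the output above are in log points. Here is a short sketch that converts them into a crime rate and an exact percentage difference. It uses the estimates copied from the output; the names `b0`, `b1`, and `pct_change()` are introduced here for illustration only.

```r
# Coefficients copied from the pooled OLS output above
b0 <- 6.13492   # intercept: mean of log(violent) in states without the law
b1 <- -0.44296  # lawyes: difference in mean log(violent), law vs. no law

# exp(b0) is the geometric-mean violent crime rate (per 100,000) without the law
baseline <- exp(b0)

# Exact percentage difference implied by a dummy coefficient in a log-linear model
pct_change <- function(beta) 100 * (exp(beta) - 1)

round(baseline, 1)        # approx. 461.7
round(pct_change(b1), 1)  # approx. -35.8
```

The exact difference (about -36%) is smaller in magnitude than the 44.3% read directly off the coefficient. The approximation \(100\beta_1\) is only accurate for coefficients close to zero.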
There are three different ways to do so. Type out the estimating equation. Do your coefficients change? Why or why not? Tell us what the fixed effects are controlling for (time-invariant characteristics of the entity, time-varying characteristics affecting all entities, or both, based on your specification). It is common to include both time and entity fixed effects in many applications in economics. Do you get the same coefficient if you specify the fixed effects in an alternative way?
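In the new model below, I use a fixed effects model, specifically state fixed effects, to control for time-invariant differences between states. These are characteristics that differ from state to state but stay roughly constant over the sample period, such as geography or long-standing political and cultural differences. For factors like political climate or the poverty rate, state fixed effects absorb only the part that does not change over time. Interestingly, this model yields a positive relationship between shall-carry laws and violent crime. Within a given state, years with a carry law in effect have about 11.4 log points more violent crime, or roughly 12% more (\(100(e^{0.114}-1) \approx 12.0\%\)). This is statistically significant at the 0.001 level. One possible explanation is that more guns in a state could lead to escalated or accidental shootings, and therefore to more violent crime. The within R-squared shows that law explains only 3.9% of the within-state variation in the natural log of the violent crime rate. A low R-squared does not by itself imply omitted variable bias. However, state fixed effects cannot absorb factors that change over time, such as nationwide crime trends, so OVB remains possible.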
\(\log(\text{violent}_{it})=\beta_0 + \beta_1 \text{law}_{it} + \delta_2 \text{State2}_i + \delta_3 \text{State3}_i + \cdots + \delta_{51} \text{State51}_i + u_{it}\)
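The equation above writes the state fixed effects as dummy variables, which is the least-squares dummy variable (LSDV) form. The within estimator that plm uses instead subtracts each state's own mean from every variable. By the Frisch-Waugh-Lovell theorem, both approaches give the same coefficient on law. The sketch below checks this in base R. `fe_two_ways()` is a hypothetical helper, and it assumes law has been recoded as a 0/1 numeric variable.

```r
# Hypothetical helper: estimate a one-way (entity) fixed-effects slope two ways.
fe_two_ways <- function(y, x, entity) {
  entity <- factor(entity)

  # (1) LSDV: OLS with an intercept plus one dummy per entity (first level dropped)
  lsdv <- coef(lm(y ~ x + entity))[["x"]]

  # (2) Within: OLS on entity-demeaned variables, with no intercept
  y_dm <- y - ave(y, entity)  # ave() returns each row's entity mean
  x_dm <- x - ave(x, entity)
  within <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

  c(lsdv = lsdv, within = within)
}

# Usage with the data loaded above (law recoded to 0/1):
# fe_two_ways(log(guns_data$violent), as.numeric(guns_data$law == "yes"), guns_data$state)
```

On the Guns data, both entries should equal the `lawyes` estimate of about 0.1137 that plm reports for the state fixed effects model. Standard errors from the hand-demeaned regression are not directly comparable, because `lm()` does not subtract the degrees of freedom used up by the 51 state means.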
```r
# create a model for our example: state fixed effects via the within estimator
model_fe <- plm(log(violent) ~ law, data = guns_data,
                index = c("state", "year"), model = "within")

# run summary() to view results
summary(model_fe)
```
```
Oneway (individual) effect Within Model

Call:
plm(formula = log(violent) ~ law, data = guns_data, model = "within", 
    index = c("state", "year"))

Balanced Panel: n = 51, T = 23, N = 1173

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.5155092 -0.1234488 -0.0049341  0.1204905  0.5780460 

Coefficients:
       Estimate Std. Error t-value  Pr(>|t|)    
lawyes 0.113663   0.016929  6.7142 2.999e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    36.789
Residual Sum of Squares: 35.367
R-Squared:      0.038659
Adj. R-Squared: -0.0050769
F-statistic: 45.08 on 1 and 1121 DF, p-value: 2.9991e-11
```
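plm reports a within R-squared, which is computed from the state-demeaned data. It is therefore not directly comparable to the 8.7% from pooled OLS. It can be reproduced from the two sums of squares printed above. Here is a quick check, with the numbers copied from the output:

```r
# Sums of squares copied from the state fixed effects output above
tss <- 36.789  # total sum of squares of state-demeaned log(violent)
rss <- 35.367  # residual sum of squares

# Within R-squared: share of within-state variation explained by law
within_r2 <- 1 - rss / tss
round(within_r2, 4)  # approx. 0.0387
```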
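I've also tried a third model using the fixed effects technique, this time controlling for two kinds of effects. The first is time-invariant state effects. The second is time effects that vary by year but are common to all states. For example, the model controls for stable state characteristics such as long-run political climate. It also controls for national factors, such as federal gun policy or nationwide crime trends, that affect every state in a given year. The coefficient on law becomes very small (0.0019) and is not statistically significant (p = 0.91). The within R-squared drops to essentially zero, so law explains almost none of the variation in log violent crime that remains after removing state and year effects. The coefficient changes because the year effects absorb nationwide movements in crime. Part of the positive estimate in the state-only model likely reflected such national changes over time that happened to coincide with when states adopted these laws. After controlling for both state and year effects, carry laws do not appear to affect violent crime rates.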
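Extending the state fixed effects equation with a dummy for each year except the first gives the estimating equation for this two-way specification:

\[
\log(\text{violent}_{it}) = \beta_0 + \beta_1 \text{law}_{it} + \delta_2 \text{State2}_i + \cdots + \delta_{51} \text{State51}_i + \gamma_2 \text{Year2}_t + \cdots + \gamma_{23} \text{Year23}_t + u_{it}
\]

Here \(\text{Year}k_t\) equals 1 in the \(k\)-th year of the sample (1978 for \(k = 2\)) and 0 otherwise. The \(\gamma\) terms absorb anything that shifts violent crime in every state in the same year.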
```r
# create a model for our example: two-way (state and year) fixed effects
model_fe_2 <- plm(log(violent) ~ law, data = guns_data,
                  index = c("state", "year"), model = "within", effect = "twoways")

# run summary() to view results
summary(model_fe_2)
```
```
Twoways effects Within Model

Call:
plm(formula = log(violent) ~ law, data = guns_data, effect = "twoways", 
    model = "within", index = c("state", "year"))

Balanced Panel: n = 51, T = 23, N = 1173

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.4606932 -0.0852717  0.0021864  0.0863644  0.7018717 

Coefficients:
       Estimate Std. Error t-value Pr(>|t|)
lawyes 0.001885   0.016613  0.1135   0.9097

Total Sum of Squares:    22.69
Residual Sum of Squares: 22.69
R-Squared:      1.1714e-05
Adj. R-Squared: -0.066412
F-statistic: 0.0128737 on 1 and 1099 DF, p-value: 0.90968
```
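The alternative-specification question also applies to the two-way model. Including both state and year dummies in `lm()` should give the same coefficient as plm's two-way within estimator. In a balanced panel like this one, the two-way within transformation is simply \(v_{it} - \bar v_i - \bar v_t + \bar v\). The sketch below, in base R, compares the two approaches. `twoway_demean()` and `fe2_two_ways()` are hypothetical helpers, and law is assumed to be recoded to 0/1.

```r
# Hypothetical helper: two-way demeaning. This shortcut is exact only for a balanced panel.
twoway_demean <- function(v, entity, period) {
  v - ave(v, entity) - ave(v, period) + mean(v)
}

# Hypothetical helper: two-way (entity and period) fixed-effects slope, estimated two ways.
fe2_two_ways <- function(y, x, entity, period) {
  entity <- factor(entity)
  period <- factor(period)

  # (1) LSDV: one dummy per entity and one per period (first levels dropped)
  lsdv <- coef(lm(y ~ x + entity + period))[["x"]]

  # (2) Within: OLS on doubly demeaned variables, with no intercept
  y_dm <- twoway_demean(y, entity, period)
  x_dm <- twoway_demean(x, entity, period)
  within <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

  c(lsdv = lsdv, within = within)
}

# Usage with the data loaded above:
# fe2_two_ways(log(guns_data$violent), as.numeric(guns_data$law == "yes"),
#              guns_data$state, guns_data$year)
```

On the Guns data, both entries should reproduce plm's two-way `lawyes` estimate of about 0.0019. For unbalanced panels, simple double demeaning is not exact, which is one reason to rely on plm's implementation in that case.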