Load Packages

#install.packages('AER')
#install.packages("plm")
#install.packages("foreign")
#install.packages("stargazer")
#install.packages("lmtest")
library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(readr)
library(AER)
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## Loading required package: sandwich
## Loading required package: survival
library(plm)
## 
## Attaching package: 'plm'
## The following objects are masked from 'package:dplyr':
## 
##     between, lag, lead
library(foreign)

Load Dataset

Description (Guns)

  • Guns is a panel dataset that contains 1173 observations and 13 variables from 1977 to 1999.

  • state: 51 states in the U.S.

  • year: From 1977 - 1999.

  • violent: violent crime rate (incidents per 100,000 members per. population).

  • murder: murder rate (incidents per 100,000).

  • robbery: robbery rate (incidents per 100,000).

  • prisoners: how many people were sentenced to prison in a specific state the year before, relative to the state’s total population.

  • afam: percent of state population that is African-American, ages 10 to 64.

  • cauc: percent of state population that is Caucasian, ages 10 to 64.

  • male: percent of state population that is male, ages 10 to 29.

  • population: state population (million).

  • income: real per capita personal income in the states.

  • density: population per square mile of land area, divided by 1,000.

  • law: binary variable. The state wether have a shall carry law in effect in that year.

data("Guns")
head(Guns)
##   year violent murder robbery prisoners     afam     cauc     male population
## 1 1977   414.4   14.2    96.8        83 8.384873 55.12291 18.17441   3.780403
## 2 1978   419.1   13.3    99.1        94 8.352101 55.14367 17.99408   3.831838
## 3 1979   413.3   13.2   109.5       144 8.329575 55.13586 17.83934   3.866248
## 4 1980   448.5   13.2   132.1       141 8.408386 54.91259 17.73420   3.900368
## 5 1981   470.5   11.9   126.5       149 8.483435 54.92513 17.67372   3.918531
## 6 1982   447.7   10.6   112.0       183 8.514000 54.89621 17.51052   3.925229
##     income   density   state law
## 1 9563.148 0.0745524 Alabama  no
## 2 9932.000 0.0755667 Alabama  no
## 3 9877.028 0.0762453 Alabama  no
## 4 9541.428 0.0768288 Alabama  no
## 5 9548.351 0.0771866 Alabama  no
## 6 9478.919 0.0773185 Alabama  no
str(Guns)
## 'data.frame':    1173 obs. of  13 variables:
##  $ year      : Factor w/ 23 levels "1977","1978",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ violent   : num  414 419 413 448 470 ...
##  $ murder    : num  14.2 13.3 13.2 13.2 11.9 10.6 9.2 9.4 9.8 10.1 ...
##  $ robbery   : num  96.8 99.1 109.5 132.1 126.5 ...
##  $ prisoners : int  83 94 144 141 149 183 215 243 256 267 ...
##  $ afam      : num  8.38 8.35 8.33 8.41 8.48 ...
##  $ cauc      : num  55.1 55.1 55.1 54.9 54.9 ...
##  $ male      : num  18.2 18 17.8 17.7 17.7 ...
##  $ population: num  3.78 3.83 3.87 3.9 3.92 ...
##  $ income    : num  9563 9932 9877 9541 9548 ...
##  $ density   : num  0.0746 0.0756 0.0762 0.0768 0.0772 ...
##  $ state     : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ law       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
# Check for NA in each column
na_counts <- sapply(Guns, function(x) sum(is.na(x)))

# Print the count of NAs for each column
na_counts
##       year    violent     murder    robbery  prisoners       afam       cauc 
##          0          0          0          0          0          0          0 
##       male population     income    density      state        law 
##          0          0          0          0          0          0

Time Component

  • For the “Guns” dataset, the time component would be time periods during data collected for each sate.

  • Year with level in 23 from 1977 to 1999.

Entity Component

  • 51 different U.S states in the dataset.

Balanced Panel Data

  • There are a total of 51 states * 23 years = 1173, so we have a balanced panel dataset.

OLS Estimation

lm_1 <- lm(data = Guns,
           formula = violent ~ afam,)
summary(lm_1)
## 
## Call:
## lm(formula = violent ~ afam, data = Guns)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1109.32  -135.58   -29.15   126.77  1732.84 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  295.044     11.886   24.82   <2e-16 ***
## afam          38.985      1.643   23.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 274.8 on 1171 degrees of freedom
## Multiple R-squared:  0.3247, Adjusted R-squared:  0.3241 
## F-statistic: 562.9 on 1 and 1171 DF,  p-value: < 2.2e-16

OVB

  • Since income is also related to the violent crime rate, we will include it in our model to determine whether there is an issue of OVB

  • Check first condition: The correlation between key x variable (afam) and omitted variable (income) is 0.26.

  • Check the second condition: Based on value of 0.41, the omitted variable (income) have a positive impact on prediction (violent)

cor(Guns[, c("violent", "afam", "income")])
##           violent      afam    income
## violent 1.0000000 0.5697884 0.4079864
## afam    0.5697884 1.0000000 0.2626943
## income  0.4079864 0.2626943 1.0000000
  • “afam”, “income” will change over time (more specifc) (why only use these two cause bias)

  • Reducing the OVB with Fixed Effects: Some unobserved factors unique to each state but do not change over time. For instance, some geographic or cultural factors may contribute to prediction. In our case, we could consider state and year as fixed effects.

Fixed Effects Models

First Model

  • The fixed effects model for estimation of the relationship between violent rate and percent of state population that is African-American (afam)

  • \[ violent =\beta_0\ + \beta_1afam\ + StateFixedEffect + \epsilon\ \]

  • A regression of the violent rate on afam and 51 binary regressors

violent_fe_1 <- plm(violent ~ afam, 
            data = Guns,
            index = c('state', 'year'),
            model = 'within')
coeftest(violent_fe_1)
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value  Pr(>|t|)    
## afam  30.9059     6.1915  4.9917 6.936e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Second Model

  • We include both entity (states) and time (year) fixed effects

  • \[ violent =\beta_0\ + \beta_1afam\ + StateFixedEffects + TimeFixedEffects + \epsilon\ \]

  • StateFixedEffects: Fixed effects for each state, controlling for all unobserved, time - invariant characteristic that vary across states but not over time. This could be like cultural and geographical characteristics that are unique to each states.

  • TimeFixedEffects: Fixed effects for each year, controlling all unobserved characteristics that vary over yeas but are constant across all states.

violent_fe_2 <- plm(violent ~ afam, 
            data = Guns,
            index = c('state', 'year'),
            model = 'within',
            effect = 'twoways')
coeftest(violent_fe_2)
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value  Pr(>|t|)    
## afam -34.7667     6.6755 -5.2081 2.277e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparison and Conclusion

ct_lm_1 <- coeftest(lm_1)
ct_violent_fe_1 <- coeftest(violent_fe_1)
ct_violent_fe_2 <- coeftest(violent_fe_2)
ct_lm_1
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 295.0437    11.8855  24.824 < 2.2e-16 ***
## afam         38.9847     1.6431  23.726 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ct_violent_fe_1
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value  Pr(>|t|)    
## afam  30.9059     6.1915  4.9917 6.936e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ct_violent_fe_2
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value  Pr(>|t|)    
## afam -34.7667     6.6755 -5.2081 2.277e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1