#install.packages('AER')
#install.packages("plm")
#install.packages("foreign")
#install.packages("stargazer")
#install.packages("lmtest")
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(readr)
library(AER)
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## Loading required package: sandwich
## Loading required package: survival
library(plm)
##
## Attaching package: 'plm'
## The following objects are masked from 'package:dplyr':
##
## between, lag, lead
library(foreign)
Guns is a panel dataset that contains 1173 observations and 13 variables from 1977 to 1999.
state: 51 states in the U.S.
year: From 1977 - 1999.
violent: violent crime rate (incidents per 100,000 members per. population).
murder: murder rate (incidents per 100,000).
robbery: robbery rate (incidents per 100,000).
prisoners: how many people were sentenced to prison in a specific state the year before, relative to the state’s total population.
afam: percent of state population that is African-American, ages 10 to 64.
cauc: percent of state population that is Caucasian, ages 10 to 64.
male: percent of state population that is male, ages 10 to 29.
population: state population (million).
income: real per capita personal income in the states.
density: population per square mile of land area, divided by 1,000.
law: binary variable. The state wether have a shall carry law in effect in that year.
data("Guns")
head(Guns)
## year violent murder robbery prisoners afam cauc male population
## 1 1977 414.4 14.2 96.8 83 8.384873 55.12291 18.17441 3.780403
## 2 1978 419.1 13.3 99.1 94 8.352101 55.14367 17.99408 3.831838
## 3 1979 413.3 13.2 109.5 144 8.329575 55.13586 17.83934 3.866248
## 4 1980 448.5 13.2 132.1 141 8.408386 54.91259 17.73420 3.900368
## 5 1981 470.5 11.9 126.5 149 8.483435 54.92513 17.67372 3.918531
## 6 1982 447.7 10.6 112.0 183 8.514000 54.89621 17.51052 3.925229
## income density state law
## 1 9563.148 0.0745524 Alabama no
## 2 9932.000 0.0755667 Alabama no
## 3 9877.028 0.0762453 Alabama no
## 4 9541.428 0.0768288 Alabama no
## 5 9548.351 0.0771866 Alabama no
## 6 9478.919 0.0773185 Alabama no
str(Guns)
## 'data.frame': 1173 obs. of 13 variables:
## $ year : Factor w/ 23 levels "1977","1978",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ violent : num 414 419 413 448 470 ...
## $ murder : num 14.2 13.3 13.2 13.2 11.9 10.6 9.2 9.4 9.8 10.1 ...
## $ robbery : num 96.8 99.1 109.5 132.1 126.5 ...
## $ prisoners : int 83 94 144 141 149 183 215 243 256 267 ...
## $ afam : num 8.38 8.35 8.33 8.41 8.48 ...
## $ cauc : num 55.1 55.1 55.1 54.9 54.9 ...
## $ male : num 18.2 18 17.8 17.7 17.7 ...
## $ population: num 3.78 3.83 3.87 3.9 3.92 ...
## $ income : num 9563 9932 9877 9541 9548 ...
## $ density : num 0.0746 0.0756 0.0762 0.0768 0.0772 ...
## $ state : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ law : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
# Check for NA in each column
na_counts <- sapply(Guns, function(x) sum(is.na(x)))
# Print the count of NAs for each column
na_counts
## year violent murder robbery prisoners afam cauc
## 0 0 0 0 0 0 0
## male population income density state law
## 0 0 0 0 0 0
For the “Guns” dataset, the time component would be time periods during data collected for each sate.
Year with level in 23 from 1977 to 1999.
Our first assumption: Is there a correlation between a higher percentage of African American population and an increased violent rate?
\[
violent =\beta_0\ + \beta_1afam\ + \epsilon\
\]
lm_1 <- lm(data = Guns,
formula = violent ~ afam,)
summary(lm_1)
##
## Call:
## lm(formula = violent ~ afam, data = Guns)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1109.32 -135.58 -29.15 126.77 1732.84
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 295.044 11.886 24.82 <2e-16 ***
## afam 38.985 1.643 23.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274.8 on 1171 degrees of freedom
## Multiple R-squared: 0.3247, Adjusted R-squared: 0.3241
## F-statistic: 562.9 on 1 and 1171 DF, p-value: < 2.2e-16
With one percent of state population that is African-American increase, the violent crime rate increases 38.985%, holding others constant.
Also, the correlation between afam and violent is statistically significant.
However, there is potential OVB problem for the current model.
Since income is also related to the violent crime rate, we will include it in our model to determine whether there is an issue of OVB
Check first condition: The correlation between key x variable (afam) and omitted variable (income) is 0.26.
Check the second condition: Based on value of 0.41, the omitted variable (income) have a positive impact on prediction (violent)
cor(Guns[, c("violent", "afam", "income")])
## violent afam income
## violent 1.0000000 0.5697884 0.4079864
## afam 0.5697884 1.0000000 0.2626943
## income 0.4079864 0.2626943 1.0000000
“afam”, “income” will change over time (more specifc) (why only use these two cause bias)
Reducing the OVB with Fixed Effects: Some unobserved factors unique to each state but do not change over time. For instance, some geographic or cultural factors may contribute to prediction. In our case, we could consider state and year as fixed effects.
The fixed effects model for estimation of the relationship between violent rate and percent of state population that is African-American (afam)
\[ violent =\beta_0\ + \beta_1afam\ + StateFixedEffect + \epsilon\ \]
A regression of the violent rate on afam and 51 binary regressors
violent_fe_1 <- plm(violent ~ afam,
data = Guns,
index = c('state', 'year'),
model = 'within')
coeftest(violent_fe_1)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## afam 30.9059 6.1915 4.9917 6.936e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We include both entity (states) and time (year) fixed effects
\[ violent =\beta_0\ + \beta_1afam\ + StateFixedEffects + TimeFixedEffects + \epsilon\ \]
StateFixedEffects: Fixed effects for each state, controlling for all unobserved, time - invariant characteristic that vary across states but not over time. This could be like cultural and geographical characteristics that are unique to each states.
TimeFixedEffects: Fixed effects for each year, controlling all unobserved characteristics that vary over yeas but are constant across all states.
violent_fe_2 <- plm(violent ~ afam,
data = Guns,
index = c('state', 'year'),
model = 'within',
effect = 'twoways')
coeftest(violent_fe_2)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## afam -34.7667 6.6755 -5.2081 2.277e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The coefficient of second model becomes smaller.
Question: The coefficient in the third model changed negative. This result seems counterintuitive. It could be due to the limited time period covered by the study.
ct_lm_1 <- coeftest(lm_1)
ct_violent_fe_1 <- coeftest(violent_fe_1)
ct_violent_fe_2 <- coeftest(violent_fe_2)
ct_lm_1
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 295.0437 11.8855 24.824 < 2.2e-16 ***
## afam 38.9847 1.6431 23.726 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ct_violent_fe_1
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## afam 30.9059 6.1915 4.9917 6.936e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ct_violent_fe_2
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## afam -34.7667 6.6755 -5.2081 2.277e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1