Fixed effects regression can be done three ways:
n-1 binary regressors method when n is
small
Entity-demeaned regression. “Changes” method when T
= 2
FE estimator directly.
The third biggest factor of deaths in US is accidents (which includes motor vehicle crashes).
# Clear the workspace
rm(list = ls()) # Clear environment-remove all files from your workspace
gc() # Clear unused memory
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 525924 28.1 1168005 62.4 NA 669268 35.8
## Vcells 972128 7.5 8388608 64.0 32768 1840504 14.1
cat("\f") # Clear the console
graphics.off() # Clear all graphs
# Prepare needed libraries
packages <- c("AER", # load dataset
"plm", # for FE
"stargazer", # summary stats for sharing
"ggplot2", #for graphing,
"psych",
"naniar", # for visualisation of missing data,
"visdat", # for visualisation of missing data,
"VIM", # for visualisation of missing data,
"DataExplorer", # for visualisation of missing data,
"dplyr",
"magrittr"
)
for (i in 1:length(packages)) {
if (!packages[i] %in% rownames(installed.packages())) {
install.packages(packages[i]
, repos = "http://cran.rstudio.com/"
, dependencies = TRUE
)
}
library(packages[i], character.only = TRUE)
}
rm(packages)
Load data and generate fatality rate dependent variable.
## data from Stock and Watson (2007)
data("Fatalities",
package = "AER"
)
Fatalities_original <- Fatalities
## add fatality rate (number of traffic deaths per 10,000 people living in that state in that year)
Fatalities$frate <- with(data = Fatalities,
expr = fatal/pop * 10000
)
## ALL VARIABLES IN THE ORIGINAL DATAFRAME
stargazer(Fatalities,
type = "text", # html, latex
# out =
# summary.stat =
# covariate.labels = labels,
digits = 1)
##
## ===============================================================
## Statistic N Mean St. Dev. Min Max
## ---------------------------------------------------------------
## spirits 336 1.8 0.7 0.8 4.9
## unemp 336 7.3 2.5 2.4 18.0
## income 336 13,880.2 2,253.0 9,513.8 22,193.5
## emppop 336 60.8 4.7 43.0 71.3
## beertax 336 0.5 0.5 0.04 2.7
## baptist 336 7.2 9.8 0.0 30.4
## mormon 336 2.8 9.7 0.1 65.9
## drinkage 336 20.5 0.9 18.0 21.0
## dry 336 4.3 9.5 0.0 45.8
## youngdrivers 336 0.2 0.02 0.1 0.3
## miles 336 7,890.8 1,475.7 4,576.3 26,148.3
## fatal 336 928.7 934.1 79 5,504
## nfatal 336 182.6 188.4 13 1,049
## sfatal 336 109.9 108.5 8 603
## fatal1517 336 62.6 55.7 3 318
## nfatal1517 336 12.3 12.3 0 76
## fatal1820 336 106.7 104.2 7 601
## nfatal1820 336 33.5 33.2 0 196
## fatal2124 336 126.9 131.8 12 770
## nfatal2124 336 41.4 42.9 1 249
## afatal 336 293.3 303.6 24.6 2,094.9
## pop 336 4,930,272.0 5,073,704.0 478,999.7 28,314,028.0
## pop1517 336 230,815.5 229,896.3 21,000.0 1,172,000.0
## pop1820 336 249,090.4 249,345.6 21,000.0 1,321,004.0
## pop2124 336 336,389.9 345,304.4 30,000.2 1,892,998.0
## milestot 336 37,101.5 37,454.4 3,993.0 241,575.0
## unempus 336 7.5 1.5 5.5 9.7
## emppopus 336 60.0 1.6 57.8 62.3
## gsp 336 0.03 0.04 -0.1 0.1
## frate 336 2.0 0.6 0.8 4.2
## ---------------------------------------------------------------
A data frame containing 336 observations on 34 variables.
Balanced Panel !
state - factor indicating state.
year - factor indicating year.
breath - factor. Preliminary breath test
law?
jail - factor. Mandatory jail sentence?
service - factor. Mandatory community
service?
# re-order levels
reorder_size <- function(x) {
factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}
ggplot(data = Fatalities,
aes(x = reorder_size(year)
)
) +
geom_bar() +
xlab("Year") + ylab("Frequency") +
theme(axis.text.x = element_text(angle = 45)
)
ggplot(data = Fatalities,
aes(x = reorder_size(state)
)
) +
geom_bar() +
xlab("State") + ylab("Frequency") +
theme(axis.text.x = element_text(angle = 90)
)
Basic economic variables, demographic decomposition by religion, drinking by age, fatality decomposition like by night or single vehicle, and by age,…
spirits - numeric. Spirits consumption.
unemp - numeric. Unemployment rate.
income - numeric. Per capita personal income in 1987 dollars. US Bureau of Economic Analysis.
emppop - numeric. Employment/population ratio.
beertax - numeric. Tax on case of beer, which is an available measure of state alcohol taxes more generally.
baptist - numeric. Percent of southern baptist.
mormon - numeric. Percent of mormon.
drinkage - numeric. Minimum legal drinking age. EG. 18, 19, or 20.
dry - numeric. Percent residing in “dry” countries.
youngdrivers - numeric. Percent of drivers aged 15–24.
miles - numeric. Average miles per driver. Department of Transportation.
fatal - numeric. Number of vehicle fatalities. Department of Transportation Fatal Accident Reporting System.
nfatal - numeric. Number of night-time vehicle fatalities.
sfatal - numeric. Number of single vehicle fatalities.
fatal1517 - numeric. Number of vehicle fatalities, 15–17 year olds.
nfatal1517 - numeric. Number of night-time vehicle fatalities, 15–17 year olds.
fatal1820 - numeric. Number of vehicle fatalities, 18–20 year olds.
nfatal1820 - numeric. Number of night-time vehicle fatalities, 18–20 year olds.
fatal2124 - numeric. Number of vehicle fatalities, 21–24 year olds.
nfatal2124 - numeric. Number of night-time vehicle fatalities, 21–24 year olds.
afatal - numeric. Number of alcohol-involved vehicle fatalities.
pop - numeric. Population.
pop1517 - numeric. Population, 15–17 year olds.
pop1820 - numeric. Population, 18–20 year olds.
pop2124 - numeric. Population, 21–24 year olds.
milestot - numeric. Total vehicle miles (millions).
unempus - numeric. US unemployment rate. US Bureau of Labor Statistics.
emppopus - numeric. US employment/population ratio.
gsp - numeric. GSP rate of change.
We would expect more beer drinking linked with more fatal car crashes.
States mainly levy excise taxes on beer, wine, and spirits, although some states levy taxes on additional beverages (such as sparkling wine and cider).
Thus, we would expect more beer taxes (reduces beer sales) linked with less fatal car crashes.
Unfortunately, we find that more state beer tax leads to more fatal DUI (driving under influence) related deaths, not less !!!
fatal_lm_mod <- lm(data = Fatalities,
formula = frate ~ beertax ,
)
summary(fatal_lm_mod)
##
## Call:
## lm(formula = frate ~ beertax, data = Fatalities)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.09060 -0.37768 -0.09436 0.28548 2.27643
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.85331 0.04357 42.539 < 2e-16 ***
## beertax 0.36461 0.06217 5.865 1.08e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5437 on 334 degrees of freedom
## Multiple R-squared: 0.09336, Adjusted R-squared: 0.09065
## F-statistic: 34.39 on 1 and 334 DF, p-value: 1.082e-08
If the tax on a case of beer increases by $1, then the number of vehicle fatalities increases 0.36 deaths per 10,000 people. This is statistically significant, seems a bit high and in the unexpected direction.
What could cause this result??? Omitted Variable Bias due to unobserved heterogeneity ???
Risk of a bias due to omitted factors that vary across states but not over time. Use State Fixed Effects. Start by thinking about factors that determine traffic fatality rate, and could be correlated with beer taxes
Density of cars on the road. Specifically, “high taxes” could reflect “high traffic density” (so the OLS coefficient would be biased positively – high taxes, more deaths). Panel data lets us eliminate omitted variable bias when the omitted variables are constant over time within a given state.
Quality (age) of automobiles / Vintage of autos on the road. Older vehicles may be less safe and more likely to cause traffic death. If the vintage of auto happens to be negatively correlated with beer taxes, then there will be a omitted variable bias.
Culture around drinking and driving. Potentially are correlated with the beer tax, so beer taxes could be picking up cultural differences (omitted variable bias). Specifically, “high taxes” could reflect “cultural attitudes towards drinking” (so the OLS coefficient would be biased). OVB table
Driving conditions in states/Quality of roads. Roads may be worse in some states than others, and thus correlated with more fatalities. If driving infrastructure is also (negatively) correlated to alcohol taxes, then OVB.
Weather conditions in states. Some states may have colder weather, causing more skids and thus deaths. If colder weather is also correlated with alcohol taxes, then OVB.
Other omitted variables that vary over time and thus cause a bias. Use Time Fixed Effects
Improvements in auto safety over time
Changing national attitudes towards drunk driving
We have 48 binary regressors here — one for each federal state.
\(Fatality \ Rate_{it} = \beta_0 + \beta_1 Beer \ Tax _{it} + \delta_2 State \ 2_i + \delta_3 State \ 3_i + ... + \delta_n State \ n_i + u_{it}\)
fatal_fe_lm_mod <- lm( data = Fatalities,
formula = frate ~ beertax + state,
)
summary(fatal_fe_lm_mod)
##
## Call:
## lm(formula = frate ~ beertax + state, data = Fatalities)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58696 -0.08284 -0.00127 0.07955 0.89780
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.47763 0.31336 11.098 < 2e-16 ***
## beertax -0.65587 0.18785 -3.491 0.000556 ***
## stateaz -0.56773 0.26667 -2.129 0.034107 *
## statear -0.65495 0.21902 -2.990 0.003028 **
## stateca -1.50947 0.30435 -4.960 1.21e-06 ***
## stateco -1.48428 0.28735 -5.165 4.50e-07 ***
## statect -1.86226 0.28053 -6.638 1.58e-10 ***
## statede -1.30760 0.29395 -4.448 1.24e-05 ***
## statefl -0.26813 0.13933 -1.924 0.055284 .
## statega 0.52460 0.18395 2.852 0.004661 **
## stateid -0.66902 0.25797 -2.593 0.009989 **
## stateil -1.96162 0.29150 -6.730 9.23e-11 ***
## statein -1.46154 0.27254 -5.363 1.69e-07 ***
## stateia -1.54393 0.25344 -6.092 3.58e-09 ***
## stateks -1.22322 0.24544 -4.984 1.08e-06 ***
## stateky -1.21752 0.28717 -4.240 3.02e-05 ***
## statela -0.84712 0.18867 -4.490 1.03e-05 ***
## stateme -1.10795 0.19112 -5.797 1.78e-08 ***
## statemd -1.70644 0.28322 -6.025 5.17e-09 ***
## statema -2.10975 0.27610 -7.641 3.24e-13 ***
## statemi -1.48453 0.23602 -6.290 1.18e-09 ***
## statemn -1.89721 0.26509 -7.157 6.92e-12 ***
## statems -0.02908 0.14845 -0.196 0.844839
## statemo -1.29626 0.26669 -4.861 1.93e-06 ***
## statemt -0.36039 0.26396 -1.365 0.173225
## statene -1.52218 0.24928 -6.106 3.30e-09 ***
## statenv -0.60077 0.28595 -2.101 0.036517 *
## statenh -1.25445 0.20968 -5.983 6.53e-09 ***
## statenj -2.10575 0.30720 -6.855 4.37e-11 ***
## statenm 0.42637 0.25432 1.677 0.094724 .
## stateny -2.18667 0.29890 -7.316 2.57e-12 ***
## statenc -0.29047 0.11984 -2.424 0.015979 *
## statend -1.62344 0.25381 -6.396 6.45e-10 ***
## stateoh -1.67442 0.25381 -6.597 2.02e-10 ***
## stateok -0.54506 0.16912 -3.223 0.001415 **
## stateor -1.16800 0.28572 -4.088 5.65e-05 ***
## statepa -1.76747 0.27610 -6.402 6.26e-10 ***
## stateri -2.26505 0.29376 -7.711 2.07e-13 ***
## statesc 0.55717 0.11000 5.065 7.30e-07 ***
## statesd -1.00372 0.20962 -4.788 2.70e-06 ***
## statetn -0.87566 0.26802 -3.267 0.001218 **
## statetx -0.91747 0.24556 -3.736 0.000225 ***
## stateut -1.16395 0.19642 -5.926 8.88e-09 ***
## statevt -0.96604 0.21113 -4.576 7.08e-06 ***
## stateva -1.29018 0.20416 -6.320 9.99e-10 ***
## statewa -1.65952 0.28346 -5.854 1.31e-08 ***
## statewv -0.89675 0.24661 -3.636 0.000328 ***
## statewi -1.75927 0.29395 -5.985 6.44e-09 ***
## statewy -0.22850 0.31290 -0.730 0.465811
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1899 on 287 degrees of freedom
## Multiple R-squared: 0.905, Adjusted R-squared: 0.8891
## F-statistic: 56.97 on 48 and 287 DF, p-value: < 2.2e-16
stargazer(fatal_fe_lm_mod,
type="text"
)
##
## ===============================================
## Dependent variable:
## ---------------------------
## frate
## -----------------------------------------------
## beertax -0.656***
## (0.188)
##
## stateaz -0.568**
## (0.267)
##
## statear -0.655***
## (0.219)
##
## stateca -1.509***
## (0.304)
##
## stateco -1.484***
## (0.287)
##
## statect -1.862***
## (0.281)
##
## statede -1.308***
## (0.294)
##
## statefl -0.268*
## (0.139)
##
## statega 0.525***
## (0.184)
##
## stateid -0.669***
## (0.258)
##
## stateil -1.962***
## (0.291)
##
## statein -1.462***
## (0.273)
##
## stateia -1.544***
## (0.253)
##
## stateks -1.223***
## (0.245)
##
## stateky -1.218***
## (0.287)
##
## statela -0.847***
## (0.189)
##
## stateme -1.108***
## (0.191)
##
## statemd -1.706***
## (0.283)
##
## statema -2.110***
## (0.276)
##
## statemi -1.485***
## (0.236)
##
## statemn -1.897***
## (0.265)
##
## statems -0.029
## (0.148)
##
## statemo -1.296***
## (0.267)
##
## statemt -0.360
## (0.264)
##
## statene -1.522***
## (0.249)
##
## statenv -0.601**
## (0.286)
##
## statenh -1.254***
## (0.210)
##
## statenj -2.106***
## (0.307)
##
## statenm 0.426*
## (0.254)
##
## stateny -2.187***
## (0.299)
##
## statenc -0.290**
## (0.120)
##
## statend -1.623***
## (0.254)
##
## stateoh -1.674***
## (0.254)
##
## stateok -0.545***
## (0.169)
##
## stateor -1.168***
## (0.286)
##
## statepa -1.767***
## (0.276)
##
## stateri -2.265***
## (0.294)
##
## statesc 0.557***
## (0.110)
##
## statesd -1.004***
## (0.210)
##
## statetn -0.876***
## (0.268)
##
## statetx -0.917***
## (0.246)
##
## stateut -1.164***
## (0.196)
##
## statevt -0.966***
## (0.211)
##
## stateva -1.290***
## (0.204)
##
## statewa -1.660***
## (0.283)
##
## statewv -0.897***
## (0.247)
##
## statewi -1.759***
## (0.294)
##
## statewy -0.229
## (0.313)
##
## Constant 3.478***
## (0.313)
##
## -----------------------------------------------
## Observations 336
## R2 0.905
## Adjusted R2 0.889
## Residual Std. Error 0.190 (df = 287)
## F Statistic 56.969*** (df = 48; 287)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
stargazer(fatal_lm_mod, fatal_fe_lm_mod,
type = 'text',
column.labels = c("OLS with no State Dummy","OLS with State Dummy")
)
##
## ====================================================================
## Dependent variable:
## ------------------------------------------------
## frate
## OLS with no State Dummy OLS with State Dummy
## (1) (2)
## --------------------------------------------------------------------
## beertax 0.365*** -0.656***
## (0.062) (0.188)
##
## stateaz -0.568**
## (0.267)
##
## statear -0.655***
## (0.219)
##
## stateca -1.509***
## (0.304)
##
## stateco -1.484***
## (0.287)
##
## statect -1.862***
## (0.281)
##
## statede -1.308***
## (0.294)
##
## statefl -0.268*
## (0.139)
##
## statega 0.525***
## (0.184)
##
## stateid -0.669***
## (0.258)
##
## stateil -1.962***
## (0.291)
##
## statein -1.462***
## (0.273)
##
## stateia -1.544***
## (0.253)
##
## stateks -1.223***
## (0.245)
##
## stateky -1.218***
## (0.287)
##
## statela -0.847***
## (0.189)
##
## stateme -1.108***
## (0.191)
##
## statemd -1.706***
## (0.283)
##
## statema -2.110***
## (0.276)
##
## statemi -1.485***
## (0.236)
##
## statemn -1.897***
## (0.265)
##
## statems -0.029
## (0.148)
##
## statemo -1.296***
## (0.267)
##
## statemt -0.360
## (0.264)
##
## statene -1.522***
## (0.249)
##
## statenv -0.601**
## (0.286)
##
## statenh -1.254***
## (0.210)
##
## statenj -2.106***
## (0.307)
##
## statenm 0.426*
## (0.254)
##
## stateny -2.187***
## (0.299)
##
## statenc -0.290**
## (0.120)
##
## statend -1.623***
## (0.254)
##
## stateoh -1.674***
## (0.254)
##
## stateok -0.545***
## (0.169)
##
## stateor -1.168***
## (0.286)
##
## statepa -1.767***
## (0.276)
##
## stateri -2.265***
## (0.294)
##
## statesc 0.557***
## (0.110)
##
## statesd -1.004***
## (0.210)
##
## statetn -0.876***
## (0.268)
##
## statetx -0.917***
## (0.246)
##
## stateut -1.164***
## (0.196)
##
## statevt -0.966***
## (0.211)
##
## stateva -1.290***
## (0.204)
##
## statewa -1.660***
## (0.283)
##
## statewv -0.897***
## (0.247)
##
## statewi -1.759***
## (0.294)
##
## statewy -0.229
## (0.313)
##
## Constant 1.853*** 3.478***
## (0.044) (0.313)
##
## --------------------------------------------------------------------
## Observations 336 336
## R2 0.093 0.905
## Adjusted R2 0.091 0.889
## Residual Std. Error 0.544 (df = 334) 0.190 (df = 287)
## F Statistic 34.394*** (df = 1; 334) 56.969*** (df = 48; 287)
## ====================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
\(Fatality \ Rate_{it} - \bar{Fatality \ Rate_{i}} = \beta_1 ( Beer \ Tax _{it} - \bar{Beer \ Tax _{it} }) + (u_{it}-\bar{u_i})\)
EQUIVALENTLY
\(\tilde{Fatality \ Rate} = \beta_1 \tilde{Beer \ Tax} + u_{it}\)
Derivation
\(Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_{i} + u_{it}\) implies
Eq 1:
\(Y_{it} = \alpha_i + \beta_1 X_{it} + u_{it}\) where \(\alpha_i = \beta_0 + \beta_2 Z_{i}\)
We have just rewritten the equation above.
Taking average on both sides -
Eq 2:
\(\bar{Y_i} = \beta_1 \bar{X_{i}} + \alpha_i + \bar{u_{i}}\)
Subtracting Eq 2 from Eq 1 gives -
\(Y_{it} - \bar{Y_i} = \beta_1 ( X_{it} - \bar{X_{i}}) + (u_{it}-\bar{u_{i}})\)
The function ave is convenient for computing
group averages. We use it to obtain state specific averages of the
fatality rate and the beer tax.
# obtain demeaned data
?ave
Fatalities_demeaned <- with(data = Fatalities,
expr = data.frame(frate = frate - ave(x = frate, state), # frate grouped by age
beertax = beertax - ave(x = beertax, state) # beertax grouped by age
)
)
# estimate the regression
fatal_demeaned_lm_mod <-
lm(data = Fatalities_demeaned,
formula = frate ~ beertax - 1 , # indicating that you are regressing frate on beertax without an intercept
)
summary(fatal_demeaned_lm_mod)
##
## Call:
## lm(formula = frate ~ beertax - 1, data = Fatalities_demeaned)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58696 -0.08284 -0.00127 0.07955 0.89780
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## beertax -0.6559 0.1739 -3.772 0.000191 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1757 on 335 degrees of freedom
## Multiple R-squared: 0.04074, Adjusted R-squared: 0.03788
## F-statistic: 14.23 on 1 and 335 DF, p-value: 0.0001913
The ID variable for entities is named state and the time
id variable is year.
Since the fixed effects estimator is also called the within
estimator, we set model = "within". Do not
set model="random" as that would run a random effects
model.
# install and load the 'plm' package
## install.packages("plm")
library(plm)
# estimate the fixed effects regression with plm()
fatal_fe_mod <- plm(formula = frate ~ beertax,
data = Fatalities,
index = c("state", "year"), # declaring data to be panel
model = "within" # fixed effects mdodel
)
summary(fatal_fe_mod )
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = frate ~ beertax, data = Fatalities, model = "within",
## index = c("state", "year"))
##
## Balanced Panel: n = 48, T = 7, N = 336
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.5869619 -0.0828376 -0.0012701 0.0795454 0.8977960
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## beertax -0.65587 0.18785 -3.4915 0.000556 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 10.785
## Residual Sum of Squares: 10.345
## R-Squared: 0.040745
## Adj. R-Squared: -0.11969
## F-statistic: 12.1904 on 1 and 287 DF, p-value: 0.00055597
# print summary using robust standard errors
coeftest(fatal_fe_mod, vcov. = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## beertax -0.65587 0.28880 -2.271 0.02388 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note we had to declare our data to be panel in order to use
plm command.
Declaring data as panel in R can have several benefits, including:
Panel data commands: When data is declared as panel, it allows
the use of specialized panel data commands in R, such as the
plm package, which can be used to estimate panel data
models, such as fixed effects and random effects models.
Easy subsetting: Panel data can be easily subsetted by individual or time, which allows for easier analysis and visualization of data.
Easy aggregation: Panel data can also be easily aggregated by individual or time, which can be useful for summarizing data and creating summary statistics.
Panel-specific visualizations: Panel data can be used to create panel-specific visualizations, such as panel plots, which allow for easy comparison of variables across time and individuals.
Panel data modeling: Panel data models can provide more accurate estimates of coefficients, as they control for individual-level and time-specific heterogeneity that can bias estimates in traditional cross-sectional or time-series models.
stargazer(#fatal_lm_mod, # OLS
fatal_fe_lm_mod, fatal_demeaned_lm_mod, fatal_fe_mod, # fixed effect
type = "text",
column.labels = c(#"OLS - no Dummy",
"OLS - State Dummy",
"Demeaned Reg",
"FE"
)
)
##
## ============================================================================================
## Dependent variable:
## ------------------------------------------------------------------------
## frate
## OLS panel
## linear
## OLS - State Dummy Demeaned Reg FE
## (1) (2) (3)
## --------------------------------------------------------------------------------------------
## beertax -0.656*** -0.656*** -0.656***
## (0.188) (0.174) (0.188)
##
## stateaz -0.568**
## (0.267)
##
## statear -0.655***
## (0.219)
##
## stateca -1.509***
## (0.304)
##
## stateco -1.484***
## (0.287)
##
## statect -1.862***
## (0.281)
##
## statede -1.308***
## (0.294)
##
## statefl -0.268*
## (0.139)
##
## statega 0.525***
## (0.184)
##
## stateid -0.669***
## (0.258)
##
## stateil -1.962***
## (0.291)
##
## statein -1.462***
## (0.273)
##
## stateia -1.544***
## (0.253)
##
## stateks -1.223***
## (0.245)
##
## stateky -1.218***
## (0.287)
##
## statela -0.847***
## (0.189)
##
## stateme -1.108***
## (0.191)
##
## statemd -1.706***
## (0.283)
##
## statema -2.110***
## (0.276)
##
## statemi -1.485***
## (0.236)
##
## statemn -1.897***
## (0.265)
##
## statems -0.029
## (0.148)
##
## statemo -1.296***
## (0.267)
##
## statemt -0.360
## (0.264)
##
## statene -1.522***
## (0.249)
##
## statenv -0.601**
## (0.286)
##
## statenh -1.254***
## (0.210)
##
## statenj -2.106***
## (0.307)
##
## statenm 0.426*
## (0.254)
##
## stateny -2.187***
## (0.299)
##
## statenc -0.290**
## (0.120)
##
## statend -1.623***
## (0.254)
##
## stateoh -1.674***
## (0.254)
##
## stateok -0.545***
## (0.169)
##
## stateor -1.168***
## (0.286)
##
## statepa -1.767***
## (0.276)
##
## stateri -2.265***
## (0.294)
##
## statesc 0.557***
## (0.110)
##
## statesd -1.004***
## (0.210)
##
## statetn -0.876***
## (0.268)
##
## statetx -0.917***
## (0.246)
##
## stateut -1.164***
## (0.196)
##
## statevt -0.966***
## (0.211)
##
## stateva -1.290***
## (0.204)
##
## statewa -1.660***
## (0.283)
##
## statewv -0.897***
## (0.247)
##
## statewi -1.759***
## (0.294)
##
## statewy -0.229
## (0.313)
##
## Constant 3.478***
## (0.313)
##
## --------------------------------------------------------------------------------------------
## Observations 336 336 336
## R2 0.905 0.041 0.041
## Adjusted R2 0.889 0.038 -0.120
## Residual Std. Error 0.190 (df = 287) 0.176 (df = 335)
## F Statistic 56.969*** (df = 48; 287) 14.229*** (df = 1; 335) 12.190*** (df = 1; 287)
## ============================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Limitations / Challenges
Need variation in X over time within states
Time lag effects can be important
You should use heteroskedasticity- and autocorrelation consistent (clustered) standard errors if you think \(u_{it}\) could be correlated over time
You were doing one way FE only above (state).
Sometimes you want to do two way FE (state and time).
Here, it seems like time FE are not as important as state FE.
fatal_fe_lm_mod2 <- lm(data = Fatalities,
formula = frate ~ beertax + state + year,
)
# summary(fatal_fe_lm_mod2)
# estimate the fixed effects regression with plm()
fatal_fe_mod2 <- plm(formula = frate ~ beertax + as.factor(year),
data = Fatalities,
index = c("state", "year"), # declaring data to be panel
model = "within" # fixed effects mdodel
)
# summary(fatal_fe_mod2)
stargazer(fatal_fe_lm_mod, # State - One Way OLS (FE)
fatal_fe_lm_mod2, # State - Time (Two Way) OLS
fatal_fe_mod2, # State - Time (Two Way) FE
type = "text",
column.labels = c("FE - One way (state)",
"FE - Two way (state-time) OLS",
"FE - Two way (state-time) PLM"
)
)
##
## ========================================================================================================
## Dependent variable:
## ------------------------------------------------------------------------------------
## frate
## OLS panel
## linear
## FE - One way (state) FE - Two way (state-time) OLS FE - Two way (state-time) PLM
## (1) (2) (3)
## --------------------------------------------------------------------------------------------------------
## beertax -0.656*** -0.640*** -0.640***
## (0.188) (0.197) (0.197)
##
## stateaz -0.568** -0.547*
## (0.267) (0.278)
##
## statear -0.655*** -0.639***
## (0.219) (0.227)
##
## stateca -1.509*** -1.485***
## (0.304) (0.318)
##
## stateco -1.484*** -1.462***
## (0.287) (0.300)
##
## statect -1.862*** -1.840***
## (0.281) (0.293)
##
## statede -1.308*** -1.284***
## (0.294) (0.307)
##
## statefl -0.268* -0.260*
## (0.139) (0.142)
##
## statega 0.525*** 0.512***
## (0.184) (0.190)
##
## stateid -0.669*** -0.649**
## (0.258) (0.269)
##
## stateil -1.962*** -1.939***
## (0.291) (0.304)
##
## statein -1.462*** -1.440***
## (0.273) (0.284)
##
## stateia -1.544*** -1.524***
## (0.253) (0.264)
##
## stateks -1.223*** -1.204***
## (0.245) (0.255)
##
## stateky -1.218*** -1.195***
## (0.287) (0.300)
##
## statela -0.847*** -0.834***
## (0.189) (0.195)
##
## stateme -1.108*** -1.094***
## (0.191) (0.198)
##
## statemd -1.706*** -1.684***
## (0.283) (0.295)
##
## statema -2.110*** -2.088***
## (0.276) (0.288)
##
## statemi -1.485*** -1.466***
## (0.236) (0.245)
##
## statemn -1.897*** -1.876***
## (0.265) (0.276)
##
## statems -0.029 -0.020
## (0.148) (0.152)
##
## statemo -1.296*** -1.275***
## (0.267) (0.278)
##
## statemt -0.360 -0.340
## (0.264) (0.275)
##
## statene -1.522*** -1.503***
## (0.249) (0.259)
##
## statenv -0.601** -0.578*
## (0.286) (0.298)
##
## statenh -1.254*** -1.239***
## (0.210) (0.217)
##
## statenj -2.106*** -2.081***
## (0.307) (0.321)
##
## statenm 0.426* 0.446*
## (0.254) (0.265)
##
## stateny -2.187*** -2.163***
## (0.299) (0.312)
##
## statenc -0.290** -0.285**
## (0.120) (0.121)
##
## statend -1.623*** -1.604***
## (0.254) (0.264)
##
## stateoh -1.674*** -1.655***
## (0.254) (0.264)
##
## stateok -0.545*** -0.534***
## (0.169) (0.174)
##
## stateor -1.168*** -1.145***
## (0.286) (0.298)
##
## statepa -1.767*** -1.746***
## (0.276) (0.288)
##
## stateri -2.265*** -2.242***
## (0.294) (0.307)
##
## statesc 0.557*** 0.554***
## (0.110) (0.110)
##
## statesd -1.004*** -0.988***
## (0.210) (0.217)
##
## statetn -0.876*** -0.855***
## (0.268) (0.279)
##
## statetx -0.917*** -0.899***
## (0.246) (0.256)
##
## stateut -1.164*** -1.150***
## (0.196) (0.203)
##
## statevt -0.966*** -0.950***
## (0.211) (0.219)
##
## stateva -1.290*** -1.275***
## (0.204) (0.211)
##
## statewa -1.660*** -1.637***
## (0.283) (0.296)
##
## statewv -0.897*** -0.878***
## (0.247) (0.257)
##
## statewi -1.759*** -1.736***
## (0.294) (0.307)
##
## statewy -0.229 -0.203
## (0.313) (0.327)
##
## year1983 -0.080**
## (0.038)
##
## year1984 -0.072*
## (0.038)
##
## year1985 -0.124***
## (0.038)
##
## year1986 -0.038
## (0.039)
##
## year1987 -0.051
## (0.039)
##
## year1988 -0.052
## (0.040)
##
## as.factor(year)1983 -0.080**
## (0.038)
##
## as.factor(year)1984 -0.072*
## (0.038)
##
## as.factor(year)1985 -0.124***
## (0.038)
##
## as.factor(year)1986 -0.038
## (0.039)
##
## as.factor(year)1987 -0.051
## (0.039)
##
## as.factor(year)1988 -0.052
## (0.040)
##
## Constant 3.478*** 3.511***
## (0.313) (0.333)
##
## --------------------------------------------------------------------------------------------------------
## Observations 336 336 336
## R2 0.905 0.909 0.080
## Adjusted R2 0.889 0.891 -0.096
## Residual Std. Error 0.190 (df = 287) 0.188 (df = 281)
## F Statistic 56.969*** (df = 48; 287) 51.934*** (df = 54; 281) 3.503*** (df = 7; 281)
## ========================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
An unbalanced data (compared to balanced data) will have much larger
confidence interval (as SE increase due to smaller n).
set.seed(7)
Fatalities_unbalanced <- Fatalities %>% select(frate, beertax, year, state)
indices_to_replace <- sample(x = 1: length(Fatalities_unbalanced$frate),
size = round(.3 * length(Fatalities_unbalanced$frate)),
replace = TRUE
)
Fatalities_unbalanced$frate[indices_to_replace] <- NA
describe(Fatalities_unbalanced$frate)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 249 2.04 0.6 1.95 1.99 0.58 0.82 4.22 3.4 0.78 0.63 0.04
indices_to_replace <- sample(x = 1: length(Fatalities_unbalanced$beertax),
size = .3 * length(Fatalities_unbalanced$beertax),
replace = TRUE
)
Fatalities_unbalanced$beertax[indices_to_replace] <- NA
describe(Fatalities_unbalanced$beertax)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 247 0.51 0.45 0.36 0.42 0.27 0.04 2.61 2.57 2.07 4.79 0.03
vis_dat(Fatalities_unbalanced)
vis_miss(Fatalities_unbalanced)
naniar::gg_miss_upset(Fatalities_unbalanced)
?gg_miss_upset
sum(!is.na(Fatalities_unbalanced$frate) & !is.na(Fatalities_unbalanced$beertax))
## [1] 181
# original model
fatal_fe_lm_mod <- lm(data = Fatalities,
formula = frate ~ beertax + state,
)
# data with NA
fatal_fe_lm_mod_NA <- lm(data = Fatalities_unbalanced,
formula = frate ~ beertax + state,
)
# summary(fatal_fe_mod2)
stargazer(fatal_fe_lm_mod, # State - One Way OLS (FE)
fatal_fe_lm_mod_NA, # State - One Way OLS (FE) with NA data
type = "text",
column.labels = c("State - One Way OLS (FE)",
"State - One Way OLS (FE) with NA data"
)
)
##
## ==================================================================================
## Dependent variable:
## --------------------------------------------------------------
## frate
## State - One Way OLS (FE) State - One Way OLS (FE) with NA data
## (1) (2)
## ----------------------------------------------------------------------------------
## beertax -0.656*** -1.157***
## (0.188) (0.356)
##
## stateaz -0.568** -1.014**
## (0.267) (0.463)
##
## statear -0.655*** -1.086***
## (0.219) (0.378)
##
## stateca -1.509*** -2.264***
## (0.304) (0.549)
##
## stateco -1.484*** -2.241***
## (0.287) (0.519)
##
## statect -1.862*** -2.533***
## (0.281) (0.502)
##
## statede -1.308*** -1.985***
## (0.294) (0.531)
##
## statefl -0.268* -0.478**
## (0.139) (0.221)
##
## statega 0.525*** 0.905***
## (0.184) (0.290)
##
## stateid -0.669*** -1.285***
## (0.258) (0.463)
##
## stateil -1.962*** -2.645***
## (0.291) (0.526)
##
## statein -1.462*** -2.130***
## (0.273) (0.488)
##
## stateia -1.544*** -2.261***
## (0.253) (0.458)
##
## stateks -1.223*** -1.864***
## (0.245) (0.435)
##
## stateky -1.218*** -1.931***
## (0.287) (0.516)
##
## statela -0.847*** -1.242***
## (0.189) (0.318)
##
## stateme -1.108*** -1.439***
## (0.191) (0.336)
##
## statemd -1.706*** -2.382***
## (0.283) (0.513)
##
## statema -2.110*** -2.785***
## (0.276) (0.493)
##
## statemi -1.485*** -2.053***
## (0.236) (0.418)
##
## statemn -1.897*** -2.595***
## (0.265) (0.480)
##
## statems -0.029 -0.302
## (0.148) (0.241)
##
## statemo -1.296*** -1.974***
## (0.267) (0.479)
##
## statemt -0.360 -0.816*
## (0.264) (0.475)
##
## statene -1.522*** -2.243***
## (0.249) (0.467)
##
## statenv -0.601** -1.179**
## (0.286) (0.521)
##
## statenh -1.254*** -1.767***
## (0.210) (0.370)
##
## statenj -2.106*** -2.889***
## (0.307) (0.557)
##
## statenm 0.426* -0.310
## (0.254) (0.446)
##
## stateny -2.187*** -2.914***
## (0.299) (0.539)
##
## statenc -0.290** -0.539***
## (0.120) (0.193)
##
## statend -1.623*** -2.324***
## (0.254) (0.457)
##
## stateoh -1.674*** -2.305***
## (0.254) (0.454)
##
## stateok -0.545*** -0.927***
## (0.169) (0.274)
##
## stateor -1.168*** -1.851***
## (0.286) (0.514)
##
## statepa -1.767*** -2.378***
## (0.276) (0.509)
##
## stateri -2.265*** -3.058***
## (0.294) (0.531)
##
## statesc 0.557*** 0.637***
## (0.110) (0.178)
##
## statesd -1.004*** -1.518***
## (0.210) (0.383)
##
## statetn -0.876*** -1.583***
## (0.268) (0.480)
##
## statetx -0.917*** -1.615***
## (0.246) (0.439)
##
## stateut -1.164*** -1.572***
## (0.196) (0.319)
##
## statevt -0.966*** -1.380***
## (0.211) (0.398)
##
## stateva -1.290*** -1.761***
## (0.204) (0.348)
##
## statewa -1.660*** -2.309***
## (0.283) (0.511)
##
## statewv -0.897*** -1.465***
## (0.247) (0.439)
##
## statewi -1.759*** -2.492***
## (0.294) (0.530)
##
## statewy -0.229 -0.644
## (0.313) (0.575)
##
## Constant 3.478*** 4.289***
## (0.313) (0.576)
##
## ----------------------------------------------------------------------------------
## Observations 336 181
## R2 0.905 0.919
## Adjusted R2 0.889 0.890
## Residual Std. Error 0.190 (df = 287) 0.190 (df = 132)
## F Statistic 56.969*** (df = 48; 287) 31.360*** (df = 48; 132)
## ==================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
paste0(-1.157 - 1.65 * 0.355, "," ,-1.157 + 1.65 * 0.355 )
## [1] "-1.74275,-0.57125"
paste0(-0.656 - 1.65 * 0.188, "," ,-0.656 + 1.65 * 0.188 )
## [1] "-0.9662,-0.3458"
CI is indeed larger for the unbalanced data, as expected.
In Economics, we mostly have fixed effects (and not random effects or mixed effects {not to be confused with mixed models} ).
The rationale behind random effects model is that, unlike the fixed effects model, the variation across entities is assumed to be random and uncorrelated with the predictor or independent variables included in the model:
“…the crucial distinction between fixed and random effects is whether the unobserved individual effect embodies elements that are correlated with the regressors in the model, not whether these effects are stochastic or not” [Green, 2008, p.183]
If you have reason to believe that differences across entities have some influence on your dependent variable but are not correlated with the predictors then you should use random effects. An advantage of random effects is that you can include time invariant variables (i.e. gender). In the fixed effects model these variables are absorbed by the intercept.
Random effects assume that the entity's error term is not correlated with the predictors which allows for timeinvariant variables to play a role as explanatory variables.
In random-effects you need to specify those individual characteristics that may or may not influence the predictor variables. The problem with this is that some variables may not be available therefore leading to omitted variable bias in the model.
RE allows to generalize the inferences beyond the sample used in the model.