Fixed Effects

Introduction

Fixed effects regression can be done three ways:

n-1 binary regressors method when n is small
Entity-demeaned regression. “Changes” method when T = 2
FE estimator directly.

Setup

The third biggest factor of deaths in US is accidents (which includes motor vehicle crashes).

# Clear the workspace
rm(list = ls()) # Clear environment-remove all files from your workspace
gc()            # Clear unused memory

          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells  597877 32.0    1358095 72.6         NA   700248 37.4
Vcells 1111078  8.5    8388608 64.0      49152  1963379 15.0

cat("\f")       # Clear the console

graphics.off()  # Clear all graphs


# Prepare needed libraries
packages <- c("AER",          # load dataset
              "plm",          # for FE
              "stargazer",    # summary stats for sharing
              "ggplot2",      #for graphing,
              "psych",
              "naniar",      # for visualisation of missing data,
              "visdat",      # for visualisation of missing data,
              "VIM",         # for visualisation of missing data,
              "DataExplorer", # for visualisation of missing data,
              "dplyr",
              "magrittr"
              )

for (i in 1:length(packages)) {
  if (!packages[i] %in% rownames(installed.packages())) {
    install.packages(packages[i]
                     , repos = "http://cran.rstudio.com/"
                     , dependencies = TRUE
                     )
  }
  library(packages[i], character.only = TRUE)
}

rm(packages)

Data

Create dependent variable.

Load data and generate fatality rate dependent variable.

## data from Stock and Watson (2007)
data("Fatalities", 
     package = "AER"
     )
Fatalities_original <- Fatalities

## add fatality rate (number of traffic deaths per 10,000 people living in that state in that year)
Fatalities$frate <- with(data = Fatalities,
                         expr =  fatal/pop * 10000
                         )

Summary Statistics

## ALL VARIABLES IN THE ORIGINAL DATAFRAME
stargazer(Fatalities, 
          type = "text", # html, latex
          # out =
          # summary.stat = 
          # covariate.labels = labels,
          digits = 1)


===============================================================
Statistic     N     Mean      St. Dev.      Min        Max     
---------------------------------------------------------------
spirits      336     1.8         0.7        0.8        4.9     
unemp        336     7.3         2.5        2.4        18.0    
income       336  13,880.2     2,253.0    9,513.8    22,193.5  
emppop       336    60.8         4.7       43.0        71.3    
beertax      336     0.5         0.5       0.04        2.7     
baptist      336     7.2         9.8        0.0        30.4    
mormon       336     2.8         9.7        0.1        65.9    
drinkage     336    20.5         0.9       18.0        21.0    
dry          336     4.3         9.5        0.0        45.8    
youngdrivers 336     0.2        0.02        0.1        0.3     
miles        336   7,890.8     1,475.7    4,576.3    26,148.3  
fatal        336    928.7       934.1       79        5,504    
nfatal       336    182.6       188.4       13        1,049    
sfatal       336    109.9       108.5        8         603     
fatal1517    336    62.6        55.7         3         318     
nfatal1517   336    12.3        12.3         0          76     
fatal1820    336    106.7       104.2        7         601     
nfatal1820   336    33.5        33.2         0         196     
fatal2124    336    126.9       131.8       12         770     
nfatal2124   336    41.4        42.9         1         249     
afatal       336    293.3       303.6      24.6      2,094.9   
pop          336 4,930,272.0 5,073,704.0 478,999.7 28,314,028.0
pop1517      336  230,815.5   229,896.3  21,000.0  1,172,000.0 
pop1820      336  249,090.4   249,345.6  21,000.0  1,321,004.0 
pop2124      336  336,389.9   345,304.4  30,000.2  1,892,998.0 
milestot     336  37,101.5    37,454.4    3,993.0   241,575.0  
unempus      336     7.5         1.5        5.5        9.7     
emppopus     336    60.0         1.6       57.8        62.3    
gsp          336    0.03        0.04       -0.1        0.1     
frate        336     2.0         0.6        0.8        4.2     
---------------------------------------------------------------

Data Dictionary

A data frame containing 336 observations on 34 variables.

Data Dictionary

5 factor variables

How to plot these?

Balanced Panel !

state - factor indicating state.
year - factor indicating year.
breath - factor. Preliminary breath test law?
jail - factor. Mandatory jail sentence?
service - factor. Mandatory community service?

# re-order levels
reorder_size <- function(x) {
        factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}

ggplot(data = Fatalities, 
       aes(x = reorder_size(year)
           )
       ) +
        geom_bar() +
        xlab("Year") + ylab("Frequency") +
        theme(axis.text.x = element_text(angle = 45)
              )

ggplot(data = Fatalities, 
       aes(x = reorder_size(state)
           )
       ) +
        geom_bar() +
        xlab("State") + ylab("Frequency") +
        theme(axis.text.x = element_text(angle = 90)
              )

29 Numeric Variables

Basic economic variables, demographic decomposition by religion, drinking by age, fatality decomposition like by night or single vehicle, and by age,…

spirits - numeric. Spirits consumption.
unemp - numeric. Unemployment rate.
income - numeric. Per capita personal income in 1987 dollars. US Bureau of Economic Analysis.
emppop - numeric. Employment/population ratio.
beertax - numeric. Tax on case of beer, which is an available measure of state alcohol taxes more generally.
baptist - numeric. Percent of southern baptist.
mormon - numeric. Percent of mormon.
drinkage - numeric. Minimum legal drinking age. EG. 18, 19, or 20.
dry - numeric. Percent residing in “dry” countries.
youngdrivers - numeric. Percent of drivers aged 15–24.
miles - numeric. Average miles per driver. Department of Transportation.
fatal - numeric. Number of vehicle fatalities. Department of Transportation Fatal Accident Reporting System.
nfatal - numeric. Number of night-time vehicle fatalities.
sfatal - numeric. Number of single vehicle fatalities.
fatal1517 - numeric. Number of vehicle fatalities, 15–17 year olds.
nfatal1517 - numeric. Number of night-time vehicle fatalities, 15–17 year olds.
fatal1820 - numeric. Number of vehicle fatalities, 18–20 year olds.
nfatal1820 - numeric. Number of night-time vehicle fatalities, 18–20 year olds.
fatal2124 - numeric. Number of vehicle fatalities, 21–24 year olds.
nfatal2124 - numeric. Number of night-time vehicle fatalities, 21–24 year olds.
afatal - numeric. Number of alcohol-involved vehicle fatalities.
pop - numeric. Population.
pop1517 - numeric. Population, 15–17 year olds.
pop1820 - numeric. Population, 18–20 year olds.
pop2124 - numeric. Population, 21–24 year olds.
milestot - numeric. Total vehicle miles (millions).
unempus - numeric. US unemployment rate. US Bureau of Labor Statistics.
emppopus - numeric. US employment/population ratio.
gsp - numeric. GSP rate of change.

OLS

We would expect more beer drinking linked with more fatal car crashes.

States mainly levy excise taxes on beer, wine, and spirits, although some states levy taxes on additional beverages (such as sparkling wine and cider).

Thus, we would expect more beer taxes (reduces beer sales) linked with less fatal car crashes.

Without FE

Unfortunately, we find that more state beer tax leads to more fatal DUI (driving under influence) related deaths, not less !!!

fatal_lm_mod <- lm(data   = Fatalities,
                  formula = frate ~ beertax , 
                     )

summary(fatal_lm_mod)


Call:
lm(formula = frate ~ beertax, data = Fatalities)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.09060 -0.37768 -0.09436  0.28548  2.27643 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.85331    0.04357  42.539  < 2e-16 ***
beertax      0.36461    0.06217   5.865 1.08e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5437 on 334 degrees of freedom
Multiple R-squared:  0.09336,   Adjusted R-squared:  0.09065 
F-statistic: 34.39 on 1 and 334 DF,  p-value: 1.082e-08

If the tax on a case of beer increases by $1, then the number of vehicle fatalities increases 0.36 deaths per 10,000 people. This is statistically significant, seems a bit high and in the unexpected direction.

Omitted Variables

What could cause this result??? Omitted Variable Bias due to unobserved heterogeneity ???

Risk of a bias due to omitted factors that vary across states but not over time. Use State Fixed Effects. Start by thinking about factors that determine traffic fatality rate, and could be correlated with beer taxes
- Density of cars on the road. Specifically, “high taxes” could reflect “high traffic density” (so the OLS coefficient would be biased positively – high taxes, more deaths). Panel data lets us eliminate omitted variable bias when the omitted variables are constant over time within a given state.
- Quality (age) of automobiles / Vintage of autos on the road. Older vehicles may be less safe and more likely to cause traffic death. If the vintage of auto happens to be negatively correlated with beer taxes, then there will be a omitted variable bias.
- Culture around drinking and driving. Potentially are correlated with the beer tax, so beer taxes could be picking up cultural differences (omitted variable bias). Specifically, “high taxes” could reflect “cultural attitudes towards drinking” (so the OLS coefficient would be biased). OVB table
- Driving conditions in states/Quality of roads. Roads may be worse in some states than others, and thus correlated with more fatalities. If driving infrastructure is also (negatively) correlated to alcohol taxes, then OVB.
- Weather conditions in states. Some states may have colder weather, causing more skids and thus deaths. If colder weather is also correlated with alcohol taxes, then OVB.
Other omitted variables that vary over time and thus cause a bias. Use Time Fixed Effects. These affect all states in a particular year !!!
- Improvements in auto safety over time
- Changing national attitudes towards drunk driving

Model: OLS with Dummy Variables i.e. With FE

We have 48 binary regressors here — one for each federal state.

$Fatality \ Rate_{it} = \beta_0 + \beta_1 Beer \ Tax _{it} + \delta_2 State \ 2_i + \delta_3 State \ 3_i + ... + \delta_n State \ n_i + u_{it}$

fatal_fe_lm_mod   <- lm( data    = Fatalities,
                      formula = frate ~ beertax + state, 
                     )

# summary(fatal_fe_lm_mod)

stargazer(fatal_fe_lm_mod,
          type="text"
          )


===============================================
                        Dependent variable:    
                    ---------------------------
                               frate           
-----------------------------------------------
beertax                      -0.656***         
                              (0.188)          
                                               
stateaz                      -0.568**          
                              (0.267)          
                                               
statear                      -0.655***         
                              (0.219)          
                                               
stateca                      -1.509***         
                              (0.304)          
                                               
stateco                      -1.484***         
                              (0.287)          
                                               
statect                      -1.862***         
                              (0.281)          
                                               
statede                      -1.308***         
                              (0.294)          
                                               
statefl                       -0.268*          
                              (0.139)          
                                               
statega                      0.525***          
                              (0.184)          
                                               
stateid                      -0.669***         
                              (0.258)          
                                               
stateil                      -1.962***         
                              (0.291)          
                                               
statein                      -1.462***         
                              (0.273)          
                                               
stateia                      -1.544***         
                              (0.253)          
                                               
stateks                      -1.223***         
                              (0.245)          
                                               
stateky                      -1.218***         
                              (0.287)          
                                               
statela                      -0.847***         
                              (0.189)          
                                               
stateme                      -1.108***         
                              (0.191)          
                                               
statemd                      -1.706***         
                              (0.283)          
                                               
statema                      -2.110***         
                              (0.276)          
                                               
statemi                      -1.485***         
                              (0.236)          
                                               
statemn                      -1.897***         
                              (0.265)          
                                               
statems                       -0.029           
                              (0.148)          
                                               
statemo                      -1.296***         
                              (0.267)          
                                               
statemt                       -0.360           
                              (0.264)          
                                               
statene                      -1.522***         
                              (0.249)          
                                               
statenv                      -0.601**          
                              (0.286)          
                                               
statenh                      -1.254***         
                              (0.210)          
                                               
statenj                      -2.106***         
                              (0.307)          
                                               
statenm                       0.426*           
                              (0.254)          
                                               
stateny                      -2.187***         
                              (0.299)          
                                               
statenc                      -0.290**          
                              (0.120)          
                                               
statend                      -1.623***         
                              (0.254)          
                                               
stateoh                      -1.674***         
                              (0.254)          
                                               
stateok                      -0.545***         
                              (0.169)          
                                               
stateor                      -1.168***         
                              (0.286)          
                                               
statepa                      -1.767***         
                              (0.276)          
                                               
stateri                      -2.265***         
                              (0.294)          
                                               
statesc                      0.557***          
                              (0.110)          
                                               
statesd                      -1.004***         
                              (0.210)          
                                               
statetn                      -0.876***         
                              (0.268)          
                                               
statetx                      -0.917***         
                              (0.246)          
                                               
stateut                      -1.164***         
                              (0.196)          
                                               
statevt                      -0.966***         
                              (0.211)          
                                               
stateva                      -1.290***         
                              (0.204)          
                                               
statewa                      -1.660***         
                              (0.283)          
                                               
statewv                      -0.897***         
                              (0.247)          
                                               
statewi                      -1.759***         
                              (0.294)          
                                               
statewy                       -0.229           
                              (0.313)          
                                               
Constant                     3.478***          
                              (0.313)          
                                               
-----------------------------------------------
Observations                    336            
R2                             0.905           
Adjusted R2                    0.889           
Residual Std. Error      0.190 (df = 287)      
F Statistic          56.969*** (df = 48; 287)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

?stargazer
stargazer(fatal_fe_lm_mod,
          type="text", 
          omit = c("state")
          )


===============================================
                        Dependent variable:    
                    ---------------------------
                               frate           
-----------------------------------------------
beertax                      -0.656***         
                              (0.188)          
                                               
Constant                     3.478***          
                              (0.313)          
                                               
-----------------------------------------------
Observations                    336            
R2                             0.905           
Adjusted R2                    0.889           
Residual Std. Error      0.190 (df = 287)      
F Statistic          56.969*** (df = 48; 287)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

?stargazer
stargazer(fatal_lm_mod, fatal_fe_lm_mod, 
          type  = 'text',
          column.labels = c("OLS with no State Dummy","OLS with State Dummy"), 
          omit = c("state"),
          notes = c("Dropping state coefficients")
          )


====================================================================
                                  Dependent variable:               
                    ------------------------------------------------
                                         frate                      
                    OLS with no State Dummy   OLS with State Dummy  
                              (1)                     (2)           
--------------------------------------------------------------------
beertax                    0.365***                -0.656***        
                            (0.062)                 (0.188)         
                                                                    
Constant                   1.853***                 3.478***        
                            (0.044)                 (0.313)         
                                                                    
--------------------------------------------------------------------
Observations                  336                     336           
R2                           0.093                   0.905          
Adjusted R2                  0.091                   0.889          
Residual Std. Error    0.544 (df = 334)         0.190 (df = 287)    
F Statistic         34.394*** (df = 1; 334) 56.969*** (df = 48; 287)
====================================================================
Note:                                    *p<0.1; **p<0.05; ***p<0.01
                                         Dropping state coefficients

Model: Demeaned Regression

$Fatality \ Rate_{it} - \bar{Fatality \ Rate_{i}} = \beta_1 ( Beer \ Tax _{it} - \bar{Beer \ Tax _{i} }) + (u_{it}-\bar{u_i})$

EQUIVALENTLY

$\tilde{Fatality \ Rate_{it}} = \beta_1 \tilde{Beer \ Tax_{it}} + u_{it}$

Derivation

$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_{i} + u_{it}$ implies

Eq 1:

$Y_{it} = \alpha_i + \beta_1 X_{it} + u_{it}$ where $\alpha_i = \beta_0 + \beta_2 Z_{i}$

We have just rewritten the equation above.

Taking average on both sides (over time $t$ ) -

Eq 2:

$\bar{Y_i} = \beta_1 \bar{X_{i}} + \alpha_i + \bar{u_{i}}$

Subtracting Eq 2 from Eq 1 gives -

$Y_{it} - \bar{Y_i} = \beta_1 ( X_{it} - \bar{X_{i}}) + (u_{it}-\bar{u_{i}})$

The function ave is convenient for computing group averages. We use it to obtain state specific averages of the fatality rate and the beer tax.

# obtain demeaned data
?ave
Fatalities_demeaned <- with(data = Fatalities,
                            expr = data.frame(frate   = frate   - ave(x = frate, state), # frate grouped by state
                                              beertax = beertax - ave(x = beertax, state)  # beertax grouped by state
                                              )
                            )


# estimate the regression

fatal_demeaned_lm_mod <-
lm(data    = Fatalities_demeaned,
           formula = frate ~ beertax - 1 , # indicating that you are regressing frate on beertax without an intercept
           )

summary(fatal_demeaned_lm_mod)


Call:
lm(formula = frate ~ beertax - 1, data = Fatalities_demeaned)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.58696 -0.08284 -0.00127  0.07955  0.89780 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
beertax  -0.6559     0.1739  -3.772 0.000191 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1757 on 335 degrees of freedom
Multiple R-squared:  0.04074,   Adjusted R-squared:  0.03788 
F-statistic: 14.23 on 1 and 335 DF,  p-value: 0.0001913

Model: Fixed Effects

The ID variable for entities is named state and the time id variable is year.

Since the fixed effects estimator is also called the within estimator, we set model = "within". Do not set model="random" as that would run a random effects model.

# install and load the 'plm' package
## install.packages("plm")
library(plm)

# estimate the fixed effects regression with plm()
fatal_fe_mod <- plm(formula = frate ~ beertax, 
                    data    = Fatalities,
                    index   = c("state", "year"), # declaring data to be panel
                    model   = "within"            # fixed effects mdodel
                    )

summary(fatal_fe_mod )

Oneway (individual) effect Within Model

Call:
plm(formula = frate ~ beertax, data = Fatalities, model = "within", 
    index = c("state", "year"))

Balanced Panel: n = 48, T = 7, N = 336

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.5869619 -0.0828376 -0.0012701  0.0795454  0.8977960 

Coefficients:
        Estimate Std. Error t-value Pr(>|t|)    
beertax -0.65587    0.18785 -3.4915 0.000556 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    10.785
Residual Sum of Squares: 10.345
R-Squared:      0.040745
Adj. R-Squared: -0.11969
F-statistic: 12.1904 on 1 and 287 DF, p-value: 0.00055597

# print summary using robust standard errors
coeftest(fatal_fe_mod, vcov. = vcovHC, type = "HC1")


t test of coefficients:

        Estimate Std. Error t value Pr(>|t|)  
beertax -0.65587    0.28880  -2.271  0.02388 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note we had to declare our data to be panel in order to use plm command.

Declaring data as panel in R can have several benefits, including:

Panel data commands: When data is declared as panel, it allows the use of specialized panel data commands in R, such as the plm package, which can be used to estimate panel data models, such as fixed effects and random effects models.
Easy subsetting: Panel data can be easily subsetted by individual or time, which allows for easier analysis and visualization of data.
Easy aggregation: Panel data can also be easily aggregated by individual or time, which can be useful for summarizing data and creating summary statistics.
Panel-specific visualizations: Panel data can be used to create panel-specific visualizations, such as panel plots, which allow for easy comparison of variables across time and individuals.
Panel data modeling: Panel data models can provide more accurate estimates of coefficients, as they control for individual-level and time-specific heterogeneity that can bias estimates in traditional cross-sectional or time-series models.

stargazer(#fatal_lm_mod,                                            # OLS
          fatal_fe_lm_mod, fatal_demeaned_lm_mod,  fatal_fe_mod,    # fixed effect
                  type  = "text",
          column.labels = c(#"OLS - no Dummy",
                            "OLS - State Dummy",
                            "Demeaned Reg",
                            "FE"
                            ),
          omit = c("state")
          )


============================================================================================
                                              Dependent variable:                           
                    ------------------------------------------------------------------------
                                                     frate                                  
                                          OLS                                 panel         
                                                                             linear         
                       OLS - State Dummy          Demeaned Reg                 FE           
                              (1)                      (2)                     (3)          
--------------------------------------------------------------------------------------------
beertax                    -0.656***                -0.656***               -0.656***       
                            (0.188)                  (0.174)                 (0.188)        
                                                                                            
Constant                    3.478***                                                        
                            (0.313)                                                         
                                                                                            
--------------------------------------------------------------------------------------------
Observations                  336                      336                     336          
R2                           0.905                    0.041                   0.041         
Adjusted R2                  0.889                    0.038                  -0.120         
Residual Std. Error     0.190 (df = 287)        0.176 (df = 335)                            
F Statistic         56.969*** (df = 48; 287) 14.229*** (df = 1; 335) 12.190*** (df = 1; 287)
============================================================================================
Note:                                                            *p<0.1; **p<0.05; ***p<0.01

FE limitations

Limitations / Challenges

Need variation in X over time within states
Time lag effects can be important
You should use heteroskedasticity- and autocorrelation consistent (clustered) standard errors if you think $u_{it}$ could be correlated over time

Two Way FE Implementation

You were doing one way FE only above (state).
Sometimes you want to do two way FE (state and time).
Here, it seems like time FE are not as important as state FE.

fatal_fe_lm_mod2 <- lm(data    = Fatalities,
                       formula = frate ~ beertax + state + year, 
                     )
# summary(fatal_fe_lm_mod2)


# estimate the fixed effects regression with plm()
fatal_fe_mod2 <- plm(formula = frate ~ beertax + as.factor(year), 
                     data    = Fatalities,
                     index   = c("state", "year"), # declaring data to be panel
                     model   = "within"            # fixed effects mdodel
                     )

# summary(fatal_fe_mod2)

stargazer(fatal_fe_lm_mod,  # State - One Way OLS (FE)
          fatal_fe_lm_mod2, # State - Time (Two Way) OLS
          fatal_fe_mod2,    # State - Time (Two Way) FE
                  type  = "text",
          column.labels = c("FE - One way (state)",
                            "FE - Two way (state-time) OLS",
                            "FE - Two way (state-time) PLM"
                            ),
          omit = c("state")
          )


========================================================================================================
                                                    Dependent variable:                                 
                    ------------------------------------------------------------------------------------
                                                           frate                                        
                                             OLS                                       panel            
                                                                                      linear            
                      FE - One way (state)   FE - Two way (state-time) OLS FE - Two way (state-time) PLM
                              (1)                         (2)                           (3)             
--------------------------------------------------------------------------------------------------------
beertax                    -0.656***                   -0.640***                     -0.640***          
                            (0.188)                     (0.197)                       (0.197)           
                                                                                                        
year1983                                               -0.080**                                         
                                                        (0.038)                                         
                                                                                                        
year1984                                                -0.072*                                         
                                                        (0.038)                                         
                                                                                                        
year1985                                               -0.124***                                        
                                                        (0.038)                                         
                                                                                                        
year1986                                                -0.038                                          
                                                        (0.039)                                         
                                                                                                        
year1987                                                -0.051                                          
                                                        (0.039)                                         
                                                                                                        
year1988                                                -0.052                                          
                                                        (0.040)                                         
                                                                                                        
as.factor(year)1983                                                                  -0.080**           
                                                                                      (0.038)           
                                                                                                        
as.factor(year)1984                                                                   -0.072*           
                                                                                      (0.038)           
                                                                                                        
as.factor(year)1985                                                                  -0.124***          
                                                                                      (0.038)           
                                                                                                        
as.factor(year)1986                                                                   -0.038            
                                                                                      (0.039)           
                                                                                                        
as.factor(year)1987                                                                   -0.051            
                                                                                      (0.039)           
                                                                                                        
as.factor(year)1988                                                                   -0.052            
                                                                                      (0.040)           
                                                                                                        
Constant                    3.478***                   3.511***                                         
                            (0.313)                     (0.333)                                         
                                                                                                        
--------------------------------------------------------------------------------------------------------
Observations                  336                         336                           336             
R2                           0.905                       0.909                         0.080            
Adjusted R2                  0.889                       0.891                        -0.096            
Residual Std. Error     0.190 (df = 287)           0.188 (df = 281)                                     
F Statistic         56.969*** (df = 48; 287)   51.934*** (df = 54; 281)       3.503*** (df = 7; 281)    
========================================================================================================
Note:                                                                        *p<0.1; **p<0.05; ***p<0.01

Better presentation

Lets hide the state coefficients and the time coefficients. with keep or omit commands.

?stargazer

stargazer(fatal_fe_lm_mod,  # State - One Way OLS (FE)
          fatal_fe_lm_mod2, # State - Time (Two Way) OLS
          fatal_fe_mod2,    # State - Time (Two Way) FE
                  type  = "text",
          column.labels = c("FE - One way (state)",
                            "FE - Two way (state-time) OLS",
                            "FE - Two way (state-time) PLM"
                            ),
          add.lines=list(c('Two Way Fixed effects', "No","Yes", "Yes")),
          keep = c("beertax")
          )


==========================================================================================================
                                                      Dependent variable:                                 
                      ------------------------------------------------------------------------------------
                                                             frate                                        
                                               OLS                                       panel            
                                                                                        linear            
                        FE - One way (state)   FE - Two way (state-time) OLS FE - Two way (state-time) PLM
                                (1)                         (2)                           (3)             
----------------------------------------------------------------------------------------------------------
beertax                      -0.656***                   -0.640***                     -0.640***          
                              (0.188)                     (0.197)                       (0.197)           
                                                                                                          
----------------------------------------------------------------------------------------------------------
Two Way Fixed effects            No                         Yes                           Yes             
Observations                    336                         336                           336             
R2                             0.905                       0.909                         0.080            
Adjusted R2                    0.889                       0.891                        -0.096            
Residual Std. Error       0.190 (df = 287)           0.188 (df = 281)                                     
F Statistic           56.969*** (df = 48; 287)   51.934*** (df = 54; 281)       3.503*** (df = 7; 281)    
==========================================================================================================
Note:                                                                          *p<0.1; **p<0.05; ***p<0.01

Unbalanced Panel

An unbalanced data (compared to balanced data) will have much larger confidence interval (as SE increase due to smaller n).

Creating an unbalanced data

set.seed(7)
Fatalities_unbalanced <- Fatalities %>% select(frate, beertax, year, state)


indices_to_replace <- sample(x = 1: length(Fatalities_unbalanced$frate),
                      size = round(.3 * length(Fatalities_unbalanced$frate)), 
                      replace = TRUE 
         )

Fatalities_unbalanced$frate[indices_to_replace] <- NA 
describe(Fatalities_unbalanced$frate)

   vars   n mean  sd median trimmed  mad  min  max range skew kurtosis   se
X1    1 249 2.04 0.6   1.95    1.99 0.58 0.82 4.22   3.4 0.78     0.63 0.04

indices_to_replace <- sample(x = 1: length(Fatalities_unbalanced$beertax),
                      size = .3 * length(Fatalities_unbalanced$beertax), 
                      replace = TRUE 
         )

Fatalities_unbalanced$beertax[indices_to_replace] <- NA 
describe(Fatalities_unbalanced$beertax)

   vars   n mean   sd median trimmed  mad  min  max range skew kurtosis   se
X1    1 247 0.51 0.45   0.36    0.42 0.27 0.04 2.61  2.57 2.07     4.79 0.03

vis_dat(Fatalities_unbalanced)

vis_miss(Fatalities_unbalanced)

naniar::gg_miss_upset(Fatalities_unbalanced)

?gg_miss_upset

sum(!is.na(Fatalities_unbalanced$frate) & !is.na(Fatalities_unbalanced$beertax))

[1] 181

Create the CI

# original model
fatal_fe_lm_mod <- lm(data    = Fatalities,
                       formula = frate ~ beertax + state, 
                     )

# data with NA
fatal_fe_lm_mod_NA <- lm(data    = Fatalities_unbalanced,
                       formula = frate ~ beertax + state, 
                     )

# summary(fatal_fe_mod2)

stargazer(fatal_fe_lm_mod,    #  State - One Way OLS (FE)
          fatal_fe_lm_mod_NA, #  State - One Way OLS (FE) with NA data
                  type  = "text",
          column.labels = c("State - One Way OLS (FE)",
                            "State - One Way OLS (FE) with NA data"
                            ), 
          keep = c("beertax")
          )


==================================================================================
                                         Dependent variable:                      
                    --------------------------------------------------------------
                                                frate                             
                    State - One Way OLS (FE) State - One Way OLS (FE) with NA data
                              (1)                             (2)                 
----------------------------------------------------------------------------------
beertax                    -0.656***                       -1.157***              
                            (0.188)                         (0.356)               
                                                                                  
----------------------------------------------------------------------------------
Observations                  336                             181                 
R2                           0.905                           0.919                
Adjusted R2                  0.889                           0.890                
Residual Std. Error     0.190 (df = 287)               0.190 (df = 132)           
F Statistic         56.969*** (df = 48; 287)       31.360*** (df = 48; 132)       
==================================================================================
Note:                                                  *p<0.1; **p<0.05; ***p<0.01

SE is indeed larger for the unbalanced data, as expected.

Random Effect

In Economics, we mostly have fixed effects (and not random effects or mixed effects {not to be confused with mixed models} ).

The rationale behind random effects model is that, unlike the fixed effects model, the variation across entities is assumed to be random and uncorrelated with the predictor or independent variables included in the model:
“…the crucial distinction between fixed and random effects is whether the unobserved individual effect embodies elements that are correlated with the regressors in the model, not whether these effects are stochastic or not” [Green, 2008, p.183]
If you have reason to believe that differences across entities have some influence on your dependent variable but are not correlated with the predictors then you should use random effects. An advantage of random effects is that you can include time invariant variables (i.e. gender). In the fixed effects model these variables are absorbed by the intercept.
Random effects assume that the entity’s error term is not correlated with the predictors which allows for timeinvariant variables to play a role as explanatory variables.
In random-effects you need to specify those individual characteristics that may or may not influence the predictor variables. The problem with this is that some variables may not be available therefore leading to omitted variable bias in the model.
RE allows to generalize the inferences beyond the sample used in the model.

References

Beer Example Theory - three equivalent methods
Beer Example Extra Code
FE Summary
https://www.econometrics-with-r.org/10-rwpd.html
Super Advanced - Kit’s notes http://fmwww.bc.edu/EC-C/S2021/8823/ECON8823.CRE.slides.pdf