Fixed and Random effect models

Taking another look at the Fixed and Random effect model. All the code in this blog is from Nick Huntington-Klein (https://www.youtube.com/watch?v=2igMNODFypk). I have modified where required to add color and allow the code to work in the post r 4.0 world. He does a very good series of videos covering regression techniques from an economist standpoint.

Data Load

#Load in Wooldridge data on crime
#updated to readstrata13 format
crime <- read.dta13("http://fmwww.bc.edu/ec-p/data/wooldridge/crime4.dta")

The package requires the data be in a panel format.

crime.p <- pdata.frame(crime,index=c("county","year")) #define what makes it a panel, the items you are looking at
  • County is the individual variable
    • helps R define each individual in focus
  • Year is the time variable

Panel Models

Fixed effects

Fixed effects: \(Y={ a }_{ i }+b*X\) where each individual gets their own \({ a }_{ i }\)

A fixed effects model only focuses on variation within the individual observations. In the data we have County 1, country 2, country 3, etc… The model will take all the variables we are interested in and calculate the mean within each individual (in this case the country) and subtract out the mean leaving only the variation. The model them compares the variation allowing us to compare a single individual to itself within different periods of time.

#Run a panel model
#fixed effects / within
# crmrte = Crime Rate
# polpc = police per capita
# within = only look at variations *within* each individual

fixedeff <- plm(log(crmrte)~polpc,data=crime.p,model="within")
stargazer(fixedeff,type='text')
## 
## ========================================
##                  Dependent variable:    
##              ---------------------------
##                      log(crmrte)        
## ----------------------------------------
## polpc                 30.990***         
##                        (4.078)          
##                                         
## ----------------------------------------
## Observations             630            
## R2                      0.097           
## Adjusted R2            -0.054           
## F Statistic    57.753*** (df = 1; 539)  
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01

Police per capita is positively correlated to crime Rate

Random Effects Model

Random effects: \(Y={ a }_{ i }+b*X\) where \({ a }_{ i }\) takes a normal distribution over individuals

Each individual county has a different intercept. We are not going to let the model set the intercept to whatever fits outside the mean (as in the fixed effects), we are going to state the intercept follows a normal distribution.

The code the same as above only changing out the model type and setting stargazer to random effect

#Random effects
randomeff <- plm(log(crmrte)~polpc,data=crime.p,model="random")
stargazer(randomeff,type='text')
## 
## ========================================
##                  Dependent variable:    
##              ---------------------------
##                      log(crmrte)        
## ----------------------------------------
## polpc                 29.675***         
##                        (4.044)          
##                                         
## Constant              -3.666***         
##                        (0.059)          
##                                         
## ----------------------------------------
## Observations             630            
## R2                      0.079           
## Adjusted R2             0.078           
## F Statistic           53.845***         
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01

First difference model

Fist Diff: \({ Y }_{ it }-{ Y }_{ i(t-1) }=a+b({ X }_{ it }-{ X }_{ i(t-1) }\)

First difference model takes the difference from a variable subtracted from the time(year) before it. So in this data, it takes the ‘polpc’ value from 1982 and subtracts the ‘polpc’ value from 1981; noting the difference. It continues do perform the action for all values. Vary defined set of variance.

#First difference effects
firstdiff <- plm(log(crmrte)~polpc,data=crime.p,model="fd")
stargazer(firstdiff,type='text')
## 
## ========================================
##                  Dependent variable:    
##              ---------------------------
##                      log(crmrte)        
## ----------------------------------------
## polpc                 33.690***         
##                        (4.712)          
##                                         
## Constant                0.002           
##                        (0.008)          
##                                         
## ----------------------------------------
## Observations             540            
## R2                      0.087           
## Adjusted R2             0.085           
## F Statistic    51.110*** (df = 1; 538)  
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01

Between Model

Additional models would include ‘between’. This model looks at the variation within the noted time variable across the identified individual value. In this case, the variance between the years across all the counties.

#First difference effects
betweendiff <- plm(log(crmrte)~polpc,data=crime.p,model="between")
stargazer(betweendiff,type='text')
## 
## ========================================
##                  Dependent variable:    
##              ---------------------------
##                      log(crmrte)        
## ----------------------------------------
## polpc                  -28.150          
##                       (27.042)          
##                                         
## Constant              -3.555***         
##                        (0.078)          
##                                         
## ----------------------------------------
## Observations             90             
## R2                      0.012           
## Adjusted R2             0.001           
## F Statistic      1.084 (df = 1; 88)     
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01

Hausman Test

Assumes the individual \({ a }_{ i }s\) are uncorrelated with the controls.

The test allows you to compare the results from the fixed effects the the Random effects model; see if they are different.

Random effects takes some assumptions beyond what the fixed effects takes. If those assumptions are wrong the random effects model is going to be biased; if they are right, the model is more efficient smaller standard errors. You are using outside info to add precision.

We are going to check if they are different. If they are different, the fixed effects may be the better way to go; the same Random effects may be the cleaner model.

#Run a hausman test comparing random and fixed effects
phtest(fixedeff,randomeff)
## 
##  Hausman Test
## 
## data:  log(crmrte) ~ polpc
## chisq = 6.298, df = 1, p-value = 0.01209
## alternative hypothesis: one model is inconsistent

In this case, since the p-value is not zero or very near zero, we reject the random effects and chose the fixed effects model.

Adding a Lag

Say we want to regress the log prime rate on the number of the member piece police per capita but also on last year’s crime rate. Basically, the number per of police per capita affect how Crime rate changes from one year to another.

#Include a lag
fixedeffwithlag <- plm(log(crmrte)~lag(log(crmrte))+polpc,data=crime.p,model="within")
stargazer(fixedeffwithlag,type='text')
## 
## ============================================
##                      Dependent variable:    
##                  ---------------------------
##                          log(crmrte)        
## --------------------------------------------
## lag(log(crmrte))          0.231***          
##                            (0.043)          
##                                             
## polpc                     26.731***         
##                            (4.313)          
##                                             
## --------------------------------------------
## Observations                 540            
## R2                          0.148           
## Adjusted R2                -0.025           
## F Statistic        39.008*** (df = 2; 448)  
## ============================================
## Note:            *p<0.1; **p<0.05; ***p<0.01

Shows the log of the crime rate as an independent variable in the previous period. Caution should be taken to only use the lag function inside the PLM command. R itself has a lag function which uses the observation-above, not the period before.

Cluster SE at individual level

library(lmtest)
## Warning: package 'lmtest' was built under R version 3.6.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.6.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
#Cluster standard errors at the individual level
coeftest(fixedeff,vcovHC(fixedeff,type="HC0",cluster="group"))
## 
## t test of coefficients:
## 
##       Estimate Std. Error t value  Pr(>|t|)    
## polpc  30.9905     4.0725  7.6098 1.234e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1