Taking another look at the Fixed and Random effect model. All the code in this blog is from Nick Huntington-Klein (https://www.youtube.com/watch?v=2igMNODFypk). I have modified where required to add color and allow the code to work in the post r 4.0 world. He does a very good series of videos covering regression techniques from an economist standpoint.
#Load in Wooldridge data on crime
#updated to readstrata13 format
crime <- read.dta13("http://fmwww.bc.edu/ec-p/data/wooldridge/crime4.dta")
The package requires the data be in a panel format.
crime.p <- pdata.frame(crime,index=c("county","year")) #define what makes it a panel, the items you are looking at
Fixed effects: \(Y={ a }_{ i }+b*X\) where each individual gets their own \({ a }_{ i }\)
A fixed effects model only focuses on variation within the individual observations. In the data we have County 1, country 2, country 3, etc… The model will take all the variables we are interested in and calculate the mean within each individual (in this case the country) and subtract out the mean leaving only the variation. The model them compares the variation allowing us to compare a single individual to itself within different periods of time.
#Run a panel model
#fixed effects / within
# crmrte = Crime Rate
# polpc = police per capita
# within = only look at variations *within* each individual
fixedeff <- plm(log(crmrte)~polpc,data=crime.p,model="within")
stargazer(fixedeff,type='text')
##
## ========================================
## Dependent variable:
## ---------------------------
## log(crmrte)
## ----------------------------------------
## polpc 30.990***
## (4.078)
##
## ----------------------------------------
## Observations 630
## R2 0.097
## Adjusted R2 -0.054
## F Statistic 57.753*** (df = 1; 539)
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Police per capita is positively correlated to crime Rate
Random effects: \(Y={ a }_{ i }+b*X\) where \({ a }_{ i }\) takes a normal distribution over individuals
Each individual county has a different intercept. We are not going to let the model set the intercept to whatever fits outside the mean (as in the fixed effects), we are going to state the intercept follows a normal distribution.
The code the same as above only changing out the model type and setting stargazer to random effect
#Random effects
randomeff <- plm(log(crmrte)~polpc,data=crime.p,model="random")
stargazer(randomeff,type='text')
##
## ========================================
## Dependent variable:
## ---------------------------
## log(crmrte)
## ----------------------------------------
## polpc 29.675***
## (4.044)
##
## Constant -3.666***
## (0.059)
##
## ----------------------------------------
## Observations 630
## R2 0.079
## Adjusted R2 0.078
## F Statistic 53.845***
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Fist Diff: \({ Y }_{ it }-{ Y }_{ i(t-1) }=a+b({ X }_{ it }-{ X }_{ i(t-1) }\)
First difference model takes the difference from a variable subtracted from the time(year) before it. So in this data, it takes the ‘polpc’ value from 1982 and subtracts the ‘polpc’ value from 1981; noting the difference. It continues do perform the action for all values. Vary defined set of variance.
#First difference effects
firstdiff <- plm(log(crmrte)~polpc,data=crime.p,model="fd")
stargazer(firstdiff,type='text')
##
## ========================================
## Dependent variable:
## ---------------------------
## log(crmrte)
## ----------------------------------------
## polpc 33.690***
## (4.712)
##
## Constant 0.002
## (0.008)
##
## ----------------------------------------
## Observations 540
## R2 0.087
## Adjusted R2 0.085
## F Statistic 51.110*** (df = 1; 538)
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Additional models would include ‘between’. This model looks at the variation within the noted time variable across the identified individual value. In this case, the variance between the years across all the counties.
#First difference effects
betweendiff <- plm(log(crmrte)~polpc,data=crime.p,model="between")
stargazer(betweendiff,type='text')
##
## ========================================
## Dependent variable:
## ---------------------------
## log(crmrte)
## ----------------------------------------
## polpc -28.150
## (27.042)
##
## Constant -3.555***
## (0.078)
##
## ----------------------------------------
## Observations 90
## R2 0.012
## Adjusted R2 0.001
## F Statistic 1.084 (df = 1; 88)
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Assumes the individual \({ a }_{ i }s\) are uncorrelated with the controls.
The test allows you to compare the results from the fixed effects the the Random effects model; see if they are different.
Random effects takes some assumptions beyond what the fixed effects takes. If those assumptions are wrong the random effects model is going to be biased; if they are right, the model is more efficient smaller standard errors. You are using outside info to add precision.
We are going to check if they are different. If they are different, the fixed effects may be the better way to go; the same Random effects may be the cleaner model.
#Run a hausman test comparing random and fixed effects
phtest(fixedeff,randomeff)
##
## Hausman Test
##
## data: log(crmrte) ~ polpc
## chisq = 6.298, df = 1, p-value = 0.01209
## alternative hypothesis: one model is inconsistent
In this case, since the p-value is not zero or very near zero, we reject the random effects and chose the fixed effects model.
Say we want to regress the log prime rate on the number of the member piece police per capita but also on last year’s crime rate. Basically, the number per of police per capita affect how Crime rate changes from one year to another.
#Include a lag
fixedeffwithlag <- plm(log(crmrte)~lag(log(crmrte))+polpc,data=crime.p,model="within")
stargazer(fixedeffwithlag,type='text')
##
## ============================================
## Dependent variable:
## ---------------------------
## log(crmrte)
## --------------------------------------------
## lag(log(crmrte)) 0.231***
## (0.043)
##
## polpc 26.731***
## (4.313)
##
## --------------------------------------------
## Observations 540
## R2 0.148
## Adjusted R2 -0.025
## F Statistic 39.008*** (df = 2; 448)
## ============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Shows the log of the crime rate as an independent variable in the previous period. Caution should be taken to only use the lag function inside the PLM command. R itself has a lag function which uses the observation-above, not the period before.
library(lmtest)
## Warning: package 'lmtest' was built under R version 3.6.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.6.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
#Cluster standard errors at the individual level
coeftest(fixedeff,vcovHC(fixedeff,type="HC0",cluster="group"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## polpc 30.9905 4.0725 7.6098 1.234e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1