EC3133
Estimation using longitudinal data
Fixed Effects Estimators
We will use the Fatalities data on traffic accidents and alcohol taxes in order to run some sample panel data estimations.
Here, we are interested in the relationship between alcohol taxes and fatalities so we will first visualize and summarize the data below:
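The code chunk that produced the output below is not shown in these notes. A minimal sketch of what it may have looked like follows, assuming the AER package (which ships the Fatalities data) and the plm package are installed. The object name `fatalities_panel` is an illustrative choice, not taken from the notes.

```r
# Assumption: the AER and plm packages are installed
library(AER)  # provides the Fatalities data set
library(plm)  # provides pdata.frame() and plm()

data("Fatalities", package = "AER")

# Declare the panel structure: state is the entity ID, year is the time ID
fatalities_panel <- pdata.frame(Fatalities, index = c("state", "year"))

# Check the type, the dimensions and the structure of the data
is.data.frame(fatalities_panel)
dim(fatalities_panel)
str(fatalities_panel)
```

Storing the data as a `pdata.frame` is what makes `str()` report the `pseries` and `pindex` attributes seen in the listing.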
## [1] TRUE
## [1] 336 34
## Classes 'pdata.frame' and 'data.frame': 336 obs. of 34 variables:
## $ state : Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ spirits : 'pseries' Named num 1.37 1.36 1.32 1.28 1.23 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ unemp : 'pseries' Named num 14.4 13.7 11.1 8.9 9.8 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ income : 'pseries' Named num 10544 10733 11109 11333 11662 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ emppop : 'pseries' Named num 50.7 52.1 54.2 55.3 56.5 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ beertax : 'pseries' Named num 1.54 1.79 1.71 1.65 1.61 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ baptist : 'pseries' Named num 30.4 30.3 30.3 30.3 30.3 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ mormon : 'pseries' Named num 0.328 0.343 0.359 0.376 0.393 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ drinkage : 'pseries' Named num 19 19 19 19.7 21 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ dry : 'pseries' Named num 25 23 24 23.6 23.5 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ youngdrivers: 'pseries' Named num 0.212 0.211 0.211 0.211 0.213 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ miles : 'pseries' Named num 7234 7836 8263 8727 8953 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ breath : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ jail : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 2 2 2 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ service : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 2 2 2 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ fatal : 'pseries' Named int 839 930 932 882 1081 1110 1023 724 675 869 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ nfatal : 'pseries' Named int 146 154 165 146 172 181 139 131 112 149 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ sfatal : 'pseries' Named int 99 98 94 98 119 114 89 76 60 81 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ fatal1517 : 'pseries' Named int 53 71 49 66 82 94 66 40 40 51 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ nfatal1517 : 'pseries' Named int 9 8 7 9 10 11 8 7 7 8 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ fatal1820 : 'pseries' Named int 99 108 103 100 120 127 105 81 83 118 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ nfatal1820 : 'pseries' Named int 34 26 25 23 23 31 24 16 19 34 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ fatal2124 : 'pseries' Named int 120 124 118 114 119 138 123 96 80 123 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ nfatal2124 : 'pseries' Named int 32 35 34 45 29 30 25 36 17 33 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ afatal : 'pseries' Named num 309 342 305 277 361 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ pop : 'pseries' Named num 3942002 3960008 3988992 4021008 4049994 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ pop1517 : 'pseries' Named num 209000 202000 197000 195000 204000 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ pop1820 : 'pseries' Named num 221553 219125 216724 214349 212000 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ pop2124 : 'pseries' Named num 290000 290000 288000 284000 263000 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ milestot : 'pseries' Named num 28516 31032 32961 35091 36259 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ unempus : 'pseries' Named num 9.7 9.6 7.5 7.2 7 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ emppopus : 'pseries' Named num 57.8 57.9 59.5 60.1 60.7 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ gsp : 'pseries' Named num -0.0221 0.0466 0.0628 0.0275 0.0321 ...
## ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
## ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
## - attr(*, "index")=Classes 'pindex' and 'data.frame': 336 obs. of 2 variables:
## ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
## ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
The following provides a summary of the state and year variables:
## state year
## al : 7 1982:48
## az : 7 1983:48
## ar : 7 1984:48
## ca : 7 1985:48
## co : 7 1986:48
## ct : 7 1987:48
## (Other):294 1988:48
The variable state is a factor variable with 48 levels (one for each of the 48 contiguous U.S. states), and the year variable takes 7 values. We therefore have \(7 \times 48 = 336\) observations in total. Because all the variables are observed for all entities and over all time periods, we say that this panel is balanced. If there were missing data for at least one entity in at least one time period, this would be an unbalanced panel.
Now we will look at the fatality rates for two different years.
We will look at the following estimated regression functions: \[ \widehat{FatalityRate} = 2.01 + 0.15 \times BeerTax \quad \text{(1982 data)} \\ \widehat{FatalityRate} = 1.86 + 0.44 \times BeerTax \quad \text{(1988 data)} \]
Run the below code to see these estimation results:
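The chunk itself is missing from these notes, so the following is a sketch of code that should reproduce the two cross-sectional regressions. It assumes the AER package is installed and that the fatality rate is defined as traffic deaths per 10000 inhabitants (`fatal / pop * 10000`), which matches the units used in the text. The object names are illustrative.

```r
# Assumption: AER is installed; it also attaches lmtest (coeftest) and sandwich (vcovHC)
library(AER)

data("Fatalities", package = "AER")

# Assumed definition: traffic deaths per 10000 inhabitants
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000

# One cross-section per year
Fatalities1982 <- subset(Fatalities, year == "1982")
Fatalities1988 <- subset(Fatalities, year == "1988")

# Simple regressions of the fatality rate on the beer tax
fatal1982_mod <- lm(fatal_rate ~ beertax, data = Fatalities1982)
fatal1988_mod <- lm(fatal_rate ~ beertax, data = Fatalities1988)

# Coefficient tests with heteroskedasticity-robust (HC1) standard errors
coeftest(fatal1982_mod, vcov. = vcovHC, type = "HC1")
coeftest(fatal1988_mod, vcov. = vcovHC, type = "HC1")
```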
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.01038 0.14957 13.4408 <2e-16 ***
## beertax 0.14846 0.13261 1.1196 0.2687
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.85907 0.11461 16.2205 < 2.2e-16 ***
## beertax 0.43875 0.12786 3.4314 0.001279 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Each point in the above graphs represents observations of beer tax and fatality rate for a given state in the respective year.
Note that the regression results indicate a positive relationship between the beer tax and the fatality rate for both years. Moreover, the estimated coefficient on beer tax for the 1988 data is almost three times as large as for the 1982 dataset.
Is this what you would expect? Would you expect alcohol taxes to lead to an INCREASE or DECREASE in the rate of traffic fatalities?
What exactly is behind this?
One likely explanation is omitted variable bias: neither model includes any covariates, such as economic conditions. This could be addressed with a multiple regression approach. However, multiple regression cannot account for omitted unobservable factors that differ from state to state but can be assumed to be constant over the observation span, e.g., the population's attitude towards drunk driving. As shown in the next section, panel data allow us to hold such factors constant.
Let’s suppose that there are only 2 periods i.e. 1982 and 1988. Having the advantage of information about the same variable over multiple years allows us to look at how it has changed. In this case, we can see how fatality rates have changed from year 1982 to 1988 for each observation unit (in this case, states). We will use this information to get more out of the data.
See the below population regression model that relates fatality rates and alcohol taxes: \[ FatalityRate_{it} = \beta_0 + \beta_1 BeerTax_{it} + \beta_2 Z_{i} + u_{it} \]
\(Z_i\) are state-specific characteristics that differ between states but are constant over time (hence no time subscript). So if we wrote the above equation separately for years 1982 and 1988, we would have \[ FatalityRate_{i,1982} = \beta_0 + \beta_1 BeerTax_{i,1982} + \beta_2 Z_{i} + u_{i,1982} \\ FatalityRate_{i,1988} = \beta_0 + \beta_1 BeerTax_{i,1988} + \beta_2 Z_{i} + u_{i,1988} \] We can get rid of \(Z_i\) by regressing the difference in the fatality rate between 1988 and 1982 on the difference in beer tax between those years: \[ FatalityRate_{i,1988} - FatalityRate_{i,1982} = \beta_1 (BeerTax_{i,1988} - BeerTax_{i,1982}) + u_{i,1988} - u_{i,1982} \]
This regression model, where the difference in the fatality rate between 1988 and 1982 is regressed on the difference in beer tax between those years, yields an estimate for \(\beta_1\) that is robust to a possible bias due to omission of \(Z_i\), as these influences are eliminated from the model. Next we will estimate a regression model based on the differenced data and we will plot the estimated regression function.
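A sketch of the differenced regression, under the same assumptions as before (AER installed, fatality rate defined as deaths per 10000 inhabitants). The variable names `diff_fatal_rate` and `diff_beertax` follow the names visible in the output; the remaining names and the plot styling are illustrative.

```r
# Assumption: AER is installed; it also attaches lmtest and sandwich
library(AER)

data("Fatalities", package = "AER")
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000  # assumed definition

Fatalities1982 <- subset(Fatalities, year == "1982")
Fatalities1988 <- subset(Fatalities, year == "1988")

# Differencing row by row is only valid if both subsets list the states in the same order
stopifnot(identical(as.character(Fatalities1982$state),
                    as.character(Fatalities1988$state)))

# Changes between 1982 and 1988 for each state
diff_fatal_rate <- Fatalities1988$fatal_rate - Fatalities1982$fatal_rate
diff_beertax <- Fatalities1988$beertax - Fatalities1982$beertax

# Regression on the differenced data, with robust (HC1) standard errors
fatal_diff_mod <- lm(diff_fatal_rate ~ diff_beertax)
coeftest(fatal_diff_mod, vcov. = vcovHC, type = "HC1")

# Scatterplot of the differences with the estimated regression line in red
plot(diff_beertax, diff_fatal_rate,
     xlab = "Change in beer tax", ylab = "Change in fatality rate")
abline(fatal_diff_mod, col = "red")
```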
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.072037 0.065355 -1.1022 0.276091
## diff_beertax -1.040973 0.355006 -2.9323 0.005229 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What does including the intercept do here? It allows for a change in the mean fatality rate in the time between 1982 and 1988 in the absence of a change in the beer tax.
Here is the estimated OLS regression function: \[ \widehat{FatalityRate_{i,1988} - FatalityRate_{i,1982}} = -0.072 - 1.04 \times ( BeerTax_{i,1988} - BeerTax_{i,1982} ) \]
Both from the estimated coefficient and the graph (the red line) you can see that the relationship between beer tax and fatality rates is now negative! Moreover, it is statistically significant at 5 percent.
How do we interpret the coefficient estimate? Raising the beer tax by $1 is estimated to reduce traffic fatalities by 1.04 per 10000 people. This is rather large, as the average fatality rate is approximately 2 persons per 10000 people.
Once more, this outcome is likely to be a consequence of omitting factors in the single-year regressions that influence the fatality rate, are correlated with the beer tax, and change over time. The message is that we need to be more careful and control for such factors before drawing conclusions about the effect of a raise in beer taxes.
Consider the panel regression model
\[
Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it} = \alpha_i + \beta_1 X_{it} + u_{it}
\] where \(\alpha_i = \beta_0 + \beta_2 Z_{i}\) is the fixed effect of entity \(i\). This model is called the fixed effects model.
The variation in the \(\alpha_i\) comes from \(Z_i\).
To estimate \(\beta_1\) without estimating the \(\alpha_i\), first average the model over time for each entity:
\[ \frac{1}{T} \sum\limits_{t=1}^T Y_{it} = \beta_1 \frac{1}{T} \sum\limits_{t=1}^T X_{it} + \alpha_i + \frac{1}{T} \sum\limits_{t=1}^T u_{it} \]
that is,
\[ \bar{Y}_{i} = \beta_1 \bar{X}_{i} + \alpha_i + \bar{u}_{i} \]
Subtracting this from the fixed effects model eliminates \(\alpha_i\):
\[ Y_{it} - \bar{Y}_i = \beta_1 ( X_{it} - \bar{X}_i) + ( u_{it} - \bar{u}_i) \\ \tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it} \]
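The entity-demeaning idea can be checked on simulated data. The sketch below uses base R only; the sample sizes, coefficients and variable names are all illustrative assumptions and have nothing to do with the Fatalities data. It compares pooled OLS, OLS on entity-demeaned data, and OLS with one dummy per entity.

```r
# Simulated panel: every number here is an illustrative assumption
set.seed(1)
n_entities <- 200
n_periods <- 5
entity <- rep(seq_len(n_entities), each = n_periods)

z <- rnorm(n_entities)                                          # unobserved, time-invariant Z_i
x <- 0.8 * z[entity] + rnorm(n_entities * n_periods)            # X_it is correlated with Z_i
y <- 1 + 2 * x + 3 * z[entity] + rnorm(n_entities * n_periods)  # true beta_1 is 2

# Pooled OLS omits Z_i, so its slope picks up part of the effect of Z_i
beta_pooled <- unname(coef(lm(y ~ x))["x"])

# Entity demeaning removes alpha_i = beta_0 + beta_2 * Z_i
x_tilde <- x - ave(x, entity)
y_tilde <- y - ave(y, entity)
beta_within <- unname(coef(lm(y_tilde ~ x_tilde - 1))["x_tilde"])

# Equivalent formulation: one dummy variable per entity, no intercept
beta_lsdv <- unname(coef(lm(y ~ x + factor(entity) - 1))["x"])

c(pooled = beta_pooled, within = beta_within, lsdv = beta_lsdv)
```

The demeaned and dummy-variable regressions give the same slope up to numerical precision, while the pooled slope is pushed away from the true value because \(Z_i\) is omitted.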
If the fixed effects regression assumptions listed further below hold, the sampling distribution of the OLS estimator in the fixed effects regression model is normal in large samples. The variance of the estimates can be estimated and we can compute standard errors, t-statistics and confidence intervals for the coefficients.
We will now see how to estimate a fixed effects model using R and how to obtain a model summary that reports heteroskedasticity-robust standard errors. We will leave aside complicated formulas of the estimators.
The simple fixed effects model for estimation of the relation between traffic fatality rates and the beer taxes is
\[ FatalityRate_{it} = \beta_1 BeerTax_{it} + StateFixedEffects + u_{it} \]
a regression of the traffic fatality rate on beer tax and 48 binary regressors - one for each state.
We can simply use the function lm() to obtain an estimate of \(\beta_1\).
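The call shown in the output can be reproduced with a sketch like the following, assuming AER is installed and the fatality rate is defined as deaths per 10000 inhabitants. The object name `fatal_fe_lm_mod` is illustrative.

```r
# Assumption: AER is installed
library(AER)

data("Fatalities", package = "AER")
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000  # assumed definition

# state is a factor, so lm() expands it into dummies;
# "- 1" drops the intercept so that all 48 state dummies are kept
fatal_fe_lm_mod <- lm(fatal_rate ~ beertax + state - 1, data = Fatalities)
fatal_fe_lm_mod
```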
##
## Call:
## lm(formula = fatal_rate ~ beertax + state - 1, data = Fatalities)
##
## Coefficients:
## beertax stateal stateaz statear stateca stateco statect statede
## -0.6559 3.4776 2.9099 2.8227 1.9682 1.9933 1.6154 2.1700
## statefl statega stateid stateil statein stateia stateks stateky
## 3.2095 4.0022 2.8086 1.5160 2.0161 1.9337 2.2544 2.2601
## statela stateme statemd statema statemi statemn statems statemo
## 2.6305 2.3697 1.7712 1.3679 1.9931 1.5804 3.4486 2.1814
## statemt statene statenv statenh statenj statenm stateny statenc
## 3.1172 1.9555 2.8769 2.2232 1.3719 3.9040 1.2910 3.1872
## statend stateoh stateok stateor statepa stateri statesc statesd
## 1.8542 1.8032 2.9326 2.3096 1.7102 1.2126 4.0348 2.4739
## statetn statetx stateut statevt stateva statewa statewv statewi
## 2.6020 2.5602 2.3137 2.5116 2.1874 1.8181 2.5809 1.7184
## statewy
## 3.2491
It is also possible to estimate \(\beta_1\) by applying OLS to the demeaned data, that is, to run the regression
\[ \widetilde{FatalityRate}_{it} = \beta_1 \widetilde{BeerTax}_{it} + \tilde{u}_{it} \]
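A sketch of the demeaned regression, again assuming AER is installed and the fatality rate is deaths per 10000 inhabitants. The data frame name `fatal_demeaned` is taken from the call printed in the output.

```r
# Assumption: AER is installed
library(AER)

data("Fatalities", package = "AER")
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000  # assumed definition

# Subtract the state-specific mean from each variable
fatal_demeaned <- with(Fatalities, data.frame(
  fatal_rate = fatal_rate - ave(fatal_rate, state),
  beertax    = beertax - ave(beertax, state)
))

# OLS on the demeaned data, without an intercept
fatal_demeaned_mod <- lm(fatal_rate ~ beertax - 1, data = fatal_demeaned)
summary(fatal_demeaned_mod)
```

One caveat: this `lm()` call does not know that 48 state means were estimated first, so it reports 335 residual degrees of freedom. The point estimate is unaffected, but the conventional standard error printed here should be read with that in mind.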
##
## Call:
## lm(formula = fatal_rate ~ beertax - 1, data = fatal_demeaned)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58696 -0.08284 -0.00127 0.07955 0.89780
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## beertax -0.6559 0.1739 -3.772 0.000191 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1757 on 335 degrees of freedom
## Multiple R-squared: 0.04074, Adjusted R-squared: 0.03788
## F-statistic: 14.23 on 1 and 335 DF, p-value: 0.0001913
The function ave() computes group averages. We use it to obtain state-specific averages of the fatality rate and the beer tax. Alternatively, one may use plm() from the package of the same name.
As for lm(), we have to specify the regression formula and the data to be used in our call of plm(). Additionally, we pass a vector with the names of the entity and time ID variables to the argument index. For Fatalities, the ID variable for entities is named state and the time ID variable is year.
Since the fixed effects estimator is also called the within estimator, we set model = "within".
The function coeftest() allows us to obtain inference based on robust standard errors.
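A sketch of the plm() call described above, assuming the AER and plm packages are installed and the fatality rate is deaths per 10000 inhabitants. The object name `fatal_fe_mod` and the choice of `type = "HC1"` are assumptions, since the original chunk is not shown.

```r
# Assumption: AER and plm are installed
library(AER)
library(plm)

data("Fatalities", package = "AER")
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000  # assumed definition

# Within (fixed effects) estimator with state as entity ID and year as time ID
fatal_fe_mod <- plm(fatal_rate ~ beertax,
                    data = Fatalities,
                    index = c("state", "year"),
                    model = "within")

# Coefficient test based on a robust covariance matrix estimate
coeftest(fatal_fe_mod, vcov. = vcovHC, type = "HC1")
```

For plm objects, `vcovHC()` dispatches to the method supplied by plm, which by default clusters by entity. That is why the robust standard error differs from the one reported by the demeaned `lm()` fit.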
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## beertax -0.65587 0.28880 -2.271 0.02388 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated coefficient is again \(-0.6559\). Note that plm() uses the entity-demeaned OLS algorithm and thus does not report dummy coefficients. The estimated regression function is \[ \widehat{FatalityRate} = -0.66 \times BeerTax + StateFixedEffects. \]
The coefficient on BeerTax is negative and significant. The interpretation is that the estimated reduction in traffic fatalities due to an increase in the real beer tax by $1 is 0.66 per 10000 people, which is still pretty high. Although including state fixed effects eliminates the risk of a bias due to omitted factors that vary across states but not over time, we suspect that there are other omitted variables that vary over time and thus cause bias.
Controlling for variables that are constant across entities but vary over time can be done by including time fixed effects. If there are only time fixed effects, the fixed effects regression model becomes \[ Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \delta_3 B3_t + \cdots + \delta_T BT_t + u_{it} \]
Note that only \(T-1\) dummies are included (B1 is omitted) since the model includes an intercept.
This model eliminates omitted variable bias caused by excluding unobserved variables that evolve over time but are constant across entities.
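The notes do not estimate a model with time fixed effects only. As an illustration of the dummy structure, the following sketch regresses the fatality rate on the beer tax and year dummies, assuming AER is installed and the fatality rate is deaths per 10000 inhabitants. The object name `fatal_te_lm_mod` is hypothetical.

```r
# Assumption: AER is installed
library(AER)

data("Fatalities", package = "AER")
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000  # assumed definition

# year is a factor; with an intercept in the model, lm() drops the first
# level (1982) and keeps T - 1 = 6 year dummies
fatal_te_lm_mod <- lm(fatal_rate ~ beertax + year, data = Fatalities)
names(coef(fatal_te_lm_mod))
```

The coefficient names show an intercept, the beer tax, and dummies for 1983 through 1988, which corresponds to \(B2_t, \dots, BT_t\) in the equation above.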
In some applications it is meaningful to include both entity and time fixed effects. The entity and time fixed effects model is \[ Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \delta_3 B3_t + \cdots + \delta_T BT_t + u_{it} \] The combined model eliminates bias from unobservables that change over time but are constant across entities AND controls for factors that differ across entities but are constant over time. Such models can be estimated using the OLS algorithm.
The following estimates the combined entity and time fixed effects model of the relation between fatalities and beer tax, \[ FatalityRate_{it} = \beta_1 \times BeerTax_{it} + StateEffects + TimeFixedEffects + u_{it} \]
using both lm() and plm(). In the lm() formula we add the regressor year for time fixed effects. In our call of plm() we set the additional argument effect = "twoways" for inclusion of entity and time dummies.
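A sketch of both estimation routes, assuming the AER and plm packages are installed and the fatality rate is deaths per 10000 inhabitants. The object names are illustrative, and `type = "HC1"` is an assumption about the original chunk.

```r
# Assumption: AER and plm are installed
library(AER)
library(plm)

data("Fatalities", package = "AER")
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000  # assumed definition

# Via lm(): state and year dummies, no intercept
fatal_tefe_lm_mod <- lm(fatal_rate ~ beertax + state + year - 1, data = Fatalities)
fatal_tefe_lm_mod

# Via plm(): within estimator with entity and time effects
fatal_tefe_mod <- plm(fatal_rate ~ beertax,
                      data = Fatalities,
                      index = c("state", "year"),
                      model = "within",
                      effect = "twoways")

# Coefficient test based on a robust covariance matrix estimate
coeftest(fatal_tefe_mod, vcov. = vcovHC, type = "HC1")
```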
## Call:
## lm(formula = fatal_rate ~ beertax + state + year - 1, data = Fatalities)
##
## Coefficients:
## beertax stateal stateaz statear stateca stateco statect statede
## -0.63998 3.51137 2.96451 2.87284 2.02618 2.04984 1.67125 2.22711
## statefl statega stateid stateil statein stateia stateks stateky
## 3.25132 4.02300 2.86242 1.57287 2.07123 1.98709 2.30707 2.31659
## statela stateme statemd statema statemi statemn statems statemo
## 2.67772 2.41713 1.82731 1.42335 2.04488 1.63488 3.49146 2.23598
## statemt statene statenv statenh statenj statenm stateny statenc
## 3.17160 2.00846 2.93322 2.27245 1.43016 3.95748 1.34849 3.22630
## statend stateoh stateok stateor statepa stateri statesc statesd
## 1.90762 1.85664 2.97776 2.36597 1.76563 1.26964 4.06496 2.52317
## statetn statetx stateut statevt stateva statewa statewv statewi
## 2.65670 2.61282 2.36165 2.56100 2.23618 1.87424 2.63364 1.77545
## statewy year1983 year1984 year1985 year1986 year1987 year1988
## 3.30791 -0.07990 -0.07242 -0.12398 -0.03786 -0.05090 -0.05180
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## beertax -0.63998 0.35015 -1.8277 0.06865 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Before discussing the outcomes we convince ourselves that state and year are of the class factor.
## [1] "pseries" "factor"
## [1] "pseries" "factor"
The lm() function converts factors into dummies automatically. Since we exclude the intercept by adding -1 to the right-hand side of the regression formula, lm() estimates coefficients for \(n+(T-1)=48+6=54\) binary variables (6 year dummies and 48 state dummies). Again, plm() only reports the estimated coefficient on BeerTax.
The estimated regression function is \[ \widehat{FatalityRate} = -0.64 \times BeerTax + StateEffects + TimeFixedEffects. \]
The estimate of -0.64 is close to the estimate of -0.66 from the regression model including only entity fixed effects. Unsurprisingly, the coefficient is less precisely estimated, but it is still significantly different from zero at the 10 percent level.
We conclude that the estimated relationship between traffic fatalities and the real beer tax is not affected by omitted variable bias due to factors that are constant either over time or across states.
In the fixed effects regression model \[ Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it} , \quad i=1,\dots,n, \; t=1,\dots,T \] we assume the following:
1. The error term \(u_{it}\) has conditional mean zero, that is, \(E(u_{it} | X_{i1},X_{i2},\dots,X_{iT}) = 0\).
2. \((X_{i1},X_{i2},\dots,X_{iT},u_{i1},u_{i2},\dots,u_{iT})\), \(i=1,\dots,n\), are i.i.d. draws from their joint distribution.
3. Large outliers are unlikely, i.e., \((X_{it},u_{it})\) have nonzero finite fourth moments.
4. There is no perfect multicollinearity.
In cases where there are multiple regressors, \(X_{it}\) is replaced by \(X_{1,it},X_{2,it},\dots,X_{k,it}\).
Recall from our last lecture what these assumptions mean:
The first assumption is that the error is uncorrelated with ALL observations of the variable \(X\) for the entity \(i\) over time. For example, when we have omitted variables this assumption would be violated and we would have omitted variable bias.
The second assumption states that the variables are i.i.d. across entities \(i=1,...,n\). This does not require the observations to be uncorrelated WITHIN an entity. In other words, \(X_{it}\) can be autocorrelated within entities, which is usually the case with time series data. The same holds for the error terms \(u_{it}\). As long as entities are selected by simple random sampling, this assumption is satisfied. What examples can we think of that violate it?
The third and fourth assumptions are analogous to the multiple regression assumptions we discussed in the last lecture.