Application: Traffic Deaths and Alcohol Taxes

We will use the Fatalities data on traffic accidents and alcohol taxes in order to run some sample panel data estimations.
Here, we are interested in the relationship between alcohol taxes and fatalities so we will first visualize and summarize the data below:
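The output below can be reproduced along the following lines. This is a minimal sketch that assumes the Fatalities data come from the AER package and that the plm package is installed; the choice of `state` and `year` as index variables follows the structure shown in the output.

```r
# Assumption: the AER package ships the Fatalities data; plm provides pdata.frame().
library(AER)
library(plm)

data(Fatalities)

# Declare the panel structure: state is the entity index, year is the time index.
Fatalities <- pdata.frame(Fatalities, index = c("state", "year"))

is.data.frame(Fatalities)  # a pdata.frame is still a data.frame
dim(Fatalities)            # number of observations and variables
str(Fatalities)            # structure of every variable
```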

## [1] TRUE

## [1] 336  34

## Classes 'pdata.frame' and 'data.frame':  336 obs. of  34 variables:
##  $ state       : Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ year        : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ spirits     : 'pseries' Named num  1.37 1.36 1.32 1.28 1.23 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ unemp       : 'pseries' Named num  14.4 13.7 11.1 8.9 9.8 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ income      : 'pseries' Named num  10544 10733 11109 11333 11662 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ emppop      : 'pseries' Named num  50.7 52.1 54.2 55.3 56.5 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ beertax     : 'pseries' Named num  1.54 1.79 1.71 1.65 1.61 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ baptist     : 'pseries' Named num  30.4 30.3 30.3 30.3 30.3 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ mormon      : 'pseries' Named num  0.328 0.343 0.359 0.376 0.393 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ drinkage    : 'pseries' Named num  19 19 19 19.7 21 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ dry         : 'pseries' Named num  25 23 24 23.6 23.5 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ youngdrivers: 'pseries' Named num  0.212 0.211 0.211 0.211 0.213 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ miles       : 'pseries' Named num  7234 7836 8263 8727 8953 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ breath      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ jail        : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 2 2 2 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ service     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 2 2 2 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ fatal       : 'pseries' Named int  839 930 932 882 1081 1110 1023 724 675 869 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ nfatal      : 'pseries' Named int  146 154 165 146 172 181 139 131 112 149 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ sfatal      : 'pseries' Named int  99 98 94 98 119 114 89 76 60 81 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ fatal1517   : 'pseries' Named int  53 71 49 66 82 94 66 40 40 51 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ nfatal1517  : 'pseries' Named int  9 8 7 9 10 11 8 7 7 8 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ fatal1820   : 'pseries' Named int  99 108 103 100 120 127 105 81 83 118 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ nfatal1820  : 'pseries' Named int  34 26 25 23 23 31 24 16 19 34 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ fatal2124   : 'pseries' Named int  120 124 118 114 119 138 123 96 80 123 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ nfatal2124  : 'pseries' Named int  32 35 34 45 29 30 25 36 17 33 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ afatal      : 'pseries' Named num  309 342 305 277 361 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ pop         : 'pseries' Named num  3942002 3960008 3988992 4021008 4049994 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ pop1517     : 'pseries' Named num  209000 202000 197000 195000 204000 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ pop1820     : 'pseries' Named num  221553 219125 216724 214349 212000 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ pop2124     : 'pseries' Named num  290000 290000 288000 284000 263000 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ milestot    : 'pseries' Named num  28516 31032 32961 35091 36259 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ unempus     : 'pseries' Named num  9.7 9.6 7.5 7.2 7 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ emppopus    : 'pseries' Named num  57.8 57.9 59.5 60.1 60.7 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ gsp         : 'pseries' Named num  -0.0221 0.0466 0.0628 0.0275 0.0321 ...
##   ..- attr(*, "names")= chr [1:336] "al-1982" "al-1983" "al-1984" "al-1985" ...
##   ..- attr(*, "index")=Classes 'pindex' and 'data.frame':    336 obs. of  2 variables:
##   .. ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...
##  - attr(*, "index")=Classes 'pindex' and 'data.frame':   336 obs. of  2 variables:
##   ..$ state: Factor w/ 48 levels "al","az","ar",..: 1 1 1 1 1 1 1 2 2 2 ...
##   ..$ year : Factor w/ 7 levels "1982","1983",..: 1 2 3 4 5 6 7 1 2 3 ...

Next, we summarize the state and year variables:

##      state       year   
##  al     :  7   1982:48  
##  az     :  7   1983:48  
##  ar     :  7   1984:48  
##  ca     :  7   1985:48  
##  co     :  7   1986:48  
##  ct     :  7   1987:48  
##  (Other):294   1988:48

The variable state is a factor with 48 levels (one for each of the 48 contiguous US states), and the year variable takes 7 values (1982 to 1988). This gives $7 \times 48 = 336$ observations in total. Because all variables are observed for all entities and over all time periods, we say that this panel is balanced. If data were missing for at least one entity in at least one time period, the panel would be unbalanced.
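One way to check the balance claim directly is to count observations per state. The sketch below uses only base R on the plain data frame from AER (an assumption about the data source); the helper names `obs_per_state`, `n_states` and `n_years` are illustrative.

```r
library(AER)
data(Fatalities)

# Summarize the two index variables.
summary(Fatalities[, c("state", "year")])

# Count observations per state; in a balanced panel every state
# appears exactly once for every year.
obs_per_state <- table(Fatalities$state)
n_states <- nlevels(Fatalities$state)
n_years  <- nlevels(Fatalities$year)

all(obs_per_state == n_years)            # TRUE when the panel is balanced
nrow(Fatalities) == n_states * n_years   # total observations = entities x periods
```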

Now we will look at the fatality rates for two different years.

We will look at the following estimated regression functions: \[ \widehat{FatalityRate} = 2.01 + 0.15 \times BeerTax \quad \text{(1982 data)} \\ \widehat{FatalityRate} = 1.86 + 0.44 \times BeerTax \quad \text{(1988 data)} \]

Run the below code to see these estimation results:
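A sketch of code that could produce these results is given here. It assumes the fatality rate is defined as traffic deaths per 10000 inhabitants (`fatal / pop * 10000`) and that the reported standard errors are heteroskedasticity-robust (HC1); the object names are illustrative.

```r
library(AER)  # also attaches lmtest (coeftest) and sandwich (vcovHC)
data(Fatalities)

# Assumed definition: traffic deaths per 10000 inhabitants.
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000

# One cross-section per year.
Fatalities1982 <- subset(Fatalities, year == "1982")
Fatalities1988 <- subset(Fatalities, year == "1988")

fatal1982_mod <- lm(fatal_rate ~ beertax, data = Fatalities1982)
fatal1988_mod <- lm(fatal_rate ~ beertax, data = Fatalities1988)

# Coefficient tests with heteroskedasticity-robust (HC1) standard errors.
coeftest(fatal1982_mod, vcov. = vcovHC, type = "HC1")
coeftest(fatal1988_mod, vcov. = vcovHC, type = "HC1")
```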

## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.01038    0.14957 13.4408   <2e-16 ***
## beertax      0.14846    0.13261  1.1196   0.2687    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  1.85907    0.11461 16.2205 < 2.2e-16 ***
## beertax      0.43875    0.12786  3.4314  0.001279 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Each point in the above graphs represents the beer tax and the fatality rate of one state in the respective year.
Note that the regression results indicate a positive relationship between the beer tax and the fatality rate in both years, although the 1982 coefficient is not statistically significant. Moreover, the estimated coefficient on the beer tax for the 1988 data is almost three times as large as for the 1982 data.
Is this what you would expect? Would you expect alcohol taxes to lead to an INCREASE or a DECREASE in the rate of traffic fatalities?
What exactly is behind this?
One likely explanation is omitted variable bias: neither model includes any covariates, e.g., economic conditions. This could be corrected by using a multiple regression approach. However, multiple regression cannot account for omitted unobservable factors that differ from state to state but can be assumed to be constant over the observation span, e.g., the population's attitude towards drunk driving. As shown in the next section, panel data allow us to hold such factors constant.

“Before” and “After” Comparisons

Let’s suppose that there are only 2 periods i.e. 1982 and 1988. Having the advantage of information about the same variable over multiple years allows us to look at how it has changed. In this case, we can see how fatality rates have changed from year 1982 to 1988 for each observation unit (in this case, states). We will use this information to get more out of the data.
See the below population regression model that relates fatality rates and alcohol taxes: \[ FatalityRate_{it} = \beta_0 + \beta_1 BeerTax_{it} + \beta_2 Z_{i} + u_{it} \]
$Z_i$ are state-specific characteristics that differ between states but are constant over time (hence no time subscript). So if we wrote the above equation separately for years 1982 and 1988, we would have \[ FatalityRate_{i,1982} = \beta_0 + \beta_1 BeerTax_{i,1982} + \beta_2 Z_{i} + u_{i,1982} \\ FatalityRate_{i,1988} = \beta_0 + \beta_1 BeerTax_{i,1988} + \beta_2 Z_{i} + u_{i,1988} \] We can get rid of $Z_i$ by regressing the difference in the fatality rate between 1988 and 1982 on the difference in beer tax between those years: \[ FatalityRate_{i,1988} - FatalityRate_{i,1982} = \beta_1 (BeerTax_{i,1988} - BeerTax_{i,1982}) + u_{i,1988} - u_{i,1982} \]

This regression model, where the difference in the fatality rate between 1988 and 1982 is regressed on the difference in the beer tax between those years, yields an estimate of $\beta_1$ that is robust to a possible bias due to the omission of $Z_i$, as these influences are eliminated from the model. Next we will estimate a regression model based on the differenced data and plot the estimated regression function.
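The differenced regression can be sketched as follows. As before, the fatality rate definition and the use of HC1 standard errors are assumptions; the name `diff_beertax` is taken from the output, the remaining names are illustrative.

```r
library(AER)
data(Fatalities)
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000

Fatalities1982 <- subset(Fatalities, year == "1982")
Fatalities1988 <- subset(Fatalities, year == "1988")

# Differencing row by row is only valid if both years list the states in the same order.
stopifnot(identical(as.character(Fatalities1982$state),
                    as.character(Fatalities1988$state)))

diff_fatal_rate <- Fatalities1988$fatal_rate - Fatalities1982$fatal_rate
diff_beertax    <- Fatalities1988$beertax    - Fatalities1982$beertax

# The intercept captures the mean change in the fatality rate at zero tax change.
fatal_diff_mod <- lm(diff_fatal_rate ~ diff_beertax)
coeftest(fatal_diff_mod, vcov. = vcovHC, type = "HC1")

# Scatterplot of the differences with the fitted line in red.
plot(diff_beertax, diff_fatal_rate,
     xlab = "Change in beer tax (1988 - 1982)",
     ylab = "Change in fatality rate (per 10000)",
     pch = 20)
abline(fatal_diff_mod, col = "red", lwd = 1.5)
```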

## 
## t test of coefficients:
## 
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -0.072037   0.065355 -1.1022 0.276091   
## diff_beertax -1.040973   0.355006 -2.9323 0.005229 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What does including the intercept do here? It allows for a change in the mean fatality rate in the time between 1982 and 1988 in the absence of a change in the beer tax.
Here is the estimated OLS regression function: \[ \widehat{FatalityRate}_{i,1988} - \widehat{FatalityRate}_{i,1982} = -0.072 - 1.04 \times ( BeerTax_{i,1988} - BeerTax_{i,1982} ) \]

Both the estimated coefficient and the graph (the red line) show that the relationship between the beer tax and the fatality rate is now negative. Moreover, it is statistically significant at the 5 percent level.
How do we interpret the coefficient estimate? Raising the beer tax by $1 is estimated to decrease traffic fatalities by 1.04 per 10000 people. This is rather large, as the average fatality rate is approximately 2 per 10000 people.
Once more, this outcome is likely to be a consequence of omitting factors in the single-year regressions that influence the fatality rate, are correlated with the beer tax, and change over time. The message is that we need to be more careful and control for such factors before drawing conclusions about the effect of a rise in beer taxes.

Fixed Effects Regression: Estimation and Inference

Consider the panel regression model \[ Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it} = \alpha_i + \beta_1 X_{it} + u_{it} \] where $\alpha_i = \beta_0 + \beta_2 Z_{i}$ is the fixed effect of entity $i$. This model is called the fixed effects model.

The variation in the $\alpha_i$ comes from $Z_i$.

To estimate $\beta_1$ without estimating the $\alpha_i$, average both sides of the fixed effects model over time for each entity:

\[ \frac{1}{T} \sum\limits_{t=1}^T Y_{it} = \alpha_i + \beta_1 \frac{1}{T} \sum\limits_{t=1}^T X_{it} + \frac{1}{T} \sum\limits_{t=1}^T u_{it} \]

\[ \bar{Y}_{i} = \alpha_i + \beta_1 \bar{X}_{i} + \bar{u}_{i} \]

Subtracting this entity average from the original equation removes $\alpha_i$ and leaves the entity-demeaned model:

\[ Y_{it} - \bar{Y}_i = \beta_1 ( X_{it} - \bar{X}_i) + ( u_{it} - \bar{u}_i) \\ \tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it} \]
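The effect of the demeaning step can be illustrated with a small simulation. The sketch below uses entirely hypothetical data (50 entities, 6 periods, true slope of -0.5) in which the regressor is correlated with the fixed effects; it compares pooled OLS, OLS on demeaned data, and a regression with one dummy per entity.

```r
set.seed(1)

n_entities <- 50    # hypothetical number of entities
n_periods  <- 6     # hypothetical number of time periods
beta1      <- -0.5  # hypothetical true slope

entity_id <- rep(seq_len(n_entities), each = n_periods)
entity    <- factor(entity_id)
n_obs     <- length(entity_id)

alpha <- rnorm(n_entities, sd = 2)[entity_id]     # entity fixed effects alpha_i
x     <- 0.8 * alpha + rnorm(n_obs)               # regressor correlated with alpha_i
y     <- alpha + beta1 * x + rnorm(n_obs, sd = 0.5)

# Pooled OLS ignores alpha_i and is biased because x is correlated with it.
slope_pooled <- unname(coef(lm(y ~ x))["x"])

# Within transformation: subtract the entity means from y and x.
y_tilde <- y - ave(y, entity)
x_tilde <- x - ave(x, entity)
slope_within <- unname(coef(lm(y_tilde ~ x_tilde - 1))["x_tilde"])

# Regression with one dummy per entity yields the same slope as demeaning.
slope_lsdv <- unname(coef(lm(y ~ x + entity - 1))["x"])

c(pooled = slope_pooled, within = slope_within, lsdv = slope_lsdv)
```

The demeaned and dummy-variable slopes coincide up to numerical precision, while the pooled slope is pulled away from the true value by the omitted fixed effects.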

If the fixed effects regression assumptions hold, the sampling distribution of the OLS estimator in the fixed effects regression model is normal in large samples. The variance of the estimates can be estimated, so we can compute standard errors, t-statistics and confidence intervals for the coefficients.
We will now see how to estimate a fixed effects model using R and how to obtain a model summary that reports heteroskedasticity-robust standard errors. We will leave aside complicated formulas of the estimators.

Fixed Effects Regression: Application to Traffic Deaths

The simple fixed effects model for estimation of the relation between traffic fatality rates and the beer taxes is

\[ FatalityRate_{it} = \beta_1 BeerTax_{it} + StateFixedEffects + u_{it} \]

a regression of the traffic fatality rate on beer tax and 48 binary regressors - one for each state.

We can simply use the function lm() to obtain an estimate of $\beta_1$.
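The call shown in the output below can be reproduced with a sketch like this one. The formula is taken from the output; the definition of `fatal_rate` as deaths per 10000 inhabitants and the object name `fatal_fe_lm_mod` are assumptions.

```r
library(AER)
data(Fatalities)
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000

# state is a factor, so lm() expands it into one dummy per state;
# "- 1" drops the common intercept so that all 48 state dummies can be kept.
fatal_fe_lm_mod <- lm(fatal_rate ~ beertax + state - 1, data = Fatalities)
fatal_fe_lm_mod
```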

## 
## Call:
## lm(formula = fatal_rate ~ beertax + state - 1, data = Fatalities)
## 
## Coefficients:
## beertax  stateal  stateaz  statear  stateca  stateco  statect  statede  
## -0.6559   3.4776   2.9099   2.8227   1.9682   1.9933   1.6154   2.1700  
## statefl  statega  stateid  stateil  statein  stateia  stateks  stateky  
##  3.2095   4.0022   2.8086   1.5160   2.0161   1.9337   2.2544   2.2601  
## statela  stateme  statemd  statema  statemi  statemn  statems  statemo  
##  2.6305   2.3697   1.7712   1.3679   1.9931   1.5804   3.4486   2.1814  
## statemt  statene  statenv  statenh  statenj  statenm  stateny  statenc  
##  3.1172   1.9555   2.8769   2.2232   1.3719   3.9040   1.2910   3.1872  
## statend  stateoh  stateok  stateor  statepa  stateri  statesc  statesd  
##  1.8542   1.8032   2.9326   2.3096   1.7102   1.2126   4.0348   2.4739  
## statetn  statetx  stateut  statevt  stateva  statewa  statewv  statewi  
##  2.6020   2.5602   2.3137   2.5116   2.1874   1.8181   2.5809   1.7184  
## statewy  
##  3.2491

It is also possible to estimate $\beta_1$ by applying OLS to the demeaned data, that is, to run the regression

\[ \widetilde{FatalityRate}_{it} = \beta_1 \widetilde{BeerTax}_{it} + \tilde{u}_{it} \]
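A sketch of the demeaning step in base R follows. The names `fatal_demeaned` and `fatal_rate` appear in the output; the fatality rate definition and the object name `fatal_demeaned_mod` are assumptions.

```r
library(AER)
data(Fatalities)
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000

# ave(x, g) returns, for every observation, the mean of x within its group g,
# so subtracting it yields deviations from the state-specific means.
fatal_demeaned <- with(Fatalities, data.frame(
  fatal_rate = fatal_rate - ave(fatal_rate, state),
  beertax    = beertax    - ave(beertax, state)
))

# No intercept is needed because demeaned variables have mean zero.
fatal_demeaned_mod <- lm(fatal_rate ~ beertax - 1, data = fatal_demeaned)
summary(fatal_demeaned_mod)
```

Note that lm() is unaware that 48 state means were removed beforehand, so its degrees of freedom (335 in the output) and the standard errors based on them are not adjusted for the estimated means.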

## 
## Call:
## lm(formula = fatal_rate ~ beertax - 1, data = fatal_demeaned)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58696 -0.08284 -0.00127  0.07955  0.89780 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## beertax  -0.6559     0.1739  -3.772 0.000191 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1757 on 335 degrees of freedom
## Multiple R-squared:  0.04074,    Adjusted R-squared:  0.03788 
## F-statistic: 14.23 on 1 and 335 DF,  p-value: 0.0001913

The function ave() computes group averages. We use it to obtain state-specific averages of the fatality rate and the beer tax. Alternatively, one may use plm() from the package of the same name.

As with lm(), we have to specify the regression formula and the data in our call of plm(). Additionally, we must pass a vector with the names of the entity and time ID variables to the argument index. For Fatalities, the entity ID variable is named state and the time ID variable is year.

Since the fixed effects estimator is also called the within estimator, we set model = "within".

The function coeftest() lets us obtain inference based on robust standard errors.
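Putting these pieces together, the estimation could look like the sketch below. The arguments `index` and `model` are as described in the text; the use of `vcovHC` with `type = "HC1"` for the robust standard errors and the object name `fatal_fe_mod` are assumptions.

```r
library(AER)  # attaches lmtest (coeftest) and sandwich (vcovHC generic)
library(plm)
data(Fatalities)
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000

# Entity fixed effects ("within") estimator with state as entity and year as time index.
fatal_fe_mod <- plm(fatal_rate ~ beertax,
                    data  = Fatalities,
                    index = c("state", "year"),
                    model = "within")

# Robust inference; plm supplies its own vcovHC method for plm objects.
coeftest(fatal_fe_mod, vcov. = vcovHC, type = "HC1")
```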

## 
## t test of coefficients:
## 
##         Estimate Std. Error t value Pr(>|t|)  
## beertax -0.65587    0.28880  -2.271  0.02388 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimated coefficient is again $-0.6559$. Note that plm() uses the entity-demeaned OLS algorithm and thus does not report dummy coefficients. The estimated regression function is \[ \widehat{FatalityRate} = -0.66 \times BeerTax + StateFixedEffects. \]

The coefficient on BeerTax is negative and significant. The interpretation is that the estimated reduction in traffic fatalities due to an increase in the real beer tax by $1 is 0.66 per 10000 people, which is still rather large. Although including state fixed effects eliminates the risk of a bias due to omitted factors that vary across states but not over time, we suspect that there are other omitted variables that vary over time and thus cause bias.

Regression with Time Fixed Effects

Controlling for variables that are constant across entities but vary over time can be done by including time fixed effects. If there are only time fixed effects, the fixed effects regression model becomes \[ Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \delta_3 B3_t + \cdots + \delta_T BT_t + u_{it} \]
Note that only $T-1$ dummies are included ($B1$ is omitted) since the model includes an intercept.
This model eliminates omitted variable bias caused by excluding unobserved variables that evolve over time but are constant across entities.
In some applications it is meaningful to include both entity and time fixed effects. The entity and time fixed effects model is \[ Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \delta_3 B3_t + \cdots + \delta_T BT_t + u_{it} \] The combined model eliminates bias from unobservables that change over time but are constant across entities, and it controls for factors that differ across entities but are constant over time. Such models can be estimated using the OLS algorithm.
The following estimates the combined entity and time fixed effects model of the relation between fatalities and the beer tax, \[ FatalityRate_{it} = \beta_1 BeerTax_{it} + StateEffects + TimeFixedEffects + u_{it} \]

using both lm() and plm().

Estimating this regression with lm() is a straightforward extension of the previous model: we only have to adjust the formula argument by adding the regressor year for the time fixed effects. In our call of plm() we set the additional argument effect = "twoways" to include both entity and time fixed effects.
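Both estimations can be sketched as follows. The lm() formula is taken from the output; the fatality rate definition, the HC1 robust standard errors and the object names are assumptions.

```r
library(AER)
library(plm)
data(Fatalities)
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000

# lm(): state and year are factors and are expanded into dummies automatically.
fatal_tefe_lm_mod <- lm(fatal_rate ~ beertax + state + year - 1, data = Fatalities)
fatal_tefe_lm_mod

# plm(): effect = "twoways" removes both entity and time effects.
fatal_tefe_mod <- plm(fatal_rate ~ beertax,
                      data   = Fatalities,
                      index  = c("state", "year"),
                      model  = "within",
                      effect = "twoways")

coeftest(fatal_tefe_mod, vcov. = vcovHC, type = "HC1")
```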

## 
## Call:
## lm(formula = fatal_rate ~ beertax + state + year - 1, data = Fatalities)
## 
## Coefficients:
##  beertax   stateal   stateaz   statear   stateca   stateco   statect   statede  
## -0.63998   3.51137   2.96451   2.87284   2.02618   2.04984   1.67125   2.22711  
##  statefl   statega   stateid   stateil   statein   stateia   stateks   stateky  
##  3.25132   4.02300   2.86242   1.57287   2.07123   1.98709   2.30707   2.31659  
##  statela   stateme   statemd   statema   statemi   statemn   statems   statemo  
##  2.67772   2.41713   1.82731   1.42335   2.04488   1.63488   3.49146   2.23598  
##  statemt   statene   statenv   statenh   statenj   statenm   stateny   statenc  
##  3.17160   2.00846   2.93322   2.27245   1.43016   3.95748   1.34849   3.22630  
##  statend   stateoh   stateok   stateor   statepa   stateri   statesc   statesd  
##  1.90762   1.85664   2.97776   2.36597   1.76563   1.26964   4.06496   2.52317  
##  statetn   statetx   stateut   statevt   stateva   statewa   statewv   statewi  
##  2.65670   2.61282   2.36165   2.56100   2.23618   1.87424   2.63364   1.77545  
##  statewy  year1983  year1984  year1985  year1986  year1987  year1988  
##  3.30791  -0.07990  -0.07242  -0.12398  -0.03786  -0.05090  -0.05180

## 
## t test of coefficients:
## 
##         Estimate Std. Error t value Pr(>|t|)  
## beertax -0.63998    0.35015 -1.8277  0.06865 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Before discussing the outcomes we convince ourselves that state and year are of the class factor.
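The class check can be done as in this short sketch, which assumes the data have been declared as panel data with pdata.frame() as in the output at the beginning of this section.

```r
library(AER)
library(plm)
data(Fatalities)
Fatalities <- pdata.frame(Fatalities, index = c("state", "year"))

# Columns extracted from a pdata.frame carry the additional class "pseries".
class(Fatalities$state)
class(Fatalities$year)
```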

## [1] "pseries" "factor"

## [1] "pseries" "factor"

The lm() function converts factors into dummies automatically. Since we exclude the intercept by adding -1 to the right-hand side of the regression formula, lm() estimates coefficients for $n+(T-1)=48+6=54$ binary variables (6 year dummies and 48 state dummies). Again, plm() only reports the estimated coefficient on BeerTax.

The estimated regression function is \[ \widehat{FatalityRate} = -0.64 \times BeerTax + StateEffects + TimeFixedEffects. \]

The result $-0.64$ is close to the estimated coefficient of $-0.66$ for the regression model including only entity fixed effects. Unsurprisingly, the coefficient is less precisely estimated, but it is still significantly different from zero at the 10 percent level.

We conclude that the estimated relationship between traffic fatalities and the real beer tax is not affected by omitted variable bias due to factors that are constant either over time or across states.
