Q1

Use the data in rental (R package: wooldridge) for this question. The data on rental prices and other variables for college towns are for the years 1980 and 1990. The idea is to see whether a stronger presence of students affects rental rates. The unobserved effects model is

$$\log(rent_{it}) = \beta_0 + \delta_0\, y90_t + \beta_1 \log(pop_{it}) + \beta_2 \log(avginc_{it}) + \beta_3\, pctstu_{it} + a_i + u_{it},$$

where pop is city population, avginc is average income, and pctstu is student population as a percentage of city population (during the school year).

1.1

Estimate the equation by pooled OLS and report the results in standard form. What do you make of the estimate on the 1990 dummy variable? What do you get for $\hat{\beta}_{pctstu}$?

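library(wooldridge)  # provides the rental and driving data sets
library(plm)
data("rental", package = "wooldridge")
#str(rental)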

model <- lrent~y90+lpop+lavginc+pctstu
#model2 <- lrent~y90+lpop+lavginc+pctstu+factor(year) #Alternative model just to check

fit.pl <- plm(model, data=rental, 
              index=c("city", "year"), effect="individual", model="pooling")
summary(fit.pl)
## Pooling Model
## 
## Call:
## plm(formula = model, data = rental, effect = "individual", model = "pooling", 
##     index = c("city", "year"))
## 
## Balanced Panel: n = 64, T = 2, N = 128
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -0.242332 -0.078236 -0.016416  0.043890  0.480817 
## 
## Coefficients:
##               Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept) -0.5688069  0.5348808 -1.0634    0.2897    
## y90          0.2622267  0.0347632  7.5432 8.781e-12 ***
## lpop         0.0406863  0.0225154  1.8070    0.0732 .  
## lavginc      0.5714461  0.0530981 10.7621 < 2.2e-16 ***
## pctstu       0.0050436  0.0010192  4.9486 2.401e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    14.058
## Residual Sum of Squares: 1.9501
## R-Squared:      0.86128
## Adj. R-Squared: 0.85677
## F-statistic: 190.922 on 4 and 123 DF, p-value: < 2.22e-16

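The coefficient on the 1990 dummy is 0.2622 (t = 7.54). Because rent is in logs, this means that rents were about 26 log points higher in 1990 than in 1980, holding population, average income, and the student share fixed. The exact figure is 100*(exp(0.2622) - 1), or about 30%. The dummy captures the rise in rents over the decade that is common to all cities, such as general inflation.

The estimate $\hat{\beta}_{pctstu}$ is 0.0050. A one-percentage-point increase in the student share of the population is associated with roughly 0.5% higher rent. Using the usual standard errors, both pctstu and the year dummy are statistically significant at the 0.1% level.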
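The dependent variable is log(rent), so a coefficient b implies an exact percentage change of 100*(exp(b) - 1). When b is small, this is close to 100*b. The short R sketch below applies the formula to the pooled OLS estimates copied from the output above. The variable names are illustrative only.

```r
# Pooled OLS estimates copied from the summary(fit.pl) output
b_y90 <- 0.2622267
b_pctstu <- 0.0050436

# Exact percentage effects implied by a log-level model: 100 * (exp(b) - 1)
pct_y90 <- 100 * (exp(b_y90) - 1)        # 1990 vs 1980, other factors fixed
pct_pctstu <- 100 * (exp(b_pctstu) - 1)  # one more percentage point of students

round(c(y90 = pct_y90, pctstu = pct_pctstu), 2)
```

For pctstu the exact and approximate (100*b) answers are almost identical. For y90 they differ by about four percentage points.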

1.2

Are the standard errors you report in (1.1) valid? Explain.

library(lmtest)  # coeftest()
# Cluster-robust (by city) standard errors
coeftest(fit.pl, vcovHC(fit.pl, cluster="group", type="HC0"))
## 
## t test of coefficients:
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -0.5688069  1.0153227 -0.5602 0.5763464    
## y90          0.2622267  0.0627318  4.1801 5.484e-05 ***
## lpop         0.0406863  0.0281426  1.4457 0.1507985    
## lavginc      0.5714461  0.1200039  4.7619 5.293e-06 ***
## pctstu       0.0050436  0.0014908  3.3831 0.0009618 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# for more details https://data.library.virginia.edu/understanding-robust-standard-errors/

# Test for heteroscedasticity  
library(lmtest)

#perform Breusch-Pagan Test
bptest(fit.pl)
## 
##  studentized Breusch-Pagan test
## 
## data:  fit.pl
## BP = 5.9512, df = 4, p-value = 0.2028

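No, the usual pooled OLS standard errors are not valid. The composite error $a_i + u_{it}$ contains the city effect $a_i$ in both years. The errors for the same city are therefore positively correlated over time, which violates the no-serial-correlation assumption behind the usual OLS standard errors.

The cluster-robust (by city) standard errors above allow for this correlation, and they are noticeably larger. For example, the standard error for y90 is 0.0627 instead of 0.0348, and for pctstu it is 0.0015 instead of 0.0010. The Breusch-Pagan test (p = 0.20) does not reject homoskedasticity, so the difference comes mainly from serial correlation rather than heteroskedasticity.

We should therefore use the cluster-robust standard errors. With them, y90, lavginc, and pctstu remain significant, while lpop does not.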
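To see why clustering matters here, the sketch below simulates a hypothetical two-period panel. Both the regressor and the composite error $a_i + u_{it}$ are persistent within a city. The sketch then compares the conventional OLS standard error with a cluster-robust one computed by hand.

The design (500 cities, coefficient values, noise levels) is an assumption for illustration only. The hand-rolled sandwich uses the HC0-style cluster formula with no small-sample correction. This should correspond to `vcovHC(..., cluster = "group", type = "HC0")`, but it is not plm's own code.

```r
set.seed(42)
n_city <- 500
n_t <- 2
id <- rep(seq_len(n_city), each = n_t)

a <- rnorm(n_city)[id]                 # city effect a_i, identical in both years
z <- rnorm(n_city)[id]                 # persistent part of the regressor
x <- z + 0.5 * rnorm(n_city * n_t)     # regressor, highly correlated within city
y <- 1 + 0.5 * x + a + rnorm(n_city * n_t, sd = 0.5)  # error term is a_i + u_it

fit <- lm(y ~ x)                       # pooled OLS
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))

# Conventional OLS standard errors (assume independent, homoskedastic errors)
se_naive <- sqrt(diag(bread) * sum(e^2) / (nrow(X) - ncol(X)))

# Cluster-robust (HC0-type) sandwich: sum the scores x_it * e_it within each city
scores <- rowsum(X * e, id)
se_cluster <- sqrt(diag(bread %*% crossprod(scores) %*% bread))

rbind(naive = se_naive, cluster = se_cluster)
```

Because both x and the error are correlated within a city, the conventional formula understates the sampling variability of the slope. The clustered standard error corrects for this, as in the rental regression.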

1.3

Now, difference the equation and estimate by OLS. Compare your estimate of βpctstu with that from (1.1). Does the relative size of the student population appear to affect rental prices?

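# First-differenced model. The 1980 rows have missing differences and are excluded,
# leaving 64 observations. In the 1990 rows y90 = 1, so y90 is collinear with the
# intercept and is dropped; the intercept estimates delta0 (the change in the year effect).
model.d <- clrent~y90+clpop+clavginc+cpctstu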
fit.pl2 <- plm(model.d, data=rental, 
              index=c("city", "year"), effect="individual", model="pooling")
summary(fit.pl2)
## Pooling Model
## 
## Call:
## plm(formula = model.d, data = rental, effect = "individual", 
##     model = "pooling", index = c("city", "year"))
## 
## Balanced Panel: n = 64, T = 1, N = 64
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -0.186972 -0.062160 -0.014383  0.055183  0.237830 
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept) 0.3855214  0.0368245 10.4692 3.661e-15 ***
## clpop       0.0722456  0.0883426  0.8178  0.416714    
## clavginc    0.3099605  0.0664771  4.6627 1.788e-05 ***
## cpctstu     0.0112033  0.0041319  2.7114  0.008726 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    0.7191
## Residual Sum of Squares: 0.48736
## R-Squared:      0.32226
## Adj. R-Squared: 0.28837
## F-statistic: 9.50992 on 3 and 60 DF, p-value: 3.1362e-05

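The coefficient on pctstu more than doubles, from 0.0050 in pooled OLS to 0.0112 in the differenced equation, and it remains statistically significant (t = 2.71, p = 0.009). Holding the other factors fixed, a one-percentage-point increase in the student share raises rent by about 1.1%. So yes, the relative size of the student population does appear to affect rental prices.

It is also interesting that the coefficient on the population change (clpop = 0.0722) is larger than the pooled OLS coefficient on lpop (0.0407). However, it is not statistically significant (p = 0.42).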

1.4

Estimate the model by fixed effects. Do you get identical estimates and standard errors to those in (1.3)? Explain.

fit.fe <- plm(model, data=rental, 
              index=c("city", "year"), effect="individual", model="within")
summary(fit.fe)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = model, data = rental, effect = "individual", model = "within", 
##     index = c("city", "year"))
## 
## Balanced Panel: n = 64, T = 2, N = 128
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -0.118915 -0.029559  0.000000  0.029559  0.118915 
## 
## Coefficients:
##          Estimate Std. Error t-value  Pr(>|t|)    
## y90     0.3855214  0.0368245 10.4692 3.661e-15 ***
## lpop    0.0722456  0.0883426  0.8178  0.416714    
## lavginc 0.3099605  0.0664771  4.6627 1.788e-05 ***
## pctstu  0.0112033  0.0041319  2.7114  0.008726 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    10.383
## Residual Sum of Squares: 0.24368
## R-Squared:      0.97653
## Adj. R-Squared: 0.95032
## F-statistic: 624.146 on 4 and 60 DF, p-value: < 2.22e-16

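Yes, the estimates and standard errors are identical, as they should be. With only two time periods, first differencing and the within (fixed effects) transformation remove $a_i$ in equivalent ways, so the two estimators coincide exactly.

The y90 coefficient in the FE model (0.3855) equals the intercept in the differenced equation. Both models also have 60 residual degrees of freedom: 128 - 64 - 4 = 60 for FE and 64 - 4 = 60 for FD. As a result, the standard errors match as well.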
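The equivalence is easy to check numerically. The sketch below simulates a hypothetical two-period panel; the sample size and parameter values are arbitrary assumptions. It then:

- estimates the first-differenced equation with an intercept;
- estimates the FE model by demeaning every variable within unit, including the period-2 dummy.

The slope estimates agree to machine precision, and the FD intercept equals the FE coefficient on the period dummy. A plain `lm()` on demeaned data does not subtract the n unit means from the degrees of freedom. For that reason only the point estimates, not these standard errors, match the FD ones; plm makes that correction.

```r
set.seed(1)
n <- 100
a <- rnorm(n)                                   # unobserved unit effect a_i
x1 <- 0.5 * a + rnorm(n)                        # regressor in period 1, correlated with a_i
x2 <- 0.5 * a + rnorm(n)                        # regressor in period 2
y1 <- 1.0 + 0.8 * x1 + a + rnorm(n, sd = 0.3)
y2 <- 1.0 + 0.3 + 0.8 * x2 + a + rnorm(n, sd = 0.3)  # period-2 shift delta0 = 0.3

# First differences: a_i drops out, the intercept estimates delta0
fd <- lm(I(y2 - y1) ~ I(x2 - x1))

# Fixed effects via the within transformation on the stacked (long) data
id <- rep(seq_len(n), times = 2)
d2 <- rep(c(0, 1), each = n)                    # period-2 dummy
y <- c(y1, y2)
x <- c(x1, x2)
demean <- function(v) v - ave(v, id)            # subtract each unit's own mean
fe <- lm(demean(y) ~ demean(d2) + demean(x) - 1)

rbind(fd = coef(fd), fe = coef(fe))
```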

Q2

Use the data in driving (R package: wooldridge) for this question. This data includes state-level panel data (for the 48 continental U.S. states) from 1980 through 2004, for a total of 25 years. Various driving laws are indicated in the data set, including the alcohol level at which drivers are considered legally intoxicated. There are also indicators for “per se” laws (where licenses can be revoked without a trial) and seat belt laws. Some economic and demographic variables are also included.

2.1

drive.80 <- subset(driving$totfatrte, driving$d80 == 1)
drive.92 <- subset(driving$totfatrte, driving$d92 == 1)
drive.04 <- subset(driving$totfatrte, driving$d04 == 1)
mean(drive.80)
## [1] 25.49458
mean(drive.92)
## [1] 17.15792
mean(drive.04)
## [1] 16.72896
#summary(driving$bac08)

model <- (totfatrte~factor(year))

fit2.pl <- plm(model, data=driving, 
              index=c("state", "year"), effect="individual", model="pooling")
summary(fit2.pl)
## Pooling Model
## 
## Call:
## plm(formula = model, data = driving, effect = "individual", model = "pooling", 
##     index = c("state", "year"))
## 
## Balanced Panel: n = 48, T = 25, N = 1200
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -12.93021  -4.34682  -0.73052   3.74875  29.64979 
## 
## Coefficients:
##                  Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept)      25.49458    0.86712 29.4015 < 2.2e-16 ***
## factor(year)1981 -1.82438    1.22629 -1.4877 0.1370936    
## factor(year)1982 -4.55208    1.22629 -3.7121 0.0002152 ***
## factor(year)1983 -5.34167    1.22629 -4.3560 1.440e-05 ***
## factor(year)1984 -5.22708    1.22629 -4.2625 2.183e-05 ***
## factor(year)1985 -5.64313    1.22629 -4.6018 4.644e-06 ***
## factor(year)1986 -4.69417    1.22629 -3.8279 0.0001360 ***
## factor(year)1987 -4.71979    1.22629 -3.8488 0.0001251 ***
## factor(year)1988 -4.60292    1.22629 -3.7535 0.0001829 ***
## factor(year)1989 -5.72229    1.22629 -4.6663 3.418e-06 ***
## factor(year)1990 -5.98938    1.22629 -4.8841 1.182e-06 ***
## factor(year)1991 -7.39979    1.22629 -6.0343 2.137e-09 ***
## factor(year)1992 -8.33667    1.22629 -6.7983 1.681e-11 ***
## factor(year)1993 -8.36688    1.22629 -6.8229 1.425e-11 ***
## factor(year)1994 -8.33938    1.22629 -6.8005 1.656e-11 ***
## factor(year)1995 -7.82604    1.22629 -6.3819 2.512e-10 ***
## factor(year)1996 -8.12521    1.22629 -6.6258 5.246e-11 ***
## factor(year)1997 -7.88396    1.22629 -6.4291 1.863e-10 ***
## factor(year)1998 -8.22917    1.22629 -6.7106 3.007e-11 ***
## factor(year)1999 -8.24417    1.22629 -6.7228 2.774e-11 ***
## factor(year)2000 -8.66896    1.22629 -7.0692 2.666e-12 ***
## factor(year)2001 -8.70188    1.22629 -7.0961 2.214e-12 ***
## factor(year)2002 -8.46500    1.22629 -6.9029 8.316e-12 ***
## factor(year)2003 -8.73104    1.22629 -7.1199 1.877e-12 ***
## factor(year)2004 -8.76563    1.22629 -7.1481 1.542e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    48612
## Residual Sum of Squares: 42407
## R-Squared:      0.12765
## Adj. R-Squared: 0.10983
## F-statistic: 7.16387 on 24 and 1175 DF, p-value: < 2.22e-16

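totfatrte: total fatalities per 100,000 population

mean(drive.80) = 25.49458
mean(drive.92) = 17.15792
mean(drive.04) = 16.72896

In the regression, 1980 is the base year. The intercept (25.49) therefore equals the 1980 average, and each year coefficient is the difference between that year's average and the 1980 average. For example, 25.49458 - 8.33667 = 17.15791, which is the 1992 mean up to rounding.

To see whether driving became safer, we check whether the year coefficients are negative and growing in magnitude. They are. Fatality rates fall substantially through 1992, with a brief pause in the mid-1980s, and after 1992 the changes are small, so the rate is roughly stable from 1992 to 2004.

The cluster-robust standard errors, shown below, are considerably smaller for the year dummies: between about 0.42 and 0.82 instead of 1.23. The year effects are therefore estimated even more precisely.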
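With only year dummies as regressors, pooled OLS reproduces the yearly averages: the intercept is the 1980 mean, and the intercept plus a year coefficient is that year's mean. The short R check below uses the printed numbers (variable names are illustrative). It also expresses the 1980 to 2004 decline in percent.

```r
# Coefficients copied from the pooled OLS output with year dummies only
intercept <- 25.49458   # 1980 average (base year)
b_1992 <- -8.33667
b_2004 <- -8.76563

mean_1992 <- intercept + b_1992   # reproduces mean(drive.92) up to rounding
mean_2004 <- intercept + b_2004   # reproduces mean(drive.04) up to rounding
pct_change_80_04 <- 100 * (mean_2004 / intercept - 1)

round(c(mean_1992 = mean_1992, mean_2004 = mean_2004, pct_change = pct_change_80_04), 2)
```

The fatality rate in 2004 is roughly a third lower than in 1980.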

coeftest(fit2.pl, vcovHC(fit2.pl, cluster="group", type="HC0"))
## 
## t test of coefficients:
## 
##                  Estimate Std. Error  t value  Pr(>|t|)    
## (Intercept)      25.49458    1.14518  22.2626 < 2.2e-16 ***
## factor(year)1981 -1.82438    0.41927  -4.3513 1.471e-05 ***
## factor(year)1982 -4.55208    0.44091 -10.3244 < 2.2e-16 ***
## factor(year)1983 -5.34167    0.49913 -10.7020 < 2.2e-16 ***
## factor(year)1984 -5.22708    0.58189  -8.9829 < 2.2e-16 ***
## factor(year)1985 -5.64313    0.61371  -9.1951 < 2.2e-16 ***
## factor(year)1986 -4.69417    0.71110  -6.6013 6.155e-11 ***
## factor(year)1987 -4.71979    0.77174  -6.1158 1.306e-09 ***
## factor(year)1988 -4.60292    0.72186  -6.3765 2.599e-10 ***
## factor(year)1989 -5.72229    0.73854  -7.7481 2.007e-14 ***
## factor(year)1990 -5.98938    0.71881  -8.3323 < 2.2e-16 ***
## factor(year)1991 -7.39979    0.74791  -9.8940 < 2.2e-16 ***
## factor(year)1992 -8.33667    0.77725 -10.7259 < 2.2e-16 ***
## factor(year)1993 -8.36688    0.81950 -10.2097 < 2.2e-16 ***
## factor(year)1994 -8.33938    0.75037 -11.1136 < 2.2e-16 ***
## factor(year)1995 -7.82604    0.72411 -10.8078 < 2.2e-16 ***
## factor(year)1996 -8.12521    0.73837 -11.0042 < 2.2e-16 ***
## factor(year)1997 -7.88396    0.76859 -10.2577 < 2.2e-16 ***
## factor(year)1998 -8.22917    0.77045 -10.6810 < 2.2e-16 ***
## factor(year)1999 -8.24417    0.76633 -10.7579 < 2.2e-16 ***
## factor(year)2000 -8.66896    0.81170 -10.6800 < 2.2e-16 ***
## factor(year)2001 -8.70188    0.72123 -12.0654 < 2.2e-16 ***
## factor(year)2002 -8.46500    0.74538 -11.3567 < 2.2e-16 ***
## factor(year)2003 -8.73104    0.76602 -11.3980 < 2.2e-16 ***
## factor(year)2004 -8.76563    0.77733 -11.2766 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.2

Add the variables bac08, bac10, perse, sbprim, sbsecon, sl70plus, gdl, perc14_24, unem, and vehicmilespc to the regression from (2.1). Interpret the coefficients on bac08 and bac10. Do per se laws have a negative effect on the fatality rate? What about having a primary seat belt law? (Note that if a law was enacted sometime within a year, the fraction of the year is recorded in place of the zero-one indicator.)

model <- (totfatrte~factor(year)+bac08+ bac10+ perse+ sbprim+ sbsecon+ sl70plus+ gdl+ perc14_24+ unem+ vehicmilespc)
fit3.pl <- plm(model, data=driving, 
              index=c("state", "year"), effect="individual", model="pooling")
summary(fit3.pl)
## Pooling Model
## 
## Call:
## plm(formula = model, data = driving, effect = "individual", model = "pooling", 
##     index = c("state", "year"))
## 
## Balanced Panel: n = 48, T = 25, N = 1200
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -14.91602  -2.73839  -0.27779   2.28591  21.42027 
## 
## Coefficients:
##                     Estimate  Std. Error  t-value  Pr(>|t|)    
## (Intercept)      -2.7161e+00  2.4758e+00  -1.0970 0.2728472    
## factor(year)1981 -2.1755e+00  8.2761e-01  -2.6286 0.0086859 ** 
## factor(year)1982 -6.5960e+00  8.5340e-01  -7.7290 2.330e-14 ***
## factor(year)1983 -7.3967e+00  8.6902e-01  -8.5115 < 2.2e-16 ***
## factor(year)1984 -5.8504e+00  8.7634e-01  -6.6760 3.792e-11 ***
## factor(year)1985 -6.4833e+00  8.9480e-01  -7.2455 7.820e-13 ***
## factor(year)1986 -5.8528e+00  9.3067e-01  -6.2888 4.516e-10 ***
## factor(year)1987 -6.3674e+00  9.6696e-01  -6.5850 6.869e-11 ***
## factor(year)1988 -6.5916e+00  1.0137e+00  -6.5024 1.170e-10 ***
## factor(year)1989 -8.0710e+00  1.0526e+00  -7.6675 3.684e-14 ***
## factor(year)1990 -8.9587e+00  1.0770e+00  -8.3185 2.463e-16 ***
## factor(year)1991 -1.1069e+01  1.1012e+00 -10.0517 < 2.2e-16 ***
## factor(year)1992 -1.2878e+01  1.1225e+00 -11.4728 < 2.2e-16 ***
## factor(year)1993 -1.2731e+01  1.1363e+00 -11.2038 < 2.2e-16 ***
## factor(year)1994 -1.2365e+01  1.1572e+00 -10.6849 < 2.2e-16 ***
## factor(year)1995 -1.1953e+01  1.1836e+00 -10.0985 < 2.2e-16 ***
## factor(year)1996 -1.3876e+01  1.2233e+00 -11.3430 < 2.2e-16 ***
## factor(year)1997 -1.4258e+01  1.2498e+00 -11.4085 < 2.2e-16 ***
## factor(year)1998 -1.5042e+01  1.2655e+00 -11.8861 < 2.2e-16 ***
## factor(year)1999 -1.5091e+01  1.2843e+00 -11.7499 < 2.2e-16 ***
## factor(year)2000 -1.5444e+01  1.3053e+00 -11.8314 < 2.2e-16 ***
## factor(year)2001 -1.6184e+01  1.3340e+00 -12.1314 < 2.2e-16 ***
## factor(year)2002 -1.6724e+01  1.3480e+00 -12.4065 < 2.2e-16 ***
## factor(year)2003 -1.7021e+01  1.3595e+00 -12.5206 < 2.2e-16 ***
## factor(year)2004 -1.6711e+01  1.3870e+00 -12.0488 < 2.2e-16 ***
## bac08            -2.4985e+00  5.3751e-01  -4.6483 3.729e-06 ***
## bac10            -1.4176e+00  3.9633e-01  -3.5768 0.0003622 ***
## perse            -6.2011e-01  2.9820e-01  -2.0795 0.0377907 *  
## sbprim           -7.5335e-02  4.9078e-01  -0.1535 0.8780318    
## sbsecon           6.7280e-02  4.2930e-01   0.1567 0.8754918    
## sl70plus          3.3479e+00  4.4517e-01   7.5205 1.086e-13 ***
## gdl              -4.2691e-01  5.2691e-01  -0.8102 0.4179781    
## perc14_24         1.4159e-01  1.2268e-01   1.1542 0.2486752    
## unem              7.5705e-01  7.7906e-02   9.7176 < 2.2e-16 ***
## vehicmilespc      2.9254e-03  9.4968e-05  30.8042 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    48612
## Residual Sum of Squares: 19067
## R-Squared:      0.60778
## Adj. R-Squared: 0.59633
## F-statistic: 53.0957 on 34 and 1165 DF, p-value: < 2.22e-16

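bac08 = -2.4985
bac10 = -1.4176

bac08 and bac10 indicate a blood alcohol limit of 0.08 or 0.10, respectively. Each coefficient is therefore measured relative to state-years with neither limit. Holding the other factors fixed:

- a 0.08 law is associated with about 2.50 fewer deaths per 100,000 people;
- a 0.10 law is associated with about 1.42 fewer deaths per 100,000 people.

Both are significant at the 0.1% level. As expected, the stricter (lower) limit has the larger effect: less alcohol allowed in the blood, fewer deaths.

perse = -0.6201

Per se laws do have a negative effect. They are associated with about 0.62 fewer deaths per 100,000 people, significant at the 5% level (p = 0.038).

sbprim = -0.0753
sbsecon = 0.0673

For seat belts, the primary seat belt law has a small negative coefficient (a reduction in deaths) and the secondary law a small positive one. Neither is close to statistically significant (p ≈ 0.88 for both), so pooled OLS gives no evidence that a primary seat belt law reduces fatalities.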
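Both bac08 and bac10 are measured against state-years with neither limit. The implied effect of moving from a 0.10 to a 0.08 limit is therefore the difference of the two coefficients. Also, as the note in the question says, a law in force for only part of a year enters at its fractional value.

Below is a small illustrative calculation with the pooled OLS estimates. The half-year scenario is a hypothetical example.

```r
# Pooled OLS estimates from (2.2)
b_bac08 <- -2.4985
b_bac10 <- -1.4176

# A state switching from a 0.10 to a 0.08 limit (bac10: 1 -> 0, bac08: 0 -> 1)
switch_10_to_08 <- b_bac08 - b_bac10

# Hypothetical: a 0.08 law in force for half the year is recorded as bac08 = 0.5
half_year_bac08 <- 0.5 * b_bac08

c(switch_10_to_08 = switch_10_to_08, half_year_bac08 = half_year_bac08)
```

Lowering the limit from 0.10 to 0.08 is associated with roughly one fewer death per 100,000 people.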

2.3

Reestimate the model from (2.2) using fixed effects (at the state level). How do the coefficients on bac08, bac10, perse, and sbprim compare with the pooled OLS estimates? Which set of estimates do you think is more reliable? Explain.

model <- (totfatrte~factor(year)+bac08+ bac10+ perse+ sbprim+ sbsecon+ sl70plus+ gdl+ perc14_24+ unem+ vehicmilespc)

fit3.fe <- plm(model, data=driving, 
              index=c("state", "year"), effect="individual", model="within")
summary(fit3.fe)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = model, data = driving, effect = "individual", model = "within", 
##     index = c("state", "year"))
## 
## Balanced Panel: n = 48, T = 25, N = 1200
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -8.4273592 -1.0258600 -0.0029547  0.9572345 14.8109310 
## 
## Coefficients:
##                     Estimate  Std. Error  t-value  Pr(>|t|)    
## factor(year)1981 -1.51107133  0.41321486  -3.6569 0.0002672 ***
## factor(year)1982 -3.02549578  0.44243119  -6.8383 1.316e-11 ***
## factor(year)1983 -3.50360069  0.45657705  -7.6736 3.628e-14 ***
## factor(year)1984 -4.25936110  0.46494255  -9.1610 < 2.2e-16 ***
## factor(year)1985 -4.72679311  0.48547032  -9.7365 < 2.2e-16 ***
## factor(year)1986 -3.66118539  0.51769787  -7.0721 2.686e-12 ***
## factor(year)1987 -4.30578838  0.55532856  -7.7536 2.001e-14 ***
## factor(year)1988 -4.76712131  0.60155650  -7.9246 5.501e-15 ***
## factor(year)1989 -6.12997263  0.64019069  -9.5752 < 2.2e-16 ***
## factor(year)1990 -6.22973766  0.66485076  -9.3701 < 2.2e-16 ***
## factor(year)1991 -6.91714040  0.68195432 -10.1431 < 2.2e-16 ***
## factor(year)1992 -7.77417239  0.70288580 -11.0604 < 2.2e-16 ***
## factor(year)1993 -8.09410864  0.71594741 -11.3055 < 2.2e-16 ***
## factor(year)1994 -8.50421668  0.73410866 -11.5844 < 2.2e-16 ***
## factor(year)1995 -8.25540198  0.75623634 -10.9164 < 2.2e-16 ***
## factor(year)1996 -8.60661913  0.79594975 -10.8130 < 2.2e-16 ***
## factor(year)1997 -8.70781739  0.81975686 -10.6224 < 2.2e-16 ***
## factor(year)1998 -9.34924025  0.83373487 -11.2137 < 2.2e-16 ***
## factor(year)1999 -9.47489124  0.84399083 -11.2263 < 2.2e-16 ***
## factor(year)2000 -9.99185979  0.85606370 -11.6719 < 2.2e-16 ***
## factor(year)2001 -9.63121721  0.87255395 -11.0380 < 2.2e-16 ***
## factor(year)2002 -8.90673015  0.88205263 -10.0977 < 2.2e-16 ***
## factor(year)2003 -8.93650263  0.88994687 -10.0416 < 2.2e-16 ***
## factor(year)2004 -9.33936116  0.91107045 -10.2510 < 2.2e-16 ***
## bac08            -1.43722116  0.39421213  -3.6458 0.0002788 ***
## bac10            -1.06266776  0.26883763  -3.9528 8.208e-05 ***
## perse            -1.15161719  0.23398721  -4.9217 9.867e-07 ***
## sbprim           -1.22739974  0.34271485  -3.5814 0.0003564 ***
## sbsecon          -0.34970784  0.25217091  -1.3868 0.1657826    
## sl70plus         -0.06253283  0.26931063  -0.2322 0.8164283    
## gdl              -0.41177619  0.29257391  -1.4074 0.1595790    
## perc14_24         0.18712169  0.09509969   1.9676 0.0493567 *  
## unem             -0.57183997  0.06057851  -9.4397 < 2.2e-16 ***
## vehicmilespc      0.00094005  0.00011104   8.4656 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    12134
## Residual Sum of Squares: 4535.3
## R-Squared:      0.62624
## Adj. R-Squared: 0.59916
## F-statistic: 55.0943 on 34 and 1118 DF, p-value: < 2.22e-16

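The coefficients do not all become smaller. Compared with pooled OLS:

- bac08 (-2.50 to -1.44) and bac10 (-1.42 to -1.06) shrink in magnitude;
- perse (-0.62 to -1.15) and sbprim (-0.08 to -1.23) become larger in magnitude;
- sbprim goes from insignificant (p = 0.88) to significant at the 0.1% level;
- unem even flips sign (0.76 to -0.57).

I would rely on the FE estimates. States differ in persistent, unobserved ways, for example driving culture, road conditions, or attitudes toward drinking and driving, and these can be correlated with when the laws were adopted. FE removes such time-constant state effects, whereas pooled OLS attributes part of them to the policy variables.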
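The reason FE is more reliable can be illustrated with a simulated panel. The design, including how strongly the "policy" variable is correlated with the state effect, is an assumption for illustration.

When a regressor is correlated with the unobserved state effect $a_i$, pooled OLS absorbs part of $a_i$ into the slope. The within transformation removes $a_i$ and recovers the true effect (0.5 here).

```r
set.seed(7)
n_state <- 1000
n_t <- 5
id <- rep(seq_len(n_state), each = n_t)

a <- rnorm(n_state)[id]                          # unobserved state effect a_i
x <- 0.8 * a + rnorm(n_state * n_t)              # "policy" correlated with a_i
y <- 1 + 0.5 * x + 2 * a + rnorm(n_state * n_t)  # true effect of x is 0.5

# Pooled OLS leaves a_i in the error term, so the slope is biased
b_pooled <- coef(lm(y ~ x))[["x"]]

# Within (time-demeaning) transformation removes a_i
demean <- function(v) v - ave(v, id)
b_fe <- coef(lm(demean(y) ~ demean(x) - 1))[["demean(x)"]]

round(c(pooled = b_pooled, fe = b_fe), 3)
```

In this design the pooled estimate converges to roughly 1.48 rather than 0.5, while the FE estimate stays close to the truth.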

2.4

Suppose that vehicmilespc, the number of miles driven per capita, increases by 1,000. Using the FE estimates, what is the estimated effect on totfatrte? Be sure to interpret the estimate as if explaining to a layperson.

Answer

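The coefficient is 0.00094005. If the number of miles driven per person in a state goes up by 1,000, we would expect about 1,000 × 0.00094005 ≈ 0.94 more traffic deaths for every 100,000 residents. This holds the driving laws, unemployment, and the other factors fixed. In other words, more driving means more exposure to crashes and therefore more deaths.

I am assuming that vehicmilespc is measured in miles per person, like the other per-capita variables, because the data set's help page does not explain vehicmilespc.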
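The arithmetic can be put into perspective in two ways. First, compare the effect with the 2004 average fatality rate. Second, attach a rough 95% interval using the cluster-robust standard error reported in (2.5).

The interval below uses the normal approximation (±1.96 standard errors). This is our simplification, not a number reported by plm.

```r
b_miles <- 0.00094005     # FE coefficient on vehicmilespc
se_miles <- 0.00033858    # cluster-robust standard error from (2.5)
delta_miles <- 1000       # increase in miles driven per person

effect <- b_miles * delta_miles                                  # extra deaths per 100,000
ci_95 <- (b_miles + c(-1, 1) * 1.96 * se_miles) * delta_miles    # approximate 95% interval
share_of_2004_mean <- effect / 16.72896                          # relative to mean(drive.04)

round(c(effect = effect, lower = ci_95[1], upper = ci_95[2], share = share_of_2004_mean), 3)
```

An extra 1,000 miles per person corresponds to roughly a 5 to 6 percent higher fatality rate relative to the 2004 average. The interval is wide, from about 0.28 to 1.60 deaths per 100,000.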

2.5

If there is serial correlation or heteroskedasticity in the idiosyncratic errors of the model, are the standard errors in (2.3) valid? If we use robust standard errors for the fixed effects estimates, what happens to the statistical significance of the policy variables in (2.3)?

coeftest(fit3.fe, vcovHC(fit3.fe, cluster="group", type="HC0"))
## 
## t test of coefficients:
## 
##                     Estimate  Std. Error  t value  Pr(>|t|)    
## factor(year)1981 -1.51107133  0.43628851  -3.4635 0.0005534 ***
## factor(year)1982 -3.02549578  0.48661275  -6.2175 7.123e-10 ***
## factor(year)1983 -3.50360069  0.50722346  -6.9074 8.267e-12 ***
## factor(year)1984 -4.25936110  0.43860375  -9.7112 < 2.2e-16 ***
## factor(year)1985 -4.72679311  0.45737157 -10.3347 < 2.2e-16 ***
## factor(year)1986 -3.66118539  0.58416846  -6.2673 5.234e-10 ***
## factor(year)1987 -4.30578838  0.67027727  -6.4239 1.961e-10 ***
## factor(year)1988 -4.76712131  0.75197856  -6.3394 3.339e-10 ***
## factor(year)1989 -6.12997263  0.85707933  -7.1522 1.541e-12 ***
## factor(year)1990 -6.22973766  0.90514332  -6.8826 9.773e-12 ***
## factor(year)1991 -6.91714040  0.97365427  -7.1043 2.149e-12 ***
## factor(year)1992 -7.77417239  1.06203082  -7.3201 4.725e-13 ***
## factor(year)1993 -8.09410864  1.08994968  -7.4261 2.212e-13 ***
## factor(year)1994 -8.50421668  1.07290343  -7.9264 5.430e-15 ***
## factor(year)1995 -8.25540198  1.15959440  -7.1192 1.938e-12 ***
## factor(year)1996 -8.60661913  1.14818594  -7.4958 1.337e-13 ***
## factor(year)1997 -8.70781739  1.19826853  -7.2670 6.884e-13 ***
## factor(year)1998 -9.34924025  1.21240453  -7.7113 2.742e-14 ***
## factor(year)1999 -9.47489124  1.33589783  -7.0925 2.331e-12 ***
## factor(year)2000 -9.99185979  1.31554509  -7.5952 6.469e-14 ***
## factor(year)2001 -9.63121721  1.43580699  -6.7079 3.130e-11 ***
## factor(year)2002 -8.90673015  1.44602568  -6.1595 1.017e-09 ***
## factor(year)2003 -8.93650263  1.48817832  -6.0050 2.584e-09 ***
## factor(year)2004 -9.33936116  1.62460726  -5.7487 1.159e-08 ***
## bac08            -1.43722116  0.80266842  -1.7906 0.0736354 .  
## bac10            -1.06266776  0.47955910  -2.2159 0.0268973 *  
## perse            -1.15161719  0.43310154  -2.6590 0.0079493 ** 
## sbprim           -1.22739974  0.54500980  -2.2521 0.0245113 *  
## sbsecon          -0.34970784  0.36181365  -0.9665 0.3339824    
## sl70plus         -0.06253283  0.55279337  -0.1131 0.9099545    
## gdl              -0.41177619  0.37386299  -1.1014 0.2709556    
## perc14_24         0.18712169  0.16965909   1.1029 0.2702960    
## unem             -0.57183997  0.11744736  -4.8689 1.283e-06 ***
## vehicmilespc      0.00094005  0.00033858   2.7765 0.0055868 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

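If the idiosyncratic errors are serially correlated or heteroskedastic, the usual FE standard errors in (2.3) are not valid. Clustering by state makes the standard errors robust to both problems at once.

With the cluster-robust standard errors, the point estimates are unchanged. However, the standard errors of the policy variables rise by roughly 60 to 100 percent, and their significance drops:

- bac08 becomes only marginally significant (p = 0.074);
- bac10 (p = 0.027), perse (p = 0.008), and sbprim (p = 0.025) remain significant at the 5% level.

This affects inference, not the FE point estimates, which are still preferable to pooled OLS.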
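The drop in significance follows directly from the larger standard error. The t statistic is the estimate divided by its standard error. The two-sided p-value comes from a t distribution with the model's residual degrees of freedom (1118 here). The sketch below reproduces the bac08 numbers from both tables.

```r
est <- -1.43722116      # bac08 FE estimate
se_conv <- 0.39421213   # conventional FE standard error, summary(fit3.fe)
se_clust <- 0.80266842  # cluster-robust standard error, coeftest(... vcovHC ...)
df_resid <- 1118        # residual degrees of freedom reported for fit3.fe

t_conv <- est / se_conv
t_clust <- est / se_clust
p_conv <- 2 * pt(-abs(t_conv), df = df_resid)
p_clust <- 2 * pt(-abs(t_clust), df = df_resid)

round(c(t_conv = t_conv, p_conv = p_conv, t_clust = t_clust, p_clust = p_clust), 4)
```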

Q3

Suppose that, for one semester, you can collect the following data on a random sample of college juniors and seniors for each class taken: a standardized final exam score, percentage of lectures attended, a dummy variable indicating whether the class is within the student’s major, cumulative grade point average prior to the start of the semester, and SAT score.

3.1

Write a model that explains final exam performance in terms of attendance and the other characteristics. Use s to subscript student and c to subscript class. Which variables do not change within a student?

Answer

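final: standardized final exam score
attendance: percentage of lectures attended
major: dummy variable indicating whether the class is within the student’s major
sumgrade: cumulative grade point average prior to the start of the semester
sat: SAT score

$$final_{sc} = \beta_0 + \beta_1 attendance_{sc} + \beta_2 major_{sc} + \beta_3 sumgrade_s + \beta_4 sat_s + a_s + u_{sc},$$

where $a_s$ is an unobserved student effect (such as ability or motivation) and $u_{sc}$ is the idiosyncratic error.

Prior GPA (sumgrade) and SAT score do not change within a student. Both are measured before the semester, so they take the same value for every class the student takes. Attendance and the major dummy vary across a student's classes.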

3.2

If you pool all of the data and use OLS, what are you assuming about unobserved student characteristics that affect performance and attendance rate? What roles do SAT score and prior GPA play in this regard?

Answer

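Pooling the data and using OLS assumes that the unobserved student characteristics that affect performance ($a_s$, e.g. ability and motivation) are uncorrelated with attendance and the other regressors. It also assumes that the coefficients are the same for all students. If more able or motivated students both attend more and score higher, pooled OLS overstates the effect of attendance.

SAT score and prior GPA act as proxies for student ability. Including them controls for part of $a_s$, which makes the no-correlation assumption more plausible. They also serve as the baseline against which the effect of attendance on the final exam score is measured.

Even so, $a_s$ appears in the error for every class a student takes. The errors are therefore correlated across classes for the same student, and the standard errors should be clustered by student.

SAT and prior GPA are likely correlated because both partly measure ability. This can make their individual coefficients imprecise, but it does not bias the attendance coefficient.

If the proxies do not fully capture $a_s$, the pooled OLS estimate of $\beta_1$ will probably be biased.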

3.3

If you think SAT score and prior GPA do not adequately capture student ability, how would you estimate the effect of attendance on final exam performance?

Answer

First we need to know why SAT and GPA are inadequate. If they are measured with error or are otherwise not exogenous, we could look for an instrumental variable. If they simply do not explain enough, we could add other controls, such as class participation or previous academic records.

The best approach, however, uses the structure the data already have. Each student takes several classes, so the data form a panel indexed by student and class. Including student fixed effects (or differencing across a student's classes) removes $a_s$ entirely, so $\beta_1$ is identified from within-student variation in attendance across classes.

SAT and prior GPA drop out because they do not vary within a student. This approach still requires attendance to be uncorrelated with the class-specific error $u_{sc}$.
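Below is a sketch of the student fixed effects approach on simulated data; all names and parameter values are hypothetical. Ability raises both attendance and exam scores. SAT and prior GPA are noisy proxies for ability. The true attendance effect is 0.02.

- Pooled OLS with the proxies is still biased upward.
- Adding student dummies recovers the attendance effect.
- `lm()` reports NA for sat and gpa, because they are collinear with the student dummies.

```r
set.seed(123)
n_students <- 300
n_classes <- 4
sid <- rep(seq_len(n_students), each = n_classes)   # student index for each class row
student <- factor(sid)
n_obs <- n_students * n_classes

ability <- rnorm(n_students)[sid]                               # unobserved a_s
sat <- 1000 + 100 * ability + rnorm(n_students, sd = 50)[sid]   # fixed within student
gpa <- 3 + 0.3 * ability + rnorm(n_students, sd = 0.2)[sid]     # fixed within student
attendance <- 70 + 10 * ability + rnorm(n_obs, sd = 10)         # varies by class
major <- rbinom(n_obs, 1, 0.5)
final <- 0.02 * attendance + 0.3 * major + ability + rnorm(n_obs, sd = 0.3)

# Pooled OLS: relies on sat and gpa to proxy for ability
pooled <- lm(final ~ attendance + major + sat + gpa)

# Student fixed effects: dummies listed first, so the student-level
# sat and gpa are the aliased (NA) terms rather than some dummies
fe <- lm(final ~ student + attendance + major + sat + gpa)

c(pooled = coef(pooled)[["attendance"]], fe = coef(fe)[["attendance"]])
coef(fe)[c("sat", "gpa")]
```

In practice one would also cluster the standard errors by student, as discussed in 3.2.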

Q4

Use the data in DemocracyIncome (R package: pder) for this question. This panel data includes 5-year observations of 211 countries from 1950 to 2000. In a widely cited and very influential article, Acemoglu et al. (2008) revisit the relationship between income per capita and democracy. Previous investigations had found a strong positive statistical association between income and democracy across countries. This was typically interpreted as evidence in favor of Lipset’s (1959) “Modernization Hypothesis.” Acemoglu et al. (2008) estimate the linear relationship between income and democracy by exploiting within-country variation over time and find that the positive association between income and democracy disappears in data for the postwar period, 1960 to 2000, once country and time fixed effects are explicitly accounted for. Their rather precise point estimate of zero is robust to extensive robustness checks.

library(pder)
data(DemocracyIncome)
str(DemocracyIncome)
## 'data.frame':    2321 obs. of  5 variables:
##  $ country  : Factor w/ 211 levels "Afghanistan",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ year     : Factor w/ 11 levels "1950-1954","1955-1959",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ democracy: num  NA NA NA NA 0.5 NA NA NA NA 1 ...
##  $ income   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sample   : int  0 0 1 1 1 1 1 1 1 1 ...
demo <- subset(DemocracyIncome, sample == 1)
#working
# Pooled Column 1
fit1.pl <- plm(democracy ~ lag(democracy)+lag(income)+factor(year), data=DemocracyIncome, 
              index=c("country", "year"), effect="individual", model="pooling",subset = sample == 1)
coef1 <- coeftest(fit1.pl, vcovHC(fit1.pl, cluster="group", type="HC0"))
coef1 <- coef1[3]/(1-coef1[2]) #LogGDP_per_capita_t-1 / (1-democracy_t-1)

#Fixed Effects column 2

fit2.fe <- plm(democracy ~ lag(democracy) + lag(income)+ factor(year) - 1,DemocracyIncome, index = c("country", "year"),
           model = "within", effect = "twoways", subset = sample == 1)
coef2 <- coeftest(fit2.fe, vcovHC(fit2.fe, cluster="group", type="HC0"))
coef2 <- coef2[2]/(1-coef2[1]) #LogGDP_per_capita_t-1 / (1-democracy_t-1)

# Anderson-Hsiao IV Column 3
fit3.IV <- plm(diff(democracy) ~ lag(diff(democracy)) + lag(diff(income)) + factor(year) - 1 | lag(democracy, 2) + 
                lag(income, 2) + year - 1, DemocracyIncome, index = c("country", "year"), model = "pooling", 
              subset = sample == 1)
coef3 <- coeftest(fit3.IV, vcovHC(fit3.IV, cluster="group", type="HC0"))
coef3 <- coef3[2]/(1-coef3[1]) #LogGDP_per_capita_t-1 / (1-democracy_t-1)

# Arellano-Bond difference GMM, Column 4
diff1 <- pgmm(democracy ~ lag(democracy) + lag(income) |
lag(democracy, 2:8)| lag(income, 2),
DemocracyIncome, index=c("country", "year"),collapse = FALSE,
model="onestep", effect="twoways", subset = sample == 1,transformation = "d")

summary(diff1, robust = TRUE, time.dummies = FALSE)
## Twoways effects One-step model Difference GMM 
## 
## Call:
## pgmm(formula = democracy ~ lag(democracy) + lag(income) | lag(democracy, 
##     2:8) | lag(income, 2), data = DemocracyIncome, subset = sample == 
##     1, effect = "twoways", model = "onestep", collapse = FALSE, 
##     transformation = "d", index = c("country", "year"))
## 
## Balanced Panel: n = 211, T = 11, N = 2321
## 
## Number of Observations Used: 838
## Residuals:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -1.236110  0.000000  0.000000  0.000527  0.000000  0.996764 
## 
## Coefficients:
##                 Estimate Std. Error z-value  Pr(>|z|)    
## lag(democracy)  0.483741   0.092883  5.2081 1.908e-07 ***
## lag(income)    -0.135445   0.099077 -1.3671    0.1716    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sargan test: chisq(41) = 71.12547 (p-value = 0.0024325)
## Autocorrelation test (1): normal = -6.020494 (p-value = 1.7389e-09)
## Autocorrelation test (2): normal = 0.7520725 (p-value = 0.45201)
## Wald test for coefficients: chisq(2) = 34.24209 (p-value = 3.668e-08)
## Wald test for time dummies: chisq(9) = 46.33509 (p-value = 5.2181e-07)
coef4 <- coeftest(diff1, vcovHC(diff1, cluster="group", type="HC0"))
coef4 <- coef4[2]/(1-coef4[1]) #LogGDP_per_capita_t-1 / (1-democracy_t-1)
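The lines computing coef1 to coef4 divide the income coefficient by one minus the coefficient on lagged democracy. This ratio is the long-run effect. In the dynamic model $democracy_{it} = \rho\, democracy_{i,t-1} + \gamma\, income_{i,t-1} + \dots$, setting $democracy_t = democracy_{t-1}$ in the steady state shows that a permanent change in income eventually shifts democracy by $\gamma/(1-\rho)$. This requires $|\rho| < 1$.

Below is a small sketch with a hypothetical helper. It is applied to the OLS estimates (column 1, as rounded in the table) and to the difference GMM estimates printed above.

```r
# Hypothetical helper: long-run effect of income on democracy, gamma / (1 - rho)
long_run_effect <- function(b_income, b_lag_dem) {
  stopifnot(abs(b_lag_dem) < 1)   # the steady state exists only if |rho| < 1
  b_income / (1 - b_lag_dem)
}

lr_ols <- long_run_effect(0.072, 0.706)           # column 1, rounded by stargazer
lr_gmm <- long_run_effect(-0.135445, 0.483741)    # column 4, summary(diff1)

round(c(ols = lr_ols, gmm = lr_gmm), 3)
```

The pooled OLS long-run effect is positive, while the GMM one is negative and, given its short-run standard error, not significant. This is in line with the finding that the income-democracy association disappears once country effects are accounted for.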


# Fixed Effect Column 5
fit5.fe <- plm(democracy ~ lag(income)+factor(year), data=DemocracyIncome, 
              index=c("country", "year"), effect="individual", model="within",subset = sample == 1)
coef5 <- coeftest(fit5.fe, vcovHC(fit5.fe, cluster="group", type="HC0"))
coef5
## 
## t test of coefficients:
## 
##                          Estimate  Std. Error t value  Pr(>|t|)    
## lag(income)            0.05358777  0.04209102  1.2731  0.203339    
## factor(year)1965-1969  0.00023471  0.02075179  0.0113  0.990979    
## factor(year)1970-1974 -0.12680764  0.03377169 -3.7549  0.000186 ***
## factor(year)1975-1979 -0.14772639  0.03671786 -4.0233 6.284e-05 ***
## factor(year)1980-1984 -0.09782203  0.03525431 -2.7748  0.005653 ** 
## factor(year)1985-1989 -0.08710245  0.03879196 -2.2454  0.025017 *  
## factor(year)1990-1994 -0.04212164  0.03501979 -1.2028  0.229412    
## factor(year)1995-1999  0.00956458  0.04226699  0.2263  0.821034    
## factor(year)2000-2004  0.03236363  0.04285654  0.7552  0.450374    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(stargazer)
stargazer(fit1.pl, fit2.fe, fit3.IV, diff1, fit5.fe,
          column.labels = c("OLS", "Within", "IV", "Arellano-Bond", "FE-OLS"),
          column.separate = c(1,1,1,1,1),
          dep.var.labels = "Democracyincome",
          omit.stat = c("adj.rsq", "f"),
          title = "Estimation for Dynamic Panel Models",
          label = "tab:dynamic",
          no.space = TRUE, type="text"
)
## 
## Estimation for Dynamic Panel Models
## ================================================================================
##                                          Dependent variable:                    
##                       ----------------------------------------------------------
##                        Democracyincome   diff(democracy)        democracy       
##                             panel             panel          panel       panel  
##                             linear           linear           GMM       linear  
##                          OLS     Within        IV        Arellano-Bond  FE-OLS  
##                          (1)      (2)          (3)            (4)         (5)   
## --------------------------------------------------------------------------------
## lag(democracy)        0.706***  0.379***                   0.484***             
##                        (0.024)  (0.033)                     (0.093)             
## lag(income)           0.072***   0.010                      -0.135      0.054*  
##                        (0.008)  (0.026)                     (0.099)     (0.028) 
## lag(diff(democracy))                        0.469***                            
##                                              (0.118)                            
## lag(diff(income))                            -0.104                             
##                                              (0.305)                            
## factor(year)1960-1964                         0.062                             
##                                              (0.061)                            
## factor(year)1965-1969 -0.083**               -0.027                     0.0002  
##                        (0.035)               (0.054)                    (0.035) 
## factor(year)1970-1974 -0.199***              -0.090*                   -0.127***
##                        (0.034)               (0.052)                    (0.035) 
## factor(year)1975-1979 -0.112***               0.059                    -0.148***
##                        (0.034)               (0.047)                    (0.036) 
## factor(year)1980-1984  -0.050                0.090**                   -0.098***
##                        (0.033)               (0.040)                    (0.037) 
## factor(year)1985-1989 -0.073**                0.003                    -0.087** 
##                        (0.033)               (0.040)                    (0.038) 
## factor(year)1990-1994  -0.053                 0.035                     -0.042  
##                        (0.033)               (0.026)                    (0.038) 
## factor(year)1995-1999  -0.032                 0.044                      0.010  
##                        (0.033)               (0.033)                    (0.039) 
## factor(year)2000-2004  -0.056*                0.003                      0.032  
##                        (0.032)               (0.028)                    (0.040) 
## Constant              -0.347***                                                 
##                        (0.064)                                                  
## --------------------------------------------------------------------------------
## Observations             945      945          838            211         958   
## R2                      0.725    0.144        0.005                      0.118  
## ================================================================================
## Note:                                                *p<0.1; **p<0.05; ***p<0.01
stargazer(coef1, coef2, coef3, coef4,
          column.labels = c("OLS", "Within", "IV", "Arellano-Bond"),
          dep.var.labels = "Democracyincome",
          omit.stat = c("adj.rsq", "f"),
          title = "Implied Cumulative effect of income",
          label = "tab:cumulative",
          no.space = TRUE, type="text"
)
## 
## Implied Cumulative effect of income
## =====
## 0.246
## -----
## 
## Implied Cumulative effect of income
## =====
## 0.017
## -----
## 
## Implied Cumulative effect of income
## ======
## -0.195
## ------
## 
## Implied Cumulative effect of income
## ======
## -0.262
## ------
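The implied cumulative effect printed in the four small tables above is γ/(1 − α), where α is the coefficient on lagged democracy and γ the coefficient on lagged income. This is the same ratio the notebook computes later as `coef4[2]/(1-coef4[1])`. The minimal sketch below puts all four in one vector. It uses the rounded estimates copied from the dynamic panel table, so results can differ from the printed values in the third decimal. The delta-method helper `cumulative_se()` is a hypothetical addition for illustration, not part of the original analysis.

```r
# Rounded coefficients copied from the dynamic panel table above:
# alpha = coefficient on lagged democracy, gamma = coefficient on lagged income
alpha <- c(OLS = 0.706, Within = 0.379, IV = 0.469, AB = 0.484)
gamma <- c(OLS = 0.072, Within = 0.010, IV = -0.104, AB = -0.135)

# Implied cumulative (long-run) effect of income
cumulative_effect <- function(alpha, gamma) gamma / (1 - alpha)

# Hypothetical helper: delta-method standard error of gamma / (1 - alpha).
# b = c(alpha, gamma); V = 2 x 2 covariance matrix of (alpha, gamma)
cumulative_se <- function(b, V) {
  a <- b[[1]]
  g <- b[[2]]
  grad <- c(g / (1 - a)^2, 1 / (1 - a))   # partial derivatives wrt alpha and gamma
  sqrt(drop(t(grad) %*% V %*% grad))
}

cum <- cumulative_effect(alpha, gamma)
print(round(cum, 3))
```

With a fitted plm model `fit` (hypothetical name), one could pass `coef(fit)[c("lag(democracy)", "lag(income)")]` together with the matching 2 x 2 block of a clustered covariance matrix such as `vcovHC(fit, cluster = "group")`. This assumes the coefficient names match those shown in the printed output.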

Answer

I reported all the columns. Every column matches the original exactly except column 4 (Arellano-Bond), which I could only reproduce approximately; some of its coefficients match exactly (N = 838). Further investigation is included at the end of this notebook.

4.2

In Equation (1) of the original paper, why do the authors include the lagged value of the dependent variable, d_{i,t-1}? How should the coefficient γ be interpreted?

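Answer

They include it to capture the persistence of democracy: previous democracy helps explain current democracy. In the authors' words, the lag is included "to capture persistence in democracy and also potentially mean-reverting dynamics (i.e., the tendency of the democracy score to return to some equilibrium value for the country)".

The coefficient γ on lagged (log) income per capita is meant to capture the causal effect of income on democracy. Because the lagged dependent variable is in the model, γ is a one-period effect; the implied cumulative effect of income is γ divided by one minus the coefficient on d_{i,t-1}, which is what the implied cumulative effect tables above report.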
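A worked version of this interpretation, assuming Equation (1) of the paper has the form below (α on lagged democracy, γ on lagged log income per capita y, x a vector of other lagged covariates, μ_t period effects, δ_i country effects). Setting democracy to its steady state shows why the cumulative effect is γ/(1 − α), the ratio computed in the notebook as `coef4[2]/(1-coef4[1])`.

```latex
d_{it} = \alpha\, d_{i,t-1} + \gamma\, y_{i,t-1} + x_{i,t-1}'\beta + \mu_t + \delta_i + u_{it}

% Steady state: set d_{it} = d_{i,t-1} = d^*, hold y, x, \mu_t, \delta_i fixed,
% and collect x'\beta + \mu_t + \delta_i into c_i
d^* = \alpha\, d^* + \gamma\, y + c_i
\quad\Longrightarrow\quad
d^* = \frac{\gamma}{1-\alpha}\, y + \frac{c_i}{1-\alpha},
\qquad
\frac{\partial d^*}{\partial y} = \frac{\gamma}{1-\alpha} \quad (|\alpha| < 1)
```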

4.3

Although the fixed effects estimation in (4.1) is useful in removing the influence of long-run determinants of both democracy and income, it does not necessarily estimate the causal effect of income on democracy. Discuss possible sources of endogeneity bias in the fixed effect estimation. What econometric method can be used to correct the endogeneity bias?

Answer

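The relationship between income and democracy is complex, and many factors could affect both. Country fixed effects remove time-invariant country characteristics and time effects remove shocks common to all countries, but any omitted factor that varies over time within a country and affects both income and democracy still biases the estimate. Reverse causality (democracy affecting income) is another possible source of bias. Moreover, because the within transformation wipes out anything that does not change over time, the effect of time-invariant variables cannot be estimated with FE.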
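A toy illustration of the last point, using hypothetical data (5 countries, 4 periods, and made-up regressors `x_varying` and `z_fixed`, none of them from the paper): demeaning within country turns a time-invariant variable into zeros, and once country dummies are in the model `lm()` cannot estimate its coefficient and reports `NA`.

```r
# Hypothetical toy panel: 5 countries observed over 4 periods
set.seed(1)
n_country <- 5
n_period  <- 4
panel <- data.frame(
  country = rep(seq_len(n_country), each = n_period),
  period  = rep(seq_len(n_period), times = n_country)
)
panel$x_varying <- rnorm(nrow(panel))                      # changes over time
panel$z_fixed   <- rep(rnorm(n_country), each = n_period)  # constant within a country

# Within transformation: subtract each country's own mean
demean <- function(v, g) v - ave(v, g)
panel$x_dm <- demean(panel$x_varying, panel$country)
panel$z_dm <- demean(panel$z_fixed, panel$country)
range(panel$z_dm)   # zero up to rounding: z_fixed has no within variation

# Same point with dummies: z_fixed is collinear with the country effects,
# so lm() drops it and reports NA for its coefficient
panel$y <- 1 + 0.5 * panel$x_varying + 2 * panel$z_fixed +
  rnorm(nrow(panel), sd = 0.1)
fe_fit <- lm(y ~ x_varying + factor(country) + z_fixed, data = panel)
coef(fe_fit)["z_fixed"]
```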

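Among the factors the authors discuss are the colonization process, historical population density, and divergent development paths. In addition, FE does not deal with Nickell bias: with a lagged dependent variable, the within estimator is biased when T is fixed, and the bias does not vanish as N grows. This is the case here, with many countries (n = 211) but few five-year periods (T = 11).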

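Instrumental-variable methods can be used to correct this bias, for example the IV estimator in column 3 (first differences, instrumenting the lagged change in democracy) and the Arellano-Bond GMM estimator in column 4.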
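A minimal simulation sketch of Nickell bias and the IV fix, not using the paper's data: the data-generating process d_it = α d_{i,t-1} + δ_i + u_it (no income regressor, to keep it short), α = 0.5, N = 5000, and T of 5 or 50 are all hypothetical choices. The within estimator of α is pulled toward zero when T is small and the bias shrinks as T grows. The Anderson-Hsiao estimator first-differences δ_i away and instruments the lagged difference with the level d_{i,t-2}, which is the idea behind the IV column.

```r
# Hypothetical simulation: d_it = alpha * d_i,t-1 + delta_i + u_it
set.seed(123)

simulate_panel <- function(N, n_t, alpha, burn = 50) {
  delta <- rnorm(N)                         # unit fixed effects
  d <- matrix(0, N, n_t + burn)
  for (t in 2:(n_t + burn)) {
    d[, t] <- alpha * d[, t - 1] + delta + rnorm(N)
  }
  d[, (burn + 1):(burn + n_t), drop = FALSE]  # keep the last n_t periods
}

# Within (fixed-effects) estimator of alpha: demean each unit, then OLS
within_ar1 <- function(d) {
  y  <- d[, -1, drop = FALSE]               # d_it
  yl <- d[, -ncol(d), drop = FALSE]         # d_i,t-1
  y_dm  <- y  - rowMeans(y)
  yl_dm <- yl - rowMeans(yl)
  sum(y_dm * yl_dm) / sum(yl_dm^2)
}

# Anderson-Hsiao IV: first-difference away delta_i, then instrument the
# lagged difference with the level d_i,t-2 (just identified)
anderson_hsiao <- function(d) {
  n_t <- ncol(d)
  dy     <- d[, 3:n_t] - d[, 2:(n_t - 1)]        # first difference of d_it
  dy_lag <- d[, 2:(n_t - 1)] - d[, 1:(n_t - 2)]  # first difference of d_i,t-1
  z      <- d[, 1:(n_t - 2)]                     # instrument: d_i,t-2
  sum(z * dy) / sum(z * dy_lag)
}

alpha <- 0.5
d5  <- simulate_panel(N = 5000, n_t = 5,  alpha = alpha)
d50 <- simulate_panel(N = 5000, n_t = 50, alpha = alpha)

round(c(FE_T5 = within_ar1(d5), FE_T50 = within_ar1(d50),
        AH_T5 = anderson_hsiao(d5)), 3)
```

The just-identified IV keeps the sketch short; Arellano-Bond GMM extends the same idea by using all available deeper lags as instruments.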

4.4

Based on the results of Acemoglu et al. (2008), what do you think of the relationship between income and democracy?

Answer

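The direct interpretation is that the effect of income is significant under some estimators but not others. Income is significant in pooled OLS (column 1) but not once country fixed effects are included (columns 2 to 4); the additional FE-OLS specification (column 5) is significant only at the 10% level. Interestingly, in columns 3 and 4 the lagged democracy term is highly significant, meaning previous democracy helps explain current democracy, whereas income does not.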
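To make the significance comparison explicit, the sketch below recomputes approximate z-statistics and two-sided p-values for the income coefficient from the rounded estimates and standard errors in the dynamic panel table. It uses a normal approximation, so the p-values are approximate, and the column names are shorthand introduced here.

```r
# Income coefficient and standard error by column, copied (rounded) from the table
est <- c(OLS = 0.072, Within = 0.010, IV = -0.104, AB = -0.135, FE_OLS = 0.054)
se  <- c(OLS = 0.008, Within = 0.026, IV = 0.305,  AB = 0.099,  FE_OLS = 0.028)

z <- est / se
p <- 2 * pnorm(-abs(z))   # two-sided p-value under a normal approximation
round(cbind(estimate = est, se = se, z = z, p = p), 3)
```

Only pooled OLS clears conventional significance levels; FE-OLS falls between 0.05 and 0.1, matching the single star in the table.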

I believe they did a very good job of explaining democracy in terms of income. However, as they note, many other factors could lead to misleading results because of endogeneity.

The study is interesting, but I believe additional data could be used to strengthen the results.

Further investigation

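For GMM I also tried system GMM, using the one-step estimator because the two-step estimator is very sensitive in small samples. With system GMM, however, we get a significant effect of income on democracy for both one-step and two-step estimation. That may be due to misspecification; the Sargan test in the output below rejects the overidentifying restrictions (p ≈ 0.002).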

# System GMM: transformation = "ld" uses both the differenced and the level equations
diff1 <- pgmm(democracy ~ lag(democracy) + lag(income) |
                lag(democracy, 2:8) | lag(income, 2),
              data = DemocracyIncome, index = c("country", "year"),
              collapse = FALSE, model = "onestep", effect = "twoways",
              subset = sample == 1, transformation = "ld")

summary(diff1, robust = TRUE, time.dummies = FALSE)
## Twoways effects One-step model System GMM 
## 
## Call:
## pgmm(formula = democracy ~ lag(democracy) + lag(income) | lag(democracy, 
##     2:8) | lag(income, 2), data = DemocracyIncome, subset = sample == 
##     1, effect = "twoways", model = "onestep", collapse = FALSE, 
##     transformation = "ld", index = c("country", "year"))
## 
## Balanced Panel: n = 211, T = 11, N = 2321
## 
## Number of Observations Used: 1826
## Residuals:
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -1.2949814  0.0000000  0.0000000  0.0005865  0.0000000  1.1016058 
## 
## Coefficients:
##                Estimate Std. Error z-value  Pr(>|z|)    
## lag(democracy) 0.575544   0.062178  9.2565 < 2.2e-16 ***
## lag(income)    0.121765   0.017648  6.8998 5.206e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sargan test: chisq(51) = 84.87646 (p-value = 0.0020301)
## Autocorrelation test (1): normal = -4.670188 (p-value = 3.0092e-06)
## Autocorrelation test (2): normal = 0.9663345 (p-value = 0.33388)
## Wald test for coefficients: chisq(2) = 969.9313 (p-value = < 2.22e-16)
## Wald test for time dummies: chisq(9) = 51.37138 (p-value = 5.9421e-08)
coef4 <- coeftest(diff1, vcovHC(diff1, cluster="group", type="HC0"))
coef4
## 
## z test of coefficients:
## 
##                 Estimate Std. Error z value  Pr(>|z|)    
## lag(democracy)  0.575544   0.062178  9.2565 < 2.2e-16 ***
## lag(income)     0.121765   0.017648  6.8998 5.206e-12 ***
## (Intercept)    -0.693048   0.113025 -6.1318 8.687e-10 ***
## 1960-1964       0.038134   0.032363  1.1783  0.238672    
## 1965-1969      -0.025873   0.025459 -1.0163  0.309497    
## 1970-1974      -0.156614   0.036649 -4.2734 1.926e-05 ***
## 1975-1979      -0.107179   0.033973 -3.1549  0.001606 ** 
## 1980-1984      -0.045852   0.033125 -1.3842  0.166292    
## 1985-1989      -0.065611   0.037082 -1.7694  0.076832 .  
## 1990-1994      -0.043940   0.031814 -1.3812  0.167225    
## 1995-1999      -0.022388   0.036363 -0.6157  0.538107    
## 2000-2004      -0.042280   0.030635 -1.3802  0.167540    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coef4 <- coef4[2]/(1-coef4[1]) #LogGDP_per_capita_t-1 / (1-democracy_t-1)
coef4
## [1] 0.2868743

Additional work

According to more recent work (Acemoglu et al., 2016), democracy does cause growth. This study finds a relationship complementary to the one studied here, running from democracy to income.

For future work (not yet working)

Additional analysis

Maximum Likelihood for Cross-lagged Panel Models with Fixed Effects

For model specification, DSEM in Mplus; for estimation without writing out all the specifications by hand, xtdpdml in Stata.

# cig_ivreg3 <- ivreg(log(packs) ~ log(rprice) + log(rincome) | 
#                       log(rincome) + salestax + cigtax, data = c1995)
# 
# coeftest(cig_ivreg3, vcov = vcovHC, type = "HC1")
# 
# library(sem)
# library(AER)             # load package; to run IV regression; also contains data
# library(stargazer)        # load package; to put regression results into a single stargazer table
# library(estimatr)
# 
# #https://www.analyticsvidhya.com/blog/2018/07/introductory-guide-maximum-likelihood-estimation-case-study-r/
# 
# nll <- function(theta0,theta1) {
#     x <- DemocracyIncome$income[-idx]
#     y <- DemocracyIncome$democracy[-idx]
#     mu = exp(theta0 + x*theta1)
#     -sum(y*(log(mu)) - mu)
# }
# 
# set.seed(200)
# idx <- createDataPartition(DemocracyIncome$democracy, p=0.25,list=FALSE)
# 
# 
# ## IV
# 
# IV <- iv_robust(diff(democracy) ~ lag(diff(democracy)) + lag(diff(income))   | lag(diff(democracy)), data = DemocracyIncome, diagnostics = TRUE)
# 
# fit2.fe <- ivreg(democracy ~ lag(democracy) + lag(income)+ factor(year) ,DemocracyIncome, index = c("country", "year"),
#             subset = sample == 1)
# coef2 <- coeftest(fit2.fe, vcovHC(fit2.fe, cluster="group", type="HC0"))
# coef2
# coef2 <- coef2[2]/(1-coef2[1]) #LogGDP_per_capita_t-1 / (1-democracy_t-1)
# install.packages("bbmle")
# library(bbmle)
# 
# # Examine estimates
# MLE_par <- MLE_estimates$par
# MLE_SE <- sqrt(diag(solve(MLE_estimates$hessian)))
# MLE <- data.table(param = c("beta", "sigma_2"),
#                   estimates = MLE_par,
#                   sd = MLE_SE)
# 
# kable(MLE)
# 
# #Hausman-Taylor
# 
# fit4.ht <- pht(democracy ~ lag(democracy)+lag(income)+ year| lag(income, 2) + year - 1,
# data= DemocracyIncome, index = c("country", "year"), model = c("ht", "am", "bms"), subset = sample == 1) 
# 
# #effect="individual", model="ht")
# summary(fit4.ht)
# 
# coeftest(fit4.ht, vcovHC(fit4.ht, cluster="group", type="HC0"))
# 
# model = c("ht", "am", "bms")
# 
# ###
# fit <- plm(democracy ~ lag(democracy)+lag(income)+factor(year)-1, data= DemocracyIncome, index = c("country", "year"), random.method ="ht", model = "random", inst.method = "am")
# 
# summary(fit)
# 
# coeftest(fit, vcovHC(fit, cluster="group", type="HC0"))