# Libraries
library(readxl)
library(plm)
Question one asks about a law banning open alcohol containers in vehicle passenger compartments. Florida passed the law in 1990; Georgia did not. The goal is to find out whether the law affected the number of arrests for driving under the influence of drugs or alcohol.
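\[y_{it} = \beta_0 + \delta_0D_p + \delta_1D_t + \delta_2D_p*D_t + \beta_1x_1 + ... + \beta_kx_k + u_{it} \]
The definitions are: \(y_{it}\) is the number of DUI arrests for unit \(i\) in period \(t\); \(D_p\) is presumably a dummy equal to 1 for the state that passed the law (Florida); \(D_t\) is presumably a dummy equal to 1 for the period after the law; \(x_1, ..., x_k\) are control variables; and \(\delta_2\) measures the effect of the law.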
We may want to control for other factors, because this model seems too simple. One candidate is the time and day of the arrest: on a weekend during spring break there might be more drunk driving arrests in Florida than in Georgia. Another is population.
The econometric method I would use is panel data. There are few time periods \(t\) (1985 and 1990) and many units \(i\): using counties in Florida and Georgia instead of sampling by state gives the DUI arrest rate by county.
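To make the role of \(\delta_2\) concrete, here is a minimal sketch on simulated data. The variable names (`treat`, `post`, `y`) and all the numbers are assumptions made up for illustration; they are not the arrest data. It shows that, in a model with no extra controls, the interaction coefficient equals the difference-in-differences of the four group means.

```r
# Simulated 2x2 difference-in-differences (hypothetical data, not real arrest counts)
set.seed(42)
n     <- 400
treat <- rbinom(n, 1, 0.5)    # 1 = state with the law (plays the role of D_p)
post  <- rbinom(n, 1, 0.5)    # 1 = period after the law (plays the role of D_t)
y     <- 10 + 2 * treat + 1 * post - 1.5 * treat * post + rnorm(n)

# Regression form: the coefficient on treat:post is delta_2
did_fit <- lm(y ~ treat * post)
delta2  <- unname(coef(did_fit)["treat:post"])

# Same number computed from the four cell means
m <- tapply(y, list(treat, post), mean)
delta2_means <- unname((m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"]))

print(c(regression = delta2, cell_means = delta2_means))
```

The two printed values agree because the model with both dummies and their interaction is saturated for the four groups. Once controls \(x_1, ..., x_k\) are added, the regression estimate no longer has to match the raw cell means.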
This question deals with whether an incinerator reduces the value of homes closer to its site.
\[log(price) = \beta_0 + \delta_0y81 + \beta_1log(dist) + \delta_1y81*log(dist) + u\]
If the incinerator reduces the value of homes closer to the site, the sign of \(\delta_1\) is positive: in 1981, price rises more steeply with distance from the incinerator than it did in the base year. \(\beta_1 > 0\) means that distance and price were already positively related in the base year (\(y81 = 0\)), so homes farther from the site were more expensive to begin with.
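The interpretation of the two coefficients follows from the elasticity of price with respect to distance implied by the model, written out here as a short worked step:

\[\frac{\partial log(price)}{\partial log(dist)} = \beta_1 + \delta_1y81\]

With \(y81 = 0\) the elasticity is \(\beta_1\); with \(y81 = 1\) it is \(\beta_1 + \delta_1\). So \(\delta_1\) is the change in the distance elasticity between the two years.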
kielmc = read_xls("kielmc.xls")
price = lm(kielmc$lprice ~ # log(price)
kielmc$y81 + # 1 if year == 1981
kielmc$ldist + # log(dist)
I(kielmc$y81*kielmc$ldist) # interaction: y81 * log(dist)
)
summary(price)
##
## Call:
## lm(formula = kielmc$lprice ~ kielmc$y81 + kielmc$ldist + I(kielmc$y81 *
## kielmc$ldist))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.01930 -0.20807 0.00539 0.21128 1.49747
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.05848 0.50844 15.850 < 2e-16 ***
## kielmc$y81 -0.01133 0.80506 -0.014 0.989
## kielmc$ldist 0.31669 0.05153 6.145 2.39e-09 ***
## I(kielmc$y81 * kielmc$ldist) 0.04819 0.08179 0.589 0.556
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3422 on 317 degrees of freedom
## Multiple R-squared: 0.3958, Adjusted R-squared: 0.3901
## F-statistic: 69.22 on 3 and 317 DF, p-value: < 2.2e-16
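The coefficient \(\delta_1\) on \(y81*log(dist)\) is the change in the elasticity of price with respect to distance between the base year and 1981. The estimate is 0.048, so the elasticity is 0.365 in 1981 versus 0.317 in the base year. The sign is positive, as expected, but the estimate is not statistically significant (t = 0.589, p = 0.556). Because both price and distance are in logs, the coefficients are read as percent changes.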
price2 = lm(kielmc$lprice ~ # log(price)
kielmc$y81 +
kielmc$ldist +
kielmc$y81*kielmc$ldist +
kielmc$age + # age of house
kielmc$agesq + # age^2
kielmc$rooms + # number of rooms in house
kielmc$baths + # number of bathrooms
kielmc$lintst + # log(distance to interstate)
kielmc$lland + # log(land)
kielmc$larea # log(area)
)
summary(price2)
##
## Call:
## lm(formula = kielmc$lprice ~ kielmc$y81 + kielmc$ldist + kielmc$y81 *
## kielmc$ldist + kielmc$age + kielmc$agesq + kielmc$rooms +
## kielmc$baths + kielmc$lintst + kielmc$lland + kielmc$larea)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.19636 -0.09957 0.01064 0.11414 0.78094
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.674e+00 5.016e-01 15.300 < 2e-16 ***
## kielmc$y81 -2.255e-01 4.947e-01 -0.456 0.648882
## kielmc$ldist 9.216e-04 4.462e-02 0.021 0.983534
## kielmc$age -8.007e-03 1.417e-03 -5.650 3.64e-08 ***
## kielmc$agesq 3.570e-05 8.708e-06 4.099 5.29e-05 ***
## kielmc$rooms 4.614e-02 1.734e-02 2.660 0.008216 **
## kielmc$baths 1.010e-01 2.782e-02 3.632 0.000329 ***
## kielmc$lintst -5.998e-02 3.172e-02 -1.891 0.059600 .
## kielmc$lland 9.534e-02 2.473e-02 3.856 0.000140 ***
## kielmc$larea 3.507e-01 5.195e-02 6.752 7.20e-11 ***
## kielmc$y81:kielmc$ldist 6.247e-02 5.028e-02 1.242 0.215015
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2055 on 310 degrees of freedom
## Multiple R-squared: 0.787, Adjusted R-squared: 0.7802
## F-statistic: 114.6 on 10 and 310 DF, p-value: < 2.2e-16
I conclude that, once house characteristics are controlled for, the incinerator has little measurable effect on housing prices: the interaction estimate is 0.062 and is not statistically significant (t = 1.242). Factors with a larger effect on price are the area of the house and the size of the land.
log(dist) is positive and statistically significant in part b but not in part c because in part b we were not controlling for other factors that influence price. Once part c controls for them, the coefficient on log(dist) falls from 0.317 to 0.0009, which suggests that distance in part b was picking up the effect of house characteristics such as the age of the house.
This question deals with whether more generous workers' compensation causes people to stay out of work longer, holding everything else constant.
\[log(durat) = \beta_0 + \beta_1afchnge + \beta_2highearn + \beta_3afchnge * highearn + u\]
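The variable definitions are: \(durat\) is the duration of benefits, \(afchnge\) equals 1 after the change in benefits, and \(highearn\) equals 1 for high earners.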
injury = read_xls("injury.xls")
dura <- lm(injury$ldurat ~ # log(duration of benefits)
injury$afchnge + # =1 if after change in benefits
injury$highearn + # =1 if high earner
I(injury$afchnge*injury$highearn) # interaction: policy effect
)
summary(dura)
##
## Call:
## lm(formula = injury$ldurat ~ injury$afchnge + injury$highearn +
## I(injury$afchnge * injury$highearn))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0128 -0.7214 -0.0171 0.7714 4.0047
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.19934 0.02711 44.241 < 2e-16 ***
## injury$afchnge 0.02364 0.03970 0.595 0.55164
## injury$highearn 0.21520 0.04336 4.963 7.11e-07 ***
## I(injury$afchnge * injury$highearn) 0.18835 0.06279 2.999 0.00271 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.298 on 7146 degrees of freedom
## Multiple R-squared: 0.01584, Adjusted R-squared: 0.01543
## F-statistic: 38.34 on 3 and 7146 DF, p-value: < 2.2e-16
The coefficient that represents the policy effect is \(\beta_3\), and it is statistically significant at the 1% level (p = 0.0027). The estimate is 0.188: after the policy change, the duration of workers' compensation for high earners rose by roughly 19% relative to low earners.
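As a check on this reading, the four group means of \(log(durat)\) can be rebuilt from the coefficients printed above. This is only arithmetic on the reported estimates, not a new estimation:

\[\begin{aligned}
\text{low earners, before} &: 1.19934 \\
\text{low earners, after} &: 1.19934 + 0.02364 = 1.22298 \\
\text{high earners, before} &: 1.19934 + 0.21520 = 1.41454 \\
\text{high earners, after} &: 1.19934 + 0.02364 + 0.21520 + 0.18835 = 1.62653
\end{aligned}\]

\[(1.62653 - 1.41454) - (1.22298 - 1.19934) = 0.21199 - 0.02364 = 0.18835\]

The difference-in-differences of the group means reproduces \(\hat\beta_3\). Since the dependent variable is in logs, the exact percentage change is \(exp(0.18835) - 1 \approx 0.207\), or about 20.7%.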
\[log(durat) = \beta_0 + \beta_2highearn + \beta_3afchnge * highearn + u\]
dura2 <- lm(injury$ldurat ~ # log(duration on workers' compensation)
injury$highearn + # dummy variable for high earners.
I(injury$afchnge*injury$highearn)
)
summary(dura2)
##
## Call:
## lm(formula = injury$ldurat ~ injury$highearn + I(injury$afchnge *
## injury$highearn))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0128 -0.7214 -0.0171 0.7714 3.9937
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.21036 0.01980 61.116 < 2e-16 ***
## injury$highearn 0.20418 0.03921 5.207 1.97e-07 ***
## I(injury$afchnge * injury$highearn) 0.21198 0.04865 4.357 1.33e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.298 on 7147 degrees of freedom
## Multiple R-squared: 0.01579, Adjusted R-squared: 0.01552
## F-statistic: 57.34 on 2 and 7147 DF, p-value: < 2.2e-16
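\(\beta_3\) in part b (0.212) is different from part a (0.188), but not by much, and it is more statistically significant (t = 4.357 versus 2.999). The difference reflects omitted variable bias from dropping \(afchnge\); because the coefficient on \(afchnge\) in part a is small (0.024), the bias is small.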
\[log(durat) = \beta_0 + \beta_1afchnge + \beta_3afchnge * highearn + u\]
dura3 <- lm(injury$ldurat ~ # log(duration on workers' compensation)
injury$afchnge +
I(injury$afchnge*injury$highearn)
)
summary(dura3)
##
## Call:
## lm(formula = injury$ldurat ~ injury$afchnge + I(injury$afchnge *
## injury$highearn))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0128 -0.5903 0.1028 0.7960 3.9810
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.28345 0.02119 60.561 <2e-16 ***
## injury$afchnge -0.06048 0.03596 -1.682 0.0927 .
## I(injury$afchnge * injury$highearn) 0.40355 0.04549 8.870 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.301 on 7147 degrees of freedom
## Multiple R-squared: 0.01245, Adjusted R-squared: 0.01217
## F-statistic: 45.05 on 2 and 7147 DF, p-value: < 2.2e-16
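\(\beta_3\) in part c (0.404) is more than twice the estimate in part a (0.188), and it is more statistically significant (t = 8.870). The reason is omitted variable bias: \(highearn\) has a large positive coefficient in part a (0.215) and is positively correlated with the interaction, so dropping it pushes \(\beta_3\) upward.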
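The size of the bias in the two restricted models can be pinned down exactly. The sketch below uses simulated data (the names `after`, `high`, `y` and all coefficients are assumptions for illustration, not the injury data) to show that dropping a main effect simply folds its coefficient into the interaction term.

```r
# Simulated 2x2 design (hypothetical data, not the injury data set)
set.seed(7)
n     <- 2000
after <- rbinom(n, 1, 0.5)      # plays the role of afchnge
high  <- rbinom(n, 1, 0.5)      # plays the role of highearn
y     <- 1.2 + 0.02 * after + 0.2 * high + 0.2 * after * high + rnorm(n)

full   <- lm(y ~ after + high + I(after * high))   # both main effects, as in part a
no_aft <- lm(y ~ high + I(after * high))           # drops after, as in part b
no_hi  <- lm(y ~ after + I(after * high))          # drops high, as in part c

b    <- unname(coef(full))        # (intercept, after, high, interaction)
b3_b <- unname(coef(no_aft))[3]   # interaction when after is omitted
b3_c <- unname(coef(no_hi))[3]    # interaction when high is omitted

# Each restricted interaction equals the full interaction plus the dropped main effect
print(c(b3_b = b3_b, full_sum_b = b[2] + b[4]))
print(c(b3_c = b3_c, full_sum_c = b[3] + b[4]))
```

The same identity holds in the output above: 0.18835 + 0.02364 = 0.21199 for part b and 0.18835 + 0.21520 = 0.40355 for part c, matching the reported interaction estimates up to rounding. In part b the interaction measures the before and after change for high earners alone, and in part c it measures the gap between high and low earners after the change, so neither is a difference-in-differences estimate.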
This question deals with whether a stronger presence of students affects rental rates.
\[log(rent_{it}) = \beta_0 + \delta_0y90_t + \beta_1log(pop_{it}) + \beta_2log(avginc_{it}) + \beta_3pctstu_{it} + a_i + u_{it}\]
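rental = read_xls("rental.xls") # assumed file name, following the earlier data sets
p.rental = pdata.frame(rental, index = c("city", "year")) # declare the panel structure
pol.p.rental <- plm(formula = p.rental$lrent ~ # log average rent
p.rental$y90 + # =1 if year == 90
p.rental$lpop + # log city population
p.rental$lavginc + # log per capita income
p.rental$pctstu, # student population as a percentage of city population
data = p.rental, model = "pooling" # pooled OLS model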
)
summary(pol.p.rental)
## Pooling Model
##
## Call:
## plm(formula = p.rental$lrent ~ p.rental$y90 + p.rental$lpop +
## p.rental$lavginc + p.rental$pctstu, data = p.rental, model = "pooling")
##
## Balanced Panel: n = 64, T = 2, N = 128
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.242331 -0.078237 -0.016417 0.043890 0.480819
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -0.5688071 0.5348815 -1.0634 0.2897
## p.rental$y90 0.2622267 0.0347633 7.5432 8.782e-12 ***
## p.rental$lpop 0.0406864 0.0225154 1.8070 0.0732 .
## p.rental$lavginc 0.5714460 0.0530981 10.7621 < 2.2e-16 ***
## p.rental$pctstu 0.0050436 0.0010192 4.9486 2.401e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 14.058
## Residual Sum of Squares: 1.9501
## R-Squared: 0.86128
## Adj. R-Squared: 0.85677
## F-statistic: 190.922 on 4 and 123 DF, p-value: < 2.22e-16
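The 1990 dummy variable estimate (0.262) is statistically significant at the 1% level: holding the other factors fixed, rents rose by roughly 26% between the two periods. \(\beta_3\), the coefficient on \(pctstu\), is 0.0050 and is statistically significant at the 1% level: a one percentage point increase in the student share is associated with about 0.5% higher rent.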
fd.p.rental <- plm(formula = p.rental$lrent ~ # log average rent
p.rental$y90 + # =1 if year == 90
p.rental$lpop + # log city population
p.rental$lavginc + # log per capita income
p.rental$pctstu, # percent of population students
data = p.rental, # Where to get the data from
model = "fd" # Request the model - first difference
)
summary(fd.p.rental)
## Oneway (individual) effect First-Difference Model
##
## Call:
## plm(formula = p.rental$lrent ~ p.rental$y90 + p.rental$lpop +
## p.rental$lavginc + p.rental$pctstu, data = p.rental, model = "fd")
##
## Balanced Panel: n = 64, T = 2, N = 128
## Observations used in estimation: 64
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.186972 -0.062161 -0.014384 0.055182 0.237830
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 0.3855215 0.0368245 10.4692 3.661e-15 ***
## p.rental$lpop 0.0722453 0.0883435 0.8178 0.416720
## p.rental$lavginc 0.3099604 0.0664771 4.6627 1.788e-05 ***
## p.rental$pctstu 0.0112033 0.0041319 2.7114 0.008727 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 0.7191
## Residual Sum of Squares: 0.48736
## R-Squared: 0.32226
## Adj. R-Squared: 0.28837
## F-statistic: 9.50991 on 3 and 60 DF, p-value: 3.1362e-05
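According to the results, first differencing roughly doubles the student effect (from 0.0050 to 0.0112) and reduces its statistical significance (the t-statistic falls from 4.95 to 2.71), although it remains significant at the 1% level.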
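The first-difference equation can be read off the model directly. Subtracting the earlier period from the 1990 equation for each city gives the following, a standard derivation written in the notation used above:

\[\Delta log(rent_i) = \delta_0 + \beta_1\Delta log(pop_i) + \beta_2\Delta log(avginc_i) + \beta_3\Delta pctstu_i + \Delta u_i\]

The city effect \(a_i\) drops out because it does not change over time, and \(\Delta y90 = 1\) for every city, so \(\delta_0\) appears as the intercept (0.386 in the output above) rather than as a coefficient on \(y90\).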
fe.p.rental <- plm(formula = rental$lrent ~ # log average rent
p.rental$y90 + # =1 if year == 90
p.rental$lpop + # log city population
p.rental$lavginc + # log per capita income
p.rental$pctstu, # percent of population students
data = p.rental, # Where to get the data from
model = "within", # fixed effects (within) model
index =c("city", "year") # Index the data
)
summary(fe.p.rental)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = rental$lrent ~ p.rental$y90 + p.rental$lpop + p.rental$lavginc +
## p.rental$pctstu, data = p.rental, model = "within", index = c("city",
## "year"))
##
## Balanced Panel: n = 64, T = 2, N = 128
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.1892e-01 -2.9559e-02 -2.7582e-16 2.9559e-02 1.1892e-01
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## p.rental$y90 0.3855215 0.0368245 10.4692 3.661e-15 ***
## p.rental$lpop 0.0722453 0.0883435 0.8178 0.416720
## p.rental$lavginc 0.3099604 0.0664771 4.6627 1.788e-05 ***
## p.rental$pctstu 0.0112033 0.0041319 2.7114 0.008727 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 10.383
## Residual Sum of Squares: 0.24368
## R-Squared: 0.97653
## Adj. R-Squared: 0.95032
## F-statistic: 624.146 on 4 and 60 DF, p-value: < 2.22e-16
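The fixed effects and first-difference models give the same coefficient estimates and standard errors, which is expected with only two time periods. The intercept of the first-difference model (0.386) matches the coefficient on y90 in the fixed effects model.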
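The equivalence can be checked on a small simulated panel. This is a sketch in base R under stated assumptions: the data are made up, and the within transformation is done by hand with `ave()` instead of calling `plm`, so only the coefficient estimates are compared.

```r
# Simulated two-period panel (hypothetical data, not the rental data set)
set.seed(1)
n  <- 50                                  # number of units
id <- rep(seq_len(n), each = 2)           # unit index, two rows per unit
t2 <- rep(c(0, 1), times = n)             # second-period dummy
a  <- rep(rnorm(n), each = 2)             # unobserved unit effect a_i
x  <- rnorm(2 * n) + a                    # regressor correlated with a_i
y  <- 1 + 0.3 * t2 + 0.5 * x + a + rnorm(2 * n, sd = 0.1)

# First differences: period 2 minus period 1, one row per unit
dy <- y[t2 == 1] - y[t2 == 0]
dx <- x[t2 == 1] - x[t2 == 0]
fd <- lm(dy ~ dx)                         # intercept estimates the period effect

# Within (fixed effects) transformation: subtract each unit's mean
demean <- function(v) v - ave(v, id)
fe <- lm(demean(y) ~ 0 + demean(t2) + demean(x))

fd_coef <- unname(coef(fd))               # (period effect, slope on x)
fe_coef <- unname(coef(fe))               # (period effect, slope on x)
print(rbind(first_difference = fd_coef, fixed_effects = fe_coef))
```

With two periods, each demeaned observation is plus or minus half of the corresponding first difference, so both regressions solve the same least squares problem. With three or more periods the two estimators generally differ.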