C1 Use the data in KIELMC, only for the year 1981, to answer the following questions. The data are for houses that sold during 1981 in North Andover, Massachusetts; 1981 was the year construction began on a local garbage incinerator.
library(wooldridge)
data<-wooldridge::kielmc
head(data,10)
## year age agesq nbh cbd intst lintst price rooms area land baths dist
## 1 1978 48 2304 4 3000 1000 6.9078 60000 7 1660 4578 1 10700
## 2 1978 83 6889 4 4000 1000 6.9078 40000 6 2612 8370 2 11000
## 3 1978 58 3364 4 4000 1000 6.9078 34000 6 1144 5000 1 11500
## 4 1978 11 121 4 4000 1000 6.9078 63900 5 1136 10000 1 11900
## 5 1978 48 2304 4 4000 2000 7.6009 44000 5 1868 10000 1 12100
## 6 1978 78 6084 4 3000 2000 7.6009 46000 6 1780 9500 3 10000
## 7 1978 22 484 4 4000 2000 7.6009 56000 6 1700 10878 2 11700
## 8 1978 78 6084 4 3000 2000 7.6009 38500 6 1556 3870 2 10200
## 9 1978 42 1764 4 3000 2000 7.6009 60500 8 1642 7000 2 10500
## 10 1978 41 1681 4 3000 2000 7.6009 55000 5 1443 7950 2 11000
## ldist wind lprice y81 larea lland y81ldist lintstsq nearinc
## 1 9.277999 3 11.00210 0 7.414573 8.429017 0 47.71770 1
## 2 9.305651 3 10.59663 0 7.867871 9.032409 0 47.71770 1
## 3 9.350102 3 10.43412 0 7.042286 8.517193 0 47.71770 1
## 4 9.384294 3 11.06507 0 7.035269 9.210340 0 47.71770 1
## 5 9.400961 3 10.69195 0 7.532624 9.210340 0 57.77368 1
## 6 9.210340 3 10.73640 0 7.484369 9.159047 0 57.77368 1
## 7 9.367344 3 10.93311 0 7.438384 9.294497 0 57.77368 1
## 8 9.230143 3 10.55841 0 7.349874 8.261010 0 57.77368 1
## 9 9.259131 3 11.01040 0 7.403670 8.853665 0 57.77368 1
## 10 9.305651 3 10.91509 0 7.274479 8.980927 0 57.77368 1
## y81nrinc rprice lrprice
## 1 0 60000 11.00210
## 2 0 40000 10.59663
## 3 0 34000 10.43412
## 4 0 63900 11.06507
## 5 0 44000 10.69195
## 6 0 46000 10.73640
## 7 0 56000 10.93311
## 8 0 38500 10.55841
## 9 0 60500 11.01040
## 10 0 55000 10.91509
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
y1981 <-data %>% filter(year==1981)
head(y1981,10)
## year age agesq nbh cbd intst lintst price rooms area land baths dist
## 1 1981 81 6561 4 4000 1000 6.9078 49000 6 1554 6790 1 11800
## 2 1981 71 5041 4 3000 2000 7.6009 52000 5 1575 3485 1 10100
## 3 1981 31 961 4 3000 2000 7.6009 68000 6 3304 18731 2 10200
## 4 1981 41 1681 4 3000 2000 7.6009 54000 6 1700 7500 1 11200
## 5 1981 31 961 4 4000 2000 7.6009 70000 6 1454 5500 2 11800
## 6 1981 81 6561 4 3000 2000 7.6009 47000 6 1410 4500 1 10200
## 7 1981 19 361 0 5000 4000 8.2940 73900 6 1592 13068 1 13200
## 8 1981 1 1 0 9000 9000 9.1050 117500 9 1910 15246 3 18200
## 9 1981 41 1681 0 10000 10000 9.2103 60000 5 735 10454 1 19000
## 10 1981 1 1 0 16000 15000 9.6158 85000 6 1154 43560 2 23000
## ldist wind lprice y81 larea lland y81ldist lintstsq nearinc
## 1 9.375855 4 10.79958 1 7.348588 8.823206 9.375855 47.71770 1
## 2 9.220291 3 10.85900 1 7.362010 8.156223 9.220291 57.77368 1
## 3 9.230143 3 11.12726 1 8.102889 9.837935 9.230143 57.77368 1
## 4 9.323669 3 10.89674 1 7.438384 8.922658 9.323669 57.77368 1
## 5 9.375855 3 11.15625 1 7.282073 8.612503 9.375855 57.77368 1
## 6 9.230143 3 10.75790 1 7.251345 8.411833 9.230143 57.77368 1
## 7 9.487972 3 11.21047 1 7.372746 9.477921 9.487972 68.79043 1
## 8 9.809177 5 11.67419 1 7.554859 9.632072 9.809177 82.90102 0
## 9 9.852194 5 11.00210 1 6.599871 9.254740 9.852194 84.82964 0
## 10 10.043250 5 11.35041 1 7.050990 10.681894 10.043250 92.46361 0
## y81nrinc rprice lrprice
## 1 1 37634.41 10.53567
## 2 1 39938.55 10.59510
## 3 1 52227.34 10.86336
## 4 1 41474.66 10.63284
## 5 1 53763.44 10.89235
## 6 1 36098.31 10.49400
## 7 1 56758.83 10.94657
## 8 0 90245.77 11.41029
## 9 0 46082.95 10.73820
## 10 0 65284.18 11.08650
# Summary statistics for the 1981 subsample. Columns are taken from the data
# frame directly; mean() and sd() cannot work on a quoted name such as "price".
mean(y1981$price)
mean(y1981$dist)
sd(y1981$dist)
model<-lm(log(price)~log(dist), data= y1981)
summary(model)
##
## Call:
## lm(formula = log(price) ~ log(dist), data = y1981)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.87318 -0.22657 -0.01985 0.25687 0.95045
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.04716 0.64624 12.452 < 2e-16 ***
## log(dist) 0.36488 0.06576 5.548 1.39e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3543 on 140 degrees of freedom
## Multiple R-squared: 0.1803, Adjusted R-squared: 0.1744
## F-statistic: 30.79 on 1 and 140 DF, p-value: 1.395e-07
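Because both sides of this regression are in logs, the slope on log(dist) can be read as an elasticity. The sketch below takes the reported estimate (0.36488) and compares the usual approximation with the exact change implied by the model for a 10% increase in distance. The 10% figure and the variable names are illustrative choices, not part of the exercise.

```r
# Slope on log(dist), copied from the simple regression output.
b1 <- 0.36488

# Illustrative change in distance: 10% farther from the incinerator.
pct_change_dist <- 10

# Approximation: % change in price is roughly b1 times % change in dist.
approx_pct <- b1 * pct_change_dist

# Exact change implied by the log-log model: (1.10^b1 - 1) * 100.
exact_pct <- ((1 + pct_change_dist / 100)^b1 - 1) * 100

cat(sprintf("approximate: %.2f%%, exact: %.2f%%\n", approx_pct, exact_pct))
```

The two numbers are close for small changes and drift apart as the percentage change in distance grows.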
ANSWER: The estimated equation is log(price) = 8.047 + 0.365 log(dist), with n = 142 and R-squared = 0.180. Because both variables are in logs, the slope is an elasticity: a 1% increase in distance from the incinerator is associated with about a 0.365% higher sale price. The estimate is statistically significant (t = 5.55), but a simple regression does not hold any other house characteristics fixed.
(ii) To the simple regression model in part (i), add the variables log(intst), log(area), log(land), rooms, baths, and age, where intst is distance from the home to the interstate, area is square footage of the house, land is the lot size in square feet, rooms is total number of rooms, baths is number of bathrooms, and age is age of the house in years. Now, what do you conclude about the effects of the incinerator? Explain why (i) and (ii) give conflicting results.
model2 <- lm(log(price)~log(dist)+log(intst)+log(area)+log(land)+rooms+baths+age, data= y1981)
summary(model2)
##
## Call:
## lm(formula = log(price) ~ log(dist) + log(intst) + log(area) +
## log(land) + rooms + baths + age, data = y1981)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.74072 -0.10669 0.00932 0.11817 0.61387
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.592332 0.641711 11.831 < 2e-16 ***
## log(dist) 0.055389 0.057621 0.961 0.338153
## log(intst) -0.039032 0.051662 -0.756 0.451261
## log(area) 0.319294 0.076418 4.178 5.27e-05 ***
## log(land) 0.076824 0.039505 1.945 0.053908 .
## rooms 0.042528 0.028251 1.505 0.134588
## baths 0.166923 0.041944 3.980 0.000113 ***
## age -0.003567 0.001059 -3.369 0.000985 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.201 on 134 degrees of freedom
## Multiple R-squared: 0.7475, Adjusted R-squared: 0.7344
## F-statistic: 56.68 on 7 and 134 DF, p-value: < 2.2e-16
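The t value and p-value for log(dist) follow directly from the estimate, its standard error, and the residual degrees of freedom printed above. A minimal check using those reported numbers (the variable names are illustrative):

```r
# Values copied from the log(dist) row and the residual df of the output.
est <- 0.055389
se <- 0.057621
df_resid <- 134

# t statistic for H0: coefficient = 0, and its two-sided p-value.
t_stat <- est / se
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# 95% confidence interval for the elasticity.
ci_95 <- est + c(-1, 1) * qt(0.975, df = df_resid) * se

print(round(c(t = t_stat, p = p_value), 3))
print(round(ci_95, 3))
```

The interval includes zero, which is another way of seeing that the effect of distance cannot be distinguished from zero once the controls are included.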
ANSWER: Once log(intst), log(area), log(land), rooms, baths, and age are added, the coefficient on log(dist) falls from 0.365 to about 0.055 and is no longer statistically significant (t = 0.96, p = 0.34). The effect is much smaller because the regression now controls for other factors that determine a home's quality (such as size and number of baths) and its location (distance to the interstate). This is consistent with the hypothesis that the incinerator was located near less desirable houses to begin with, which is why the simple regression in (i) overstates the effect of distance.
(iii) Add [log(intst)]^2 to the model from part (ii). Now what happens? What do you conclude about the importance of functional form?
# A power inside a model formula must be wrapped in I(). Without it, ^ is read
# as a formula operator and the squared term is silently dropped, which
# reproduces the part (ii) fit.
model3 <- lm(log(price) ~ log(dist) + log(intst) + I(log(intst)^2) + log(area) + log(land) + rooms + baths + age, data = y1981)
summary(model3)
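In an R model formula, `^` is a formula operator (term crossing), not arithmetic, so `log(x)^2` collapses to `log(x)` and no squared regressor is created. A small simulated illustration, using made-up data and names that are not from KIELMC:

```r
set.seed(1)

# Simulated data with a genuine quadratic relationship in log(x).
x <- runif(200, min = 1, max = 100)
y <- 1 + 2 * log(x) - 0.3 * log(x)^2 + rnorm(200, sd = 0.1)

# Without I(): the formula parser reduces log(x)^2 to log(x).
fit_dropped <- lm(y ~ log(x) + log(x)^2)

# With I(): the square is evaluated arithmetically and enters as a regressor.
fit_squared <- lm(y ~ log(x) + I(log(x)^2))

print(names(coef(fit_dropped)))
print(names(coef(fit_squared)))
```

Comparing the coefficient names is a quick way to confirm that a squared term actually entered a model.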
ANSWER: The coefficient on log(dist) is now highly statistically significant, with a t statistic of around three. The coefficients on log(intst) and [log(intst)]^2 (2.073 and -0.1193) are both highly statistically significant, each with a t statistic above four in absolute value. Just adding [log(intst)]^2 has a very large effect on the coefficients that matter for policy purposes. This means that distance from the incinerator and distance from the interstate are correlated in some nonlinear way that also affects house prices. The effect of log(intst) on log(price) turns negative at log(intst) = 2.073/[2(0.1193)] = 8.69, and exp(8.69) is about 5,943 feet from the interstate. Predicted price therefore rises with distance from the interstate up to a little over one mile; beyond that, moving farther from the interstate lowers the predicted price.
(iv) Is the square of log(dist) significant when you add it to the model from part (iii)?
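The turning point quoted in the part (iii) answer comes from setting the derivative of the quadratic in log(intst) to zero. A short check of that arithmetic, using the two coefficients as reported in the answer (2.073 and 0.1193 in absolute value); the variable names are illustrative:

```r
# Coefficients on log(intst) and its square, as quoted in the part (iii) answer.
b_lin <- 2.073
b_sq <- -0.1193

# d log(price) / d log(intst) = b_lin + 2 * b_sq * log(intst) = 0 at the turning point.
log_intst_star <- -b_lin / (2 * b_sq)

# Convert back to feet, both from the unrounded value and from the 8.69 in the text.
feet_unrounded <- exp(log_intst_star)
feet_rounded <- exp(round(log_intst_star, 2))

print(round(log_intst_star, 2))
print(round(c(feet_unrounded, feet_rounded)))
```

With 5,280 feet in a mile, either figure puts the turning point a little over one mile from the interstate.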
# Part (iv) adds the square of log(dist) to the part (iii) model. Both squared
# terms need I() so that R treats ^ as arithmetic inside the formula.
model4 <- lm(log(price) ~ log(dist) + I(log(dist)^2) + log(intst) + I(log(intst)^2) + log(area) + log(land) + rooms + baths + age, data = y1981)
summary(model4)
C2 Use the data in WAGE1 for this exercise.
(i) Use OLS to estimate the equation log(wage) = B0 + B1 educ + B2 exper + B3 exper^2 + u and report the results using the usual format.
data2 <- wooldridge::wage1
model5 <- glm(log(wage) ~ educ + exper + expersq, data = data2)
summary(model5)
##
## Call:
## glm(formula = log(wage) ~ educ + exper + expersq, data = data2)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1279975 0.1059323 1.208 0.227
## educ 0.0903658 0.0074680 12.100 < 2e-16 ***
## exper 0.0410089 0.0051965 7.892 1.77e-14 ***
## expersq -0.0007136 0.0001158 -6.164 1.42e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1988321)
##
## Null deviance: 148.33 on 525 degrees of freedom
## Residual deviance: 103.79 on 522 degrees of freedom
## AIC: 649.06
##
## Number of Fisher Scoring iterations: 2
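For a Gaussian glm, the null and residual deviances are the total and residual sums of squares, so the R-squared that lm would print can be recovered from the output above. A minimal sketch using the reported values (the variable names are illustrative):

```r
# Deviances and degrees of freedom copied from the glm summary.
null_dev <- 148.33
resid_dev <- 103.79
df_null <- 525
df_resid <- 522

# R-squared = 1 - SSR/SST; adjusted R-squared corrects for degrees of freedom.
r2 <- 1 - resid_dev / null_dev
adj_r2 <- 1 - (resid_dev / df_resid) / (null_dev / df_null)

# The null model uses one parameter, so n = null df + 1.
n_obs <- df_null + 1

print(round(c(r2 = r2, adj_r2 = adj_r2), 4))
print(n_obs)
```

Fitting the same formula with lm() would report these quantities directly.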
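ANSWER: The estimated equation is log(wage) = 0.128 + 0.0904 educ + 0.0410 exper - 0.000714 exper^2, with standard errors (0.106), (0.0075), (0.0052), and (0.000116), n = 526, and R-squared = 0.300 (computed as 1 - 103.79/148.33 from the residual and null deviances, since glm does not print it). The variables educ, exper, and exper^2 therefore explain about 30% of the variation in log(wage); the remaining 70% is due to factors outside the model. An R-squared of this size is common for cross-sectional wage data and does not by itself mean the data or the estimates are poor.
(ii) Is exper^2 statistically significant at the 1% level?
ANSWER: Yes. The t statistic on expersq is -6.164 with a p-value of 1.42e-09, far below 0.01. Significance depends on the absolute value of the t statistic, not its sign; the negative sign indicates diminishing returns to experience.
(iii) Using the approximation %Delta wage = 100(b2 + 2 b3 exper) Delta exper, find the approximate return to the fifth year of experience. What is the approximate return to the twentieth year of experience?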
modell <- glm(log(wage) ~ educ + exper, data = data2)
summary(modell)
##
## Call:
## glm(formula = log(wage) ~ educ + exper, data = data2)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.216854 0.108595 1.997 0.0464 *
## educ 0.097936 0.007622 12.848 < 2e-16 ***
## exper 0.010347 0.001555 6.653 7.24e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.2128962)
##
## Null deviance: 148.33 on 525 degrees of freedom
## Residual deviance: 111.34 on 523 degrees of freedom
## AIC: 684.02
##
## Number of Fisher Scoring iterations: 2
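The approximation in part (iii) uses the exper and expersq coefficients from the quadratic model in part (i), not from the regression without the squared term. A sketch of the calculation, assuming the return to the k-th year is evaluated at exper = k - 1 (the level of experience before that year is completed). That convention and the helper name `return_to_year` are assumptions made here.

```r
# Coefficients on exper and expersq from the quadratic model in part (i).
b2 <- 0.0410089
b3 <- -0.0007136

# Approximate % change in wage from one more year, starting at exper = k - 1.
return_to_year <- function(k) 100 * (b2 + 2 * b3 * (k - 1))

fifth <- return_to_year(5)
twentieth <- return_to_year(20)

print(round(c(fifth = fifth, twentieth = twentieth), 2))
```

Under this convention the fifth year of experience is worth roughly 3.5% and the twentieth roughly 1.4%, which reflects the diminishing returns implied by the negative coefficient on expersq.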