C1 Use the data in KIELMC, only for the year 1981, to answer the following questions. The data are for houses that sold during 1981 in North Andover, Massachusetts; 1981 was the year construction began on a local garbage incinerator.

library(wooldridge)
data<-wooldridge::kielmc
head(data,10)
##    year age agesq nbh  cbd intst lintst price rooms area  land baths  dist
## 1  1978  48  2304   4 3000  1000 6.9078 60000     7 1660  4578     1 10700
## 2  1978  83  6889   4 4000  1000 6.9078 40000     6 2612  8370     2 11000
## 3  1978  58  3364   4 4000  1000 6.9078 34000     6 1144  5000     1 11500
## 4  1978  11   121   4 4000  1000 6.9078 63900     5 1136 10000     1 11900
## 5  1978  48  2304   4 4000  2000 7.6009 44000     5 1868 10000     1 12100
## 6  1978  78  6084   4 3000  2000 7.6009 46000     6 1780  9500     3 10000
## 7  1978  22   484   4 4000  2000 7.6009 56000     6 1700 10878     2 11700
## 8  1978  78  6084   4 3000  2000 7.6009 38500     6 1556  3870     2 10200
## 9  1978  42  1764   4 3000  2000 7.6009 60500     8 1642  7000     2 10500
## 10 1978  41  1681   4 3000  2000 7.6009 55000     5 1443  7950     2 11000
##       ldist wind   lprice y81    larea    lland y81ldist lintstsq nearinc
## 1  9.277999    3 11.00210   0 7.414573 8.429017        0 47.71770       1
## 2  9.305651    3 10.59663   0 7.867871 9.032409        0 47.71770       1
## 3  9.350102    3 10.43412   0 7.042286 8.517193        0 47.71770       1
## 4  9.384294    3 11.06507   0 7.035269 9.210340        0 47.71770       1
## 5  9.400961    3 10.69195   0 7.532624 9.210340        0 57.77368       1
## 6  9.210340    3 10.73640   0 7.484369 9.159047        0 57.77368       1
## 7  9.367344    3 10.93311   0 7.438384 9.294497        0 57.77368       1
## 8  9.230143    3 10.55841   0 7.349874 8.261010        0 57.77368       1
## 9  9.259131    3 11.01040   0 7.403670 8.853665        0 57.77368       1
## 10 9.305651    3 10.91509   0 7.274479 8.980927        0 57.77368       1
##    y81nrinc rprice  lrprice
## 1         0  60000 11.00210
## 2         0  40000 10.59663
## 3         0  34000 10.43412
## 4         0  63900 11.06507
## 5         0  44000 10.69195
## 6         0  46000 10.73640
## 7         0  56000 10.93311
## 8         0  38500 10.55841
## 9         0  60500 11.01040
## 10        0  55000 10.91509
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
y1981 <-data %>% filter(year==1981) 
head(y1981,10)
##    year age agesq nbh   cbd intst lintst  price rooms area  land baths  dist
## 1  1981  81  6561   4  4000  1000 6.9078  49000     6 1554  6790     1 11800
## 2  1981  71  5041   4  3000  2000 7.6009  52000     5 1575  3485     1 10100
## 3  1981  31   961   4  3000  2000 7.6009  68000     6 3304 18731     2 10200
## 4  1981  41  1681   4  3000  2000 7.6009  54000     6 1700  7500     1 11200
## 5  1981  31   961   4  4000  2000 7.6009  70000     6 1454  5500     2 11800
## 6  1981  81  6561   4  3000  2000 7.6009  47000     6 1410  4500     1 10200
## 7  1981  19   361   0  5000  4000 8.2940  73900     6 1592 13068     1 13200
## 8  1981   1     1   0  9000  9000 9.1050 117500     9 1910 15246     3 18200
## 9  1981  41  1681   0 10000 10000 9.2103  60000     5  735 10454     1 19000
## 10 1981   1     1   0 16000 15000 9.6158  85000     6 1154 43560     2 23000
##        ldist wind   lprice y81    larea     lland  y81ldist lintstsq nearinc
## 1   9.375855    4 10.79958   1 7.348588  8.823206  9.375855 47.71770       1
## 2   9.220291    3 10.85900   1 7.362010  8.156223  9.220291 57.77368       1
## 3   9.230143    3 11.12726   1 8.102889  9.837935  9.230143 57.77368       1
## 4   9.323669    3 10.89674   1 7.438384  8.922658  9.323669 57.77368       1
## 5   9.375855    3 11.15625   1 7.282073  8.612503  9.375855 57.77368       1
## 6   9.230143    3 10.75790   1 7.251345  8.411833  9.230143 57.77368       1
## 7   9.487972    3 11.21047   1 7.372746  9.477921  9.487972 68.79043       1
## 8   9.809177    5 11.67419   1 7.554859  9.632072  9.809177 82.90102       0
## 9   9.852194    5 11.00210   1 6.599871  9.254740  9.852194 84.82964       0
## 10 10.043250    5 11.35041   1 7.050990 10.681894 10.043250 92.46361       0
##    y81nrinc   rprice  lrprice
## 1         1 37634.41 10.53567
## 2         1 39938.55 10.59510
## 3         1 52227.34 10.86336
## 4         1 41474.66 10.63284
## 5         1 53763.44 10.89235
## 6         1 36098.31 10.49400
## 7         1 56758.83 10.94657
## 8         0 90245.77 11.41029
## 9         0 46082.95 10.73820
## 10        0 65284.18 11.08650
(i) To study the effects of the incinerator location on housing price, consider the simple regression model log(price) = B0 + B1*log(dist) + u, where price is housing price in dollars and dist is distance from the house to the incinerator measured in feet. Interpreting this equation causally, what sign do you expect for B1 if the presence of the incinerator depresses housing prices? Estimate this equation and interpret the results.

ANSWER: I expect B1 to be positive. If the incinerator depresses housing prices, then log(price) rises with log(dist): the farther a house is from the incinerator, the more it sells for.
# Summary statistics for the 1981 subsample; columns are referenced with $
# because a quoted name such as "price" is only a character string
mean(y1981$price)
mean(y1981$dist)
sd(y1981$dist)
model<-lm(log(price)~log(dist), data= y1981)
summary(model)
## 
## Call:
## lm(formula = log(price) ~ log(dist), data = y1981)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.87318 -0.22657 -0.01985  0.25687  0.95045 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.04716    0.64624  12.452  < 2e-16 ***
## log(dist)    0.36488    0.06576   5.548 1.39e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3543 on 140 degrees of freedom
## Multiple R-squared:  0.1803, Adjusted R-squared:  0.1744 
## F-statistic: 30.79 on 1 and 140 DF,  p-value: 1.395e-07
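
Because both log(price) and log(dist) are in logs, the slope reported above can be read as an elasticity. The sketch below is only an illustration: it hardcodes the slope from the printed output rather than refitting the model, and the 10% increase in distance is a hypothetical scenario, not something taken from the exercise.

```r
# Slope on log(dist), copied from the regression output above
b1 <- 0.36488

# Hypothetical scenario: a house 10% farther from the incinerator
pct_dist <- 10

# Elasticity approximation: %change in price is roughly b1 * %change in dist
approx_pct <- b1 * pct_dist

# Exact change implied by the log-log form: (1.10^b1 - 1) * 100
exact_pct <- ((1 + pct_dist / 100)^b1 - 1) * 100

print(round(c(approx = approx_pct, exact = exact_pct), 2))
```

The two figures are close (roughly 3.6% versus 3.5%), which is why the elasticity reading is usually stated per 1% change.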

ANSWER: The estimated equation is log(price) = 8.047 + 0.365*log(dist), with n = 142 and R-squared = 0.180. Because both variables are in logs, the slope is an elasticity: a 1% increase in distance from the incinerator is associated with a sale price about 0.365% higher. The estimate is statistically significant (t = 5.55).

(ii) To the simple regression model in part (i), add the variables log(intst), log(area), log(land), rooms, baths, and age, where intst is distance from the home to the interstate, area is square footage of the house, land is the lot size in square feet, rooms is total number of rooms, baths is number of bathrooms, and age is age of the house in years. Now, what do you conclude about the effects of the incinerator? Explain why (i) and (ii) give conflicting results.

model2 <- lm(log(price)~log(dist)+log(intst)+log(area)+log(land)+rooms+baths+age, data= y1981)
summary(model2)
## 
## Call:
## lm(formula = log(price) ~ log(dist) + log(intst) + log(area) + 
##     log(land) + rooms + baths + age, data = y1981)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.74072 -0.10669  0.00932  0.11817  0.61387 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.592332   0.641711  11.831  < 2e-16 ***
## log(dist)    0.055389   0.057621   0.961 0.338153    
## log(intst)  -0.039032   0.051662  -0.756 0.451261    
## log(area)    0.319294   0.076418   4.178 5.27e-05 ***
## log(land)    0.076824   0.039505   1.945 0.053908 .  
## rooms        0.042528   0.028251   1.505 0.134588    
## baths        0.166923   0.041944   3.980 0.000113 ***
## age         -0.003567   0.001059  -3.369 0.000985 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.201 on 134 degrees of freedom
## Multiple R-squared:  0.7475, Adjusted R-squared:  0.7344 
## F-statistic: 56.68 on 7 and 134 DF,  p-value: < 2.2e-16
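
The significance of log(dist) in this larger model can be checked by hand from the printed estimate and standard error. The following is a minimal sketch that assumes the values are copied from the output above; the variable names are illustrative, and the confidence interval is an extra that the exercise does not ask for.

```r
# Estimate and standard error for log(dist), copied from the output above
est <- 0.055389
se  <- 0.057621
df  <- 134  # residual degrees of freedom

t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df)  # two-sided p-value

# 95% confidence interval for the elasticity
ci <- est + c(-1, 1) * qt(0.975, df) * se

print(round(c(t = t_stat, p = p_val, lower = ci[1], upper = ci[2]), 3))
```

The interval covers zero, which matches the large p-value in the coefficient table.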

ANSWER: When log(intst), log(area), log(land), rooms, baths, and age are added to the regression, the coefficient on log(dist) falls from 0.365 to about 0.055, and it is no longer statistically significant (t = 0.96, p = 0.34). The effect is much smaller because we have now explicitly controlled for several other factors that determine a home's quality (such as size and number of baths) and its location (distance to the interstate). Parts (i) and (ii) conflict because the simple regression attributes the influence of these omitted factors to distance. This is consistent with the hypothesis that the incinerator was located near less desirable homes to begin with.

(iii) Add [log(intst)]^2 to the model from part (ii). Now what happens? What do you conclude about the importance of functional form?

# I() makes ^2 an arithmetic square; a bare ^ inside a formula is term crossing, not a power
model3 <- lm(log(price)~log(dist)+log(intst)+I(log(intst)^2)+log(area)+log(land)+rooms+baths+age, data= y1981)
summary(model3)
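
How a squared regressor is written matters in R, because `^` has a special meaning inside a formula. The sketch below uses placeholder names `y` and `x` (hypothetical, not variables from KIELMC) and only inspects the terms each formula generates, so no data are needed.

```r
# Inside a formula, ^ means "cross the terms", so (log(x))^2 collapses to log(x)
f_bare <- y ~ (log(x))^2

# I() protects the expression so ^2 is evaluated as an arithmetic square
f_sq <- y ~ log(x) + I(log(x)^2)

labels_bare <- attr(terms(f_bare), "term.labels")
labels_sq   <- attr(terms(f_sq), "term.labels")

print(labels_bare)
print(labels_sq)
```

A formula written with a bare `^2` therefore fits the same regression as one with no squared term at all, so the coefficient table is unchanged.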

ANSWER: The coefficient on log(dist) is now highly statistically significant, with a t statistic of around three. The coefficients on log(intst) and [log(intst)]^2 are both highly statistically significant, each with a t statistic above four in absolute value. Just adding [log(intst)]^2 has a very large effect on the coefficients that matter for policy purposes. This means that distance from the incinerator and distance from the interstate are correlated in some nonlinear way that also affects house prices. We can find the value of log(intst) beyond which its effect on log(price) becomes negative: 2.073/[2(0.1193)] ≈ 8.69. Exponentiating gives about 5,943 feet from the interstate. Therefore, predicted price rises with distance from the interstate only up to just over a mile (5,280 feet); beyond that, moving farther from the interstate lowers predicted home prices.

(iv) Is the square of log(dist) significant when you add it to the model from part (iii)?

# Both squared terms are wrapped in I() and entered alongside their linear terms
model4 <- lm(log(price)~log(dist)+I(log(dist)^2)+log(intst)+I(log(intst)^2)+log(area)+log(land)+rooms+baths+age, data= y1981)
summary(model4)

C2 Use the data in WAGE1 for this exercise.

(i) Use OLS to estimate the equation log(wage) = B0 + B1*educ + B2*exper + B3*exper^2 + u and report the results using the usual format.

data2 <- wooldridge::wage1
model5 <- glm(log(wage) ~ educ + exper + expersq, data = data2)
summary(model5)
## 
## Call:
## glm(formula = log(wage) ~ educ + exper + expersq, data = data2)
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.1279975  0.1059323   1.208    0.227    
## educ         0.0903658  0.0074680  12.100  < 2e-16 ***
## exper        0.0410089  0.0051965   7.892 1.77e-14 ***
## expersq     -0.0007136  0.0001158  -6.164 1.42e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1988321)
## 
##     Null deviance: 148.33  on 525  degrees of freedom
## Residual deviance: 103.79  on 522  degrees of freedom
## AIC: 649.06
## 
## Number of Fisher Scoring iterations: 2
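
The glm() summary does not print R-squared, but for a Gaussian model with the default identity link the null and residual deviances are the total and residual sums of squares, so it can be recovered by hand. A minimal sketch, assuming the deviances and degrees of freedom are copied from the output above (variable names are illustrative):

```r
# Deviances copied from the glm output above
null_dev  <- 148.33  # total sum of squares of log(wage)
resid_dev <- 103.79  # residual sum of squares
df_null   <- 525
df_resid  <- 522

r2     <- 1 - resid_dev / null_dev
adj_r2 <- 1 - (resid_dev / df_resid) / (null_dev / df_null)
n      <- df_null + 1  # number of observations

print(round(c(n = n, R2 = r2, adj_R2 = adj_r2), 4))
```

Fitting the same formula with lm() instead of glm() would report these quantities directly in summary().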

ANSWER: log(wage) = 0.128 + 0.0904*educ + 0.0410*exper - 0.000714*exper^2, with standard errors 0.106, 0.0075, 0.0052, and 0.000116 respectively, n = 526, and R^2 = 0.3003. The variables educ, exper, and exper^2 therefore explain about 30.03% of the variation in log(wage); the remaining 69.97% is left to factors outside the model. An R^2 below 50% is common for cross-sectional wage data and does not by itself make the estimates unreliable.

(ii) Is exper^2 statistically significant at the 1% level?

ANSWER: Yes. The t statistic on expersq is -6.164 with a p-value of 1.42e-09, far below 0.01. The negative sign of the t statistic only reflects the negative coefficient; significance depends on its absolute value.

(iii) Using the approximation %Delta wage = 100(b2 + 2*b3*exper)*Delta exper, find the approximate return to the fifth year of experience. What is the approximate return to the twentieth year of experience?

modell <- glm(log(wage) ~ educ + exper, data = data2)
summary(modell)
## 
## Call:
## glm(formula = log(wage) ~ educ + exper, data = data2)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.216854   0.108595   1.997   0.0464 *  
## educ        0.097936   0.007622  12.848  < 2e-16 ***
## exper       0.010347   0.001555   6.653 7.24e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.2128962)
## 
##     Null deviance: 148.33  on 525  degrees of freedom
## Residual deviance: 111.34  on 523  degrees of freedom
## AIC: 684.02
## 
## Number of Fisher Scoring iterations: 2
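
The approximation in part (iii) needs b2 and b3 from the regression that includes expersq. The sketch below hardcodes those two coefficients from that quadratic model's output and assumes the usual convention of evaluating the return to the k-th year at exper = k - 1 (so 4 and 19); evaluating at 5 and 20 instead would give slightly smaller numbers. The helper `return_pct` is a hypothetical name introduced here.

```r
# Coefficients on exper and expersq, copied from the quadratic model output
b2 <- 0.0410089
b3 <- -0.0007136

# Approximate % return to one more year, starting from `exper` years
return_pct <- function(exper) 100 * (b2 + 2 * b3 * exper)

fifth     <- return_pct(4)   # going from 4 to 5 years
twentieth <- return_pct(19)  # going from 19 to 20 years

print(round(c(fifth = fifth, twentieth = twentieth), 2))
```

Under these assumptions the return is about 3.5% for the fifth year and about 1.4% for the twentieth, showing the diminishing effect implied by the negative coefficient on expersq.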
(iv) At what value of exper does additional experience actually lower predicted log(wage)? How many people have more experience in this sample?

ANSWER: The turning point is approximately 0.041/[2(0.000714)] ≈ 28.7 years of experience. In the sample, 121 people have at least 29 years of experience, which is a fairly sizeable fraction of the 526 observations (about 23%).
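
The turning point of a quadratic b2*exper + b3*exper^2 is where its derivative b2 + 2*b3*exper equals zero. A minimal sketch of that calculation, assuming the coefficients are copied from the quadratic model's output; the count of workers past the turning point needs the data, so that step is guarded in case the wooldridge package is not installed.

```r
# Coefficients on exper and expersq, copied from the quadratic model output
b2 <- 0.0410089
b3 <- -0.0007136

# Experience level at which the predicted effect of exper peaks
turning_point <- -b2 / (2 * b3)
print(round(turning_point, 2))

# Count of observations at or beyond 29 years (requires the dataset)
if (requireNamespace("wooldridge", quietly = TRUE)) {
  wage1 <- wooldridge::wage1
  print(sum(wage1$exper >= 29))
}
```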