library(dplyr)
library(dslabs)
library(ISLR2)
library(matlib)
library(wooldridge)
data("twoyear")
4.
- For the CEV assumptions to hold, we must be able to write tvhours =
tvhours* + e0, where the measurement error e0 has zero mean and is
uncorrelated with tvhours* and each explanatory variable in the
equation. (Note that for OLS to consistently estimate the parameters we
do not need e0 to be uncorrelated with tvhours*.)
- The CEV assumptions are unlikely to hold in this example. For
children who do not watch TV at all, tvhours* = 0, and it is very likely
that reported TV hours is zero. So if tvhours* = 0 then e0 = 0 with high
probability. If tvhours* > 0, the measurement error can be positive
or negative, but, since tvhours ≥ 0, e0 must satisfy e0 ≥ −tvhours*.
So e0 and tvhours* are likely to be correlated. As mentioned in part
(i), because it is the dependent variable that is measured with error,
what is important is that e0 is uncorrelated with the explanatory
variables. But this is unlikely to be the case, because tvhours* depends
directly on the explanatory variables. Or, we might argue directly that
more highly educated parents tend to underreport how much television
their children watch, which means e0 and the education variables are
negatively correlated.
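The last argument can be illustrated with a small simulation. This is only a sketch and does not use the exercise's data: the variable names (`educ`, `tv_true`, `tv_obs_a`, `tv_obs_b`), the coefficients, and the error design are all hypothetical, and the nonnegativity of reported hours is ignored for simplicity. It compares the OLS slope when the reporting error in the dependent variable is independent of the regressor with the slope when the error falls as the regressor rises.

```r
set.seed(123)
n <- 10000

# Hypothetical explanatory variable (e.g., parents' years of education)
educ <- rnorm(n, mean = 12, sd = 2)

# Hypothetical true outcome; the true slope on educ is -0.5
tv_true <- 20 - 0.5 * educ + rnorm(n)

# Case A: reporting error independent of educ
tv_obs_a <- tv_true + rnorm(n)

# Case B: more educated parents underreport, so the error falls with educ
tv_obs_b <- tv_true - 0.3 * (educ - 12) + rnorm(n)

slope_indep <- unname(coef(lm(tv_obs_a ~ educ))["educ"])
slope_corr  <- unname(coef(lm(tv_obs_b ~ educ))["educ"])

# Case A should stay near the true slope of -0.5;
# Case B should be pulled toward -0.5 - 0.3 = -0.8
c(independent = slope_indep, correlated = slope_corr)
```

In Case A the error only adds noise to the dependent variable, so OLS still centers on the true slope. In Case B the error is negatively correlated with the regressor, and the estimated slope absorbs that correlation.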
9.2
- (i) We estimate the model from column (2) but with KWW in place of
IQ. The coefficient on educ becomes about .058 (se ≈ .006), so this is
similar to the estimate obtained with IQ, although slightly larger and
more precisely estimated.
- (ii) When KWW and IQ are both used as proxies, the coefficient on
educ becomes about .049 (se ≈ .007). Compared with the estimate when
only KWW is used as a proxy, the return to education has fallen by
almost a full percentage point.
- (iii) The t statistic on IQ is about 3.08 while that on KWW is about
2.07, so each is significant at the 5% level against a two-sided
alternative. They are jointly very significant, with F(2, 925) ≈ 8.59
and p-value ≈ .0002.
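The regressions behind these answers could be reproduced roughly as follows. This sketch assumes the data are `wage2` from the `wooldridge` package and that the column (2) specification regresses log(wage) on educ, exper, tenure, married, south, urban, black, and the proxy; the object names (`fit_kww`, `fit_both`, `fit_none`) are made up for illustration.

```r
library(wooldridge)
data("wage2")

# Assumed column (2) specification, with KWW replacing IQ as the proxy
fit_kww <- lm(log(wage) ~ educ + exper + tenure + married + south + urban +
                black + KWW, data = wage2)

# Both proxies together
fit_both <- update(fit_kww, . ~ . + IQ)

# No ability proxy: the restricted model for the joint test
fit_none <- update(fit_kww, . ~ . - KWW)

# Estimate, standard error, t statistic, and p-value for the terms of interest
coef(summary(fit_kww))["educ", ]
coef(summary(fit_both))[c("educ", "IQ", "KWW"), ]

# Joint F test of H0: the coefficients on IQ and KWW are both zero
joint_test <- anova(fit_none, fit_both)
joint_test
```

Comparing nested models with `anova()` gives the same F statistic as the usual restricted-versus-unrestricted sum of squared residuals formula, provided both models are fit to the same observations.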
9.8
attach(twoyear)
mean(stotal)
## [1] 0.04748291
sd(stotal)
## [1] 0.8535441
summary(stotal)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.32480 -0.32734 0.00000 0.04748 0.61079 2.23537
- The mean of stotal is .047, its standard deviation is .854, the
minimum value is –3.32, and the maximum value is 2.24.
model1 <- lm(stotal ~ jc + univ)
summary(model1)
##
## Call:
## lm(formula = stotal ~ jc + univ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0298 -0.4457 0.1220 0.4522 2.4846
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.295005 0.013196 -22.356 < 2e-16 ***
## jc 0.074767 0.012170 6.143 8.53e-10 ***
## univ 0.164644 0.004091 40.246 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7667 on 6760 degrees of freedom
## Multiple R-squared: 0.1934, Adjusted R-squared: 0.1932
## F-statistic: 810.5 on 2 and 6760 DF, p-value: < 2.2e-16
- In the regression of stotal on jc and univ, the coefficient on jc is
.075 (se = .012), for a t statistic of about 6.1. The estimated partial
relationship is positive and statistically significant, but small: one
more year of junior college is associated with an stotal that is higher
by less than a tenth of its standard deviation. The coefficient on univ
is .165 (se = .004), for a t statistic of 40.2. Therefore, univ and
stotal are strongly positively related.
- When we add stotal to (4.17) and estimate the resulting equation by
OLS, we get
$$\widehat{\log(wage)} = \underset{(.021)}{1.495} + \underset{(.0068)}{.0631}\,jc + \underset{(.0026)}{.0686}\,univ + \underset{(.00016)}{.00488}\,exper + \underset{(.0068)}{.0494}\,stotal$$
with n = 6,763 and R^2 = .228 (standard errors in parentheses). For
testing βjc = βuniv, we can use the same trick as in Section 4.4 to get
the standard error of the difference: replace univ with totcoll = jc +
univ, and then the coefficient on jc is the difference in the estimated
returns, along with its standard error. Let θ1 = βjc − βuniv. Then the
estimate of θ1 is −.0055 (se = .0069). Compared with what we found
without stotal, the evidence against H0: βjc = βuniv in favor of
H1: βjc < βuniv is even weaker. The t statistic from equation (4.27) is
about −1.48, while here we have obtained only −.80.
- When stotal^2 is added to the equation, its coefficient is .0019 (t
statistic = .40). Therefore, there is no reason to add the quadratic
term.
- The F statistic for testing exclusion of the interaction terms
stotal⋅jc and stotal⋅univ is about 1.96; with 2 and 6,756 df, this gives
p-value = .141. So, even at the 10% level, the interaction terms are
jointly insignificant. It is probably not worth complicating the basic
model estimated in part (iii).
- I would just use the model from part (iii), where stotal appears
only in level form. The other embellishments were not statistically
significant at small enough significance levels to warrant the
additional complications.
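The wage regressions discussed in the last four answers could be run along these lines. This is a sketch rather than the original code: it assumes the `twoyear` data contain `lwage` as the log wage, and the object names (`tw`, `fit_base`, `fit_theta`, `fit_quad`, `fit_int`) are hypothetical.

```r
library(wooldridge)
data("twoyear")

# Baseline: equation (4.17) plus stotal in level form
fit_base <- lm(lwage ~ jc + univ + exper + stotal, data = twoyear)

# Reparameterization: with totcoll = jc + univ in place of univ, the
# coefficient on jc equals beta_jc - beta_univ, with its own standard error.
# transform() creates totcoll, or overwrites it if the column already exists.
tw <- transform(twoyear, totcoll = jc + univ)
fit_theta <- lm(lwage ~ jc + totcoll + exper + stotal, data = tw)
coef(summary(fit_theta))["jc", ]

# Quadratic in stotal: inspect the t statistic on the squared term
fit_quad <- update(fit_base, . ~ . + I(stotal^2))
coef(summary(fit_quad))["I(stotal^2)", ]

# Interactions of stotal with jc and univ: joint F test against the baseline
fit_int <- update(fit_base, . ~ . + stotal:jc + stotal:univ)
int_test <- anova(fit_base, fit_int)
int_test
```

Fitting the baseline once and extending it with `update()` keeps every comparison on the same sample and the same set of controls, which is what the nested F test requires.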