The following equation models weekly hours of television viewing by a child:
\[ thours^* = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{motheduc} + \beta_4\,\text{fatheduc} + \beta_5\,\text{sibs} + u \]
where \(thours^*\) is the true weekly hours of television viewing and \(u\) is an error term. The concern is that \(thours^*\) is measured with error in the survey, so we observe only the reported hours \(thours\).
The classical errors-in-variables (CEV) model treats reported hours as true hours plus a measurement error, \( thours = thours^* + e_0 \), where \( e_0 \) has mean zero. The observed \( thours \) is therefore not assumed to equal the true value.
The relationship between the true values of the variables is assumed to be linear.
The measurement error (\( e_0 \)), which is distinct from the equation error \( u \), is uncorrelated with \( thours^* \) and with each explanatory variable.
The variance of the measurement error is constant across all levels of the independent variables.
There is no measurement error in the explanatory variables (age, motheduc, fatheduc, sibs).
Accurate reporting on average is hard to ensure: the accuracy of reported hours (\( thours \)) depends on respondents' honesty and recall.
A linear relationship is a reasonable and common assumption in regression models.
Whether the measurement error is uncorrelated with the true values depends on the circumstances of the survey. If the factors that drive misreporting are also related to true viewing hours or to the explanatory variables, the assumption is violated.
The homoscedasticity assumption depends on the nature of the reporting errors. If the size of the reporting errors varies systematically with the explanatory variables, homoscedasticity does not hold.
The assumption of no measurement error in the explanatory variables is more plausible for some variables (e.g., age) than for others.
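To make the role of these assumptions concrete, the following simulation is a minimal sketch in base R. Everything in it is hypothetical: the variable names (`thours_star`, `e0_cev`, `e0_bad`), the coefficient values, and the error variances are invented for illustration and are not taken from any survey. It compares a reporting error that is unrelated to the regressor with one that grows with it.

```r
set.seed(123)
n <- 5000

# Hypothetical data-generating process (illustrative values only)
age <- runif(n, 5, 17)
u <- rnorm(n, sd = 3)
thours_star <- 20 - 0.5 * age + u   # true hours, unobserved in practice

# Case 1: reporting error unrelated to age
e0_cev <- rnorm(n, sd = 4)
thours_cev <- thours_star + e0_cev

# Case 2: reporting error that rises with age (violates the assumption)
e0_bad <- 0.3 * (age - mean(age)) + rnorm(n, sd = 4)
thours_bad <- thours_star + e0_bad

fit_true <- lm(thours_star ~ age)
fit_cev  <- lm(thours_cev ~ age)
fit_bad  <- lm(thours_bad ~ age)

# Slopes on age: the second should stay near -0.5, the third is shifted by about 0.3
slopes <- c(true = coef(fit_true)[["age"]],
            cev  = coef(fit_cev)[["age"]],
            bad  = coef(fit_bad)[["age"]])
print(round(slopes, 3))

# Residual standard errors: measurement error inflates the error variance
sigmas <- c(true = sigma(fit_true), cev = sigma(fit_cev))
print(round(sigmas, 3))
```

In this sketch, measurement error in the dependent variable costs precision but leaves the slope roughly unbiased as long as the error is unrelated to the regressor. Once the error is correlated with age, the slope estimate absorbs that correlation.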
Using KWW as a proxy for unobserved ability, the estimated return to education is the “educ” coefficient in the model below: about 43.46 units of wage per additional year of education, holding KWW fixed.
##
## Call:
## lm(formula = wage ~ educ + KWW, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -906.43 -245.00 -34.28 197.40 2298.61
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -71.091 81.574 -0.871 0.384
## educ 43.460 6.019 7.221 1.07e-12 ***
## KWW 12.413 1.731 7.172 1.51e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 372.4 on 932 degrees of freedom
## Multiple R-squared: 0.1537, Adjusted R-squared: 0.1519
## F-statistic: 84.63 on 2 and 932 DF, p-value: < 2.2e-16
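The summary above could be produced along the following lines. This is a sketch that assumes the `wage2` data frame is the one shipped with the `wooldridge` R package; the object name `proxy_fit` is hypothetical.

```r
# Assumption: wage2 is the data set from the wooldridge package
library(wooldridge)
data("wage2")

# Wage regressed on education, with KWW included as the proxy variable
proxy_fit <- lm(wage ~ educ + KWW, data = wage2)
summary(proxy_fit)

# The estimated return to education is the coefficient on educ
coef(proxy_fit)[["educ"]]
```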
When KWW and its square are both used as proxy variables, the estimated return to education is again the “educ” coefficient: about 42.04, only slightly below the 43.46 obtained without the squared term.
##
## Call:
## lm(formula = wage ~ educ + KWW + I(KWW^2), data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -953.39 -239.83 -30.71 201.54 2253.05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 648.6825 207.7162 3.123 0.001846 **
## educ 42.0442 5.9887 7.021 4.26e-12 ***
## KWW -30.3570 11.4948 -2.641 0.008406 **
## I(KWW^2) 0.6198 0.1647 3.763 0.000178 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 369.8 on 931 degrees of freedom
## Multiple R-squared: 0.1664, Adjusted R-squared: 0.1637
## F-statistic: 61.94 on 3 and 931 DF, p-value: < 2.2e-16
## Estimate Pr(>|t|)
## educ 42.0442036 4.255736e-12
## KWW -30.3569558 8.406367e-03
## I(KWW^2) 0.6198462 1.783083e-04
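The last three lines of output above show only the estimates and p-values. One way to extract such a table is to index the coefficient matrix returned by `summary()`. The sketch below again assumes that `wage2` comes from the `wooldridge` package, and the object names `quad_fit` and `coef_table` are hypothetical.

```r
# Assumption: wage2 is the data set from the wooldridge package
library(wooldridge)
data("wage2")

quad_fit <- lm(wage ~ educ + KWW + I(KWW^2), data = wage2)

# coef(summary(.)) is a matrix with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
coef_table <- coef(summary(quad_fit))

# Keep the slope rows and only the estimate and p-value columns
coef_table[c("educ", "KWW", "I(KWW^2)"), c("Estimate", "Pr(>|t|)")]
```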
## Sample Mean of stotal: 0.04748291
## Standard Deviation of stotal: 0.8535441
##
## Call:
## lm(formula = jc ~ stotal, data = twoyear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3633 -0.3424 -0.3384 -0.3113 3.5196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.338364 0.009403 35.983 <2e-16 ***
## stotal 0.011177 0.011001 1.016 0.31
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7721 on 6761 degrees of freedom
## Multiple R-squared: 0.0001527, Adjusted R-squared: 4.767e-06
## F-statistic: 1.032 on 1 and 6761 DF, p-value: 0.3097
##
## Call:
## lm(formula = univ ~ stotal, data = twoyear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4319 -1.8707 -0.4968 1.6909 7.8927
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.87073 0.02520 74.25 <2e-16 ***
## stotal 1.16968 0.02948 39.68 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.069 on 6761 degrees of freedom
## Multiple R-squared: 0.1889, Adjusted R-squared: 0.1888
## F-statistic: 1575 on 1 and 6761 DF, p-value: < 2.2e-16
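The descriptive statistics and the two simple regressions above check how the test score `stotal` relates to years of junior college (`jc`) and years of university (`univ`). A sketch of how they might be computed follows, assuming `twoyear` is the data set from the `wooldridge` package; the object names `jc_fit` and `univ_fit` are hypothetical.

```r
# Assumption: twoyear is the data set from the wooldridge package
library(wooldridge)
data("twoyear")

# Sample moments of the standardized test score
cat("Sample Mean of stotal:", mean(twoyear$stotal), "\n")
cat("Standard Deviation of stotal:", sd(twoyear$stotal), "\n")

# Is stotal related to each type of college attendance?
jc_fit   <- lm(jc ~ stotal, data = twoyear)
univ_fit <- lm(univ ~ stotal, data = twoyear)
summary(jc_fit)
summary(univ_fit)
```

In the output above, the slope on `stotal` is statistically insignificant in the `jc` regression (p-value about 0.31) but strongly significant in the `univ` regression.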
## Linear hypothesis test
##
## Hypothesis:
## jc - univ = 0
##
## Model 1: restricted model
## Model 2: lwage ~ stotal + jc + univ
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 6760 1417.9
## 2 6759 1417.8 1 0.16883 0.8049 0.3697
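The F test above compares the return to a year of junior college with the return to a year of university once `stotal` is controlled for. A minimal sketch using `linearHypothesis()` from the `car` package is given below. It assumes `twoyear` comes from the `wooldridge` package, and the object names `wage_fit` and `jc_vs_univ` are hypothetical.

```r
# Assumptions: twoyear is from the wooldridge package; car is installed
library(wooldridge)
library(car)
data("twoyear")

wage_fit <- lm(lwage ~ stotal + jc + univ, data = twoyear)

# H0: the coefficients on jc and univ are equal
jc_vs_univ <- linearHypothesis(wage_fit, "jc - univ = 0")
jc_vs_univ
```

With a p-value of about 0.37 in the output above, equality of the two returns is not rejected at conventional significance levels.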
## female phsrank BA AA black hispanic id exper jc univ lwage
## 1 1 65 0 0 0 0 19 161 0.0000000 0.000000 1.925291
## 2 1 97 0 0 0 0 93 119 0.0000000 7.033333 2.796494
## 3 1 44 0 0 0 0 96 81 0.0000000 0.000000 1.625600
## 4 1 34 0 0 0 1 119 39 0.2666667 0.000000 2.223312
## 5 1 80 0 0 0 0 132 141 0.0000000 0.000000 1.642083
## 6 0 59 0 0 0 0 156 165 0.0000000 0.000000 2.079442
## stotal smcity medcity submed lgcity sublg vlgcity subvlg ne nc south
## 1 -0.4417497 0 0 0 0 1 0 0 1 0 0
## 2 0.0000000 1 0 0 0 0 0 0 0 1 0
## 3 -1.3570027 0 0 0 0 1 0 0 1 0 0
## 4 -0.1900551 1 0 0 0 0 0 0 0 0 0
## 5 0.0000000 0 0 0 0 0 0 0 0 0 1
## 6 1.3887565 1 0 0 0 0 0 0 0 0 1
## totcoll
## 1 0.0000000
## 2 7.0333333
## 3 0.0000000
## 4 0.2666667
## 5 0.0000000
## 6 0.0000000
## [1] "female" "phsrank" "BA" "AA" "black" "hispanic"
## [7] "id" "exper" "jc" "univ" "lwage" "stotal"
## [13] "smcity" "medcity" "submed" "lgcity" "sublg" "vlgcity"
## [19] "subvlg" "ne" "nc" "south" "totcoll"
## Linear hypothesis test
##
## Hypothesis:
## I(stotal^2) = 0
##
## Model 1: restricted model
## Model 2: lwage ~ stotal + I(stotal^2) + jc + univ
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 6759 1417.8
## 2 6758 1417.8 1 0.013251 0.0632 0.8016
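The test above asks whether a squared `stotal` term belongs in the wage equation. The output shown comes from a linear hypothesis test; the sketch below swaps in a nested-model comparison with base R's `anova()`, which gives the same F statistic for a single exclusion restriction. It assumes `twoyear` comes from the `wooldridge` package, and the object names are hypothetical.

```r
# Assumption: twoyear is the data set from the wooldridge package
library(wooldridge)
data("twoyear")

# Restricted model: no squared term
restricted_fit <- lm(lwage ~ stotal + jc + univ, data = twoyear)

# Unrestricted model: adds the square of stotal
unrestricted_fit <- lm(lwage ~ stotal + I(stotal^2) + jc + univ, data = twoyear)

# F test of H0: the coefficient on I(stotal^2) is zero
quad_test <- anova(restricted_fit, unrestricted_fit)
quad_test
```

The p-value of about 0.80 in the output above gives no evidence that the squared term is needed.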
## Linear hypothesis test
##
## Hypothesis:
## stotal:jc = 0
## stotal:univ = 0
##
## Model 1: restricted model
## Model 2: lwage ~ stotal + jc + univ + stotal:jc + stotal:univ
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 6759 1417.8
## 2 6757 1417.5 2 0.28045 0.6684 0.5125
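The joint test above checks whether the returns to `jc` and `univ` vary with `stotal`. A sketch of how it might be run with `linearHypothesis()` from the `car` package follows, assuming `twoyear` comes from the `wooldridge` package; the object names `inter_fit` and `inter_test` are hypothetical.

```r
# Assumptions: twoyear is from the wooldridge package; car is installed
library(wooldridge)
library(car)
data("twoyear")

inter_fit <- lm(lwage ~ stotal + jc + univ + stotal:jc + stotal:univ,
                data = twoyear)

# H0: both interaction coefficients are zero
inter_test <- linearHypothesis(inter_fit,
                               c("stotal:jc = 0", "stotal:univ = 0"))
inter_test
```

The p-value of about 0.51 in the output above means the two interaction terms are jointly insignificant.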
##
## Call:
## lm(formula = lwage ~ stotal + I(stotal^2) + jc + univ + stotal:jc +
## stotal:univ, data = twoyear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.87058 -0.30239 0.01589 0.32046 1.80138
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.107884 0.008862 237.861 < 2e-16 ***
## stotal 0.074176 0.011169 6.641 3.35e-11 ***
## I(stotal^2) 0.005856 0.006196 0.945 0.345
## jc 0.065377 0.007374 8.866 < 2e-16 ***
## univ 0.059737 0.002928 20.405 < 2e-16 ***
## stotal:jc -0.008969 0.009949 -0.901 0.367
## stotal:univ -0.004873 0.003899 -1.250 0.211
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.458 on 6756 degrees of freedom
## Multiple R-squared: 0.1188, Adjusted R-squared: 0.118
## F-statistic: 151.7 on 6 and 6756 DF, p-value: < 2.2e-16