library(AER)
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
data("CPSSWEducation")
CPSSWEducation[1:6,]
## age gender earnings education
## 1 30 male 34.61538 16
## 2 30 female 19.23077 16
## 3 30 female 13.73626 12
## 4 30 female 13.94231 13
## 5 30 female 19.23077 16
## 6 30 female 8.00000 12
regout <- lm(earnings ~ education, data = CPSSWEducation)
summary(regout)
##
## Call:
## lm(formula = earnings ~ education, data = CPSSWEducation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.270 -5.355 -1.513 3.194 77.164
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.13437 0.95925 -3.268 0.0011 **
## education 1.46693 0.06978 21.021 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.769 on 2948 degrees of freedom
## Multiple R-squared: 0.1304, Adjusted R-squared: 0.1301
## F-statistic: 441.9 on 1 and 2948 DF, p-value: < 2.2e-16
With this regression, we estimate that one additional year of education is associated with an increase of 1.46693, on average, in an individual's average hourly earnings over the year.
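A quick arithmetic check of this interpretation, using the two estimates printed above (typed in by hand, so rounded to five decimals): the fitted line's predictions for two people whose education differs by one year differ by exactly the slope. The helper `predict_earnings` is a hypothetical name used only for illustration.

```r
# Coefficients copied by hand from summary(regout) above
b0 <- -3.13437   # intercept
b1 <- 1.46693    # slope on education

# Hypothetical helper: fitted average hourly earnings for a given education level
predict_earnings <- function(educ) b0 + b1 * educ

predict_earnings(13) - predict_earnings(12)  # one extra year: equals b1
predict_earnings(16) - predict_earnings(12)  # four extra years: equals 4 * b1
```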
Sigma <- vcov(regout)
Sigma  # covariance matrix of the estimated coefficients
## (Intercept) education
## (Intercept) 0.92016882 -0.06598445
## education -0.06598445 0.00486964
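Under the default homoskedastic assumption, `vcov()` on an `lm` fit returns s^2 (X'X)^(-1), where s is the residual standard error, and the square roots of its diagonal are the `Std. Error` column of `summary()`. A minimal sketch on simulated data (the variables `x`, `y` and their parameters are made up for illustration, not taken from CPSSWEducation):

```r
set.seed(1)
n_obs <- 200
x <- rnorm(n_obs, mean = 13, sd = 2.5)      # hypothetical "years of education"
y <- -3 + 1.5 * x + rnorm(n_obs, sd = 8)    # hypothetical "earnings"
fit <- lm(y ~ x)

X <- model.matrix(fit)                            # design matrix: intercept column and x
s2 <- sum(residuals(fit)^2) / df.residual(fit)    # s^2 = SSE / (n - 2)
Sigma_manual <- s2 * solve(crossprod(X))          # s^2 (X'X)^(-1)

all.equal(Sigma_manual, vcov(fit))   # same matrix as vcov()
sqrt(diag(Sigma_manual))             # the standard errors reported by summary()
```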
coef <- regout$coefficients                 # estimated intercept and slope
n <- nrow(CPSSWEducation)                   # sample size
t <- (coef[[2]] - 0) / sqrt(Sigma[2, 2])    # t-statistic for H0: beta1 = 0
t
## [1] 21.0213
pnorm(t)      # left-tail p-value, P(Z <= t)
## [1] 1
qnorm(0.01)   # critical value for a left-tailed test at alpha = 0.01
## [1] -2.326348
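The critical value for the left-tailed test at alpha = 0.01 is qnorm(0.01) = -2.326348. The test statistic t = 21.0213 is far above it, so it does not fall in the rejection region. Equivalently, the left-tail p-value pnorm(t) is essentially 1, which is larger than alpha = 0.01. We fail to reject the null hypothesis beta1 = 0 in favor of the alternative beta1 < 0.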
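The decision rule above can be wrapped in a small helper. This is a sketch that assumes, as the code above does, a left-tailed test of H0: beta1 = 0 against H1: beta1 < 0 with a normal (large-sample) critical value; `left_tail_test` is a hypothetical name, and the estimate and variance are copied by hand from the output above.

```r
# Hypothetical helper: left-tailed z-test of H0: beta = beta0 vs H1: beta < beta0
left_tail_test <- function(estimate, variance, beta0 = 0, alpha = 0.01) {
  t_stat   <- (estimate - beta0) / sqrt(variance)  # standardized test statistic
  p_value  <- pnorm(t_stat)                        # P(Z <= t_stat) under H0
  critical <- qnorm(alpha)                         # reject H0 if t_stat < critical
  list(t = t_stat, p = p_value, critical = critical,
       reject = t_stat < critical)                 # same decision as p_value < alpha
}

res <- left_tail_test(estimate = 1.46693, variance = 0.00486964)
res$t       # about 21.02, matching the t value above
res$reject  # FALSE: fail to reject H0
```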
coef[[2]] + qnorm(0.99) * sqrt(Sigma[2, 2])  # upper bound of the one-sided 99% CI
## [1] 1.629265
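Our one-sided 99% confidence interval for beta1 is (-∞, 1.629265). The 99% describes the procedure: in repeated samples, 99% of intervals constructed this way would contain the true beta1. This interval contains 0, which is consistent with failing to reject H0 above.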
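The repeated-sampling meaning of "99%" can be illustrated with a small simulation. This is a sketch with made-up parameters (true slope 1.5, normal homoskedastic errors, 200 observations per sample); none of these values come from CPSSWEducation.

```r
set.seed(42)
true_beta1 <- 1.5    # hypothetical true slope
n_obs <- 200         # observations per simulated sample
n_reps <- 2000       # number of simulated samples

covered <- replicate(n_reps, {
  x <- rnorm(n_obs, mean = 13, sd = 2.5)
  y <- -3 + true_beta1 * x + rnorm(n_obs, sd = 8)
  fit <- lm(y ~ x)
  # Upper bound of the one-sided 99% interval (-Inf, b1 + z_0.99 * SE)
  upper <- coef(fit)[[2]] + qnorm(0.99) * sqrt(vcov(fit)[2, 2])
  true_beta1 < upper   # does this sample's interval contain the true slope?
})

mean(covered)  # share of intervals covering true_beta1; should be close to 0.99
```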
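The total sum of squares decomposes as SST = SSR + SSE, where SSR is the explained (regression) sum of squares and SSE is the residual sum of squares. Dividing each term by n - 1 gives var(Y) = var(fitted values) + var(residuals), which we check: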
var(CPSSWEducation$earnings)
## [1] 88.39904
var(regout$fitted.values)
## [1] 11.5234
var(regout$residuals)
## [1] 76.87564
var(regout$fitted.values)+var(regout$residuals)
## [1] 88.39904
# R^2 = SSR / SST = var(fitted values) / var(earnings):
var(regout$fitted.values)/var(CPSSWEducation$earnings)
## [1] 0.1303566
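The variance identity above is the sum-of-squares decomposition divided by n - 1. A sketch on simulated data (made-up variables, not CPSSWEducation) checks SST = SSR + SSE directly, using the same convention as above (SSR explained, SSE residual), together with R^2 = SSR / SST:

```r
set.seed(7)
x <- rnorm(150, mean = 13, sd = 2.5)
y <- -3 + 1.5 * x + rnorm(150, sd = 8)
fit <- lm(y ~ x)

SST <- sum((y - mean(y))^2)              # total sum of squares
SSR <- sum((fitted(fit) - mean(y))^2)    # explained (regression) sum of squares
SSE <- sum(residuals(fit)^2)             # residual (error) sum of squares

all.equal(SST, SSR + SSE)                     # decomposition holds with an intercept
all.equal(SSR / SST, summary(fit)$r.squared)  # R^2 = SSR / SST
```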
We interpret R^2 as follows: about 13.04% (R^2 = 0.1303566) of the sample variance in Y (an individual's average hourly earnings over the year) is explained by the linear regression on X (years of education).
correlation <- cor(CPSSWEducation$earnings, CPSSWEducation$education)
correlation
## [1] 0.3610493
R^2 equals the square of the correlation between earnings and education: (0.3610493)^2 ≈ 0.1303566, which matches the R^2 computed above (reported as 0.1304 by summary()).
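This equality relies on there being a single regressor. With several regressors (and an intercept), R^2 instead equals the squared correlation between Y and the fitted values. A sketch on simulated data (made-up variables `x`, `z`, `y`):

```r
set.seed(3)
x <- rnorm(100)
z <- rnorm(100)
y <- 2 + 0.5 * x + 0.8 * z + rnorm(100)

fit1 <- lm(y ~ x)       # one regressor
fit2 <- lm(y ~ x + z)   # two regressors

# Simple regression: R^2 equals the squared correlation of y and x
all.equal(summary(fit1)$r.squared, cor(x, y)^2)
# Several regressors: R^2 equals the squared correlation of y and the fitted values
all.equal(summary(fit2)$r.squared, cor(y, unname(fitted(fit2)))^2)
```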
regout_log <- lm(log(earnings) ~ education, data = CPSSWEducation)
summary(regout_log)
##
## Call:
## lm(formula = log(earnings) ~ education, data = CPSSWEducation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.84957 -0.28552 0.00672 0.28453 1.88353
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.482326 0.051449 28.81 <2e-16 ***
## education 0.088880 0.003743 23.75 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4703 on 2948 degrees of freedom
## Multiple R-squared: 0.1606, Adjusted R-squared: 0.1603
## F-statistic: 563.9 on 1 and 2948 DF, p-value: < 2.2e-16
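We interpret gamma1 in the log model as follows: each additional year of education (X) is associated with an increase of approximately 8.888% (100 × 0.088880) in an individual's average hourly earnings. Reading 100 × gamma1 as a percentage change is an approximation that works well for small coefficients; the exact implied change is 100 × (exp(0.08888) - 1) ≈ 9.29%.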
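The gap between the approximate and exact percentage changes can be computed directly from the slope printed above (copied by hand):

```r
gamma1 <- 0.088880   # slope on education from summary(regout_log)

approx_pct <- 100 * gamma1              # log-point approximation: about 8.89%
exact_pct  <- 100 * (exp(gamma1) - 1)   # exact implied change: about 9.29%

c(approx = approx_pct, exact = exact_pct)
```

The two readings diverge more as the coefficient grows, which is why the exact form is preferred for large slopes.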