(a)

library(AER)
data("CPSSWEducation")
CPSSWEducation[1:6,]
##   age gender earnings education
## 1  30   male 34.61538        16
## 2  30 female 19.23077        16
## 3  30 female 13.73626        12
## 4  30 female 13.94231        13
## 5  30 female 19.23077        16
## 6  30 female  8.00000        12
regout <- lm(earnings ~ education, data = CPSSWEducation)
summary(regout)
## 
## Call:
## lm(formula = earnings ~ education, data = CPSSWEducation)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.270  -5.355  -1.513   3.194  77.164 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.13437    0.95925  -3.268   0.0011 ** 
## education    1.46693    0.06978  21.021   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.769 on 2948 degrees of freedom
## Multiple R-squared:  0.1304, Adjusted R-squared:  0.1301 
## F-statistic: 441.9 on 1 and 2948 DF,  p-value: < 2.2e-16

With this regression, we estimate that each additional year of education is associated with an average increase of 1.46693 units in an individual's average hourly earnings over the year.
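A quick way to see what the slope means is to plug the reported coefficients into the fitted line. The sketch below is illustrative only: `predict_earnings` is a hypothetical helper, and `b0`/`b1` are the rounded estimates copied from `summary(regout)`, so the fitted values agree with the model only up to that rounding.

```r
# Coefficient estimates copied (rounded) from summary(regout) above
b0 <- -3.13437   # intercept
b1 <- 1.46693    # slope on education

# Hypothetical helper: fitted average hourly earnings for a given schooling level
predict_earnings <- function(educ) b0 + b1 * educ

predict_earnings(c(12, 16))                   # 14.46879 20.33651
predict_earnings(13) - predict_earnings(12)   # one extra year: equals b1
predict_earnings(16) - predict_earnings(12)   # four extra years: equals 4 * b1
```

With `regout` in the session, `predict(regout, newdata = data.frame(education = c(12, 16)))` should give the same fitted values without retyping the coefficients.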

(b)

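Sigma <- vcov(regout)
Sigma  # covariance matrix of the estimated coefficients
##             (Intercept)   education
## (Intercept)  0.92016882 -0.06598445
## education   -0.06598445  0.00486964
coef <- regout$coefficients
n <- nrow(CPSSWEducation)  # sample size (2950), large enough for the normal approximation
t <- (coef[[2]] - 0) / sqrt(Sigma[2, 2])  # t-statistic for H0: beta1 = 0
t
## [1] 21.0213
pnorm(t)  # left-tail p-value, P(Z <= t)
## [1] 1
qnorm(0.01)  # 1% critical value for the left-tailed test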
## [1] -2.326348

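The test statistic is t = 21.0213, while the 1% critical value for this left-tailed test (H1: beta1 < 0) is z = qnorm(0.01) = -2.326348. Since t lies far above the critical value, it is not in the rejection region; equivalently, the p-value pnorm(t) ≈ 1 is larger than alpha = 0.01. We fail to reject the null hypothesis H0: beta1 = 0.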
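The direction of the alternative decides which tail the p-value and critical value come from. The sketch below wraps the large-sample z-test in a hypothetical helper, `one_sided_z_test` (not part of base R or AER), and runs it with the rounded estimate and standard error from `summary(regout)`; because the standard error is rounded, `z` comes out near 21.022 rather than exactly 21.0213.

```r
# Hypothetical helper: large-sample one-sided z-test of H0: beta = null
one_sided_z_test <- function(est, se, null = 0, alpha = 0.01,
                             alternative = c("less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (est - null) / se
  if (alternative == "less") {
    p_value <- pnorm(z)                       # P(Z <= z)
    crit <- qnorm(alpha)                      # reject if z < crit
    reject <- z < crit
  } else {
    p_value <- pnorm(z, lower.tail = FALSE)   # P(Z >= z)
    crit <- qnorm(1 - alpha)                  # reject if z > crit
    reject <- z > crit
  }
  list(z = z, p_value = p_value, crit = crit, reject = reject)
}

# Slope estimate and standard error as reported by summary(regout)
left  <- one_sided_z_test(1.46693, 0.06978, alternative = "less")
right <- one_sided_z_test(1.46693, 0.06978, alternative = "greater")
left$reject   # FALSE: matches the conclusion above
right$reject  # TRUE: the opposite alternative would be rejected decisively
```

With `alternative = "less"` the helper reproduces the calculation in (b): `pnorm(z)` is the p-value and `qnorm(alpha)` is the cutoff.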

(c)

coef[[2]] + qnorm(0.99) * sqrt(Sigma[2, 2])  # upper bound of the one-sided 99% CI
## [1] 1.629265

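Our one-sided 99% confidence interval is (-∞, 1.629265). Interpretation: in repeated samples, 99% of intervals constructed this way would contain the true beta1, so we are 99% confident that beta1 lies in this interval. Because the interval contains 0, this agrees with failing to reject H0: beta1 = 0 in (b).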
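The one-sided interval in (c) is the set of null values that the left-tailed 1% test in (b) would not reject. The grid search below is a minimal sketch of that duality, using the rounded estimate and standard error from `summary(regout)`; the grid `b_grid` is an arbitrary illustrative choice.

```r
est <- 1.46693   # slope estimate from summary(regout)
se  <- 0.06978   # its standard error

# Upper bound of the one-sided 99% interval (-Inf, upper)
upper <- est + qnorm(0.99) * se

# Null values b that the left-tailed 1% test does NOT reject:
# (est - b) / se >= qnorm(0.01)  <=>  b <= upper
b_grid <- seq(0, 2, by = 0.01)
not_rejected <- (est - b_grid) / se >= qnorm(0.01)

upper                       # about 1.6293
max(b_grid[not_rejected])   # 1.62, the largest grid value below upper
0 <= upper                  # TRUE: beta1 = 0 lies inside the interval
```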

(d)

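SST = SSR + SSE: the total sum of squares splits into the regression (explained) sum of squares and the error (residual) sum of squares. Some textbooks write the same identity as TSS = ESS + SSR, where SSR instead denotes the sum of squared residuals. Dividing every term by n - 1 lets us check the identity with sample variances: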

var(CPSSWEducation$earnings)
## [1] 88.39904
var(regout$fitted.values)
## [1] 11.5234
var(regout$residuals)
## [1] 76.87564
var(regout$fitted.values)+var(regout$residuals)
## [1] 88.39904
# calculate R^2 = explained variance / total variance:
var(regout$fitted.values)/var(CPSSWEducation$earnings)
## [1] 0.1303566

We interpret R^2 as follows: about 13.04% (0.1303566) of the total sample variance of Y (average hourly earnings for an individual over the year) is explained by the linear regression on X (years of education).
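The identity SST = SSR + SSE holds exactly for any OLS fit that includes an intercept, because the residuals average to zero and are uncorrelated with the fitted values. The sketch below checks it on simulated data (the variables and coefficients are made up for illustration, not taken from CPSSWEducation), first with sums of squares and then in the variance form used in (d).

```r
set.seed(123)
n <- 500
educ <- round(rnorm(n, mean = 13.5, sd = 2.5))   # simulated years of education
earn <- -3 + 1.5 * educ + rnorm(n, sd = 8)       # simulated hourly earnings
fit <- lm(earn ~ educ)

SST <- sum((earn - mean(earn))^2)          # total sum of squares
SSR <- sum((fitted(fit) - mean(earn))^2)   # regression (explained) sum of squares
SSE <- sum(residuals(fit)^2)               # error (residual) sum of squares

all.equal(SST, SSR + SSE)                      # TRUE
all.equal(SSR / SST, summary(fit)$r.squared)   # TRUE
# Dividing every term by n - 1 gives the variance check used in (d)
all.equal(var(earn), var(fitted(fit)) + var(residuals(fit)))   # TRUE
```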

(e)

correlation <- cor(CPSSWEducation$earnings, CPSSWEducation$education)
correlation
## [1] 0.3610493

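We find that R^2 equals the squared correlation between earnings and education: (0.3610493)^2 = 0.1303566, which matches the R^2 of 0.1304 reported by summary(regout). This equality holds in a simple regression with an intercept.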
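The equality in (e) follows because the fitted values are a linear function of X, so cor(Y, fitted Y) = ±cor(X, Y), and R^2 equals the squared correlation between Y and its fitted values. The sketch below checks both facts on simulated data; the data-generating numbers are arbitrary and not estimated from CPSSWEducation.

```r
set.seed(42)
x <- rnorm(300, mean = 13.5, sd = 2.5)   # simulated regressor
y <- 2 + 0.8 * x + rnorm(300, sd = 3)    # simulated outcome
fit <- lm(y ~ x)

r2 <- summary(fit)$r.squared
all.equal(r2, cor(x, y)^2)             # TRUE in simple regression with an intercept
all.equal(r2, cor(y, fitted(fit))^2)   # TRUE in any OLS regression with an intercept
```

The second check is the one that generalizes: with several regressors, R^2 is no longer the squared correlation with any single X, but it is still the squared correlation between Y and the fitted values.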

(f)

regout_log <- lm(log(earnings) ~ education, data = CPSSWEducation)
summary(regout_log)
## 
## Call:
## lm(formula = log(earnings) ~ education, data = CPSSWEducation)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.84957 -0.28552  0.00672  0.28453  1.88353 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.482326   0.051449   28.81   <2e-16 ***
## education   0.088880   0.003743   23.75   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4703 on 2948 degrees of freedom
## Multiple R-squared:  0.1606, Adjusted R-squared:  0.1603 
## F-statistic: 563.9 on 1 and 2948 DF,  p-value: < 2.2e-16

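We interpret gamma 1 in the log model as follows: each additional year of education (X) is associated with an increase of approximately 8.888% (100 × 0.088880) in earnings for an individual. This reading uses the approximation log(1 + g) ≈ g; the exact change implied by the model is 100 × (exp(0.08888) - 1) ≈ 9.29%.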
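Reading 100 × gamma1 as a percentage is accurate only for small slopes. The sketch below compares the approximation with the exact change implied by the log model, using the rounded slope 0.088880 from `summary(regout_log)`; the four-year comparison is an added illustration, not part of the original question.

```r
gamma1 <- 0.088880   # slope from summary(regout_log)

approx_pct <- 100 * gamma1              # log-point approximation: 8.888
exact_pct  <- 100 * (exp(gamma1) - 1)   # exact % change in exp(fitted log earnings)
round(c(approx_pct, exact_pct), 2)      # 8.89 9.29

# The gap widens for larger changes in X, e.g. four extra years of schooling
approx4_pct <- 100 * 4 * gamma1             # about 35.6
exact4_pct  <- 100 * (exp(4 * gamma1) - 1)  # about 42.7
```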