103 HW2 Computer Exercise

(a)

library(AER)

## Loading required package: car

## Loading required package: carData

## Loading required package: lmtest

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: sandwich

## Loading required package: survival

data("CPSSWEducation")
CPSSWEducation[1:6,]

##   age gender earnings education
## 1  30   male 34.61538        16
## 2  30 female 19.23077        16
## 3  30 female 13.73626        12
## 4  30 female 13.94231        13
## 5  30 female 19.23077        16
## 6  30 female  8.00000        12

regout <- lm(earnings ~ education, data = CPSSWEducation)
summary(regout)

## 
## Call:
## lm(formula = earnings ~ education, data = CPSSWEducation)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.270  -5.355  -1.513   3.194  77.164 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.13437    0.95925  -3.268   0.0011 ** 
## education    1.46693    0.06978  21.021   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.769 on 2948 degrees of freedom
## Multiple R-squared:  0.1304, Adjusted R-squared:  0.1301 
## F-statistic: 441.9 on 1 and 2948 DF,  p-value: < 2.2e-16

With this regression, we estimate that one additional year of education is associated with an increase of 1.46693 dollars, on average, in an individual's average hourly earnings over the year.
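To make the slope interpretation concrete, here is a minimal sketch that plugs the two estimated coefficients into the fitted line. The coefficients are copied from the summary output above; the helper `predict_earnings` and the education levels 12 and 16 are illustrative assumptions, not part of the original exercise.

```r
# Coefficients copied from the summary(regout) output above
b0 <- -3.13437   # intercept
b1 <- 1.46693    # slope on education

# Hypothetical helper (not in the original script):
# fitted average hourly earnings at a given number of years of education
predict_earnings <- function(educ) b0 + b1 * educ

pred12 <- predict_earnings(12)   # e.g. 12 years of education
pred16 <- predict_earnings(16)   # e.g. 16 years of education

# The gap between the two predictions is just 4 * b1,
# because the fitted line is linear in education
gap <- pred16 - pred12
print(c(pred12 = pred12, pred16 = pred16, gap = gap))
```

With the actual fitted object, `predict(regout, newdata = data.frame(education = c(12, 16)))` should give the same fitted values up to rounding of the coefficients.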

(b)

Sigma <- vcov(regout)
Sigma # view the covariance matrix of the estimated coefficients

##             (Intercept)   education
## (Intercept)  0.92016882 -0.06598445
## education   -0.06598445  0.00486964

coef <- regout$coefficients
n <- nrow(CPSSWEducation)
t <- (coef[[2]] - 0) / sqrt(Sigma[2, 2]) # test statistic for H0: beta1 = 0
t

## [1] 21.0213

pnorm(t)

## [1] 1

qnorm(0.01)

## [1] -2.326348

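For a left-tailed test at alpha=0.01, the critical value is z=-2.326348. Our test statistic t*=21.0213 is larger than z, so it does not fall in the rejection region. Equivalently, the left-tail p-value is pnorm(t)=1 (to the displayed precision), which is larger than alpha=0.01. We fail to reject the null hypothesis.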
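The two decision rules used above (critical value and p-value) can be sketched side by side. This is a self-contained illustration that assumes a left-tailed test at the 1% level, as implied by the use of qnorm(0.01) and pnorm(t); the slope and its variance are copied from the output above, and the variable names are ours.

```r
# Values copied from the regression output and vcov(regout) above
b1    <- 1.46693            # estimated slope
se_b1 <- sqrt(0.00486964)   # standard error of the slope
alpha <- 0.01

# Test statistic for H0: beta1 = 0
t_stat <- (b1 - 0) / se_b1

# Rule 1: reject if the statistic falls below the left-tail critical value
z_crit <- qnorm(alpha)
reject_by_critical_value <- t_stat < z_crit

# Rule 2: reject if the left-tail p-value is below alpha
p_left <- pnorm(t_stat)
reject_by_p_value <- p_left < alpha

print(c(t_stat = t_stat, z_crit = z_crit, p_left = p_left))
print(c(reject_by_critical_value, reject_by_p_value))
```

Both rules necessarily agree, since t_stat < qnorm(alpha) holds exactly when pnorm(t_stat) < alpha.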

(c)

c(
  coef[[2]]+qnorm(0.99)*sqrt(Sigma[2,2]) #upperbound
)

## [1] 1.629265

Our one-sided 99% confidence interval for beta1 is (-∞, 1.629265). We interpret this as: we are 99% confident that the true value of beta1 lies in this interval, that is, below 1.629265.
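As a cross-check, the upper bound can be recomputed from the reported numbers, and the interval can be compared with the hypothesized value 0. This is a sketch with values copied from the output above; reading the interval as the counterpart of a left-tailed test at the 1% level is our assumption.

```r
# Values copied from the regression output and vcov(regout) above
b1    <- 1.46693
se_b1 <- sqrt(0.00486964)

# Upper bound of the one-sided 99% interval (-Inf, upper)
upper <- b1 + qnorm(0.99) * se_b1

# The hypothesized value 0 lies inside (-Inf, upper), which matches
# failing to reject H0: beta1 = 0 against a left-sided alternative
null_value_inside <- 0 < upper

print(upper)
print(null_value_inside)
```

The recomputed bound differs from 1.629265 only in the last digits, because the inputs here are rounded.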

(d)

SST=SSR+SSE

var(CPSSWEducation$earnings)

## [1] 88.39904

var(regout$fitted.values)

## [1] 11.5234

var(regout$residuals)

## [1] 76.87564

var(regout$fitted.values)+var(regout$residuals)

## [1] 88.39904

#calculate R^2:
var(regout$fitted.values)/var(CPSSWEducation$earnings)

## [1] 0.1303566

summary(regout)

## 
## Call:
## lm(formula = earnings ~ education, data = CPSSWEducation)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.270  -5.355  -1.513   3.194  77.164 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.13437    0.95925  -3.268   0.0011 ** 
## education    1.46693    0.06978  21.021   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.769 on 2948 degrees of freedom
## Multiple R-squared:  0.1304, Adjusted R-squared:  0.1301 
## F-statistic: 441.9 on 1 and 2948 DF,  p-value: < 2.2e-16

We interpret R^2 as: 13.03566% of the total variance in Y (average hourly earnings for an individual over the year) is explained by the linear model with X (years of education).
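The variance decomposition behind this R^2 holds for any OLS fit with an intercept, which can be checked on simulated data without loading AER. The data-generating process below (seed, sample size, coefficients) is made up purely for illustration.

```r
# Simulated data, for illustration only (not the CPS sample)
set.seed(1)
x <- rnorm(200)
y <- 2 + 0.5 * x + rnorm(200)

fit <- lm(y ~ x)

total       <- var(y)             # total variance of the outcome
explained   <- var(fitted(fit))   # variance of the fitted values
unexplained <- var(resid(fit))    # variance of the residuals

# With an intercept, total variance splits into explained + unexplained
print(all.equal(total, explained + unexplained))

# R^2 as the explained share, compared with the value lm() reports
r2_from_variances <- explained / total
r2_from_summary   <- summary(fit)$r.squared
print(c(r2_from_variances, r2_from_summary))
```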

(e)

correlation <- cor(CPSSWEducation$earnings, CPSSWEducation$education)
correlation

## [1] 0.3610493

We find that R^2 equals the square of the correlation: (0.3610493)^2 = 0.1303566, approximately 0.1304.
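This equality is not a coincidence of the data. A short derivation for simple regression with an intercept is sketched below; the notation (sample variances s_x^2 and s_y^2, sample covariance s_{xy}, sample correlation r_{xy}) is ours, not the assignment's.

```latex
% OLS slope in simple regression
\hat\beta_1 = \frac{s_{xy}}{s_x^2}

% Fitted values are a linear function of x, so their sample variance is
\widehat{\operatorname{Var}}(\hat y)
  = \hat\beta_1^{2}\, s_x^{2}
  = \frac{s_{xy}^{2}}{s_x^{2}}

% R^2 is the explained share of the variance of y
R^2 = \frac{\widehat{\operatorname{Var}}(\hat y)}{s_y^{2}}
    = \frac{s_{xy}^{2}}{s_x^{2}\, s_y^{2}}
    = r_{xy}^{2}
```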

(f)

regout_log <- lm(log(earnings) ~ education, data = CPSSWEducation)
summary(regout_log)

## 
## Call:
## lm(formula = log(earnings) ~ education, data = CPSSWEducation)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.84957 -0.28552  0.00672  0.28453  1.88353 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.482326   0.051449   28.81   <2e-16 ***
## education   0.088880   0.003743   23.75   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4703 on 2948 degrees of freedom
## Multiple R-squared:  0.1606, Adjusted R-squared:  0.1603 
## F-statistic: 563.9 on 1 and 2948 DF,  p-value: < 2.2e-16

We interpret gamma1 in the log model as follows: each additional year of education (X) is associated with an approximately 8.888% increase in an individual's earnings.
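The 8.888% figure relies on the usual approximation that a small change in log(earnings) is close to a proportional change. A minimal sketch of the exact percentage change implied by the same coefficient follows; the coefficient is copied from the output above and the variable names are ours.

```r
# Slope from the log(earnings) regression output above
g1 <- 0.088880

# Approximate percentage change per extra year of education: 100 * g1
approx_pct <- 100 * g1

# Exact percentage change implied by the log-linear model: 100 * (exp(g1) - 1)
exact_pct <- 100 * (exp(g1) - 1)

print(c(approx_pct = approx_pct, exact_pct = exact_pct))
```

The exact figure is slightly larger than the approximation (roughly 9.3% versus 8.9%), and the gap grows with the size of the coefficient.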

Manshu Huang
