ECON2206 Assignment2 2018 2nd Sem
(Problem1) * Originally, STATA assignment * Adpated to R

The file beauty.dta collects information on wages hourly wage, years of experience and gender for 1; 260 individuals. We also have information on a subjective index of beauty (looks) measured from 1 to 5.

\[ lwage = \beta_0 + exper + exper^2 + \beta_3 looks + \beta_4female + u \ (1) \]

  1. Required packages
library(tidyverse) #data wrangling
library(foreign)   #read.dta()
library(modelr)    #modelling
library(car)       #linearHypothesis()
library(MASS)      #robust regression
  1. Estimate the parameter of the econometric model in equation (1). What is the estimated effect of experience on wages?

Importing STATA data file(.dta) into R
1) open STATA
2) open the corresponding data file(.dta)
3) type “saveold (file_name).dta, version(12)”
4) import the saved file into R using read.dta function

beauty <- read.dta("C:/Users/Kang/Desktop/UNSW/S2/ECON2206/STATA and Assignments/beauty(1).dta")
colnames(beauty)
##  [1] "wage"     "lwage"    "belavg"   "abvavg"   "exper"    "looks"   
##  [7] "union"    "goodhlth" "black"    "female"   "married"  "south"   
## [13] "bigcity"  "smllcity" "service"  "expersq"  "educ"
beauty %>% head()
##    wage    lwage belavg abvavg exper looks union goodhlth black female
## 1  5.73 1.745715      0      1    30     4     0        1     0      1
## 2  4.28 1.453953      0      0    28     3     0        1     0      1
## 3  7.96 2.074429      0      1    35     4     0        1     0      1
## 4 11.57 2.448416      0      0    38     3     0        1     0      0
## 5 11.42 2.435366      0      0    27     3     0        1     0      0
## 6  3.91 1.363537      0      0    20     3     0        0     0      1
##   married south bigcity smllcity service expersq educ
## 1       1     0       0        1       1     900   14
## 2       1     1       0        1       0     784   12
## 3       0     0       0        1       0    1225   10
## 4       1     0       1        0       1    1444   16
## 5       1     0       0        1       0     729   16
## 6       1     0       1        0       0     400   12

A. estimated parameters

mod1 <- lm(lwage ~ exper + expersq + looks + female, data = beauty)
mod1
## 
## Call:
## lm(formula = lwage ~ exper + expersq + looks + female, data = beauty)
## 
## Coefficients:
## (Intercept)        exper      expersq        looks       female  
##   1.0783855    0.0462051   -0.0008054    0.0887511   -0.4664837

B. estimated effect of experience on avg

impact_exper <- function(current_exper) {
  coef(mod1)[2] + 2*coef(mod1)[3] * current_exper #coef(mod1)[2]: exper regressor
}                                                 #coef(mod1)[3]: expersq regressor

impact_exper(5) #ex) estimated effect on log(wage) of one more yr of exper for a person with 5 yrs of exper
##      exper 
## 0.03815152
  1. What can we say about the effect of an increase of one point in the looks scale on wages?
coef(mod1)[4]
##      looks 
## 0.08875113
  1. Is there any evidence of functional form misspecification for equation (1)?

\[ \text{RESET test} \\ \begin{align} &lwage = \gamma_1exper+\gamma_2exper^2 + \gamma_3looks + \gamma_4female + \gamma_5fitted^2 + \gamma_6fitted^3 \ \text{(unrestricted model)} \\ &lwage = \beta_0 + exper + exper^2 + \beta_3 looks + \beta_4female + u \ (1) \ \text{(restricted model)} \\ \end{align} \\ \] \[ \begin{align} &H_0: \gamma_5 = \gamma_6 = 0 \ &\text{(no misspecification)}\\ &H_1: \text{~} H_0 \ &\text{(misspecifcation)} \end{align} \]

1) create fitted^2, fitted^3 values (column)

beauty$fitted1 <- predict(mod1)
beauty$fitted2 <- predict(mod1)^2
beauty$fitted3 <- predict(mod1)^3

2) run the unrestricted model

mod1_1 <- lm(lwage ~ exper + expersq + looks + female + fitted2 + fitted3, data = beauty)

3) joint F-test

linearHypothesis(mod1_1, c("fitted2=0", "fitted3=0")) #H0: cannot be rejected
## Linear hypothesis test
## 
## Hypothesis:
## fitted2 = 0
## fitted3 = 0
## 
## Model 1: restricted model
## Model 2: lwage ~ exper + expersq + looks + female + fitted2 + fitted3
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1   1255 321.21                           
## 2   1253 320.39  2   0.81863 1.6008 0.2022
                                                      #No significant misspecification
  1. How would you test the hypothesis that beauty has the same effect on wages both for males and females? Provide a p-value for such test.

1) create an interaction term * between female and looks
* if this term is significant, different effects of looks on each different gender confirmed

beauty <-
  beauty %>% 
  mutate(femlooks = female*looks)

2) model estimates + hypothesis testing

mod3 <- lm(lwage ~ exper + expersq + looks + female + femlooks, data = beauty)
summary(mod3) 
## 
## Call:
## lm(formula = lwage ~ exper + expersq + looks + female + femlooks, 
##     data = beauty)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9626 -0.3068  0.0111  0.3068  3.0112 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.1347492  0.0977813  11.605  < 2e-16 ***
## exper        0.0466609  0.0046682   9.995  < 2e-16 ***
## expersq     -0.0008166  0.0001039  -7.862 8.12e-15 ***
## looks        0.0700875  0.0267244   2.623  0.00883 ** 
## female      -0.6228310  0.1410797  -4.415 1.10e-05 ***
## femlooks     0.0489631  0.0431046   1.136  0.25621    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5059 on 1254 degrees of freedom
## Multiple R-squared:  0.2789, Adjusted R-squared:  0.276 
## F-statistic: 96.99 on 5 and 1254 DF,  p-value: < 2.2e-16
#not significant (p-value: 25.6%)
  1. Is there any evidence of heteroskedasticity in equation (1)? Provide an heteroskedas- ticity robust estimate of the parameters in equation (1).

A. BP test

[ \[\begin{align} \text{Theoratical background}\\ &Var(u|X) = E(u^2|X) - [E(u|X)]^2 = E(u^2|X) \ \text{(zero conditional mean)} \\ \text{In case of homoskedasticity,} \\ &Var(u|X) = Var(u) = \sigma^2 \\ \text{and accordingly} \\ &E(u^2|X) = E(u^2) = \sigma^2 \\ \text{Therefore,} \\ &H_0: u^2\ \text{has no linear relationship with respect to x (homoskedasticity)} \\ &H_1: \text{~} H_0 \end{align}\] ]

beauty$resid <- resid(mod1)
beauty$resid2 <- resid(mod1)^2

hetero1 <- lm(resid2 ~ exper + expersq + looks + female, data = beauty)
summary(hetero1)
## 
## Call:
## lm(formula = resid2 ~ exper + expersq + looks + female, data = beauty)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3082 -0.2241 -0.1507  0.0353  8.9580 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  1.218e-01  7.911e-02   1.539  0.12399   
## exper        1.133e-02  4.367e-03   2.594  0.00961 **
## expersq     -2.250e-04  9.709e-05  -2.317  0.02064 * 
## looks        8.844e-03  1.979e-02   0.447  0.65505   
## female       1.595e-02  2.907e-02   0.549  0.58338   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.475 on 1255 degrees of freedom
## Multiple R-squared:  0.005741,   Adjusted R-squared:  0.002572 
## F-statistic: 1.812 on 4 and 1255 DF,  p-value: 0.1242
#no heteroskedasticity by BP test (F-statistic)

B. White test

\[ u^2 = \delta_0 + \delta_1\hat{y} + \delta_2\hat{y}^2 + \epsilon \\ H_0: \delta_1 = \delta_2 =0 \]

hetero2 <- lm(resid2 ~ fitted1 + fitted2, data = beauty)
summary(hetero2) 
## 
## Call:
## lm(formula = resid2 ~ fitted1 + fitted2, data = beauty)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2994 -0.2261 -0.1586  0.0351  8.9843 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.4714     0.3467   1.360    0.174
## fitted1      -0.3549     0.4521  -0.785    0.433
## fitted2       0.1306     0.1430   0.913    0.361
## 
## Residual standard error: 0.4755 on 1257 degrees of freedom
## Multiple R-squared:  0.002035,   Adjusted R-squared:  0.000447 
## F-statistic: 1.282 on 2 and 1257 DF,  p-value: 0.278
#no heteroskedasticity by White test (F-statistic)

C. Robust Regression

mod4 <- rlm(lwage ~ exper + expersq + looks + female, data = beauty)
  1. Bonus: Residual Analysis
par(mfrow = c(2,2))
plot(mod1) #No significant problem observed

par(mfrow = c(2,2))
plot(mod4) #robust model