This exercise is based on Stock, J.H. and Watson, M.W., 2015. Introduction to econometrics
#---set working directory and load packages
setwd("D:/CASED2021/Exercise/Chapter7")
#setwd("H:\\My Drive\\Econometrics_CASED2021\\Data\\Earnings_and_Height")
# install.packages('car')
require(car)
require(readxl)
require(dplyr)
#---reading the data
data <- read_excel("Earnings_and_Height.xlsx")
Dat: If the omitted factor ‘cognitive ability’ affects ‘earnings’ and is correlated with ‘height’, the first least squares assumption (LSA1) is violated. The OLS estimator of the coefficient on ‘height’ is then biased, because it also captures part of the effect of ‘cognitive ability’. The strength and direction of the bias depend on the correlation between ‘cognitive ability’ and ‘height’ and on the sign of the effect of ‘cognitive ability’ on ‘earnings’. The correlation is likely positive, and cognitive ability presumably raises earnings, so the coefficient on ‘height’ is biased upward: the estimated slope on ‘height’ is too large.
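The bias argument above can be illustrated with a small simulation. This is only a sketch on made-up data: the variable `ability`, the sample size, and all coefficients are assumptions chosen for illustration, not values from the exercise data set. It compares the slope on height with and without the omitted factor, and checks the in-sample identity "short slope = long slope + (coefficient on the omitted variable) x (slope from regressing the omitted variable on height)".

```r
# Simulated illustration of omitted variable bias (all numbers are hypothetical)
set.seed(1)
n <- 10000
ability  <- rnorm(n)                         # hypothetical omitted factor
height   <- 67 + 0.5 * ability + rnorm(n)    # positively correlated with ability
earnings <- 20000 + 300 * height + 5000 * ability + rnorm(n, sd = 1000)

short <- lm(earnings ~ height)               # omits ability
long  <- lm(earnings ~ height + ability)     # controls for ability

b_short <- unname(coef(short)["height"])
b_long  <- unname(coef(long)["height"])
gamma   <- unname(coef(long)["ability"])

# Auxiliary regression of the omitted variable on the included regressor
delta <- unname(coef(lm(ability ~ height))["height"])

# The gap between the two slopes equals gamma * delta (up to rounding error)
identity_gap <- b_short - (b_long + gamma * delta)

c(short = b_short, long = b_long)            # the short slope is the larger one
```

Because both `gamma` and `delta` are positive in this setup, the slope from the short regression overstates the effect of height, which is the direction of bias argued for above.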
Use the years of education variable (educ) to construct four indicator variables for whether a worker has less than a high school diploma (LT_HS = 1 if educ < 12, 0 otherwise), a high school diploma (HS = 1 if educ = 12, 0 otherwise), some college (Some_Col = 1 if 12 < educ < 16, 0 otherwise), or a bachelor’s degree or higher (College = 1 if educ >= 16, 0 otherwise).
###.a1 Create the variables as requested
data <- data %>% mutate(LT_HS    = ifelse(educ < 12, 1, 0),
                        HS       = ifelse(educ == 12, 1, 0),
                        Some_Col = ifelse((educ < 16) & (educ > 12), 1, 0),
                        College  = ifelse(educ >= 16, 1, 0))
table(data$LT_HS)
##
##     0     1
## 16072  1798
table(data$HS)
##
##     0     1
## 11375  6495
table(data$Some_Col)
##
##     0     1
## 13427  4443
table(data$College)
##
##     0     1
## 12736  5134
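A quick sanity check on the four indicators is that every worker falls into exactly one category. The sketch below uses a small made-up `toy` data frame (hypothetical values, not the exercise data) with the same cut-offs, and then adds up the four counts of ones reported in the tables above.

```r
# Toy years-of-education values (hypothetical, not the exercise data)
toy <- data.frame(educ = c(8, 11, 12, 13, 15, 16, 18))

toy$LT_HS    <- as.integer(toy$educ < 12)
toy$HS       <- as.integer(toy$educ == 12)
toy$Some_Col <- as.integer(toy$educ > 12 & toy$educ < 16)
toy$College  <- as.integer(toy$educ >= 16)

# Each row should belong to exactly one category
dummy_sum <- rowSums(toy[, c("LT_HS", "HS", "Some_Col", "College")])
all(dummy_sum == 1)

# The counts of ones in the four tables above should add up to the sample size
reported_ones <- c(LT_HS = 1798, HS = 6495, Some_Col = 4443, College = 5134)
sum(reported_ones)   # 17870
```

The same `rowSums()` check can be run on the real data after the `mutate()` step.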
###.a2 Create a categorical variable to compare with the dummy variables above
data <- data %>% mutate(educCAT = case_when(LT_HS == 1    ~ "LT_HS",
                                            HS == 1       ~ "HS",
                                            Some_Col == 1 ~ "Some_Col",
                                            TRUE          ~ "College"))
summary(data$educCAT)
##    Length     Class      Mode
##     17870 character character
table(data$educCAT)
##
##  College       HS    LT_HS Some_Col
##     5134     6495     1798     4443
reg1 <- lm(earnings ~ height, data = data %>% filter(sex == 0))
reg2 <- lm(earnings ~ height + LT_HS + HS + Some_Col, data = data %>% filter(sex == 0))
summary(reg1)
##
## Call:
## lm(formula = earnings ~ height, data = data %>% filter(sex ==
## 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -42748 -22006 -7466 36641 46865
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12650.9 6383.7 1.982 0.0475 *
## height 511.2 98.9 5.169 2.4e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26800 on 9972 degrees of freedom
## Multiple R-squared: 0.002672, Adjusted R-squared: 0.002572
## F-statistic: 26.72 on 1 and 9972 DF, p-value: 2.396e-07
summary(reg2)
##
## Call:
## lm(formula = earnings ~ height + LT_HS + HS + Some_Col, data = data %>%
## filter(sex == 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -54537 -19082 -5808 24386 58676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50749.52 6013.68 8.439 <2e-16 ***
## height 135.14 92.55 1.460 0.144
## LT_HS -31857.81 963.77 -33.055 <2e-16 ***
## HS -20417.89 626.18 -32.607 <2e-16 ***
## Some_Col -12649.07 685.30 -18.458 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24920 on 9969 degrees of freedom
## Multiple R-squared: 0.1382, Adjusted R-squared: 0.1378
## F-statistic: 399.6 on 4 and 9969 DF, p-value: < 2.2e-16
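#---the same model with the categorical variable educCAT in place of the three dummies
summary(lm(earnings ~ height + educCAT, data = data %>% filter(sex == 0)))
##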
## Call:
## lm(formula = earnings ~ height + educCAT, data = data %>% filter(sex ==
## 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -54537 -19082 -5808 24386 58676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50749.52 6013.68 8.439 <2e-16 ***
## height 135.14 92.55 1.460 0.144
## educCATHS -20417.89 626.18 -32.607 <2e-16 ***
## educCATLT_HS -31857.81 963.77 -33.055 <2e-16 ***
## educCATSome_Col -12649.07 685.30 -18.458 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24920 on 9969 degrees of freedom
## Multiple R-squared: 0.1382, Adjusted R-squared: 0.1378
## F-statistic: 399.6 on 4 and 9969 DF, p-value: < 2.2e-16
reg4 <- lm(earnings ~ height + HS + Some_Col + College, data = data %>% filter(sex == 0))
summary(reg4)
##
## Call:
## lm(formula = earnings ~ height + HS + Some_Col + College, data = data %>%
## filter(sex == 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -54537 -19082 -5808 24386 58676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18891.71 5952.00 3.174 0.00151 **
## height 135.14 92.55 1.460 0.14425
## HS 11439.92 927.55 12.334 < 2e-16 ***
## Some_Col 19208.74 971.36 19.775 < 2e-16 ***
## College 31857.81 963.77 33.055 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24920 on 9969 degrees of freedom
## Multiple R-squared: 0.1382, Adjusted R-squared: 0.1378
## F-statistic: 399.6 on 4 and 9969 DF, p-value: < 2.2e-16
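The regression with College as the omitted category and the one with LT_HS as the omitted category are the same model written relative to different baselines, which is why the height coefficient, residuals, and R-squared are identical. As a worked check, the sketch below takes the rounded coefficients reported above for the College-baseline model and re-expresses them relative to LT_HS. The vector and variable names are only illustrative.

```r
# Coefficients reported above for the model with College as the omitted category
b_college_base <- c(Intercept = 50749.52, LT_HS = -31857.81,
                    HS = -20417.89, Some_Col = -12649.07)

# Re-express them relative to LT_HS as the omitted category
intercept_lths  <- b_college_base[["Intercept"]] + b_college_base[["LT_HS"]]
hs_vs_lths      <- b_college_base[["HS"]]        - b_college_base[["LT_HS"]]
somecol_vs_lths <- b_college_base[["Some_Col"]]  - b_college_base[["LT_HS"]]
college_vs_lths <- -b_college_base[["LT_HS"]]

rebased <- round(c(Intercept = intercept_lths, HS = hs_vs_lths,
                   Some_Col = somecol_vs_lths, College = college_vs_lths), 2)
rebased   # compare with the estimates in the LT_HS-baseline output above
```

The four values agree with the intercept and the HS, Some_Col, and College coefficients in the summary of reg4.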
data$educCAT <- as.factor(data$educCAT)
data$educCAT <- relevel(data$educCAT, ref = "LT_HS")
#data <- within(data, educCAT <- relevel(educCAT, ref = "LT_HS"))
reg5 <- lm(earnings ~ height + educCAT, data = data %>% filter(sex == 0))
summary(reg5)
##
## Call:
## lm(formula = earnings ~ height + educCAT, data = data %>% filter(sex ==
## 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -54537 -19082 -5808 24386 58676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18891.71 5952.00 3.174 0.00151 **
## height 135.14 92.55 1.460 0.14425
## educCATCollege 31857.81 963.77 33.055 < 2e-16 ***
## educCATHS 11439.92 927.55 12.334 < 2e-16 ***
## educCATSome_Col 19208.74 971.36 19.775 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24920 on 9969 degrees of freedom
## Multiple R-squared: 0.1382, Adjusted R-squared: 0.1378
## F-statistic: 399.6 on 4 and 9969 DF, p-value: < 2.2e-16
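What `relevel()` does to a factor can be seen on a tiny example. The factor `f` below is made up for illustration. By default the levels are sorted alphabetically, so "College" comes first and becomes the baseline in `lm()`; `relevel()` moves the chosen level to the front without changing the data.

```r
# Hypothetical factor with the same four labels as educCAT
f <- factor(c("College", "HS", "LT_HS", "Some_Col", "HS"))
levels(f)                         # "College" "HS" "LT_HS" "Some_Col"

f2 <- relevel(f, ref = "LT_HS")
levels(f2)                        # "LT_HS" "College" "HS" "Some_Col"

# The underlying values are unchanged, only the level order differs
identical(as.character(f), as.character(f2))
```

Since `lm()` treats the first level as the omitted category, this reordering is all that is needed to change the baseline.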
reg6 <- lm(earnings ~ height + height^2, data = data)
# Inside a formula, ^ is a formula operator, so height^2 reduces to height and
# this is the same as: reg6 <- lm(earnings ~ height, data = data)
summary(reg6)
##
## Call:
## lm(formula = earnings ~ height + height^2, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -47836 -21879 -7976 34323 50599
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -512.73 3386.86 -0.151 0.88
## height 707.67 50.49 14.016 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26780 on 17868 degrees of freedom
## Multiple R-squared: 0.01088, Adjusted R-squared: 0.01082
## F-statistic: 196.5 on 1 and 17868 DF, p-value: < 2.2e-16
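# Wrapping the square in I() makes height^2 an actual regressor
summary(lm(earnings ~ height + I(height^2), data = data))
##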
## Call:
## lm(formula = earnings ~ height + I(height^2), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -47918 -21892 -7947 34326 49942
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7839.609 47084.795 0.166 0.868
## height 458.416 1402.399 0.327 0.744
## I(height^2) 1.853 10.419 0.178 0.859
##
## Residual standard error: 26780 on 17867 degrees of freedom
## Multiple R-squared: 0.01088, Adjusted R-squared: 0.01077
## F-statistic: 98.24 on 2 and 17867 DF, p-value: < 2.2e-16
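The difference between the two outputs above comes from how R reads `^` inside a formula. The sketch below reproduces it on simulated data. The variables `x` and `y` and all numbers are made up; only the behaviour of the formula operators matters here.

```r
set.seed(42)
x <- runif(200, 60, 75)
y <- 1000 + 50 * x + 2 * x^2 + rnorm(200, sd = 100)

m_plain  <- lm(y ~ x + x^2)      # ^ is a formula operator here, so x^2 reduces to x
m_linear <- lm(y ~ x)            # plain linear model for comparison
m_quad   <- lm(y ~ x + I(x^2))   # I() protects the arithmetic, so x^2 is a regressor

names(coef(m_plain))             # "(Intercept)" "x"
names(coef(m_quad))              # "(Intercept)" "x" "I(x^2)"
all.equal(coef(m_plain), coef(m_linear))
```

An alternative with the same effect is to create the squared variable first (for example with `mutate()`) and include it by name.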
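#---interaction between height and HS, written out term by term
summary(lm(earnings ~ height + HS + height:HS, data = data %>% filter(sex == 0)))
##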
## Call:
## lm(formula = earnings ~ height + HS + height:HS, data = data %>%
## filter(sex == 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -46325 -20753 -6320 33003 46515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12232.8 7894.5 1.550 0.1213
## height 579.4 122.1 4.743 2.13e-06 ***
## HS 13774.4 12996.4 1.060 0.2892
## height:HS -377.1 201.5 -1.871 0.0613 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26310 on 9970 degrees of freedom
## Multiple R-squared: 0.03916, Adjusted R-squared: 0.03887
## F-statistic: 135.4 on 3 and 9970 DF, p-value: < 2.2e-16
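#---the same model with the shorthand height * HS
summary(lm(earnings ~ height * HS, data = data %>% filter(sex == 0)))
##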
## Call:
## lm(formula = earnings ~ height * HS, data = data %>% filter(sex ==
## 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -46325 -20753 -6320 33003 46515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12232.8 7894.5 1.550 0.1213
## height 579.4 122.1 4.743 2.13e-06 ***
## HS 13774.4 12996.4 1.060 0.2892
## height:HS -377.1 201.5 -1.871 0.0613 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26310 on 9970 degrees of freedom
## Multiple R-squared: 0.03916, Adjusted R-squared: 0.03887
## F-statistic: 135.4 on 3 and 9970 DF, p-value: < 2.2e-16
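The two interaction outputs above are identical because `a * b` in a formula is shorthand for `a + b + a:b`. The sketch below checks this on simulated data (the variables `x`, `d`, `y` and all numbers are hypothetical) and then uses the rounded estimates reported above to compute the slope on height for each group.

```r
set.seed(7)
n <- 500
d <- rbinom(n, 1, 0.4)                 # hypothetical 0/1 indicator
x <- rnorm(n, mean = 65, sd = 3)
y <- 10000 + 500 * x + 2000 * d - 300 * x * d + rnorm(n, sd = 1000)

m_long  <- lm(y ~ x + d + x:d)         # main effects plus interaction, written out
m_short <- lm(y ~ x * d)               # shorthand for the same three terms
all.equal(coef(m_long), coef(m_short))

# Slopes on height implied by the estimates reported above
slope_not_hs <- 579.4                  # HS = 0
slope_hs     <- 579.4 + (-377.1)       # HS = 1: add the height:HS coefficient
c(not_HS = slope_not_hs, HS = slope_hs)
```

With these estimates, the slope on height for workers with exactly a high school diploma is about 202.3, against 579.4 for all other workers in the sample, although the interaction term is significant only at the 10% level.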
coef(reg1)
## (Intercept)      height
##  12650.8577    511.2222
coef(reg2)
## (Intercept)      height       LT_HS          HS    Some_Col
##  50749.5220    135.1421 -31857.8086 -20417.8879 -12649.0655
## height educ
## height 1.0000000 0.1195458
## educ 0.1195458 1.0000000