Assignment 1 test

Author

Group Number

Question 1

Question 1a

library(wooldridge)
m1 <- lm(wage~IQ, wage2)
summary(m1)

Call:
lm(formula = wage ~ IQ, data = wage2)

Residuals:
   Min     1Q Median     3Q    Max 
-898.7 -256.5  -47.3  201.1 2072.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 116.9916    85.6415   1.366    0.172    
IQ            8.3031     0.8364   9.927   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 384.8 on 933 degrees of freedom
Multiple R-squared:  0.09554,   Adjusted R-squared:  0.09457 
F-statistic: 98.55 on 1 and 933 DF,  p-value: < 2.2e-16

\[ \widehat{wage}=\underset{(85.642)}{116.992}+\underset{(0.836)}{8.303}IQ \]

There are 935 observations (933 residual degrees of freedom plus 2 estimated coefficients).

Question 1b

Interpretation: on average, a one-point increase in IQ is associated with an increase of $8.303 in monthly wage. In a simple regression this is a ceteris paribus effect only if the other factors in the error term are unrelated to IQ. The intercept is the predicted monthly wage for a worker with an IQ of 0, $116.992. Because no one in the sample has an IQ anywhere near 0, the intercept has no meaningful interpretation on its own. The slope coefficient is the meaningful one: the predicted wage gap between a worker with IQ x and a worker with IQ y is |x - y| × $8.303 on average. Intuitively, someone with a higher IQ is likely to work more effectively, think more logically and process information faster, and these abilities are likely to be rewarded with a higher wage.

Question 1c

sd_IQ <- sd(wage2$IQ)
coef_IQ <- 8.303
change_2SD <- coef_IQ * 2 * sd_IQ
change_2SD
[1] 249.9641

So the predicted change in wage for an increase of two standard deviations in IQ is $249.964.

Consider a person with IQ = 90: a 2 sd increase raises their IQ to 90 + 2sd. The same applies to a person with IQ = 110, whose IQ rises to 110 + 2sd. Because the model is linear in IQ, the predicted change in wage is the same at every starting IQ: a 2 sd increase in IQ raises predicted wage by $249.964 regardless of the initial IQ. We conclude that the predicted change in wage does not differ between an individual with IQ = 90 and an individual with IQ = 110.
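The linearity argument can be checked numerically. This sketch plugs the coefficients printed by summary(m1) into the fitted line and compares the predicted change for starting IQs of 90 and 110. The increase d = 30 is a hypothetical value chosen to be close to two standard deviations of IQ; any other value gives the same equality.

```r
# Coefficients reported by summary(m1)
b0 <- 116.9916
b1 <- 8.3031

# Fitted line from Question 1a: predicted monthly wage for a given IQ
predict_wage <- function(iq) b0 + b1 * iq

# Hypothetical IQ increase, roughly two standard deviations of IQ
d <- 30

# Predicted change in wage starting from IQ = 90 and from IQ = 110
gap_90  <- predict_wage(90 + d) - predict_wage(90)
gap_110 <- predict_wage(110 + d) - predict_wage(110)

# Both changes equal b1 * d, whatever the starting IQ
c(gap_90 = gap_90, gap_110 = gap_110, b1_times_d = b1 * d)
```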

Question 1d

The closer R-squared is to 0, the smaller the proportion of the sample variation in wage explained by the independent variable. Here R-squared is 0.09554, so IQ explains only about 9.55% of the sample variation in wage: IQ does help explain differences in wage, but most of the variation is due to other factors. The F-statistic of 98.55 tests the null hypothesis that the coefficient on IQ is zero; with a single regressor it equals the square of the t-statistic on IQ (9.927 squared is about 98.55), and its large value indicates that IQ does help explain wage. The p-value (< 2.2e-16) is the probability, if the null hypothesis that IQ has no effect on wage were true, of obtaining an F-statistic at least as large as 98.55; it is essentially 0. Together, these results show that the relationship between IQ and wage is statistically significant and highly unlikely to be due to chance, even though IQ explains only a small share of the variation in wage.
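With a single regressor, R-squared, the F-statistic and the t-statistic are linked: F = (R²/k)/((1 - R²)/(n - k - 1)) and F = t². This sketch recomputes F both ways from the values printed by summary(m1) and evaluates the p-value with pf(); it only reuses those printed numbers.

```r
# Numbers reported by summary(m1)
r2   <- 0.09554           # Multiple R-squared
n    <- 935               # observations
k    <- 1                 # number of slope coefficients
t_iq <- 8.3031 / 0.8364   # t-statistic on IQ

# F-statistic for H0: beta_1 = 0, computed from R-squared
F_r2 <- (r2 / k) / ((1 - r2) / (n - k - 1))

# With one regressor, F equals the squared t-statistic
F_t <- t_iq^2

# p-value: P(F >= F_r2) under H0, where F ~ F(1, 933)
p_val <- pf(F_r2, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

round(c(F_from_R2 = F_r2, F_from_t = F_t), 2)
p_val
```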

Question 1e

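Yes, it does. The t-statistic on IQ is 9.927 (equivalently, F = 98.55 with p-value < 2.2e-16), so we reject H0: β1 = 0 at any conventional significance level; IQ has a statistically significant effect on wage. The coefficient on IQ is positive, meaning that higher IQ is associated with a higher wage on average. Reading this as a causal effect requires that no omitted factor correlated with IQ, such as education, also affects wage.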
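The same decision can be reached directly from the t-statistic. This sketch uses only the estimate and standard error printed by summary(m1) and the 933 residual degrees of freedom; the 5% two-sided level is an assumption, since no level is fixed in the question as written here.

```r
beta_iq  <- 8.3031   # estimated coefficient on IQ
se_iq    <- 0.8364   # its standard error
df_resid <- 933      # residual degrees of freedom

t_calc <- beta_iq / se_iq              # t-statistic for H0: beta_1 = 0
t_crit <- qt(0.975, df = df_resid)     # two-sided 5% critical value
p_two  <- 2 * pt(abs(t_calc), df = df_resid, lower.tail = FALSE)

reject_h0 <- abs(t_calc) > t_crit
c(t_calc = t_calc, t_crit = t_crit)
p_two
reject_h0
```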

Question 2

Question 2a

library(wooldridge)
m2 <- lm(wage~IQ+feduc+meduc, wage2)
summary(m2)

Call:
lm(formula = wage ~ IQ + feduc + meduc, data = wage2)

Residuals:
    Min      1Q  Median      3Q     Max 
-904.33 -255.18  -44.87  214.46 2030.56 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   24.764    102.387   0.242  0.80895    
IQ             6.805      1.047   6.500 1.51e-10 ***
feduc         14.172      5.396   2.627  0.00881 ** 
meduc         10.373      6.281   1.651  0.09909 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 383.6 on 718 degrees of freedom
  (213 observations deleted due to missingness)
Multiple R-squared:  0.1199,    Adjusted R-squared:  0.1162 
F-statistic:  32.6 on 3 and 718 DF,  p-value: < 2.2e-16

\[ \widehat{wage}=\underset{(102.387)}{24.764}+\underset{(1.047)}{6.805}IQ+\underset{(5.396)}{14.172}feduc+\underset{(6.281)}{10.373}meduc \]

Stating null and alternative hypothesis:

\[ H_0:\beta_3=0\\H_1:\beta_3>0 \]

Test statistic

\[ t = \frac{\hat{\beta}_{\text{3}} - 0}{\text{SE}(\hat{\beta}_{\text{3}})}~\sim t_{718}\quad \text{under } H_0 \\t_{\text{calc}}=10.373/6.281=1.651 \]

tcrit<-qt(0.95, df=718)
tcrit
[1] 1.646979

Decision

We see that t-calc is greater than t-crit (1.651 > 1.647), so we reject the null hypothesis.

Conclusion

The t-calc is greater than t-crit at the 5% significance level, so we reject the null hypothesis that, controlling for IQ and father's education, mother's education has no effect on wage in favour of the alternative hypothesis that mother's education has a positive effect on wage. We conclude that there is sufficient evidence that mother's education has a positive effect on wage once we control for IQ and father's education.
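The margin in this one-sided test is narrow. summary() reports a two-sided p-value (0.09909) for meduc; this sketch converts the printed estimate and standard error into the one-sided p-value for H1: β3 > 0, which is about half the two-sided value and just under 0.05.

```r
beta_meduc <- 10.373  # estimated coefficient on meduc
se_meduc   <- 6.281   # its standard error
df_resid   <- 718     # residual degrees of freedom

t_calc <- beta_meduc / se_meduc
t_crit <- qt(0.95, df = df_resid)

# One-sided p-value P(T > t_calc) under H0, and the two-sided value summary() prints
p_one <- pt(t_calc, df = df_resid, lower.tail = FALSE)
p_two <- 2 * p_one

c(t_calc = t_calc, t_crit = t_crit, p_one = p_one, p_two = p_two)
```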

Question 2b

Unrestricted model:

\[ \widehat{wage}=\underset{(102.387)}{24.764}+\underset{(1.047)}{6.805}IQ+\underset{(5.396)}{14.172}feduc+\underset{(6.281)}{10.373}meduc \]

Restricted model:

\[ \widetilde{wage}=\tilde{\beta}_0+\tilde{\beta}_1IQ \]

Stating null and alternative hypothesis

\[ H_0: \beta_2=\beta_3=0\\H_1:\beta_2 \neq0\quad{\text{or}}\quad\beta_3\neq0 \]

Sampling distribution under null hypothesis

\[ F_{2,722-3-1} \]

Test Statistic (F-test)

model_unrestricted <- lm(wage~IQ+feduc+meduc, wage2)
model_restricted <- lm(wage~IQ, wage2)
SSR_unrestricted <- deviance(model_unrestricted)
SSR_restricted <- deviance(model_restricted)
SSR_unrestricted
[1] 105678632
SSR_restricted
[1] 138126386
q <- 2
n <- 722
k <- 3
Fcalc <- ((138126385.629 - 105678632.430)/q) / (105678632.430/(n-k-1))
Fcalc
[1] 110.228
Fcrit <- qf(0.95,df1=q,df2=n-k-1)
Fcrit
[1] 3.008266

Decision

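We see that Fcalc > Fcrit (110.228 > 3.008), so we reject the null hypothesis that, controlling for IQ, parental education has no effect on wage in favour of the alternative hypothesis that parental education has an effect on wage. Note, however, that model_restricted is estimated on all 935 observations, while model_unrestricted uses only the 722 observations with non-missing feduc and meduc (213 are dropped). The two SSRs therefore come from different samples, so 110.228 is not a valid F-statistic; the restricted model needs to be re-estimated on the same 722 observations (for example, with subset = !is.na(feduc) & !is.na(meduc) in lm()) and the decision confirmed with that value.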

Conclusion

The F-test here is a joint hypothesis test, instead of testing each coefficient separately: we test whether father's and mother's education both have no effect on wage once we control for IQ. In this case, there is sufficient evidence that parental education has an effect on wage. Therefore, father's and mother's education are jointly significant and help explain variation in wage.
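A valid restricted-versus-unrestricted comparison needs both models fitted to the same rows. lm() silently drops rows with missing values, so a restricted model that does not use feduc or meduc keeps rows the unrestricted model discards. This sketch illustrates the fix on simulated data; the data frame dat and the variables x1 and x2 (stand-ins for feduc and meduc) are hypothetical. The restricted model is fitted with a subset matching the unrestricted sample, and the hand-computed F-statistic is checked against anova(), which stops with an error if the models were fitted to different numbers of observations.

```r
set.seed(1)
n <- 200

# Hypothetical simulated data: iq always observed, x1 and x2 partly missing
dat <- data.frame(iq = rnorm(n, 100, 15),
                  x1 = rnorm(n, 12, 3),
                  x2 = rnorm(n, 12, 3))
dat$y <- 100 + 5 * dat$iq + 10 * dat$x1 + 8 * dat$x2 + rnorm(n, 0, 300)
dat$x1[sample(n, 40)] <- NA
dat$x2[sample(n, 30)] <- NA

# Unrestricted model: lm() drops rows where x1 or x2 is missing
unrestricted <- lm(y ~ iq + x1 + x2, data = dat)

# Restricted model fitted on exactly the same rows
restricted <- lm(y ~ iq, data = dat, subset = !is.na(x1) & !is.na(x2))

# F-statistic from the two SSRs, q = 2 restrictions
q <- 2
F_manual <- ((deviance(restricted) - deviance(unrestricted)) / q) /
  (deviance(unrestricted) / df.residual(unrestricted))

# anova() performs the same nested-model F-test
F_anova <- anova(restricted, unrestricted)[["F"]][2]

c(n_restricted = nobs(restricted), n_unrestricted = nobs(unrestricted))
c(F_manual = F_manual, F_anova = F_anova)
```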

Question 2c

Stating null and alternative hypothesis

\[ H_0:\beta_1=\beta_2=\beta_3=0\\H_1:\beta_1\neq0\quad\text{or}\quad\beta_2\neq0\quad\text{or}\quad\beta_3\neq0 \]

Sampling distribution under null hypothesis

\[ F_{3,722-3-1} \]

Test statistic

R_squared <- 0.1199
Fcalc <- (R_squared/k) / ((1-R_squared)/(n-k-1))
Fcalc
[1] 32.60546
Fcrit <- qf(0.95, df1=3, df2=718)
Fcrit
[1] 2.617307

Decision rule

We see that Fcalc > Fcrit (32.605 > 2.617), so we reject the null hypothesis that IQ, mother's education and father's education have no effect on wage in favour of the alternative hypothesis that at least one of them has an effect on wage.

Conclusion

The F-test of all three predictors at the 5% significance level provides sufficient evidence that IQ, mother's education and father's education are jointly significant in explaining the variation in wage. When considered together, these variables contribute meaningfully to the model, and at least one of them has a statistically significant effect on wage. Therefore, we reject the null hypothesis that none of the predictors has an effect on wage.

Question 3

library(wooldridge)
m3 <- lm(wage~IQ+feduc+meduc, wage2)
summary(m3)

Call:
lm(formula = wage ~ IQ + feduc + meduc, data = wage2)

Residuals:
    Min      1Q  Median      3Q     Max 
-904.33 -255.18  -44.87  214.46 2030.56 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   24.764    102.387   0.242  0.80895    
IQ             6.805      1.047   6.500 1.51e-10 ***
feduc         14.172      5.396   2.627  0.00881 ** 
meduc         10.373      6.281   1.651  0.09909 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 383.6 on 718 degrees of freedom
  (213 observations deleted due to missingness)
Multiple R-squared:  0.1199,    Adjusted R-squared:  0.1162 
F-statistic:  32.6 on 3 and 718 DF,  p-value: < 2.2e-16

\[ \widehat{wage}=\underset{(102.387)}{24.764}+\underset{(1.047)}{6.805}IQ+\underset{(5.396)}{14.172}feduc+\underset{(6.281)}{10.373}meduc \]

Question 3a

Deriving model

\[ \beta_2=\beta_3 \quad\text{and}\quad\beta_2=2\beta_1 \]

Substituting these restrictions into the unrestricted model, we have

\[ wage=\beta_0+\beta_1IQ+2\beta_1feduc+2\beta_1meduc+u=\beta_0+\beta_1(IQ+2feduc+2meduc)+u \]

Question 3b

Stating null and alternative hypothesis

\[ H_0: \beta_2=\beta_3 \quad\text{and}\quad\beta_2=2\beta_1\\H_1:\beta_2\neq\beta_3\quad\text{or}\quad\beta_2\neq2\beta_1 \]

Sampling distribution under null hypothesis

\[ F_{2,722-3-1} \]

Test statistic (F-test)

model_unrestricted <- lm(wage~IQ+feduc+meduc, wage2)
model_restricted <- lm(wage~I(IQ+2*feduc+2*meduc), wage2)
SSR_unrestricted <- deviance(model_unrestricted)
SSR_restricted <- deviance(model_restricted)
SSR_unrestricted 
[1] 105678632
SSR_restricted 
[1] 105710448
q <- 2
n <- 722
k <- 3
Fcalc <- ((105710447.694 - 105678632.430)/q)/(105678632.430/(n-k-1))
Fcalc
[1] 0.1080794
Fcrit <- qf(0.95,2,718)
Fcrit
[1] 3.008266

Decision

We see that Fcalc < Fcrit (0.108 < 3.008), so we fail to reject the null hypothesis.

Conclusion

The F-statistic is below the critical value, so we fail to reject the null hypothesis at the 5% significance level: we do not have enough evidence to say that the null hypothesis is false. There is insufficient evidence that an extra year of mother's education has a different effect on wage from an extra year of father's education, or that an extra year of parental education has anything other than twice the effect of an extra IQ point. Therefore, the data do not contradict the restriction that mother's and father's education have the same effect on wage and that each has twice the effect of IQ.
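Reporting a p-value makes the size of the margin explicit. This sketch reuses the SSRs printed above, recomputes the F-statistic and adds its p-value, which comes out at roughly 0.9, far above 0.05.

```r
ssr_u    <- 105678632.430  # SSR, unrestricted model
ssr_r    <- 105710447.694  # SSR, restricted model
q        <- 2              # number of restrictions
df_resid <- 718            # n - k - 1 in the unrestricted model

F_calc <- ((ssr_r - ssr_u) / q) / (ssr_u / df_resid)
F_crit <- qf(0.95, df1 = q, df2 = df_resid)
p_val  <- pf(F_calc, df1 = q, df2 = df_resid, lower.tail = FALSE)

c(F_calc = F_calc, F_crit = F_crit, p_value = p_val)
```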

Question 3c

Deriving model

\[ \beta_2=\beta_3 \]

so we define, using the reparametrisation trick,

\[ \delta=\beta_2-\beta_3\\\beta_2=\delta+\beta_3 \]

Substituting into the unrestricted model gives the derived model

\[ wage=\beta_0+\beta_1IQ+(\delta+\beta_3)feduc+\beta_3meduc+u=\beta_0+\beta_1IQ+\delta{feduc}+\beta_3({feduc}+{meduc})+u \]

Question 3d

Stating null and alternative hypothesis

\[ H_0:\delta=0\\H_1:\delta>0 \]

Test statistic

model_reparam <- lm(wage~IQ+feduc+I(feduc+meduc), wage2)
summary(model_reparam)

Call:
lm(formula = wage ~ IQ + feduc + I(feduc + meduc), data = wage2)

Residuals:
    Min      1Q  Median      3Q     Max 
-904.33 -255.18  -44.87  214.46 2030.56 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        24.764    102.387   0.242   0.8090    
IQ                  6.805      1.047   6.500 1.51e-10 ***
feduc               3.799     10.193   0.373   0.7094    
I(feduc + meduc)   10.373      6.281   1.651   0.0991 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 383.6 on 718 degrees of freedom
  (213 observations deleted due to missingness)
Multiple R-squared:  0.1199,    Adjusted R-squared:  0.1162 
F-statistic:  32.6 on 3 and 718 DF,  p-value: < 2.2e-16

\[ t = \frac{\hat{\delta} - 0}{\text{SE}(\hat{\delta})}~\sim t_{718}\quad \text{under } H_0\\t_{\text{calc}}=3.799/10.193=0.373 \]
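The reparametrisation is needed because the standard error of δ̂ = β̂2 - β̂3 depends on the covariance between β̂2 and β̂3, which summary(m2) does not print: Var(δ̂) = Var(β̂2) + Var(β̂3) - 2Cov(β̂2, β̂3). This sketch backs out the covariance implied by the printed standard errors (5.396, 6.281 and 10.193) and shows that ignoring it would understate SE(δ̂). In practice vcov(m2) returns the full covariance matrix directly; the numbers here come only from the printed output.

```r
se_feduc <- 5.396    # SE of beta_2 hat (feduc) from summary(m2)
se_meduc <- 6.281    # SE of beta_3 hat (meduc) from summary(m2)
se_delta <- 10.193   # SE of delta hat from summary(model_reparam)

# Point estimate: delta hat = beta_2 hat - beta_3 hat
delta_hat <- 14.172 - 10.373

# Var(delta) = Var(b2) + Var(b3) - 2 Cov(b2, b3), solved for the covariance
cov_implied <- (se_feduc^2 + se_meduc^2 - se_delta^2) / 2

# Standard error if the covariance were (wrongly) ignored
se_naive <- sqrt(se_feduc^2 + se_meduc^2)

c(delta_hat = delta_hat, cov_implied = cov_implied,
  se_naive = se_naive, se_delta = se_delta)
```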

tcrit <- qt(0.95,718)
tcrit
[1] 1.646979

Decision

We see that tcalc < tcrit (0.373 < 1.647), so we fail to reject the null hypothesis.

Conclusion

The t-statistic is less than the critical value at the 5% significance level, so we fail to reject the null hypothesis: we do not have enough evidence to say that the null hypothesis is false. Therefore, there is insufficient evidence that an extra year of father's education has a larger effect on wage than an extra year of mother's education. Since 0.373 is also well below the two-sided 5% critical value (about 1.963 with 718 degrees of freedom), there is likewise no evidence of a difference in either direction. The data do not contradict the null hypothesis that the two effects are equal.

Question 4

Question 4a

library(wooldridge)
m4 <- lm(wage~IQ+educ, wage2)
summary(m4)

Call:
lm(formula = wage ~ IQ + educ, data = wage2)

Residuals:
    Min      1Q  Median      3Q     Max 
-860.29 -251.00  -35.31  203.98 2110.38 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -128.8899    92.1823  -1.398    0.162    
IQ             5.1380     0.9558   5.375 9.66e-08 ***
educ          42.0576     6.5498   6.421 2.15e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 376.7 on 932 degrees of freedom
Multiple R-squared:  0.1339,    Adjusted R-squared:  0.132 
F-statistic: 72.02 on 2 and 932 DF,  p-value: < 2.2e-16

\[ \widehat{wage}=\underset{(92.182)}{-128.890}+\underset{(0.956)}{5.138}IQ+\underset{(6.550)}{42.058}educ \]

Extended model

We are given that urban = 1 if the individual lives in an urban area and 0 otherwise, so the extended model is

\[ \widehat{wage}=\widehat\beta_0+\widehat\beta_1IQ+\widehat\beta_2educ+\widehat\delta_0urban+\widehat\delta_1{({educ}\times{urban})} \]
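In Question 4b the extended model is estimated with the formula wage ~ IQ + educ*urban. In R's formula syntax, a*b expands to a + b + a:b, so this formula produces exactly the regressors of the extended model above. This sketch checks the expansion on a tiny hypothetical data frame whose values are made up; only the column names matter.

```r
# Tiny hypothetical data frame; only the column names matter here
toy <- data.frame(IQ    = c(95, 100, 110, 120),
                  educ  = c(12, 16, 12, 18),
                  urban = c(0, 1, 1, 0))

# Design matrix generated by the formula used in Question 4b
X <- model.matrix(~ IQ + educ * urban, data = toy)
colnames(X)

# The interaction column is the product educ * urban
all.equal(unname(X[, "educ:urban"]), toy$educ * toy$urban)
```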

Question 4b

library(wooldridge)
model_extended <- lm(wage~IQ+educ*urban, wage2)
summary(model_extended)

Call:
lm(formula = wage ~ IQ + educ * urban, data = wage2)

Residuals:
   Min     1Q Median     3Q    Max 
-852.0 -245.6  -35.9  192.3 2068.6 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   72.7583   147.4571   0.493   0.6218    
IQ             5.2622     0.9384   5.607 2.71e-08 ***
educ          17.3502    11.2055   1.548   0.1219    
urban       -245.1125   166.9702  -1.468   0.1424    
educ:urban    30.2400    12.3783   2.443   0.0148 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 369.2 on 930 degrees of freedom
Multiple R-squared:  0.1698,    Adjusted R-squared:  0.1662 
F-statistic: 47.55 on 4 and 930 DF,  p-value: < 2.2e-16

\[ \widehat{wage}=\underset{(147.457)}{72.758}+\underset{(0.938)}{5.262}IQ+\underset{(11.206)}{17.350}educ\underset{(166.970)}{-245.113}urban+\underset{(12.378)}{30.240}(educ\times{urban}) \]

Question 4c

Unrestricted model

\[ \widehat{wage}=\widehat\beta_0+\widehat\beta_1IQ+\widehat\beta_2educ+\widehat\delta_0urban+\widehat\delta_1{({educ}\times{urban})} \]

Restricted model

\[ \widehat{wage}=\widehat\beta_0+\widehat\beta_1IQ+\widehat\beta_2educ+\widehat\delta_1{({educ}\times{urban})} \]

Stating null and alternative hypothesis

\[ H_0:\delta_0=0\\H_1:\delta_0\neq0 \]

Test statistic

\[ t = \frac{\hat{\delta_0} - 0}{\text{SE}(\hat{\delta_0})}~\sim t_{930}\quad \text{under } H_0\\t_\text{calc}=-245.113/166.970=-1.468 \]

tcrit <- qt(0.975,930)
tcrit
[1] 1.962518

Decision

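This is a two-tailed test, and |tcalc| < tcrit (|-1.468| < 1.963), so we fail to reject the null hypothesis.

Conclusion

The absolute t-statistic is less than the critical value at the 5% significance level, so we fail to reject the null hypothesis: we do not have enough evidence to say that the null hypothesis is false. Because the model includes the interaction educ × urban, δ0 is the urban-rural wage difference for someone with zero years of education, and the urban effect at a given education level is δ0 + δ1·educ. The test therefore shows insufficient evidence of an urban wage shift at educ = 0; it does not show that living in a city has no effect on wage, since the interaction coefficient (30.240) is significant at the 5% level (t = 2.443, p = 0.0148).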
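Because of the interaction term, the estimated urban-rural wage difference changes with education: δ̂0 + δ̂1·educ. This sketch evaluates it from the coefficients printed by summary(model_extended) at educ = 0 (the value the t-test above refers to) and at the sample mean of educ reported in Question 4d. Standard errors for these combinations would need vcov(model_extended) and are not computed here.

```r
delta0 <- -245.1125  # coefficient on urban
delta1 <- 30.2400    # coefficient on educ:urban

# Estimated urban-rural wage difference at a given education level
urban_effect <- function(educ) delta0 + delta1 * educ

effects <- urban_effect(c(educ_0 = 0, educ_mean = 13.46845))
effects
```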

Question 4d

mean(wage2$IQ, na.rm=TRUE)
[1] 101.2824
mean(wage2$educ, na.rm=TRUE)
[1] 13.46845

Predicted wage for an individual living in a city (urban = 1) with average IQ and average education:

\[ \widehat{wage}=72.758+5.262\times{101.282}+17.350\times{13.468}-245.113\times{1}+30.240\times(13.468\times{1}) \]

Predicted wage for an individual living in a rural area (urban = 0) with average IQ and average education:

\[ \widehat{wage}=72.758+5.262\times{101.282}+17.350\times{13.468} \]

IQ_avg <- 101.282
educ_avg <- 13.468
wage_city <- 72.758+5.262*IQ_avg+17.350*educ_avg-245.113+30.240*educ_avg
wage_rural <- 72.758+5.262*IQ_avg+17.350*educ_avg
wage_city
[1] 1001.533
wage_rural
[1] 839.3737

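Education level at which the predicted wages of the two individuals are equal:

\[ 72.758+5.262IQ+17.350educ-245.113+30.240educ=72.758+5.262IQ+17.350educ\\\text{simplifying, we have}\\47.59educ-245.113=17.350educ\\\Rightarrow educ=\frac{245.113}{30.240}\approx8.106 \]

At about 8.1 years of education there is no difference in predicted wage between the two individuals. Because the interaction coefficient is positive, the city individual has the higher predicted wage above this level and the rural individual below it.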
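The break-even calculation can be checked in a few lines. This sketch plugs the rounded coefficients used in Question 4d into a prediction function (predict_wage is a hypothetical helper, not part of the original code), solves δ̂0 + δ̂1·educ = 0, and confirms that the city and rural predictions coincide at that education level.

```r
# Coefficients from summary(model_extended), rounded as in Question 4d
b0 <- 72.758; b_iq <- 5.262; b_educ <- 17.350
d0 <- -245.113; d1 <- 30.240

# Hypothetical helper: predicted wage from the extended model
predict_wage <- function(iq, educ, urban) {
  b0 + b_iq * iq + b_educ * educ + d0 * urban + d1 * educ * urban
}

# Break-even education: d0 + d1 * educ = 0
educ_star <- -d0 / d1

# At educ_star the city and rural predictions coincide for any IQ (101.282 used here)
wage_city_star  <- predict_wage(101.282, educ_star, urban = 1)
wage_rural_star <- predict_wage(101.282, educ_star, urban = 0)

c(educ_star = educ_star, city = wage_city_star, rural = wage_rural_star)
```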