library(dplyr)
library(wooldridge)
library(car)
library(quantreg)

Chapter 7

1.

data("sleep75")
summary(lm(sleep ~ totwrk + educ + age + agesq + male,data = sleep75))

## 
## Call:
## lm(formula = sleep ~ totwrk + educ + age + agesq + male, data = sleep75)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2378.00  -243.29     6.74   259.24  1350.19 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3840.83197  235.10870  16.336   <2e-16 ***
## totwrk        -0.16342    0.01813  -9.013   <2e-16 ***
## educ         -11.71332    5.86689  -1.997   0.0463 *  
## age           -8.69668   11.20746  -0.776   0.4380    
## agesq          0.12844    0.13390   0.959   0.3378    
## male          87.75243   34.32616   2.556   0.0108 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 417.7 on 700 degrees of freedom
## Multiple R-squared:  0.1228, Adjusted R-squared:  0.1165 
## F-statistic: 19.59 on 5 and 700 DF,  p-value: < 2.2e-16

i)

Yes. Holding the other factors fixed, men are estimated to sleep about 87.75 minutes more per week than women. The p-value for male is 0.0108, so the difference is statistically significant at the 5% level (though not at the 1% level).

ii)

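Yes, there is a statistically significant tradeoff. The coefficient on totwrk is -0.16342. Because both sleep and totwrk are measured in minutes per week, each additional minute of work is associated with about 0.16 fewer minutes of sleep, so one more hour of work per week reduces sleep by about 60 × 0.163 ≈ 9.8 minutes. The p-value is extremely small (< 2e-16), indicating strong evidence.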

iii)

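To test whether age affects sleep, test the joint null hypothesis that the coefficients on age and agesq are both zero: estimate the restricted model that excludes age and agesq and compare it with the unrestricted model above using an F test. Individually, neither term is significant (p-values 0.4380 and 0.3378).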
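The joint test of age and agesq can be sketched as follows. This is a minimal illustration, not part of the original solution: the object names unrestricted, restricted, and age_test are hypothetical, and the code assumes the wooldridge package is installed.

```r
library(wooldridge)
data("sleep75")

# Unrestricted model: includes age and its square
unrestricted <- lm(sleep ~ totwrk + educ + age + agesq + male, data = sleep75)

# Restricted model: imposes H0 that the coefficients on age and agesq are both zero
restricted <- lm(sleep ~ totwrk + educ + male, data = sleep75)

# F test of the two exclusion restrictions
age_test <- anova(restricted, unrestricted)
print(age_test)
```

The second row of the table reports the F statistic and its p-value; a p-value above the chosen significance level would mean that age and agesq are jointly insignificant.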

3.

data("gpa2")
summary(lm(sat ~ hsize + hsizesq + female + black + female:black,data = gpa2))

## 
## Call:
## lm(formula = sat ~ hsize + hsizesq + female + black + female:black, 
##     data = gpa2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -570.45  -89.54   -5.24   85.41  479.13 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1028.0972     6.2902 163.445  < 2e-16 ***
## hsize          19.2971     3.8323   5.035 4.97e-07 ***
## hsizesq        -2.1948     0.5272  -4.163 3.20e-05 ***
## female        -45.0915     4.2911 -10.508  < 2e-16 ***
## black        -169.8126    12.7131 -13.357  < 2e-16 ***
## female:black   62.3064    18.1542   3.432 0.000605 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 133.4 on 4131 degrees of freedom
## Multiple R-squared:  0.08578,    Adjusted R-squared:  0.08468 
## F-statistic: 77.52 on 5 and 4131 DF,  p-value: < 2.2e-16

i)

The p-value on hsizesq is very small (3.20e-05), so there is strong evidence that hsize^2 should be included in the model. This indicates that the relationship between SAT scores and high school size is nonlinear.

The optimal high school size is the turning point of the quadratic equation:

\(SAT = \beta_0+\beta_1hsize+\beta_2hsizesq\)

The turning point is given by:

\(hsize_\text{optimal} = \frac{-\beta_1}{2\times\beta_2}\)

Substituting the values:

\(hsize_\text{optimal} = \frac{-19.2971}{2\times(-2.1948)} \approx 4.4\)

The optimal high school size is approximately 440 students (since hsize is measured in hundreds).

ii)

The difference in SAT scores between nonblack females and nonblack males is captured by the coefficient of female: Nonblack females score, on average, 45.09 points lower than nonblack males, holding other factors constant.

The p-value for the female coefficient is very small, indicating that this difference is highly statistically significant.

iii)

The difference in SAT scores between nonblack males and black males is captured by the coefficient of black: Black males score, on average, 169.81 points lower than nonblack males, holding other factors constant.

The very small p-value (< 2e-16) on black indicates that this difference is statistically significant.

iv)

The difference in SAT scores between black females and nonblack females is the sum of the coefficients for black and female:black \(Difference = \beta_\text{black}+\beta_\text{female:black}\)

\(Difference = -169.8126 + 62.3064 = -107.5062\)

Black females score, on average, 107.51 points lower than nonblack females, holding other factors constant.

To test the significance of this difference, perform a hypothesis test for the sum of the coefficients:

Null hypothesis: \(\beta_\text{black}+\beta_\text{female:black}=0\)
Alternative hypothesis: \(\beta_\text{black}+\beta_\text{female:black}\neq0\)
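One way to carry out this test is to build the t statistic for the sum directly from the estimated coefficient covariance matrix. The sketch below is illustrative only: the names fit, est, se, t_stat, and p_val are hypothetical, and it assumes the wooldridge package is installed.

```r
library(wooldridge)
data("gpa2")

fit <- lm(sat ~ hsize + hsizesq + female + black + female:black, data = gpa2)
b <- coef(fit)
V <- vcov(fit)

# Point estimate of beta_black + beta_female:black
est <- unname(b["black"] + b["female:black"])

# Var(a + b) = Var(a) + Var(b) + 2 * Cov(a, b)
se <- sqrt(V["black", "black"] + V["female:black", "female:black"] +
             2 * V["black", "female:black"])

# Two-sided t test of H0: beta_black + beta_female:black = 0
t_stat <- est / se
p_val <- 2 * pt(-abs(t_stat), df = df.residual(fit))
print(c(estimate = est, se = se, t = t_stat, p = p_val))
```

The same hypothesis could also be tested with linearHypothesis() from the car package, which is already loaded at the top of this document.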

C1

i)

data("gpa1")
summary(lm(colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll ,data = gpa1))

## 
## Call:
## lm(formula = colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll, 
##     data = gpa1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.78149 -0.25726 -0.02121  0.24691  0.74432 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.255554   0.335392   3.744 0.000268 ***
## PC           0.151854   0.058716   2.586 0.010762 *  
## hsGPA        0.450220   0.094280   4.775 4.61e-06 ***
## ACT          0.007724   0.010678   0.723 0.470688    
## mothcoll    -0.003758   0.060270  -0.062 0.950376    
## fathcoll     0.041800   0.061270   0.682 0.496265    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3344 on 135 degrees of freedom
## Multiple R-squared:  0.2222, Adjusted R-squared:  0.1934 
## F-statistic: 7.713 on 5 and 135 DF,  p-value: 2.083e-06

The estimated effect of PC ownership remains positive (0.151854) and is still statistically significant (p-value = 0.010762), meaning that students who own a PC tend to have higher GPAs on average, after accounting for the other variables in the model.

ii)

full_model <- lm(colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll, data = gpa1)
reduced_model <- lm(colGPA ~ PC + hsGPA + ACT, data = gpa1)
anova(reduced_model, full_model)

## Analysis of Variance Table
## 
## Model 1: colGPA ~ PC + hsGPA + ACT
## Model 2: colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    137 15.149                           
## 2    135 15.094  2  0.054685 0.2446 0.7834

The p-value for the test is 0.7834, so there is no statistically significant evidence that mothcoll and fathcoll jointly affect college GPA. Based on this test, the two variables do not contribute significantly to explaining the variation in college GPA.

iii)

gpa1_iii <- gpa1 %>% 
  mutate(hsGPAsq = hsGPA^2)
summary(lm(colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll + hsGPAsq,data = gpa1_iii))

## 
## Call:
## lm(formula = colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll + 
##     hsGPAsq, data = gpa1_iii)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.78998 -0.24327 -0.00648  0.26179  0.72231 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  5.040328   2.443038   2.063   0.0410 *
## PC           0.140446   0.058858   2.386   0.0184 *
## hsGPA       -1.802520   1.443552  -1.249   0.2140  
## ACT          0.004786   0.010786   0.444   0.6580  
## mothcoll     0.003091   0.060110   0.051   0.9591  
## fathcoll     0.062761   0.062401   1.006   0.3163  
## hsGPAsq      0.337341   0.215711   1.564   0.1202  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3326 on 134 degrees of freedom
## Multiple R-squared:  0.2361, Adjusted R-squared:  0.2019 
## F-statistic: 6.904 on 6 and 134 DF,  p-value: 2.088e-06

Adding hsGPAsq (high school GPA squared) to the model does not significantly improve the fit. The p-value for hsGPAsq is 0.1202, which is greater than 0.05, indicating it is not statistically significant.

C2

i)

data("wage2")
model1 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban, data = wage2)
summary(model1)

## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban, data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98069 -0.21996  0.00707  0.24288  1.22822 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.395497   0.113225  47.653  < 2e-16 ***
## educ         0.065431   0.006250  10.468  < 2e-16 ***
## exper        0.014043   0.003185   4.409 1.16e-05 ***
## tenure       0.011747   0.002453   4.789 1.95e-06 ***
## married      0.199417   0.039050   5.107 3.98e-07 ***
## black       -0.188350   0.037667  -5.000 6.84e-07 ***
## south       -0.090904   0.026249  -3.463 0.000558 ***
## urban        0.183912   0.026958   6.822 1.62e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3655 on 927 degrees of freedom
## Multiple R-squared:  0.2526, Adjusted R-squared:  0.2469 
## F-statistic: 44.75 on 7 and 927 DF,  p-value: < 2.2e-16

Holding other factors fixed, the approximate difference in monthly salary between blacks and nonblacks is -18.85%; that is, blacks earn about 18.85% less than nonblacks. The p-value (6.84e-07) indicates that this difference is statistically significant.
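Because the dependent variable is log(wage), 100 times the coefficient is only an approximation to the percentage difference. As a small sketch (the variable names are hypothetical and the coefficient is copied from the output above), the exact percentage difference can be computed with 100 * (exp(b) - 1):

```r
# Coefficient on black from model1 (copied from the regression output above)
b_black <- -0.188350

approx_pct <- 100 * b_black              # log approximation
exact_pct  <- 100 * (exp(b_black) - 1)   # exact percentage difference

print(round(c(approx = approx_pct, exact = exact_pct), 2))
```

The exact figure is somewhat smaller in magnitude than the approximation, as is always the case for negative coefficients in a log-level model.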

ii)

model2 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban + I(exper^2) + I(tenure^2), data = wage2)
summary(model2)

## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban + I(exper^2) + I(tenure^2), data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98236 -0.21972 -0.00036  0.24078  1.25127 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.3586756  0.1259143  42.558  < 2e-16 ***
## educ         0.0642761  0.0063115  10.184  < 2e-16 ***
## exper        0.0172146  0.0126138   1.365 0.172665    
## tenure       0.0249291  0.0081297   3.066 0.002229 ** 
## married      0.1985470  0.0391103   5.077 4.65e-07 ***
## black       -0.1906636  0.0377011  -5.057 5.13e-07 ***
## south       -0.0912153  0.0262356  -3.477 0.000531 ***
## urban        0.1854241  0.0269585   6.878 1.12e-11 ***
## I(exper^2)  -0.0001138  0.0005319  -0.214 0.830622    
## I(tenure^2) -0.0007964  0.0004710  -1.691 0.091188 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3653 on 925 degrees of freedom
## Multiple R-squared:  0.255,  Adjusted R-squared:  0.2477 
## F-statistic: 35.17 on 9 and 925 DF,  p-value: < 2.2e-16

anova(model2, lm(log(wage) ~ educ + exper + tenure + married + black + south + urban, data = wage2))

## Analysis of Variance Table
## 
## Model 1: log(wage) ~ educ + exper + tenure + married + black + south + 
##     urban + I(exper^2) + I(tenure^2)
## Model 2: log(wage) ~ educ + exper + tenure + married + black + south + 
##     urban
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    925 123.42                           
## 2    927 123.82 -2  -0.39756 1.4898  0.226

The F statistic for the joint test of I(exper^2) and I(tenure^2) is 1.4898 with a p-value of 0.226, so the two quadratic terms are jointly insignificant even at the 20% level.

iii)

model3 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban + educ*black, data = wage2)
summary(model3)

## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban + educ * black, data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.97782 -0.21832  0.00475  0.24136  1.23226 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.374817   0.114703  46.859  < 2e-16 ***
## educ         0.067115   0.006428  10.442  < 2e-16 ***
## exper        0.013826   0.003191   4.333 1.63e-05 ***
## tenure       0.011787   0.002453   4.805 1.80e-06 ***
## married      0.198908   0.039047   5.094 4.25e-07 ***
## black        0.094809   0.255399   0.371 0.710561    
## south       -0.089450   0.026277  -3.404 0.000692 ***
## urban        0.183852   0.026955   6.821 1.63e-11 ***
## educ:black  -0.022624   0.020183  -1.121 0.262603    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3654 on 926 degrees of freedom
## Multiple R-squared:  0.2536, Adjusted R-squared:  0.2471 
## F-statistic: 39.32 on 8 and 926 DF,  p-value: < 2.2e-16

anova(model3, lm(log(wage) ~ educ + exper + tenure + married + black + south + urban, data = wage2))

## Analysis of Variance Table
## 
## Model 1: log(wage) ~ educ + exper + tenure + married + black + south + 
##     urban + educ * black
## Model 2: log(wage) ~ educ + exper + tenure + married + black + south + 
##     urban
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    926 123.65                           
## 2    927 123.82 -1  -0.16778 1.2565 0.2626

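The interaction educ:black has a coefficient of -0.0226 with a p-value of 0.2626, so there is no statistically significant evidence that the return to education depends on race in this data.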

iv)

model4 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban + married:black, data = wage2)
summary(model4)

## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban + married:black, data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98013 -0.21780  0.01057  0.24219  1.22889 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    5.403793   0.114122  47.351  < 2e-16 ***
## educ           0.065475   0.006253  10.471  < 2e-16 ***
## exper          0.014146   0.003191   4.433 1.04e-05 ***
## tenure         0.011663   0.002458   4.745 2.41e-06 ***
## married        0.188915   0.042878   4.406 1.18e-05 ***
## black         -0.240820   0.096023  -2.508 0.012314 *  
## south         -0.091989   0.026321  -3.495 0.000497 ***
## urban          0.184350   0.026978   6.833 1.50e-11 ***
## married:black  0.061354   0.103275   0.594 0.552602    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3656 on 926 degrees of freedom
## Multiple R-squared:  0.2528, Adjusted R-squared:  0.2464 
## F-statistic: 39.17 on 8 and 926 DF,  p-value: < 2.2e-16

Holding other factors constant, the estimated wage differential between married blacks and married nonblacks is the sum of the coefficients on black and married:black: \(-0.240820 + 0.061354 = -0.179466\), so married blacks earn about 17.9% less than married nonblacks. The coefficient on married:black (0.061, p-value 0.5526) is not statistically significant, meaning there is no strong evidence that the black-nonblack differential differs between married and unmarried workers.
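The differential between married blacks and married nonblacks combines the black coefficient with the married:black interaction, since both groups have married = 1. A minimal sketch of the arithmetic, with hypothetical variable names and coefficients copied from the model4 output above:

```r
# Coefficients from model4 (copied from the regression output above)
b_black         <- -0.240820
b_married_black <-  0.061354

# Both groups have married = 1, so the married coefficient cancels and the
# differential is beta_black + beta_married:black
diff_log <- b_black + b_married_black

approx_pct <- 100 * diff_log               # log approximation
exact_pct  <- 100 * (exp(diff_log) - 1)    # exact percentage difference

print(round(c(diff_log = diff_log, approx = approx_pct, exact = exact_pct), 3))
```

Testing whether this sum differs from zero requires its standard error, which depends on the covariance between the two estimates and cannot be read off the coefficient table alone.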

Chapter 8

1. Which of the following are consequences of heteroskedasticity?

i) The OLS estimators, \(\hat{\beta_j}\), are inconsistent

No. Inconsistency arises when the error is correlated with the regressors, for example because of omitted variables or measurement error in the regressors. Conditional heteroskedasticity does not cause it.

ii) The usual F statistic no longer has an F distribution

Yes. For the F statistic to have a Fisher-F distribution under the null we require both A.MLR5 (conditional homoskedasticity) and A.MLR6 (errors conditionally normal).

iii) The OLS estimators are no longer BLUE

Yes. B stands for best: under homoskedasticity, OLS is the most efficient (i.e. minimum variance) estimator among all linear unbiased estimators. With heteroskedasticity this result no longer holds.

5.

i)

The heteroskedasticity-robust standard errors are generally similar to the usual ones, with slight differences (e.g., for age, restaurn, and white), so heteroskedasticity has little effect on inference here.

ii)

The coefficient on educ is -0.029. If education increases by 4 years, the estimated probability of smoking decreases by 4 × 0.029 = 0.116, or about 11.6 percentage points.

iii)

At the point where the net effect is zero:

\(age=\frac{0.020}{2\times0.00026}\approx38.46\) years

After approximately the age of 38, the probability of smoking begins to decrease.

iv)

The coefficient on restaurn is -0.101. Holding the other factors fixed, living in a state with restaurant smoking restrictions lowers the probability of smoking by about 0.101, or 10.1 percentage points.

v)

Substitute into the equation:

\(\widehat{smokes}=0.656-0.069\times\log(67.44)+0.012\times\log(6500)-0.029\times16+0.020\times77-0.00026\times77^2-0.101\times0-0.026\times0\approx0.0052\)

Using natural logs, the predicted smoking probability is about 0.0052 (roughly 0.5%). This indicates a very low likelihood of smoking for this individual, which is consistent with the actual value of smokes = 0 for this person.
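The substitution can be checked numerically. The sketch below uses the coefficients and values quoted in this answer, with the educ coefficient of -0.029 from part ii); p_hat is a hypothetical name. Note that log() in R is the natural logarithm, which is what the log terms in the model require; using base-10 logs instead would give a much larger value.

```r
# Fitted linear probability model evaluated for this individual:
# cigpric = 67.44, income = 6500, educ = 16, age = 77, restaurn = 0, white = 0
p_hat <- 0.656 - 0.069 * log(67.44) + 0.012 * log(6500) -
  0.029 * 16 + 0.020 * 77 - 0.00026 * 77^2 -
  0.101 * 0 - 0.026 * 0

print(round(p_hat, 4))
```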

C4.

i)

data("vote1")
modelc4 <- lm(voteA ~ prtystrA + democA + lexpendA + lexpendB,data = vote1)
summary(modelc4)

## 
## Call:
## lm(formula = voteA ~ prtystrA + democA + lexpendA + lexpendB, 
##     data = vote1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.576  -4.864  -1.146   4.903  24.566 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.66142    4.73604   7.952 2.56e-13 ***
## prtystrA     0.25192    0.07129   3.534  0.00053 ***
## democA       3.79294    1.40652   2.697  0.00772 ** 
## lexpendA     5.77929    0.39182  14.750  < 2e-16 ***
## lexpendB    -6.23784    0.39746 -15.694  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.573 on 168 degrees of freedom
## Multiple R-squared:  0.8012, Adjusted R-squared:  0.7964 
## F-statistic: 169.2 on 4 and 168 DF,  p-value: < 2.2e-16

Regressing the OLS residuals from this model on all of the regressors gives an R-squared of 0, because OLS residuals are uncorrelated with the regressors (and sum to zero) by construction.
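The residual regression can be run explicitly to confirm this. The sketch below is illustrative: fit, aux, and aux_r2 are hypothetical names, and it assumes the wooldridge package is installed.

```r
library(wooldridge)
data("vote1")

fit <- lm(voteA ~ prtystrA + democA + lexpendA + lexpendB, data = vote1)

# Regress the OLS residuals on the same regressors
aux <- lm(resid(fit) ~ prtystrA + democA + lexpendA + lexpendB, data = vote1)

# The R-squared is zero up to floating-point error
aux_r2 <- summary(aux)$r.squared
print(aux_r2)
```

This is a property of OLS itself and holds for any data set, so it says nothing about whether the model is correctly specified.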

ii)

# Breusch-Pagan test
library(lmtest)
bptest(modelc4)

## 
##  studentized Breusch-Pagan test
## 
## data:  modelc4
## BP = 9.0934, df = 4, p-value = 0.05881

The p-value (0.0588) is slightly above 0.05, so homoskedasticity is not rejected at the 5% level but is rejected at the 10% level: there is only weak evidence of heteroskedasticity.

iii)

# White test for heteroskedasticity
bptest(modelc4, ~ prtystrA + democA + log(expendA) + log(expendB) + 
         I(prtystrA^2) + I(democA^2) + I(log(expendA)^2) + I(log(expendB)^2), data = vote1)

## 
##  studentized Breusch-Pagan test
## 
## data:  modelc4
## BP = 19.581, df = 7, p-value = 0.00655

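The p-value (0.00655) is well below 0.05, providing strong evidence of heteroskedasticity. This version of the White test augments the Breusch-Pagan regression with the squares of the regressors (no cross-products are included), so it can detect nonlinear forms of heteroskedasticity that the Breusch-Pagan test misses. Because democA is a dummy variable, I(democA^2) is identical to democA and is dropped, which is why the test has 7 rather than 8 degrees of freedom.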
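A more parsimonious alternative is the special case of the White test, which uses only the fitted values and their squares as auxiliary regressors. The sketch below is not the test reported above: fit and white_special are hypothetical names, and it assumes the wooldridge and lmtest packages are installed.

```r
library(wooldridge)
library(lmtest)
data("vote1")

fit <- lm(voteA ~ prtystrA + democA + lexpendA + lexpendB, data = vote1)

# Special case of the White test: the auxiliary regression uses the fitted
# values and their squares, so the test has 2 degrees of freedom
white_special <- bptest(fit, ~ fitted(fit) + I(fitted(fit)^2))
print(white_special)
```

This form keeps the degrees of freedom at two regardless of the number of regressors, which can matter in a sample of this size.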

C13.

i)

data("fertil2")
library(sandwich)
modelc13_1 <- lm(children ~ age + agesq + educ + electric + urban, data = fertil2)

# Usual standard errors
summary(modelc13_1)

## 
## Call:
## lm(formula = children ~ age + agesq + educ + electric + urban, 
##     data = fertil2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9012 -0.7136 -0.0039  0.7119  7.4318 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.2225162  0.2401888 -17.580  < 2e-16 ***
## age          0.3409255  0.0165082  20.652  < 2e-16 ***
## agesq       -0.0027412  0.0002718 -10.086  < 2e-16 ***
## educ        -0.0752323  0.0062966 -11.948  < 2e-16 ***
## electric    -0.3100404  0.0690045  -4.493 7.20e-06 ***
## urban       -0.2000339  0.0465062  -4.301 1.74e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.452 on 4352 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.5734, Adjusted R-squared:  0.5729 
## F-statistic:  1170 on 5 and 4352 DF,  p-value: < 2.2e-16

# Robust standard errors
coeftest(modelc13_1, vcov = vcovHC(modelc13_1, type = "HC1"))

## 
## t test of coefficients:
## 
##                Estimate  Std. Error  t value  Pr(>|t|)    
## (Intercept) -4.22251623  0.24385099 -17.3160 < 2.2e-16 ***
## age          0.34092552  0.01917466  17.7800 < 2.2e-16 ***
## agesq       -0.00274121  0.00035051  -7.8206 6.549e-15 ***
## educ        -0.07523232  0.00630771 -11.9270 < 2.2e-16 ***
## electric    -0.31004041  0.06394815  -4.8483 1.289e-06 ***
## urban       -0.20003386  0.04547093  -4.3992 1.113e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The robust standard errors are not always larger than the nonrobust ones: they are smaller for electric and urban, and larger for age and agesq.

ii)

# Add religious dummies
modelc13_2 <- lm(children ~ age + agesq + educ + electric + urban + spirit + protest + catholic, data = fertil2)

# Nonrobust test
linearHypothesis(modelc13_2, c("spirit = 0", "protest = 0", "catholic = 0"))

## 
## Linear hypothesis test:
## spirit = 0
## protest = 0
## catholic = 0
## 
## Model 1: restricted model
## Model 2: children ~ age + agesq + educ + electric + urban + spirit + protest + 
##     catholic
## 
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1   4352 9176.4                              
## 2   4349 9162.5  3     13.88 2.1961 0.08641 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value for the nonrobust test is 0.08641

# Robust test
linearHypothesis(modelc13_2, c("spirit = 0", "protest = 0", "catholic = 0"), vcov = vcovHC(modelc13_2, type = "HC1"))

## 
## Linear hypothesis test:
## spirit = 0
## protest = 0
## catholic = 0
## 
## Model 1: restricted model
## Model 2: children ~ age + agesq + educ + electric + urban + spirit + protest + 
##     catholic
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F Pr(>F)  
## 1   4352                   
## 2   4349  3 2.1559 0.0911 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value for the robust test is 0.0911.

Both p-values are above 0.05, so the three religious dummy variables are not jointly significant at the 5% level, although they are at the 10% level. The robust and nonrobust tests lead to the same conclusion.

iii)

# Obtain fitted values and residuals
fitted_vals <- fitted(modelc13_2)
residuals_sq <- residuals(modelc13_2)^2

# Regress u^2 on fitted values and fitted values squared
hetero_test <- lm(residuals_sq ~ fitted_vals + I(fitted_vals^2))

# Joint significance test for heteroskedasticity
linearHypothesis(hetero_test, c("fitted_vals = 0", "I(fitted_vals^2) = 0"))

## 
## Linear hypothesis test:
## fitted_vals = 0
## I(fitted_vals^2) = 0
## 
## Model 1: restricted model
## Model 2: residuals_sq ~ fitted_vals + I(fitted_vals^2)
## 
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1   4357 76589                                  
## 2   4355 57436  2     19153 726.11 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The small p-value indicates overwhelming evidence to reject the null hypothesis that the coefficients of fitted_vals and I(fitted_vals^2) are jointly zero. This result confirms that heteroskedasticity is present in the model.

iv)

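The test in part iii) strongly rejects homoskedasticity, but heteroskedasticity appears to have little practical importance here. The conclusions of the t tests in part i) are the same with robust and nonrobust standard errors, and the robust and nonrobust joint tests in part ii) give nearly identical p-values (0.0911 vs. 0.0864), so inference is essentially unchanged.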

Chapter 9

1.

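Adding \(ceoten^2\) and \(comten^2\) increases \(R^2\) from 0.353 to 0.375. \(R^2\) never falls when regressors are added, so the increase by itself does not establish misspecification; the relevant check is an F test of the joint significance of the two quadratic terms. If they are jointly significant, the original model suffers from functional form misspecification.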
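
The comparison of the two \(R^2\) values can be turned into a formal test with the R-squared form of the F statistic. As a sketch, with \(q = 2\) restrictions, and with \(n\) (the sample size) and \(k\) (the number of regressors in the unrestricted model) left symbolic because they are not reported in this answer:

\(F = \frac{(R^2_{ur}-R^2_{r})/q}{(1-R^2_{ur})/(n-k-1)} = \frac{(0.375-0.353)/2}{(1-0.375)/(n-k-1)} = 0.0176\times(n-k-1)\)

The statistic is then compared with the critical value of the \(F_{2,\,n-k-1}\) distribution.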

5.

The failure of some colleges to report crimes in 1992 may not be exogenous since underreporting could be correlated with unobserved factors like college policies or safety standards. This creates a sample selection issue.

C4.

i)

data("infmrt")
infmrt_90 <- infmrt %>% 
  filter(year == 1990)
summary(lm(infmort ~ lpcinc + lphysic + lpopul + DC, data = infmrt_90))

## 
## Call:
## lm(formula = infmort ~ lpcinc + lphysic + lpopul + DC, data = infmrt_90)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4964 -0.8076  0.0000  0.9358  2.6077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  23.9548    12.4195   1.929  0.05994 .  
## lpcinc       -0.5669     1.6412  -0.345  0.73134    
## lphysic      -2.7418     1.1908  -2.303  0.02588 *  
## lpopul        0.6292     0.1911   3.293  0.00191 ** 
## DC           16.0350     1.7692   9.064 8.43e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.246 on 46 degrees of freedom
## Multiple R-squared:  0.691,  Adjusted R-squared:  0.6641 
## F-statistic: 25.71 on 4 and 46 DF,  p-value: 3.146e-11

The DC dummy (16.035, p < 0.001) is highly significant: the infant mortality rate in DC is about 16 deaths per 1,000 live births higher than the model predicts for a state with the same per capita income, physicians per capita, and population.

ii)

summary(lm(infmort ~ lpcinc + lphysic + lpopul, data = infmrt_90))

## 
## Call:
## lm(formula = infmort ~ lpcinc + lphysic + lpopul, data = infmrt_90)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0811 -1.2064 -0.0521  1.0639  7.9589 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 33.85931   20.42785   1.658  0.10408   
## lpcinc      -4.68466    2.60412  -1.799  0.07845 . 
## lphysic      4.15326    1.51266   2.746  0.00853 **
## lpopul      -0.08782    0.28725  -0.306  0.76116   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.058 on 47 degrees of freedom
## Multiple R-squared:  0.1391, Adjusted R-squared:  0.08413 
## F-statistic: 2.531 on 3 and 47 DF,  p-value: 0.06841

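Dropping the DC dummy lowers R-squared from 0.691 to 0.139. Without it, the coefficient on lphysic flips from negative (-2.74) to positive (4.15), lpopul loses its significance, and the coefficient on lpcinc becomes much larger in magnitude (-4.68, p-value 0.078). A single observation changes the estimates this much, which shows how influential the DC outlier is.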

C5.

data("rdchem")
rdchem_c5 <- rdchem %>% 
  mutate(sales = sales/1000) %>% 
  mutate(salessq = salessq/1000)
rdchem_without <- rdchem_c5 %>% 
  filter(sales < 39)

i)

#With largest firm
summary(lm(rdintens ~ sales + salessq + profmarg, data = rdchem_c5))

## 
## Call:
## lm(formula = rdintens ~ sales + salessq + profmarg, data = rdchem_c5)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0371 -1.1238 -0.4547  0.7165  5.8522 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  2.059e+00  6.263e-01   3.288  0.00272 **
## sales        3.166e-01  1.389e-01   2.280  0.03041 * 
## salessq     -7.390e-06  3.716e-06  -1.989  0.05657 . 
## profmarg     5.332e-02  4.421e-02   1.206  0.23787   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.774 on 28 degrees of freedom
## Multiple R-squared:  0.1905, Adjusted R-squared:  0.1037 
## F-statistic: 2.196 on 3 and 28 DF,  p-value: 0.1107

#Without largest firm
summary(lm(rdintens ~ sales + salessq + profmarg, data = rdchem_without))

## 
## Call:
## lm(formula = rdintens ~ sales + salessq + profmarg, data = rdchem_without)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0843 -1.1354 -0.5505  0.7570  5.7783 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  1.984e+00  7.176e-01   2.764   0.0102 *
## sales        3.606e-01  2.389e-01   1.510   0.1427  
## salessq     -1.025e-05  1.308e-05  -0.784   0.4401  
## profmarg     5.528e-02  4.579e-02   1.207   0.2378  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.805 on 27 degrees of freedom
## Multiple R-squared:  0.1912, Adjusted R-squared:  0.1013 
## F-statistic: 2.128 on 3 and 27 DF,  p-value: 0.1201

When the largest firm is included, sales have a significant positive effect on R&D intensity (coefficient = 0.3166), while the quadratic term for sales (salessq) is marginally significant. Without the largest firm, the sales coefficient becomes insignificant (0.3606), and the quadratic term is also insignificant. The profit margin (profmarg) is not significant in either model.

The largest firm drives the significance of sales in the model.
Quadratic sales relationship weakens without the largest firm.
Profit margin remains insignificant in both models.

ii)

#With largest firm
summary(rq(rdintens ~ sales + salessq + profmarg, data = rdchem_c5))

## 
## Call: rq(formula = rdintens ~ sales + salessq + profmarg, data = rdchem_c5)
## 
## tau: [1] 0.5
## 
## Coefficients:
##             coefficients lower bd upper bd
## (Intercept)  1.40428      0.87031  2.66628
## sales        0.26346     -0.13508  0.75753
## salessq     -0.00001     -0.00002  0.00000
## profmarg     0.11400      0.01376  0.16427

#Without largest firm
summary(rq(rdintens ~ sales + salessq + profmarg, data = rdchem_without))

## 
## Call: rq(formula = rdintens ~ sales + salessq + profmarg, data = rdchem_without)
## 
## tau: [1] 0.5
## 
## Coefficients:
##             coefficients lower bd upper bd
## (Intercept)  2.61047      0.58936  2.81404
## sales       -0.22364     -0.23542  0.87607
## salessq      0.00002     -0.00003  0.00003
## profmarg     0.07594      0.00578  0.16392

The intercept is lower in the model with the largest firm (1.404 vs. 2.610), although the confidence intervals overlap heavily.
The sales coefficient changes from positive (0.263) with the largest firm to negative (-0.224) without it, but both confidence intervals include zero, so sales is not statistically significant in either LAD regression.
The quadratic term (salessq) is close to zero in both models; its sign flips from negative to positive when the largest firm is excluded, and both confidence intervals include zero.
The profmarg coefficient is larger in the model with the largest firm (0.114 vs. 0.076), and its confidence interval excludes zero in both models.

In conclusion, excluding the largest firm changes the sign of the LAD sales coefficient and raises the intercept, but the sales terms are imprecisely estimated in both samples; only profmarg has a confidence interval that excludes zero throughout.

iii)

In principle, LAD is more resilient to outliers than OLS because it minimizes the sum of absolute rather than squared residuals, so a single extreme observation receives less weight. In this application the evidence is less clear-cut: dropping the largest firm changes the OLS sales coefficient only modestly (0.317 to 0.361) while its standard error rises enough to remove significance, whereas the LAD sales coefficient changes sign (0.263 to -0.224) with wide confidence intervals in both samples.

Chapter 10

1.

i)

Disagree: Time series observations often exhibit autocorrelation, violating independence.

ii)

Agree: Under the first three Gauss-Markov assumptions, OLS remains unbiased.

iii)

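Disagree: Trending variables are used as dependent variables all the time. The risk is a spurious relationship with trending regressors, which can be addressed by including a time trend in the regression or by detrending.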

iv)

Agree: Seasonality issues are minimized when using annual time series data.

5.

\(\text{housing_starts}_t=\beta_0+\beta_1\text{interest_rate}_t+\beta_2\text{per_capita_income}_t+\beta_3time_t+u_t\)

Where:

\(\text{housing_starts}_t\): Number of housing starts at time t

\(\text{interest_rate}_t\): Interest rate at time t

\(\text{per_capita_income}_t\): Real per capita income at time t

\(time_t\): Time trend variable to account for trends over time

\(u_t\): Error term capturing unexplained variation at time t

C1.

data("intdef")
intdef_c1 <- intdef %>% 
  mutate(after1979 = ifelse(year>1979,1,0))
summary(lm(i3 ~ inf + def + after1979, data = intdef_c1))

## 
## Call:
## lm(formula = i3 ~ inf + def + after1979, data = intdef_c1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4674 -0.8407  0.2388  1.0148  3.9654 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.29623    0.42535   3.047  0.00362 ** 
## inf          0.60842    0.07625   7.979 1.37e-10 ***
## def          0.36266    0.12025   3.016  0.00396 ** 
## after1979    1.55877    0.50577   3.082  0.00329 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.711 on 52 degrees of freedom
## Multiple R-squared:  0.6635, Adjusted R-squared:  0.6441 
## F-statistic: 34.18 on 3 and 52 DF,  p-value: 2.408e-12

Yes, there is a statistically significant shift in the interest rate equation after 1979. The coefficient on after1979 is 1.55877 with a p-value of 0.00329: holding inflation and the deficit fixed, the 3-month T-bill rate is on average about 1.56 percentage points higher after 1979, which is consistent with an effect of the policy change. Inflation (inf) and the federal deficit (def) are also statistically significant, with higher inflation and larger deficits associated with higher rates.

C9.

i)

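\(\beta_1\) should have a positive sign and \(\beta_2\) should have a negative sign. Faster growth in industrial production signals stronger corporate earnings, while higher T-bill rates make bonds more attractive relative to stocks.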

(ii)

data("volat")
summary(lm(rsp500 ~ pcip + i3, data = volat))

## 
## Call:
## lm(formula = rsp500 ~ pcip + i3, data = volat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -157.871  -22.580    2.103   25.524  138.137 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 18.84306    3.27488   5.754 1.44e-08 ***
## pcip         0.03642    0.12940   0.281   0.7785    
## i3          -1.36169    0.54072  -2.518   0.0121 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 40.13 on 554 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.01189,    Adjusted R-squared:  0.008325 
## F-statistic: 3.334 on 2 and 554 DF,  p-value: 0.03637

Intercept (18.84306): When both pcip and i3 are zero, the expected return on the S&P 500 (rsp500) is 18.84.
pcip (0.03642): A one percentage point increase in the percentage change in industrial production (pcip) is associated with a 0.036 percentage point increase in rsp500, holding i3 constant. The effect is very small and statistically insignificant.
i3 (-1.36169): A one percentage point increase in the 3-month T-bill rate (i3) is associated with a 1.36 percentage point decrease in rsp500, holding pcip constant. This effect is statistically significant at the 5% level.

(iii)

i3 is statistically significant (p-value = 0.0121), while pcip is not (p-value = 0.7785).

(iv)

No. The R-squared is very low (0.01189), so pcip and i3 explain only about 1.2% of the variation in S&P 500 returns. More importantly, both regressors are dated in the same period as the return, so the regression does not show that returns can be forecast from information known in advance. The significance of i3 indicates a contemporaneous association, not predictability.

Chapter 11

C7.

data("consump")

i)

modelc7_i <- lm(gc[3:37] ~ gc_1[3:37], data = consump)
summary(modelc7_i)

## 
## Call:
## lm(formula = gc[3:37] ~ gc_1[3:37], data = consump)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.027878 -0.005974 -0.001450  0.007142  0.020227 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 0.011431   0.003778   3.026  0.00478 **
## gc_1[3:37]  0.446133   0.156047   2.859  0.00731 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01161 on 33 degrees of freedom
## Multiple R-squared:  0.1985, Adjusted R-squared:  0.1742 
## F-statistic: 8.174 on 1 and 33 DF,  p-value: 0.007311

Null Hypothesis (H₀): \(\beta_1=0\)
The null hypothesis suggests that there is no relationship between \(gc_t\) and its lagged value \(gc_\text{t-1}\), which would be consistent with the Permanent Income Hypothesis (PIH). According to PIH, the growth rate of consumption at time t should be independent of past consumption growth, implying that past consumption growth does not provide useful information for predicting current consumption growth.
Alternative Hypothesis (H₁): \(\beta_1\ne0\)
The alternative hypothesis suggests that past consumption growth \(gc_\text{t-1}\) is significantly related to current consumption growth \(gc_t\). This would indicate that consumption growth follows a pattern over time, contradicting the PIH.
The estimated coefficient on \(gc_\text{t-1}\) is 0.44613 with a p-value of 0.00731, so we reject \(\beta_1=0\) at the 5% level (and at the 1% level). Past consumption growth helps predict current consumption growth, which is evidence against the Permanent Income Hypothesis (PIH).

Thus, we conclude that there is significant autocorrelation in consumption growth, and the PIH does not hold for this data.
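The t statistic and two-sided p-value behind this conclusion can be reproduced by hand from the coefficient table in part (i). The sketch below only copies the reported estimate, standard error, and residual degrees of freedom; the variable names are chosen for illustration.

```r
# Values copied from the regression output in part (i)
b1  <- 0.446133   # estimate on lagged consumption growth
se1 <- 0.156047   # its standard error
df  <- 33         # residual degrees of freedom

t_stat  <- b1 / se1                    # t statistic for H0: beta_1 = 0
p_value <- 2 * pt(-abs(t_stat), df)    # two-sided p-value from the t distribution
c(t = t_stat, p = p_value)
```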

ii)

modelc7_ii <- lm(gc[3:37] ~ gc_1[3:37] + gy_1[3:37] + i3[2:36] + inf[2:36], data = consump)
summary(modelc7_ii)

## 
## Call:
## lm(formula = gc[3:37] ~ gc_1[3:37] + gy_1[3:37] + i3[2:36] + 
##     inf[2:36], data = consump)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0249090 -0.0075867  0.0000855  0.0087231  0.0188620 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.0225944  0.0070892   3.187  0.00335 **
## gc_1[3:37]   0.4335777  0.2896546   1.497  0.14487   
## gy_1[3:37]  -0.1079113  0.1946394  -0.554  0.58340   
## i3[2:36]    -0.0007467  0.0011107  -0.672  0.50653   
## inf[2:36]   -0.0008281  0.0010041  -0.825  0.41606   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01134 on 30 degrees of freedom
## Multiple R-squared:  0.3038, Adjusted R-squared:  0.211 
## F-statistic: 3.273 on 4 and 30 DF,  p-value: 0.02431

The p-values for \(gy_\text{t-1},i3_\text{t-1}, inf_\text{t-1}\) are 0.58, 0.51, and 0.42 respectively. This indicates that these new variables are not individually significant at the 5% level.

anova(modelc7_i, modelc7_ii)

## Analysis of Variance Table
## 
## Model 1: gc[3:37] ~ gc_1[3:37]
## Model 2: gc[3:37] ~ gc_1[3:37] + gy_1[3:37] + i3[2:36] + inf[2:36]
##   Res.Df       RSS Df  Sum of Sq      F Pr(>F)
## 1     33 0.0044447                            
## 2     30 0.0038609  3 0.00058386 1.5122 0.2315

The p-value (0.2315) is greater than 0.05, so we fail to reject the null hypothesis that the coefficients on \(gy_\text{t-1},i3_\text{t-1}, inf_\text{t-1}\) are all zero.
The additional variables are not jointly significant at the 5% level in explaining the growth of per capita consumption.
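The F statistic in the ANOVA table follows from the two residual sums of squares, \(F=\frac{(RSS_r-RSS_{ur})/q}{RSS_{ur}/df_{ur}}\), where q is the number of restrictions. As a rough check, the sketch below plugs in the rounded values printed in the table, so the result matches the reported 1.5122 only up to rounding.

```r
# Values copied from the anova() output above
rss_r  <- 0.0044447   # restricted model (only lagged gc)
rss_ur <- 0.0038609   # unrestricted model (all four lagged variables)
q      <- 3           # number of restrictions being tested
df_ur  <- 30          # residual degrees of freedom, unrestricted model

f_stat  <- ((rss_r - rss_ur) / q) / (rss_ur / df_ur)
p_value <- pf(f_stat, q, df_ur, lower.tail = FALSE)   # upper-tail F probability
c(F = f_stat, p = p_value)
```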

iii)

The p-value for \(gc_\text{t-1}\) rises from 0.00731 to 0.14487, so it is no longer statistically significant at the 5% level once \(gy_\text{t-1},i3_\text{t-1}, inf_\text{t-1}\) are included. This does not mean the PIH is now supported. The point estimate barely changes (0.446 to 0.434) while the standard error nearly doubles (0.156 to 0.290), which is the pattern expected when \(gc_\text{t-1}\) is highly correlated with the added regressors, \(gy_\text{t-1}\) being the likely candidate. The PIH implies that no variable dated t-1 or earlier should help predict \(gc_t\), so the relevant evidence is the joint test in part (iv).

iv)

modelc7_iv <- lm(gc[3:37] ~ 1, data = consump)
anova(modelc7_iv, modelc7_ii)

## Analysis of Variance Table
## 
## Model 1: gc[3:37] ~ 1
## Model 2: gc[3:37] ~ gc_1[3:37] + gy_1[3:37] + i3[2:36] + inf[2:36]
##   Res.Df       RSS Df Sum of Sq      F  Pr(>F)  
## 1     34 0.0055456                              
## 2     30 0.0038609  4 0.0016848 3.2728 0.02431 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value (0.02431) is less than 0.05, we reject the null hypothesis and conclude that the four explanatory variables (\(gc_\text{t-1},gy_\text{t-1},i3_\text{t-1}, inf_\text{t-1}\)) are jointly significant at the 5% level.

This is evidence against the PIH: under the PIH, variables dated t-1 should have no power to predict \(gc_t\), yet the four lagged variables are jointly significant.

C12.

i)

data("minwage")
minwage232 <- minwage %>% 
  select(gwage232, gemp232, gmwage, gcpi) %>% 
  na.omit()
acf(minwage232$gwage232)

The ACF plot suggests that the gwage232 series is weakly dependent.
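The first-order autocorrelation itself can be read off the object that `acf()` returns instead of the plot. The sketch below uses a simulated AR(1) series so that it runs on its own; the AR coefficient of 0.3 and the series `y` are assumptions for illustration and say nothing about the actual value for gwage232.

```r
# Sketch on a simulated AR(1) series; rho = 0.3 is a hypothetical value
set.seed(42)
y <- as.numeric(arima.sim(model = list(ar = 0.3), n = 600))

# Element 1 of $acf is lag 0 (always 1); element 2 is the lag-1 autocorrelation
rho1_acf <- acf(y, plot = FALSE)$acf[2]

# Nearly the same number from correlating the series with its own first lag
rho1_cor <- cor(y[-1], y[-length(y)])
c(acf = rho1_acf, cor = rho1_cor)
```

Replacing `y` with `minwage232$gwage232` would give the corresponding figure for the homework data. A value far from one in absolute terms is consistent with weak dependence.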

(ii)

summary(lm(gwage232[2:nrow(minwage232)] ~ gwage232[1:(nrow(minwage232)-1)] + gmwage[2:nrow(minwage232)] + gcpi[2:nrow(minwage232)], data = minwage232))

## 
## Call:
## lm(formula = gwage232[2:nrow(minwage232)] ~ gwage232[1:(nrow(minwage232) - 
##     1)] + gmwage[2:nrow(minwage232)] + gcpi[2:nrow(minwage232)], 
##     data = minwage232)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.044642 -0.004134 -0.001312  0.004482  0.041612 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         0.0024003  0.0004308   5.572 3.79e-08 ***
## gwage232[1:(nrow(minwage232) - 1)] -0.0779092  0.0342851  -2.272  0.02341 *  
## gmwage[2:nrow(minwage232)]          0.1518459  0.0096485  15.738  < 2e-16 ***
## gcpi[2:nrow(minwage232)]            0.2630876  0.0824457   3.191  0.00149 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007889 on 606 degrees of freedom
## Multiple R-squared:  0.2986, Adjusted R-squared:  0.2951 
## F-statistic: 85.99 on 3 and 606 DF,  p-value: < 2.2e-16

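The regression results indicate that an increase in the federal minimum wage results in a contemporaneous increase in gwage232. The coefficient on gmwage is 0.152 (t = 15.74), so a one percentage point increase in minimum wage growth is associated with about a 0.152 percentage point increase in wage growth in sector 232 in the same period. The very small p-value (< 2e-16) supports this conclusion.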
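The index slicing used to build lags in these regressions (for example `gwage232[1:(nrow(minwage232)-1)]`) can be written more compactly with a small lag helper. The sketch below is one possible approach on simulated data: `lag1` is a hypothetical helper defined here, and the columns `gw`, `gm`, and `gp` are made-up stand-ins for gwage232, gmwage, and gcpi.

```r
# Sketch: one-period lag helper (lag1 is a hypothetical name, defined here)
lag1 <- function(x) c(NA, x[-length(x)])

# Simulated stand-ins for the wage, minimum wage, and price growth series
set.seed(7)
n <- 200
d <- data.frame(gm = rnorm(n, sd = 0.03), gp = rnorm(n, sd = 0.005))
d$gw <- 0.002 + 0.15 * d$gm + 0.25 * d$gp + rnorm(n, sd = 0.008)

# Lag built inside the formula; lm() drops the first row because its lag is NA
fit_lag <- lm(gw ~ lag1(gw) + gm + gp, data = d)

# The same regression written with explicit index slicing
fit_idx <- lm(gw[2:n] ~ gw[1:(n - 1)] + gm[2:n] + gp[2:n], data = d)

# Both approaches give identical coefficient estimates
all.equal(unname(coef(fit_lag)), unname(coef(fit_idx)))
```

The helper keeps every variable aligned on the same rows, so coefficient names in the output stay short. `dplyr::lag()` shifts a vector the same way for a single lag.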

(iii)

summary(lm(gwage232[2:nrow(minwage232)] ~ gwage232[1:(nrow(minwage232)-1)] + gmwage[2:nrow(minwage232)] + gcpi[2:nrow(minwage232)] + gemp232[1:(nrow(minwage232)-1)] , data = minwage232))

## 
## Call:
## lm(formula = gwage232[2:nrow(minwage232)] ~ gwage232[1:(nrow(minwage232) - 
##     1)] + gmwage[2:nrow(minwage232)] + gcpi[2:nrow(minwage232)] + 
##     gemp232[1:(nrow(minwage232) - 1)], data = minwage232)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.043842 -0.004378 -0.001034  0.004321  0.042548 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         0.002451   0.000426   5.753  1.4e-08 ***
## gwage232[1:(nrow(minwage232) - 1)] -0.074546   0.033901  -2.199 0.028262 *  
## gmwage[2:nrow(minwage232)]          0.152707   0.009540  16.007  < 2e-16 ***
## gcpi[2:nrow(minwage232)]            0.252296   0.081544   3.094 0.002066 ** 
## gemp232[1:(nrow(minwage232) - 1)]   0.066131   0.016962   3.899 0.000108 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007798 on 605 degrees of freedom
## Multiple R-squared:  0.3158, Adjusted R-squared:  0.3112 
## F-statistic:  69.8 on 4 and 605 DF,  p-value: < 2.2e-16

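The coefficient on lagged employment growth, \(gemp232_\text{t-1}\), is 0.066 with a t statistic of 3.899 (p-value = 0.000108), so it is statistically significant at the 1% level.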

(iv)

summary(lm(gwage232[2:nrow(minwage232)] ~ gmwage[2:nrow(minwage232)] + gcpi[2:nrow(minwage232)], data = minwage232))

## 
## Call:
## lm(formula = gwage232[2:nrow(minwage232)] ~ gmwage[2:nrow(minwage232)] + 
##     gcpi[2:nrow(minwage232)], data = minwage232)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.044464 -0.004095 -0.001352  0.004545  0.041188 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                0.0021904  0.0004222   5.188  2.9e-07 ***
## gmwage[2:nrow(minwage232)] 0.1505574  0.0096648  15.578  < 2e-16 ***
## gcpi[2:nrow(minwage232)]   0.2427430  0.0822388   2.952  0.00328 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007916 on 607 degrees of freedom
## Multiple R-squared:  0.2926, Adjusted R-squared:  0.2903 
## F-statistic: 125.5 on 2 and 607 DF,  p-value: < 2.2e-16

The estimated gmwage coefficient is 0.152707 in the model with the two lagged variables (part iii) and 0.1505574 in the model without them. Adding lagged gwage232 and lagged gemp232 therefore has very little effect on the gmwage coefficient.

(v)

summary(lm(gmwage[2:nrow(minwage232)] ~ gwage232[1:(nrow(minwage232)-1)] + gemp232[1:(nrow(minwage232)-1)] , data = minwage232))

## 
## Call:
## lm(formula = gmwage[2:nrow(minwage232)] ~ gwage232[1:(nrow(minwage232) - 
##     1)] + gemp232[1:(nrow(minwage232) - 1)], data = minwage232)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.01914 -0.00500 -0.00379 -0.00287  0.62208 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                         0.003433   0.001440   2.384   0.0174 *
## gwage232[1:(nrow(minwage232) - 1)]  0.203167   0.143140   1.419   0.1563  
## gemp232[1:(nrow(minwage232) - 1)]  -0.041706   0.072110  -0.578   0.5632  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03318 on 607 degrees of freedom
## Multiple R-squared:  0.00392,    Adjusted R-squared:  0.0006377 
## F-statistic: 1.194 on 2 and 607 DF,  p-value: 0.3036

The R-squared from regressing gmwage on lagged gwage232 and lagged gemp232 is 0.00392, so the two lagged variables explain almost none of the variation in gmwage. Because gmwage is nearly uncorrelated with them, adding or dropping the lagged variables barely changes the gmwage coefficient, which is what part (iv) shows.

Chapter 12

C11.

i)

data("nyse")
modelc11_i <- lm(return ~ return_1, data = nyse)
# Store the OLS residuals under a name that does not mask stats::residuals()
u_hat <- resid(modelc11_i)
squared_residuals <- u_hat^2

# Calculate the average, minimum, and maximum of squared residuals
avg_squared_residual <- mean(squared_residuals, na.rm = TRUE)
min_squared_residual <- min(squared_residuals, na.rm = TRUE)
max_squared_residual <- max(squared_residuals, na.rm = TRUE)

# Output the results
avg_squared_residual

## [1] 4.440839

min_squared_residual

## [1] 7.35465e-06

max_squared_residual

## [1] 232.8946

ii)

nyse_ii <- nyse %>% 
  select(return, return_1) %>% 
  na.omit() %>% 
  mutate(return_1_sq = return_1^2)
nyse_ii$residual_sq <- squared_residuals
summary(lm(residual_sq ~ return_1 + return_1_sq, data = nyse_ii))

## 
## Call:
## lm(formula = residual_sq ~ return_1 + return_1_sq, data = nyse_ii)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.459  -3.011  -1.975   0.676 221.469 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.25734    0.44085   7.389 4.32e-13 ***
## return_1    -0.78946    0.19569  -4.034 6.09e-05 ***
## return_1_sq  0.29666    0.03552   8.351 3.75e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.66 on 686 degrees of freedom
## Multiple R-squared:  0.1303, Adjusted R-squared:  0.1278 
## F-statistic:  51.4 on 2 and 686 DF,  p-value: < 2.2e-16

iii)

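# Coefficients copied from the regression in part (ii)
delta_0 <- 3.25734
delta_1 <- -0.78946
delta_2 <- 0.29666

# Fitted conditional variance as a function of the lagged return
f <- function(x) {
  delta_0 + delta_1*x + delta_2*x^2
}

x <- 0:20
plot(x, f(x), type = 'l')
abline(h = 0)
abline(v = 0)

# Vertex of the parabola: x = -delta_1 / (2 * delta_2)
find.vertex <- function(delta_2, delta_1, delta_0) {
  x_vertex <- -delta_1/(2 * delta_2)
  y_vertex <- f(x_vertex)
  c(x_vertex, y_vertex)
}
V <- find.vertex(delta_2, delta_1, delta_0)
V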

## [1] 1.33058 2.73212

The estimated conditional variance is smallest when \(return_\text{t-1}\) is 1.33058, where it equals 2.73212.

iv)

Since the smallest value of the predicted variance is 2.73212, which is positive, the model does not produce any negative variance estimates.
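The same conclusion can be checked algebraically: a quadratic with a positive leading coefficient and a negative discriminant never reaches zero. The sketch below reuses the rounded coefficients from part (ii); the names `disc`, `x_min`, and `h_min` are chosen here for illustration.

```r
# Coefficients copied from the regression in part (ii)
delta_0 <- 3.25734
delta_1 <- -0.78946
delta_2 <- 0.29666

# Discriminant of delta_0 + delta_1 * x + delta_2 * x^2
disc <- delta_1^2 - 4 * delta_2 * delta_0

# Opens upward and has no real roots, so the fitted variance is always positive
always_positive <- (delta_2 > 0) && (disc < 0)

# Closed-form location and value of the minimum
x_min <- -delta_1 / (2 * delta_2)
h_min <- delta_0 - delta_1^2 / (4 * delta_2)
c(disc = disc, x_min = x_min, h_min = h_min)
```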

v)

The R-squared for the model in part (ii) is 0.1303, higher than the R-squared of the ARCH(1) model (0.114). The model in part (ii) therefore fits slightly better than the ARCH(1) model.

vi)

summary(lm(residual_sq[3:689] ~ residual_sq[2:688] + residual_sq[1:687], data = nyse_ii))

## 
## Call:
## lm(formula = residual_sq[3:689] ~ residual_sq[2:688] + residual_sq[1:687], 
##     data = nyse_ii)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.934  -3.298  -2.158   0.600 224.296 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         2.82950    0.45495   6.219 8.69e-10 ***
## residual_sq[2:688]  0.32284    0.03820   8.450  < 2e-16 ***
## residual_sq[1:687]  0.04179    0.03820   1.094    0.274    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.76 on 684 degrees of freedom
## Multiple R-squared:  0.1151, Adjusted R-squared:  0.1125 
## F-statistic: 44.47 on 2 and 684 DF,  p-value: < 2.2e-16

The p-value on the second lag is 0.274, so it is not statistically significant and the second lag does not seem important. The R-squared of the ARCH(2) model is 0.1151, lower than the 0.1303 of the model in part (ii), so the ARCH(2) model does not fit better.

Final Exam

PhamMinhTam

2025-01-03
