\(~\)

Question 1 - Draw \(x_1\sim\mathcal{N}(2,3)\) and \(\text{aux}\sim \mathcal{N}(0,1)\) \(n=10000\) times and generate \(u=2 x_1 \times \text{aux}\) and \(y=2+3x_1+u\) :

\(~\)

library(tidyverse)
library(kableExtra)
library(skedastic)

# Generating the data

x_1 <- rnorm(n = 10000, mean = 2, sd = 3)
aux <- rnorm(n = 10000, mean = 0, sd = 1)

# Computing y and u

u <- 2*x_1*aux
y <- 2 + 3*x_1 + u

# Building the dataset 

data <- cbind(y, x_1, aux, u)

# Printing the output

head(data) %>% 
  kbl(booktabs = T) %>%
  kable_styling(latex_options = "striped", full_width = T) 
         y        x_1        aux            u
  5.210574  0.9039949  0.2757700    0.4985893
  4.425891  1.1728867 -0.4658461   -1.0927694
 -1.743588 -3.5699377 -0.9756788    6.9662254
  2.194987  4.3637542 -1.4776583  -12.8962755
 29.207952  5.9519592  0.7856299    9.3520746
 19.738861  4.6820141  0.3943622    3.6928190

\(~\)

Question 2 - Regress \(y\) on \(x_1\). Show a scatter plot of the regression residuals on \(x_1\). What do you observe?

\(~\)

# Building the model

model_1 <- lm(y ~ x_1, data = as.data.frame(data))

# Extracting the residuals 

residuals <- residuals(model_1) 

# Plotting the residuals 

residuals_plot <- data.frame(x_1 = x_1, residuals = residuals) %>%
  ggplot(aes(x = x_1, y = residuals)) +
  geom_point(color = '#7DD181') + 
  theme(panel.background = 
            element_rect(fill = "white"),
        axis.line = element_line(size = 0.5, 
                                 colour = "gray",
                                 linetype = 1)) +
  xlab("x_1") +
  ylab("Residuals")

# Printing the output

print(residuals_plot)

\(~\)

The residuals fan out as \(|x_1|\) grows: since \(u = 2 x_1 \times \text{aux}\) with \(\text{aux} \sim \mathcal{N}(0,1)\), \(\text{Var}(u \mid x_1) = 4 x_1^2\), so the spread of the residuals increases with the magnitude of \(x_1\), a clear sign of heteroskedasticity.

\(~\)

Question 3 - The White test aims at testing for heteroskedasticity. Its null hypothesis is \(H_0 : \sigma_i^2 = \sigma^2\) for all \(i\) (homoskedasticity), which in the auxiliary regression below amounts to \(H_0 : \delta_1 = \delta_2 = 0\). Follow these steps to perform a White test :

\(~\)

Question 3.a - Obtain predicted values \(\hat{y}_i\) and residuals \(\hat{u}_i\) from the regression in 2. :

\(~\)

# Extracting the fitted values

fitted_y <- fitted(model_1)
fitted_u <- resid(model_1)
fitted_y_2 <- fitted(model_1)^2
fitted_u_2 <- resid(model_1)^2

# Adding to the dataset 

data_1 <- cbind(y, x_1, aux, u, fitted_y, fitted_u, fitted_y_2, fitted_u_2)

# Printing the output

head(data_1) %>% 
  kbl(booktabs = T) %>%
  kable_styling(latex_options = "striped", full_width = T) 
         y        x_1        aux            u   fitted_y     fitted_u  fitted_y_2  fitted_u_2
  5.210574  0.9039949  0.2757700    0.4985893   4.786149    0.4244255    22.90722    0.180137
  4.425891  1.1728867 -0.4658461   -1.0927694   5.589722   -1.1638315    31.24499    1.354504
 -1.743588 -3.5699377 -0.9756788    6.9662254  -8.584042    6.8404541    73.68578   46.791813
  2.194987  4.3637542 -1.4776583  -12.8962755  15.125518  -12.9305304   228.78128  167.198617
 29.207952  5.9519592  0.7856299    9.3520746  19.871812    9.3361399   394.88892   87.163508
 19.738861  4.6820141  0.3943622    3.6928190  16.076626    3.6622352   258.45791   13.411967

\(~\)

Question 3.b - Estimate the following model \(\hat{u}_i^2 = \delta_0 + \delta_1 \hat{y}_i + \delta_2 \hat{y}_i^2\) :

\(~\)

# Computing model 2 

model_2 <- lm(fitted_u_2 ~ fitted_y + fitted_y_2, data = as.data.frame(data_1))

# Summary 

summary(model_2)
## 
## Call:
## lm(formula = fitted_u_2 ~ fitted_y + fitted_y_2, data = as.data.frame(data_1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -720.35  -29.39   -4.07    1.45 2648.44 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.07467    1.66408   1.848   0.0647 .  
## fitted_y    -2.17598    0.21906  -9.933   <2e-16 ***
## fitted_y_2   0.46508    0.01089  42.692   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 123.7 on 9997 degrees of freedom
## Multiple R-squared:  0.2418, Adjusted R-squared:  0.2417 
## F-statistic:  1594 on 2 and 9997 DF,  p-value: < 2.2e-16
# Storing the R^2 

R_2 <- summary(model_2)$r.squared

\(~\)

Question 3.c - Under \(H_0\), the \(nR^2\) statistic (with \(R^2\) taken from the auxiliary regression in 3.b) follows a \(\chi^2(2)\). Display the test threshold at the \(5\%\) level. What can you conclude?

\(~\)

# Custom White test: compare nR^2 to the chi-squared(2) critical value

nR_2 <- 10000*R_2                                    # n = 10000 observations
threshold <- qchisq(0.95, df = 2)                    # 5% critical value of chi^2(2)
p.value <- pchisq(nR_2, df = 2, lower.tail = FALSE)  # upper-tail p-value

# Results

tibble(nR_2, threshold, p.value) %>% 
  kbl(booktabs = T) %>%
  kable_styling(latex_options = "striped", full_width = T) 
      nR_2   threshold   p.value
  2418.171    5.991465         0

\(~\)

The statistic \(nR^2 = 2418.2\) far exceeds the \(5\%\) critical value \(\chi^2_{0.95}(2) \approx 5.99\) and the p-value is essentially zero, so the null hypothesis of homoskedasticity is rejected.

\(~\)

Question 3.d - Perform the test directly using the command imtest in Stata (here, with the white_lm function from the skedastic package) :

\(~\)

# skedastic function

white_lm(model_1) %>% 
  kbl(booktabs = T) %>%
  kable_styling(latex_options = "striped", full_width = T) 
 statistic   p.value   parameter   method         alternative
  2418.171         0           2   White's Test   greater
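
As a quick consistency check, the statistic reported by white_lm should coincide with the manual \(nR^2\) from 3.c, since both come from the same auxiliary regression on the fitted values and their squares. A minimal sketch, assuming (as the table above suggests) that white_lm returns its result as a one-row tibble:

# Sanity check: the packaged statistic should equal the manual n*R^2

all.equal(unname(white_lm(model_1)$statistic), nR_2)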

\(~\)

Question 4 - How can you account for heteroskedasticity directly in the original regression? Perform the regression with the relevant option. Are the coefficients' standard errors larger or smaller when accounting for heteroskedasticity?

\(~\)

# Feasible least squares estimation (weights = inverse of the squared OLS residuals)

fgls_model <- lm(y ~ x_1, data = as.data.frame(data), weights = 1/model_1$residuals^2)

# output

summary(fgls_model)
## 
## Call:
## lm(formula = y ~ x_1, data = as.data.frame(data), weights = 1/model_1$residuals^2)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1078 -1.0002 -0.9997  0.9998  1.0593 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.0847526  0.0004988    4179   <2e-16 ***
## x_1         2.9886739  0.0005330    5607   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1 on 9998 degrees of freedom
## Multiple R-squared:  0.9997, Adjusted R-squared:  0.9997 
## F-statistic: 3.144e+07 on 1 and 9998 DF,  p-value: < 2.2e-16
# var-cov matrix : fgls_model

vcov(fgls_model)
##               (Intercept)           x_1
## (Intercept)  2.488374e-07 -5.025373e-08
## x_1         -5.025373e-08  2.840740e-07
# var-cov matrix : model_1

vcov(model_1)
##              (Intercept)           x_1
## (Intercept)  0.007477413 -0.0011379232
## x_1         -0.001137923  0.0005821868
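
To make the comparison concrete, the standard errors are the square roots of the diagonals of the two matrices above. A minimal sketch, using the model objects already created:

# Standard errors = square roots of the diagonal of each variance-covariance matrix

rbind(
  OLS  = sqrt(diag(vcov(model_1))),
  FGLS = sqrt(diag(vcov(fgls_model)))
)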

\(~\)

The feasible least squares estimation has managed to reduce the standard errors: the variances on the diagonal of the weighted model's variance-covariance matrix are several orders of magnitude smaller than those of the original OLS fit.
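
\(~\)

For reference, a common alternative way to account for heteroskedasticity is to keep the OLS point estimates and report heteroskedasticity-robust (White) standard errors instead of reweighting the regression. A minimal sketch, assuming the sandwich and lmtest packages are available:

library(sandwich)
library(lmtest)

# OLS coefficients with heteroskedasticity-robust (HC1) standard errors

coeftest(model_1, vcov = vcovHC(model_1, type = "HC1"))

# The robust variance-covariance matrix itself

vcovHC(model_1, type = "HC1")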