Ejercicio 1

Consider a linear model to explain monthly beer consumption:

Write the transformed equation that has a homoscedastic error term. Show this is homoscedastic. Ejercicio realizado manualmente

Ejercicio 2

Using the data in GPA3, the following equation was estimated for the fall and second semester students:

Here, trmgpa is term GPA, crsgpa is a weighted average of overall GPA in courses taken, cumgpa is GPA prior to the current semester, tothrs is total credit hours prior to the semester, sat is SAT score, hsperc is graduating percentile in high school class, female is a gender dummy, and season is a dummy variable equal to unity if the student’s sport is in season during the fall. The usual and heteroskedasticity-robust standard errors are reported in parentheses and brackets, respectively.

Do the variables crsgpa, cumgpa, and tothrs have the expected estimated effects? Which of these variables are statistically significant at the 5% level? Does it matter which standard errors are used?

Tiene sentido el promedio del GPA de los cursos tomados sean altamente relacionados con el GPA del semestre actual. Sin embargo, es extraña la estimación del efecto del GPA del curso anterior en el GPA del semestre en curso. Deberían estar, también, altamente relacionados si se supone cierta constante en el rendimiento académico de la muestra. Mismo argumento para la extraña estimación encontrada en las horas acreditadas del semestre pasado en el GPA del semestre en curso. Las variables estadísiticaente significativas al 5% de confianza son: hsperc (en este caso no importa cuál se se usó. en el cao de la variable season, pasa a no ser estadísticamente difrerente de 0 a ser estadísiticamente significativa al 5% cuando se usan los errores estándar robustos.

Why does the hypothesis H0: crsgpa = 1 make sense? Test this hypothesis against the two-sided alternative at the 5% level, using both standard errors. Describe your conclusions.

Para saber si se cumple una evitente relación que debería suceder, que es una correlación positiva perfecta entre term GPA y el promedio del GPA de los cursos tomados. Realmente la estimación indica que no es estadísitcamente distinta de 0, ya que se obtiene un estadístico t igual a -0.571 usando \(se\) comunes e igual a -0.602 usando los \(se\) robustos.

Test whether there is an in-season effect on term GPA, using both standard errors. Does the significance level at which the null can be rejected depend on the standard error used?

Si depende, en un caso no se rechaza (se usuales) y en otro, se rechaza a un nivel de 5% de confianza.

Ejercicio 3

Consider a model at the employee level, where the unobserved variable fi is a “firm effect” to each employee at a given firm i. The error term vi,e is specific to employee e at firm i. The composite error is ui,e = fi + vi,e, such as in equation (8.28) in Wooldridge 7ed.

Ejercicio realizado manualmente.

Ejercicio 4

Use OLS to estimate the model relating Wage with education, experience, and tenure.

library(wooldridge)
est4<-lm(wage ~ educ+exper+tenure,data = wage1)
summary(est4)

## 
## Call:
## lm(formula = wage ~ educ + exper + tenure, data = wage1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.6068 -1.7747 -0.6279  1.1969 14.6536 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.87273    0.72896  -3.941 9.22e-05 ***
## educ         0.59897    0.05128  11.679  < 2e-16 ***
## exper        0.02234    0.01206   1.853   0.0645 .  
## tenure       0.16927    0.02164   7.820 2.93e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.084 on 522 degrees of freedom
## Multiple R-squared:  0.3064, Adjusted R-squared:  0.3024 
## F-statistic: 76.87 on 3 and 522 DF,  p-value: < 2.2e-16

#Calculo de residuos necesario para estadístico F
resid<-summary(est4)$residuals
#Regresión de residuos sobre variables explicatorias
est4b<-lm(resid^2 ~ educ+exper+tenure,data=wage1)
rsq2<-summary(est4b)$r.squared
#Cálculo de estadístico
Fstat <- (rsq2/3) *((526 - 3 - 1)/(1-rsq2))
print(Fstat)

## [1] 15.5282

#Cálculo por medio de LM
LM<-526*rsq2
print(LM)

## [1] 43.09562

Dados los estadísticos muy elevando, hay evidencia para rechazar homosedasticidad en la regresión original.

Show that, what you get in b. equals: (where est4 is the object where you save your first regression)

library(lmtest)
bptest(est4)

## 
##  studentized Breusch-Pagan test
## 
## data:  est4
## BP = 43.096, df = 3, p-value = 2.349e-09

Compute manually the alternative White test. In the regression of U-hat2 on the predicted values and the squared predicted values for wage coming from a.

fittedvalues <- (predict(est4))
fittedvalues2 <- (predict(est4))^2

est4d <- lm(resid^2 ~ fittedvalues + fittedvalues2, data = wage1)
rsq2.0<-summary(est4d)$r.squared
LM2<-526*rsq2.0
print(LM2)

## [1] 51.7765

Show that your results are the same using lmtest in R studio.

bptest(est4d)

## 
##  studentized Breusch-Pagan test
## 
## data:  est4d
## BP = 8.6797, df = 2, p-value = 0.01304

No da igual!

Use solo educación como predictor e interprete beta-hat ¿es heteroscedástica la relación? Interprete el coeficiente de educ.

est4f<-lm(wage ~ educ,data = wage1)
bptest(est4f)

## 
##  studentized Breusch-Pagan test
## 
## data:  est4f
## BP = 15.306, df = 1, p-value = 9.144e-05

El test BP indica heterosedasticidad.

Luego suponga que \(Var(U_i|X_k)=\sigma^2*educ_i\) Obtenga los estimadores de \(\beta_k\) con WLS ¿se soluciona el problema de heteroscedasticidad encontrado en inciso f. al 95% de confianza? Interprete el coeficiente de educ.

#Calculamos nuevas variables necesarias
wt<-sqrt(wage1$educ)

#Agregamos al df
wage1$educ_wt<-wage1$educ/wt
wage1$wage_wt<-wage1$wage/wt

#Calculamos regresión
est4g<-lm(wage_wt ~educ_wt,data = wage1 )

#Analizamos heterosedasticidad
bptest(est4g)

## 
##  studentized Breusch-Pagan test
## 
## data:  est4g
## BP = 4.141, df = 1, p-value = 0.04186

Aún no se rechaza que existe homosedasticidad, no obtante, esta vez es menos heterosedasticidad.

Verify that the fitted values from d. are all positive and obtain the weighted least squares after estimating hi (i.e. FGLS)

sum(est4d$coefficients<=0) #Note que si hay valores predichos negativos

## [1] 1

#FGLS
h<-log(resid^2) #obtenemos h
reg4h<-lm(h ~ educ+exper+tenure,dat=wage1) #regresión
fitted.values3<-predict(reg4h) #valores ajustados de la regresión
h_hat<-exp(fitted.values3) #valores ajustados de h
wt_h<-1/h_hat #inversa para usar como peso

#FGLS
reg4h.1<-lm(wage ~educ+exper+tenure,data=wage1, weights = wt_h)
summary(reg4h.1)

## 
## Call:
## lm(formula = wage ~ educ + exper + tenure, data = wage1, weights = wt_h)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5426 -1.3984 -0.4874  0.9856 12.5386 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.59554    0.41407   1.438  0.15096    
## educ         0.29689    0.03161   9.392  < 2e-16 ***
## exper        0.03186    0.00930   3.426  0.00066 ***
## tenure       0.15022    0.02435   6.169 1.38e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.147 on 522 degrees of freedom
## Multiple R-squared:  0.2101, Adjusted R-squared:  0.2055 
## F-statistic: 46.27 on 3 and 522 DF,  p-value: < 2.2e-16

Conduct a white test for the FGLS estimation and report your results.

library(whitestrap)

## 
## Please cite as:

## Lopez, J. (2020), White's test and Bootstrapped White's test under the methodology of Jeong, J., Lee, K. (1999) package version 0.0.1

white_test(reg4h.1)

## White's test results
## 
## Null hypothesis: Homoskedasticity of the residuals
## Alternative hypothesis: Heteroskedasticity of the residuals
## Test Statistic: 51.36
## P-value: 0

Estimate FGLS with heteroskedastic-robust white-Huber SEs. Report your results and discuss why you would like to estimate this? Are the SE too different from those obtained in h. why?

library(sandwich)
lm9 <- lm(wage ~ educ + exper + tenure, data = wage1, weights = wt_h)

coeftest(lm9,
         vcov=vcovHC(lm9, type = "HC0"))

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 0.5955445  0.6159974  0.9668 0.3340932    
## educ        0.2968899  0.0461869  6.4280 2.920e-10 ***
## exper       0.0318644  0.0091421  3.4855 0.0005327 ***
## tenure      0.1502191  0.0279000  5.3842 1.102e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Los se son diferentes, ya que en este caso nos aseguramos de tener heterosedasticidad robusta, para una mejor estimación.

Ejercicio 5

# generate heteroskedastic data 
library(ggplot2)
library(car)
X <- 1:1000
Y <- rnorm(n = 1000, mean = X, sd = 0.6 * X)
df<-data.frame(X,Y)

# plot the data and regresion
ggplot(df, aes(x=X,y=Y))+geom_smooth(method=lm)+geom_point()

#Code:
t <- c()
t.rob <- c()

# loop sampling and estimation
for (i in 1:10000) {
  
  X <- 1:1000
  Y <- rnorm(n = 1000, mean = X, sd = 0.6 * X)
  
  # estimate regression model
  reg <- lm(Y ~ X)
  
  # homoskedasdicity-only significance test
  t[i] <- linearHypothesis(reg, "X = 1")$'Pr(>F)'[2] < 0.05
  
  # robust significance test
  t.rob[i] <- linearHypothesis(reg, "X = 1", white.adjust = "hc1")$'Pr(>F)'[2] < 0.05
  
}

# compute the fraction of false rejections
round(cbind(t = mean(t), t.rob = mean(t.rob)), 3)

##          t t.rob
## [1,] 0.076 0.052

Describa brevemente qué es lo que está realizando el código

En un inicio muesta la gráfica de una regresión con heterosedasticidad. También genera pruebas de hipótesis sobre la heterosedasticidad del modelo, se encuentra que sin corregir estre problema, aumentan los errores donde se rechaza la hopótesis incorrectamente más que cuendo se corrigen.

Cambie n a 100 ¿cuál es el resultado en la fracción de rechazos falsos aun con SE robustos? ¿Qué puede concluir sobre los errores estándar robustos?
Describa formalmente el procedimiento para obtener WLS con estos datos.

Este procedimiento aplica únicamente para modelos que presentan heterocedasticidad, asumiendo que se cumple de MRL.1 hasta MLR.4 donde debe cumplirse \(Var(u_{i}lx_{i})=\sigma^2h(x_{i})\), tal que \(Var(\frac{u_{i}}{\sqrt h_{i}})=\sigma^2\)

Ejercicio 6

We estimate a linear probability model for whether a young man was arrested during 1986:

Using the data in CRIME1, estimate this model by OLS and verify that all fitted values are strictly between zero and one. What are the smallest and largest fitted values?

data("crime1")
lm10 <- lm(narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86, data = crime1)

Checamos mayor y menor fitted values:

min(lm10$fitted.values)

## [1] 0.08373395

max(lm10$fitted.values)

## [1] 0.9868572

Etimate the equation by weighted least squares, as discussed in Section 8-5.

lm11 <- lm(log(lm10$fitted.values, base = exp(1)) ~ pcnv + avgsen + tottime + ptime86 + qemp86, data = crime1)

exp12 <- exp(lm11$fitted.values)
weights <- 1/exp12
lm12 <- lm(narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86, data = crime1, weights = weights)

summary(lm12)

## 
## Call:
## lm(formula = narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86, 
##     data = crime1, weights = weights)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8320 -0.6823 -0.5477  0.3873 15.4894 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.750823   0.036963  20.313  < 2e-16 ***
## pcnv        -0.180890   0.033111  -5.463 5.10e-08 ***
## avgsen      -0.002865   0.013088  -0.219    0.827    
## tottime      0.007274   0.010480   0.694    0.488    
## ptime86     -0.046157   0.006537  -7.061 2.09e-12 ***
## qemp86      -0.114281   0.010444 -10.943  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.267 on 2719 degrees of freedom
## Multiple R-squared:  0.05656,    Adjusted R-squared:  0.05483 
## F-statistic:  32.6 on 5 and 2719 DF,  p-value: < 2.2e-16

Use the FGLS estimates to determine whether avgsen and tottime are jointly significant at the 5% level.

#Modelo No-Restringido:
lm12 <- lm(narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86, data = crime1,
           weights = weights)

#Modelo Restringido:
lm13 <- lm(narr86 ~ pcnv + ptime86 + qemp86, data = crime1, weights = weights)

F_Stat <- ((summary(lm12)$r.squared-summary(lm13)$r.squared)/2) /
  ((1-summary(lm12)$r.squared)/(2725-5-1))
#F_5,2725-5-1 = 2.21
F_Stat > 2.21

## [1] FALSE

Por lo tanto, no son significativas en conjunto al 5%

Actividad 5: Heterocedasticidad

Víctor Escandón

2023-10-24