Abstract
This is an undergrad student level exercise for class use.This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
License: CC BY-SA 4.0
Sugestão de citação: FIGUEIREDO, Adriano Marcos Rodrigues. Econometria: Salários de advogados - Wooldridge(exemplo 7.8) adaptado de MOHR (2018). Campo Grande-MS,Brasil: RStudio/Rpubs, 2019. Disponível em http://rpubs.com/amrofi/wooldridge_ex7_8.
O dataset contém dados sobre a mediana dos salários iniciais dos formados em Direito. Uma variável explicativa é a classificação da faculdade cursada. O grupo base é das faculdades com rank maior que 100.
O data.frame contém 156 observações de 21 variáveis. O dataset está no pacote wooldridge
acessado fazendo data("lawsch85")
:
salary
: median starting salaryLSAT
: median LSAT (Law School Admission Test) scoreGPA
: median college GPA (Grade Point Average)rank
: law school rankingcost
: law school costlibvol
: no. volumes in lib., 1000slsalary
: log(salary)top10
: =1 if ranked in top 10r11_25
: =1 if ranked 11-25r26_40
: =1 if ranked 26-40r41_60
: =1 if ranked 41-60llibvol
: log(libvol)lcost
: log(cost)library(wooldridge)
data("lawsch85")
# Gerar dummies - Não era necessário, pois já estavam no dataset, mas
# criamos abaixo para ver como o dataset contem uma variável rank para os
# salarios (salary)
top10 <- as.numeric(lawsch85$rank <= 10)
r11.25 <- as.numeric(lawsch85$rank >= 11 & lawsch85$rank <= 25)
r26.40 <- as.numeric(lawsch85$rank >= 26 & lawsch85$rank <= 40)
r41.60 <- as.numeric(lawsch85$rank >= 41 & lawsch85$rank <= 60)
r61.100 <- as.numeric(lawsch85$rank >= 61 & lawsch85$rank <= 100)
attach(lawsch85)
dados <- as.data.frame(cbind(salary, rank, LSAT, GPA, libvol, cost, top10, r11_25,
r26_40, r41_60, r61.100))
detach(lawsch85)
attach(dados)
mod1 <- lm(log(salary) ~ log(rank) + log(LSAT) + log(GPA) + log(libvol) + log(cost),
data = dados)
mod2 <- lm((salary) ~ (rank) + (LSAT) + (GPA) + (libvol) + (cost), data = dados)
mod3 <- lm(log(salary) ~ top10 + r11_25 + r26_40 + r41_60 + r61.100 + log(LSAT) +
log(GPA) + log(libvol) + log(cost), data = dados)
mod1$AIC <- AIC(mod1)
mod2$AIC <- AIC(mod2)
mod3$AIC <- AIC(mod3)
mod1$BIC <- BIC(mod1)
mod2$BIC <- BIC(mod2)
mod3$BIC <- BIC(mod3)
library(stargazer)
star.1 <- stargazer(mod1, mod2, mod3, title = "Título: Resultados das Regressões",
column.labels = c("MQO-mod1", "MQO-mod2", "MQO-mod3"), align = TRUE, type = "text",
keep.stat = c("aic", "bic", "rsq", "adj.rsq", "n"))
Título: Resultados das Regressões
=========================================================
Dependent variable:
-------------------------------------
log(salary) (salary) log(salary)
MQO-mod1 MQO-mod2 MQO-mod3
(1) (2) (3)
---------------------------------------------------------
log(rank) -0.227***
(0.019)
top10 0.700***
(0.053)
r11_25 0.593***
(0.039)
r26_40 0.374***
(0.034)
r41_60 0.262***
(0.028)
r61.100 0.131***
(0.021)
log(LSAT) 0.726 0.885*
(0.559) (0.472)
log(GPA) 0.191 0.057
(0.282) (0.239)
log(libvol) 0.017 0.036
(0.033) (0.026)
log(cost) 0.008 0.001
(0.029) (0.025)
rank -115.815***
(16.153)
LSAT 155.286
(192.598)
GPA 14,500.240***
(4,357.070)
libvol 12.956***
(3.261)
cost 0.366**
(0.146)
Constant 7.383*** -33,124.550 5.562**
(2.577) (25,210.760) (2.141)
---------------------------------------------------------
Observations 136 136 136
R2 0.870 0.806 0.911
Adjusted R2 0.865 0.798 0.905
Akaike Inf. Crit. -227.844 2,737.692 -271.086
Bayesian Inf. Crit. -207.455 2,758.080 -239.046
=========================================================
Note: *p<0.1; **p<0.05; ***p<0.01
library(corrplot)
corel <- cor(dados)
corrplot(corel, method = "number")
library(car)
reg1.vif <- vif(mod1)
reg1.vif
log(rank) log(LSAT) log(GPA) log(libvol) log(cost)
4.676860 3.456882 3.636916 2.475024 1.621828
reg2.vif <- vif(mod2)
reg2.vif
rank LSAT GPA libvol cost
2.777994 3.469461 3.264910 1.735231 1.558141
reg3.vif <- vif(mod3)
reg3.vif
top10 r11_25 r26_40 r41_60 r61.100 log(LSAT)
3.488650 2.635882 1.730471 1.589780 1.542470 3.478267
log(GPA) log(libvol) log(cost)
3.693818 2.225695 1.653360
Conclusão: Não tenho evidências de multicolinearidade prejudicial ao modelo.
lmtest::bptest(mod3, ~log(LSAT) + log(GPA) + log(libvol) + log(cost) + I(log(LSAT)^2) +
I(log(GPA)^2) + I(log(libvol)^2) + I(log(cost)^2), data = dados)
studentized Breusch-Pagan test
data: mod3
BP = 15.673, df = 8, p-value = 0.04731
Concluo pela presença de heterocedasticidade dos resíduos em mod3.
library(car)
# possibilidades: hccm(regressao1,type=c('hc0','hc1','hc2','hc3','hc4'))
vcov.white0 <- hccm(mod3, type = c("hc1"))
#
lmtest::coeftest(mod3, vcov.white0)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.5619076 2.1432862 2.5950 0.01058 *
top10 0.7002290 0.0572244 12.2365 < 2.2e-16 ***
r11_25 0.5931739 0.0467471 12.6890 < 2.2e-16 ***
r26_40 0.3744336 0.0524192 7.1431 6.463e-11 ***
r41_60 0.2623891 0.0263407 9.9613 < 2.2e-16 ***
r61.100 0.1311293 0.0181522 7.2239 4.248e-11 ***
log(LSAT) 0.8847047 0.4854581 1.8224 0.07076 .
log(GPA) 0.0570718 0.2579492 0.2213 0.82525
log(libvol) 0.0362878 0.0259046 1.4008 0.16372
log(cost) 0.0010463 0.0293381 0.0357 0.97161
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod3)
Call:
lm(formula = log(salary) ~ top10 + r11_25 + r26_40 + r41_60 +
r61.100 + log(LSAT) + log(GPA) + log(libvol) + log(cost),
data = dados)
Residuals:
Min 1Q Median 3Q Max
-0.294920 -0.040379 -0.001164 0.044715 0.277004
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.561908 2.140706 2.598 0.0105 *
top10 0.700229 0.052517 13.333 < 2e-16 ***
r11_25 0.593174 0.039208 15.129 < 2e-16 ***
r26_40 0.374434 0.034036 11.001 < 2e-16 ***
r41_60 0.262389 0.027979 9.378 3.57e-16 ***
r61.100 0.131129 0.021049 6.230 6.42e-09 ***
log(LSAT) 0.884705 0.471736 1.875 0.0630 .
log(GPA) 0.057072 0.239032 0.239 0.8117
log(libvol) 0.036288 0.025998 1.396 0.1652
log(cost) 0.001046 0.025055 0.042 0.9668
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.08558 on 126 degrees of freedom
(20 observations deleted due to missingness)
Multiple R-squared: 0.9111, Adjusted R-squared: 0.9047
F-statistic: 143.4 on 9 and 126 DF, p-value: < 2.2e-16
# fazer stargazer com dois modelos
cov <- vcov.white0
robust.se <- sqrt(diag(cov))
stargazer(mod3, mod3, se = list(NULL, robust.se), column.labels = c("MQO-mod3",
"mod3 robusto"), title = "Título: Resultado da Regressão", align = TRUE,
type = "text", style = "all", keep.stat = c("aic", "bic", "rsq", "adj.rsq",
"n"))
Título: Resultado da Regressão
================================================
Dependent variable:
----------------------------
log(salary)
MQO-mod3 mod3 robusto
(1) (2)
------------------------------------------------
top10 0.700*** 0.700***
(0.053) (0.057)
t = 13.333 t = 12.237
p = 0.000 p = 0.000
r11_25 0.593*** 0.593***
(0.039) (0.047)
t = 15.129 t = 12.689
p = 0.000 p = 0.000
r26_40 0.374*** 0.374***
(0.034) (0.052)
t = 11.001 t = 7.143
p = 0.000 p = 0.000
r41_60 0.262*** 0.262***
(0.028) (0.026)
t = 9.378 t = 9.961
p = 0.000 p = 0.000
r61.100 0.131*** 0.131***
(0.021) (0.018)
t = 6.230 t = 7.224
p = 0.000 p = 0.000
log(LSAT) 0.885* 0.885*
(0.472) (0.485)
t = 1.875 t = 1.822
p = 0.064 p = 0.069
log(GPA) 0.057 0.057
(0.239) (0.258)
t = 0.239 t = 0.221
p = 0.812 p = 0.825
log(libvol) 0.036 0.036
(0.026) (0.026)
t = 1.396 t = 1.401
p = 0.166 p = 0.162
log(cost) 0.001 0.001
(0.025) (0.029)
t = 0.042 t = 0.036
p = 0.967 p = 0.972
Constant 5.562** 5.562***
(2.141) (2.143)
t = 2.598 t = 2.595
p = 0.011 p = 0.010
------------------------------------------------
Observations 136 136
R2 0.911 0.911
Adjusted R2 0.905 0.905
Akaike Inf. Crit. -271.086 -271.086
Bayesian Inf. Crit. -239.046 -239.046
================================================
Note: *p<0.1; **p<0.05; ***p<0.01
library(car)
library(lmtest)
library(sandwich)
dw.mod3 <- dwtest(mod3)
dw.mod3
Durbin-Watson test
data: mod3
DW = 1.9225, p-value = 0.3026
alternative hypothesis: true autocorrelation is greater than 0
Fiz uma rotina para rodar vários BGtest até ordem 12.
# padrao do teste de BG, com distribuição qui-quadrado
bgorder = 1:12 # definindo até a máxima ordem do bgtest
d = NULL
for (p in bgorder) {
bgtest.chi <- bgtest(mod3, order = p, type = c("Chisq"), data = dados)
print(bgtest.chi)
d = rbind(d, data.frame(bgtest.chi$statistic, bgtest.chi$p.value))
}
Breusch-Godfrey test for serial correlation of order up to 1
data: mod3
LM test = 0.1924, df = 1, p-value = 0.6609
Breusch-Godfrey test for serial correlation of order up to 2
data: mod3
LM test = 0.20033, df = 2, p-value = 0.9047
Breusch-Godfrey test for serial correlation of order up to 3
data: mod3
LM test = 0.20195, df = 3, p-value = 0.9773
Breusch-Godfrey test for serial correlation of order up to 4
data: mod3
LM test = 0.28421, df = 4, p-value = 0.9908
Breusch-Godfrey test for serial correlation of order up to 5
data: mod3
LM test = 0.46994, df = 5, p-value = 0.9932
Breusch-Godfrey test for serial correlation of order up to 6
data: mod3
LM test = 0.47794, df = 6, p-value = 0.9981
Breusch-Godfrey test for serial correlation of order up to 7
data: mod3
LM test = 0.5817, df = 7, p-value = 0.9991
Breusch-Godfrey test for serial correlation of order up to 8
data: mod3
LM test = 0.78819, df = 8, p-value = 0.9993
Breusch-Godfrey test for serial correlation of order up to 9
data: mod3
LM test = 0.89366, df = 9, p-value = 0.9996
Breusch-Godfrey test for serial correlation of order up to 10
data: mod3
LM test = 0.9225, df = 10, p-value = 0.9999
Breusch-Godfrey test for serial correlation of order up to 11
data: mod3
LM test = 1.2522, df = 11, p-value = 0.9998
Breusch-Godfrey test for serial correlation of order up to 12
data: mod3
LM test = 2.1526, df = 12, p-value = 0.9991
d
Conclusão: não tenho autocorrelação residual.
u.hat <- resid(mod3)
library(tseries)
JB.mod3 <- jarque.bera.test(u.hat)
JB.mod3
Jarque Bera Test
data: u.hat
X-squared = 20.538, df = 2, p-value = 3.47e-05
Rejeito H0 e concluo por resíduos não normais. Preciso verificar outliers.
TesteRESET.power <- lmtest::resettest(mod3, power = 2:3)
TesteRESET.power
RESET test
data: mod3
RESET = 0.30745, df1 = 2, df2 = 124, p-value = 0.7359
outlierTest(mod3)
rstudent unadjusted p-value Bonferroni p
3 -3.850327 0.0001873 0.025473
qqPlot(mod3)
[1] 3 47
vif(mod3)
top10 r11_25 r26_40 r41_60 r61.100 log(LSAT)
3.488650 2.635882 1.730471 1.589780 1.542470 3.478267
log(GPA) log(libvol) log(cost)
3.693818 2.225695 1.653360
As observações 3 e 47 sugerem outliers a serem controlados. Colocarei uma dummy para ambos.
dados$dummy <- 0
dados$dummy[3] = 1
dados$dummy[47] = 1
dados$dummy[79] = 1
Farei nova regressão partindo de mod3 :
attach(dados)
mod4 <- lm(log(salary) ~ top10 + r11_25 + r26_40 + r41_60 + r61.100 + log(LSAT) +
log(GPA) + log(libvol) + log(cost) + dummy + I(dummy * LSAT), data = dados)
mod4$AIC <- AIC(mod4)
mod4$BIC <- BIC(mod4)
mod5 <- lm(log(salary) ~ top10 + r11_25 + r26_40 + r41_60 + r61.100 + log(LSAT) +
log(GPA) + log(libvol) + log(cost) + dummy + I(dummy * GPA), data = dados)
mod5$AIC <- AIC(mod5)
mod5$BIC <- BIC(mod5)
mod6 <- lm(log(salary) ~ top10 + r11_25 + r26_40 + r41_60 + r61.100 + log(LSAT) +
log(GPA) + log(libvol) + log(cost) + dummy + I(dummy * libvol), data = dados)
mod6$AIC <- AIC(mod6)
mod6$BIC <- BIC(mod6)
mod7 <- lm(log(salary) ~ top10 + r11_25 + r26_40 + r41_60 + r61.100 + log(LSAT) +
log(GPA) + log(libvol) + log(cost) + dummy + I(dummy * cost), data = dados)
mod7$AIC <- AIC(mod7)
mod7$BIC <- BIC(mod7)
library(stargazer)
star.2 <- stargazer(mod4, mod5, mod6, mod7, title = "Título: Resultados das Regressões",
column.labels = c("MQO-mod4", "MQO-mod5", "MQO-mod6", "MQO-mod7"), align = TRUE,
type = "text", keep.stat = c("aic", "bic", "rsq", "adj.rsq", "n"))
Título: Resultados das Regressões
==========================================================
Dependent variable:
--------------------------------------
log(salary)
MQO-mod4 MQO-mod5 MQO-mod6 MQO-mod7
(1) (2) (3) (4)
----------------------------------------------------------
top10 0.717*** 0.713*** 0.752*** 0.745***
(0.051) (0.052) (0.047) (0.046)
r11_25 0.604*** 0.602*** 0.630*** 0.624***
(0.038) (0.039) (0.035) (0.035)
r26_40 0.356*** 0.356*** 0.412*** 0.389***
(0.033) (0.034) (0.031) (0.030)
r41_60 0.268*** 0.267*** 0.285*** 0.280***
(0.027) (0.027) (0.025) (0.025)
r61.100 0.134*** 0.134*** 0.148*** 0.143***
(0.020) (0.021) (0.019) (0.018)
log(LSAT) 0.853* 0.898* 0.923** 0.774*
(0.461) (0.468) (0.417) (0.417)
log(GPA) -0.003 0.005 -0.090 -0.074
(0.231) (0.235) (0.211) (0.210)
log(libvol) 0.035 0.034 0.023 0.031
(0.025) (0.026) (0.023) (0.023)
log(cost) -0.005 -0.005 -0.017 -0.011
(0.025) (0.025) (0.022) (0.022)
dummy -5.833*** -2.467** 8.269*** -8.352***
(2.031) (1.211) (1.368) (1.385)
I(dummy * LSAT) 0.038***
(0.013)
I(dummy * GPA) 0.775**
(0.365)
I(dummy * libvol) -0.020***
(0.003)
I(dummy * cost) 0.0005***
(0.0001)
Constant 5.852*** 5.623*** 5.772*** 6.410***
(2.087) (2.118) (1.890) (1.889)
----------------------------------------------------------
Observations 136 136 136 136
R2 0.919 0.917 0.933 0.934
Adjusted R2 0.912 0.909 0.927 0.928
Akaike Inf. Crit. -280.148 -275.942 -305.474 -306.833
Bayesian Inf. Crit. -242.284 -238.077 -267.610 -268.968
==========================================================
Note: *p<0.1; **p<0.05; ***p<0.01
Testarei novamente para normalidade:
dados$obs <- 1:156
u.hat <- resid(mod7)
library(tseries)
JB.mod7 <- jarque.bera.test(u.hat)
JB.mod7
Jarque Bera Test
data: u.hat
X-squared = 0.44485, df = 2, p-value = 0.8006
outlierTest(mod7)
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
96 2.444258 0.015933 NA
# detectei que o 79 também é outlier
qqPlot(mod7)
[1] 96 134
Agora sim, colocando 3, 47 e 79 como dummies consegui resíduos normais.
TesteRESET.power <- lmtest::resettest(mod7, power = 2:3)
TesteRESET.power
RESET test
data: mod7
RESET = 2.2131, df1 = 2, df2 = 122, p-value = 0.1137
Posso considerar o modelo 7 bem especificado, não autocorrelacionado conforme abaixo.
# padrao do teste de BG, com distribuição qui-quadrado
bgorder = 1:12 # definindo até a máxima ordem do bgtest
d = NULL
for (p in bgorder) {
bgtest.chi <- bgtest(mod7, order = p, type = c("Chisq"), data = dados)
print(bgtest.chi)
d = rbind(d, data.frame(bgtest.chi$statistic, bgtest.chi$p.value))
}
Breusch-Godfrey test for serial correlation of order up to 1
data: mod7
LM test = 0.06658, df = 1, p-value = 0.7964
Breusch-Godfrey test for serial correlation of order up to 2
data: mod7
LM test = 0.72076, df = 2, p-value = 0.6974
Breusch-Godfrey test for serial correlation of order up to 3
data: mod7
LM test = 0.8096, df = 3, p-value = 0.8472
Breusch-Godfrey test for serial correlation of order up to 4
data: mod7
LM test = 2.1152, df = 4, p-value = 0.7146
Breusch-Godfrey test for serial correlation of order up to 5
data: mod7
LM test = 2.2408, df = 5, p-value = 0.8149
Breusch-Godfrey test for serial correlation of order up to 6
data: mod7
LM test = 2.2696, df = 6, p-value = 0.8933
Breusch-Godfrey test for serial correlation of order up to 7
data: mod7
LM test = 2.2961, df = 7, p-value = 0.9417
Breusch-Godfrey test for serial correlation of order up to 8
data: mod7
LM test = 2.3205, df = 8, p-value = 0.9696
Breusch-Godfrey test for serial correlation of order up to 9
data: mod7
LM test = 3.0965, df = 9, p-value = 0.9603
Breusch-Godfrey test for serial correlation of order up to 10
data: mod7
LM test = 3.4961, df = 10, p-value = 0.9672
Breusch-Godfrey test for serial correlation of order up to 11
data: mod7
LM test = 3.4961, df = 11, p-value = 0.9824
Breusch-Godfrey test for serial correlation of order up to 12
data: mod7
LM test = 6.1923, df = 12, p-value = 0.9061
d
Vou checar a heterocedasticidade. A dummy somente entra em nível, não adiantando colocar ao quadrado. Não detecto heterocedasticidade. OU seja, ao controlar pelos outliers, resolvi o problema de heterocedasticidade.
lmtest::bptest(mod7, ~log(LSAT) + log(GPA) + log(libvol) + log(cost) + dummy +
I(dummy * cost) + I(log(LSAT)^2) + I(log(GPA)^2) + I(log(libvol)^2) + I(log(cost)^2),
data = dados)
studentized Breusch-Pagan test
data: mod7
BP = 14.633, df = 10, p-value = 0.146
WOOLDRIDGE, J.M. Introdução à econometria. São Paulo: CENGAGE, 4.ed. 2011.
MOHR, Franz. Chapter 7: Multiple Regression Analysis with Qualitative Information. February 4, 2018. Disponível em: https://www.r-econometrics.com/reproduction/wooldridge/wooldridge07/