Abstract
This is an undergraduate-level exercise for class use. The example follows Greene (2012, 7th ed., pp. 28-29 and 1109). This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
License: CC BY-SA 4.0
The first steps are to create or open a working directory. If you choose to create a new project, there will be the option of creating it in an empty folder. In the present case, we generated a dput()
of the data so as to embed the information in the code.
To illustrate the computations in a multiple regression, we consider an example based on the macroeconomic data in Appendix Table F3.1, from 1968 to 1982. To estimate an investment equation, we first convert the investment and GNP series in Table F3.1 to real terms by dividing them by the CPI and then scale the two series so that they are measured in trillions of dollars. The other variables in the regression are a time trend (1, 2, …), an interest rate, and the rate of inflation computed as the percentage change in the CPI.
The data are read from http://pages.stern.nyu.edu/~wgreene/Text/Edition7/TableF3-1old.txt, either from the file itself or from the dput() result. Note that the file TableF3-1.txt contains data updated to 2000-2014, unlike the text on page 29.
> dados <- structure(list(year = c(1968, 1969, 1970, 1971, 1972, 1973, 1974,
+ 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982), index = c(1,
+ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), gnp = c(873.4,
+ 944, 992.7, 1077.6, 1185.9, 1326.4, 1434.2, 1549.2, 1718, 1918.3,
+ 2163.9, 2417.8, 2633.1, 2937.7, 3057.5), invest = c(133.3, 149.3,
+ 144.2, 166.4, 195, 229.8, 228.7, 206.1, 257.9, 324.1, 386.6,
+ 423, 402.3, 471.5, 421.9), cpi = c(82.54, 86.79, 91.45, 96.01,
+ 100, 105.75, 115.08, 125.79, 132.34, 140.05, 150.42, 163.42,
+ 178.64, 195.51, 207.23), interest = c(5.16, 5.87, 5.95, 4.88,
+ 4.5, 6.44, 7.83, 6.25, 5.5, 5.46, 7.46, 10.28, 11.77, 13.42,
+ 11.02), y = c(0.1614975, 0.1720244, 0.1576818, 0.1733153, 0.195,
+ 0.217305, 0.1987313, 0.1638445, 0.1948768, 0.2314174, 0.2570137,
+ 0.2588423, 0.2252015, 0.2411641, 0.2035902), g = c(1.058154,
+ 1.087683, 1.085511, 1.122383, 1.1859, 1.254279, 1.246264, 1.231576,
+ 1.298171, 1.369725, 1.438572, 1.479501, 1.47397, 1.502583, 1.475414
+ ), p = c(4.4, 5.149019, 5.369282, 4.986331, 4.155817, 5.75, 8.822695,
+ 9.306569, 5.207091, 5.825903, 7.404499, 8.642467, 9.313425, 9.443574,
+ 5.994578)), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
+ "data.frame"))
> attach(dados)
These are United States investment data from http://people.stern.nyu.edu/wgreene/Text/Edition7/TableF3-1old.txt, containing the following variables:
- year: year;
- invest = nominal investment;
- index = time trend;
- gnp = nominal GNP;
- interest = r = interest rate;
- cpi = price index;
- p = rate of inflation computed as the percentage change in the CPI;
- y = real investment;
- g = real GNP.
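For reference, the transformed series y, g, and p can be reproduced from the nominal ones (a minimal sketch, not run here; the 1968 value of p requires the 1967 CPI, which is not in this extract):
> # real series in trillions of dollars: nominal series (billions) deflated by the CPI and divided by 10
> with(dados, invest/(cpi*10))   # reproduces y
> with(dados, gnp/(cpi*10))      # reproduces g
> # inflation as the percentage change in the CPI
> with(dados, 100*c(NA, diff(cpi)/head(cpi, -1)))   # reproduces p from 1969 onwards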
Let's have a look at the imported table:
> library(knitr)
> kable(dados,caption="Dados da tabela F3-1")
year | index | gnp | invest | cpi | interest | y | g | p |
---|---|---|---|---|---|---|---|---|
1968 | 1 | 873.4 | 133.3 | 82.54 | 5.16 | 0.1614975 | 1.058154 | 4.400000 |
1969 | 2 | 944.0 | 149.3 | 86.79 | 5.87 | 0.1720244 | 1.087683 | 5.149019 |
1970 | 3 | 992.7 | 144.2 | 91.45 | 5.95 | 0.1576818 | 1.085511 | 5.369282 |
1971 | 4 | 1077.6 | 166.4 | 96.01 | 4.88 | 0.1733153 | 1.122383 | 4.986331 |
1972 | 5 | 1185.9 | 195.0 | 100.00 | 4.50 | 0.1950000 | 1.185900 | 4.155817 |
1973 | 6 | 1326.4 | 229.8 | 105.75 | 6.44 | 0.2173050 | 1.254279 | 5.750000 |
1974 | 7 | 1434.2 | 228.7 | 115.08 | 7.83 | 0.1987313 | 1.246264 | 8.822695 |
1975 | 8 | 1549.2 | 206.1 | 125.79 | 6.25 | 0.1638445 | 1.231576 | 9.306569 |
1976 | 9 | 1718.0 | 257.9 | 132.34 | 5.50 | 0.1948768 | 1.298171 | 5.207091 |
1977 | 10 | 1918.3 | 324.1 | 140.05 | 5.46 | 0.2314174 | 1.369725 | 5.825903 |
1978 | 11 | 2163.9 | 386.6 | 150.42 | 7.46 | 0.2570137 | 1.438572 | 7.404499 |
1979 | 12 | 2417.8 | 423.0 | 163.42 | 10.28 | 0.2588423 | 1.479501 | 8.642467 |
1980 | 13 | 2633.1 | 402.3 | 178.64 | 11.77 | 0.2252015 | 1.473970 | 9.313425 |
1981 | 14 | 2937.7 | 471.5 | 195.51 | 13.42 | 0.2411641 | 1.502583 | 9.443574 |
1982 | 15 | 3057.5 | 421.9 | 207.23 | 11.02 | 0.2035902 | 1.475414 | 5.994578 |
Estimating the multiple linear regression model according to the expression in the problem statement, written out below.
The regressions could be run with and without logarithms; here they are estimated without logarithms.
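For reference, the equation implied by the lm() call below can be written as (our transcription, using the variable names above):

\[
y_t = \beta_1 + \beta_2\,\text{index}_t + \beta_3\,g_t + \beta_4\,\text{interest}_t + \beta_5\,p_t + \varepsilon_t
\]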
> tsdata <- ts(dados, start=c(1968), frequency=1)
> # multiple regression of y~index+g+interest+p
> mod1 <- lm(y~index+g+interest+p, data=tsdata )
We will use the stargazer package to organize the regression output. If the output came only from the summary command, it would look like this:
> summary(mod1)
Call:
lm(formula = y ~ index + g + interest + p, data = tsdata)
Residuals:
Min 1Q Median 3Q Max
-0.0100884 -0.0024963 0.0004332 0.0028830 0.0079355
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.5090669 0.0539329 -9.439 2.69e-06 ***
index -0.0165896 0.0019294 -8.598 6.23e-06 ***
g 0.6703021 0.0537996 12.459 2.05e-07 ***
interest -0.0024282 0.0011938 -2.034 0.0693 .
p 0.0000639 0.0013188 0.048 0.9623
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.006571 on 10 degrees of freedom
Multiple R-squared: 0.9735, Adjusted R-squared: 0.9629
F-statistic: 91.83 on 4 and 10 DF, p-value: 7.672e-08
Now, building a table with the model output using the stargazer package, including AIC and BIC:
> library(stargazer)
> mod1$AIC <- AIC(mod1)
> mod1$BIC <- BIC(mod1)
>
> star.1 <- stargazer(mod1,
+ title="Título: Resultado da Regressão",
+ align=TRUE,
+ type = "text", style = "all",
+ keep.stat=c("aic","bic","rsq", "adj.rsq","n")
+ )
Título: Resultado da Regressão
===============================================
Dependent variable:
---------------------------
y
-----------------------------------------------
index -0.017***
(0.002)
t = -8.598
p = 0.00001
g 0.670***
(0.054)
t = 12.459
p = 0.00000
interest -0.002*
(0.001)
t = -2.034
p = 0.070
p 0.0001
(0.001)
t = 0.048
p = 0.963
Constant -0.509***
(0.054)
t = -9.439
p = 0.00001
-----------------------------------------------
Observations 15
R2 0.973
Adjusted R2 0.963
Akaike Inf. Crit. -102.265
Bayesian Inf. Crit. -98.017
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
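To write the same table to a LaTeX or HTML file instead of printing it as text, the type and out arguments can be changed (a minimal sketch, not run here; the output file name is arbitrary):
> # same table saved as LaTeX (use type = "html" for an HTML file)
> stargazer(mod1, title="Título: Resultado da Regressão",
+           align=TRUE, type = "latex", out = "resultado_mod1.tex",
+           keep.stat=c("aic","bic","rsq", "adj.rsq","n"))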
Correlation analysis gives an initial idea of possible multicollinearity:
> correlacao<-cor(dados)
> correlacao
year index gnp invest cpi interest
year 1.0000000 1.0000000 0.9783635 0.9593932 0.9784008 0.8042624
index 1.0000000 1.0000000 0.9783635 0.9593932 0.9784008 0.8042624
gnp 0.9783635 0.9783635 1.0000000 0.9728115 0.9977459 0.8792517
invest 0.9593932 0.9593932 0.9728115 1.0000000 0.9565138 0.8528462
cpi 0.9784008 0.9784008 0.9977459 0.9565138 1.0000000 0.8710475
interest 0.8042624 0.8042624 0.8792517 0.8528462 0.8710475 1.0000000
y 0.7489120 0.7489120 0.7196194 0.8555747 0.6776192 0.5855466
g 0.9786167 0.9786167 0.9609032 0.9856688 0.9473514 0.8039356
p 0.6356942 0.6356942 0.6059782 0.6132829 0.6031172 0.7243053
y g p
year 0.7489120 0.9786167 0.6356942
index 0.7489120 0.9786167 0.6356942
gnp 0.7196194 0.9609032 0.6059782
invest 0.8555747 0.9856688 0.6132829
cpi 0.6776192 0.9473514 0.6031172
interest 0.5855466 0.8039356 0.7243053
y 1.0000000 0.8628333 0.4800352
g 0.8628333 1.0000000 0.6362055
p 0.4800352 0.6362055 1.0000000
Correlations above 0.8 in absolute value may indicate the presence of a linear relationship between the variables.
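A quick way to list those pairs programmatically (a minimal sketch, not run here; the object name altas is ours):
> # pairs of distinct variables with |correlation| above 0.8
> altas <- which(abs(correlacao) > 0.8 & upper.tri(correlacao), arr.ind = TRUE)
> data.frame(var1 = rownames(correlacao)[altas[,1]],
+            var2 = colnames(correlacao)[altas[,2]],
+            r = correlacao[altas])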
These tests compute a statistic to evaluate a restricted model against the unrestricted model. In the present case, we test only the restriction that the coefficient on p is zero.
> library(car)
Loading required package: carData
> # F test: specify only the coefficients whose omission will be tested
> myH0 <- c("p")
> linearHypothesis(mod1, myH0)
Linear hypothesis test
Hypothesis:
p = 0
Model 1: restricted model
Model 2: y ~ index + g + interest + p
Res.Df RSS Df Sum of Sq F Pr(>F)
1 11 0.00043191
2 10 0.00043181 1 1.0136e-07 0.0023 0.9623
We will work with a reduced model, without \(p\).
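The restricted model can also be fitted explicitly and compared with mod1 via anova(), which reproduces the F test above (a minimal sketch, not run here; mod1.r is our name for the restricted fit):
> # restricted model: drop p
> mod1.r <- lm(y~index+g+interest, data=tsdata)
> # F test of the restriction, equivalent to linearHypothesis(mod1, "p")
> anova(mod1.r, mod1)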
> library(lmtest)
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
> lmtest::resettest(mod1,power = 2:4,type = "fitted")
RESET test
data: mod1
RESET = 0.18588, df1 = 3, df2 = 7, p-value = 0.9027
> # model residuals
> u.hat<-resid(mod1)
> # t-test of the residual mean; H0: residuals have zero mean (with an intercept, OLS residuals sum to zero by construction)
> t.test(u.hat)
One Sample t-test
data: u.hat
t = 3.0273e-17, df = 14, p-value = 1
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.003075544 0.003075544
sample estimates:
mean of x
4.341044e-20
Histogram
> # histogram of the residuals
> hist.mod1 <- hist(u.hat, freq = FALSE)
> # overlay a normal density with the residuals' mean and standard deviation
> curve(dnorm(x, mean = mean(u.hat), sd = sd(u.hat)), add = TRUE, col = "red")
Q-Q plot
> car::qqPlot(mod1)
[1] 1 13
Jarque-Bera normality test
> JB.mod1<-tseries::jarque.bera.test(u.hat)
> JB.mod1
Jarque Bera Test
data: u.hat
X-squared = 0.59194, df = 2, p-value = 0.7438
Other normality tests
> library(nortest)
> # Tests
> t1.2 <- ks.test(u.hat, "pnorm") # KS against a standard normal; note u.hat is not standardized
> t2.2 <- lillie.test(u.hat) # Lilliefors
> t3.2 <- cvm.test(u.hat) # Cramér-von Mises
> t4.2 <- shapiro.test(u.hat) # Shapiro-Wilk
> t5.2 <- sf.test(u.hat) # Shapiro-Francia
> t6.2 <- ad.test(u.hat) # Anderson-Darling
> # Results table
> testes <- c(t1.2$method, t2.2$method, t3.2$method, t4.2$method, t5.2$method,
+ t6.2$method)
> estt <- as.numeric(c(t1.2$statistic, t2.2$statistic, t3.2$statistic,
+ t4.2$statistic, t5.2$statistic, t6.2$statistic))
> valorp <- c(t1.2$p.value, t2.2$p.value, t3.2$p.value, t4.2$p.value, t5.2$p.value,
+ t6.2$p.value)
> resultados <- cbind(estt, valorp)
> rownames(resultados) <- testes
> colnames(resultados) <- c("Estatística", "valor-prob")
> print(resultados, digits = 4)
Estatística valor-prob
One-sample Kolmogorov-Smirnov test 0.49683 0.0006048
Lilliefors (Kolmogorov-Smirnov) normality test 0.15500 0.4301853
Cramer-von Mises normality test 0.04203 0.6184573
Shapiro-Wilk normality test 0.94105 0.3957908
Shapiro-Francia normality test 0.95238 0.4796082
Anderson-Darling normality test 0.31601 0.5073463
> knitr::kable(resultados, digits = 4,caption = "Outros testes de normalidade dos resíduos")
Estatística | valor-prob | |
---|---|---|
One-sample Kolmogorov-Smirnov test | 0.4968 | 0.0006 |
Lilliefors (Kolmogorov-Smirnov) normality test | 0.1550 | 0.4302 |
Cramer-von Mises normality test | 0.0420 | 0.6185 |
Shapiro-Wilk normality test | 0.9411 | 0.3958 |
Shapiro-Francia normality test | 0.9524 | 0.4796 |
Anderson-Darling normality test | 0.3160 | 0.5073 |
> # White test
> # mod1 <- lm(y~year+g+interest+p, data=tsdata )
> m <- mod1
> data <- dados
> # test routine based on m and data
> u2 <- m$residuals^2
> # auxiliary regression without cross terms
> reg.auxiliar <- lm(u2 ~ year+g+interest+p+
+ I(year^2)+I(g^2)+I(interest^2)+I(p^2))
> summary(reg.auxiliar)
Call:
lm(formula = u2 ~ year + g + interest + p + I(year^2) + I(g^2) +
I(interest^2) + I(p^2))
Residuals:
Min 1Q Median 3Q Max
-3.720e-05 -2.096e-05 6.030e-07 1.640e-05 4.690e-05
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.969e+00 8.571e+00 0.463 0.660
year -4.017e-03 8.675e-03 -0.463 0.660
g 4.534e-04 4.469e-03 0.101 0.923
interest -1.420e-05 6.021e-05 -0.236 0.821
p -7.437e-05 1.256e-04 -0.592 0.575
I(year^2) 1.016e-06 2.194e-06 0.463 0.660
I(g^2) -1.562e-04 1.653e-03 -0.094 0.928
I(interest^2) 6.332e-07 3.162e-06 0.200 0.848
I(p^2) 6.122e-06 8.386e-06 0.730 0.493
Residual standard error: 3.916e-05 on 6 degrees of freedom
Multiple R-squared: 0.4839, Adjusted R-squared: -0.2043
F-statistic: 0.7031 on 8 and 6 DF, p-value: 0.686
> Ru2<- summary(reg.auxiliar)$r.squared
> LM <- nrow(data)*Ru2
> # number of regressors excluding the intercept
> k <- length(coefficients(reg.auxiliar))-1
> k # the test has k regressor terms in reg.auxiliar
[1] 8
> p.value <- 1-pchisq(LM, k)
> #c("LM","p.value")
> c(LM, p.value)
[1] 7.2579369 0.5090762
> # the bptest from the lmtest package gives the same result, provided that
> # the same auxiliary regression used in the White test above is specified
> library(lmtest)
> bptest(m,~year+g+interest+p+I(year^2)+I(g^2)+I(interest^2)+I(p^2))
studentized Breusch-Pagan test
data: m
BP = 7.2579, df = 8, p-value = 0.5091
White-corrected (heteroskedasticity-robust) standard errors are not strictly needed here, since homoskedasticity of the residuals was not rejected, but the correction is shown for illustration.
> vcov.white0<-hccm(mod1,type=c("hc1"))
> coeftest(mod1,vcov.white0)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.0907e-01 3.4020e-02 -14.9639 3.578e-08 ***
index -1.6590e-02 1.3905e-03 -11.9304 3.086e-07 ***
g 6.7030e-01 3.3943e-02 19.7481 2.429e-09 ***
interest -2.4282e-03 1.1451e-03 -2.1205 0.05997 .
p 6.3897e-05 1.2446e-03 0.0513 0.96007
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
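An equivalent robust covariance matrix can be obtained with the sandwich package; its HC1 estimator corresponds to the hc1 option of hccm() used above (a minimal sketch, not run here):
> library(sandwich)
> coeftest(mod1, vcov. = vcovHC(mod1, type = "HC1"))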
> # library(car)
> reg1.vif<-vif(mod1)
> reg1.vif
index g interest p
24.138403 24.113602 3.654118 2.141294
> reg1.index <- lm(index~g+interest+p, data=tsdata )
> summary(reg1.index)
Call:
lm(formula = index ~ g + interest + p, data = tsdata)
Residuals:
Min 1Q Median 3Q Max
-1.26201 -0.72377 -0.09786 0.46658 1.82371
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -26.31751 2.84076 -9.264 1.58e-06 ***
g 26.16778 2.90406 9.011 2.07e-06 ***
interest 0.07354 0.18523 0.397 0.699
p 0.01253 0.20606 0.061 0.953
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.027 on 11 degrees of freedom
Multiple R-squared: 0.9586, Adjusted R-squared: 0.9473
F-statistic: 84.84 on 3 and 11 DF, p-value: 6.885e-08
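The VIF of index reported above can be recovered directly from the R² of this auxiliary regression (a quick check, not run here):
> # VIF(index) = 1/(1 - R²) of index ~ g + interest + p: about 24.14, matching vif(mod1)
> 1/(1 - summary(reg1.index)$r.squared)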
> reg1.g <- lm(g~index+interest+p, data=tsdata )
> summary(reg1.g)
Call:
lm(formula = g ~ index + interest + p, data = tsdata)
Residuals:
Min 1Q Median 3Q Max
-0.055715 -0.019970 0.006032 0.017044 0.049755
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.9953314 0.0360281 27.627 1.63e-11 ***
index 0.0336554 0.0037350 9.011 2.07e-06 ***
interest 0.0024341 0.0066499 0.366 0.721
p 0.0006912 0.0073883 0.094 0.927
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.03683 on 11 degrees of freedom
Multiple R-squared: 0.9585, Adjusted R-squared: 0.9472
F-statistic: 84.75 on 3 and 11 DF, p-value: 6.924e-08
> reg1.interest <- lm(interest~index+g+p, data=tsdata )
> summary(reg1.interest)
Call:
lm(formula = interest ~ index + g + p, data = tsdata)
Residuals:
Min 1Q Median 3Q Max
-2.36627 -1.03928 -0.00877 1.21104 2.33681
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.8159 13.5732 -0.281 0.784
index 0.1921 0.4839 0.397 0.699
g 4.9437 13.5062 0.366 0.721
p 0.5063 0.2961 1.710 0.115
Residual standard error: 1.66 on 11 degrees of freedom
Multiple R-squared: 0.7263, Adjusted R-squared: 0.6517
F-statistic: 9.732 on 3 and 11 DF, p-value: 0.001986
> reg1.p <- lm(p~index+g+interest, data=tsdata )
> summary(reg1.p)
Call:
lm(formula = p ~ index + g + interest, data = tsdata)
Residuals:
Min 1Q Median 3Q Max
-2.5409 -0.5644 -0.2925 0.4927 3.2182
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.86426 12.31724 0.151 0.882
index 0.02683 0.44102 0.061 0.953
g 1.15028 12.29468 0.094 0.927
interest 0.41486 0.24257 1.710 0.115
Residual standard error: 1.502 on 11 degrees of freedom
Multiple R-squared: 0.533, Adjusted R-squared: 0.4056
F-statistic: 4.185 on 3 and 11 DF, p-value: 0.03328
> reg2 <- lm(y~g+interest+p, data=tsdata )
> summary(reg2)
Call:
lm(formula = y ~ g + interest + p, data = tsdata)
Residuals:
Min 1Q Median 3Q Max
-0.031352 -0.004970 0.003501 0.009404 0.020617
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.072471 0.050213 -1.443 0.176812
g 0.236190 0.051332 4.601 0.000763 ***
interest -0.003648 0.003274 -1.114 0.288942
p -0.000144 0.003642 -0.040 0.969163
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.01815 on 11 degrees of freedom
Multiple R-squared: 0.7776, Adjusted R-squared: 0.7169
F-statistic: 12.82 on 3 and 11 DF, p-value: 0.0006529
> reg2.vif<-vif(reg2)
> reg2.vif
g interest p
2.877086 3.602503 2.140574
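Finally, the full model mod1 and reg2, which drops the trend variable, can be compared by their information criteria (a minimal sketch, not run here):
> # information criteria of the full model and of the model without the trend
> AIC(mod1, reg2)
> BIC(mod1, reg2)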