This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

License: CC BY-SA 4.0

License: CC BY-SA 4.0

1 Primeiros passos

Os primeiros passos são criar ou abrir um diretório de trabalho. Se optar por criar um novo projeto, haverá a possibilidade de criar em uma pasta vazia. No presente caso, geramos um dput()dos dados de modo a incorporar (embed) a informação no código.

To illustrate the computations in a multiple regression, we consider an example based on the macroeconomic data in Appendix Table F3.1, from 1968 to 1982. To estimate an investment equation, we first convert the investment and GNP series in Table F3.1 to real terms by dividing them by the CPI and then scale the two series so that they are measured in trillions of dollars. The other variables in the regression are a time trend(1,2,…), an interest rate, and the rate of inflation computed as the percentage change in the CPI.

Chamando os dados de http://pages.stern.nyu.edu/~wgreene/Text/Edition7/TableF3-1old.txt a partir do arquivo ou a partir do resultado do dput(). O arquivo TabelF3-1.txt traz dados atualizados para 2000-2014, diferente do texto da página 29.

> dados <- structure(list(year = c(1968, 1969, 1970, 1971, 1972, 1973, 1974, 
+ 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982), index = c(1, 
+ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), gnp = c(873.4, 
+ 944, 992.7, 1077.6, 1185.9, 1326.4, 1434.2, 1549.2, 1718, 1918.3, 
+ 2163.9, 2417.8, 2633.1, 2937.7, 3057.5), invest = c(133.3, 149.3, 
+ 144.2, 166.4, 195, 229.8, 228.7, 206.1, 257.9, 324.1, 386.6, 
+ 423, 402.3, 471.5, 421.9), cpi = c(82.54, 86.79, 91.45, 96.01, 
+ 100, 105.75, 115.08, 125.79, 132.34, 140.05, 150.42, 163.42, 
+ 178.64, 195.51, 207.23), interest = c(5.16, 5.87, 5.95, 4.88, 
+ 4.5, 6.44, 7.83, 6.25, 5.5, 5.46, 7.46, 10.28, 11.77, 13.42, 
+ 11.02), y = c(0.1614975, 0.1720244, 0.1576818, 0.1733153, 0.195, 
+ 0.217305, 0.1987313, 0.1638445, 0.1948768, 0.2314174, 0.2570137, 
+ 0.2588423, 0.2252015, 0.2411641, 0.2035902), g = c(1.058154, 
+ 1.087683, 1.085511, 1.122383, 1.1859, 1.254279, 1.246264, 1.231576, 
+ 1.298171, 1.369725, 1.438572, 1.479501, 1.47397, 1.502583, 1.475414
+ ), p = c(4.4, 5.149019, 5.369282, 4.986331, 4.155817, 5.75, 8.822695, 
+ 9.306569, 5.207091, 5.825903, 7.404499, 8.642467, 9.313425, 9.443574, 
+ 5.994578)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", 
+ "data.frame"))
> attach(dados)

São dados de investimento dos Estados Unidos da América em http://people.stern.nyu.edu/wgreene/Text/Edition7/TableF3-1old.txt), contendo as seguintes variáveis: - year: ano - invest = investimento nominal;
- index = tendência;
- gnp = PNB nominal;
- interest = r = taxa de juros;
- cpi = indice de preco - p = rate of inflation computed as the percentage change in the CPI: taxa de inflação; - y = investimento real - g = PNB real

Vamos ver como está a tabela importada:

> library(knitr)
> kable(dados,caption="Dados da tabela F3-1")
Dados da tabela F3-1
year index gnp invest cpi interest y g p
1968 1 873.4 133.3 82.54 5.16 0.1614975 1.058154 4.400000
1969 2 944.0 149.3 86.79 5.87 0.1720244 1.087683 5.149019
1970 3 992.7 144.2 91.45 5.95 0.1576818 1.085511 5.369282
1971 4 1077.6 166.4 96.01 4.88 0.1733153 1.122383 4.986331
1972 5 1185.9 195.0 100.00 4.50 0.1950000 1.185900 4.155817
1973 6 1326.4 229.8 105.75 6.44 0.2173050 1.254279 5.750000
1974 7 1434.2 228.7 115.08 7.83 0.1987313 1.246264 8.822695
1975 8 1549.2 206.1 125.79 6.25 0.1638445 1.231576 9.306569
1976 9 1718.0 257.9 132.34 5.50 0.1948768 1.298171 5.207091
1977 10 1918.3 324.1 140.05 5.46 0.2314174 1.369725 5.825903
1978 11 2163.9 386.6 150.42 7.46 0.2570137 1.438572 7.404499
1979 12 2417.8 423.0 163.42 10.28 0.2588423 1.479501 8.642467
1980 13 2633.1 402.3 178.64 11.77 0.2252015 1.473970 9.313425
1981 14 2937.7 471.5 195.51 13.42 0.2411641 1.502583 9.443574
1982 15 3057.5 421.9 207.23 11.02 0.2035902 1.475414 5.994578

Estimando o modelo linear de regressao multipla fazendo conforme a expressão do enunciado.

2 Resultados

2.1 Estimação

Fazendo as regressoes com uso de logaritmo e depois sem logaritmos.

> tsdata <- ts(dados, start=c(1968), frequency=1)
> # regressao multipla de y~index+g+interest+p
> mod1 <- lm(y~index+g+interest+p, data=tsdata )

Vamos utilizar o pacote stargazer para organizar as saídas de resultados. Se a saída fosse apenas pelo comando summary, sairia da forma:

> summary(mod1)

Call:
lm(formula = y ~ index + g + interest + p, data = tsdata)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0100884 -0.0024963  0.0004332  0.0028830  0.0079355 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.5090669  0.0539329  -9.439 2.69e-06 ***
index       -0.0165896  0.0019294  -8.598 6.23e-06 ***
g            0.6703021  0.0537996  12.459 2.05e-07 ***
interest    -0.0024282  0.0011938  -2.034   0.0693 .  
p            0.0000639  0.0013188   0.048   0.9623    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.006571 on 10 degrees of freedom
Multiple R-squared:  0.9735,    Adjusted R-squared:  0.9629 
F-statistic: 91.83 on 4 and 10 DF,  p-value: 7.672e-08

Agora, criando uma tabela com a saída do modelo, com o pacote stargazer tem-se, com a geração de AIC e BIC:

> library(stargazer)
> mod1$AIC <- AIC(mod1)
> mod1$BIC <- BIC(mod1)
> 
> library(stargazer)
> star.1 <- stargazer(mod1,
+           title="Título: Resultado da Regressão",
+           align=TRUE,
+           type = "text", style = "all",
+           keep.stat=c("aic","bic","rsq", "adj.rsq","n")
+ )

Título: Resultado da Regressão
===============================================
                        Dependent variable:    
                    ---------------------------
                                 y             
-----------------------------------------------
index                        -0.017***         
                              (0.002)          
                            t = -8.598         
                            p = 0.00001        
g                            0.670***          
                              (0.054)          
                            t = 12.459         
                            p = 0.00000        
interest                      -0.002*          
                              (0.001)          
                            t = -2.034         
                             p = 0.070         
p                             0.0001           
                              (0.001)          
                             t = 0.048         
                             p = 0.963         
Constant                     -0.509***         
                              (0.054)          
                            t = -9.439         
                            p = 0.00001        
-----------------------------------------------
Observations                    15             
R2                             0.973           
Adjusted R2                    0.963           
Akaike Inf. Crit.            -102.265          
Bayesian Inf. Crit.           -98.017          
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

2.2 Correlação

A análise de correlação permitirá ter uma ideia inicial de possível multicolinearidade:

> correlacao<-cor(dados)
> correlacao
              year     index       gnp    invest       cpi  interest
year     1.0000000 1.0000000 0.9783635 0.9593932 0.9784008 0.8042624
index    1.0000000 1.0000000 0.9783635 0.9593932 0.9784008 0.8042624
gnp      0.9783635 0.9783635 1.0000000 0.9728115 0.9977459 0.8792517
invest   0.9593932 0.9593932 0.9728115 1.0000000 0.9565138 0.8528462
cpi      0.9784008 0.9784008 0.9977459 0.9565138 1.0000000 0.8710475
interest 0.8042624 0.8042624 0.8792517 0.8528462 0.8710475 1.0000000
y        0.7489120 0.7489120 0.7196194 0.8555747 0.6776192 0.5855466
g        0.9786167 0.9786167 0.9609032 0.9856688 0.9473514 0.8039356
p        0.6356942 0.6356942 0.6059782 0.6132829 0.6031172 0.7243053
                 y         g         p
year     0.7489120 0.9786167 0.6356942
index    0.7489120 0.9786167 0.6356942
gnp      0.7196194 0.9609032 0.6059782
invest   0.8555747 0.9856688 0.6132829
cpi      0.6776192 0.9473514 0.6031172
interest 0.5855466 0.8039356 0.7243053
y        1.0000000 0.8628333 0.4800352
g        0.8628333 1.0000000 0.6362055
p        0.4800352 0.6362055 1.0000000

As correlações em módulo com valores acima de 0.8 podem indicar presença de relação linear entre as variáveis.

2.3 Testes de restrição do modelo

Esses testes fazem uma estatística para avaliar um modelo restrito contra o modelo irrestrito. No presente caso, faremos apenas para o modelo.

> library(car)
Loading required package: carData
> # F test: APENAS ESPECIFICO AS QUE TERAO TESTE DE OMISSAO
> myH0 <- c("p")
> linearHypothesis(mod1, myH0)
Linear hypothesis test

Hypothesis:
p = 0

Model 1: restricted model
Model 2: y ~ index + g + interest + p

  Res.Df        RSS Df  Sum of Sq      F Pr(>F)
1     11 0.00043191                            
2     10 0.00043181  1 1.0136e-07 0.0023 0.9623

Vamos trabalhar com um modelo reduzido, sem \(p\).

2.4 Teste RESET de Ramsey

> library(lmtest)
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
> lmtest::resettest(mod1,power = 2:4,type = "fitted")

    RESET test

data:  mod1
RESET = 0.18588, df1 = 3, df2 = 7, p-value = 0.9027

2.5 Teste de média e normalidade dos resíduos

> # resíduos do modelo 
> u.hat<-resid(mod1)
> # teste de média dos resíduos: H0: resíduos têm média zero
> t.test(u.hat)   

    One Sample t-test

data:  u.hat
t = 3.0273e-17, df = 14, p-value = 1
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.003075544  0.003075544
sample estimates:
   mean of x 
4.341044e-20 

2.5.1 Normalidade

Histograma

> # histograma
> hist.mod1<- hist(u.hat, freq = FALSE)
> curve(dnorm, add = TRUE,col="red")

Q-QPlot

> car::qqPlot(mod1)

[1]  1 13

Teste de normalidade Jarque-Bera

> JB.mod1<-tseries::jarque.bera.test(u.hat)
> JB.mod1

    Jarque Bera Test

data:  u.hat
X-squared = 0.59194, df = 2, p-value = 0.7438

Outros testes de normalidade

> library(nortest)
> # Testes
> t1.2 <- ks.test(u.hat, "pnorm") # KS
> t2.2 <- lillie.test(u.hat) # Lilliefors
> t3.2 <- cvm.test(u.hat) # Cramér-von Mises
> t4.2 <- shapiro.test(u.hat) # Shapiro-Wilk
> t5.2 <- sf.test(u.hat) # Shapiro-Francia
> t6.2 <- ad.test(u.hat) # Anderson-Darling
> # Tabela de resultados
> testes <- c(t1.2$method, t2.2$method, t3.2$method, t4.2$method, t5.2$method,
+             t6.2$method)
> estt <- as.numeric(c(t1.2$statistic, t2.2$statistic, t3.2$statistic, 
+                      t4.2$statistic, t5.2$statistic, t6.2$statistic))
> valorp <- c(t1.2$p.value, t2.2$p.value, t3.2$p.value, t4.2$p.value, t5.2$p.value,
+             t6.2$p.value)
> resultados <- cbind(estt, valorp)
> rownames(resultados) <- testes
> colnames(resultados) <- c("Estatística", "valor-prob")
> print(resultados, digits = 4)
                                               Estatística valor-prob
One-sample Kolmogorov-Smirnov test                 0.49683  0.0006048
Lilliefors (Kolmogorov-Smirnov) normality test     0.15500  0.4301853
Cramer-von Mises normality test                    0.04203  0.6184573
Shapiro-Wilk normality test                        0.94105  0.3957908
Shapiro-Francia normality test                     0.95238  0.4796082
Anderson-Darling normality test                    0.31601  0.5073463
> knitr::kable(resultados, digits = 4,caption = "Outros testes de normalidade dos resíduos")
Outros testes de normalidade dos resíduos
Estatística valor-prob
One-sample Kolmogorov-Smirnov test 0.4968 0.0006
Lilliefors (Kolmogorov-Smirnov) normality test 0.1550 0.4302
Cramer-von Mises normality test 0.0420 0.6185
Shapiro-Wilk normality test 0.9411 0.3958
Shapiro-Francia normality test 0.9524 0.4796
Anderson-Darling normality test 0.3160 0.5073

2.6 Teste de Heterocedasticidade de White

> #teste de White 
> # mod1 <- lm(y~year+g+interest+p, data=tsdata )
> m <- mod1
> data <- dados
> #rotina do teste com base em m e data
> u2 <- m$residuals^2
>   #sem termos cruzados, no cross-terms
> reg.auxiliar <- lm(u2 ~ year+g+interest+p+
+                      I(year^2)+I(g^2)+I(interest^2)+I(p^2))
> summary(reg.auxiliar)

Call:
lm(formula = u2 ~ year + g + interest + p + I(year^2) + I(g^2) + 
    I(interest^2) + I(p^2))

Residuals:
       Min         1Q     Median         3Q        Max 
-3.720e-05 -2.096e-05  6.030e-07  1.640e-05  4.690e-05 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.969e+00  8.571e+00   0.463    0.660
year          -4.017e-03  8.675e-03  -0.463    0.660
g              4.534e-04  4.469e-03   0.101    0.923
interest      -1.420e-05  6.021e-05  -0.236    0.821
p             -7.437e-05  1.256e-04  -0.592    0.575
I(year^2)      1.016e-06  2.194e-06   0.463    0.660
I(g^2)        -1.562e-04  1.653e-03  -0.094    0.928
I(interest^2)  6.332e-07  3.162e-06   0.200    0.848
I(p^2)         6.122e-06  8.386e-06   0.730    0.493

Residual standard error: 3.916e-05 on 6 degrees of freedom
Multiple R-squared:  0.4839,    Adjusted R-squared:  -0.2043 
F-statistic: 0.7031 on 8 and 6 DF,  p-value: 0.686
> Ru2<- summary(reg.auxiliar)$r.squared
> LM <- nrow(data)*Ru2
> #obtendo o numero de regressores menos o intercepto
> k <- length(coefficients(reg.auxiliar))-1
> k    # O TESTE TEM k TERMOS REGRESSORES EM reg.auxiliar
[1] 8
> p.value <- 1-pchisq(LM, k) 
> #c("LM","p.value")
> c(LM, p.value)
[1] 7.2579369 0.5090762
> # pelo pacote lmtest, o bptest dará o mesmo resultado, desde que 
> # especifique a mesma regressão auxiliar usada no White acima
> library(lmtest)
> bptest(m,~year+g+interest+p+I(year^2)+I(g^2)+I(interest^2)+I(p^2))

    studentized Breusch-Pagan test

data:  m
BP = 7.2579, df = 8, p-value = 0.5091

2.7 Correção de White

Não precisa, pois aceitou a homocedasticidade dos resíduos. Mas fiz a correção para ilustrar.

> vcov.white0<-hccm(mod1,type=c("hc1"))
> coeftest(mod1,vcov.white0)

t test of coefficients:

               Estimate  Std. Error  t value  Pr(>|t|)    
(Intercept) -5.0907e-01  3.4020e-02 -14.9639 3.578e-08 ***
index       -1.6590e-02  1.3905e-03 -11.9304 3.086e-07 ***
g            6.7030e-01  3.3943e-02  19.7481 2.429e-09 ***
interest    -2.4282e-03  1.1451e-03  -2.1205   0.05997 .  
p            6.3897e-05  1.2446e-03   0.0513   0.96007    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.8 Investigação da Multicolinearidade das variáveis Explicativas

2.8.1 Variance Inflation Factor (VIF)=Fator de Inflação da Variância (FIV)

> # library(car)
> reg1.vif<-vif(mod1)
> reg1.vif
    index         g  interest         p 
24.138403 24.113602  3.654118  2.141294 

2.8.2 regressões auxiliares para a regra de Klein

> reg1.index <- lm(index~g+interest+p, data=tsdata )
> summary(reg1.index)

Call:
lm(formula = index ~ g + interest + p, data = tsdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.26201 -0.72377 -0.09786  0.46658  1.82371 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -26.31751    2.84076  -9.264 1.58e-06 ***
g            26.16778    2.90406   9.011 2.07e-06 ***
interest      0.07354    0.18523   0.397    0.699    
p             0.01253    0.20606   0.061    0.953    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.027 on 11 degrees of freedom
Multiple R-squared:  0.9586,    Adjusted R-squared:  0.9473 
F-statistic: 84.84 on 3 and 11 DF,  p-value: 6.885e-08
> reg1.g <- lm(g~index+interest+p, data=tsdata )
> summary(reg1.g)

Call:
lm(formula = g ~ index + interest + p, data = tsdata)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.055715 -0.019970  0.006032  0.017044  0.049755 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.9953314  0.0360281  27.627 1.63e-11 ***
index       0.0336554  0.0037350   9.011 2.07e-06 ***
interest    0.0024341  0.0066499   0.366    0.721    
p           0.0006912  0.0073883   0.094    0.927    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03683 on 11 degrees of freedom
Multiple R-squared:  0.9585,    Adjusted R-squared:  0.9472 
F-statistic: 84.75 on 3 and 11 DF,  p-value: 6.924e-08
> reg1.interest <- lm(interest~index+g+p, data=tsdata )
> summary(reg1.interest)

Call:
lm(formula = interest ~ index + g + p, data = tsdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.36627 -1.03928 -0.00877  1.21104  2.33681 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.8159    13.5732  -0.281    0.784
index         0.1921     0.4839   0.397    0.699
g             4.9437    13.5062   0.366    0.721
p             0.5063     0.2961   1.710    0.115

Residual standard error: 1.66 on 11 degrees of freedom
Multiple R-squared:  0.7263,    Adjusted R-squared:  0.6517 
F-statistic: 9.732 on 3 and 11 DF,  p-value: 0.001986
> reg1.p <- lm(p~index+g+interest, data=tsdata )
> summary(reg1.p)

Call:
lm(formula = p ~ index + g + interest, data = tsdata)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5409 -0.5644 -0.2925  0.4927  3.2182 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.86426   12.31724   0.151    0.882
index        0.02683    0.44102   0.061    0.953
g            1.15028   12.29468   0.094    0.927
interest     0.41486    0.24257   1.710    0.115

Residual standard error: 1.502 on 11 degrees of freedom
Multiple R-squared:  0.533, Adjusted R-squared:  0.4056 
F-statistic: 4.185 on 3 and 11 DF,  p-value: 0.03328

2.8.3 Retirando a variável index

> reg2 <- lm(y~g+interest+p, data=tsdata )
> summary(reg2)

Call:
lm(formula = y ~ g + interest + p, data = tsdata)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.031352 -0.004970  0.003501  0.009404  0.020617 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.072471   0.050213  -1.443 0.176812    
g            0.236190   0.051332   4.601 0.000763 ***
interest    -0.003648   0.003274  -1.114 0.288942    
p           -0.000144   0.003642  -0.040 0.969163    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01815 on 11 degrees of freedom
Multiple R-squared:  0.7776,    Adjusted R-squared:  0.7169 
F-statistic: 12.82 on 3 and 11 DF,  p-value: 0.0006529
> reg2.vif<-vif(reg2)
> reg2.vif
       g interest        p 
2.877086 3.602503 2.140574