Inicialmente, será carregado o banco de dados referente a Infecção Respiratoria Aguda em crianças, ao longo de 16 anos na cidade de Rondonopolis - MT, os dados foram coletados entre janeiro de 1999 até dezembro de 2014, este estudo contém 5 variáveis, sendo elas IRA (Número de pessoas com infecção respiratória aguda), Precip, Temp, Umid.

setwd("C:\\Users\\Mateus\\Desktop\\P7\\Consultoria\\Atividade 3")
dados<-read.table("Dados3.txt",h=T)
head(dados)

##   Mes.Ano IRA Precip  Temp  Umid
## 1  jan/99 170  413.8 25.39 86.15
## 2  fev/99 134   81.0 25.96 84.09
## 3  mar/99 241   86.4 25.49 88.12
## 4  abr/99 293   26.5 24.61 75.22
## 5  mai/99 402    3.9 22.17 69.27
## 6  jun/99 324    1.4 22.76 68.59

attach(dados)

Instalandos e carregando os pacotes nescessarios.

#install.packages("hnp")

library(hnp)

## Warning: package 'hnp' was built under R version 3.5.3

## Loading required package: MASS

library(psych)

## Warning: package 'psych' was built under R version 3.5.3

library(nortest)
require(fBasics)

## Loading required package: fBasics

## Warning: package 'fBasics' was built under R version 3.5.3

## Loading required package: timeDate

## Loading required package: timeSeries

## 
## Attaching package: 'timeSeries'

## The following object is masked from 'package:psych':
## 
##     outlier

## 
## Attaching package: 'fBasics'

## The following object is masked from 'package:psych':
## 
##     tr

require(TSA)

## Loading required package: TSA

## Warning: package 'TSA' was built under R version 3.5.3

## 
## Attaching package: 'TSA'

## The following objects are masked from 'package:timeDate':
## 
##     kurtosis, skewness

## The following objects are masked from 'package:stats':
## 
##     acf, arima

## The following object is masked from 'package:utils':
## 
##     tar

library(tseries)

## Warning: package 'tseries' was built under R version 3.5.3

library(forecast)

## Warning: package 'forecast' was built under R version 3.5.3

describe(dados)

##          vars   n   mean     sd median trimmed    mad min    max  range
## Mes.Ano*    1 192  96.50  55.57  96.50   96.50  71.16   1 192.00 191.00
## IRA         2 192 431.05 215.66 426.50  421.09 263.90 103 904.00 801.00
## Precip      3 192 104.43 108.59  68.65   89.89 100.22   0 572.20 572.20
## Temp        4 192  25.17   1.61  25.59   25.25   1.22  21  28.74   7.74
## Umid        5 192  76.73  12.55  80.73   77.93  11.64  42  95.29  53.29
##           skew kurtosis    se
## Mes.Ano*  0.00    -1.22  4.01
## IRA       0.24    -0.98 15.56
## Precip    1.10     0.95  7.84
## Temp     -0.52    -0.40  0.12
## Umid     -0.74    -0.44  0.91

A unica variavel que tem a curtose positiva é Precip, ou seja possui caudas pesadas. As demais variáveis possui a curtose negativa, então a funçao de distribuição é mais achatada do que a distribuição normal

Gerando o histograma das variáveis

par(mfrow=c(2,2))
histPlot(as.timeSeries(dados$IRA))
histPlot(as.timeSeries(dados$Precip))
histPlot(as.timeSeries(dados$Temp))
histPlot(as.timeSeries(dados$Umid))

Teste de Normalidade

ad.test(dados$IRA)

## 
##  Anderson-Darling normality test
## 
## data:  dados$IRA
## A = 1.9645, p-value = 5.049e-05

ad.test(dados$Precip)

## 
##  Anderson-Darling normality test
## 
## data:  dados$Precip
## A = 8.0584, p-value < 2.2e-16

ad.test(dados$Temp)

## 
##  Anderson-Darling normality test
## 
## data:  dados$Temp
## A = 4.3088, p-value = 9.714e-11

ad.test(dados$Umid)

## 
##  Anderson-Darling normality test
## 
## data:  dados$Umid
## A = 4.8197, p-value = 5.67e-12

Conforme o tesde de normalidade, todas foram significativas, logo as variáveis IRA, Precip, Temp, Umid não segue normalidade.

Correlações

cor.test(dados$IRA,dados$Precip, method = "kendall")

## 
##  Kendall's rank correlation tau
## 
## data:  dados$IRA and dados$Precip
## z = -4.5062, p-value = 6.599e-06
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##        tau 
## -0.2209575

cor.test(dados$IRA,dados$Temp, method = "kendall")

## 
##  Kendall's rank correlation tau
## 
## data:  dados$IRA and dados$Temp
## z = -3.9666, p-value = 7.29e-05
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##      tau 
## -0.19303

cor.test(dados$IRA,dados$Umid, method = "kendall")

## 
##  Kendall's rank correlation tau
## 
## data:  dados$IRA and dados$Umid
## z = -6.7018, p-value = 2.059e-11
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##        tau 
## -0.3257263

As variaveis IRA, Precip, Temp e Umid, tem correlação negativa. Ou seja nesse caso à medida que Precip, Temp e Umid cresce, a IRA decresce (em média), são inversamente proporcionais.

par(mfrow=c(2,2))
densityPlot(as.timeSeries(dados$IRA))
densityPlot(as.timeSeries(dados$Precip))
densityPlot(as.timeSeries(dados$Temp))
densityPlot(as.timeSeries(dados$Umid))

library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:fBasics':
## 
##     densityPlot

## The following object is masked from 'package:psych':
## 
##     logit

qqPlot(dados$IRA)

## [1] 19 45

Modelos Lineares Generalizados

Ajustando o modelo com a distribuição Poisson

mod<- glm(IRA ~ Precip + Temp + Umid , family = poisson)

summary(mod)

## 
## Call:
## glm(formula = IRA ~ Precip + Temp + Umid, family = poisson)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -20.5705   -8.1575    0.1349    6.8616   17.9029  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.346e+00  6.158e-02 151.779  < 2e-16 ***
## Precip       2.348e-04  4.671e-05   5.026 5.01e-07 ***
## Temp        -8.277e-02  2.208e-03 -37.479  < 2e-16 ***
## Umid        -1.631e-02  3.426e-04 -47.611  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 21614  on 191  degrees of freedom
## Residual deviance: 16300  on 188  degrees of freedom
## AIC: 17796
## 
## Number of Fisher Scoring iterations: 4

1-pchisq(16300,188)

## [1] 0

O modelo linear generalizado não se adequou bem aos dados.

set.seed(1)
hnp(mod, print=TRUE,verb=TRUE,xlab="Quantis teóricos")

## Poisson model 
## Simulation 1 out of 99 
## Simulation 2 out of 99 
## Simulation 3 out of 99 
## Simulation 4 out of 99 
## Simulation 5 out of 99 
## Simulation 6 out of 99 
## Simulation 7 out of 99 
## Simulation 8 out of 99 
## Simulation 9 out of 99 
## Simulation 10 out of 99 
## Simulation 11 out of 99 
## Simulation 12 out of 99 
## Simulation 13 out of 99 
## Simulation 14 out of 99 
## Simulation 15 out of 99 
## Simulation 16 out of 99 
## Simulation 17 out of 99 
## Simulation 18 out of 99 
## Simulation 19 out of 99 
## Simulation 20 out of 99 
## Simulation 21 out of 99 
## Simulation 22 out of 99 
## Simulation 23 out of 99 
## Simulation 24 out of 99 
## Simulation 25 out of 99 
## Simulation 26 out of 99 
## Simulation 27 out of 99 
## Simulation 28 out of 99 
## Simulation 29 out of 99 
## Simulation 30 out of 99 
## Simulation 31 out of 99 
## Simulation 32 out of 99 
## Simulation 33 out of 99 
## Simulation 34 out of 99 
## Simulation 35 out of 99 
## Simulation 36 out of 99 
## Simulation 37 out of 99 
## Simulation 38 out of 99 
## Simulation 39 out of 99 
## Simulation 40 out of 99 
## Simulation 41 out of 99 
## Simulation 42 out of 99 
## Simulation 43 out of 99 
## Simulation 44 out of 99 
## Simulation 45 out of 99 
## Simulation 46 out of 99 
## Simulation 47 out of 99 
## Simulation 48 out of 99 
## Simulation 49 out of 99 
## Simulation 50 out of 99 
## Simulation 51 out of 99 
## Simulation 52 out of 99 
## Simulation 53 out of 99 
## Simulation 54 out of 99 
## Simulation 55 out of 99 
## Simulation 56 out of 99 
## Simulation 57 out of 99 
## Simulation 58 out of 99 
## Simulation 59 out of 99 
## Simulation 60 out of 99 
## Simulation 61 out of 99 
## Simulation 62 out of 99 
## Simulation 63 out of 99 
## Simulation 64 out of 99 
## Simulation 65 out of 99 
## Simulation 66 out of 99 
## Simulation 67 out of 99 
## Simulation 68 out of 99 
## Simulation 69 out of 99 
## Simulation 70 out of 99 
## Simulation 71 out of 99 
## Simulation 72 out of 99 
## Simulation 73 out of 99 
## Simulation 74 out of 99 
## Simulation 75 out of 99 
## Simulation 76 out of 99 
## Simulation 77 out of 99 
## Simulation 78 out of 99 
## Simulation 79 out of 99 
## Simulation 80 out of 99 
## Simulation 81 out of 99 
## Simulation 82 out of 99 
## Simulation 83 out of 99 
## Simulation 84 out of 99 
## Simulation 85 out of 99 
## Simulation 86 out of 99 
## Simulation 87 out of 99 
## Simulation 88 out of 99 
## Simulation 89 out of 99 
## Simulation 90 out of 99 
## Simulation 91 out of 99 
## Simulation 92 out of 99 
## Simulation 93 out of 99 
## Simulation 94 out of 99 
## Simulation 95 out of 99 
## Simulation 96 out of 99 
## Simulation 97 out of 99 
## Simulation 98 out of 99 
## Simulation 99 out of 99

Ajustando modelo com a distribuição binomial negativa

mod.2<- glm.nb(IRA ~ Precip + Temp + Umid , data = dados)

summary(mod.2)

## 
## Call:
## glm.nb(formula = IRA ~ Precip + Temp + Umid, data = dados, init.theta = 4.439088697, 
##     link = log)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.25860  -0.90984  -0.00822   0.61173   1.82142  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.6310305  0.6520314  14.771  < 2e-16 ***
## Precip       0.0003234  0.0004354   0.743 0.457649    
## Temp        -0.0886262  0.0230982  -3.837 0.000125 ***
## Umid        -0.0182615  0.0035407  -5.158  2.5e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.4391) family taken to be 1)
## 
##     Null deviance: 255.05  on 191  degrees of freedom
## Residual deviance: 199.24  on 188  degrees of freedom
## AIC: 2557.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.439 
##           Std. Err.:  0.443 
## 
##  2 x log-likelihood:  -2547.086

1-pchisq(199.24/4.439,188)

## [1] 1

set.seed(22)
hnp(mod.2, print=TRUE,verb=TRUE,xlab="Quantis teóricos")

## Negative binomial model (using MASS package) 
## Simulation 1 out of 99 
## Simulation 2 out of 99 
## Simulation 3 out of 99 
## Simulation 4 out of 99 
## Simulation 5 out of 99 
## Simulation 6 out of 99 
## Simulation 7 out of 99 
## Simulation 8 out of 99 
## Simulation 9 out of 99 
## Simulation 10 out of 99 
## Simulation 11 out of 99 
## Simulation 12 out of 99 
## Simulation 13 out of 99 
## Simulation 14 out of 99 
## Simulation 15 out of 99 
## Simulation 16 out of 99 
## Simulation 17 out of 99 
## Simulation 18 out of 99 
## Simulation 19 out of 99 
## Simulation 20 out of 99 
## Simulation 21 out of 99 
## Simulation 22 out of 99 
## Simulation 23 out of 99 
## Simulation 24 out of 99 
## Simulation 25 out of 99 
## Simulation 26 out of 99 
## Simulation 27 out of 99 
## Simulation 28 out of 99 
## Simulation 29 out of 99 
## Simulation 30 out of 99 
## Simulation 31 out of 99 
## Simulation 32 out of 99 
## Simulation 33 out of 99 
## Simulation 34 out of 99 
## Simulation 35 out of 99 
## Simulation 36 out of 99 
## Simulation 37 out of 99 
## Simulation 38 out of 99 
## Simulation 39 out of 99 
## Simulation 40 out of 99 
## Simulation 41 out of 99 
## Simulation 42 out of 99 
## Simulation 43 out of 99 
## Simulation 44 out of 99 
## Simulation 45 out of 99 
## Simulation 46 out of 99 
## Simulation 47 out of 99 
## Simulation 48 out of 99 
## Simulation 49 out of 99 
## Simulation 50 out of 99 
## Simulation 51 out of 99 
## Simulation 52 out of 99 
## Simulation 53 out of 99 
## Simulation 54 out of 99 
## Simulation 55 out of 99 
## Simulation 56 out of 99 
## Simulation 57 out of 99 
## Simulation 58 out of 99 
## Simulation 59 out of 99 
## Simulation 60 out of 99 
## Simulation 61 out of 99 
## Simulation 62 out of 99 
## Simulation 63 out of 99 
## Simulation 64 out of 99 
## Simulation 65 out of 99 
## Simulation 66 out of 99 
## Simulation 67 out of 99 
## Simulation 68 out of 99 
## Simulation 69 out of 99 
## Simulation 70 out of 99 
## Simulation 71 out of 99 
## Simulation 72 out of 99 
## Simulation 73 out of 99 
## Simulation 74 out of 99 
## Simulation 75 out of 99 
## Simulation 76 out of 99 
## Simulation 77 out of 99 
## Simulation 78 out of 99 
## Simulation 79 out of 99 
## Simulation 80 out of 99 
## Simulation 81 out of 99 
## Simulation 82 out of 99 
## Simulation 83 out of 99 
## Simulation 84 out of 99 
## Simulation 85 out of 99 
## Simulation 86 out of 99 
## Simulation 87 out of 99 
## Simulation 88 out of 99 
## Simulation 89 out of 99 
## Simulation 90 out of 99 
## Simulation 91 out of 99 
## Simulation 92 out of 99 
## Simulation 93 out of 99 
## Simulation 94 out of 99 
## Simulation 95 out of 99 
## Simulation 96 out of 99 
## Simulation 97 out of 99 
## Simulation 98 out of 99 
## Simulation 99 out of 99

Retirando a variavel não significativa Precip

mod.3<- glm.nb(IRA ~ Temp + Umid , data = dados)

summary(mod.3)

## 
## Call:
## glm.nb(formula = IRA ~ Temp + Umid, data = dados, init.theta = 4.428413664, 
##     link = log)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.26621  -0.86707  -0.00378   0.61908   1.79729  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.371591   0.559763  16.742  < 2e-16 ***
## Temp        -0.082011   0.021540  -3.807  0.00014 ***
## Umid        -0.016606   0.002771  -5.993 2.06e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.4284) family taken to be 1)
## 
##     Null deviance: 254.44  on 191  degrees of freedom
## Residual deviance: 199.26  on 189  degrees of freedom
## AIC: 2555.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.428 
##           Std. Err.:  0.442 
## 
##  2 x log-likelihood:  -2547.577

1-pchisq(199.26/4.428,189)

## [1] 1

set.seed(10)
hnp(mod.3, print=TRUE,verb=TRUE,xlab="Quantis teóricos")

## Negative binomial model (using MASS package) 
## Simulation 1 out of 99 
## Simulation 2 out of 99 
## Simulation 3 out of 99 
## Simulation 4 out of 99 
## Simulation 5 out of 99 
## Simulation 6 out of 99 
## Simulation 7 out of 99 
## Simulation 8 out of 99 
## Simulation 9 out of 99 
## Simulation 10 out of 99 
## Simulation 11 out of 99 
## Simulation 12 out of 99 
## Simulation 13 out of 99 
## Simulation 14 out of 99 
## Simulation 15 out of 99 
## Simulation 16 out of 99 
## Simulation 17 out of 99 
## Simulation 18 out of 99 
## Simulation 19 out of 99 
## Simulation 20 out of 99 
## Simulation 21 out of 99 
## Simulation 22 out of 99 
## Simulation 23 out of 99 
## Simulation 24 out of 99 
## Simulation 25 out of 99 
## Simulation 26 out of 99 
## Simulation 27 out of 99 
## Simulation 28 out of 99 
## Simulation 29 out of 99 
## Simulation 30 out of 99 
## Simulation 31 out of 99 
## Simulation 32 out of 99 
## Simulation 33 out of 99 
## Simulation 34 out of 99 
## Simulation 35 out of 99 
## Simulation 36 out of 99 
## Simulation 37 out of 99 
## Simulation 38 out of 99 
## Simulation 39 out of 99 
## Simulation 40 out of 99 
## Simulation 41 out of 99 
## Simulation 42 out of 99 
## Simulation 43 out of 99 
## Simulation 44 out of 99 
## Simulation 45 out of 99 
## Simulation 46 out of 99 
## Simulation 47 out of 99 
## Simulation 48 out of 99 
## Simulation 49 out of 99 
## Simulation 50 out of 99 
## Simulation 51 out of 99 
## Simulation 52 out of 99 
## Simulation 53 out of 99 
## Simulation 54 out of 99 
## Simulation 55 out of 99 
## Simulation 56 out of 99 
## Simulation 57 out of 99 
## Simulation 58 out of 99 
## Simulation 59 out of 99 
## Simulation 60 out of 99 
## Simulation 61 out of 99 
## Simulation 62 out of 99 
## Simulation 63 out of 99 
## Simulation 64 out of 99 
## Simulation 65 out of 99 
## Simulation 66 out of 99 
## Simulation 67 out of 99 
## Simulation 68 out of 99 
## Simulation 69 out of 99 
## Simulation 70 out of 99 
## Simulation 71 out of 99 
## Simulation 72 out of 99 
## Simulation 73 out of 99 
## Simulation 74 out of 99 
## Simulation 75 out of 99 
## Simulation 76 out of 99 
## Simulation 77 out of 99 
## Simulation 78 out of 99 
## Simulation 79 out of 99 
## Simulation 80 out of 99 
## Simulation 81 out of 99 
## Simulation 82 out of 99 
## Simulation 83 out of 99 
## Simulation 84 out of 99 
## Simulation 85 out of 99 
## Simulation 86 out of 99 
## Simulation 87 out of 99 
## Simulation 88 out of 99 
## Simulation 89 out of 99 
## Simulation 90 out of 99 
## Simulation 91 out of 99 
## Simulation 92 out of 99 
## Simulation 93 out of 99 
## Simulation 94 out of 99 
## Simulation 95 out of 99 
## Simulation 96 out of 99 
## Simulation 97 out of 99 
## Simulation 98 out of 99 
## Simulation 99 out of 99

influenceIndexPlot(mod.3, vars = c('Studentized','Cook','Hat'), id.n=10)

Gerando um outro modelo sem os pontos (29,141,153,175,187).

mod.4<-glm.nb(IRA~Temp + Umid, data = dados[-c(29,141,153,175,187),])
summary(mod.4)

## 
## Call:
## glm.nb(formula = IRA ~ Temp + Umid, data = dados[-c(29, 141, 
##     153, 175, 187), ], init.theta = 4.681453477, link = log)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.17624  -0.86633  -0.02689   0.64024   1.83226  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.339802   0.560124  16.675  < 2e-16 ***
## Temp        -0.076033   0.022114  -3.438 0.000585 ***
## Umid        -0.018048   0.002801  -6.444 1.16e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.6815) family taken to be 1)
## 
##     Null deviance: 255.53  on 186  degrees of freedom
## Residual deviance: 193.75  on 184  degrees of freedom
## AIC: 2481.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.681 
##           Std. Err.:  0.475 
## 
##  2 x log-likelihood:  -2473.856

summary(mod.3)

## 
## Call:
## glm.nb(formula = IRA ~ Temp + Umid, data = dados, init.theta = 4.428413664, 
##     link = log)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.26621  -0.86707  -0.00378   0.61908   1.79729  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.371591   0.559763  16.742  < 2e-16 ***
## Temp        -0.082011   0.021540  -3.807  0.00014 ***
## Umid        -0.016606   0.002771  -5.993 2.06e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.4284) family taken to be 1)
## 
##     Null deviance: 254.44  on 191  degrees of freedom
## Residual deviance: 199.26  on 189  degrees of freedom
## AIC: 2555.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.428 
##           Std. Err.:  0.442 
## 
##  2 x log-likelihood:  -2547.577

difinter<- 9.371591-9.339802
porcent<- difinter*100/9.371591
porcent

## [1] 0.339206

diftemp<- 0.082011-0.076033
porcent2<- diftemp*100/0.082011
porcent2

## [1] 7.289266

difumid<- (0.016606-0.018048)*-1
porcent3<- difumid*100/0.0182615
porcent3

## [1] 7.896394

Como houve uma variação maior que 5% em pelo menos uma das variáveis do modelo (Temp e Umid) não se pode reitrar estes pontos (29,141,153,175,187) do MLG.

Gráfico de Series Temporais

par(mar = c(4,4,1,4))

plot(IRA, type = "l", col = "Green", lwd = 1,
     ylab= "Infecção respiratória aguda",
     bty = "n",
     xlab="Meses", axes = FALSE)
axis(1,c(0,12,24,36,48,60,72,84,96,108,120,132,144,156,168,180,192,204), las=2)
axis(2,c(0,100,200,300,400,500,600,700,800,900,1000))
par(new=T)
#axis(2,col="black", col.axis = "black")
plot(Precip, type='l', frame=T, ann=F, axes = F, lty=2, lwd=1, bty="n", col = "Black")
axis(4,lty=2, col.axis = "black", c(0,100,200,300,400,500,600))
#axis(1)
mtext("Precipitação Pluviométrica média (mm)",side=4 , line = 3)
title(main="IRA X Precip")

par(mar = c(4,4,1,4))

plot(IRA, type = "l", col = "Green", lwd = 1,
     ylab= "Infecção respiratória aguda",
     bty = "n",
     xlab="Meses", axes = FALSE)
axis(1,c(0,12,24,36,48,60,72,84,96,108,120,132,144,156,168,180,192), las = 2)
axis(2,c(0,100,200,300,400,500,600,700,800,900,1000))
par(new=T)
#axis(2,col="black", col.axis = "black")

plot(Temp, type='l', frame=T, ann=F, axes = F, lty=2, lwd=1, bty="n", col = "Black")
axis(4,lty=2, col.axis = "black", c(20,21,22,23,24,25,26,27,28,29,30))
#axis(1)
mtext("Temperatura média (ºC)",side=4 , line = 3)
title(main="IRA X Temp ")

par(mar = c(4,4,1,4))

plot(IRA, type = "l", col = "Green", lwd = 1,
     ylab= "Infecção respiratória aguda",
     bty = "n",
     xlab="Meses", axes = FALSE)
axis(1,c(0,12,24,36,48,60,72,84,96,108,120,132,144,156,168,180,192), las = 2)
axis(2,c(0,100,200,300,400,500,600,700,800,900,1000))
par(new=T)
#axis(2,col="black", col.axis = "black")

plot(Umid, type='l', frame=T, ann=F, axes = F, lty=2, lwd=1, bty="n", col = "Black")
axis(4,lty=2, col.axis = "black", c(40,50,60,70,80,90,100))
#axis(1)
mtext("Umidade (%)",side=4 , line = 3)
title(main="IRA X Umid ")

Series Temporais

Foi ajusatado para os dados uma serie com transformação logaritima para suavizar a variabilidade da serie, mas para a variavel precip quando foi aplicado a transformação logaritima, retornou valores infinitos, por esse motivo não se aplicou a transformação logaritima para a variavel precip.

Primeiro temos que usar o comando ts para que o R reconheça as variavéis como series temporais.

dadosst<-dados[,-c(1)]

st1 <- ts(IRA,start = 1999, frequency=12)
st2 <- ts(Precip,start = 1999, frequency=12)
st3 <- ts(Temp,start = 1999, frequency=12)
st4 <- ts(Umid,start = 1999, frequency=12)

log.st1<- log(st1)
log.st3<- log(st3)
log.st4<- log(st4)

Se iniciam em janeiro de 1999 com frequência 12, ou seja, são dados mensais.

Em seguida será gerado o plot das variavéis.

par(mfrow=c(2,2))
plot(log.st1,type = "l", col='blue',main='IRA',xlab='Anos')
plot(st2,type = "l", col='Red',main='Precip',xlab='Anos')
plot(log.st3,type = "l", col='Yellow',main='Temp',xlab='Anos')
plot(log.st4,type = "l", col='Green',main='Umid',xlab='Anos')

par(mfrow=c(2,2))
histPlot(as.timeSeries(st1))
histPlot(as.timeSeries(st2))
histPlot(as.timeSeries(st3))
histPlot(as.timeSeries(st4))

Verificando a Estacionariedade das series.

adf.test(log.st1)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  log.st1
## Dickey-Fuller = -4.8978, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

adf.test(st2)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  st2
## Dickey-Fuller = -10.955, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

adf.test(log.st3)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  log.st3
## Dickey-Fuller = -7.2784, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

adf.test(log.st4)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  log.st4
## Dickey-Fuller = -10.102, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

Usando adf.test, ele gera um valor p muito pequeno, por essa razão o software no retorna o valor p de 0.01 que significa que os dados são estacionários.

Gráfico da decomposição das series

a<-decompose(log.st1, type=c("additive", "multiplicative"), filter=NULL)
plot(a)

b<-decompose(st2, type=c("additive", "multiplicative"), filter=NULL)
plot(b)

c<-decompose(log.st3, type=c("additive", "multiplicative"), filter=NULL)
plot(c)

d<-decompose(log.st4, type=c("additive", "multiplicative"), filter=NULL)
plot(d)

Na figura acima observa-se que todas as quatros séries aparentemente apresemtam sazonalidade e a variavel IRA apresenta uma tendência é decrescente.

Correlograma

par(mfrow=c(2,2))
acf(log.st1,lag.max=150, main="Correlograma da serie IRA")
acf(st2,lag.max=150, main="Correlograma da serie Precip")
acf(log.st3,lag.max=150, main="Correlograma da serie Temp")
acf(log.st4,lag.max=150, main="Correlograma da serie Umid")

A acf amostral comprova que as séries são não-estacionária, pois apresentam um decaimento pra zero lento.

Correlograma Parcial

par(mfrow=c(2,2))
pacf(log.st1,lag.max=100, main = "Correlograma Parcial IRA")
pacf(st2,lag.max=100, main = "Correlograma Parcial Precip")
pacf(log.st3,lag.max=100, main = "Correlograma Parcial Temp")
pacf(log.st4,lag.max=100, main = "Correlograma Parcial Umid")

Os picos na PACF representam as fortes correlações entre os meses da série.

Modelagem e Previsão

Será modelada a série IRA (Infecção Respiratoria aguda) com valores que abrangem o período de Janeiro/1999 a Dezembro/2014. Consideraremos previsões de 1,3,6 e 12 passos à frente, a fim de avaliar as capacidades preditivas dos diferentes métodos em diferentes horizontes de previsão.

Usaremos inicialmente os algoritmos de suavização aditivo e multiplicativo de Holt-Winters para produzir previsões das variavéis.

Dado a ser predito

observado.st1<-log.st1[length(log.st1)]
observado.st2<-st2[length(st2)]
observado.st3<-log.st3[length(log.st3)]
observado.st4<-log.st4[length(log.st4)]

Vamos realizar agora a previsão a 6 passos a frente e a 12 passos a frente (sem as 6 e as 12 últimas observações)

####IRA

## Previsão 6 passo à frente (Sem última observação)
IRA6p <- log.st1[-((length(log.st1)-5):length(log.st1))]
st16p <- ts(IRA6p, start = c(1999,7), frequency = 12)

## Previsão 12 passo à frente (Sem última observação)
IRA12p <- log.st1[-((length(log.st1)-11):length(log.st1))]
st112p <- ts(IRA12p, start = c(1994,7), frequency = 12)


####Precipitação

## Previsão 6 passo à frente (Sem última observação)
Precip6p <- st2[-((length(st2)-5):length(st2))]
st26p <- ts(Precip6p, start = c(1999,7), frequency = 12)

## Previsão 12 passo à frente (Sem última observação)
Precip12p <- st2[-((length(st2)-11):length(st2))]
st212p <- ts(Precip12p, start = c(1994,7), frequency = 12)


####Temperatura

## Previsão 6 passo à frente (Sem última observação)
Temp6p <- log.st3[-((length(log.st3)-5):length(log.st3))]
st36p <- ts(Temp6p, start = c(1999,7), frequency = 12)

## Previsão 12 passo à frente (Sem última observação)
Precip12p <- log.st3[-((length(log.st3)-11):length(log.st3))]
st312p <- ts(Precip12p, start = c(1994,7), frequency = 12)


####Umidade
Umid6p <- log.st4[-((length(log.st4)-5):length(log.st4))]
st46p <- ts(Umid6p, start = c(1999,7), frequency = 12)

## Previsão 12 passo à frente (Sem última observação)
Umid12p <- log.st4[-((length(log.st4)-11):length(log.st4))]
st412p <- ts(Umid12p, start = c(1994,7), frequency = 12)

Holt-Winters Aditivo

hwa6p.st1 <- HoltWinters(st16p, seasonal = "additive")
hwa12p.st1 <- HoltWinters(st112p, seasonal = "additive")


hwa6p.st2<-HoltWinters(st26p, seasonal="additive")
hwa12p.st2<-HoltWinters(st212p, seasonal="additive")

hwa6p.st3<-HoltWinters(st36p, seasonal="additive")
hwa12p.st3<-HoltWinters(st312p, seasonal="additive")

hwa6p.st4<-HoltWinters(st46p, seasonal="additive")
hwa12p.st4<-HoltWinters(st412p, seasonal="additive")

seguindo com as previsões para as 4 séries em 6 passos a frente:

prev.hwa6p.st1<-predict(hwa6p.st1,6)[6]
prev.hwa6p.st2<-predict(hwa6p.st2,6)[6]
prev.hwa6p.st3<-predict(hwa6p.st3,6)[6]
prev.hwa6p.st4<-predict(hwa6p.st4,6)[6]

e o erro relativo de previsão:

(observado.st1-prev.hwa6p.st1)/observado.st1*100

## [1] 5.145289

(observado.st2-prev.hwa6p.st2)/observado.st2*100

## [1] 32.90784

(observado.st3-prev.hwa6p.st3)/observado.st3*100

## [1] -0.8407157

(observado.st4-prev.hwa6p.st4)/observado.st4*100

## [1] 0.6260008

Agora, as previsões para as 4 séries em 12 passos a frente:

prev.hwa12p.st1<-predict(hwa12p.st1,12)[12]
prev.hwa12p.st2<-predict(hwa12p.st2,12)[12]
prev.hwa12p.st3<-predict(hwa12p.st3,12)[12]
prev.hwa12p.st4<-predict(hwa12p.st4,12)[12]

Finalmente, o erro relativo de previsão para 12 passos a frente:

(observado.st1-prev.hwa12p.st1)/observado.st1*100

## [1] -1.786785

(observado.st2-prev.hwa12p.st2)/observado.st2*100

## [1] 24.49074

(observado.st3-prev.hwa12p.st3)/observado.st3*100

## [1] -0.8249554

(observado.st4-prev.hwa12p.st4)/observado.st4*100

## [1] -0.2383521

O erro relativo de previsão para a variavel precip foi muito elevado a 6 passos a frente em relação as demais, as outras tiveram um pequeno erro relativo de previsão.

Gráfico para verificar as previsões 6 passos a frente nas 4 séries:

par(mfrow = c(2,2))
plot(hwa6pp.ira,xlab="Tempo",ylab="Percentual",main="IRA")
plot(hwa6pp.precip,xlab="Tempo",ylab="Percentual",main="Precip")
plot(hwa6pp.umid,xlab="Tempo",ylab="Percentual",main="Umid")
plot(hwa6pp.temp,xlab="Tempo",ylab="Percentual",main="Temp")

O gráfico abaixo compara os valores ajustados e observados, ao qual a linha vermelha é o valor ajustado e a linha preta é o valor observado.

par(mfrow = c(2,2))
plot(hwa6p.st1,xlab="Tempo",ylab="Observado/Ajustado",main="IRA")
plot(hwa6p.st2,xlab="Tempo",ylab="Observado/Ajustado",main="Precip")
plot(hwa6p.st3,xlab="Tempo",ylab="Observado/Ajustado",main="Umid")
plot(hwa6p.st4,xlab="Tempo",ylab="Observado/Ajustado",main="Temp")

Interpretação: De acordo com o gráfico acima, o modelo não ajustou bem o início das séries acima. Por sua vez, abaixo apresenta os gráficos para 12 passos a frente.

par(mfrow = c(2,2))
plot(hwa12pp.ira,xlab="Tempo",ylab="Percentual",main="IRA")
plot(hwa12pp.precip,xlab="Tempo",ylab="Percentual",main="Precip")
plot(hwa12pp.umid,xlab="Tempo",ylab="Percentual",main="Umid")
plot(hwa12pp.temp,xlab="Tempo",ylab="Percentual",main="Temp")

É necessário realizar o gráfico dos valores ajustados versus observado, para 12 passos a frente, como foi feito para 6 passos:

par(mfrow = c(2,2))
plot(hwa12p.st1,xlab="Tempo",ylab="Observado/Ajustado",main="IRA")
plot(hwa12p.st2,xlab="Tempo",ylab="Observado/Ajustado",main="Precip")
plot(hwa12p.st3,xlab="Tempo",ylab="Observado/Ajustado",main="Umid")
plot(hwa12p.st4,xlab="Tempo",ylab="Observado/Ajustado",main="Temp")

Atividade 3

Mateus

19 de março de 2019