The dataset contains five variables: Mes.Ano (month/year), IRA (number of people with acute respiratory infection), Precip (mean monthly precipitation, in mm), Temp (mean monthly temperature, in °C), and Umid (mean monthly humidity index, in %). The data were collected from January 1999 through December 2014, with the goal of determining whether IRA is related to the other variables.
dados<-read.table("C://Users//USER//Downloads//Mt3.txt", header = T)
head(dados)
## Mes.Ano IRA Precip Temp Umid
## 1 jan/99 170 413.8 25.39 86.15
## 2 fev/99 134 81.0 25.96 84.09
## 3 mar/99 241 86.4 25.49 88.12
## 4 abr/99 293 26.5 24.61 75.22
## 5 mai/99 402 3.9 22.17 69.27
## 6 jun/99 324 1.4 22.76 68.59
attach(dados)
require(fBasics)
## Loading required package: fBasics
## Loading required package: timeDate
## Loading required package: timeSeries
library(corrplot)
## corrplot 0.84 loaded
library(nortest)
require(ExpDes)
## Loading required package: ExpDes
##
## Attaching package: 'ExpDes'
## The following object is masked from 'package:stats':
##
## ccf
require(car)
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:fBasics':
##
## densityPlot
require(psych)
## Loading required package: psych
##
## Attaching package: 'psych'
## The following object is masked from 'package:car':
##
## logit
## The following object is masked from 'package:fBasics':
##
## tr
## The following object is masked from 'package:timeSeries':
##
## outlier
require(hnp)
## Loading required package: hnp
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:ExpDes':
##
## ginv
Dropping the column that holds the time index:
dadost<-dados[,c(2,3,4,5)]
describe(dadost)
## vars n mean sd median trimmed mad min max range
## IRA 1 192 431.05 215.66 426.50 421.09 263.90 103 904.00 801.00
## Precip 2 192 104.43 108.59 68.65 89.89 100.22 0 572.20 572.20
## Temp 3 192 25.17 1.61 25.59 25.25 1.22 21 28.74 7.74
## Umid 4 192 76.73 12.55 80.73 77.93 11.64 42 95.29 53.29
## skew kurtosis se
## IRA 0.24 -0.98 15.56
## Precip 1.10 0.95 7.84
## Temp -0.52 -0.40 0.12
## Umid -0.74 -0.44 0.91
summary(dadost)
## IRA Precip Temp Umid
## Min. :103.0 Min. : 0.00 Min. :21.00 Min. :42.00
## 1st Qu.:229.5 1st Qu.: 10.93 1st Qu.:24.20 1st Qu.:68.51
## Median :426.5 Median : 68.65 Median :25.59 Median :80.73
## Mean :431.1 Mean :104.43 Mean :25.17 Mean :76.73
## 3rd Qu.:591.8 3rd Qu.:168.18 3rd Qu.:26.21 3rd Qu.:86.94
## Max. :904.0 Max. :572.20 Max. :28.74 Max. :95.29
The output above gives a brief descriptive analysis of the data: means, standard deviations, medians, maxima and minima, as well as skewness and kurtosis. The statistics that stand out are the kurtosis values, the standard deviations, and the ranges of the variables IRA and Precip.
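One way to put the dispersion of IRA and Precip on a common scale, not part of the original analysis, is the coefficient of variation (sd/mean); a minimal sketch:
# CV per column of dadost; Precip stands out, with a standard deviation
# larger than its mean (CV above 1)
cv <- sapply(dadost, function(x) sd(x) / mean(x))
round(cv, 2)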
par(mfrow=c(2,2))
boxplot(dadost$IRA, las= 2)
boxplot(dadost$Temp, las= 2)
boxplot(dadost$Precip, las= 2)
boxplot(dadost$Umid, las= 2)
Temp and Precip were the only variables whose boxplots show outlying points.
par(mfrow=c(2,2))
histPlot(as.timeSeries(IRA))
histPlot(as.timeSeries(Precip))
histPlot(as.timeSeries(Temp))
histPlot(as.timeSeries(Umid))
par(mfrow=c(2,2))
densityPlot(as.timeSeries(IRA))
densityPlot(as.timeSeries(Precip))
densityPlot(as.timeSeries(Temp))
densityPlot(as.timeSeries(Umid))
Based on the histograms and density plots, none of the four variables appears to follow a normal distribution. Next, a normality test is applied to each variable, together with its QQ plot.
par(mfrow=c(2,2))
qqPlot(IRA)
## [1] 19 45
qqPlot(Temp)
## [1] 19 141
qqPlot(Precip)
qqPlot(Umid)
## [1] 80 8
ad.test(IRA)
##
## Anderson-Darling normality test
##
## data: IRA
## A = 1.9645, p-value = 5.049e-05
ad.test(Precip)
##
## Anderson-Darling normality test
##
## data: Precip
## A = 8.0584, p-value < 2.2e-16
ad.test(Temp)
##
## Anderson-Darling normality test
##
## data: Temp
## A = 4.3088, p-value = 9.714e-11
ad.test(Umid)
##
## Anderson-Darling normality test
##
## data: Umid
## A = 4.8197, p-value = 5.67e-12
The QQ plots, together with the Anderson-Darling tests, lead to the conclusion that none of the four variables is normally distributed.
cor.test(IRA,Precip, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: IRA and Precip
## z = -4.5062, p-value = 6.599e-06
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.2209575
cor.test(IRA,Temp, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: IRA and Temp
## z = -3.9666, p-value = 7.29e-05
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.19303
cor.test(IRA,Umid, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: IRA and Umid
## z = -6.7018, p-value = 2.059e-11
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.3257263
Kendall's rank correlation test shows that all three variables (Precip, Temp, and Umid) have a statistically significant, though weak, negative correlation with IRA (all p-values are far below 0.05).
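Note that the corrplot below uses Pearson correlations by default; for reference, the full Kendall matrix can be obtained in a single call (a sketch on the dadost object from above):
# pairwise Kendall tau for all four variables
round(cor(dadost, method = "kendall"), 3)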
corrplot(cor(dadost), order = "hclust",tl.col = 'black', tl.cex = 0.75)
fit<- glm(IRA~ Precip + Temp + Umid , family = poisson(link = "log"))
summary(fit)
##
## Call:
## glm(formula = IRA ~ Precip + Temp + Umid, family = poisson(link = "log"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -20.5705 -8.1575 0.1349 6.8616 17.9029
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.346e+00 6.158e-02 151.779 < 2e-16 ***
## Precip 2.348e-04 4.671e-05 5.026 5.01e-07 ***
## Temp -8.277e-02 2.208e-03 -37.479 < 2e-16 ***
## Umid -1.631e-02 3.426e-04 -47.611 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 21614 on 191 degrees of freedom
## Residual deviance: 16300 on 188 degrees of freedom
## AIC: 17796
##
## Number of Fisher Scoring iterations: 4
1-pchisq(16300,188)
## [1] 0
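Before turning to the simulated envelope, a quick informal overdispersion check is the ratio of residual deviance to its degrees of freedom; a sketch, not part of the original output:
# values far above 1 suggest overdispersion; here 16300/188 is about 86.7
deviance(fit) / df.residual(fit)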
hnp(fit, print = TRUE, verb = TRUE, xlab = "Theoretical quantiles")
## Poisson model
## Simulation 1 out of 99
## ...
## Simulation 99 out of 99
The simulated envelope, using the Poisson distribution for the response variable, shows that this generalized linear model did not fit the data well.
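One alternative for overdispersed counts, not pursued here, is a quasi-Poisson fit (a sketch; fit_qp is a name introduced for illustration):
# same log link, but the dispersion parameter is estimated instead of fixed at 1
fit_qp <- glm(IRA ~ Precip + Temp + Umid, family = quasipoisson(link = "log"))
summary(fit_qp)$dispersion
The analysis below instead adopts the negative binomial distribution.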
fit2<- glm.nb(IRA ~ Precip + Temp + Umid, data = dadost)
summary(fit2)
##
## Call:
## glm.nb(formula = IRA ~ Precip + Temp + Umid, data = dadost, init.theta = 4.439088697,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.25860 -0.90984 -0.00822 0.61173 1.82142
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.6310305 0.6520314 14.771 < 2e-16 ***
## Precip 0.0003234 0.0004354 0.743 0.457649
## Temp -0.0886262 0.0230982 -3.837 0.000125 ***
## Umid -0.0182615 0.0035407 -5.158 2.5e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.4391) family taken to be 1)
##
## Null deviance: 255.05 on 191 degrees of freedom
## Residual deviance: 199.24 on 188 degrees of freedom
## AIC: 2557.1
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.439
## Std. Err.: 0.443
##
## 2 x log-likelihood: -2547.086
1-pchisq(199.24/4.4391, 188)
## [1] 1
hnp(fit2, print = TRUE, verb = TRUE, xlab = "Theoretical quantiles", main = "Simulated envelope with Precip")
## Negative binomial model (using MASS package)
## Simulation 1 out of 99
## ...
## Simulation 99 out of 99
Using the negative binomial as the distribution of the response variable, the simulated envelope indicates a good GLM fit to the data under study. However, one covariate (Precip) was not significant. Refitting without Precip:
fit3<- glm.nb(IRA ~ Temp + Umid, data = dadost)
summary(fit3)
##
## Call:
## glm.nb(formula = IRA ~ Temp + Umid, data = dadost, init.theta = 4.428413664,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.26621 -0.86707 -0.00378 0.61908 1.79729
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.371591 0.559763 16.742 < 2e-16 ***
## Temp -0.082011 0.021540 -3.807 0.00014 ***
## Umid -0.016606 0.002771 -5.993 2.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.4284) family taken to be 1)
##
## Null deviance: 254.44 on 191 degrees of freedom
## Residual deviance: 199.26 on 189 degrees of freedom
## AIC: 2555.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.428
## Std. Err.: 0.442
##
## 2 x log-likelihood: -2547.577
1-pchisq(199.26/4.4284, 189)
## [1] 1
hnp(fit3, print = TRUE, verb = TRUE, xlab = "Theoretical quantiles", main = "Simulated envelope without Precip")
## Negative binomial model (using MASS package)
## Simulation 1 out of 99
## ...
## Simulation 99 out of 99
With Precip in the GLM, a larger number of points fell outside the simulated envelope, so the model that best fits the data is the one without Precip.
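The reported AICs point the same way (2557.1 with Precip, 2555.6 without); a one-line comparison, assuming the fit2 and fit3 objects from above:
# lower AIC favours the model without Precip
AIC(fit2, fit3)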
influenceIndexPlot(fit3, vars = c('Studentized','Cook','Hat'), id.n=10)
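The most influential observations can also be read off numerically; a minimal sketch using Cook's distance on fit3:
# five largest Cook's distances, labelled by row index
head(sort(cooks.distance(fit3), decreasing = TRUE), 5)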
Removing the observations at positions c(29,141,153,175,187) and fitting a new model…
fit4<-glm.nb(IRA~Temp + Umid, data = dadost[-c(29,141,153,175,187),])
summary(fit3)
##
## Call:
## glm.nb(formula = IRA ~ Temp + Umid, data = dadost, init.theta = 4.428413664,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.26621 -0.86707 -0.00378 0.61908 1.79729
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.371591 0.559763 16.742 < 2e-16 ***
## Temp -0.082011 0.021540 -3.807 0.00014 ***
## Umid -0.016606 0.002771 -5.993 2.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.4284) family taken to be 1)
##
## Null deviance: 254.44 on 191 degrees of freedom
## Residual deviance: 199.26 on 189 degrees of freedom
## AIC: 2555.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.428
## Std. Err.: 0.442
##
## 2 x log-likelihood: -2547.577
summary(fit4)
##
## Call:
## glm.nb(formula = IRA ~ Temp + Umid, data = dadost[-c(29, 141,
## 153, 175, 187), ], init.theta = 4.681453477, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.17624 -0.86633 -0.02689 0.64024 1.83226
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.339802 0.560124 16.675 < 2e-16 ***
## Temp -0.076033 0.022114 -3.438 0.000585 ***
## Umid -0.018048 0.002801 -6.444 1.16e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.6815) family taken to be 1)
##
## Null deviance: 255.53 on 186 degrees of freedom
## Residual deviance: 193.75 on 184 degrees of freedom
## AIC: 2481.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.681
## Std. Err.: 0.475
##
## 2 x log-likelihood: -2473.856
# relative change (%) in the intercept after removing the flagged rows
difinter<- 9.371591-9.339802
porcent<- difinter*100/9.371591
porcent
## [1] 0.339206
# relative change (%) in the Temp coefficient
diftemp<- 0.082011-0.076033
porcent2<- diftemp*100/0.082011
porcent2
## [1] 7.289266
# relative change (%) in the Umid coefficient
difumid<- (0.016606-0.018048)*-1
porcent3<- difumid*100/0.016606
porcent3
## [1] 8.683608
Since at least one coefficient in the model changed by more than 5% (here, Temp and Umid), the observations at positions c(29,141,153,175,187) cannot be removed from the GLM!
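The same check can be done for all coefficients at once; a sketch assuming the fit3 and fit4 objects from above:
# percentage change of each coefficient after dropping the flagged rows
round(abs(coef(fit4) - coef(fit3)) / abs(coef(fit3)) * 100, 2)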
par(mar = c(4,4,1,4))
plot(IRA, type = "l", col = "red", lwd = 1,
     ylab = "Acute respiratory infections",
     bty = "n",
     xlab = "Months", axes = FALSE)
axis(1, seq(0, 192, by = 12), las = 2)
axis(2, seq(0, 1000, by = 100))
par(new = TRUE)
plot(Precip, type = 'l', frame = TRUE, ann = FALSE, axes = FALSE, lty = 2, lwd = 1, bty = "n", col = "blue")
axis(4, lty = 2, col.axis = "black", seq(0, 600, by = 100))
mtext("Mean precipitation (mm)", side = 4, line = 3)
title(main = "IRA vs Precip")
par(mar = c(4,4,1,4))
plot(IRA, type = "l", col = "red", lwd = 1,
     ylab = "Acute respiratory infections",
     bty = "n",
     xlab = "Months", axes = FALSE)
axis(1, seq(0, 192, by = 12), las = 2)
axis(2, seq(0, 1000, by = 100))
par(new = TRUE)
plot(Temp, type = 'l', frame = TRUE, ann = FALSE, axes = FALSE, lty = 2, lwd = 1, bty = "n", col = "blue")
axis(4, lty = 2, col.axis = "black", seq(20, 30, by = 1))
mtext("Mean temperature (°C)", side = 4, line = 3)
title(main = "IRA vs Temp")
par(mar = c(4,4,1,4))
plot(IRA, type = "l", col = "red", lwd = 1,
     ylab = "Acute respiratory infections",
     bty = "n",
     xlab = "Months", axes = FALSE)
axis(1, seq(0, 192, by = 12), las = 2)
axis(2, seq(0, 1000, by = 100))
par(new = TRUE)
plot(Umid, type = 'l', frame = TRUE, ann = FALSE, axes = FALSE, lty = 2, lwd = 1, bty = "n", col = "blue")
axis(4, lty = 2, col.axis = "black", seq(40, 100, by = 10))
mtext("Humidity (%)", side = 4, line = 3)
title(main = "IRA vs Umid")
The paired time-series plots show that all the variables are seasonal. IRA, however, trends downward over the months, whereas Precip, Temp, and Umid fluctuate around their means, with variance that changes over time.
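The three dual-axis blocks above share the same structure; a compact helper, hypothetical and not part of the original script, would avoid the repetition:
# plots IRA (left axis, red) against one covariate (right axis, blue, dashed)
plot_vs_ira <- function(y, ylab_right, main) {
  par(mar = c(4, 4, 1, 4))
  plot(IRA, type = "l", col = "red", ylab = "Acute respiratory infections",
       xlab = "Months", axes = FALSE, bty = "n")
  axis(1, seq(0, 192, by = 12), las = 2)
  axis(2, seq(0, 1000, by = 100))
  par(new = TRUE)
  plot(y, type = "l", lty = 2, col = "blue", ann = FALSE, axes = FALSE)
  axis(4, pretty(y))
  mtext(ylab_right, side = 4, line = 3)
  title(main = main)
}
plot_vs_ira(Precip, "Mean precipitation (mm)", "IRA vs Precip")  # Temp and Umid are analogous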
Required packages
## Loading required package: TSA
##
## Attaching package: 'TSA'
## The following objects are masked from 'package:timeDate':
##
## kurtosis, skewness
## The following objects are masked from 'package:stats':
##
## acf, arima
## The following object is masked from 'package:utils':
##
## tar
## Loading required package: tseries
## Loading required package: forecast
## Loading required package: urca
##
## Attaching package: 'urca'
## The following object is masked from 'package:ExpDes':
##
## plotres
## Loading required package: uroot
Formatting the data as time series
par(mfrow=c(2,2))
serie <- ts(IRA, start = c(1999,1), frequency = 12)
plot(serie, type = 'o', xlab = "Period", ylab = "IRA", main = "Number of people with acute respiratory infection")
serie2 <- ts(Precip, start = c(1999,1), frequency = 12)
plot(serie2, type = 'o', xlab = "Period", ylab = "Precipitation", main = "Mean monthly precipitation (mm)")
serie3 <- ts(Temp, start = c(1999,1), frequency = 12)
plot(serie3, type = 'o', xlab = "Period", ylab = "Temperature", main = "Mean monthly temperature (°C)")
serie4 <- ts(Umid, start = c(1999,1), frequency = 12)
plot(serie4, type = 'o', xlab = "Period", ylab = "Humidity", main = "Mean monthly humidity (%)")
As noted earlier, the IRA series shows a trend, while the other variables appear to have only seasonality; hence IRA does not look stationary, whereas the other three series appear to be stationary.
Autocorrelation and partial autocorrelation functions
#IRA
adf.test(serie)
##
## Augmented Dickey-Fuller Test
##
## data: serie
## Dickey-Fuller = -6.073, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
#Precip
adf.test(serie2)
##
## Augmented Dickey-Fuller Test
##
## data: serie2
## Dickey-Fuller = -4.3194, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#Temp
adf.test(serie3)
##
## Augmented Dickey-Fuller Test
##
## data: serie3
## Dickey-Fuller = -7.2596, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
#Umid
adf.test(serie4)
##
## Augmented Dickey-Fuller Test
##
## data: serie4
## Dickey-Fuller = -10.443, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The ADF tests indicate that all four series are stationary.
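A complementary check, not run in the original analysis, is the KPSS test from the already-loaded tseries package, whose null hypothesis is stationarity (the reverse of the ADF null); a sketch for the IRA series:
# rejecting here would contradict the ADF conclusion above
kpss.test(serie)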
par(mfrow=c(2,2))
hist(serie, prob = TRUE, xlab = "IRA", ylab = "Density", main = "", col = "pink")
hist(serie2, prob = TRUE, xlab = "Precip (mm)", ylab = "Density", main = "", col = "red")
hist(serie3, prob = TRUE, xlab = "Temp (°C)", ylab = "Density", main = "", col = "blue")
hist(serie4, prob = TRUE, xlab = "Umid (%)", ylab = "Density", main = "", col = "yellow")
dec.ira <- decompose(serie, type = "additive", filter = NULL)
plot(dec.ira)
dec.precip <- decompose(serie2, type = "additive", filter = NULL)
plot(dec.precip)
dec.temp <- decompose(serie3, type = "additive", filter = NULL)
plot(dec.temp)
dec.umid <- decompose(serie4, type = "additive", filter = NULL)
plot(dec.umid)
Each decomposition plot above shows the observed series together with its estimated trend, seasonal, and residual components.
par(mfrow=c(2,2))
acf(serie, main = "Correlogram, IRA", lag.max = 90)
acf(serie2, main = "Correlogram, Precip", lag.max = 90)
acf(serie3, main = "Correlogram, Temp", lag.max = 90)
acf(serie4, main = "Correlogram, Umid", lag.max = 90)
The ACFs suggest that, as suspected, IRA, Temp, and Umid are not stationary, given the slow decay of their autocorrelations, even though the ADF test indicates otherwise. Taking differences…
acf(diff(diff(serie)), main = "Correlogram of the 2nd difference, IRA", lag.max = 90)
acf(diff(diff(serie3)), main = "Correlogram of the 2nd difference, Temp", lag.max = 90)
acf(diff(diff(serie4)), main = "Correlogram of the 2nd difference, Umid", lag.max = 90)
The ACF of the second difference of each series (stationarity should still be checked with a formal test!) shows rapid decay. The spikes along the correlogram suggest strong seasonality in the series, together with stationarity.
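The formal check suggested above can be applied directly to the differenced series; a sketch for IRA:
# ADF test on the second difference
adf.test(diff(diff(serie)))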
par(mfrow=c(2,2))
#IRA
pacf(serie, lag.max = 90, main = "Partial correlogram, IRA")
#Precip
pacf(serie2, lag.max = 90, main = "Partial correlogram, Precip")
#Temp
pacf(serie3, lag.max = 90, main = "Partial correlogram, Temp")
#Umid
pacf(serie4, lag.max = 90, main = "Partial correlogram, Umid")
The spikes in the PACFs reflect the strong correlations between the months of each series.
SARIMA and Holt-Winters models will be used to build forecasting models for IRA, since it is the variable of interest in this study.
# Value to be predicted
observado <- serie[length(serie)]; observado
## [1] 118
# 1-step-ahead forecast (last observation held out)
custo1p <- serie[-length(serie)]
serie1p <- ts(custo1p, start = c(1999,1), frequency = 12)
# 3-steps-ahead forecast (last 3 observations held out)
custo3p <- serie[-((length(serie)-2):length(serie))]
serie3p <- ts(custo3p, start = c(1999,1), frequency = 12)
# 6-steps-ahead forecast (last 6 observations held out)
custo6p <- serie[-((length(serie)-5):length(serie))]
serie6p <- ts(custo6p, start = c(1999,1), frequency = 12)
# 12-steps-ahead forecast (last 12 observations held out)
custo12p <- serie[-((length(serie)-11):length(serie))]
serie12p <- ts(custo12p, start = c(1999,1), frequency = 12)
hwa1p <- HoltWinters(serie1p, seasonal = "additive")
hwa3p <- HoltWinters(serie3p, seasonal = "additive")
hwa6p <- HoltWinters(serie6p, seasonal = "additive")
hwa12p <- HoltWinters(serie12p, seasonal = "additive")
prev.hwa1p <- predict(hwa1p,1)[1]
prev.hwa3p <- predict(hwa3p,3)[3]
prev.hwa6p <- predict(hwa6p,6)[6]
prev.hwa12p <- predict(hwa12p,12)[12]
abs((observado-prev.hwa1p))/observado*100
## [1] 38.01108
abs((observado-prev.hwa3p))/observado*100
## [1] 78.50087
abs((observado-prev.hwa6p))/observado*100
## [1] 22.59027
abs((observado-prev.hwa12p))/observado*100
## [1] 13.1265
hwm1p <- HoltWinters(serie1p, seasonal = "multiplicative")
hwm3p <- HoltWinters(serie3p, seasonal = "multiplicative")
hwm6p <- HoltWinters(serie6p, seasonal = "multiplicative")
hwm12p <- HoltWinters(serie12p, seasonal = "multiplicative")
prev.hwm1p <- predict(hwm1p,1)[1]
prev.hwm3p <- predict(hwm3p,3)[3]
prev.hwm6p <- predict(hwm6p,6)[6]
prev.hwm12p <- predict(hwm12p,12)[12]
abs((observado-prev.hwm1p))/observado*100
## [1] 4.110066
abs((observado-prev.hwm3p))/observado*100
## [1] 30.09018
abs((observado-prev.hwm6p))/observado*100
## [1] 24.47744
abs((observado-prev.hwm12p))/observado*100
## [1] 7.617659
The relative forecast errors show that the multiplicative model came closest to the true values.
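For reference, the four multiplicative-model errors can be collected in a single step (a sketch; prevs is a name introduced here, using the prev.hwm* objects from above):
# relative errors (%) at 1, 3, 6 and 12 steps: 4.11, 30.09, 24.48 and 7.62
prevs <- c(prev.hwm1p, prev.hwm3p, prev.hwm6p, prev.hwm12p)
round(abs(observado - prevs) / observado * 100, 2)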
#Additive Holt-Winters
hwa1pp <- hw(serie1p)
hwa3pp <- hw(serie3p)
hwa6pp <- hw(serie6p)
hwa12pp <- hw(serie12p)
#Multiplicative Holt-Winters (damped)
hwm1pp <- hw(serie1p, damped = T, seasonal = "multiplicative")
hwm3pp <- hw(serie3p, damped = T, seasonal = "multiplicative")
hwm6pp <- hw(serie6p, damped = T, seasonal = "multiplicative")
hwm12pp <- hw(serie12p, damped = T, seasonal = "multiplicative")
plot(hwa12p, xlab = "Period", ylab = "Observed/Fitted", main = "Additive Holt-Winters")
plot(hwa12pp, xlab = "Period", ylab = "IRA", main = "Additive Holt-Winters forecast")
The additive Holt-Winters plot shows that the model fits the data well, although the prediction intervals are quite wide.
Plots for the multiplicative Holt-Winters model
plot(hwm12p, xlab = "Period", ylab = "Observed/Fitted", main = "Multiplicative Holt-Winters")
plot(hwm12pp, xlab = "Period", ylab = "IRA", main = "Multiplicative Holt-Winters forecast")
Both the additive and the multiplicative Holt-Winters models fit the data well, but the prediction intervals are quite wide for the additive model and noticeably narrower for the multiplicative one; the multiplicative model is therefore considered to give the better forecasts.
As was done with the Holt-Winters method, SARIMA models are now fitted to the IRA series.
#1p
auto.arima(serie1p)
## Series: serie1p
## ARIMA(2,1,2)(2,1,2)[12]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sar2 sma1 sma2
## -0.5704 -0.0609 0.2281 -0.5144 0.4090 -0.422 -1.0784 0.6042
## s.e. 0.1401 0.1350 0.1227 0.1162 0.2169 0.115 0.2158 0.1880
##
## sigma^2 estimated as 5015: log likelihood=-1012.87
## AIC=2043.74 AICc=2044.81 BIC=2072.38
#3p
auto.arima(serie3p)
## Series: serie3p
## ARIMA(1,1,1)(2,1,1)[12]
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1
## 0.2946 -0.7295 -0.1160 -0.2228 -0.5842
## s.e. 0.1164 0.0791 0.1348 0.1056 0.1328
##
## sigma^2 estimated as 5550: log likelihood=-1010.88
## AIC=2033.76 AICc=2034.26 BIC=2052.78
#6p
auto.arima(serie6p)
## Series: serie6p
## ARIMA(2,1,1)(2,1,1)[12]
##
## Coefficients:
## ar1 ar2 ma1 sar1 sar2 sma1
## -0.1693 -0.3334 -0.2006 -0.1534 -0.2483 -0.5322
## s.e. 0.3171 0.1198 0.3538 0.1314 0.1019 0.1310
##
## sigma^2 estimated as 5524: log likelihood=-992.53
## AIC=1999.05 AICc=1999.73 BIC=2021.13
#12p
auto.arima(serie12p)
## Series: serie12p
## ARIMA(3,1,1)(2,1,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 ma1 sar1 sar2 sma1
## 0.3696 -0.1567 0.1950 -0.7565 -0.1331 -0.2334 -0.5474
## s.e. 0.1678 0.0987 0.1059 0.1450 0.1373 0.1056 0.1367
##
## sigma^2 estimated as 5658: log likelihood=-959.63
## AIC=1935.26 AICc=1936.18 BIC=1960.21
The chosen models were SARIMA(2,1,2)(2,1,2)[12], SARIMA(1,1,1)(2,1,1)[12], SARIMA(2,1,1)(2,1,1)[12], and SARIMA(3,1,1)(2,1,1)[12] for the 1-, 3-, 6-, and 12-step series, respectively, using lowest AIC as the selection criterion.
# Selected model, 1 step ahead
arimafit1p <- arima(serie1p, order = c(2,1,2), seasonal = list(order=c(2,1,2)))
predict(arimafit1p, 1)$pred
## Dec
## 2014 120.0288
# Relative forecast error
abs((observado-predict(arimafit1p, 1)$pred))/observado*100
## Jun
## 2010 1.719353
arimafit3p <- arima(serie3p, order = c(1,1,1), seasonal = list(order=c(2,1,1)))
predict(arimafit3p, 3)$pred[3]
## [1] 33.10168
# Relative forecast error
abs((observado-predict(arimafit3p, 3)$pred[3]))/observado*100
## [1] 71.94773
arimafit6p <- arima(serie6p, order = c(2,1,1), seasonal = list(order=c(2,1,1)))
predict(arimafit6p, 6)$pred[6]
## [1] 42.66261
abs((observado-predict(arimafit6p, 6)$pred[6]))/observado*100
## [1] 63.84524
arimafit12p <- arima(serie12p, order = c(3,1,1), seasonal = list(order=c(2,1,1)))
predict(arimafit12p, 12)$pred[12]
## [1] 78.28494
abs((observado-predict(arimafit12p, 12)$pred[12]))/observado*100
## [1] 33.65683
Looking at the relative errors, the models with the smallest errors were those for 1 and 12 steps ahead.
serie.stl <- stl(serie, "periodic")
plot(serie.stl)
For the SARIMA models it is worth examining the diagnostic plots, in order to assess the independence, randomness, and normality of the residuals. One diagnostic plot was produced for each series (1p, 3p, 6p, and 12p).
tsdiag(arimafit1p)
tsdiag(arimafit3p)
tsdiag(arimafit6p)
tsdiag(arimafit12p)
From the plots, the residuals appear to be random, and the residual ACFs show no apparent autocorrelation in any of the four series. Plots alone, however, cannot confirm that the models are statistically adequate; formal tests of the residuals' independence, randomness, and normality are needed. These are, respectively, the Ljung-Box test (LB.test), the runs test (runs), and the Shapiro-Wilk test (shapiro.test).
LB.test(arimafit1p)
##
## Box-Ljung test
##
## data: residuals from arimafit1p
## X-squared = 5.8281, df = 4, p-value = 0.2124
shapiro.test(residuals(arimafit1p))
##
## Shapiro-Wilk normality test
##
## data: residuals(arimafit1p)
## W = 0.98132, p-value = 0.0119
runs(residuals(arimafit1p))
## $pvalue
## [1] 0.676
##
## $observed.runs
## [1] 93
##
## $expected.runs
## [1] 96.37173
##
## $n1
## [1] 92
##
## $n2
## [1] 99
##
## $k
## [1] 0
LB.test(arimafit3p)
##
## Box-Ljung test
##
## data: residuals from arimafit3p
## X-squared = 13.079, df = 7, p-value = 0.0702
shapiro.test(residuals(arimafit3p))
##
## Shapiro-Wilk normality test
##
## data: residuals(arimafit3p)
## W = 0.97388, p-value = 0.001318
runs(residuals(arimafit3p))
## $pvalue
## [1] 0.0802
##
## $observed.runs
## [1] 83
##
## $expected.runs
## [1] 95.47619
##
## $n1
## [1] 96
##
## $n2
## [1] 93
##
## $k
## [1] 0
LB.test(arimafit6p)
##
## Box-Ljung test
##
## data: residuals from arimafit6p
## X-squared = 15.021, df = 6, p-value = 0.02009
shapiro.test(residuals(arimafit6p))
##
## Shapiro-Wilk normality test
##
## data: residuals(arimafit6p)
## W = 0.98104, p-value = 0.01259
runs(residuals(arimafit6p))
## $pvalue
## [1] 0.342
##
## $observed.runs
## [1] 87
##
## $expected.runs
## [1] 93.95699
##
## $n1
## [1] 91
##
## $n2
## [1] 95
##
## $k
## [1] 0
LB.test(arimafit12p)
##
## Box-Ljung test
##
## data: residuals from arimafit12p
## X-squared = 11.215, df = 5, p-value = 0.04728
shapiro.test(residuals(arimafit12p))
##
## Shapiro-Wilk normality test
##
## data: residuals(arimafit12p)
## W = 0.97947, p-value = 0.009334
runs(residuals(arimafit12p))
## $pvalue
## [1] 0.0852
##
## $observed.runs
## [1] 79
##
## $expected.runs
## [1] 91
##
## $n1
## [1] 90
##
## $n2
## [1] 90
##
## $k
## [1] 0
At the 5% level, the Shapiro-Wilk test rejects normality of the residuals for all four models. The Ljung-Box test does not reject independence of the residuals for the 1- and 3-step models, but rejects it for the 6- and 12-step models. The runs test does not reject randomness for any of the four models.
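As a cross-check, base R's Box.test should reproduce the TSA::LB.test result above, assuming LB.test used a lag of 12 (the reported df of 4 matches 12 minus the 8 estimated parameters of arimafit1p); a sketch:
# fitdf adjusts the degrees of freedom for the 8 estimated (S)ARMA parameters
Box.test(residuals(arimafit1p), lag = 12, type = "Ljung-Box", fitdf = 8)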