The dataset contains five variables: Mes.Ano (month/year), IRA (number of people with acute respiratory infection), Precip (mean monthly precipitation, in mm), Temp (mean monthly temperature, in °C), and Umid (mean monthly humidity index, in %). The data were collected from January 1999 through December 2014, with the goal of determining whether IRA is related to the other variables.
dados<-read.table("C://Users//USER//Downloads//Mt3.txt", header = T)
head(dados)
## Mes.Ano IRA Precip Temp Umid
## 1 jan/99 170 413.8 25.39 86.15
## 2 fev/99 134 81.0 25.96 84.09
## 3 mar/99 241 86.4 25.49 88.12
## 4 abr/99 293 26.5 24.61 75.22
## 5 mai/99 402 3.9 22.17 69.27
## 6 jun/99 324 1.4 22.76 68.59
attach(dados)
require(fBasics)
## Loading required package: fBasics
## Loading required package: timeDate
## Loading required package: timeSeries
library(corrplot)
## corrplot 0.84 loaded
library(nortest)
require(ExpDes)
## Loading required package: ExpDes
##
## Attaching package: 'ExpDes'
## The following object is masked from 'package:stats':
##
## ccf
require(car)
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:fBasics':
##
## densityPlot
require(psych)
## Loading required package: psych
##
## Attaching package: 'psych'
## The following object is masked from 'package:car':
##
## logit
## The following object is masked from 'package:fBasics':
##
## tr
## The following object is masked from 'package:timeSeries':
##
## outlier
require(hnp)
## Loading required package: hnp
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:ExpDes':
##
## ginv
Dropping the column that holds the time index:
dadost<-dados[,c(2,3,4,5)]
describe(dadost)
## vars n mean sd median trimmed mad min max range
## IRA 1 192 431.05 215.66 426.50 421.09 263.90 103 904.00 801.00
## Precip 2 192 104.43 108.59 68.65 89.89 100.22 0 572.20 572.20
## Temp 3 192 25.17 1.61 25.59 25.25 1.22 21 28.74 7.74
## Umid 4 192 76.73 12.55 80.73 77.93 11.64 42 95.29 53.29
## skew kurtosis se
## IRA 0.24 -0.98 15.56
## Precip 1.10 0.95 7.84
## Temp -0.52 -0.40 0.12
## Umid -0.74 -0.44 0.91
summary(dadost)
## IRA Precip Temp Umid
## Min. :103.0 Min. : 0.00 Min. :21.00 Min. :42.00
## 1st Qu.:229.5 1st Qu.: 10.93 1st Qu.:24.20 1st Qu.:68.51
## Median :426.5 Median : 68.65 Median :25.59 Median :80.73
## Mean :431.1 Mean :104.43 Mean :25.17 Mean :76.73
## 3rd Qu.:591.8 3rd Qu.:168.18 3rd Qu.:26.21 3rd Qu.:86.94
## Max. :904.0 Max. :572.20 Max. :28.74 Max. :95.29
The output above gives a brief descriptive analysis of the data: means, standard deviations, medians, maxima and minima, as well as skewness and kurtosis. The statistics that stand out are the kurtosis values, the standard deviations, and the ranges of the variables IRA and Precip.
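One way to put the dispersion of IRA and Precip on a common scale, not part of the original analysis, is the coefficient of variation (sd/mean); a minimal sketch:
# CV per column of dadost; Precip stands out, with a standard deviation
# larger than its mean (CV above 1)
cv <- sapply(dadost, function(x) sd(x) / mean(x))
round(cv, 2)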
par(mfrow=c(2,2))
boxplot(dadost$IRA, las= 2)
boxplot(dadost$Temp, las= 2)
boxplot(dadost$Precip, las= 2)
boxplot(dadost$Umid, las= 2)
Temp and Precip were the only variables whose boxplots show outlying points.
par(mfrow=c(2,2))
histPlot(as.timeSeries(IRA))
histPlot(as.timeSeries(Precip))
histPlot(as.timeSeries(Temp))
histPlot(as.timeSeries(Umid))
par(mfrow=c(2,2))
densityPlot(as.timeSeries(IRA))
densityPlot(as.timeSeries(Precip))
densityPlot(as.timeSeries(Temp))
densityPlot(as.timeSeries(Umid))
Based on the histograms and density plots, none of the four variables appears to follow a normal distribution. Next, a normality test is applied to each variable, together with its QQ plot.
par(mfrow=c(2,2))
qqPlot(IRA)
## [1] 19 45
qqPlot(Temp)
## [1] 19 141
qqPlot(Precip)
qqPlot(Umid)
## [1] 80 8
ad.test(IRA)
##
## Anderson-Darling normality test
##
## data: IRA
## A = 1.9645, p-value = 5.049e-05
ad.test(Precip)
##
## Anderson-Darling normality test
##
## data: Precip
## A = 8.0584, p-value < 2.2e-16
ad.test(Temp)
##
## Anderson-Darling normality test
##
## data: Temp
## A = 4.3088, p-value = 9.714e-11
ad.test(Umid)
##
## Anderson-Darling normality test
##
## data: Umid
## A = 4.8197, p-value = 5.67e-12
The QQ plots, together with the Anderson-Darling tests, lead to the conclusion that none of the four variables is normally distributed.
cor.test(IRA,Precip, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: IRA and Precip
## z = -4.5062, p-value = 6.599e-06
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.2209575
cor.test(IRA,Temp, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: IRA and Temp
## z = -3.9666, p-value = 7.29e-05
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.19303
cor.test(IRA,Umid, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: IRA and Umid
## z = -6.7018, p-value = 2.059e-11
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.3257263
Kendall's rank correlation test shows that all three variables (Precip, Temp, and Umid) have a statistically significant, though weak, negative correlation with IRA (all p-values are far below 0.05).
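Note that the corrplot below uses Pearson correlations by default; for reference, the full Kendall matrix can be obtained in a single call (a sketch on the dadost object from above):
# pairwise Kendall tau for all four variables
round(cor(dadost, method = "kendall"), 3)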
corrplot(cor(dadost), order = "hclust",tl.col = 'black', tl.cex = 0.75)
fit<- glm(IRA~ Precip + Temp + Umid , family = poisson(link = "log"))
summary(fit)
##
## Call:
## glm(formula = IRA ~ Precip + Temp + Umid, family = poisson(link = "log"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -20.5705 -8.1575 0.1349 6.8616 17.9029
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.346e+00 6.158e-02 151.779 < 2e-16 ***
## Precip 2.348e-04 4.671e-05 5.026 5.01e-07 ***
## Temp -8.277e-02 2.208e-03 -37.479 < 2e-16 ***
## Umid -1.631e-02 3.426e-04 -47.611 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 21614 on 191 degrees of freedom
## Residual deviance: 16300 on 188 degrees of freedom
## AIC: 17796
##
## Number of Fisher Scoring iterations: 4
1-pchisq(16300,188)
## [1] 0
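Before turning to the simulated envelope, a quick informal overdispersion check is the ratio of residual deviance to its degrees of freedom; a sketch, not part of the original output:
# values far above 1 suggest overdispersion; here 16300/188 is about 86.7
deviance(fit) / df.residual(fit)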
hnp(fit, print = TRUE, verb = TRUE, xlab = "Theoretical quantiles")
## Poisson model
## Simulation 1 out of 99
## ...
## Simulation 99 out of 99
The simulated envelope, using the Poisson distribution for the response variable, shows that this generalized linear model did not fit the data well.
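One alternative for overdispersed counts, not pursued here, is a quasi-Poisson fit (a sketch; fit_qp is a name introduced for illustration):
# same log link, but the dispersion parameter is estimated instead of fixed at 1
fit_qp <- glm(IRA ~ Precip + Temp + Umid, family = quasipoisson(link = "log"))
summary(fit_qp)$dispersion
The analysis below instead adopts the negative binomial distribution.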
fit2<- glm.nb(IRA ~ Precip + Temp + Umid, data = dadost)
summary(fit2)
##
## Call:
## glm.nb(formula = IRA ~ Precip + Temp + Umid, data = dadost, init.theta = 4.439088697,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.25860 -0.90984 -0.00822 0.61173 1.82142
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.6310305 0.6520314 14.771 < 2e-16 ***
## Precip 0.0003234 0.0004354 0.743 0.457649
## Temp -0.0886262 0.0230982 -3.837 0.000125 ***
## Umid -0.0182615 0.0035407 -5.158 2.5e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.4391) family taken to be 1)
##
## Null deviance: 255.05 on 191 degrees of freedom
## Residual deviance: 199.24 on 188 degrees of freedom
## AIC: 2557.1
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.439
## Std. Err.: 0.443
##
## 2 x log-likelihood: -2547.086
1-pchisq(199.24/4.4391, 188)
## [1] 1
hnp(fit2, print = TRUE, verb = TRUE, xlab = "Theoretical quantiles", main = "Simulated envelope with Precip")
## Negative binomial model (using MASS package)
## Simulation 1 out of 99
## ...
## Simulation 99 out of 99
Using the negative binomial as the distribution of the response variable, the simulated envelope indicates a good GLM fit to the data under study. However, one covariate (Precip) was not significant. Refitting without Precip:
fit3<- glm.nb(IRA ~ Temp + Umid, data = dadost)
summary(fit3)
##
## Call:
## glm.nb(formula = IRA ~ Temp + Umid, data = dadost, init.theta = 4.428413664,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.26621 -0.86707 -0.00378 0.61908 1.79729
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.371591 0.559763 16.742 < 2e-16 ***
## Temp -0.082011 0.021540 -3.807 0.00014 ***
## Umid -0.016606 0.002771 -5.993 2.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.4284) family taken to be 1)
##
## Null deviance: 254.44 on 191 degrees of freedom
## Residual deviance: 199.26 on 189 degrees of freedom
## AIC: 2555.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.428
## Std. Err.: 0.442
##
## 2 x log-likelihood: -2547.577
1-pchisq(199.26/4.4284, 189)
## [1] 1
hnp(fit3, print = TRUE, verb = TRUE, xlab = "Theoretical quantiles", main = "Simulated envelope without Precip")
## Negative binomial model (using MASS package)
## Simulation 1 out of 99
## ...
## Simulation 99 out of 99
With Precip in the GLM, a larger number of points fell outside the simulated envelope, so the model that best fits the data is the one without Precip.
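The reported AICs point the same way (2557.1 with Precip, 2555.6 without); a one-line comparison, assuming the fit2 and fit3 objects from above:
# lower AIC favours the model without Precip
AIC(fit2, fit3)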
influenceIndexPlot(fit3, vars = c('Studentized','Cook','Hat'), id.n=10)
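The most influential observations can also be read off numerically; a minimal sketch using Cook's distance on fit3:
# five largest Cook's distances, labelled by row index
head(sort(cooks.distance(fit3), decreasing = TRUE), 5)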
Removing the observations at positions c(29,141,153,175,187) and fitting a new model…
fit4<-glm.nb(IRA~Temp + Umid, data = dadost[-c(29,141,153,175,187),])
summary(fit3)
##
## Call:
## glm.nb(formula = IRA ~ Temp + Umid, data = dadost, init.theta = 4.428413664,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.26621 -0.86707 -0.00378 0.61908 1.79729
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.371591 0.559763 16.742 < 2e-16 ***
## Temp -0.082011 0.021540 -3.807 0.00014 ***
## Umid -0.016606 0.002771 -5.993 2.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.4284) family taken to be 1)
##
## Null deviance: 254.44 on 191 degrees of freedom
## Residual deviance: 199.26 on 189 degrees of freedom
## AIC: 2555.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.428
## Std. Err.: 0.442
##
## 2 x log-likelihood: -2547.577
summary(fit4)
##
## Call:
## glm.nb(formula = IRA ~ Temp + Umid, data = dadost[-c(29, 141,
## 153, 175, 187), ], init.theta = 4.681453477, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.17624 -0.86633 -0.02689 0.64024 1.83226
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.339802 0.560124 16.675 < 2e-16 ***
## Temp -0.076033 0.022114 -3.438 0.000585 ***
## Umid -0.018048 0.002801 -6.444 1.16e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.6815) family taken to be 1)
##
## Null deviance: 255.53 on 186 degrees of freedom
## Residual deviance: 193.75 on 184 degrees of freedom
## AIC: 2481.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.681
## Std. Err.: 0.475
##
## 2 x log-likelihood: -2473.856
# relative change (%) in the intercept after removing the flagged rows
difinter<- 9.371591-9.339802
porcent<- difinter*100/9.371591
porcent
## [1] 0.339206
# relative change (%) in the Temp coefficient
diftemp<- 0.082011-0.076033
porcent2<- diftemp*100/0.082011
porcent2
## [1] 7.289266
# relative change (%) in the Umid coefficient
difumid<- (0.016606-0.018048)*-1
porcent3<- difumid*100/0.016606
porcent3
## [1] 8.683608
Since at least one coefficient in the model changed by more than 5% (here, Temp and Umid), the observations at positions c(29,141,153,175,187) cannot be removed from the GLM!
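The same check can be done for all coefficients at once; a sketch assuming the fit3 and fit4 objects from above:
# percentage change of each coefficient after dropping the flagged rows
round(abs(coef(fit4) - coef(fit3)) / abs(coef(fit3)) * 100, 2)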
par(mar = c(4,4,1,4))
plot(IRA, type = "l", col = "red", lwd = 1,
     ylab = "Acute respiratory infections",
     bty = "n",
     xlab = "Months", axes = FALSE)
axis(1, seq(0, 192, by = 12), las = 2)
axis(2, seq(0, 1000, by = 100))
par(new = TRUE)
plot(Precip, type = 'l', frame = TRUE, ann = FALSE, axes = FALSE, lty = 2, lwd = 1, bty = "n", col = "blue")
axis(4, lty = 2, col.axis = "black", seq(0, 600, by = 100))
mtext("Mean precipitation (mm)", side = 4, line = 3)
title(main = "IRA vs Precip")
par(mar = c(4,4,1,4))
plot(IRA, type = "l", col = "red", lwd = 1,
     ylab = "Acute respiratory infections",
     bty = "n",
     xlab = "Months", axes = FALSE)
axis(1, seq(0, 192, by = 12), las = 2)
axis(2, seq(0, 1000, by = 100))
par(new = TRUE)
plot(Temp, type = 'l', frame = TRUE, ann = FALSE, axes = FALSE, lty = 2, lwd = 1, bty = "n", col = "blue")
axis(4, lty = 2, col.axis = "black", seq(20, 30, by = 1))
mtext("Mean temperature (°C)", side = 4, line = 3)
title(main = "IRA vs Temp")
par(mar = c(4,4,1,4))
plot(IRA, type = "l", col = "red", lwd = 1,
     ylab = "Acute respiratory infections",
     bty = "n",
     xlab = "Months", axes = FALSE)
axis(1, seq(0, 192, by = 12), las = 2)
axis(2, seq(0, 1000, by = 100))
par(new = TRUE)
plot(Umid, type = 'l', frame = TRUE, ann = FALSE, axes = FALSE, lty = 2, lwd = 1, bty = "n", col = "blue")
axis(4, lty = 2, col.axis = "black", seq(40, 100, by = 10))
mtext("Humidity (%)", side = 4, line = 3)
title(main = "IRA vs Umid")
The paired time-series plots show that all the variables are seasonal. IRA, however, trends downward over the months, whereas Precip, Temp, and Umid fluctuate around their means, with variance that changes over time.
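The three dual-axis blocks above share the same structure; a compact helper, hypothetical and not part of the original script, would avoid the repetition:
# plots IRA (left axis, red) against one covariate (right axis, blue, dashed)
plot_vs_ira <- function(y, ylab_right, main) {
  par(mar = c(4, 4, 1, 4))
  plot(IRA, type = "l", col = "red", ylab = "Acute respiratory infections",
       xlab = "Months", axes = FALSE, bty = "n")
  axis(1, seq(0, 192, by = 12), las = 2)
  axis(2, seq(0, 1000, by = 100))
  par(new = TRUE)
  plot(y, type = "l", lty = 2, col = "blue", ann = FALSE, axes = FALSE)
  axis(4, pretty(y))
  mtext(ylab_right, side = 4, line = 3)
  title(main = main)
}
plot_vs_ira(Precip, "Mean precipitation (mm)", "IRA vs Precip")  # Temp and Umid are analogous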
Required packages
## Loading required package: TSA
##
## Attaching package: 'TSA'
## The following objects are masked from 'package:timeDate':
##
## kurtosis, skewness
## The following objects are masked from 'package:stats':
##
## acf, arima
## The following object is masked from 'package:utils':
##
## tar
## Loading required package: tseries
## Loading required package: forecast
## Loading required package: urca
##
## Attaching package: 'urca'
## The following object is masked from 'package:ExpDes':
##
## plotres
## Loading required package: uroot
Formatting the data as time series
par(mfrow=c(2,2))
serie <- ts(IRA, start = c(1999,1), frequency = 12)
plot(serie, type = 'o', xlab = "Period", ylab = "IRA", main = "Number of people with acute respiratory infection")
serie2 <- ts(Precip, start = c(1999,1), frequency = 12)
plot(serie2, type = 'o', xlab = "Period", ylab = "Precipitation", main = "Mean monthly precipitation (mm)")
serie3 <- ts(Temp, start = c(1999,1), frequency = 12)
plot(serie3, type = 'o', xlab = "Period", ylab = "Temperature", main = "Mean monthly temperature (°C)")
serie4 <- ts(Umid, start = c(1999,1), frequency = 12)
plot(serie4, type = 'o', xlab = "Period", ylab = "Humidity", main = "Mean monthly humidity (%)")
As noted earlier, the IRA series shows a trend, while the other variables appear to have only seasonality; hence IRA does not look stationary, whereas the other three series appear to be stationary.
Autocorrelation and partial autocorrelation functions
#IRA
adf.test(serie)
##
## Augmented Dickey-Fuller Test
##
## data: serie
## Dickey-Fuller = -6.073, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
#Precip
adf.test(serie2)
##
## Augmented Dickey-Fuller Test
##
## data: serie2
## Dickey-Fuller = -4.3194, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#Temp
adf.test(serie3)
##
## Augmented Dickey-Fuller Test
##
## data: serie3
## Dickey-Fuller = -7.2596, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
#Umid
adf.test(serie4)
##
## Augmented Dickey-Fuller Test
##
## data: serie4
## Dickey-Fuller = -10.443, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The ADF tests indicate that all four series are stationary.
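A complementary check, not run in the original analysis, is the KPSS test from the already-loaded tseries package, whose null hypothesis is stationarity (the reverse of the ADF null); a sketch for the IRA series:
# rejecting here would contradict the ADF conclusion above
kpss.test(serie)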
par(mfrow=c(2,2))
hist(serie, prob = TRUE, xlab = "IRA", ylab = "Density", main = "", col = "pink")
hist(serie2, prob = TRUE, xlab = "Precip (mm)", ylab = "Density", main = "", col = "red")
hist(serie3, prob = TRUE, xlab = "Temp (°C)", ylab = "Density", main = "", col = "blue")
hist(serie4, prob = TRUE, xlab = "Umid (%)", ylab = "Density", main = "", col = "yellow")
dec.ira <- decompose(serie, type = "additive", filter = NULL)
plot(dec.ira)
dec.precip <- decompose(serie2, type = "additive", filter = NULL)
plot(dec.precip)
dec.temp <- decompose(serie3, type = "additive", filter = NULL)
plot(dec.temp)
dec.umid <- decompose(serie4, type = "additive", filter = NULL)
plot(dec.umid)
Each decomposition plot above shows the observed series together with its estimated trend, seasonal, and residual components.
par(mfrow=c(2,2))
acf(serie, main = "Correlogram, IRA", lag.max = 90)
acf(serie2, main = "Correlogram, Precip", lag.max = 90)
acf(serie3, main = "Correlogram, Temp", lag.max = 90)
acf(serie4, main = "Correlogram, Umid", lag.max = 90)
The ACFs suggest that, as suspected, IRA, Temp, and Umid are not stationary, given the slow decay of their autocorrelations, even though the ADF test indicates otherwise. Taking differences…
acf(diff(diff(serie)), main = "Correlogram of the 2nd difference, IRA", lag.max = 90)
acf(diff(diff(serie3)), main = "Correlogram of the 2nd difference, Temp", lag.max = 90)
acf(diff(diff(serie4)), main = "Correlogram of the 2nd difference, Umid", lag.max = 90)
The ACF of the second difference of each series (stationarity should still be checked with a formal test!) shows rapid decay. The spikes along the correlogram suggest strong seasonality in the series, together with stationarity.
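The formal check suggested above can be applied directly to the differenced series; a sketch for IRA:
# ADF test on the second difference
adf.test(diff(diff(serie)))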
par(mfrow=c(2,2))
#IRA
pacf(serie, lag.max = 90, main = "Partial correlogram, IRA")
#Precip
pacf(serie2, lag.max = 90, main = "Partial correlogram, Precip")
#Temp
pacf(serie3, lag.max = 90, main = "Partial correlogram, Temp")
#Umid
pacf(serie4, lag.max = 90, main = "Partial correlogram, Umid")
The spikes in the PACFs reflect the strong correlations between the months of each series.
SARIMA and Holt-Winters models will be used to build forecasting models for IRA, since it is the variable of interest in this study.
# Value to be predicted
observado <- serie[length(serie)]; observado
## [1] 118
# 1-step-ahead forecast (last observation held out)
custo1p <- serie[-length(serie)]
serie1p <- ts(custo1p, start = c(1999,1), frequency = 12)
# 3-steps-ahead forecast (last 3 observations held out)
custo3p <- serie[-((length(serie)-2):length(serie))]
serie3p <- ts(custo3p, start = c(1999,1), frequency = 12)
# 6-steps-ahead forecast (last 6 observations held out)
custo6p <- serie[-((length(serie)-5):length(serie))]
serie6p <- ts(custo6p, start = c(1999,1), frequency = 12)
# 12-steps-ahead forecast (last 12 observations held out)
custo12p <- serie[-((length(serie)-11):length(serie))]
serie12p <- ts(custo12p, start = c(1999,1), frequency = 12)
hwa1p <- HoltWinters(serie1p, seasonal = "additive")
hwa3p <- HoltWinters(serie3p, seasonal = "additive")
hwa6p <- HoltWinters(serie6p, seasonal = "additive")
hwa12p <- HoltWinters(serie12p, seasonal = "additive")
prev.hwa1p <- predict(hwa1p,1)[1]
prev.hwa3p <- predict(hwa3p,3)[3]
prev.hwa6p <- predict(hwa6p,6)[6]
prev.hwa12p <- predict(hwa12p,12)[12]
abs((observado-prev.hwa1p))/observado*100
## [1] 38.01108
abs((observado-prev.hwa3p))/observado*100
## [1] 78.50087
abs((observado-prev.hwa6p))/observado*100
## [1] 22.59027
abs((observado-prev.hwa12p))/observado*100
## [1] 13.1265
hwm1p <- HoltWinters(serie1p, seasonal = "multiplicative")
hwm3p <- HoltWinters(serie3p, seasonal = "multiplicative")
hwm6p <- HoltWinters(serie6p, seasonal = "multiplicative")
hwm12p <- HoltWinters(serie12p, seasonal = "multiplicative")
prev.hwm1p <- predict(hwm1p,1)[1]
prev.hwm3p <- predict(hwm3p,3)[3]
prev.hwm6p <- predict(hwm6p,6)[6]
prev.hwm12p <- predict(hwm12p,12)[12]
abs((observado-prev.hwm1p))/observado*100
## [1] 4.110066
abs((observado-prev.hwm3p))/observado*100
## [1] 30.09018
abs((observado-prev.hwm6p))/observado*100
## [1] 24.47744
abs((observado-prev.hwm12p))/observado*100
## [1] 7.617659
The relative forecast errors show that the multiplicative model came closest to the true values.
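For reference, the four multiplicative-model errors can be collected in a single step (a sketch; prevs is a name introduced here, using the prev.hwm* objects from above):
# relative errors (%) at 1, 3, 6 and 12 steps: 4.11, 30.09, 24.48 and 7.62
prevs <- c(prev.hwm1p, prev.hwm3p, prev.hwm6p, prev.hwm12p)
round(abs(observado - prevs) / observado * 100, 2)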
#Additive Holt-Winters
hwa1pp <- hw(serie1p)
hwa3pp <- hw(serie3p)
hwa6pp <- hw(serie6p)
hwa12pp <- hw(serie12p)
#Multiplicative Holt-Winters (damped)
hwm1pp <- hw(serie1p, damped = T, seasonal = "multiplicative")
hwm3pp <- hw(serie3p, damped = T, seasonal = "multiplicative")
hwm6pp <- hw(serie6p, damped = T, seasonal = "multiplicative")
hwm12pp <- hw(serie12p, damped = T, seasonal = "multiplicative")
plot(hwa12p, xlab = "Period", ylab = "Observed/Fitted", main = "Additive Holt-Winters")
plot(hwa12pp, xlab = "Period", ylab = "IRA", main = "Additive Holt-Winters forecast")
The additive Holt-Winters plot shows that the model fits the data well, although the prediction intervals are quite wide.
Plots for the multiplicative Holt-Winters model
plot(hwm12p, xlab = "Period", ylab = "Observed/Fitted", main = "Multiplicative Holt-Winters")
plot(hwm12pp, xlab = "Period", ylab = "IRA", main = "Multiplicative Holt-Winters forecast")
Both the additive and the multiplicative Holt-Winters models fit the data well, but the prediction intervals are quite wide for the additive model and noticeably narrower for the multiplicative one; the multiplicative model is therefore considered to give the better forecasts.
As was done with the Holt-Winters method, SARIMA models are now fitted to the IRA series.
#1p
auto.arima(serie1p)
## Series: serie1p
## ARIMA(2,1,2)(2,1,2)[12]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sar2 sma1 sma2
## -0.5704 -0.0609 0.2281 -0.5144 0.4090 -0.422 -1.0784 0.6042
## s.e. 0.1401 0.1350 0.1227 0.1162 0.2169 0.115 0.2158 0.1880
##
## sigma^2 estimated as 5015: log likelihood=-1012.87
## AIC=2043.74 AICc=2044.81 BIC=2072.38
#3p
auto.arima(serie3p)
## Series: serie3p
## ARIMA(1,1,1)(2,1,1)[12]
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1
## 0.2946 -0.7295 -0.1160 -0.2228 -0.5842
## s.e. 0.1164 0.0791 0.1348 0.1056 0.1328
##
## sigma^2 estimated as 5550: log likelihood=-1010.88
## AIC=2033.76 AICc=2034.26 BIC=2052.78
#6p
auto.arima(serie6p)
## Series: serie6p
## ARIMA(2,1,1)(2,1,1)[12]
##
## Coefficients:
## ar1 ar2 ma1 sar1 sar2 sma1
## -0.1693 -0.3334 -0.2006 -0.1534 -0.2483 -0.5322
## s.e. 0.3171 0.1198 0.3538 0.1314 0.1019 0.1310
##
## sigma^2 estimated as 5524: log likelihood=-992.53
## AIC=1999.05 AICc=1999.73 BIC=2021.13
#12p
auto.arima(serie12p)
## Series: serie12p
## ARIMA(3,1,1)(2,1,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 ma1 sar1 sar2 sma1
## 0.3696 -0.1567 0.1950 -0.7565 -0.1331 -0.2334 -0.5474
## s.e. 0.1678 0.0987 0.1059 0.1450 0.1373 0.1056 0.1367
##
## sigma^2 estimated as 5658: log likelihood=-959.63
## AIC=1935.26 AICc=1936.18 BIC=1960.21
The chosen models were SARIMA(2,1,2)(2,1,2)[12], SARIMA(1,1,1)(2,1,1)[12], SARIMA(2,1,1)(2,1,1)[12], and SARIMA(3,1,1)(2,1,1)[12] for the 1-, 3-, 6-, and 12-step series, respectively, using lowest AIC as the selection criterion.
# Selected model, 1 step ahead
arimafit1p <- arima(serie1p, order = c(2,1,2), seasonal = list(order=c(2,1,2)))
predict(arimafit1p, 1)$pred
## Dec
## 2014 120.0288
# Relative forecast error
abs((observado-predict(arimafit1p, 1)$pred))/observado*100
## Jun
## 2010 1.719353
arimafit3p <- arima(serie3p, order = c(1,1,1), seasonal = list(order=c(2,1,1)))
predict(arimafit3p, 3)$pred[3]
## [1] 33.10168
# Relative forecast error
abs((observado-predict(arimafit3p, 3)$pred[3]))/observado*100
## [1] 71.94773
arimafit6p <- arima(serie6p, order = c(2,1,1), seasonal = list(order=c(2,1,1)))
predict(arimafit6p, 6)$pred[6]
## [1] 42.66261
abs((observado-predict(arimafit6p, 6)$pred[6]))/observado*100
## [1] 63.84524
arimafit12p <- arima(serie12p, order = c(3,1,1), seasonal = list(order=c(2,1,1)))
predict(arimafit12p, 12)$pred[12]
## [1] 78.28494
abs((observado-predict(arimafit12p, 12)$pred[12]))/observado*100
## [1] 33.65683
Looking at the relative errors, the models with the smallest errors were those for 1 and 12 steps ahead.
serie.stl <- stl(serie, "periodic")
plot(serie.stl)
For the SARIMA models it is worth examining the diagnostic plots, in order to assess the independence, randomness, and normality of the residuals. One diagnostic plot was produced for each series (1p, 3p, 6p, and 12p).
tsdiag(arimafit1p)
tsdiag(arimafit3p)
tsdiag(arimafit6p)
tsdiag(arimafit12p)
From the plots, the residuals appear to be random, and the residual ACFs show no apparent autocorrelation in any of the four series. Plots alone, however, cannot confirm that the models are statistically adequate; formal tests of the residuals' independence, randomness, and normality are needed. These are, respectively, the Ljung-Box test (LB.test), the runs test (runs), and the Shapiro-Wilk test (shapiro.test).
LB.test(arimafit1p)
##
## Box-Ljung test
##
## data: residuals from arimafit1p
## X-squared = 5.8281, df = 4, p-value = 0.2124
shapiro.test(residuals(arimafit1p))
##
## Shapiro-Wilk normality test
##
## data: residuals(arimafit1p)
## W = 0.98132, p-value = 0.0119
runs(residuals(arimafit1p))
## $pvalue
## [1] 0.676
##
## $observed.runs
## [1] 93
##
## $expected.runs
## [1] 96.37173
##
## $n1
## [1] 92
##
## $n2
## [1] 99
##
## $k
## [1] 0
LB.test(arimafit3p)
##
## Box-Ljung test
##
## data: residuals from arimafit3p
## X-squared = 13.079, df = 7, p-value = 0.0702
shapiro.test(residuals(arimafit3p))
##
## Shapiro-Wilk normality test
##
## data: residuals(arimafit3p)
## W = 0.97388, p-value = 0.001318
runs(residuals(arimafit3p))
## $pvalue
## [1] 0.0802
##
## $observed.runs
## [1] 83
##
## $expected.runs
## [1] 95.47619
##
## $n1
## [1] 96
##
## $n2
## [1] 93
##
## $k
## [1] 0
LB.test(arimafit6p)
##
## Box-Ljung test
##
## data: residuals from arimafit6p
## X-squared = 15.021, df = 6, p-value = 0.02009
shapiro.test(residuals(arimafit6p))
##
## Shapiro-Wilk normality test
##
## data: residuals(arimafit6p)
## W = 0.98104, p-value = 0.01259
runs(residuals(arimafit6p))
## $pvalue
## [1] 0.342
##
## $observed.runs
## [1] 87
##
## $expected.runs
## [1] 93.95699
##
## $n1
## [1] 91
##
## $n2
## [1] 95
##
## $k
## [1] 0
LB.test(arimafit12p)
##
## Box-Ljung test
##
## data: residuals from arimafit12p
## X-squared = 11.215, df = 5, p-value = 0.04728
shapiro.test(residuals(arimafit12p))
##
## Shapiro-Wilk normality test
##
## data: residuals(arimafit12p)
## W = 0.97947, p-value = 0.009334
runs(residuals(arimafit12p))
## $pvalue
## [1] 0.0852
##
## $observed.runs
## [1] 79
##
## $expected.runs
## [1] 91
##
## $n1
## [1] 90
##
## $n2
## [1] 90
##
## $k
## [1] 0
At the 5% level, the Shapiro-Wilk test rejects normality of the residuals for all four models. The Ljung-Box test does not reject independence of the residuals for the 1- and 3-step models, but rejects it for the 6- and 12-step models. The runs test does not reject randomness for any of the four models.
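As a cross-check, base R's Box.test should reproduce the TSA::LB.test result above, assuming LB.test used a lag of 12 (the reported df of 4 matches 12 minus the 8 estimated parameters of arimafit1p); a sketch:
# fitdf adjusts the degrees of freedom for the 8 estimated (S)ARMA parameters
Box.test(residuals(arimafit1p), lag = 12, type = "Ljung-Box", fitdf = 8)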