source: https://apps.who.int/flumart/Default?ReportNo=12
# source: https://apps.who.int/flumart/Default?ReportNo=12
usa_flu = read.csv("USA since 2000.csv")
head(usa_flu)
There’re type A and type B types of flu. Type A is the one which causes pandemic. That means: number of people got infected today affects number of people to be infected in the next days. For this reason, I’ll focus on Type A in time series analysis.
library(forecast)
library(vars)
library(tseries)
library(ggplot2)
There’s some missing data in the dataset. I have checked that the data is missing in non-flu periods. For this reason I have changed ‘na’ to zeros.
In the plot we can see a clear annual seasonality, besides the spikes become higher in the recent years.
inf_a=usa_flu$INF_A
inf_a[is.na(inf_a)]=0
plot.ts(inf_a, main = 'Number of Type A cases, 2000-2019')
This charts shows Flu A cases by week in different years. With the exception of 2009 anomaly, we can say that the Flu A spread starts in the last 10 weeks of each year, reaches the peak in the first 10 weeks of the next year (in winter), and completes by week 20 (May).
library(ggplot2)
inf_a_year$Year <- as.factor(inf_a_year$Year)
ggplot(inf_a_year, aes(x=Week, y=INF_A, color=Year)) + geom_line(size=1)
The following chart decomposes the series into overall trend, seasonal trend and residual. We can see the annual seasonality and the upward trend.
ts(inf_a[600:1000], frequency=52) %>%
decompose() %>%
autoplot()
The linear regression against the timeline also confirms the upward trend.
fit.lt<-lm(log(inf_a)~c(1:length(inf_a)))
summary(fit.lt)
Call:
lm(formula = log(inf_a) ~ c(1:length(inf_a)))
Residuals:
Min 1Q Median 3Q Max
-5.0688 -1.7909 -0.2739 1.9220 5.1038
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.2755616 0.1397006 16.29 <2e-16 ***
c(1:length(inf_a)) 0.0046631 0.0002318 20.11 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.254 on 1041 degrees of freedom
Multiple R-squared: 0.2799, Adjusted R-squared: 0.2792
F-statistic: 404.6 on 1 and 1041 DF, p-value: < 2.2e-16
Therefore, the initial time series is non-stationary, since it has a trend and its variance increases with time. We’ll have to make transformations.
Now, if we look at transformed series, we can see no trend and no clear change in variance except for some spikes (can be caused by seasonality).
# to make log transformation we have to change zero values to 1
inf_a[inf_a==0]=1
plot.ts(diff(diff(log(inf_a),52)), main = 'Eliminating seasonality and different variance, 2000-2019')
The Augmented Dickey-Fuller provides p-value of 0.01, therefore, we can reject the nul hypothesis of non-stationarity.
adf.test(diff(log(inf_a)))
p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: diff(log(inf_a))
Dickey-Fuller = -8.2256, Lag order = 10, p-value = 0.01
alternative hypothesis: stationary
The distribution of the transformed data is close to normal.
# to make log transformation
plot(density(diff(log(inf_a))), xlim=c(-2, 2))
There’s a clear seasonality in the time series, for this reason I choose to build SARIMA model with 52-week period.
The below ACF and PACF chart helps to choose the order for the model. ACF has a strong spike at 1 and 52. PACF has 3 significant annual spikes (each 52 weeks), strong spike at order=1 and a a large number of significant coefficients at higher order, suggesting there can be additional seasonality (e.g. monthly or quartely)
acf(ts(diff(diff(log(inf_a)),52)), main='Time Series: ACF',lag.max=160)
acf(ts(diff(diff(log(inf_a)),52)), type='partial', main='Time Series: PACF',lag.max=160)
Below I make calculations for models of different order through the loop.
d=1, DD=1, period=52 p - AR order, given that PACF shows significant coefficients till order 10, I try p<=5. q - MA order, ACF shows high correlation at order 1. I choose q<=2 P - seasonal AR order, PACF shows at least 3 significant coefficients at order 52*k, so I try P up to 3 Q - seasonal MA order, Q<2 based on ACF shows only one higher correlation at order 52
The criteria to choose the model are: AIC (takes into account both model fit and number of variables), Standard Square Error, p-value under Ljungâ-Box test (tests the hypothesis that the model residials have no autocorrelation)
d=1
DD=1
per=52
for(p in 1:4){
for(q in 1:2){
for(P in 0:4){
for(Q in 0:2){
if(p+d+q+P+DD+Q<=10){
try({
model<-arima(x=log(inf_a), order = c((p),d,(q)), seasonal = list(order=c((P),DD,(Q)), period=per))
pval<-Box.test(model$residuals, lag=log(length(model$residuals)))
sse<-sum(model$residuals^2)
cat(p,d,q,P,DD,Q,per, 'AIC=', model$aic, ' SSE=',sse,' p-VALUE=', pval$p.value,'\n')
})
}
}
}
}
}
1 1 1 0 1 0 52 AIC= 1952.515 SSE= 414.0464 p-VALUE= 0.0003720067
1 1 1 0 1 1 52 AIC= 1609.494 SSE= 274.4512 p-VALUE= 0.01568693
1 1 1 0 1 2 52 AIC= 1605.789 SSE= 271.2157 p-VALUE= 0.009628169
1 1 1 1 1 0 52 AIC= 1791.348 SSE= 347.6244 p-VALUE= 0.003932652
1 1 1 1 1 1 52 AIC= 1606.255 SSE= 271.3042 p-VALUE= 0.009723602
NaNs produced
1 1 1 1 1 2 52 AIC= 1605.496 SSE= 271.2491 p-VALUE= 0.01186798
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [4]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
1 1 2 0 1 0 52 AIC= 1936.413 SSE= 406.5152 p-VALUE= 0.4656737
1 1 2 0 1 1 52 AIC= 1602.545 SSE= 271.6761 p-VALUE= 0.4186584
1 1 2 0 1 2 52 AIC= 1597.093 SSE= 267.446 p-VALUE= 0.4652095
1 1 2 1 1 0 52 AIC= 1781.761 SSE= 343.6688 p-VALUE= 0.3122438
1 1 2 1 1 1 52 AIC= 1597.651 SSE= 267.4684 p-VALUE= 0.4572092
1 1 2 1 1 2 52 AIC= 1597.087 SSE= 267.8017 p-VALUE= 0.4998312
1 1 2 2 1 0 52 AIC= 1682.049 SSE= 306.2127 p-VALUE= 0.6077648
1 1 2 2 1 1 52 AIC= 1598.934 SSE= 268.1432 p-VALUE= 0.4711275
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [5]
1 1 2 3 1 0 52 AIC= 1655.412 SSE= 295.5971 p-VALUE= 0.6012405
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [6]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
2 1 1 0 1 0 52 AIC= 1940.705 SSE= 408.2935 p-VALUE= 0.2002074
2 1 1 0 1 1 52 AIC= 1604.335 SSE= 272.1275 p-VALUE= 0.2234965
2 1 1 0 1 2 52 AIC= 1599.422 SSE= 268.1926 p-VALUE= 0.2334578
2 1 1 1 1 0 52 AIC= 1785.051 SSE= 344.7981 p-VALUE= 0.1290356
2 1 1 1 1 1 52 AIC= 1599.978 SSE= 268.2554 p-VALUE= 0.2264995
2 1 1 1 1 2 52 AIC= 1600.574 SSE= 268.1825 p-VALUE= 0.2924195
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [4]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
2 1 2 0 1 0 52 AIC= 1934.995 SSE= 405.0921 p-VALUE= 0.4480134
2 1 2 0 1 1 52 AIC= 1604.521 SSE= 271.6893 p-VALUE= 0.4377461
2 1 2 0 1 2 52 AIC= 1598.948 SSE= 267.4154 p-VALUE= 0.5173816
2 1 2 1 1 0 52 AIC= 1780.784 SSE= 342.6031 p-VALUE= 0.3244317
2 1 2 1 1 1 52 AIC= 1599.497 SSE= 267.4322 p-VALUE= 0.5032756
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [2]
2 1 2 2 1 0 52 AIC= 1685.218 SSE= 306.7462 p-VALUE= 0.3515636
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
NaNs produced
2 1 2 3 1 1 52 AIC= 1643.87 SSE= 288.038 p-VALUE= 0.04628945
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
3 1 1 0 1 0 52 AIC= 1940.612 SSE= 407.426 p-VALUE= 0.3054467
3 1 1 0 1 1 52 AIC= 1605.514 SSE= 271.8888 p-VALUE= 0.2885185
3 1 1 0 1 2 52 AIC= 1600.369 SSE= 267.8145 p-VALUE= 0.3162293
3 1 1 1 1 0 52 AIC= 1788.302 SSE= 345.7485 p-VALUE= 0.1530723
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [2]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
NaNs producedError in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
3 1 2 0 1 0 52 AIC= 1926.81 SSE= 400.9044 p-VALUE= 0.7519467
3 1 2 0 1 1 52 AIC= 1596.151 SSE= 269.1894 p-VALUE= 0.8273571
3 1 2 0 1 2 52 AIC= 1590.838 SSE= 265.0733 p-VALUE= 0.8660842
3 1 2 1 1 0 52 AIC= 1788.154 SSE= 344.4835 p-VALUE= 0.1533996
3 1 2 1 1 1 52 AIC= 1603.151 SSE= 267.9077 p-VALUE= 0.28098
3 1 2 1 1 2 52 AIC= 1602.378 SSE= 268.09 p-VALUE= 0.3327347
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [2]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
4 1 1 0 1 0 52 AIC= 1940.17 SSE= 406.4161 p-VALUE= 0.5679942
4 1 1 0 1 1 52 AIC= 1606.284 SSE= 271.6758 p-VALUE= 0.4557181
4 1 1 0 1 2 52 AIC= 1600.999 SSE= 267.5594 p-VALUE= 0.5026699
4 1 1 1 1 0 52 AIC= 1785.099 SSE= 343.4296 p-VALUE= 0.4571696
4 1 1 1 1 1 52 AIC= 1602.919 SSE= 268.7253 p-VALUE= 0.4412913
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [2]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
4 1 2 0 1 0 52 AIC= 1928.249 SSE= 400.6818 p-VALUE= 0.7454794
4 1 2 0 1 1 52 AIC= 1597.707 SSE= 269.0629 p-VALUE= 0.9014931
4 1 2 0 1 2 52 AIC= 1592.571 SSE= 265.0481 p-VALUE= 0.919265
4 1 2 1 1 0 52 AIC= 1773.841 SSE= 338.7486 p-VALUE= 0.9184249
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [7]
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
initial value in 'vmmin' is not finite
The simplest model which provides us satisfactory results is order=c(1,1,2), seasonal order=c(0,1,2)
1 1 2 0 1 2 52 AIC= 1597.093 SSE= 267.446 p-VALUE= 0.4652095
model<-arima(x=log(inf_a), order = c(1,1,2), seasonal = list(order=c(0,1,2), period=52))
summary(model)
Call:
arima(x = log(inf_a), order = c(1, 1, 2), seasonal = list(order = c(0, 1, 2),
period = 52))
Coefficients:
ar1 ma1 ma2 sma1 sma2
0.7086 -0.9123 0.2321 -0.7707 -0.1054
s.e. 0.0846 0.0878 0.0334 0.0409 0.0392
sigma^2 estimated as 0.2701: log likelihood = -792.55, aic = 1597.09
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.004452776 0.5063793 0.3086611 NaN Inf 0.9027238 0.0005539088
Box.test(model$residuals, lag=log(length(model$residuals)))
Box-Pierce test
data: model$residuals
X-squared = 6.6079, df = 6.9499, p-value = 0.4652
Let’s look at the residuals plot. The p-value of 0.46 from Ljungâ“Box test suggests that we cannot reject Null hypothesis that the residuals has no autocorrelation in its elements.
The distribution of residuals is close to normal, as we see on density and qqplot charts.
If we look at ACF and PACF we see some small, yet significant correlation at order 11, suggesting there might be quarterly seasonality. We can try to build a multiple seasonality model further, to take into account both annual and quarterly seasonality.
plot(model$residuals)
plot(density(model$residuals), xlim=c(-2, 2))
qqnorm(model$residuals, pch = 1, frame = FALSE)
qqline(model$residuals, col = "steelblue", lwd = 2)
acf(model$residuals)
acf(model$residuals, type='partial')
The forecast built for the year 2019 is close to the real trend. The real trend goes slightly higher in the number of cases, and goes down a bit later. This is caused by the fact that the peak of the infection happened in slightly different weeks and to different extent in the past years.
model<-arima(x=ts(head(log(inf_a), length(inf_a)-52), frequency=52), order = c(1, 1, 2), seasonal = list(order=c(0,1,2), period=52))
summary(model)
Call:
arima(x = ts(head(log(inf_a), length(inf_a) - 52), frequency = 52), order = c(1,
1, 2), seasonal = list(order = c(0, 1, 2), period = 52))
Coefficients:
ar1 ma1 ma2 sma1 sma2
0.7102 -0.9169 0.2340 -0.7733 -0.1116
s.e. 0.0861 0.0895 0.0344 0.0435 0.0407
sigma^2 estimated as 0.2818: log likelihood = -774.05, aic = 1560.11
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.005028831 0.5164377 0.315342 NaN Inf 0.9041208 0.0008099102
predicted=forecast(model, 52)
plot(predicted, xlim=c(15,21.3), xlab='Years')
lines(ts(tail(log(inf_a), 52), frequency=52, start=c(20, 4), end=c(21, 3)), col="red")
accuracy(predicted$mean,log(inf_a[992:1043]))
ME RMSE MAE MPE MAPE
Test set 0.1967238 0.8177204 0.6845314 2.307872 9.742873
The forecast built through Exponential smoothing of trend and seasonality shows significantly worse performance than ARIMA model.
model.hw<-HoltWinters(ts(head(log(inf_a), length(inf_a)-52), frequency=52), seasonal="additive",
beta=FALSE)
predicted.hw<-forecast(model.hw,52)
plot(predicted.hw, xlim=c(15,21), xlab = "Years")
lines(ts(tail(log(inf_a), 52), frequency=52, start=c(20, 4), end=c(21, 3)), col="red")
Let’s try to take into account both quarterly and annual seasonality, since in annual seasonal model we can see autocorrelation remaining in the residual at lag=11.
The below chart doesn’t show clear quarterly seasonality, especially in the later season. We can try to build the multi-seasonal model and compare the results.
log(inf_a)[500:1000] %>% msts( seasonal.periods = c(12, 52)) %>% mstl() %>% autoplot()
tbats.model = tbats(msts(log(inf_a[1:991]), seasonal.periods = c(52)))
tbats.forecast = forecast(tbats.model,h=52)
plot(tbats.forecast, xlim=c(15,21.3))
lines(ts(tail(log(inf_a), 52), frequency=52, start=c(20, 4), end=c(21, 3)), col="red")
stlm.model = stlm(msts(log(inf_a[1:991]), seasonal.periods = c(52)), s.window = "periodic")
stlm.forecast = forecast(stlm.model, h=52)
plot(stlm.forecast, xlim=c(15, 21))
lines(ts(tail(log(inf_a), 52), frequency=52, start=c(20, 4), end=c(21, 3)), col="red")
The below chart and accuracies shows that we get very similar forecasts from ARIMA, TBATS and STLM models.
plot(predicted, xlim=c(17,21), main='ARIMA and Multiple season models forecast')
lines(ts(log(inf_a)[992:1043], frequency=52, start=c(20, 4), end=c(21, 3)), col="red", lwd=2)
lines(stlm.forecast$mean, xlim=c(17,21), col='green', lwd=2)
lines(tbats.forecast$mean, xlim=c(17,21), col='orange', lwd=2)
accuracy(predicted$mean,log(inf_a[992:1043]))
ME RMSE MAE MPE MAPE
Test set 0.1781318 0.8176029 0.686788 1.977075 9.762576
accuracy(tbats.forecast$mean,inf_a_msts[992:1043])
ME RMSE MAE MPE MAPE
Test set 0.05522672 0.8903086 0.7403348 -0.0780284 10.59412
accuracy(stlm.forecast$mean,inf_a_msts[992:1043])
ME RMSE MAE MPE MAPE
Test set 0.3136793 1.001316 0.8590672 4.036048 12.39876
There studies confirming that Flu transmits rapidly in cold and dry weather. Let’s try to add information on temperature and precipitations and see how it influences our Flu cases data.
library(mvtnorm)
library(vars)
I downloaded the monthly data on mean temperatures and precipitation throughout USA from NOAA https://www.ncdc.noaa.gov/climate-information/statistical-weather-and-climate-information
Therefore, I adjusted the Flu time series accordingly, from weekly to monthly number of cases.
inf_a_month=read.csv("inf_a_month.csv")
temp_month=read.csv("temp_2019.csv")
temp_pcp=read.csv("pcp_2019.csv")
head(temp_pcp)
head(temp_month)
inf_var=cbind(inf_a_month[1:228, 'INF_A'], temp_month[1:228, 'Value'], temp_pcp[1:228, 'Value'])
colnames(inf_var)=c('inf_a', 'tmp', 'pcp')
If we look at the time series plots, we can see very clear annual seasonality in temperature data, and less clear in precipitation. Still, we can see clear structure in precipitation data. Unlike Flu A time series, there’s no clearly changing variation in temperature and precipitation data, therefore, only Flu data reqires log transformation. As for the trend, this can be dealt in VAR model itself.
par(mfrow=c(3,1))
for (i in c(1:3))
{plot.ts(inf_var[,i], main=colnames(inf_var)[i])}
par(mfrow=c(3,1))
for (i in c(1:3))
{acf(inf_var[,i], main=colnames(inf_var)[i])}
par(mfrow=c(3,1))
for (i in c(1:3))
{acf(inf_var[,i], main=colnames(inf_var)[i], type='partial')}
# log transformtation of INF A data
inf_var[,1][inf_var[,1]==0]=1
inf_var[,1]=log(inf_var[,1])
If we look at the pairs of the variables, we can see some correlation between Flu A cases and temperature which can be caused by their common seasonality. Also, there’re some correlation between temperature and precipitation.
pairs(inf_var)
The linear regression between logged Flu A cases and Temperature shows highly significan coefficient. However, if we remove the annual seasonality through differencing and apply the linear regression again, we see that the coefficient is not statistically significant anymore (p-value=0.61)
temp_pcp_lm12<-lm(inf_var[,1]~inf_var[,2])
temp_pcp_lm12<-lm(diff(inf_var[,1], 12)~diff(inf_var[,2], 12))
summary(temp_pcp_lm)
Call:
lm(formula = inf_var[, 1] ~ inf_var[, 2])
Residuals:
Min 1Q Median 3Q Max
-4.8773 -1.4688 0.1648 1.6885 5.9150
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.731312 0.518418 24.56 <2e-16 ***
inf_var[, 2] -0.125024 0.009326 -13.40 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.123 on 226 degrees of freedom
Multiple R-squared: 0.4429, Adjusted R-squared: 0.4405
F-statistic: 179.7 on 1 and 226 DF, p-value: < 2.2e-16
summary(temp_pcp_lm12)
Call:
lm(formula = diff(inf_var[, 1], 12) ~ diff(inf_var[, 2], 12))
Residuals:
Min 1Q Median 3Q Max
-6.9230 -0.9983 -0.2456 0.7899 6.6967
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.26244 0.14133 1.857 0.0647 .
diff(inf_var[, 2], 12) -0.02540 0.05042 -0.504 0.6150
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.077 on 214 degrees of freedom
Multiple R-squared: 0.001184, Adjusted R-squared: -0.003483
F-statistic: 0.2537 on 1 and 214 DF, p-value: 0.615
The correlation matrix with annual seasonality removed also shows very low correlations for all three variables.
cor(diff(inf_var,12))
inf_a tmp pcp
inf_a 1.00000000 -0.03441387 -0.01474485
tmp -0.03441387 1.00000000 -0.13818626
pcp -0.01474485 -0.13818626 1.00000000
Cross-correlation functions show correlations just slightly above the significance level for Flu A cases and Temperature at lags=3, 4, 12, 13
ccf(diff(inf_var[,1],12), diff(inf_var[,2],12), main='Flu A and Temperature')
ccf(diff(inf_var[,1],12), diff(inf_var[,3],12), main='Flu A and Precipitation')
ccf(diff(inf_var[,2],12), diff(inf_var[,3],12), main='Temperature and Precipitation')
The VAR model returns the significant coefficient for Flu A cases only from its own previous period (inf_a.l1), but not from other variables. From this, we can conclude that we can stick to the univariate model, and additional information from Temperature and Precipitation series would not improve the forecast.
summary(var.fit<-VAR(diff(inf_var,12), p=1, season=12))
VAR Estimation Results:
=========================
Endogenous variables: inf_a, tmp, pcp
Deterministic variables: const
Sample size: 215
Log Likelihood: -1064.114
Roots of the characteristic polynomial:
0.8163 0.1664 0.1538
Call:
VAR(y = diff(inf_var, 12), p = 1, season = 12L)
Estimation results for equation inf_a:
======================================
inf_a = inf_a.l1 + tmp.l1 + pcp.l1 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11
Estimate Std. Error t value Pr(>|t|)
inf_a.l1 0.823519 0.040213 20.479 <2e-16 ***
tmp.l1 0.031803 0.030031 1.059 0.291
pcp.l1 0.082465 0.134681 0.612 0.541
const 0.043178 0.084000 0.514 0.608
sd1 -0.058655 0.412880 -0.142 0.887
sd2 0.043915 0.406917 0.108 0.914
sd3 0.011554 0.407035 0.028 0.977
sd4 -0.012299 0.406982 -0.030 0.976
sd5 0.101993 0.406987 0.251 0.802
sd6 -0.050036 0.406965 -0.123 0.902
sd7 0.008286 0.406921 0.020 0.984
sd8 -0.001140 0.406948 -0.003 0.998
sd9 0.039489 0.407017 0.097 0.923
sd10 -0.176745 0.407099 -0.434 0.665
sd11 -0.062933 0.406989 -0.155 0.877
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 1.221 on 200 degrees of freedom
Multiple R-Squared: 0.6775, Adjusted R-squared: 0.655
F-statistic: 30.02 on 14 and 200 DF, p-value: < 2.2e-16
Estimation results for equation tmp:
====================================
tmp = inf_a.l1 + tmp.l1 + pcp.l1 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11
Estimate Std. Error t value Pr(>|t|)
inf_a.l1 -0.10706 0.09373 -1.142 0.2547
tmp.l1 0.18150 0.07000 2.593 0.0102 *
pcp.l1 0.06492 0.31392 0.207 0.8364
const 0.04969 0.19579 0.254 0.7999
sd1 -0.37914 0.96236 -0.394 0.6940
sd2 -0.59285 0.94846 -0.625 0.5326
sd3 -0.50746 0.94874 -0.535 0.5933
sd4 -0.52609 0.94861 -0.555 0.5798
sd5 -0.19475 0.94863 -0.205 0.8375
sd6 -0.23611 0.94857 -0.249 0.8037
sd7 -0.29289 0.94847 -0.309 0.7578
sd8 -0.40289 0.94853 -0.425 0.6715
sd9 -0.24408 0.94870 -0.257 0.7972
sd10 -0.45149 0.94889 -0.476 0.6347
sd11 -0.24915 0.94863 -0.263 0.7931
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.845 on 200 degrees of freedom
Multiple R-Squared: 0.04336, Adjusted R-squared: -0.0236
F-statistic: 0.6476 on 14 and 200 DF, p-value: 0.8226
Estimation results for equation pcp:
====================================
pcp = inf_a.l1 + tmp.l1 + pcp.l1 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11
Estimate Std. Error t value Pr(>|t|)
inf_a.l1 -0.016692 0.021413 -0.779 0.4366
tmp.l1 -0.009309 0.015992 -0.582 0.5612
pcp.l1 0.131454 0.071718 1.833 0.0683 .
const 0.032740 0.044730 0.732 0.4651
sd1 -0.081332 0.219860 -0.370 0.7118
sd2 -0.042249 0.216685 -0.195 0.8456
sd3 -0.090927 0.216748 -0.420 0.6753
sd4 -0.079183 0.216720 -0.365 0.7152
sd5 -0.067949 0.216722 -0.314 0.7542
sd6 -0.101349 0.216711 -0.468 0.6405
sd7 -0.049541 0.216687 -0.229 0.8194
sd8 -0.027756 0.216701 -0.128 0.8982
sd9 -0.014740 0.216738 -0.068 0.9458
sd10 -0.027302 0.216782 -0.126 0.8999
sd11 -0.096465 0.216723 -0.445 0.6567
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.65 on 200 degrees of freedom
Multiple R-Squared: 0.02561, Adjusted R-squared: -0.0426
F-statistic: 0.3755 on 14 and 200 DF, p-value: 0.9804
Covariance matrix of residuals:
inf_a tmp pcp
inf_a 1.48988 0.1709 0.04028
tmp 0.17085 8.0943 -0.26218
pcp 0.04028 -0.2622 0.42247
Correlation matrix of residuals:
inf_a tmp pcp
inf_a 1.00000 0.0492 0.05077
tmp 0.04920 1.0000 -0.14178
pcp 0.05077 -0.1418 1.00000