I acknowledge I collaborated with Christian Webb.
A log transformation of the river flow data appears Gaussian:
elbe <- read.csv("~/Downloads/elbe.csv", sep = ";")
qqnorm(log(elbe$flow), main = "log(elbe$flow) Normal Q-Q Plot")
A timeseries is created from the transformed data:
elbe.ts = ts(log(elbe$flow), start = 1980, frequency = 12)
The decomposition is plotted below.
elbe.decomp = decompose(elbe.ts)
plot(elbe.decomp)
Qualitatively, the randomness data does appear somewhat like whitenoise.
To get a quick feel for the randomness data, it's normal Q-Q plot was calculated:
qqnorm(elbe.decomp$random)
qqline(elbe.decomp$random)
This data appears to come from a Gaussian distribution. Now, importantly, inspecting the autocovariance:
acf(ts(elbe.decomp$random), na.action = na.pass)
pacf(ts(elbe.decomp$random), na.action = na.pass)
This data is not convicingly white noise. Some ACF values at lag \( h>0 \) fall outside \( \pm 1.96/\sqrt{n} \) in a damped sinusoidal fashion, and there are PACF values significantly different from zero up to a lag of 8.
To confirm this hypothesis, we can compare the ACF/PACF to that of a true white noise process:
norm.ts = ts(rnorm(300))
acf(norm.ts)
pacf(norm.ts)
As expected, for \( h>0 \), this ACF does fall within \( \pm 1.96/\sqrt{n} \) implying the autocovariance is not significantly different from zero. The log of the river data cannot be considered white noise.
A simple loop was written to find the best \( \text{ARMA}(p,q) \) model for \( p+q\le 4 \), deciding based on the AIC. (Note: the question asks for the maximum AIC, but my understanding is it should be minimised.)
min_aic = Inf
best_pdq = c(0, 0)
for (p in 0:4) {
for (q in 0:(4 - p)) {
if (p > 0 || q > 0) {
elbe.arma = arima(elbe.decomp$random, order = c(p, 0, q))
print(c(p, q, elbe.arma$aic))
if (elbe.arma$aic < min_aic) {
min_aic = elbe.arma$aic
best_pdq = c(p, 0, q)
}
}
}
}
## [1] 0.0 1.0 304.3
## [1] 0.0 2.0 297.5
## [1] 0.0 3.0 257.7
## [1] 0.0 4.0 247.1
## [1] 1.0 0.0 306.9
## [1] 1.0 1.0 301.8
## [1] 1.0 2.0 247.2
## [1] 1.0 3.0 248.3
## [1] 2 0 299
## [1] 2.0 1.0 242.1
## [1] 2.0 2.0 238.7
## [1] 3 0 300
## [1] 3 1 243
## [1] 4.0 0.0 290.1
print(best_pdq)
## [1] 2 0 2
The ACF and PACF for the best model fit, \( \text{ARMA}(2, 2) \), were plotted:
elbe.arma = arima(elbe.decomp$random, order = best_pdq)
acf(elbe.arma$residuals, na.action = na.pass)
pacf(elbe.arma$residuals, na.action = na.pass)
The ACF and PACF appear within bounds and show no significant pattern.
A 12-month \( \text{ARIMA}(0,0,0)\times(0,1,0) \) process was fitted to the data:
elbe.arima <- arima(elbe.ts, order = c(0, 0, 0), seasonal = c(0, 1, 0))
acf(elbe.arima$residuals)
pacf(elbe.arima$residuals)
This time series satisfies \( (1-B^{12})X_t=Z_t \). As this is just the lag 12 difference series, there will be a strong negative correlation between pairs of data exactly one season apart.
The sample ACF is significantly non-zero for interger lags 0 and 1. This might imply a \( \text{AR}(1) \) model. The sample PACF is also significantly non-zero for integer lags 0 and 1. This might imply \( \text{MA}(1) \) model. Using this as the starting point, and examing the AIC for the fits, a \( \text{ARIMA}(0,0,0)\times(0,1,1) \) model is chosen. It yields the following ACF and PACF:
elbe.arima <- arima(elbe.ts, order = c(0, 0, 0), seasonal = c(0, 1, 1))
acf(elbe.arima$residuals)
pacf(elbe.arima$residuals)
This pulls the between-season ACF and PACF values closer to zero. Clearly there is still correlation between adjacent months to be explained.
The ACF in part 3b might suggest a AR process with \( p\in 2,\cdots,5 \). The PACF might suggest a MA process with \( q=2 \). A \( \text{ARIMA}(2,0,2)\times(0,1,1) \) model was chosen, yielding the following ACF and PACF:
elbe.arima <- arima(elbe.ts, order = c(2, 0, 2), seasonal = c(0, 1, 1))
acf(elbe.arima$residuals)
pacf(elbe.arima$residuals)
The ACF and PACF appear very similar to that seen of white noise in the second half of question 2b.
The coefficients can be found by printing the object:
elbe.arima
##
## Call:
## arima(x = elbe.ts, order = c(2, 0, 2), seasonal = c(0, 1, 1))
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## 0.489 0.393 -0.083 -0.450 -0.935
## s.e. 0.293 0.247 0.277 0.142 0.064
##
## sigma^2 estimated as 0.223: log likelihood = -205, aic = 421.9
The model equation will be of the form:
\[ Y_t =(1-B^{12}) X_t \\ (1-\phi_1 B-\phi_2 B^2) Y_t = (1+\theta_1 B+\theta_2 B^2)(1+\Theta_1 B^{12})Z_t \]
From our fit, we find \( \phi_1=0.4887, \phi_2=0.3933, \theta_1=-0.0827, \theta_2=-0.4502, \Theta_1=-0.9351 \). Hence, the model equation is
\[ (1-0.4887B-0.3933B^2)(1-B^{12})X_t = (1-0.0827B-0.4502B^2)(1-0.9351B^{12})Z_t \]
where \( Z_t \sim WN(0, 0.2234) \).
As required, we fit a \( \text{ARMA}(3,0) \) model to the residuals of the seasonal ARIMA process.
elbe.arima <- arima(elbe.ts, order = c(0, 0, 0), seasonal = c(0, 1, 1))
elbe.arima.res.arma <- arima(elbe.arima$residuals, order = c(3, 0, 0))
elbe.arima.res.arma
##
## Call:
## arima(x = elbe.arima$residuals, order = c(3, 0, 0))
##
## Coefficients:
## ar1 ar2 ar3 intercept
## 0.412 -0.028 0.205 -0.097
## s.e. 0.057 0.061 0.057 0.067
##
## sigma^2 estimated as 0.228: log likelihood = -204.2, aic = 418.4
I was somewhat confused by the second half of this question, but my interpretation was to calculate Yule-Walker estimates for the time series of seasonal ARIMA residuals:
elbe.arima.res.yule <- ar(elbe.arima$residuals, method = "yule-walker")
elbe.arima.res.yule
##
## Call:
## ar(x = elbe.arima$residuals, method = "yule-walker")
##
## Coefficients:
## 1 2 3
## 0.410 -0.034 0.200
##
## Order selected 3 sigma^2 estimated as 0.233
A 95% confidence intermal of the \( \text{ARMA}(3,0) \) coefficients can be calculated:
# AR1
c(elbe.arima.res.arma$coef[1] - 1.96 * sqrt(elbe.arima.res.arma$var.coef[1,
1]), elbe.arima.res.arma$coef[1] + 1.96 * sqrt(elbe.arima.res.arma$var.coef[1,
1]))
## ar1 ar1
## 0.3013 0.5230
# AR2
c(elbe.arima.res.arma$coef[2] - 1.96 * sqrt(elbe.arima.res.arma$var.coef[2,
2]), elbe.arima.res.arma$coef[2] + 1.96 * sqrt(elbe.arima.res.arma$var.coef[2,
2]))
## ar2 ar2
## -0.14851 0.09225
# AR3
c(elbe.arima.res.arma$coef[3] - 1.96 * sqrt(elbe.arima.res.arma$var.coef[3,
3]), elbe.arima.res.arma$coef[3] + 1.96 * sqrt(elbe.arima.res.arma$var.coef[3,
3]))
## ar3 ar3
## 0.09352 0.31627
The Yule-Walker estimates fall within this interval (and likely in a interval of much higher confidence). Both models are not significantly different.