MATH3841 Assignment 2

Mitchell Ward 3330753

I acknowledge I collaborated with Christian Webb.

Part 1: time Series

Question 1

A log transformation of the river flow data appears Gaussian:

elbe <- read.csv("~/Downloads/elbe.csv", sep = ";")
qqnorm(log(elbe$flow), main = "log(elbe$flow) Normal Q-Q Plot")

plot of chunk unnamed-chunk-1

A timeseries is created from the transformed data:

elbe.ts = ts(log(elbe$flow), start = 1980, frequency = 12)

Question 2a

The decomposition is plotted below.

elbe.decomp = decompose(elbe.ts)
plot(elbe.decomp)

plot of chunk unnamed-chunk-3

Qualitatively, the randomness data does appear somewhat like whitenoise.

Question 2b

To get a quick feel for the randomness data, it's normal Q-Q plot was calculated:

qqnorm(elbe.decomp$random)
qqline(elbe.decomp$random)

plot of chunk unnamed-chunk-4

This data appears to come from a Gaussian distribution. Now, importantly, inspecting the autocovariance:

acf(ts(elbe.decomp$random), na.action = na.pass)

plot of chunk unnamed-chunk-5

pacf(ts(elbe.decomp$random), na.action = na.pass)

plot of chunk unnamed-chunk-5

This data is not convicingly white noise. Some ACF values at lag \( h>0 \) fall outside \( \pm 1.96/\sqrt{n} \) in a damped sinusoidal fashion, and there are PACF values significantly different from zero up to a lag of 8.

To confirm this hypothesis, we can compare the ACF/PACF to that of a true white noise process:

norm.ts = ts(rnorm(300))
acf(norm.ts)

plot of chunk unnamed-chunk-6

pacf(norm.ts)

plot of chunk unnamed-chunk-6

As expected, for \( h>0 \), this ACF does fall within \( \pm 1.96/\sqrt{n} \) implying the autocovariance is not significantly different from zero. The log of the river data cannot be considered white noise.

Question 2c

A simple loop was written to find the best \( \text{ARMA}(p,q) \) model for \( p+q\le 4 \), deciding based on the AIC. (Note: the question asks for the maximum AIC, but my understanding is it should be minimised.)

min_aic = Inf
best_pdq = c(0, 0)
for (p in 0:4) {
    for (q in 0:(4 - p)) {
        if (p > 0 || q > 0) {
            elbe.arma = arima(elbe.decomp$random, order = c(p, 0, q))
            print(c(p, q, elbe.arma$aic))
            if (elbe.arma$aic < min_aic) {
                min_aic = elbe.arma$aic
                best_pdq = c(p, 0, q)
            }
        }
    }
}
## [1]   0.0   1.0 304.3
## [1]   0.0   2.0 297.5
## [1]   0.0   3.0 257.7
## [1]   0.0   4.0 247.1
## [1]   1.0   0.0 306.9
## [1]   1.0   1.0 301.8
## [1]   1.0   2.0 247.2
## [1]   1.0   3.0 248.3
## [1]   2   0 299
## [1]   2.0   1.0 242.1
## [1]   2.0   2.0 238.7
## [1]   3   0 300
## [1]   3   1 243
## [1]   4.0   0.0 290.1

print(best_pdq)
## [1] 2 0 2

The ACF and PACF for the best model fit, \( \text{ARMA}(2, 2) \), were plotted:

elbe.arma = arima(elbe.decomp$random, order = best_pdq)
acf(elbe.arma$residuals, na.action = na.pass)

plot of chunk unnamed-chunk-8

pacf(elbe.arma$residuals, na.action = na.pass)

plot of chunk unnamed-chunk-8

The ACF and PACF appear within bounds and show no significant pattern.

Question 3a

A 12-month \( \text{ARIMA}(0,0,0)\times(0,1,0) \) process was fitted to the data:

elbe.arima <- arima(elbe.ts, order = c(0, 0, 0), seasonal = c(0, 1, 0))
acf(elbe.arima$residuals)

plot of chunk unnamed-chunk-9

pacf(elbe.arima$residuals)

plot of chunk unnamed-chunk-9

This time series satisfies \( (1-B^{12})X_t=Z_t \). As this is just the lag 12 difference series, there will be a strong negative correlation between pairs of data exactly one season apart.

Question 3b

The sample ACF is significantly non-zero for interger lags 0 and 1. This might imply a \( \text{AR}(1) \) model. The sample PACF is also significantly non-zero for integer lags 0 and 1. This might imply \( \text{MA}(1) \) model. Using this as the starting point, and examing the AIC for the fits, a \( \text{ARIMA}(0,0,0)\times(0,1,1) \) model is chosen. It yields the following ACF and PACF:

elbe.arima <- arima(elbe.ts, order = c(0, 0, 0), seasonal = c(0, 1, 1))
acf(elbe.arima$residuals)

plot of chunk unnamed-chunk-10

pacf(elbe.arima$residuals)

plot of chunk unnamed-chunk-10

This pulls the between-season ACF and PACF values closer to zero. Clearly there is still correlation between adjacent months to be explained.

Question 3c

The ACF in part 3b might suggest a AR process with \( p\in 2,\cdots,5 \). The PACF might suggest a MA process with \( q=2 \). A \( \text{ARIMA}(2,0,2)\times(0,1,1) \) model was chosen, yielding the following ACF and PACF:

elbe.arima <- arima(elbe.ts, order = c(2, 0, 2), seasonal = c(0, 1, 1))
acf(elbe.arima$residuals)

plot of chunk unnamed-chunk-11

pacf(elbe.arima$residuals)

plot of chunk unnamed-chunk-11

The ACF and PACF appear very similar to that seen of white noise in the second half of question 2b.

Question 3d

The coefficients can be found by printing the object:

elbe.arima
## 
## Call:
## arima(x = elbe.ts, order = c(2, 0, 2), seasonal = c(0, 1, 1))
## 
## Coefficients:
##         ar1    ar2     ma1     ma2    sma1
##       0.489  0.393  -0.083  -0.450  -0.935
## s.e.  0.293  0.247   0.277   0.142   0.064
## 
## sigma^2 estimated as 0.223:  log likelihood = -205,  aic = 421.9

The model equation will be of the form:

\[ Y_t =(1-B^{12}) X_t \\ (1-\phi_1 B-\phi_2 B^2) Y_t = (1+\theta_1 B+\theta_2 B^2)(1+\Theta_1 B^{12})Z_t \]

From our fit, we find \( \phi_1=0.4887, \phi_2=0.3933, \theta_1=-0.0827, \theta_2=-0.4502, \Theta_1=-0.9351 \). Hence, the model equation is

\[ (1-0.4887B-0.3933B^2)(1-B^{12})X_t = (1-0.0827B-0.4502B^2)(1-0.9351B^{12})Z_t \]

where \( Z_t \sim WN(0, 0.2234) \).

Question 3e

As required, we fit a \( \text{ARMA}(3,0) \) model to the residuals of the seasonal ARIMA process.

elbe.arima <- arima(elbe.ts, order = c(0, 0, 0), seasonal = c(0, 1, 1))
elbe.arima.res.arma <- arima(elbe.arima$residuals, order = c(3, 0, 0))
elbe.arima.res.arma
## 
## Call:
## arima(x = elbe.arima$residuals, order = c(3, 0, 0))
## 
## Coefficients:
##         ar1     ar2    ar3  intercept
##       0.412  -0.028  0.205     -0.097
## s.e.  0.057   0.061  0.057      0.067
## 
## sigma^2 estimated as 0.228:  log likelihood = -204.2,  aic = 418.4

I was somewhat confused by the second half of this question, but my interpretation was to calculate Yule-Walker estimates for the time series of seasonal ARIMA residuals:

elbe.arima.res.yule <- ar(elbe.arima$residuals, method = "yule-walker")
elbe.arima.res.yule
## 
## Call:
## ar(x = elbe.arima$residuals, method = "yule-walker")
## 
## Coefficients:
##      1       2       3  
##  0.410  -0.034   0.200  
## 
## Order selected 3  sigma^2 estimated as  0.233

A 95% confidence intermal of the \( \text{ARMA}(3,0) \) coefficients can be calculated:

# AR1
c(elbe.arima.res.arma$coef[1] - 1.96 * sqrt(elbe.arima.res.arma$var.coef[1, 
    1]), elbe.arima.res.arma$coef[1] + 1.96 * sqrt(elbe.arima.res.arma$var.coef[1, 
    1]))
##    ar1    ar1 
## 0.3013 0.5230

# AR2
c(elbe.arima.res.arma$coef[2] - 1.96 * sqrt(elbe.arima.res.arma$var.coef[2, 
    2]), elbe.arima.res.arma$coef[2] + 1.96 * sqrt(elbe.arima.res.arma$var.coef[2, 
    2]))
##      ar2      ar2 
## -0.14851  0.09225

# AR3
c(elbe.arima.res.arma$coef[3] - 1.96 * sqrt(elbe.arima.res.arma$var.coef[3, 
    3]), elbe.arima.res.arma$coef[3] + 1.96 * sqrt(elbe.arima.res.arma$var.coef[3, 
    3]))
##     ar3     ar3 
## 0.09352 0.31627

The Yule-Walker estimates fall within this interval (and likely in a interval of much higher confidence). Both models are not significantly different.