This homework assignment includes problems from:
This accompanies readings from KJ 1,2 and 3 and HA 1,2,6,7 and 8.
Figure 8.31 (https://otexts.com/fpp2/arima-exercises.html) shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
According to the book “autocorrelation measures the linear relationship between lagged values of a time series” and the autocorrelation coefficients are plotted to show the autocorrelation function (ACF) in a plot called a correlogram. If there is no autocorrelation, this is called white noise and the autocorrelation should be close to 0. To qualify as white noise, 95% of the spikes in the ACF need to be within +/- 2/ (T^(1/2)) where T is the length of the time series. The bounds are plotted as the blue dashed lines on the correlogram. The difference among the figures is the length of the time series - that is why the blue dashed lines are at different values. For the longer time series, the bounds are the smallest and for the shortest time series the bounds are the largest. Because more than 95% of the spikes lie within the bounds for each of the figures - they all indicate white noise. Series X2 does appear to have potentially 2 spikes just outside of the bounds but we still qualified it as white noise because some of the other spikes are almost too small to see so we estimated that 95% of the spikes did still lie within the bounds.
The critical values, or the dashed blue lines, are at different distances from the mean of zero because the qualification for white noise defines the critical values as +/- 2/ (T^(1/2)), where T is the length of the time series. The longer the time series the narrower the bounds will be. The autocorrelations are different in each figure even though they all refer to white noise because there is random variation in the series (as each series is a series of random numbers) so the autocorrelations between the random numbers will also be random.
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(fpp2)
## Warning: package 'fpp2' was built under R version 4.2.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2 3.4.1 ✔ fma 2.5
## ✔ forecast 8.21 ✔ expsmooth 2.3
## Warning: package 'forecast' was built under R version 4.2.3
## Warning: package 'fma' was built under R version 4.2.3
## Warning: package 'expsmooth' was built under R version 4.2.3
##
data(ibmclose)
autoplot(ibmclose) + ggtitle("Daily closing IBM stock price series")
ggAcf(ibmclose)
ggPacf(ibmclose)
According to the book “a stationary time series is one whose properties do not depend on the time at which the series is observed” so therefor if there are trends or seasonality the time series is not stationary. Additionally it says for non-stationary data the ACF decreases slowly towards 0. The correlelogram with equally spaced peaks and slowly descending peaks indicates there is seasonality to the data so it is not stationary. A partial autocorrelation measures the relationship between yt and y(t-k) after removing the effects of the lags. Because of this the first PACF is the same as the first ACF because there is nothing to subtract.The peak of the PACF at 1 indicates there is a high autocorrelation between lag = 1.
Use R to simulate and plot some data from simple ARIMA models.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
Ann AR(p) model is an autoregressive model with an order of p, in our case the order is 1. The order of 1 indicates how many lagged values of y are included - so in this case it is 1 lagged y value. The code above successfully generated the data from an AR(1) model with the given parameters.
The book says in autoregressive models to stationary data for an AR(1) model -1 < ϕ1 < 1
o1 <- c(-1, -0.5, 0, 0.5, 0.6, 1)
y <- ts(numeric(100))
e <- rnorm(100)
for (o in 1:length(o1)){
for(i in 2:100){
y[i] <- o1[o]*y[i-1] + e[i]
}
p <- autoplot(y)+ ggtitle(paste0("o1 = ", o1[o]))
print(p)
}
According to the book, when you have an AR(1) and ϕ1 = 0, yt is equivalent to white noise. When ϕ1 is 1 and c = 0 you have random walk. When ϕ1 < 0, yt oscillates near the mean. These properties are observed in the plots above.
See below.
theta1 <- c(-1, -0.5, 0, 0.5, 0.6, 1)
y <- ts(numeric(100))
e <- rnorm(100)
for (t in 1:length(theta1)){
for(i in 2:100){
y[i] <- theta1[t]*e[i-1] + e[i]
}
p <- autoplot(y)+ ggtitle(paste0("theta1 = ", theta1[t]))
print(p)
}
When the absolute value of theta is 1 the weights are a constant size and the older observations have equal says on the more recent observations. It is required that the absolute value is less than one so more recent observations have higher weights than past observations.
ARMA stands for autoregressive moving average. So we will combine the autoregressive and moving average functions to get y.
arma <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
arma[i] <- 0.6*arma[i-1] + 0.6*e[i-1] +e[i]
ar_2 <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
ar_2[i] <- -0.8*ar_2[i-1] + 0.3*ar_2[i-2] + e[i]
autoplot(arma) + ggtitle("ARMA(1,1)")
autoplot(ar_2) + ggtitle("AR(2)")
The AR(2) model is non-stationary and has increasing oscillations. Compared to the AR(1,1) model the AR(2) model also has larger values. The AR(1,1) appears stationary.
Consider austa, the total international visitors to Australia (in millions) for the period 1980-2015.
First, we will plot the international visitors to Australia to get an idea for the data.
data(austa)
autoplot(austa) +
ggtitle("International visitors to Australia") +
ylab("Total international visitors to Australia (in millions)")
Now, we can use auto.arima to select a model automatically.
fit <- auto.arima(austa)
summary(fit)
## Series: austa
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## 0.3006 0.1735
## s.e. 0.1647 0.0390
##
## sigma^2 = 0.03376: log likelihood = 10.62
## AIC=-15.24 AICc=-14.46 BIC=-10.57
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0008313383 0.1759116 0.1520309 -1.069983 5.513269 0.7461559
## ACF1
## Training set -0.000571993
The model selected was ARIMA(0,1,1) with drift.
We need to check the residuals from the chosen model by plotting the ACF of the residuals and making sure they look like white noise.
residuals(fit) |> ggAcf()
The residuals are all within the bounds and therefor classify as white noise.
The next 10 time periods can be forecasted.
fit %>% forecast(h=10) %>% autoplot()
First we can create forecasts from the ARIMA(0,1,1) model with no drift.
fit2 <- Arima(austa, order= c(0,1,1))
fit2 %>% forecast(h=10) %>% autoplot()
Compared to part A rather than having an increasing trend there is now a flat line projection.
Next we can remove the MA term and replot.
fit3 <- Arima(austa, order= c(0,1,0))
fit3 %>% forecast(h=10) %>% autoplot()
When there is no drift with and without the moving average term the plot is the same.
fit4 <- Arima(austa, order= c(2,1,3), include.constant = TRUE)
fit4 %>% forecast(h=10) %>% autoplot()
Now removing the constant.
# fit5 <- Arima(austa, order= c(2,1,3), include.constant =FALSE)
# fit5 %>% forecast(h=10) %>% autoplot()
When trying to remove the constant an error occurs.
fit6 <- Arima(austa, order= c(0,0,1), include.constant =TRUE)
fit6 %>% forecast(h=10) %>% autoplot()
Now we can remove the MA term and plot again.
fit7 <- Arima(austa, order= c(0,0,0), include.constant =TRUE)
fit7 %>% forecast(h=10) %>% autoplot()
fit8 <- Arima(austa, order= c(0,2,1), include.constant =FALSE)
fit8 %>% forecast(h=10) %>% autoplot()