Question 9.1

Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

Part a

Explain the differences among these figures. Do they all indicate that the data are white noise?

All three ACF plots in Figure 9.32 indicate that the data are white noise. None of them shows any pattern in the autocorrelations, and the spikes stay within (or very close to) the critical bounds, which is a strong indication of white noise.

The main difference between the figures is the width of the critical bounds and the typical size of the spikes. The plot on the left has the widest bounds, while the plot on the right has the narrowest. This is due to the sample size of each series: the left plot is based on only 36 numbers, while the plot on the right is based on 1,000.

Part b

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

In part a we mentioned that the sample size differs among the three ACF plots, which creates a different range in the autocorrelation plots. It also creates different critical values, because the critical values are calculated as \(\pm \frac{1.96}{\sqrt{T}}\), where \(T\) is the sample size: the larger the sample, the narrower the bounds. Additionally, the autocorrelations do not match across the plots because the data were generated at random, and random data produce random (and therefore different) sample autocorrelations.

Question 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The three plots above show why the Amazon daily closing price is non-stationary and needs to be differenced. First, the plot of the closing price shows an upward trend over the years until about 2019, when it starts declining. The trend alone is enough to confirm that the data are non-stationary. The ACF plot reinforces this: the autocorrelations are large and decay to zero very slowly, which is indicative of non-stationary data. Finally, the PACF shows a very strong spike at lag 1, meaning each closing price is highly correlated with the closing price of the previous trading day.
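A sketch of code that could produce these three plots (assuming the fpp3 package; re-indexing by trading day is my assumption, since gafa_stock has an irregular daily index):

library(fpp3)

# Daily closing prices for Amazon, re-indexed by trading day so the
# tsibble has a regular interval
amzn <- gafa_stock |>
  filter(Symbol == "AMZN") |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE)

# Time plot, ACF and PACF in one display
amzn |> gg_tsdisplay(Close, plot_type = "partial")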

Question 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Part a

Turkish GDP from global_economy.

Using a \(\lambda\) of 0.16, we can stabilise the variance of the data. To make it stationary we have to difference the data once. Because GDP is annual, non-seasonal data, we do not have to worry about seasonal differencing.

Looking at the plot above, we see that the Box-Cox transformation followed by a first-order difference produces stationary data: the ACF and PACF plots show autocorrelations consistent with white noise.
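A sketch of how these values could be obtained (a Guerrero-selected \(\lambda\) and the unitroot_ndiffs() feature; the object names are mine):

library(fpp3)

turkey <- global_economy |> filter(Country == "Turkey")

# Box-Cox lambda chosen by the Guerrero method (about 0.16)
lambda_tr <- turkey |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

# Number of first differences required for stationarity
turkey |> features(box_cox(GDP, lambda_tr), unitroot_ndiffs)

# Transformed and differenced series with its ACF and PACF
turkey |> gg_tsdisplay(difference(box_cox(GDP, lambda_tr)), plot_type = "partial")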

Part b

Accommodation takings in the state of Tasmania from aus_accommodation.

Looking at the graph above, we see that a \(\lambda\) of -0.05 stabilises the variance of the accommodation takings for Tasmania. This data has very strong seasonality, with \(F_s =\) 0.98, which is much higher than the cutoff of \(0.64\) that Hyndman suggests. Therefore, we should apply a seasonal difference. Once applied, we check with unitroot_ndiffs() whether a first difference is also needed; with a value of 0, we do not need to apply one.

As we can see from the plot above, the data look much more stationary, with the ACF dropping to zero much more quickly than for either the original or the transformed data. The PACF plot likewise shows that, beyond the first few lags, each observation is no longer strongly correlated with the ones before it.
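A sketch of the corresponding checks (seasonal strength from feat_stl(), then the differencing tests; names are mine, with fpp3 loaded as above):

tas <- aus_accommodation |> filter(State == "Tasmania")

# Box-Cox lambda (about -0.05)
lambda_tas <- tas |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)

# Seasonal strength of the transformed series (reported above as 0.98)
tas |> features(box_cox(Takings, lambda_tas), feat_stl)

# After a seasonal difference (lag 4 for quarterly data), check whether a
# first difference is still needed
tas |> features(difference(box_cox(Takings, lambda_tas), 4), unitroot_ndiffs)

tas |> gg_tsdisplay(difference(box_cox(Takings, lambda_tas), 4), plot_type = "partial")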

Part c

Monthly sales from souvenirs.

With a \(\lambda\) of 0, the data have been transformed using a natural log. Because the seasonal strength is \(F_s = 0.84\), which is greater than \(0.64\), we need to apply a seasonal difference. Additionally, the unitroot_ndiffs() test suggests we should also apply a first difference of order 1.

Transforming the data and applying a seasonal difference and a first difference has transformed this data into a stationary series.
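A sketch of those steps (with fpp3 loaded; lag 12 for monthly data):

# Lambda is essentially 0, so a log transformation is used
souvenirs |> features(Sales, features = guerrero)

# Seasonal strength and suggested number of seasonal differences
souvenirs |> features(log(Sales), feat_stl)
souvenirs |> features(log(Sales), unitroot_nsdiffs)

# After the seasonal difference, one first difference is still suggested
souvenirs |> features(difference(log(Sales), 12), unitroot_ndiffs)

souvenirs |>
  gg_tsdisplay(difference(log(Sales), 12) |> difference(), plot_type = "partial")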

Question 9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

The data appear to be somewhat seasonal, with a definite upward trend; therefore the series is non-stationary.

Using a \(\lambda\) of 0.27, we can stabilise the variation in the seasonality. Because the seasonal strength is \(F_s = 0.74\), which is greater than \(0.64\), one seasonal difference is suggested. Once we apply the seasonal difference, we check whether a first difference is also required. Since the unitroot_ndiffs() test returns a value of 0, we do not need to perform a first difference; the seasonal difference alone is enough to make this data stationary.

Applying a transformation with a \(\lambda\) of 0.27 and seasonal differencing was enough to make this data stationary as we can see in the plots above.
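A sketch of the retail workflow (the seed and the randomly selected series follow Exercise 8 of Section 2.10; the actual series used above may differ):

set.seed(12345678)  # assumption: whichever seed was used in the earlier exercise
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

lambda_rt <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

# Seasonal strength, then the suggested seasonal and first differences
myseries |> features(box_cox(Turnover, lambda_rt), feat_stl)
myseries |> features(box_cox(Turnover, lambda_rt), unitroot_nsdiffs)
myseries |> features(difference(box_cox(Turnover, lambda_rt), 12), unitroot_ndiffs)

myseries |>
  gg_tsdisplay(difference(box_cox(Turnover, lambda_rt), 12), plot_type = "partial")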

Question 9.6

Simulate and plot some data from simple ARIMA models.

Part a

Use the following R code to generate data from an AR(1) model with \(\phi = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100){
  y[i] <- 0.6*y[i-1] + e[i]
}
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Part b

Produce a time plot for the series. How does the plot change as you change \(\phi_1\) ?

As we can see from the plots above, changing \(\phi_1\) in an AR(1) model changes the behaviour of \(y\). When \(\phi_1 = 0\) the series is simply white noise, as seen in the charts above. As \(\phi_1\) moves away from 0 in either direction, the series becomes more persistent, with larger and longer swings, and its ACF dies out more slowly; at \(\phi_1 = 1\) the model becomes a non-stationary random walk. A negative value such as \(\phi_1 = -0.5\) makes the series oscillate around its mean, because consecutive values are negatively correlated. Both the \(\phi_1 = 0.6\) and \(\phi_1 = -0.5\) series are still stationary AR(1) processes, but their ACF and PACF plots clearly differ from those of white noise.
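A sketch of how the comparison could be generated, wrapping the simulation above in a helper function (ar1_model() is my name, not from the original):

ar1_model <- function(phi) {
  y <- numeric(100)
  e <- rnorm(100)               # errors with sigma^2 = 1
  for (i in 2:100) {
    y[i] <- phi * y[i - 1] + e[i]
  }
  tsibble(idx = seq_len(100), y = y, index = idx)
}

ar1_model(0)    |> autoplot(y)  # white noise
ar1_model(0.6)  |> autoplot(y)  # smoother, more persistent swings
ar1_model(-0.5) |> autoplot(y)  # oscillates around its mean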

Part c & d

Write your own code to generate data from an MA(1) model with \(\theta = 0.6\) and \(\sigma^2 = 1\).

ma_model <- function(theta){
  y <- numeric(100)
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- theta*e[i-1] + e[i]
  }
  return(tsibble(idx=seq_len(100),y=y,index=idx))
}

Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
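An MA(1) process is stationary for any \(\theta_1\); changing \(\theta_1\) mainly changes the lag-one autocorrelation \(\theta_1/(1+\theta_1^2)\), so larger positive values make neighbouring points move together more, while negative values make the series more jagged. A sketch using the ma_model() function above:

ma_model(0.1) |> autoplot(y)   # close to white noise
ma_model(0.6) |> autoplot(y)   # mild positive correlation at lag 1
ma_model(0.9) |> autoplot(y)   # stronger lag-1 correlation, smoother look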

Part e & f & g

Generate data from an ARMA(1,1) model with \(\phi_1=0.6,\space\theta_1=0.6\) and \(\sigma^2=1\).

arma_model <- function(phi,theta){
  y <- numeric(100)
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]
  }
  return(tsibble(idx=seq_len(100),y=y,index=idx))
}
First rows of the simulated ARMA(1,1) series:

| idx | y          |
|-----|------------|
| 1   | 0.0000000  |
| 2   | 0.8005605  |
| 3   | -0.8432629 |
| 4   | -0.7359091 |
| 5   | 0.0411989  |
| 6   | -0.1493809 |
Generate data from an AR(2) model with \(\phi_1 = -0.8\) and \(\phi_2=0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

ar2_model <- function(phi1,phi2){
  y <- numeric(100)
  e <- rnorm(100)
  for(i in 3:100){
    y[i] <- phi1*y[i-1] + phi2*y[i-2] + e[i]
  }
  return(tsibble(idx=seq_len(100),y=y,index=idx))
}
First rows of the simulated AR(2) series:

| idx | y          |
|-----|------------|
| 1   | 0.0000000  |
| 2   | 0.0000000  |
| 3   | -0.2089912 |
| 4   | -1.0642098 |
| 5   | 0.4280669  |
| 6   | -1.2089298 |

Graph the latter two series and compare them.

Since \(\phi_1\) in the AR(2) model is negative and fairly close to -1, the series should oscillate rapidly between positive and negative values, and it does. Since \(\phi_1\) in the ARMA(1,1) model is positive (0.6), its plot does not show this rapid oscillation. The ARMA(1,1) model is both stationary and invertible, given that \(|\phi_1| < 1\) and \(|\theta_1| < 1\). The AR(2) model, however, is not stationary, because its coefficients do not satisfy all of the stationarity constraints: it abides by \(|\phi_2| < 1\) and \(\phi_1 + \phi_2 < 1\), but it does not satisfy \(\phi_2 - \phi_1 < 1\), since \(0.3 - (-0.8) = 1.1 > 1\). As a result, the AR(2) series oscillates with ever-increasing amplitude.
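A sketch of the two plots, reusing the arma_model() and ar2_model() functions defined above:

arma_model(0.6, 0.6) |> autoplot(y) + labs(title = "ARMA(1,1): phi1 = 0.6, theta1 = 0.6")
ar2_model(-0.8, 0.3) |> autoplot(y) + labs(title = "AR(2): phi1 = -0.8, phi2 = 0.3")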

Question 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Part a

Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

The residuals look like white noise. They follow a fairly normal distribution and none of the ACF values exceed the critical bounds. Furthermore, the residual plot does not show any trend or seasonality.

The selected ARIMA(0,2,1) model forecasts a continued upward trend, reaching about 95 million passengers by the year 2025.
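A sketch of how the model, residual checks, and forecasts could be produced (object names are mine):

library(fpp3)

fit <- aus_airpassengers |> model(auto = ARIMA(Passengers))
report(fit)                       # selected model: ARIMA(0,2,1), as noted above

fit |> gg_tsresiduals()           # residual diagnostics

fit |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers)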

Part b

Write the model in terms of the backshift operator.

\((1-B)^{2} y_{t} = (1 - 0.8963B)\,\epsilon_t\)

Part c

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

The forecasts from the ARIMA(0,1,0) model with drift look very similar to those from the ARIMA(0,2,1) model in part a. The only difference I can see is that the slope of the forecast line is slightly smaller, with this model predicting only around 90 million passengers by the year 2025.
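A sketch of that fit, with the drift term requested through the constant in the model formula:

aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0, 1, 0))) |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers)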

Part d

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

The ARIMA(2,1,2) model does not produce any forecasts; it actually shows up as a NULL model in R. fable returns a NULL model when it cannot estimate the requested specification, which appears to be what happens here: with these orders the AR terms cannot be estimated as a stationary process, so no valid model is returned.

In fable, the constant is included or excluded through the model formula (a 1 or a 0 on the right-hand side of ARIMA()) rather than being added manually. In this case it makes little difference to my results: without a constant the forecast is the same as in part a, and including the constant also produces essentially the same forecast as part a.
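A sketch of the two specifications for comparison (the constant is toggled with 1 or 0 in the formula; the model names are mine):

aus_airpassengers |>
  model(
    with_constant    = ARIMA(Passengers ~ 1 + pdq(2, 1, 2)),
    without_constant = ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
  )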

Question 9.8

For the United States GDP series (from global_economy).

Part a

If necessary, find a suitable Box-Cox transformation for the data.

First we want to account for the population changes so we change the variable of interest to GDP per capita.

It seems like a log transformation might be useful. We will use the Box-Cox transformation to see what lambda we get.

Here we see that the ACF is decaying roughly exponentially and that the PACF has a significant spike at lag 1 and none beyond that. This suggests that an ARIMA(\(p, d, 0\)) model, with \(p = 1\), would be a good fit.
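A sketch of part a (GDP per capita and its Guerrero-selected \(\lambda\); the variable names are mine):

library(fpp3)

us_economy <- global_economy |>
  filter(Code == "USA") |>
  mutate(GDP_per_capita = GDP / Population)

lambda_us <- us_economy |>
  features(GDP_per_capita, features = guerrero) |>
  pull(lambda_guerrero)

# Transformed and differenced series with its ACF and PACF
us_economy |>
  gg_tsdisplay(difference(box_cox(GDP_per_capita, lambda_us)), plot_type = "partial")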

Part b

Fit a suitable ARIMA model to the transformed data using ARIMA().

ARIMA() model selected

| Model name | Orders                  |
|------------|-------------------------|
| automatic  | <ARIMA(1,1,0) w/ drift> |

We can see from the model output that the automatically selected model is an ARIMA(1,1,0) with drift, meaning an AR(1) model applied to the first-differenced data.

Part c

Try some other plausible models by experimenting with the orders chosen.
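A sketch of how the candidate models in the table below could be specified, reusing us_economy and lambda_us from the sketch in part a (the model names mirror the table):

us_fits <- us_economy |>
  model(
    automatic = ARIMA(box_cox(GDP_per_capita, lambda_us)),
    arima010  = ARIMA(box_cox(GDP_per_capita, lambda_us) ~ pdq(0, 1, 0)),
    arima020  = ARIMA(box_cox(GDP_per_capita, lambda_us) ~ pdq(0, 2, 0)),
    arima120  = ARIMA(box_cox(GDP_per_capita, lambda_us) ~ pdq(1, 2, 0)),
    arima210  = ARIMA(box_cox(GDP_per_capita, lambda_us) ~ pdq(2, 1, 0)),
    arima220  = ARIMA(box_cox(GDP_per_capita, lambda_us) ~ pdq(2, 2, 0))
  )

glance(us_fits) |> select(.model, AIC, AICc, BIC)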

Model selection criteria

| .model    | AIC      | AICc     | BIC      |
|-----------|----------|----------|----------|
| automatic | 161.6390 | 162.0919 | 167.7682 |
| arima010  | 173.0634 | 173.2856 | 177.1495 |
| arima020  | 171.7163 | 171.7904 | 173.7416 |
| arima120  | 170.0983 | 170.3247 | 174.1490 |
| arima210  | 163.6215 | 164.3907 | 171.7937 |
| arima220  | 168.6100 | 169.0716 | 174.6861 |

Based on the AIC and AICc criteria, the best model is the automatically selected one, which is the ARIMA(1,1,0) with drift discussed in part b.

Part d

Choose what you think is the best model and check the residual diagnostics.

As discussed in Part c, the ARIMA(1,1,0) model was selected since it had the best AIC and AICc values. Now we will check the residuals of the model and make sure they resemble white noise.

The residuals look like white noise. The histogram has a slight left skew but is generally normal, and the dips we see in the early 1980s and late 2000s correspond to the two recessions that hit the USA in those years, so they are not much of a concern.
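A sketch of the residual diagnostics for the selected model, reusing us_fits from the sketch in part c:

us_fits |> select(automatic) |> gg_tsresiduals()

# Portmanteau test on the innovation residuals
augment(us_fits) |>
  filter(.model == "automatic") |>
  features(.innov, ljung_box, lag = 10)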

Part e

Produce forecasts of your fitted model. Do the forecasts look reasonable?

The forecasts seem reasonable, as they follow the same upward trend at a rate similar to historical levels. The rate of increase also appears to slow slightly as the forecast moves further out. I chose \(h = 15\) because it makes the comparison with the ETS() model in part f easier.

Part f

Compare the results with what you would obtain using ETS() (with no transformation).

Comparing the two models, the ARIMA(1,1,0) model from part e and the ETS(M,A,N) model from part f, I prefer the ARIMA(1,1,0). The main reason is its narrower prediction intervals: there is no reason to expect the GDP per capita trend to decline as sharply over the next five years as the lower bound of the ETS(M,A,N) intervals allows (barring an economic recession).

ARIMA or ETS selection

| .model | RMSE   | MAE    | MPE  | MAPE |
|--------|--------|--------|------|------|
| arima  | 603.79 | 398.68 | 0.08 | 1.59 |
| ets    | 646.88 | 447.99 | 0.31 | 1.71 |

Furthermore, we can see in the table above that all of the error measures for the ARIMA(1,1,0) model are better than those of the ETS(M,A,N) model.
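A sketch of how the two models could be compared (an ETS model on the untransformed series, then in-sample accuracy; this assumes the table above came from a comparison along these lines):

ets_fit <- us_economy |> model(ets = ETS(GDP_per_capita))

ets_fit |> forecast(h = 15) |> autoplot(us_economy)

# In-sample accuracy of the two models
bind_rows(
  us_fits |> select(automatic) |> accuracy(),
  ets_fit |> accuracy()
) |>
  select(.model, RMSE, MAE, MPE, MAPE)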