The above plots show the autocorrelation function for three different time series at various lags. The plots suggest that the data are white noise, as there are no significant autocorrelation spikes at any lag (all bars fall within the two blue significance bounds). We should also check the mean of the residuals (a mean of zero would further support white noise).
B
The critical values are at different distances from zero because the time series have different numbers of observations: for a 95% confidence level, the bounds are at plus or minus 1.96 divided by the square root of the number of observations. The more observations there are, the narrower the bounds become.
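As a quick illustration (the sample sizes below are assumed for the example, not taken from the exercise), the bounds can be computed directly:

# 95% significance bounds for the ACF of white noise: +/- 1.96/sqrt(T)
T_obs <- c(36, 360, 1000)   # assumed numbers of observations
1.96 / sqrt(T_obs)          # the bounds shrink as T grows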
The autocorrelations are different in each figure because they are computed from three different time series; each random series produces its own pattern of sample autocorrelations across lags.
9.2
Below is a plot of Amazon’s closing price and the ACF and PACF plots:
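A minimal sketch of code that could produce these plots, assuming the series is Amazon's closing price from fpp3's gafa_stock (the re-indexing to trading days is my workaround for the irregular daily index):

library(fpp3)

amzn <- gafa_stock |>
  filter(Symbol == 'AMZN') |>
  mutate(day = row_number()) |>              # re-index by trading day
  update_tsibble(index = day, regular = TRUE) |>
  select(day, Close)

amzn |> gg_tsdisplay(plot_type = 'partial')  # time plot, ACF, and PACF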
Plot variable not specified, automatically selected `y = Close`
The plot of the closing price shows us that there is a positive trend in Amazon's closing stock price. The presence of a trend means the time series is not stationary.
The ACF plot shows us that the correlation is consistently large at all lags, which also means the time series is not stationary (for a stationary series, the ACF drops to zero relatively quickly).
The KPSS p-value of 0.1 indicates that we cannot reject the null hypothesis of stationarity: the data appear stationary after one order of differencing on the seasonally differenced, Box-Cox transformed sales data.
9.5
For this question, I will select 'Liquor retailing' in 'New South Wales'.
aus_liquor_NSW <- aus_retail |>
  filter(Industry == 'Liquor retailing', State == 'New South Wales')

aus_liquor_NSW |>
  autoplot(Turnover)
There is a positive trend, seasonality, and unstable variance. I will start with a Box-Cox transformation.
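A sketch of the transformation step, selecting lambda with the guerrero feature and displaying the seasonally differenced series (the lag of 12 reflects monthly data; the object names are mine):

lambda <- aus_liquor_NSW |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

aus_liquor_NSW |>
  gg_tsdisplay(difference(box_cox(Turnover, lambda), 12), plot_type = 'partial')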
Warning: Removed 12 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 12 rows containing missing values or values outside the scale range
(`geom_point()`).
The ACF plot suggests the data are still not stationary, so I will take one further order of differencing on the seasonally differenced data.
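A sketch of the full differencing and the KPSS unit-root test, using the lambda from above:

aus_liquor_NSW |>
  mutate(diff_turnover = difference(difference(box_cox(Turnover, lambda), 12))) |>
  features(diff_turnover, unitroot_kpss)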
# A tibble: 1 × 4
  State           Industry         kpss_stat kpss_pvalue
  <chr>           <chr>                <dbl>       <dbl>
1 New South Wales Liquor retailing    0.0152         0.1
The p-value of 0.1 (the largest value the KPSS test reports) indicates we fail to reject stationarity: the data appear stationary after one order of differencing on the seasonally differenced, Box-Cox transformed turnover data.
9.6
A
I use the code provided to generate the data, and I also create two alternative versions of y for later use:
y <- numeric(100)
y2 <- y
y3 <- y
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
B
I produce a time plot of the series:
sim |> autoplot(y) + labs(title = 'phi 0.6')
Next, I create two alternative data sets with different phi values: 0.3 and 0.9:
for(i in 2:100)
  y2[i] <- 0.3*y2[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y2, index = idx)
sim2 |> autoplot(y) + labs(title = 'phi 0.3')
for(i in 2:100)
  y3[i] <- 0.9*y3[i-1] + e[i]
sim3 <- tsibble(idx = seq_len(100), y = y3, index = idx)
sim3 |> autoplot(y) + labs(title = 'phi 0.9')
The closer phi is to 1, the closer the series comes to a random walk. A higher phi puts more weight on the previous value, which produces a smoother series that wanders further from zero than the series with lower phi values.
C, D
I leverage the code above to generate data from an MA(1) model with theta equal to 0.6 and sigma squared = 1:
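A sketch of the MA(1) simulation, mirroring the AR(1) code above (the object names are mine):

y_ma <- numeric(100)
e <- rnorm(100)                  # sigma^2 = 1
for(i in 2:100)
  y_ma[i] <- e[i] + 0.6*e[i-1]   # MA(1) with theta = 0.6
sim_ma <- tsibble(idx = seq_len(100), y = y_ma, index = idx)
sim_ma |> autoplot(y) + labs(title = 'theta 0.6')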
The residuals look like white noise: there are no significant autocorrelations, they are centered around zero (though there are some outliers), and there is no discernible pattern. Below are the model details:
The model is an ARIMA(0,2,1) with theta = -0.8963. The equation in terms of the backshift operator is below:
\[
(1 - B)^2 X_t = (1 - 0.8963 B) \epsilon_t
\]
C, D
We are asked to plot forecasts from an ARIMA(0,1,0) model with drift and an ARIMA(2,1,2) model with drift, and to compare these to the automatically selected model from part A.
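A sketch of fitting and forecasting the three models, assuming the series is aus_airpassengers from fpp3 (the data set is not named in this excerpt):

fit <- aus_airpassengers |>
  model(
    auto = ARIMA(Passengers),
    arima010 = ARIMA(Passengers ~ 1 + pdq(0, 1, 0)),  # ~ 1 includes the constant (drift)
    arima212 = ARIMA(Passengers ~ 1 + pdq(2, 1, 2))
  )

fit |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers)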
The ARIMA(0,1,0) with drift and ARIMA(2,1,2) with drift models generate more moderate forecasts than the automatically selected ARIMA(0,2,1) model with no drift.
Warning: 1 error encountered for ARIMA212
[1] non-stationary AR part from CSS
Warning: Removed 10 rows containing missing values or values outside the scale range
(`geom_line()`).
The ARIMA(2,1,2) model fails to generate forecasts; as the warning above indicates, its AR component is non-stationary under the default CSS estimation.
E
We are tasked with plotting forecasts from an ARIMA(0,2,1) model with a constant. It should be noted that including a constant when the order of differencing is greater than 1 is discouraged, as it induces a quadratic or higher-order polynomial trend. I also include the automatically selected ARIMA model for comparison purposes.
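A sketch, again assuming aus_airpassengers:

fit_const <- aus_airpassengers |>
  model(
    arima021_const = ARIMA(Passengers ~ 1 + pdq(0, 2, 1)),  # constant with d = 2
    auto = ARIMA(Passengers)
  )

fit_const |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers)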
Warning: Model specification induces a quadratic or higher order polynomial trend.
This is generally discouraged, consider removing the constant or reducing the number of differences.
The forecast from the model with a constant increases at a faster rate than that of the automatically selected ARIMA(0,2,1).
The variance seems more stable, so I will continue with the Box-Cox transformed data. The lambda value is:
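A sketch of how usGDP and lambdaUSGDP could be constructed, assuming the series is the United States entry in fpp3's global_economy (the BoxCoxGDP column name is taken from the code further below):

usGDP <- global_economy |>
  filter(Country == 'United States')

lambdaUSGDP <- usGDP |>
  features(GDP, features = guerrero) |>   # guerrero method for lambda
  pull(lambda_guerrero)

usGDP <- usGDP |>
  mutate(BoxCoxGDP = box_cox(GDP, lambdaUSGDP))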
lambdaUSGDP
[1] 0.2819443
B, C
We are tasked with fitting an automatically selected ARIMA model as well as other plausible models. I will start by first identifying a model manually.
The time series needs to be differenced to achieve stationarity. Below is what one order of differencing looks like.
usGDP |> autoplot(difference(BoxCoxGDP))
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
The last significant partial autocorrelation is at lag 1, which suggests we can use an AR term of 1. Because we used one order of differencing, the manually selected model would be an ARIMA(1,1,0).
Next, I fit the manual model and an automatically selected model to the data:
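A sketch of the fitting step, with model names matching the mable below:

usGDP_fit <- usGDP |>
  model(
    arima110 = ARIMA(BoxCoxGDP ~ pdq(1, 1, 0)),  # manually selected model
    auto = ARIMA(BoxCoxGDP)                      # automatic selection
  )
usGDP_fit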
# A mable: 1 x 2
        arima110                    auto
         <model>                 <model>
1 <ARIMA(1,1,0)> <ARIMA(1,1,0) w/ drift>
Interestingly, the automatically selected ARIMA model has the same orders as the manually selected model, though it also includes a constant (drift). When the constant is non-zero and d equals one, the long-term forecasts will follow a straight line.
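A minimal sketch of the AICc comparison, assuming the usGDP_fit mable from above:

usGDP_fit |>
  glance() |>                 # one row of fit statistics per model
  select(.model, AICc) |>
  arrange(AICc)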
In the above, we see that the AICc for the automatically selected model is lower than that of the manual model, suggesting it is the better model.
I check the residual diagnostics of both models below:
usGDP_fit |> select(arima110) |> gg_tsresiduals()
usGDP_fit |> select(auto) |> gg_tsresiduals()
Residuals for both models appear to be white noise. The manually selected model has one significant autocorrelation spike; however, roughly one spike in twenty lags is expected at the 95% level, so this is likely fine. We can confirm with a Ljung-Box test using lag = 10.
usGDP_fit |> select(arima110) |> augment() |> features(.innov, ljung_box, lag = 10)
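A sketch of generating the forecasts discussed next, using the automatically selected model (the 10-year horizon is my assumption):

usGDP_fit |>
  select(auto) |>
  forecast(h = 10) |>
  autoplot(usGDP)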
The forecasts seem reasonable given the observed values of the time series: the forecast continues the upward trend in GDP.
F
We are tasked with comparing the ARIMA model to an ETS model on the non-transformed data. I leverage the textbook code for time series cross-validation to compare the models:
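A sketch of the comparison, following the textbook's stretch_tsibble pattern (the initial window size is my assumption):

usGDP |>
  stretch_tsibble(.init = 10, .step = 1) |>  # expanding training windows
  model(
    ETS = ETS(GDP),
    ARIMA = ARIMA(GDP)
  ) |>
  forecast(h = 1) |>                         # one-step-ahead forecasts
  accuracy(usGDP) |>
  select(.model, RMSE, MAE, MAPE)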