This set of correlograms shows white noise for different sets of random numbers. The dashed blue lines represent the level at which the autocorrelation (y-axis) becomes significantly different from 0. None of these plots has spikes extending past the blue dashed lines at any lag (x-axis), which means all three series are consistent with white noise. For plot a, we do see a somewhat stronger autocorrelation at a lag of \(k=4\), but it still falls within the significance bounds.
The blue lines are at different distances from zero because the significance bounds depend on the length of the series: the critical values are approximately \(\pm 1.96/\sqrt{T}\), where \(T\) is the number of observations, so longer series have narrower bounds. Hence, different sample sizes produce different thresholds, and the autocorrelations themselves differ because each plot is computed from a different random sample.
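As a quick sketch of how those bounds scale (the series lengths here are illustrative), the threshold shrinks as \(T\) grows:
# Approximate ACF significance bounds, +/- 1.96/sqrt(T)
round(1.96 / sqrt(c(36, 360, 1000)), 3)
## [1] 0.327 0.103 0.062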
First let’s plot the Amazon closing stock price time series:
amazon <- gafa_stock %>% filter(Symbol == "AMZN")
amazon %>% autoplot(Close) +
labs(x = "Date", y="Closing Price ($)", title="Amazon Closing Stock Prices")
This time series is non-stationary because its level changes with time; that is, properties of the series (in this case, the level) depend on the time at which it is observed.
Now we can plot the ACF and PACF of this time series. First, the ACF:
amazon |> ACF(Close) |>
autoplot() + labs(x = "Lag", y="Autocorrelation", title="Autocorrelation of Amazon Closing Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
In this case, we see strong autocorrelation (above the threshold) at
all lags, indicating a non-stationary time series. This validates what
we saw above in the simple time series plot.
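As an optional numerical check (a sketch, assuming the feasts package is loaded alongside fable), a KPSS unit-root test should agree with the visual diagnosis; a small p-value rejects stationarity:
# KPSS test on the closing price (small p-value suggests non-stationarity)
amazon %>% features(Close, unitroot_kpss)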
Now we can plot the partial autocorrelation function (PACF):
# Plot PACF
amazon %>% PACF(Close) %>%
autoplot() + labs(x = "Lag", y="Partial Autocorrelation", title="Partial Autocorrelation of Amazon Closing Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
In a partial autocorrelation function plot, the correlation at each lag
is measured after removing the effects of all shorter lags. In this
case, we see a strong correlation at lag \(k=1\), which makes sense given
the strong trend in this time series. Beyond that, there are somewhat
stronger partial autocorrelations at \(k = 5, 19, 25\), but far fewer
lags that would be considered statistically significant. We’re still
observing non-stationarity, given the autocorrelations present, but
this plot presents a cleaner picture.
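For comparison, we would expect the first difference of the closing price (the daily change) to show much weaker autocorrelation; a quick sketch:
# ACF of the differenced closing prices
amazon %>% ACF(difference(Close)) %>%
autoplot() + labs(x = "Lag", y="Autocorrelation", title="ACF of Differenced Amazon Closing Stock Prices")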
# First set up our Turkish time series
turkey <- global_economy %>% filter(Country == "Turkey")
turkey %>% autoplot(GDP) +
labs(x="Year", y="GDP", title="Turkish GDP")
There isn’t much seasonality here, but there is a strong upwards trend,
especially after 1990.
lambda_turkey <- turkey |> features(GDP, features = guerrero) |>
pull(lambda_guerrero)
# Print out optimal lambda value
print(lambda_turkey)
## [1] 0.1572187
turkey$GDP_transformed <- box_cox(turkey$GDP, lambda_turkey)
turkey %>% autoplot(GDP_transformed) +
labs(x="Year", y="GDP (Box-Cox)", title="Turkish GDP: Box-Cox Transformed")
Now we can use differencing to stabilize the mean of our time series:
turkey$GDP_diff <- difference(turkey$GDP_transformed)
turkey %>% ACF(difference(GDP_transformed)) %>% autoplot()
After first-differencing the Box-Cox transformed GDP data, we no longer
see significant autocorrelation.
turkey %>% autoplot(GDP_diff)
## Warning: Removed 1 row containing missing values (`geom_line()`).
Visually, this looks to be much more stationary data!
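As a sanity check (assuming feasts is loaded), the unitroot_ndiffs feature estimates how many first differences are needed; we would expect it to agree with the single difference used above:
# Estimate the number of first differences required
turkey %>% features(GDP_transformed, unitroot_ndiffs)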
tasmania <- aus_accommodation %>% filter(State == "Tasmania")
tasmania %>% autoplot(Takings)
Given that this time series’ variance increases with time, we’ll apply a Box-Cox transformation to stabilize it.
lambda_tasmania <- tasmania |> features(Takings, features = guerrero) |>
pull(lambda_guerrero)
# Print out optimal lambda value
print(lambda_tasmania)
## [1] 0.001819643
tasmania$Takings_transformed <- box_cox(tasmania$Takings, lambda_tasmania)
tasmania %>% autoplot(Takings_transformed) +
labs(x="Year", y="Takings (transformed)", title="Tasmanian Accommodations: Box-Cox Transformed")
With \(\lambda\) near zero, this transformation is essentially a log. It
helps to even out the variance over time; now we can apply differencing
to get stationary data.
tasmania$takings_diff <- difference(tasmania$Takings_transformed)
# Plot differenced data
tasmania %>% autoplot(takings_diff)
This data looks much more stationary, which is what we want!
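Because Takings is quarterly and strongly seasonal, a seasonal difference at lag 4 is a plausible alternative (or complement) to the first difference above; a sketch for comparison:
# Seasonal difference (lag 4) of the Box-Cox transformed takings
tasmania %>% autoplot(difference(Takings_transformed, 4)) +
labs(x="Quarter", y="Takings (seasonally differenced)", title="Tasmanian Takings: Seasonal Difference")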
First, let’s plot the monthly souvenir sales data
souvenirs %>% autoplot()
## Plot variable not specified, automatically selected `.vars = Sales`
Again, the variance of this time series changes with time, so we’ll
need to apply a transformation before differencing; we’ll use Box-Cox.
lambda_souvenir <- souvenirs |> features(Sales, features = guerrero) |>
pull(lambda_guerrero)
# Print out optimal lambda value
print(lambda_souvenir)
## [1] 0.002118221
souvenirs$sales_transformed <- box_cox(souvenirs$Sales, lambda_souvenir)
souvenirs %>% autoplot(sales_transformed) +
labs(x="Year", y="Souvenir Sales (transformed)", title="Monthly Souvenir Sales: Box-Cox Transformed")
Now we’ll apply differencing to the Box-Cox transformed souvenir sales data (with \(\lambda\) again near zero, essentially a log).
souvenirs$sales_diff <- difference(souvenirs$sales_transformed)
# Plot differenced data
souvenirs %>% autoplot(sales_diff) + labs(x="Month", y="Sales (Differenced & Transformed)", title="Monthly Souvenir Sales: Differenced & Transformed")
Again, the differenced series looks much closer to stationary, and we
could then pass it to an ARIMA model.
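Given the strong December peaks in monthly souvenir sales, a seasonal difference at lag 12 may also be warranted; the unitroot_nsdiffs feature (assuming feasts is loaded) estimates the seasonal differencing order:
# Estimate the number of seasonal differences required
souvenirs %>% features(sales_transformed, unitroot_nsdiffs)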
First, let’s set up the time series from Exercise 2.7
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>% autoplot()
## Plot variable not specified, automatically selected `.vars = Turnover`
Given that the variance of our seasonal swings is not constant, we’ll likely
need to transform this data before differencing. In this case,
let’s try a simple logarithm transformation.
Now, let’s apply both seasonal differencing and double differencing (a first difference applied to the seasonally differenced data). We’ll mirror the method used in Hyndman to create our plots.
myseries %>%
transmute(
`Turnover` = `Turnover`,
`Log Turnover` = log(Turnover),
`Seasonal Diff` = difference(log(Turnover), 12),
`Doubly Differenced` =
difference(difference(log(Turnover), 12), 1)
) |>
pivot_longer(-c(Month, State, Industry), names_to="Type", values_to="Turnover") |>
mutate(
Type = factor(Type, levels = c(
"Turnover",
"Log Turnover",
"Seasonal Diff",
"Doubly Differenced"))
) |>
ggplot(aes(x = Month, y = Turnover)) +
geom_line() +
facet_grid(vars(Type), scales = "free_y") +
labs(title = "Retail Turnover", y = NULL)
The doubly differenced plot appears to be the most stationary, so one seasonal difference (lag 12) followed by one first difference seems appropriate here.
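We can cross-check this choice with the unit-root features from feasts (a sketch; the mutate steps just name the transformed series):
# Estimated seasonal differences for the log turnover
myseries %>% mutate(log_turnover = log(Turnover)) %>%
features(log_turnover, unitroot_nsdiffs)
# Estimated first differences after seasonal differencing
myseries %>% mutate(seas_diff = difference(log(Turnover), 12)) %>%
features(seas_diff, unitroot_ndiffs)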
# Code from Hyndman
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
# Time plot of the simulated AR(1) series
sim %>% autoplot()
## Plot variable not specified, automatically selected `.vars = y`
Let’s change the coefficient to \(\phi_1 = 0.95\) to see how that alters the plot
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.95*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot()
## Plot variable not specified, automatically selected `.vars = y`
Here we see much larger swings; with \(\phi_1\) close to 1, the series
behaves almost like a random walk and wanders far from its mean, so a
single sample path can easily look non-stationary.
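Since we’ll be re-running this simulation with different coefficients, a small helper function makes the experiment easy to repeat (a sketch; simulate_ar1 is our own name, not a library function):
# Simulate an AR(1) series y_t = phi * y_{t-1} + e_t for any phi
simulate_ar1 <- function(phi, n = 100) {
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n) y[i] <- phi * y[i - 1] + e[i]
  tsibble(idx = seq_len(n), y = y, index = idx)
}
simulate_ar1(0.3) %>% autoplot(y)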
Now we’ll generate data from our own MA(1) model and plot it
# Custom MA(1) model: y_t = e_t + theta * e_{t-1}
y <- numeric(100)
e <- rnorm(100)
theta <- 0.6
for(i in 2:100)
y[i] <- e[i] + theta * e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot() + labs(title="Moving Average (1) Model")
## Plot variable not specified, automatically selected `.vars = y`
Now let’s try a different value, \(\theta_1 = 0.8\)
y <- numeric(100)
e <- rnorm(100)
theta <- 0.8
for(i in 2:100)
y[i] <- e[i] + theta * e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot() + labs(title="MA(1) Model: Altered Theta")
## Plot variable not specified, automatically selected `.vars = y`
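For a true MA(1) process, the theoretical autocorrelation is \(\theta_1/(1+\theta_1^2)\) at lag 1 and zero at all longer lags, so we can compute what the lag-1 autocorrelations of these two simulations should be:
# Theoretical lag-1 autocorrelation for theta = 0.6 and 0.8
theta_values <- c(0.6, 0.8)
round(theta_values / (1 + theta_values^2), 3)
## [1] 0.441 0.488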
Increasing \(\theta_1\) increases the variance of the series (for an
MA(1) process the variance is \((1+\theta_1^2)\sigma^2\)), but both
simulated series remain stationary around zero. Now we can generate
data from an ARMA(1,1) model
# ARMA(1, 1): y_t = phi * y_{t-1} + e_t + theta * e_{t-1}
y <- numeric(100)
e <- rnorm(100)
theta <- 0.6
phi <- 0.6
for(i in 2:100)
y[i] <- phi * y[i-1] + e[i] + theta * e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot() + labs(title="ARMA(1,1) Model Data Generation")
## Plot variable not specified, automatically selected `.vars = y`
Now we’ll generate from an AR(2) model. This includes a term for \(y_{t-2}\), which makes each value depend on the two previous observations, so the loop starts at \(i = 3\).
y <- numeric(100)
e <- rnorm(100)
phi_1 <- 0.6
phi_2 <- 0.3
for(i in 3:100)
y[i] <- phi_1 * y[i-1] + phi_2 * y[i - 2] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot()
## Plot variable not specified, automatically selected `.vars = y`
With these parameters, the sample path wanders enough to look
non-stationary. In fact, with \(\phi_1 + \phi_2 = 0.9\) the process is
technically stationary, but it sits close to the boundary of the
stationarity region (\(\phi_1 + \phi_2 = 1\)), so long excursions from
the mean are expected. The data generated from the ARMA(1,1) model
above appears more stationary overall (no significant jumps in the
level of the time series at a given time).
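For reference, an AR(2) process is stationary when \(\phi_1 + \phi_2 < 1\), \(\phi_2 - \phi_1 < 1\), and \(|\phi_2| < 1\); a quick check of our parameters:
# Check the AR(2) stationarity conditions for phi_1 = 0.6, phi_2 = 0.3
phi_1 <- 0.6
phi_2 <- 0.3
c(phi_1 + phi_2 < 1, phi_2 - phi_1 < 1, abs(phi_2) < 1)
## [1] TRUE TRUE TRUE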
We’ll let ARIMA select parameters for the
aus_airpassengers time series
# Allow ARIMA to select parameters
aus_fit <- aus_airpassengers %>% model(ARIMA(Passengers))
aus_fit %>% forecast(h=10) %>% autoplot(aus_airpassengers)
In this case an ARIMA(0, 2, 1) model was selected for this time
series: no autoregressive terms, second-order differencing, and one
moving average term.
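To see the selected orders and the estimated moving average coefficient, we can print a model summary (report is from fabletools, which fable loads):
# Display the selected model and estimated coefficients
report(aus_fit)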
Lastly, using backshift notation, this model can be written as:
\[(1 - B)^2 y_t = (1 + \theta_1 B)\epsilon_t\]
Where \(\epsilon_t\) is our error (white noise) and \(B\) is the backshift operator (\(By_t = y_{t-1}\)). When expanded out, we get the equation:
\[\begin{aligned} y_t - 2y_{t-1} + y_{t-2} &= \epsilon_t + \theta_1 \epsilon_{t-1} \newline y_t &= 2y_{t-1} - y_{t-2} + \epsilon_t + \theta_1 \epsilon_{t-1} \end{aligned}\]Now let’s create an ARIMA(0, 1, 0) model with drift for
comparison
# Fit an ARIMA(0,1,0) model with drift
aus_fit <- aus_airpassengers %>% model(ARIMA(Passengers ~ pdq(0,1,0)))
aus_fit %>% forecast(h=10) %>% autoplot(aus_airpassengers)
The forecast from an ARIMA(0, 1, 0) model is a bit flatter than in part a.
# Adding a constant to the formula
aus_fit <- aus_airpassengers %>% model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
aus_fit %>% forecast(h=10) %>% autoplot(aus_airpassengers)
For the ARIMA(2,1,2) model, we see some “wiggle” within the prediction interval. This is different from the forecasts above, which tended to be more linear. We can also remove the constant
# Removing the constant from the formula
aus_fit <- aus_airpassengers %>% model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
aus_fit %>% forecast(h=10) %>% autoplot(aus_airpassengers)
With no constant, the prediction is more linear, with a
stronger increase.
# Adding a constant to an ARIMA(0,2,1) model
aus_fit <- aus_airpassengers %>% model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
aus_fit %>% forecast(h=10) %>% autoplot(aus_airpassengers)
In this case we see an even sharper slope in our prediction.
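To compare these specifications more formally, we could fit them side by side and look at an information criterion such as AICc (a sketch; the model names are our own labels):
# Fit the candidate models together and compare AICc
aus_airpassengers %>%
  model(
    auto = ARIMA(Passengers),
    rw_drift = ARIMA(Passengers ~ pdq(0,1,0)),
    arima212 = ARIMA(Passengers ~ 1 + pdq(2,1,2))
  ) %>%
  glance() %>%
  select(.model, AICc)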
First, we’ll visualize the series representing the US GDP over time
us_economy <- global_economy %>% filter(Country == "United States")
# Plot the series
us_economy %>% autoplot(GDP) + labs(x="Year", y="GDP", title="United States GDP over time")
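Mirroring the Turkish GDP analysis above, a natural next step would be to choose a Box-Cox transformation before fitting an ARIMA model; a sketch:
# Find an optimal lambda via the Guerrero method and plot the transformed series
lambda_us <- us_economy %>% features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
us_economy %>% autoplot(box_cox(GDP, lambda_us)) +
labs(x="Year", y="GDP (Box-Cox)", title="United States GDP: Box-Cox Transformed")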