Time Series

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.4.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## Warning: package 'dplyr' was built under R version 4.4.2

## Warning: package 'tsibble' was built under R version 4.4.3

## Warning: package 'tsibbledata' was built under R version 4.4.3

## Warning: package 'feasts' was built under R version 4.4.3

## Warning: package 'fabletools' was built under R version 4.4.3

## Warning: package 'fable' was built under R version 4.4.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(fable)
library(ggplot2)

1. Section 1.8 Problem

1. For cases 3 and 4 in Section 1.5, list the possible predictor variables that might be useful, assuming that the relevant data are available.

Case 3

Vehicle make and mode
Year of manufacture
Original purchase price
Mileage at resale time
Age of vehicle
Service history
Condition at sale
Vehicle type
Fuel type
Transmission type
Lease return condition

Case 4

Route
Week of the year
Class of passenger
Historical trend
Fare price
Weather events
Cancellation rates
Flight frequency
Fuel prices
Exchange rates

2. Section 2.10 Problem #6

The aus_arrivals data set comprises quarterly international arrivals to Australia from Japan, New Zealand, UK and the US.

data(aus_arrivals)

Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries.

Can you identify any unusual observations?

aus_arrivals |>
  autoplot(Arrivals)

Looking at the autoplot(), we can see that there is a clear seasonality pattern, with regular ups and downs each year. New Zealand shows a steady increase in arrivals over time, while Japan starts strong but then begins to decline after the mid-1990s.

aus_arrivals |>
  gg_season(Arrivals)

Looking at the gg_season() plot, we can see Japan has more arrivals in Q3 but drops in Q2, New_Zealand clearly peaks in Q1 and Q3, US stays kind of steady but still bumps up in Q3, and UK also shows a small rise in Q3 but not as strong as the others.

aus_arrivals |>
  gg_subseries(Arrivals)

Looking at the gg_subseries() plot, we can see Japan had high arrivals in Q3 earlier on, but they’ve dropped in recent years, New_Zealand keeps growing in all quarters, especially Q3 and Q4, UK grows steadily with Q4 being the strongest, and US steadily grow in all Quarters.

3. Section 3.7 Problem #7.

Consider the last five years of the Gas data from aus_production.

data("aus_production")
gas <- tail(aus_production, 5*4) |> 
  select(Gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas|>
  autoplot(Gas)

There is a seasonal pattern where Q2 and Q3 have the highest values, Q1 has the lowest, and Q4 is slightly higher than Q1.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_decomp <- gas |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_decomp) |> autoplot()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

c. Do the results support the graphical interpretation from part a?

Yes the result support the graphical interpretation from part a

d. Compute and plot the seasonally adjusted data.

gas_adjusted <- components(gas_decomp) |> 
  select(Gas, season_adjust)

gas_adjusted |> 
  autoplot(Gas, color = "black") +
  autolayer(gas_adjusted, season_adjust, color = "red") +
  labs(y = "Gas Production", x = "Quarter") +
  scale_color_manual(values = c("black", "red"))

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_outlier <- gas |> 
  mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))

gas_decomp_outlier <- gas_outlier |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

components_outlier <- components(gas_decomp_outlier)

components_outlier |> 
  autoplot(components_outlier)

The outlier made a big spike in the Gas data, which pulled the trend line up unnaturally. The seasonal pattern stayed mostly the same, but the random part and the adjusted data were clearly affected. This shows that even one outlier can mess up the results and should be fixed before doing decomposition.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_outlier <- gas |> 
  mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))

gas_decomp_outlier <- gas_outlier |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_decomp_outlier) |> 
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas), colour = "grey") +
  geom_line(aes(y = season_adjust), colour = "blue")

The outlier in the middle made a big spike and messed up the adjusted line. If the outlier was at the end, it wouldn’t change the trend as much. So yes, where the outlier is matters.

4. Section 5.11 Problem #7

For your retail time series (from Exercise 7 in Section 2.10)

set.seed(28)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

a. Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries |>
  filter(year(Month) < 2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

d. Check the residuals. Do the residuals appear to be uncorrelated and normally distributed?

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

The ACF drops fast after lag 1, so the residuals don’t show strong patterns over time. That means they’re mostly uncorrelated. But the histogram is skewed with outliers, so the residuals are not normally shaped.

e. Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

f. Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral… Other r… SNAIV… Trai… 0.296 0.675 0.445  6.68  10.4     1     1 0.654

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Aust… Other r… Test  -1.10  1.83  1.50 -17.3  20.9  3.36  2.71 0.624

g. How sensitive are the accuracy measures to the amount of training data used?

The accuracy of the model depends on how much training data is used. More training data usually helps the model perform better, reducing errors. With less training data, the model might make more mistakes and not work as well on new data.

5. Section 7.10 Problem #6

The annual population of Afghanistan is available in the global_economy data set.

data(global_economy)

afghan_population <- global_economy |>
  filter(Country == "Afghanistan")

a. Plot the data and comment on its features. Can you observe the effect of the Soviet-Afghan war?

afghan_population |>
  autoplot(Population)

We can see during Soviet-Afghan war the population dips

b. Fit a linear trend model and compare this to a piecewise linear trend model with knots at 1980 and 1989.

afghan_models = afghan_population %>% 
  model(
    linear = TSLM(Population ~ trend()),
    piecewise = TSLM(Population ~ trend(knots = c(1980, 1989)))
  )

augment(afghan_models) |>
  autoplot(.fitted) +
  geom_line(aes(y = Population), color = 'black')

c. Generate forecasts from these two models for the five years after the end of the data, and comment on the results.

forecasts <- afghan_models |>
  forecast(h = 5)

autoplot(forecasts) +
  autolayer(afghan_population, Population) +
  labs(title = "Afghanistan Population Forecast: Linear vs Piecewise",
       y = "Population")

The linear model shows a faster population growth, but it might be too optimistic. The piecewise model grows more slowly and likely fits recent trends better.

6. Section 8.8 Problem #5Links to an external site.

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

nigeria_econ <- global_economy %>% 
  filter(Country == "Nigeria")

a. Plot the Exports series and discuss the main features of the data.

nigeria_econ |>
  autoplot(Exports)

The plot shows Nigeria’s exports going up and down over the years. There are big jumps in exports during some years, especially in the 1970s, 1980s, and early 2000s. The data is pretty volatile, with exports rising quickly and then dropping just as fast. There’s no clear trend – it goes through cycles of growth and decline.

b. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

nigeria_ann <- nigeria_econ |>
  model(ETS(Exports ~ error('A') + trend('N') + season('N')))

forcast_ann <- nigeria_ann |>
  forecast(h = '5 years')

forcast_ann |>
  autoplot(nigeria_econ)

c. Compute the RMSE values for the training data.

accuracy(nigeria_ann)

## # A tibble: 1 × 11
##   Country .model         .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <fct>   <chr>          <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Nigeria "ETS(Exports … Trai… 0.0853  5.94  4.32 -7.97  27.5 0.938 0.943 0.0246

d. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

nigeria_aan <- nigeria_econ |>
  model(ETS(Exports ~ error('A') + trend('A') + season('N')))

forecast_aan <- nigeria_aan |>
  forecast(h = 5)

forecast_aan |>
  autoplot(nigeria_econ)

accuracy(nigeria_aan)

## # A tibble: 1 × 11
##   Country .model       .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <fct>   <chr>        <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Nigeria "ETS(Export… Trai… -0.00883  5.94  4.32 -8.58  27.7 0.939 0.943 0.0269

ETS(A,N,N): It’s simpler and works well if there’s no clear trend. It’s easier to understand but might miss important changes in the data.

ETS(A,A,N): It includes a trend, which is useful when the data is growing or shrinking over time. It’s a bit more complex but can give better results if there’s a trend in the data.

The ETS(A,A,N) model is likely better for your data because there seems to be a trend in exports, but the ETS(A,N,N) model is easier and simpler to use when no trend is present.

e. Compare the forecasts from both methods. Which do you think is best?

 nigeria_econ |>
  model(
     ETS(Exports ~ error("A") + trend("N") + season("N")),
     ETS(Exports ~ error("A") + trend("A") + season("N"))
  ) |>
  forecast(h = 5) |>
  autoplot(nigeria_econ, level = NULL)

I believe ETS(A,A,N) is the better model in this case, as it accounts for the slight trend observed in the data. The ETS(A,N,N) model, while simpler, is not capturing the potential upward movement in exports.

f. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

rmse_ets_ann <- 5.943412  
rmse_ets_aan <- 5.944553

nigeria_ann <- nigeria_econ |>
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))

nigeria_aan <- nigeria_econ |>
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

nigeria_ann_forecast <- forecast(nigeria_ann, h = 5)
nigeria_aan_forecast <- forecast(nigeria_aan, h = 5)

first_forecast_ann <- as.numeric(nigeria_ann_forecast$.mean[1])  
first_forecast_aan <- as.numeric(nigeria_aan_forecast$.mean[1]) 

lower_ann <- first_forecast_ann - 1.96 * rmse_ets_ann
upper_ann <- first_forecast_ann + 1.96 * rmse_ets_ann

lower_aan <- first_forecast_aan - 1.96 * rmse_ets_aan
upper_aan <- first_forecast_aan + 1.96 * rmse_ets_aan

prediction_intervals <- data.frame(
  Model = c("ETS(A,N,N)", "ETS(A,A,N)"),
  Lower = c(lower_ann, lower_aan),
  Upper = c(upper_ann, upper_aan)
)

print(prediction_intervals)

##        Model     Lower    Upper
## 1 ETS(A,N,N) 0.7102325 24.00841
## 2 ETS(A,A,N) 0.8303321 24.13298

7. Section 9.11 Problem #6

Simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with ϕ1=0.6ϕ1=0.6 and σ2=1σ2=1. The process starts with y1=0y1=0.

set.seed(20)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b. Produce a time plot for the series. How does the plot change as you change ϕ1?

ggplot(sim, aes(x = idx, y = y)) + 
  geom_line()

As 𝜙 1 ϕ 1 increases, the series becomes more influenced by past values. With a lower 𝜙 1 ϕ 1 , the series looks more random and changes a lot. As 𝜙 1 ϕ 1 gets higher, the series starts to follow a smoother pattern, with past values having a stronger effect. The higher 𝜙 1 ϕ 1 is, the less random the series appears, making it more predictable.

c. Write your own code to generate data from an MA(1) model with θ1=0.6θ1=0.6 and σ2=1σ2=1.

set.seed(20)
y_ma <- numeric(100)
e_ma <- rnorm(100)
for(i in 2:100) {
  y_ma[i] <- e_ma[i] + 0.6 * e_ma[i-1]
}
sim_1 <- tsibble(idx = seq_len(100), y = y_ma, index = idx)

d. Produce a time plot for the series. How does the plot change as you change θ1?

ggplot(sim_1, aes(x = idx, y = y)) + 
  geom_line()

The plot displays fluctuations around the zero line with some smooth trends, which is characteristic of moving average processes. There are occasional sharp spikes and rapid changes, which is typical when the series is influenced by previous white noise values.

e. Generate data from an ARMA(1,1) model with ϕ1=0.6ϕ1=0.6, θ1=0.6θ1=0.6 and σ2=1σ2=1.

set.seed(20)
y_arma <- numeric(100)
e_arma <- rnorm(100)
for(i in 2:100) {
  y_arma[i] <- 0.6 * y_arma[i-1] + e_arma[i] + 0.6 * e_arma[i-1]
}
sim_arma <- tsibble(idx = seq_len(100), y = y_arma, index = idx)

f. Generate data from an AR(2) model with ϕ1=−0.8ϕ1=−0.8, ϕ2=0.3ϕ2=0.3 and σ2=1σ2=1. (Note that these parameters will give a non-stationary series.)

set.seed(20)
y_ar <- numeric(100)
e_ar <- rnorm(100)
for(i in 3:100) {
  y_ar[i] <- -0.8 * y_ar[i-1] + 0.3 * y_ar[i-2] + e_ar[i]
}
sim_ar <- tsibble(idx = seq_len(100), y = y_ar, index = idx)

g. Graph the latter two series and compare them.

library(ggplot2)
ggplot(sim_arma, aes(x = idx, y = y)) +
  geom_line()

ggplot(sim_ar, aes(x = idx, y = y)) +
  geom_line()