Exercise 1.8

Problem 1

For cases 3 and 4 in Section 1.5, list the possible predictor variables that might be useful, assuming that the relevant data are available.

Case 3 A large car fleet company asked us to help them forecast vehicle resale values. They purchase new vehicles, lease them out for three years, and then sell them. Better forecasts of vehicle sales values would mean better control of profits; understanding what affects resale values may allow leasing and sales policies to be developed in order to maximise profits.

At the time, the resale values were being forecast by a group of specialists. Unfortunately, they saw any statistical model as a threat to their jobs, and were uncooperative in providing information. Nevertheless, the company provided a large amount of data on previous vehicles and their eventual resale values.

Useful predictors would include:

- Make and model
- Year of manufacture
- Engine type
- Fuel efficiency and cost of the vehicle
- Mileage at resale
- Service history
- Lease terms
- Market demand affecting specific models
- Fuel prices
- Days to sell
- Geographical location

Case 4 In this project, we needed to develop a model for forecasting weekly air passenger traffic on major domestic routes for one of Australia’s leading airlines. The company required forecasts of passenger numbers for each major domestic route and for each class of passenger (economy class, business class and first class). The company provided weekly traffic data from the previous six years.

Air passenger numbers are affected by school holidays, major sporting events, advertising campaigns, competition behaviour, etc. School holidays often do not coincide in different Australian cities, and sporting events sometimes move from one city to another. During the period of the historical data, there was a major pilots’ strike during which there was no traffic for several months. A new cut-price airline also launched and folded. Towards the end of the historical data, the airline had trialled a redistribution of some economy class seats to business class, and some business class seats to first class. After several months, however, the seat classifications reverted to the original distribution.

Useful predictors would include:

- Week of the year
- Day of the week
- School holiday schedules
- Public holidays
- Major sporting events
- Conventions and expos
- Domestic and international marketing campaigns
- Natural disasters
- Union actions (e.g. strikes)
- Competitor actions
- Unemployment rates
- Economic growth

Exercise 2.10

Problem 6

The aus_arrivals data set comprises quarterly international arrivals to Australia from Japan, New Zealand, the UK and the US.

Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries. Can you identify any unusual observations?

library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## Warning: package 'tsibble' was built under R version 4.4.3
## Warning: package 'tsibbledata' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(ggplot2)
library(tibble)
library(dplyr)
arrivals_four <- aus_arrivals |>
  filter(Origin %in% c("Japan", "New Zealand", "UK", "US"))
arrivals_four |>
  autoplot(Arrivals) +
  labs(title = "Quarterly International Arrivals to Australia",
       y = "Arrivals",
       x = "Year") +
  facet_wrap(~ Origin, scales = "free_y")

arrivals_four |>
  gg_season(Arrivals) +
  labs(title = "Seasonal Plot: International Arrivals by Country")

arrivals_four |>
  gg_subseries(Arrivals) +
  labs(title = "Subseries Plot: Arrivals to Australia")

The most notable features are that arrivals from Japan fall off around the year 2000, arrivals from the UK are highest in the UK winter months (which are Australia's summer months), and arrivals from the US increase at a fairly consistent rate.
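
To look more closely at the drop in arrivals from Japan noted above, a quick zoom-in sketch (simply filtering a window around 2000):

arrivals_four |>
  filter(Origin == "Japan", year(Quarter) >= 1995, year(Quarter) <= 2005) |>
  autoplot(Arrivals) +
  labs(title = "Arrivals from Japan, 1995-2005",
       y = "Arrivals")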

Exercise 3.7

Problem 7

Consider the last five years of the Gas data from aus_production.

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas <- tail(aus_production, 5 * 4) |>
  select(Gas)


gas |>
  autoplot(Gas) +
  labs(title = "Quarterly Gas Production (Last 5 Years)",
       y = "Terajoules")

There are clear seasonal fluctuations along with an upward trend.

b. Use classical_decomposition with type = "multiplicative" to calculate the trend-cycle and seasonal indices.

gas_decomp <- gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  )


components(gas_decomp) |>
  autoplot() +
  labs(title = "Multiplicative Decomposition of Gas Data")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
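
The seasonal indices themselves can be read straight off the components; a minimal sketch (column names as in the components() output above):

components(gas_decomp) |>
  as_tibble() |>
  select(Quarter, seasonal) |>
  head(4)  # one full year of quarterly seasonal indices (they repeat exactly)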

c. Do the results support the graphical interpretation from part a? Yes: the decomposition shows the upward trend in the trend-cycle panel and the seasonality in the seasonal panel.

d. Compute and plot the seasonally adjusted data.

gas_decomp |>
  components() |>
  autoplot(season_adjust) +
  labs(title = "Seasonally Adjusted Gas Production")

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_outlier <- gas
gas_outlier$Gas[10] <- gas_outlier$Gas[10] + 300


gas_outlier_decomp <- gas_outlier |>
  model(classical_decomposition(Gas, type = "multiplicative"))


components(gas_outlier_decomp) |>
  autoplot(season_adjust) +
  labs(title = "Seasonally Adjusted Gas Production (With Outlier in Middle)")

The outlier distorts the trend-cycle estimates around it, and the spike itself passes straight through into the seasonally adjusted series, so the adjusted values become less reliable.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_outlier_end <- gas
gas_outlier_end$Gas[20] <- gas_outlier_end$Gas[20] + 300


gas_outlier_end_decomp <- gas_outlier_end |>
  model(classical_decomposition(Gas, type = "multiplicative"))


components(gas_outlier_end_decomp) |>
  autoplot(season_adjust) +
  labs(title = "Seasonally Adjusted Gas Production (Outlier at End)")

Putting the outlier near the end affects fewer values, because fewer moving-average windows include it (the trend-cycle is not even estimated for the final observations), so its impact is more localised.
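
To see both effects side by side, a sketch overlaying the three seasonally adjusted series (no outlier, outlier in the middle, outlier at the end):

bind_rows(
  components(gas_decomp) |> as_tibble() |> mutate(series = "No outlier"),
  components(gas_outlier_decomp) |> as_tibble() |> mutate(series = "Outlier in middle"),
  components(gas_outlier_end_decomp) |> as_tibble() |> mutate(series = "Outlier at end")
) |>
  ggplot(aes(x = Quarter, y = season_adjust, colour = series)) +
  geom_line() +
  labs(title = "Seasonally Adjusted Gas Production: Effect of Outlier Position",
       y = "Terajoules", colour = NULL)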

Exercise 5.11

Problem 7

For your retail time series (from Exercise 7 in Section 2.10):

myseries <- aus_retail |>
  filter(
    Industry == "Cafes, restaurants and catering services",
    State == "New South Wales"
  )

a. Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries |>
  filter(year(Month) < 2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

d. Check the residuals.

fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

In the innovation residuals, the spread grows over time (as turnover grows with the population); the ACF plot shows the residuals are strongly autocorrelated; and the histogram is roughly bell-shaped, so the residuals are close to normally distributed.
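
A more formal check of the autocorrelation seen in the ACF plot could use a Ljung-Box test; a minimal sketch (lag = 24, i.e. two years of monthly lags, is an arbitrary choice here):

augment(fit) |>
  features(.innov, ljung_box, lag = 24)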

e. Produce forecasts for the test data.

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)

f. Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()
## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New Sou… Cafes, … SNAIV… Trai…  14.1  42.1  31.5  6.07  12.7     1     1 0.847
fc |> accuracy(myseries)
## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… New … Cafes, … Test   175.  202.  176.  25.9  26.0  5.58  4.81 0.924

g. How sensitive are the accuracy measures to the amount of training data used? Quite sensitive: the mean error jumps from 14.1 on the training data to 174.9 on the test data, and the RMSE rises from 42.1 to 202.3. This could come from changes in business or consumer trends and economic conditions; it also reflects that the seasonal naïve model assumes the future looks exactly like the recent past, which is almost never true.
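
One rough way to check this sensitivity directly (a sketch, not part of the answer above) is to refit SNAIVE() on training sets ending in different, arbitrarily chosen years and compare the test-set accuracy; note the test sets then also differ in length, so the comparison is only indicative:

cutoffs <- c(2005, 2008, 2011)

purrr::map_dfr(cutoffs, function(cut_year) {
  train <- myseries |> filter(year(Month) < cut_year)
  test  <- myseries |> filter(year(Month) >= cut_year)
  train |>
    model(SNAIVE(Turnover)) |>
    forecast(new_data = test) |>
    accuracy(myseries) |>
    mutate(train_end = cut_year, .before = 1)  # label each row with its cutoff year
})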

Exercise 7.10

Problem 6

The annual population of Afghanistan is available in the global_economy data set.

a. Plot the data and comment on its features. Can you observe the effect of the Soviet-Afghan war?

library(tidyverse)
## Warning: package 'purrr' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ✔ purrr   1.0.4     ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)

afghan_pop <- global_economy |>
  dplyr::filter(Country == "Afghanistan") |>
  dplyr::select(Year, Population)

afghan_pop |>
  autoplot(Population) +
  labs(
    title = "Annual Population of Afghanistan (1960 onward)",
    y = "Population (millions)"
  )

There is a dip in population from about 1980 to about 1988. The Soviet-Afghan War officially ran from December 1979 to February 1989, so the effects of the war are clearly visible in the population series.

b. Fit a linear trend model and compare this to a piecewise linear trend model with knots at 1980 and 1989.

afghan_pop <- global_economy |>
  dplyr::filter(Country == "Afghanistan") 

linear_model <- afghan_pop |>
  model(Linear = TSLM(Population ~ trend()))  # trend() is the TSLM special; a bare `trend` caused the "object is not a matrix" error
afghan_piecewise <- global_economy |>
  filter(Country == "Afghanistan") |>
  mutate(
    trend = row_number(),
    after_1980 = pmax(0, Year - 1980),
    after_1989 = pmax(0, Year - 1989)
  ) |>
  as_tsibble(index = Year)

piecewise_model <- afghan_piecewise |>
  model(Piecewise = TSLM(Population ~ trend + after_1980 + after_1989))
fitted_values <- bind_rows(
  augment(linear_model),
  augment(piecewise_model)
)

afghan_pop |>
  autoplot(Population) +
  geom_line(data = fitted_values, aes(x = Year, y = .fitted, color = .model)) +
  labs(
    title = "Linear vs Piecewise Linear Trend Models",
    y = "Population (millions)",
    color = "Model"
  )
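
For reference, fable can express the same piecewise trend more compactly via the knots argument of the trend() special in TSLM(); a sketch (not used in the plots above, but it should give essentially the same fit as the hand-built after_1980/after_1989 dummies):

piecewise_knots <- afghan_pop |>
  model(Piecewise2 = TSLM(Population ~ trend(knots = c(1980, 1989))))

report(piecewise_knots)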

c. Generate forecasts from these two models for the five years after the end of the data, and comment on the results.

future_data <- new_data(afghan_pop, n = 5) |>
  mutate(
    trend = Year - 1959,  # continue the training trend index (1960 = 1) rather than restarting at 1
    after_1980 = pmax(0, Year - 1980),
    after_1989 = pmax(0, Year - 1989)
  )


fc_linear <- forecast(linear_model, new_data = future_data)


fc_piecewise <- forecast(piecewise_model, new_data = future_data)
afghan_pop |>
  autoplot(Population) +
  autolayer(fc_linear, colour = "blue", linetype = "dashed") +
  autolayer(fc_piecewise, colour = "red", linetype = "dotted") +
  labs(
    title = "Forecasts of Afghanistan Population (2018–2022)",
    y = "Population (millions)",
    colour = "Model"
  )
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.

Exercise 8.8

Problem 5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

New Zealand

a. Plot the Exports series and discuss the main features of the data.

country_data <- global_economy |>
  filter(Country == "New Zealand") |>
  dplyr::select(Year, Exports)

country_data <- country_data |> filter(!is.na(Exports))

country_data |>
  autoplot(Exports) +
  labs(title = "New Zealand Exports Over Time",
       y = "Exports (% of GDP)")

This shows exports as a percentage of GDP, so changes in the series can come from increases or decreases in either exports themselves or in total GDP.

b. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

# Fit ETS(A,N,N)
fit_ANN <- country_data |>
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))

# Forecast
fc_ANN <- fit_ANN |> forecast(h = 5)

# Plot
fc_ANN |>
  autoplot(country_data) +
  labs(title = "ETS(A,N,N) Forecast: New Zealand Exports")

c. Compute the RMSE values for the training data.

fit_AAN <- country_data |>
  model(AAN = ETS(Exports ~ error("A") + trend("A") + season("N")))

# Forecast
fc_AAN <- fit_AAN |> forecast(h = 5)

# Plot
fc_AAN |>
  autoplot(country_data) +
  labs(title = "ETS(A,A,N) Forecast: New Zealand Exports")

d. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

bind_rows(
  accuracy(fit_ANN),
  accuracy(fit_AAN)
)
## # A tibble: 2 × 10
##   .model .type         ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
##   <chr>  <chr>      <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 ANN    Training  0.0885  1.79  1.35  0.173  4.75 0.979 0.989 0.0558
## 2 AAN    Training -0.268   1.82  1.44 -1.10   5.09 1.04  1.01  0.0357

This suggests the ETS(A,N,N) model fits the training data slightly better (lower RMSE, MAE and MAPE).

e. Compare the forecasts from both methods. Which do you think is best?

bind_rows(
  fc_ANN |> mutate(Model = "ETS(A,N,N)"),
  fc_AAN |> mutate(Model = "ETS(A,A,N)")
) |>
  autoplot(country_data, level = 95) +
  facet_wrap(~ Model) +
  labs(title = "Forecast Comparison: ETS(A,N,N) vs ETS(A,A,N)")

The two sets of forecasts look essentially the same; the estimated trend in the ETS(A,A,N) model is evidently very small, so the extra parameter adds little.

f. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

rmse_ann <- accuracy(fit_ANN) |> filter(.model == "ANN") |> pull(RMSE)
rmse_aan <- accuracy(fit_AAN) |> filter(.model == "AAN") |> pull(RMSE)


first_fc_ann <- fc_ANN |> as_tibble() |> slice(1) |> pull(.mean)
first_fc_aan <- fc_AAN |> as_tibble() |> slice(1) |> pull(.mean)


ann_interval <- c(first_fc_ann - 1.96 * rmse_ann,
                  first_fc_ann + 1.96 * rmse_ann)

aan_interval <- c(first_fc_aan - 1.96 * rmse_aan,
                  first_fc_aan + 1.96 * rmse_aan)

ann_interval
## [1] 22.31021 29.32558
aan_interval
## [1] 22.24511 29.39117

Both intervals are very similar; the ETS(A,A,N) interval is slightly wider. The heavy overlap suggests little short-term divergence between the two models, though the difference could grow over longer horizons.
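
For the comparison with the intervals R itself produces, a minimal sketch using fabletools' hilo() on the first forecast from each model:

fc_ANN |> hilo(level = 95) |> as_tibble() |> slice(1) |> select(.model, `95%`)
fc_AAN |> hilo(level = 95) |> as_tibble() |> slice(1) |> select(.model, `95%`)

These should be close to the intervals computed by hand above, since both are based on an estimate of the one-step forecast standard deviation.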

Exercise 9.11

Problem 6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ2 = 1. The process starts with y1 = 0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

autoplot(sim, y) + labs(title = "Simulated AR(1) Series (ϕ = 0.6)")

a. Produce a time plot for the series. How does the plot change as you change ϕ1?

simulate_ar1 <- function(phi, n = 100, seed = 123) {
  set.seed(seed)
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n) {
    y[i] <- phi * y[i - 1] + e[i]
  }
  
  tibble(idx = 1:n, y = y, phi = paste0("ϕ = ", phi))
}


phis <- c(0, 0.3, 0.6, 0.9, -0.6)
simulations <- purrr::map_dfr(phis, simulate_ar1)


simulations |>
  ggplot(aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(~ phi, scales = "free_y") +
  labs(
    title = "AR(1) Time Series for Different Values of ϕ",
    x = "Time", y = "y[t]"
  )

As ϕ1 increases towards 1, the series becomes smoother and wanders further and longer from its mean (approaching a random walk); with ϕ1 = 0 it is just white noise; and with negative ϕ1 the series tends to oscillate around the mean from one observation to the next.

c. Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ2 = 1.

e <- rnorm(101, sd = 1)
y <- numeric(100)

for (i in 1:100) {
  y[i] <- e[i + 1] + 0.6 * e[i]
}

sim_ma1 <- tsibble(idx = 1:100, y = y, index = idx)

autoplot(sim_ma1, y) +
  labs(title = "Simulated MA(1) Series (θ = 0.6)")

d. Produce a time plot for the series. How does the plot change as you change θ1?

library(fpp3)
library(tibble)
library(ggplot2)
library(purrr)

simulate_ma1 <- function(theta, n = 100, seed = 123) {
  set.seed(seed)
  e <- rnorm(n + 1)  # One extra for lag
  y <- numeric(n)
  
  for (i in 1:n) {
    y[i] <- e[i + 1] + theta * e[i]
  }
  
  tibble(idx = 1:n, y = y, theta = paste0("θ = ", theta))
}

# Try various theta values
thetas <- c(0, 0.3, 0.6, 0.9, -0.6)
simulations <- map_dfr(thetas, simulate_ma1)

# Plot
ggplot(simulations, aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(~ theta, scales = "free_y") +
  labs(
    title = "Simulated MA(1) Time Series for Different Values of θ",
    x = "Time", y = "y[t]"
  )

Changing θ1 changes the correlation between consecutive observations: positive θ1 makes neighbouring values more alike (a smoother-looking series), negative θ1 makes them tend to alternate around the mean, and θ1 = 0 gives white noise. Unlike the AR(1) case, the effect is confined to adjacent observations, so the series always looks stationary.

e. Generate data from an ARMA(1,1) model with ϕ1 = 0.6, θ1 = 0.6, and σ2 = 1.

e <- rnorm(101, sd = 1)
y <- numeric(100)

for (i in 2:100) {
  y[i] <- 0.6 * y[i - 1] + e[i] + 0.6 * e[i - 1]
}

sim_arma11 <- tsibble(idx = 1:100, y = y, index = idx)

autoplot(sim_arma11, y) +
  labs(title = "Simulated ARMA(1,1) Series (ϕ = 0.6, θ = 0.6)")

f. Generate data from an AR(2) model with ϕ1 = −0.8, ϕ2 = 0.3, and σ2 = 1. (Note that these parameters will give a non-stationary series.)

e <- rnorm(100, sd = 1)
y <- numeric(100)

for (i in 3:100) {
  y[i] <- -0.8 * y[i - 1] + 0.3 * y[i - 2] + e[i]
}

sim_ar2_ns <- tsibble(idx = 1:100, y = y, index = idx)

autoplot(sim_ar2_ns, y) +
  labs(title = "Simulated AR(2) Series (ϕ₁ = -0.8, ϕ₂ = 0.3)")

g. Graph the latter two series and compare them.

sim_arma11 <- sim_arma11 |> 
  as_tibble() |> 
  mutate(model = "ARMA(1,1)")

sim_ar2_ns <- sim_ar2_ns |> 
  as_tibble() |> 
  mutate(model = "AR(2)")

combined <- bind_rows(sim_arma11, sim_ar2_ns)

combined_ts <- combined |> 
  as_tsibble(index = idx, key = model)

ggplot(combined, aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(~ model, ncol = 1, scales = "free_y") +
  labs(title = "ARMA(1,1) vs AR(2)", x = "Time", y = "Value")

The AR(2) series oscillates with steadily growing amplitude, confirming it is non-stationary, while the ARMA(1,1) series fluctuates randomly around its mean and remains stationary.
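
As a rough follow-up, the sample ACFs make this contrast explicit (a sketch reusing the combined_ts object built above):

combined_ts |>
  ACF(y) |>
  autoplot() +
  labs(title = "Sample ACFs: ARMA(1,1) vs AR(2)")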