library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## Warning: package 'dplyr' was built under R version 4.4.2
## Warning: package 'tsibble' was built under R version 4.4.3
## Warning: package 'tsibbledata' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(fable)
library(ggplot2)

1. Section 1.8 Problem

1. For cases 3 and 4 in Section 1.5, list the possible predictor variables that might be useful, assuming that the relevant data are available.

Case 3

  • Vehicle make and mode

  • Year of manufacture

  • Original purchase price

  • Mileage at resale time

  • Age of vehicle

  • Service history

  • Condition at sale

  • Vehicle type

  • Fuel type

  • Transmission type

  • Lease return condition

Case 4

  • Route

  • Week of the year

  • Class of passenger

  • Historical trend

  • Fare price

  • Weather events

  • Cancellation rates

  • Flight frequency

  • Fuel prices

  • Exchange rates

2. Section 2.10 Problem #6

The aus_arrivals data set comprises quarterly international arrivals to Australia from Japan, New Zealand, UK and the US.

data(aus_arrivals)

Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries.

Can you identify any unusual observations?

aus_arrivals |>
  autoplot(Arrivals)

Looking at the autoplot(), we can see that there is a clear seasonality pattern, with regular ups and downs each year. New Zealand shows a steady increase in arrivals over time, while Japan starts strong but then begins to decline after the mid-1990s.

aus_arrivals |>
  gg_season(Arrivals)

Looking at the gg_season() plot, we can see Japan has more arrivals in Q3 but drops in Q2, New_Zealand clearly peaks in Q1 and Q3, US stays kind of steady but still bumps up in Q3, and UK also shows a small rise in Q3 but not as strong as the others.

aus_arrivals |>
  gg_subseries(Arrivals)

Looking at the gg_subseries() plot, we can see Japan had high arrivals in Q3 earlier on, but they’ve dropped in recent years, New_Zealand keeps growing in all quarters, especially Q3 and Q4, UK grows steadily with Q4 being the strongest, and US steadily grow in all Quarters.

3. Section 3.7 Problem #7.

Consider the last five years of the Gas data from aus_production.

data("aus_production")
gas <- tail(aus_production, 5*4) |> 
  select(Gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas|>
  autoplot(Gas)

There is a seasonal pattern where Q2 and Q3 have the highest values, Q1 has the lowest, and Q4 is slightly higher than Q1.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_decomp <- gas |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_decomp) |> autoplot()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

c. Do the results support the graphical interpretation from part a?

Yes the result support the graphical interpretation from part a

d. Compute and plot the seasonally adjusted data.

gas_adjusted <- components(gas_decomp) |> 
  select(Gas, season_adjust)

gas_adjusted |> 
  autoplot(Gas, color = "black") +
  autolayer(gas_adjusted, season_adjust, color = "red") +
  labs(y = "Gas Production", x = "Quarter") +
  scale_color_manual(values = c("black", "red"))

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_outlier <- gas |> 
  mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))

gas_decomp_outlier <- gas_outlier |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

components_outlier <- components(gas_decomp_outlier)

components_outlier |> 
  autoplot(components_outlier)

The outlier made a big spike in the Gas data, which pulled the trend line up unnaturally. The seasonal pattern stayed mostly the same, but the random part and the adjusted data were clearly affected. This shows that even one outlier can mess up the results and should be fixed before doing decomposition.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_outlier <- gas |> 
  mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))

gas_decomp_outlier <- gas_outlier |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_decomp_outlier) |> 
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas), colour = "grey") +
  geom_line(aes(y = season_adjust), colour = "blue")

The outlier in the middle made a big spike and messed up the adjusted line. If the outlier was at the end, it wouldn’t change the trend as much. So yes, where the outlier is matters.

4. Section 5.11 Problem #7

For your retail time series (from Exercise 7 in Section 2.10)

set.seed(28)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

a. Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries |>
  filter(year(Month) < 2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

d. Check the residuals. Do the residuals appear to be uncorrelated and normally distributed?

fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

The ACF drops fast after lag 1, so the residuals don’t show strong patterns over time. That means they’re mostly uncorrelated. But the histogram is skewed with outliers, so the residuals are not normally shaped.

e. Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)

f. Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()
## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral… Other r… SNAIV… Trai… 0.296 0.675 0.445  6.68  10.4     1     1 0.654
fc |> accuracy(myseries)
## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Aust… Other r… Test  -1.10  1.83  1.50 -17.3  20.9  3.36  2.71 0.624

g. How sensitive are the accuracy measures to the amount of training data used?

The accuracy of the model depends on how much training data is used. More training data usually helps the model perform better, reducing errors. With less training data, the model might make more mistakes and not work as well on new data.

5. Section 7.10 Problem #6

The annual population of Afghanistan is available in the global_economy data set.

data(global_economy)
afghan_population <- global_economy |>
  filter(Country == "Afghanistan")

a. Plot the data and comment on its features. Can you observe the effect of the Soviet-Afghan war?

afghan_population |>
  autoplot(Population)

We can see during Soviet-Afghan war the population dips

b. Fit a linear trend model and compare this to a piecewise linear trend model with knots at 1980 and 1989.

afghan_models = afghan_population %>% 
  model(
    linear = TSLM(Population ~ trend()),
    piecewise = TSLM(Population ~ trend(knots = c(1980, 1989)))
  )
augment(afghan_models) |>
  autoplot(.fitted) +
  geom_line(aes(y = Population), color = 'black')

c. Generate forecasts from these two models for the five years after the end of the data, and comment on the results.

forecasts <- afghan_models |>
  forecast(h = 5)

autoplot(forecasts) +
  autolayer(afghan_population, Population) +
  labs(title = "Afghanistan Population Forecast: Linear vs Piecewise",
       y = "Population")

The linear model shows a faster population growth, but it might be too optimistic. The piecewise model grows more slowly and likely fits recent trends better.

7. Section 9.11 Problem #6

Simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with ϕ1=0.6ϕ1=0.6 and σ2=1σ2=1. The process starts with y1=0y1=0.

set.seed(20)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b. Produce a time plot for the series. How does the plot change as you change ϕ1?

ggplot(sim, aes(x = idx, y = y)) + 
  geom_line()

As 𝜙 1 ϕ 1 ​ increases, the series becomes more influenced by past values. With a lower 𝜙 1 ϕ 1 ​ , the series looks more random and changes a lot. As 𝜙 1 ϕ 1 ​ gets higher, the series starts to follow a smoother pattern, with past values having a stronger effect. The higher 𝜙 1 ϕ 1 ​ is, the less random the series appears, making it more predictable.

c. Write your own code to generate data from an MA(1) model with θ1=0.6θ1=0.6 and σ2=1σ2=1.

set.seed(20)
y_ma <- numeric(100)
e_ma <- rnorm(100)
for(i in 2:100) {
  y_ma[i] <- e_ma[i] + 0.6 * e_ma[i-1]
}
sim_1 <- tsibble(idx = seq_len(100), y = y_ma, index = idx)

d. Produce a time plot for the series. How does the plot change as you change θ1?

ggplot(sim_1, aes(x = idx, y = y)) + 
  geom_line()

The plot displays fluctuations around the zero line with some smooth trends, which is characteristic of moving average processes. There are occasional sharp spikes and rapid changes, which is typical when the series is influenced by previous white noise values.

e. Generate data from an ARMA(1,1) model with ϕ1=0.6ϕ1=0.6, θ1=0.6θ1=0.6 and σ2=1σ2=1.

set.seed(20)
y_arma <- numeric(100)
e_arma <- rnorm(100)
for(i in 2:100) {
  y_arma[i] <- 0.6 * y_arma[i-1] + e_arma[i] + 0.6 * e_arma[i-1]
}
sim_arma <- tsibble(idx = seq_len(100), y = y_arma, index = idx)

f. Generate data from an AR(2) model with ϕ1=−0.8ϕ1=−0.8, ϕ2=0.3ϕ2=0.3 and σ2=1σ2=1. (Note that these parameters will give a non-stationary series.)

set.seed(20)
y_ar <- numeric(100)
e_ar <- rnorm(100)
for(i in 3:100) {
  y_ar[i] <- -0.8 * y_ar[i-1] + 0.3 * y_ar[i-2] + e_ar[i]
}
sim_ar <- tsibble(idx = seq_len(100), y = y_ar, index = idx)

g. Graph the latter two series and compare them.

library(ggplot2)
ggplot(sim_arma, aes(x = idx, y = y)) +
  geom_line()

ggplot(sim_ar, aes(x = idx, y = y)) +
  geom_line()