library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## Warning: package 'dplyr' was built under R version 4.4.2
## Warning: package 'tsibble' was built under R version 4.4.3
## Warning: package 'tsibbledata' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(fable)
library(ggplot2)
Case 3
Vehicle make and mode
Year of manufacture
Original purchase price
Mileage at resale time
Age of vehicle
Service history
Condition at sale
Vehicle type
Fuel type
Transmission type
Lease return condition
Case 4
Route
Week of the year
Class of passenger
Historical trend
Fare price
Weather events
Cancellation rates
Flight frequency
Fuel prices
Exchange rates
data(aus_arrivals)
aus_arrivals |>
autoplot(Arrivals)
Looking at the autoplot(), we can see that there is a
clear seasonality pattern, with regular ups and downs each year.
New Zealand shows a steady increase in arrivals over
time, while Japan starts strong but then begins to
decline after the mid-1990s.
aus_arrivals |>
gg_season(Arrivals)
Looking at the gg_season() plot, we can see
Japan has more arrivals in Q3 but drops in Q2,
New_Zealand clearly peaks in Q1 and Q3,
US stays kind of steady but still bumps up in Q3, and
UK also shows a small rise in Q3 but not as strong as
the others.
aus_arrivals |>
gg_subseries(Arrivals)
Looking at the gg_subseries() plot, we can see
Japan had high arrivals in Q3 earlier on, but they’ve
dropped in recent years, New_Zealand keeps growing in
all quarters, especially Q3 and Q4, UK grows steadily
with Q4 being the strongest, and US steadily grow in
all Quarters.
data("aus_production")
gas <- tail(aus_production, 5*4) |>
select(Gas)
gas|>
autoplot(Gas)
There is a seasonal pattern where Q2 and Q3 have the highest values, Q1 has the lowest, and Q4 is slightly higher than Q1.
gas_decomp <- gas |>
model(classical_decomposition(Gas, type = "multiplicative"))
components(gas_decomp) |> autoplot()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
Yes the result support the graphical interpretation from part a
gas_adjusted <- components(gas_decomp) |>
select(Gas, season_adjust)
gas_adjusted |>
autoplot(Gas, color = "black") +
autolayer(gas_adjusted, season_adjust, color = "red") +
labs(y = "Gas Production", x = "Quarter") +
scale_color_manual(values = c("black", "red"))
gas_outlier <- gas |>
mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))
gas_decomp_outlier <- gas_outlier |>
model(classical_decomposition(Gas, type = "multiplicative"))
components_outlier <- components(gas_decomp_outlier)
components_outlier |>
autoplot(components_outlier)
The outlier made a big spike in the Gas data, which pulled the trend line up unnaturally. The seasonal pattern stayed mostly the same, but the random part and the adjusted data were clearly affected. This shows that even one outlier can mess up the results and should be fixed before doing decomposition.
gas_outlier <- gas |>
mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))
gas_decomp_outlier <- gas_outlier |>
model(classical_decomposition(Gas, type = "multiplicative"))
components(gas_decomp_outlier) |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas), colour = "grey") +
geom_line(aes(y = season_adjust), colour = "blue")
The outlier in the middle made a big spike and messed up the adjusted line. If the outlier was at the end, it wouldn’t change the trend as much. So yes, where the outlier is matters.
set.seed(28)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
fit <- myseries_train |>
model(SNAIVE(Turnover))
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)
fit |> accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral… Other r… SNAIV… Trai… 0.296 0.675 0.445 6.68 10.4 1 1 0.654
fc |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Aust… Other r… Test -1.10 1.83 1.50 -17.3 20.9 3.36 2.71 0.624
The accuracy of the model depends on how much training data is used. More training data usually helps the model perform better, reducing errors. With less training data, the model might make more mistakes and not work as well on new data.
data(global_economy)
afghan_population <- global_economy |>
filter(Country == "Afghanistan")
afghan_population |>
autoplot(Population)
We can see during Soviet-Afghan war the population dips
afghan_models = afghan_population %>%
model(
linear = TSLM(Population ~ trend()),
piecewise = TSLM(Population ~ trend(knots = c(1980, 1989)))
)
augment(afghan_models) |>
autoplot(.fitted) +
geom_line(aes(y = Population), color = 'black')
forecasts <- afghan_models |>
forecast(h = 5)
autoplot(forecasts) +
autolayer(afghan_population, Population) +
labs(title = "Afghanistan Population Forecast: Linear vs Piecewise",
y = "Population")
The linear model shows a faster population growth, but it might be too optimistic. The piecewise model grows more slowly and likely fits recent trends better.
nigeria_econ <- global_economy %>%
filter(Country == "Nigeria")
nigeria_econ |>
autoplot(Exports)
The plot shows Nigeria’s exports going up and down over the years. There are big jumps in exports during some years, especially in the 1970s, 1980s, and early 2000s. The data is pretty volatile, with exports rising quickly and then dropping just as fast. There’s no clear trend – it goes through cycles of growth and decline.
nigeria_ann <- nigeria_econ |>
model(ETS(Exports ~ error('A') + trend('N') + season('N')))
forcast_ann <- nigeria_ann |>
forecast(h = '5 years')
forcast_ann |>
autoplot(nigeria_econ)
accuracy(nigeria_ann)
## # A tibble: 1 × 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Nigeria "ETS(Exports … Trai… 0.0853 5.94 4.32 -7.97 27.5 0.938 0.943 0.0246
nigeria_aan <- nigeria_econ |>
model(ETS(Exports ~ error('A') + trend('A') + season('N')))
forecast_aan <- nigeria_aan |>
forecast(h = 5)
forecast_aan |>
autoplot(nigeria_econ)
accuracy(nigeria_aan)
## # A tibble: 1 × 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Nigeria "ETS(Export… Trai… -0.00883 5.94 4.32 -8.58 27.7 0.939 0.943 0.0269
ETS(A,N,N): It’s simpler and works well if there’s no clear trend. It’s easier to understand but might miss important changes in the data.
ETS(A,A,N): It includes a trend, which is useful when the data is growing or shrinking over time. It’s a bit more complex but can give better results if there’s a trend in the data.
The ETS(A,A,N) model is likely better for your data because there seems to be a trend in exports, but the ETS(A,N,N) model is easier and simpler to use when no trend is present.
nigeria_econ |>
model(
ETS(Exports ~ error("A") + trend("N") + season("N")),
ETS(Exports ~ error("A") + trend("A") + season("N"))
) |>
forecast(h = 5) |>
autoplot(nigeria_econ, level = NULL)
I believe ETS(A,A,N) is the better model in this case, as it accounts for the slight trend observed in the data. The ETS(A,N,N) model, while simpler, is not capturing the potential upward movement in exports.
rmse_ets_ann <- 5.943412
rmse_ets_aan <- 5.944553
nigeria_ann <- nigeria_econ |>
model(ETS(Exports ~ error("A") + trend("N") + season("N")))
nigeria_aan <- nigeria_econ |>
model(ETS(Exports ~ error("A") + trend("A") + season("N")))
nigeria_ann_forecast <- forecast(nigeria_ann, h = 5)
nigeria_aan_forecast <- forecast(nigeria_aan, h = 5)
first_forecast_ann <- as.numeric(nigeria_ann_forecast$.mean[1])
first_forecast_aan <- as.numeric(nigeria_aan_forecast$.mean[1])
lower_ann <- first_forecast_ann - 1.96 * rmse_ets_ann
upper_ann <- first_forecast_ann + 1.96 * rmse_ets_ann
lower_aan <- first_forecast_aan - 1.96 * rmse_ets_aan
upper_aan <- first_forecast_aan + 1.96 * rmse_ets_aan
prediction_intervals <- data.frame(
Model = c("ETS(A,N,N)", "ETS(A,A,N)"),
Lower = c(lower_ann, lower_aan),
Upper = c(upper_ann, upper_aan)
)
print(prediction_intervals)
## Model Lower Upper
## 1 ETS(A,N,N) 0.7102325 24.00841
## 2 ETS(A,A,N) 0.8303321 24.13298
set.seed(20)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
ggplot(sim, aes(x = idx, y = y)) +
geom_line()
As 𝜙 1 ϕ 1 increases, the series becomes more influenced by past values. With a lower 𝜙 1 ϕ 1 , the series looks more random and changes a lot. As 𝜙 1 ϕ 1 gets higher, the series starts to follow a smoother pattern, with past values having a stronger effect. The higher 𝜙 1 ϕ 1 is, the less random the series appears, making it more predictable.
set.seed(20)
y_ma <- numeric(100)
e_ma <- rnorm(100)
for(i in 2:100) {
y_ma[i] <- e_ma[i] + 0.6 * e_ma[i-1]
}
sim_1 <- tsibble(idx = seq_len(100), y = y_ma, index = idx)
ggplot(sim_1, aes(x = idx, y = y)) +
geom_line()
The plot displays fluctuations around the zero line with some smooth trends, which is characteristic of moving average processes. There are occasional sharp spikes and rapid changes, which is typical when the series is influenced by previous white noise values.
set.seed(20)
y_arma <- numeric(100)
e_arma <- rnorm(100)
for(i in 2:100) {
y_arma[i] <- 0.6 * y_arma[i-1] + e_arma[i] + 0.6 * e_arma[i-1]
}
sim_arma <- tsibble(idx = seq_len(100), y = y_arma, index = idx)
set.seed(20)
y_ar <- numeric(100)
e_ar <- rnorm(100)
for(i in 3:100) {
y_ar[i] <- -0.8 * y_ar[i-1] + 0.3 * y_ar[i-2] + e_ar[i]
}
sim_ar <- tsibble(idx = seq_len(100), y = y_ar, index = idx)
library(ggplot2)
ggplot(sim_arma, aes(x = idx, y = y)) +
geom_line()
ggplot(sim_ar, aes(x = idx, y = y)) +
geom_line()