Install the fpp3 package, making sure its dependencies are installed too.
# install.packages("fpp3", dependencies = TRUE)
library(fpp3)
Warning: package ‘fpp3’ was built under R version 4.4.3
Registered S3 method overwritten by 'tsibble':
method from
as_tibble.grouped_df dplyr
── Attaching packages ─────────────────────────── fpp3 1.0.1 ──
✔ tibble 3.2.1 ✔ tsibble 1.1.6
✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
✔ tidyr 1.3.1 ✔ feasts 0.4.1
✔ lubridate 1.9.4 ✔ fable 0.4.1
✔ ggplot2 3.5.1
Warning: package ‘tsibble’ was built under R version 4.4.3
Warning: package ‘tsibbledata’ was built under R version 4.4.3
Warning: package ‘feasts’ was built under R version 4.4.3
Warning: package ‘fabletools’ was built under R version 4.4.3
Warning: package ‘fable’ was built under R version 4.4.3
── Conflicts ──────────────────────────────── fpp3_conflicts ──
✖ lubridate::date() masks base::date()
✖ dplyr::filter() masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval() masks lubridate::interval()
✖ dplyr::lag() masks stats::lag()
✖ tsibble::setdiff() masks base::setdiff()
✖ tsibble::union() masks base::union()
Let’s work through some problems from Rob Hyndman and George Athanasopoulos’s book Forecasting: Principles and Practice.
Case 3 A large car fleet company asked us to help them forecast vehicle resale values. They purchase new vehicles, lease them out for three years, and then sell them. Better forecasts of vehicle sales values would mean better control of profits; understanding what affects resale values may allow leasing and sales policies to be developed in order to maximise profits.
At the time, the resale values were being forecast by a group of specialists. Unfortunately, they saw any statistical model as a threat to their jobs, and were uncooperative in providing information. Nevertheless, the company provided a large amount of data on previous vehicles and their eventual resale values.
1. Problem definition
2. Gathering information
A data set of past sales should be obtained, together with surrounding information such as how the data were gathered, possible outliers and incorrect records, and any special values in the data.
Expert knowledge should be obtained from the people responsible for the sales, covering seasonal price fluctuations, whether the price depends on the state of the economy, and any other factors that may influence the price.
3. Preliminary (exploratory) analysis
Possible outliers and inconsistent information should be found (for example very small, zero or even negative prices).
Graphs showing how the sale price depends on different predictor variables should be considered.
The dependency of the sale price on the month of the year should be plotted.
4. Choosing and fitting models
A starting model (for example a linear model) and the predictor variables most likely to affect the forecasts should be chosen. The predictive performance of the model should be evaluated.
The model should then be changed (for example by transforming variables, or by adding or removing predictor variables) and its performance re-evaluated. This should be repeated until a satisfactory model is found.
5. Using and evaluating a forecasting model
The appropriate software should be deployed to the company, and the relevant people should be trained to use it.
Forecasting accuracy should be checked against new sales. If necessary, the model should be updated, followed by the deployed software.
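Steps 3 and 4 above can be sketched in a few lines of base R. The data frame below is simulated purely for illustration: the column names (mileage_km, sale_month, resale_value) and every value in it are assumptions, not data from the case study. The sketch screens for implausible prices, fits a baseline linear model on a training set, and measures its accuracy on held-out sales.

```r
# Simulated, illustrative data only: none of these columns or values
# come from the case study.
set.seed(1)
n <- 200
vehicles <- data.frame(
  mileage_km = runif(n, 30000, 120000),
  sale_month = factor(sample(month.abb, n, replace = TRUE), levels = month.abb)
)
vehicles$resale_value <- 20000 - 0.08 * vehicles$mileage_km + rnorm(n, sd = 1000)

# Step 3: screen out implausible records (zero or negative prices).
# The simulated data contain none, so this filter removes nothing here.
suspect <- vehicles$resale_value <= 0
clean <- vehicles[!suspect, ]

# Step 4: fit a baseline linear model on a training set ...
train_idx <- seq_len(150)
train <- clean[train_idx, ]
test <- clean[-train_idx, ]
fit <- lm(resale_value ~ mileage_km + sale_month, data = train)

# ... and evaluate its predictions on the held-out sales.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$resale_value - pred)^2))
rmse
```

Changing the formula passed to lm() and comparing the held-out RMSE is one simple way to carry out the iterative refinement described in step 4.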
Use autoplot(), gg_season(), gg_subseries(), gg_lag() and ACF() to explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
Total Private Employment in the US
us_employment |>
filter(Title == "Total Private") |>
autoplot(Employed)
There is a strong trend and seasonality. Some cyclic behaviour is seen, with a big drop due to the global financial crisis.
us_employment |>
filter(Title == "Total Private") |>
gg_season(Employed)
us_employment |>
filter(Title == "Total Private") |>
gg_subseries(Employed)
us_employment |>
filter(Title == "Total Private") |>
gg_lag(Employed)
us_employment |>
filter(Title == "Total Private") |>
ACF(Employed) |>
autoplot()
In all of these plots, the trend is so dominant that it is hard to see anything else. We need to remove the trend so we can explore the other features of the data.
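One possible way to remove the trend, sketched here as an assumption rather than as the approach the exercise intends, is to estimate it with an STL decomposition and subtract it from the series. The column name Detrended and the object names private and detrended are introduced only for this illustration.

```r
library(fpp3)

private <- us_employment |>
  filter(Title == "Total Private")

# STL splits the series into trend, seasonal and remainder components;
# subtracting the trend leaves the seasonal pattern plus the remainder.
detrended <- private |>
  model(stl = STL(Employed)) |>
  components() |>
  as_tsibble() |>
  mutate(Detrended = Employed - trend)

# With the trend gone, the seasonal shape is much easier to see.
detrended |> gg_season(Detrended)
```

The same detrended series can be passed to gg_subseries() or ACF() to repeat the other plots without the trend dominating them.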
Brick production in Australia
aus_production |>
autoplot(Bricks)
A positive trend in the first 20 years, and a negative trend in the next 25 years. Strong quarterly seasonality, with some cyclicity – note the recessions in the 1970s and 1980s.
aus_production |>
gg_season(Bricks)
Brick production tends to be lowest in the first quarter and peak in either quarter 2 or quarter 3.
aus_production |>
gg_subseries(Bricks)
The decrease in the last 25 years has been weakest in Q1.
aus_production |>
gg_lag(Bricks, geom='point')
Warning: Removed 20 rows containing missing values (gg_lag).
aus_production |>
ACF(Bricks) |> autoplot()
The seasonality shows up as peaks at lags 4, 8, 12, 16, 20,… The trend is seen with the slow decline on the positive side.
Snowshoe hare trappings in Canada
pelt |>
autoplot(Hare)
There is some cyclic behaviour with substantial variation in the length of the period.
pelt |>
gg_lag(Hare, geom='point')
pelt |>
ACF(Hare) |> autoplot()
The cyclic period seems to have an average of about 10 (due to the local maximum in ACF at lag 10).
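The local maximum can also be located numerically instead of being read off the plot. The sketch below assumes that ACF() returns one row per lag, in order, starting at lag 1, so that the row position equals the lag. The names hare_acf, r and peak_lags are introduced only for this illustration.

```r
library(fpp3)

# Autocorrelations of the hare series for lags 1 to 20.
hare_acf <- pelt |> ACF(Hare, lag_max = 20)

# Assumption: rows are ordered by lag, starting at lag 1.
r <- hare_acf$acf

# A lag is a local peak if its autocorrelation exceeds both neighbours.
# dplyr:: is spelled out because fpp3 masks stats::lag().
is_peak <- r > dplyr::lag(r) & r > dplyr::lead(r)
peak_lags <- which(is_peak)
peak_lags
```

The first and last lags have only one neighbour, so the comparison yields NA for them and which() drops them.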
H02 sales in Australia
There are four series corresponding to H02 sales, so we will add them
together.
h02 <- PBS |>
filter(ATC2 == "H02") |>
group_by(ATC2) |>
summarise(Cost = sum(Cost)) |>
ungroup()
h02 |>
autoplot(Cost)
A positive trend with strong monthly seasonality, dropping suddenly every February.
h02 |>
gg_season(Cost)
h02 |>
gg_subseries(Cost)
The trends have been greater in the higher peaking months – this leads to increasing seasonal variation.
h02 |>
gg_lag(Cost, geom='point', lags=1:16)
h02 |>
ACF(Cost) |> autoplot()
The large January sales show up as a separate cluster of points in the lag plots. The strong seasonality is clear in the ACF plot.
US gasoline sales
us_gasoline |>
autoplot(Barrels)
A positive trend until 2008, and then the global financial crisis led to a drop in sales until 2012. The shape of the seasonality seems to have changed over time.
us_gasoline |>
gg_season(Barrels)
There is a lot of noise making it hard to see the overall seasonal pattern. However, it seems to drop towards the end of quarter 4.
us_gasoline |>
gg_subseries(Barrels)
The blue lines are helpful in seeing the average seasonal pattern.
us_gasoline |>
gg_lag(Barrels, geom='point', lags=1:16)
us_gasoline |>
ACF(Barrels, lag_max = 150) |> autoplot()
The seasonality is seen if we increase the number of lags to cover at least 2 years (approximately 104 weeks).
The seasonal package is required for X_13ARIMA_SEATS(), which is provided by the feasts package (loaded as part of fpp3).
# install.packages("seasonal", dependencies = TRUE)
library(seasonal)
Warning: package ‘seasonal’ was built under R version 4.4.3
Attaching package: ‘seasonal’
The following object is masked from ‘package:tibble’:
view
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
decomp <- myseries |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
decomp |> autoplot()
Two outliers are now evident in the “irregular” component — in December 1995 and July 2010.
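The outlying months can also be picked out programmatically. This is only a sketch: it assumes the X-11 components include a column named irregular, and it measures how far each value lies from the median of that component, which works whether the decomposition is additive or multiplicative. The names top_irregular and deviation are introduced only for this illustration.

```r
library(fpp3)
library(seasonal)

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Rank months by how far the irregular component sits from its median
# and keep the two most extreme ones.
top_irregular <- myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components() |>
  as_tibble() |>
  mutate(deviation = abs(irregular - median(irregular))) |>
  slice_max(deviation, n = 2, with_ties = FALSE) |>
  select(Month, irregular, deviation)

top_irregular
```

Because sample() depends on the R version and seed handling, the series selected, and therefore the months returned, may differ from one installation to another.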