library(fpp3)

Introduction

In this document, we work through exercises 2.1, 2.2, 2.3, 2.4, 2.5, and 2.8 from Forecasting: Principles and Practice (3rd ed).

Exercises

2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

  • Use ? (or help()) to find out about the data in each series.
  • What is the time interval of each series?
  • Use autoplot() to produce a time plot of each series.
  • For the last plot, modify the axis labels and title.
#help("aus_production")
aus_production |>
  select(Bricks) |>
  head()
## # A tsibble: 6 x 2 [1Q]
##   Bricks Quarter
##    <dbl>   <qtr>
## 1    189 1956 Q1
## 2    204 1956 Q2
## 3    208 1956 Q3
## 4    197 1956 Q4
## 5    187 1957 Q1
## 6    214 1957 Q2

The aus_production dataset contains quarterly estimates of selected indicators of manufacturing production in Australia. Bricks records clay brick production in millions of bricks. The time interval for this series is quarterly.

Below is a time plot of the series:

aus_production |>
  select(Bricks) |>
  autoplot(Bricks, na.rm = TRUE)

#help("pelt")
pelt |>
  select(Lynx) |>
  head()
## # A tsibble: 6 x 2 [1Y]
##    Lynx  Year
##   <dbl> <dbl>
## 1 30090  1845
## 2 45150  1846
## 3 49150  1847
## 4 39520  1848
## 5 21230  1849
## 6  8420  1850

The pelt dataset contains Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx pelts traded from 1845 to 1935. Lynx records the number of Canadian Lynx pelts traded each year. The interval for the series is annual.

Below is a time plot of the series:

pelt |>
  select(Lynx) |>
  autoplot(Lynx)

#help("gafa_stock")
gafa_stock |>
  select(Close) |>
  head()
## # A tsibble: 6 x 3 [!]
## # Key:       Symbol [1]
##   Close Date       Symbol
##   <dbl> <date>     <chr> 
## 1  79.0 2014-01-02 AAPL  
## 2  77.3 2014-01-03 AAPL  
## 3  77.7 2014-01-06 AAPL  
## 4  77.1 2014-01-07 AAPL  
## 5  77.6 2014-01-08 AAPL  
## 6  76.6 2014-01-09 AAPL

The gafa_stock dataset includes historical stock prices from 2014 to 2018 for Google, Amazon, Facebook and Apple. Close is the closing price of each stock. The observations are daily, but the interval is irregular (the [!] in the tsibble header), because prices are only recorded on trading days.

Below is a time plot showing the closing price of each of the four stocks:

gafa_stock |>
  select(Close) |>
  autoplot(Close)

#help("vic_elec")
vic_elec |>
  select(Demand) |>
  head()
## # A tsibble: 6 x 2 [30m] <Australia/Melbourne>
##   Demand Time               
##    <dbl> <dttm>             
## 1  4383. 2012-01-01 00:00:00
## 2  4263. 2012-01-01 00:30:00
## 3  4049. 2012-01-01 01:00:00
## 4  3878. 2012-01-01 01:30:00
## 5  4036. 2012-01-01 02:00:00
## 6  3866. 2012-01-01 02:30:00

The vic_elec dataset includes operational electricity demand for Victoria, Australia. For Demand, it is the total electricity demand in MWh. The interval for the series is half-hourly.

Below is a time plot of the series, with the title and axis labels modified:

vic_elec |>
  select(Demand) |>
  autoplot(Demand) +
  labs(title = "Half-Hourly Electricity Demand for Victoria, Australia",
       y = "Electricity Demand (MWh)")
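
The [1Q], [1Y], [!] and [30m] markers in the tsibble headers above summarise these intervals. As a quick programmatic check, a minimal sketch using tsibble's interval():

# Confirm the reported time interval of each series.
tsibble::interval(aus_production)  # quarterly
tsibble::interval(pelt)            # annual
tsibble::interval(gafa_stock)      # irregular (trading days only)
tsibble::interval(vic_elec)        # 30-minute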

2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock |>
  select(Close) |>
  group_by(Symbol) |>
  filter(Close == max(Close))
## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Close Date       Symbol
##   <dbl> <date>     <chr> 
## 1  232. 2018-10-03 AAPL  
## 2 2040. 2018-09-04 AMZN  
## 3  218. 2018-07-25 FB    
## 4 1268. 2018-07-26 GOOG
  • $232.07 on 2018-10-03 for Apple
  • $2039.51 on 2018-09-04 for Amazon
  • $217.50 on 2018-07-25 for Facebook
  • $1268.33 on 2018-07-26 for Google
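
An equivalent way to get the same table, as a sketch, is to drop the tsibble structure and take the top row per stock with slice_max():

# Alternative: convert to a plain tibble, then keep the highest Close per Symbol.
gafa_stock |>
  as_tibble() |>
  group_by(Symbol) |>
  slice_max(Close, n = 1) |>
  select(Symbol, Date, Close)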

2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

  • Read the data into R
tute1 <- readr::read_csv("https://bit.ly/fpptute1", show_col_types = FALSE)
head(tute1)
## # A tibble: 6 × 4
##   Quarter    Sales AdBudget   GDP
##   <date>     <dbl>    <dbl> <dbl>
## 1 1981-03-01 1020.     659.  252.
## 2 1981-06-01  889.     589   291.
## 3 1981-09-01  795      512.  291.
## 4 1981-12-01 1004.     614.  292.
## 5 1982-03-01 1058.     647.  279.
## 6 1982-06-01  944.     602   254
  • Convert the data to time series
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
  • Construct time series plots of each of the three series
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

  • Check what happens when you don’t include facet_grid(). Without the facets, all three series are drawn in a single panel with a common y-axis, which makes series on different scales harder to compare.
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

2.4

The USgas package contains data on the demand for natural gas in the US.

  • Install the USgas package.
# install.packages("USgas")  # run once if the package is not already installed
  • Create a tsibble from us_total with year as the index and state as the key.
mytimeseries <- USgas::us_total |>
  as_tsibble(index = year, key = state)
  • Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
new_england <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
mytimeseries |>
  filter(state %in% new_england) |>
  autoplot(y) +
  labs(title = "Annual Natural Gas Consumption for the New England Area",
       y = "Gas Consumption (million cubic feet)")
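
Since the larger states dominate the shared y-axis, a faceted version (a sketch, with free y-scales) can make each state's trend easier to see:

# Sketch: facet by state with free y-scales so the smaller states are visible.
mytimeseries |>
  filter(state %in% new_england) |>
  autoplot(y) +
  facet_wrap(vars(state), scales = "free_y") +
  labs(title = "Annual Natural Gas Consumption by New England State",
       y = "Gas Consumption (million cubic feet)")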

2.5

  • Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
mytimeseries <- openxlsx::read.xlsx("https://bit.ly/fpptourism")
head(mytimeseries)
##      Quarter   Region           State  Purpose    Trips
## 1 1998-01-01 Adelaide South Australia Business 135.0777
## 2 1998-04-01 Adelaide South Australia Business 109.9873
## 3 1998-07-01 Adelaide South Australia Business 166.0347
## 4 1998-10-01 Adelaide South Australia Business 127.1605
## 5 1999-01-01 Adelaide South Australia Business 137.4485
## 6 1999-04-01 Adelaide South Australia Business 199.9126
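
The code above reads the file straight from a URL with openxlsx; the exercise itself suggests readxl::read_excel(), which works on a local copy. A minimal sketch, assuming tourism.xlsx has already been downloaded from the book website into the working directory:

# Alternative per the exercise: read a downloaded copy with readxl.
# "tourism.xlsx" is assumed to sit in the working directory.
tourism_raw <- readxl::read_excel("tourism.xlsx")
head(tourism_raw)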
  • Create a tsibble which is identical to the tourism tsibble from the tsibble package.
head(tsibble::tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.
mytimeseries |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter, key = c(Region, State, Purpose)) -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.
  • Find what combination of Region and Purpose had the maximum number of overnight trips on average.
suppressWarnings(
  mytimeseries |>
    # The Quarter index is kept by the tsibble, so the mean is computed per
    # Region/Purpose/Quarter; this therefore ranks individual quarters.
    group_by(Region, Purpose) |>
    summarise(Trips = mean(Trips)) |>
    arrange(desc(Trips)) |>
    head(5)
)
## # A tsibble: 5 x 4 [1Q]
## # Key:       Region, Purpose [3]
## # Groups:    Region [2]
##   Region    Purpose  Quarter Trips
##   <chr>     <chr>      <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4  985.
## 2 Sydney    Business 2001 Q4  948.
## 3 Sydney    Visiting 2016 Q4  921.
## 4 Sydney    Visiting 2017 Q4  920.
## 5 Sydney    Visiting 2017 Q1  916.

If averaging per quarter (the quarterly index is retained by the tsibble), the combination of region and purpose with the maximum number of overnight trips is Melbourne and Visiting in 2017 Q4, with 985.28 trips.

mytimeseries |>
  # index_by(year = 0) collapses the quarterly index to a single dummy value,
  # so the mean is taken over all quarters.
  index_by(year = 0) |>
  group_by(Region, Purpose) |>
  summarise(Trips = mean(Trips)) |>
  arrange(desc(Trips)) |>
  head(5)
## # A tsibble: 5 x 4 [?]
## # Key:       Region, Purpose [5]
## # Groups:    Region [3]
##   Region          Purpose   year Trips
##   <chr>           <chr>    <dbl> <dbl>
## 1 Sydney          Visiting     0  747.
## 2 Melbourne       Visiting     0  619.
## 3 Sydney          Business     0  602.
## 4 North Coast NSW Holiday      0  588.
## 5 Sydney          Holiday      0  550.

If we instead average over all quarters, the combination of region and purpose with the maximum number of overnight trips on average is Sydney and Visiting, with 747.27 trips.
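
An equivalent way to obtain the overall averages, as a sketch, is to drop the tsibble structure first so that the Quarter index no longer takes part in the grouping:

# Sketch: convert to a plain tibble so the quarterly index is ignored,
# then average each Region/Purpose combination over all quarters.
mytimeseries |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(Trips = mean(Trips), .groups = "drop") |>
  arrange(desc(Trips)) |>
  head(5)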

  • Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
mytimeseries |>
  group_by(State) |>
  summarise(Trips = sum(Trips)) |>
  head()
## # A tsibble: 6 x 3 [1Q]
## # Key:       State [1]
##   State Quarter Trips
##   <chr>   <qtr> <dbl>
## 1 ACT   1998 Q1  551.
## 2 ACT   1998 Q2  416.
## 3 ACT   1998 Q3  436.
## 4 ACT   1998 Q4  450.
## 5 ACT   1999 Q1  379.
## 6 ACT   1999 Q2  558.
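
A quick plot of the resulting state-level tsibble (a sketch):

# Sketch: plot the quarterly total trips for each state.
mytimeseries |>
  group_by(State) |>
  summarise(Trips = sum(Trips)) |>
  autoplot(Trips) +
  labs(title = "Total Quarterly Overnight Trips by State", y = "Overnight trips")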

2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

us_employment |>
  filter(Title == "Total Private") -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 4 [1M]
## # Key:       Series_ID [1]
##      Month Series_ID     Title         Employed
##      <mth> <chr>         <chr>            <dbl>
## 1 1939 Jan CEU0500000001 Total Private    25338
## 2 1939 Feb CEU0500000001 Total Private    25447
## 3 1939 Mar CEU0500000001 Total Private    25833
## 4 1939 Apr CEU0500000001 Total Private    25801
## 5 1939 May CEU0500000001 Total Private    26113
## 6 1939 Jun CEU0500000001 Total Private    26485
suppressMessages(
  mytimeseries |>
    autoplot()
)

suppressMessages(
  mytimeseries |>
    gg_season()
)

suppressMessages(
  mytimeseries |>
    gg_subseries()
)

suppressMessages(
  mytimeseries |>
    gg_lag()
)

suppressMessages(
  mytimeseries |>
    ACF(Employed) |>
      autoplot()
)

  • Can you spot any seasonality, cyclicity and trend?

There is a clear upward trend in total private employment. What initially looks like seasonality in the time plot turns out to be multi-year cyclical booms and busts.

  • What do you learn about the series?

Although it is consistently on the rise, private employment is always fighting against the cyclical nature of the economy.

  • What can you say about the seasonal patterns?

The only consistent seasonal pattern is a very slight rise in the monthly averages across the year, which I would attribute to the general upward trend rather than genuine seasonality.

  • Can you identify any unusual years?

Recession years, such as 2008, show a sharp drop in private employment numbers.
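
To look more closely at the recession years mentioned above, a minimal sketch zooming in on 2006-2012 with filter_index():

# Sketch: zoom in on the years around the 2008 recession.
mytimeseries |>
  filter_index("2006 Jan" ~ "2012 Dec") |>
  autoplot(Employed)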

aus_production |>
  select(Bricks) -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 2 [1Q]
##   Bricks Quarter
##    <dbl>   <qtr>
## 1    189 1956 Q1
## 2    204 1956 Q2
## 3    208 1956 Q3
## 4    197 1956 Q4
## 5    187 1957 Q1
## 6    214 1957 Q2
suppressMessages(
  mytimeseries |>
    autoplot(na.rm = TRUE)
)

suppressMessages(
  mytimeseries |>
    gg_season(na.rm = TRUE)
)

suppressMessages(
  mytimeseries |>
    gg_subseries(na.rm = TRUE)
)

suppressMessages(
  mytimeseries |>
    gg_lag()
)
## Warning: Removed 20 rows containing missing values (gg_lag).

suppressMessages(
  mytimeseries |>
    ACF(Bricks) |>
      autoplot(na.rm = TRUE)
)

  • Can you spot any seasonality, cyclicity and trend?

The trend is increasing production until about 1980, after which production levels off and gradually declines. There is seasonality, with quarters 2 and 3 having higher production than quarters 1 and 4. There also appear to be boom-and-bust cycles.

  • What do you learn about the series?

Brick production has tapered off, which is somewhat surprising given that housing construction carries on almost everywhere that is habitable.

  • What can you say about the seasonal patterns?

The more moderate temperatures in quarters 2 and 3 likely lead to increased demand for bricks, as more construction work takes place then.

  • Can you identify any unusual years?

1992 in particular saw a sharp drop in production.

pelt |>
  select(Hare) -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 2 [1Y]
##    Hare  Year
##   <dbl> <dbl>
## 1 19580  1845
## 2 19600  1846
## 3 19610  1847
## 4 11990  1848
## 5 28040  1849
## 6 58000  1850
suppressMessages(
  mytimeseries |>
    autoplot()
)

suppressMessages(
  mytimeseries |>
    gg_season()
)

Since the data are annual, there is no within-year seasonality, so gg_season() cannot produce a seasonal plot for this series.

suppressMessages(
  mytimeseries |>
    gg_subseries()
)

suppressMessages(
  mytimeseries |>
    gg_lag()
)

suppressMessages(
  mytimeseries |>
    ACF(Hare) |>
      autoplot()
)

  • Can you spot any seasonality, cyclicity and trend?

Since the series is annual there is no seasonality. Boom-and-bust cycles are clearly apparent, but there does not seem to be any long-term trend.

  • What do you learn about the series?

The number of hare pelts traded is almost certainly tied to the population cycle visible in the correlogram: roughly every five years the population swings from its lowest point to a maximum.

  • What can you say about the seasonal patterns?

Since the series is annual, there are no seasonal patterns.

  • Can you identify any unusual years?

1863 stands out as an unusual year, with a particularly large number of hare pelts traded.
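
As a quick check of the standout year, a sketch filtering for the largest value in the series:

# Sketch: confirm the year with the largest number of hare pelts traded.
pelt |>
  filter(Hare == max(Hare))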

PBS |>
  select(Cost) |>
  filter(ATC2 == "H02") |>
  group_by(ATC2) |>
  summarise(Cost = sum(Cost)) -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 3 [1M]
## # Key:       ATC2 [1]
##   ATC2     Month   Cost
##   <chr>    <mth>  <dbl>
## 1 H02   1991 Jul 429795
## 2 H02   1991 Aug 400906
## 3 H02   1991 Sep 432159
## 4 H02   1991 Oct 492543
## 5 H02   1991 Nov 502369
## 6 H02   1991 Dec 602652
suppressMessages(
  mytimeseries |>
    autoplot()
)

suppressMessages(
  mytimeseries |>
    gg_season()
)

suppressMessages(
  mytimeseries |>
    gg_subseries()
)

suppressMessages(
  mytimeseries |>
    gg_lag()
)

suppressMessages(
  mytimeseries |>
    ACF(Cost) |>
      autoplot()
)

  • Can you spot any seasonality, cyclicity and trend?

There is clear seasonality and an upward trend, but no obvious cyclic behaviour.

  • What do you learn about the series?

The cost of H02 prescriptions has grown steadily since 1991, with a strong, repeating within-year pattern sitting on top of that trend.

  • What can you say about the seasonal patterns?

Costs for H02 prescriptions drop sharply in February and then build back up through the year to a peak in January.
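
As a rough check on this pattern, a sketch of the average cost by calendar month (assuming the yearmonth index converts with as.Date()):

# Sketch: average H02 cost by calendar month across all years.
mytimeseries |>
  as_tibble() |>
  mutate(Mon = lubridate::month(as.Date(Month), label = TRUE)) |>
  group_by(Mon) |>
  summarise(AvgCost = mean(Cost)) |>
  arrange(desc(AvgCost))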

  • Can you identify any unusual years?

Around 2007 there is a departure from the usual pattern, with costs in the last quarter of the year declining rather than increasing.

us_gasoline -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 2 [1W]
##       Week Barrels
##     <week>   <dbl>
## 1 1991 W06    6.62
## 2 1991 W07    6.43
## 3 1991 W08    6.58
## 4 1991 W09    7.22
## 5 1991 W10    6.88
## 6 1991 W11    6.95
suppressMessages(
  mytimeseries |>
    autoplot()
)

suppressMessages(
  mytimeseries |>
    gg_season()
)

suppressMessages(
  mytimeseries |>
    gg_subseries()
)

suppressMessages(
  mytimeseries |>
    gg_lag()
)

suppressMessages(
  mytimeseries |>
    ACF(Barrels) |>
      autoplot()
)

  • Can you spot any seasonality, cyclicity and trend?

There is a general upward trend but no obvious seasonality; there does, however, appear to be some cyclic behaviour.

  • What do you learn about the series?

The weekly supply of gasoline (in millions of barrels per day) behaves a little like a stock price: successive weeks are highly correlated with one another, yet the series is hard to predict.

  • What can you say about the seasonal patterns?

It is surprising that supply does not fluctuate more strongly with periods such as summer, when people use their cars to travel more.

  • Can you identify any unusual years?

The inflection point around 2007, where the upward trend temporarily reverses.
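
To look at that inflection more closely, a minimal sketch zooming in on 2006-2010:

# Sketch: zoom in on the years around the 2007 inflection point.
us_gasoline |>
  filter_index("2006" ~ "2010") |>
  autoplot(Barrels)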