Setup

library(fpp3)
library(stringr)

Assignment

Please submit exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman online Forecasting book. Please submit both your Rpubs link as well as attach the .pdf file with your code.

Exercises

2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

  • Use ? (or help()) to find out about the data in each series.
help(aus_production)
class(aus_production)
help(pelt)
class(pelt)
help(gafa_stock)
class(gafa_stock)
help(vic_elec)
class(vic_elec)
  • What is the time interval of each series?
Time Series Interval
aus_production Quarterly
pelt Annual
gafa_stock Daily
vic_elec 1/2 hour

I used the following to help find the interval of each series.

index(aus_production)
index(pelt)
index(gafa_stock)
head(gafa_stock)
index(vic_elec)
head(vic_elec)
  • Use autoplot() to produce a time plot of each series.
aus_production |> autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

pelt |> autoplot(Lynx)

gafa_stock |> autoplot(Close)

vic_elec |> autoplot(Demand)

  • For the last plot, modify the axis labels and title.
vic_elec |> 
  autoplot(Demand) +
  labs(
    title = "Electricity Demand in Australia",
    x = "Time (1/2 Hour Intervals)",
    y = "Demand"
  )

2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock |>
  group_by(Symbol) |>
  filter(Close == max(Close)) |>
  select(Symbol, Date, Close)
## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AAPL   2018-10-03  232.
## 2 AMZN   2018-09-04 2040.
## 3 FB     2018-07-25  218.
## 4 GOOG   2018-07-26 1268.

2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

  1. You can read the data into R with the following script:
tute1 <- readr::read_csv("data/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(tute1)
## # A tibble: 6 × 4
##   Quarter    Sales AdBudget   GDP
##   <date>     <dbl>    <dbl> <dbl>
## 1 1981-03-01 1020.     659.  252.
## 2 1981-06-01  889.     589   291.
## 3 1981-09-01  795      512.  291.
## 4 1981-12-01 1004.     614.  292.
## 5 1982-03-01 1058.     647.  279.
## 6 1982-06-01  944.     602   254
  1. Convert the data to time series
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
  1. Construct time series plots of each of the three series
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

“AdBudget”, “GDP” and “Sales” are not facet’ed off into their own tiles. Rather, they are presented on a unified grid. Actually, this view is really nice for side by side comparison, though each appears to be in league of its own.

2.4

  1. Install the USgas package.
install.packages("USgas")
library(USgas)
head(us_total)
##   year   state      y
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
  1. Create a tsibble from us_total with year as the index and state as the key.
ts_us_total <- us_total |>
  as_tsibble(index = year, key = c(state))
  1. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
ts_us_total |>
  filter(state %in% c(
    "Maine", 
    "Vermont", 
    "New Hampshire", 
    "Massachusetts", 
    "Connecticut", 
    "Rhode Island"
  )) |>
  ggplot(mapping = aes(x = year, y = y, color = state)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Gas Consumption by State",
    subtitle = "New England",
    x = "Year",
    y = "Volume (Million Cubic Feet)"
  )

2.5

  1. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
tourism <- readxl::read_excel("./data/tourism.xlsx")
  1. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
tourism |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter, key = c(State, Region, Purpose))
## # A tsibble: 24,320 x 5 [1Q]
## # Key:       State, Region, Purpose [304]
##    Quarter Region   State Purpose  Trips
##      <qtr> <chr>    <chr> <chr>    <dbl>
##  1 1998 Q1 Canberra ACT   Business 150. 
##  2 1998 Q2 Canberra ACT   Business  99.9
##  3 1998 Q3 Canberra ACT   Business 130. 
##  4 1998 Q4 Canberra ACT   Business 102. 
##  5 1999 Q1 Canberra ACT   Business  95.5
##  6 1999 Q2 Canberra ACT   Business 229. 
##  7 1999 Q3 Canberra ACT   Business 109. 
##  8 1999 Q4 Canberra ACT   Business 159. 
##  9 2000 Q1 Canberra ACT   Business 105. 
## 10 2000 Q2 Canberra ACT   Business 202. 
## # ℹ 24,310 more rows
  1. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
tourism |>
  group_by(Region, Purpose) |>
  summarize(
    TripAverage = mean(Trips),
    .groups = "keep"
  ) |>
  arrange(desc(TripAverage)) |>
  head(n = 1)
## # A tibble: 1 × 3
## # Groups:   Region, Purpose [1]
##   Region Purpose  TripAverage
##   <chr>  <chr>          <dbl>
## 1 Sydney Visiting        747.
  1. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
tourism |>
  group_by(State, Quarter) |>
  summarize(
    TripTotal = sum(Trips),
    .groups = "keep"
  ) |>
  mutate(
    Quarter = yearquarter(Quarter)
  ) |>
  as_tsibble(index = Quarter, key = c(State))
## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
## # Groups:    State @ Quarter [640]
##    State Quarter TripTotal
##    <chr>   <qtr>     <dbl>
##  1 ACT   1998 Q1      551.
##  2 ACT   1998 Q2      416.
##  3 ACT   1998 Q3      436.
##  4 ACT   1998 Q4      450.
##  5 ACT   1999 Q1      379.
##  6 ACT   1999 Q2      558.
##  7 ACT   1999 Q3      449.
##  8 ACT   1999 Q4      595.
##  9 ACT   2000 Q1      600.
## 10 ACT   2000 Q2      557.
## # ℹ 630 more rows

2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Helpers

dump_ts <- function(ts, title) {
  print(title)
  cat("index: ")
  print(index(ts))
  cat("key: ")
  print(key(ts))
  print(head(ts))
}

autoplot_ts <- function(ts, variable, title) {
  dump_ts(ts, title)
  autoplot(ts, {{ variable }}) +
    labs(
      title = title,
      x = "Time",
      y = "Value"
    )
  
  ACF(ts, {{ variable }}) |>
    autoplot() +
    labs(title = str_c("ACF: ", title))
}

seasonal_plot <- function(ts, variable, title) {
  gg_season(ts, {{ variable }}) +
    labs(
      title = str_c("Seasonal Plot: ", title),
      x = "Time",
      y = "Value"
    )
}

Filtering

Apply filtering to us_employment and PBS to get the time series we want.

us_employment_tp <- us_employment |>
  filter(Title == "Total Private")
PBS_H02 <- PBS |>
  filter(ATC2 == "H02")

Exploration

Let’s have a look at each one of these time series

autoplot_ts(us_employment_tp, Employed, "Total Private Employment in US")
## [1] "Total Private Employment in US"
## index: Month
## key: [[1]]
## Series_ID
## 
## # A tsibble: 6 x 4 [1M]
## # Key:       Series_ID [1]
##      Month Series_ID     Title         Employed
##      <mth> <chr>         <chr>            <dbl>
## 1 1939 Jan CEU0500000001 Total Private    25338
## 2 1939 Feb CEU0500000001 Total Private    25447
## 3 1939 Mar CEU0500000001 Total Private    25833
## 4 1939 Apr CEU0500000001 Total Private    25801
## 5 1939 May CEU0500000001 Total Private    26113
## 6 1939 Jun CEU0500000001 Total Private    26485

seasonal_plot(us_employment_tp, Employed, "Total Private Employment in US")

gg_subseries(us_employment_tp, Employed)

gg_lag(us_employment_tp, Employed)

autoplot_ts(aus_production, Bricks, "Brick Production in Australia")
## [1] "Brick Production in Australia"
## index: Quarter
## key: list()
## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7

seasonal_plot(aus_production, Bricks, "Brick Production in Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_subseries(aus_production, Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_lag(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).

autoplot_ts(pelt, Hare, "Hare Pelt Trading in Canada")
## [1] "Hare Pelt Trading in Canada"
## index: Year
## key: list()
## # A tsibble: 6 x 3 [1Y]
##    Year  Hare  Lynx
##   <dbl> <dbl> <dbl>
## 1  1845 19580 30090
## 2  1846 19600 45150
## 3  1847 19610 49150
## 4  1848 11990 39520
## 5  1849 28040 21230
## 6  1850 58000  8420

# Error: The data must contain at least one observation per seasonal period. 
gg_subseries(pelt, Hare)

gg_lag(pelt, Hare)

autoplot_ts(PBS_H02, Cost, 'Medicare "H02" Cost in Australia')
## [1] "Medicare \"H02\" Cost in Australia"
## index: Month
## key: [[1]]
## Concession
## 
## [[2]]
## Type
## 
## [[3]]
## ATC1
## 
## [[4]]
## ATC2
## 
## # A tsibble: 6 x 9 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [1]
##      Month Concession   Type      ATC1  ATC1_desc ATC2  ATC2_desc Scripts   Cost
##      <mth> <chr>        <chr>     <chr> <chr>     <chr> <chr>       <dbl>  <dbl>
## 1 1991 Jul Concessional Co-payme… H     Systemic… H02   CORTICOS…   63261 317384
## 2 1991 Aug Concessional Co-payme… H     Systemic… H02   CORTICOS…   53528 269891
## 3 1991 Sep Concessional Co-payme… H     Systemic… H02   CORTICOS…   52822 269703
## 4 1991 Oct Concessional Co-payme… H     Systemic… H02   CORTICOS…   54016 280418
## 5 1991 Nov Concessional Co-payme… H     Systemic… H02   CORTICOS…   49281 268070
## 6 1991 Dec Concessional Co-payme… H     Systemic… H02   CORTICOS…   51798 277139

seasonal_plot(PBS_H02, Cost, 'Medicare "H02" Cost in Australia')

gg_subseries(PBS_H02, Cost)

# gg_lag(PBS_H02, Cost)

autoplot_ts(us_gasoline, Barrels, "Gasoline Production in US")
## [1] "Gasoline Production in US"
## index: Week
## key: list()
## # A tsibble: 6 x 2 [1W]
##       Week Barrels
##     <week>   <dbl>
## 1 1991 W06    6.62
## 2 1991 W07    6.43
## 3 1991 W08    6.58
## 4 1991 W09    7.22
## 5 1991 W10    6.88
## 6 1991 W11    6.95

# Let's zoom in and see if we can make sense of the seasonal/cyclic motion
us_gasoline |>
  filter(
    Week >= yearweek(as.Date("1992-01-01")) & 
      Week <= yearweek(as.Date("1993-01-01"))
  ) |>
  autoplot(Barrels) + 
  labs(
    title = "Gasoline Production in US (1992 - 1993)",
    x = "Time",
    y = "Barrels"
  )

seasonal_plot(us_gasoline, Barrels, "Gasoline Production in US")

gg_subseries(us_gasoline, Barrels)

gg_lag(us_gasoline, Barrels)

2.8.1

Can you spot any seasonality, cyclicity and trend?

Time Series Seasonality Cyclicity Trend
us_employment_tp$Employed Yearly No Steady increase
aus_production$Bricks Yearly It looks like there is a long term cycle of ~5 years. It’s more evident after ~1978. Ascending to 1980, then descending
pelt$Hare No Yes, it approximates 10 years No
PBS_H02$Cost All but “General/Co-payments” have a clear annual period No Ascending
us_gasoline$Barrels Yes, yearly I can’t tell whether it’s noise or a cycle. It’s erratic, but always present. It cycles ~10/14 days Steady increase between (1990, 2007) and then it falls and rises again between (2007, 2017)

2.8.2

What do you learn about the series?

You learn the trends and tendencies of the past. I imagine this is the basis for predictions about the future. It’s going to be interesting to see it all come together. Specifically, we learn:

  • about the time series seasonality, cyclicity and trends,
  • the time interval of each series,
  • that autoplot() is a great starting place for exploring time series data,
  • that it can be difficult to distinguish between seasonality and cyclicity.

2.8.3

What can you say about the seasonal patterns?

us_employment_tp$Employed: The seasonal pattern is subtle and a little elusive. It’s clear to see in the autoplot() but difficult to discern in the gg_season() plot. What can be said about this pattern? I could imagine a few reasons for this pattern:

  • Summer and Fall are busy seasons for many industries such as farming.
  • The holiday season is a busy time for many industries.
  • Election season could account for job creation.

aus_production$Bricks: The seasonal pattern is clear and consistent. It looks like the brick industry in Australia peaks in the 3rd quarter. I can only come up with one reason for this pattern, and that is the weather is nice in the 3rd quarter, so it’s a good time to build.

pelt$Hare: There is no seasonal pattern. The number of hares traded each year is erratic.

PBS_H02$Cost: The seasonal pattern is clear and consistent, though its pattern varies depending on the Type. Speaking of which, I was having trouble understanding how autoplot knew about the Type, but I see that it is part of the key of PBS_H02. Breaking it down by Type:

  • Concessional/Co-payments: It looks like the number of concessional co-payments peaks in the 2nd quarter.
  • Concessional/Safety net: The number of concessional safety net co-payments dips quickly in the 1st quarter and bottoms out in the 2nd quarter. It then climbs back over the 3rd and 4th quarter.
  • General/Co-payments: It looks like noise. I can’t make sense of the pattern.
  • General/Safety net: Almost identical to Concessional/Safety net.

I wonder if the inverted patterns have to do with balancing budgets.

2.8.4

Can you identify any unusual years?

Yes:

  • ~2010 is an unusual year for the employment industry in the US. It looks like there was a significant drop in employment. I suspect the 2008 financial crisis might have something to do with this.
  • ~1982 is an unusual year for the brick industry in Australia. Production plummets.
  • ~1862 and ~1885 are both unusual years for the pelt industry in Canada. The number of hares traded spike significanly during these years.