Overview

This homework assignment is focused on Time Series and includes Exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman text.

Setup

options(repos = list(CRAN="http://cran.rstudio.com/"))

knitr::opts_chunk$set(echo = TRUE)

install.packages("USgas")
## 
## The downloaded binary packages are in
##  /var/folders/4l/182nghl547v3mxtj6p_9dgph0000gn/T//Rtmp6K08Ut/downloaded_packages
library(USgas)
install.packages("fpp3")
## 
## The downloaded binary packages are in
##  /var/folders/4l/182nghl547v3mxtj6p_9dgph0000gn/T//Rtmp6K08Ut/downloaded_packages
library(fpp3)
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.3
## ✔ dplyr       1.1.3     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.1
## ✔ lubridate   1.9.2     ✔ fable       0.3.3
## ✔ ggplot2     3.4.2     ✔ fabletools  0.3.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
#library(tidyverse)

Problem 2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

?aus_production
?pelt
?gafa_stock
?vic_elec

What is the time interval of each series?

  1. aus_production –> quarterly
  2. pelt –> yearly
  3. gafa_stock –> irregular trading days
  4. vic_elec –> half hourly

Use autoplot() to produce a time plot of each series.

aus_production %>% autoplot(Bricks)
## Warning: Removed 20 rows containing missing values (`geom_line()`).

pelt %>% autoplot(Lynx)

gafa_stock %>% autoplot(Close)

vic_elec %>% autoplot(Demand)

For the last plot, modify the axis labels and title.

vic_elec %>% 
  autoplot(Demand) +
  labs(title = "Total Electricity Demand in MWh",
       subtitle = "Victoria, Australia",
       y = "Demand (MWh)",
       x = "Time (30-min)")

Problem 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>% filter(Symbol == 'AAPL') %>% slice(which.max(Close))
## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800
gafa_stock %>% filter(Symbol == 'GOOG') %>% slice(which.max(Close))
## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close  Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>
## 1 GOOG   2018-07-26  1251 1270. 1249. 1268.     1268. 2405600
gafa_stock %>% filter(Symbol == 'AMZN') %>% slice(which.max(Close))
## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close  Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>
## 1 AMZN   2018-09-04 2026. 2050.  2013 2040.     2040. 5721100
gafa_stock %>% filter(Symbol == 'FB') %>% slice(which.max(Close))
## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 FB     2018-07-25  216.  219.  214.  218.      218. 58954200
  1. AAPL - 10/3/2018
  2. GOOG - 7/26/2018
  3. AMZN - 9/4/2018
  4. FB - 7/25/2018

Problem 2.3

tute1 <- readr::read_csv("Homework #1/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index=Quarter)


mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x=Quarter,y = value, color = name)) +
  geom_line() +
  facet_grid(name ~ ., scale = "free_y")

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x=Quarter,y = value, color = name)) +
  geom_line() 

When you don’t include facet_grid() in the last chart, each of the series are included in the same chart, and identified using a color and legend.

Problem 2.4

The USgas package contains data on the demand for natural gas in the US.

us_total %>%
  as_tsibble(index=year,
             key=state)
## # A tsibble: 1,266 x 3 [1Y]
## # Key:       state [53]
##     year state        y
##    <int> <chr>    <int>
##  1  1997 Alabama 324158
##  2  1998 Alabama 329134
##  3  1999 Alabama 337270
##  4  2000 Alabama 353614
##  5  2001 Alabama 332693
##  6  2002 Alabama 379343
##  7  2003 Alabama 350345
##  8  2004 Alabama 382367
##  9  2005 Alabama 353156
## 10  2006 Alabama 391093
## # ℹ 1,256 more rows
us_total %>% 
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
                      "Connecticut", "Rhode Island")) %>%
  rename("value" = "y") %>%
  ggplot(aes(x = year, y = value, color=state)) +
  geom_line() +
  facet_grid(state ~ ., scales="free_y")

us_total %>% 
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
                      "Connecticut", "Rhode Island")) %>%
  rename("value" = "y") %>%
  ggplot(aes(x = year, y = value, color=state)) +
  geom_line()

Problem 2.5

  1. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
tourism_data <- readxl::read_excel("Homework #1/tourism.xlsx")
  1. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
tourism_data <- tourism_data %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index=Quarter,
             key = c("Region", "State", "Purpose"))
  1. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
as_tibble(tourism_data) %>%
  group_by(Region, Purpose) %>%
  summarize(avg_trips = mean(Trips)) %>%
  arrange(desc(avg_trips), by_group=TRUE)
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 304 × 3
## # Groups:   Region [76]
##    Region          Purpose  avg_trips
##    <chr>           <chr>        <dbl>
##  1 Sydney          Visiting      747.
##  2 Melbourne       Visiting      619.
##  3 Sydney          Business      602.
##  4 North Coast NSW Holiday       588.
##  5 Sydney          Holiday       550.
##  6 Gold Coast      Holiday       528.
##  7 Melbourne       Holiday       507.
##  8 South Coast     Holiday       495.
##  9 Brisbane        Visiting      493.
## 10 Melbourne       Business      478.
## # ℹ 294 more rows

The Combination of Sydney and Visiting had the maximum number of overnight trips on average.

  1. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
as_tibble(tourism_data) %>%
  group_by(Quarter, State) %>%
  summarize(`Total Trips` = sum(Trips)) %>%
  tsibble(index=Quarter,
          key = State)
## `summarise()` has grouped output by 'Quarter'. You can override using the
## `.groups` argument.
## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
##    Quarter State `Total Trips`
##      <qtr> <chr>         <dbl>
##  1 1998 Q1 ACT            551.
##  2 1998 Q2 ACT            416.
##  3 1998 Q3 ACT            436.
##  4 1998 Q4 ACT            450.
##  5 1999 Q1 ACT            379.
##  6 1999 Q2 ACT            558.
##  7 1999 Q3 ACT            449.
##  8 1999 Q4 ACT            595.
##  9 2000 Q1 ACT            600.
## 10 2000 Q2 ACT            557.
## # ℹ 630 more rows

Problem 2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

  1. Can you spot any seasonality, cyclicity and trend?

  2. What do you learn about the series?

  3. What can you say about the seasonal patterns?

  4. Can you identify any unusual years?

  5. US Employment

us_employment %>% 
  filter(Title == "Total Private") %>%
  autoplot(Employed) +
  labs(title = "Time Series of US Employment",
       subtitle = "Total Private")

us_employment %>% 
  filter(Title == "Total Private") %>%
  gg_season(Employed)

us_employment %>% 
  filter(Title == "Total Private") %>%
  gg_subseries(Employed)

us_employment %>% 
  filter(Title == "Total Private") %>%
  gg_lag(Employed)

us_employment %>% 
  filter(Title == "Total Private") %>%
  ACF(Employed) %>%
  autoplot()

Looking at the plots of the US Employment data, we see that there has been an overall positive trends during the period covered by the time series. There isn’t any indication in the data that the data is seasonal, as the data appears to have the same characteristics across each of the months. When looking at both the lag and autocorrelation plots, it seems as though the overall data is heavily driven by the overall macro trend - with each data point being highly correlated to the prior/previous data points in the time series.

When looking at the longer time series, you can see there are a number of years in which there’s a pullback in the data. However, these peaks and troughs appear to occur at such irregular intervals that it becomes hard to define them as cyclical periods and/or to define the duration of such cycles if they do exist.

  1. Australian Brick Production
aus_production %>% 
  autoplot(Bricks) +
  labs(title = "Time Series of Australian Brick Production")
## Warning: Removed 20 rows containing missing values (`geom_line()`).

aus_production %>% 
  gg_season(Bricks)
## Warning: Removed 20 rows containing missing values (`geom_line()`).

aus_production %>% 
  gg_subseries(Bricks)
## Warning: Removed 5 rows containing missing values (`geom_line()`).

aus_production %>% 
  gg_lag(Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>% 
  ACF(Bricks) %>%
  autoplot()

The data has exhibited a strong positive trend up until the peak in 1980, and has sense been operating at a monthly negative trend. For the most part the data has demonstrated seasonal behavior, with the production seeing it’s peack in Q3 and it’s nadir in Q1. Additionally the data seems to suggest the presence of cyclicality.

  1. Snowshoe Hare trading
pelt %>%
  autoplot(Hare) +
  labs(title = "Time Series of Snowshoe Hare Trading by Hudson Bay Company",
       subtitle = "1845 - 1935")

#pelt %>% gg_season(Hare, period="year")

pelt %>%
  gg_subseries(Hare)

pelt %>%
  gg_lag(Hare)

pelt %>%
  ACF(Hare) %>%
  autoplot()

When looking at the Snowshoe Hare data, we were unable to view seasonality trends given that the data was not broken down into intra-year seasonal periods (e.g. days, weeks, months, quarter, etc.). However, while there is no evidence of an overall trends, the data is definitely cyclcal. When looking at the ACF plot, we see that the cyclicality appears to take place over a 10 year period - with the data seeing a peak everry 10 years, and the data is also seeing troughts 5-years later, that are occurring every 10-years as well.

  1. Australian Prescription costs
PBS %>% 
  filter(ATC2 == "H02") %>%
  autoplot(Cost) + 
  labs(title = "Time Series of Australian Monthly Medicare Prescription Cost", 
       subtitle = "Limited to H02 Anatomical Therapeutic Chemical Index")

PBS %>% 
  filter(ATC2 == "H02") %>%
  gg_season(Cost)

PBS %>% 
  filter(ATC2 == "H02") %>%
  gg_subseries(Cost)

#PBS %>% 
#  filter(ATC2 == "H02") %>%
#  gg_lag(Cost)

PBS %>% 
  filter(ATC2 == "H02") %>%
  ACF(Cost) %>%
  autoplot()

  1. US motor gasoline supply
us_gasoline %>%
  autoplot(Barrels) +
  labs(title = "TIme Series of US Finished Motor Gasoline Supplied")

us_gasoline %>%
  gg_season(Barrels)

us_gasoline %>%
  gg_subseries(Barrels)

us_gasoline %>%
  gg_lag(Barrels)

us_gasoline %>%
  ACF(Barrels) %>%
  autoplot()

The general trend was positive up until the early 2000s, before declining and then rising again. However, it appears that we are at the beginnings of a decline.