Loading libraries

library(fpp3)

Question 2.1:

Part A:

Use autoplot() to plot some of the series in these data sets.

gafa_stock shows the daily opening and closing prices of different stocks. Below is a graph showing the daily high price for each stock.
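The plotting code is not shown in this write-up. A minimal sketch of how such a graph could be produced, assuming the High column of gafa_stock is the one plotted (the axis labels are my own):

```r
library(fpp3)

# gafa_stock is keyed by Symbol, so autoplot() draws one line per stock.
autoplot(gafa_stock, High) +
  labs(title = "Daily high price by stock", y = "High (USD)")
```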

PBS shows sales data on pharmaceuticals in Australia. Below we have a graph showing the total co-pay costs per month from Jul 1991 to Jun 2008. There seems to be some seasonality within each year and an overall upward trend across the years.
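A sketch of one way to build the monthly total, assuming the graph sums the Cost column over every series in PBS. If the original graph was restricted to co-payment scripts only, a filter on the Type column would be needed first; the name TotalCost is introduced here for illustration.

```r
library(fpp3)

# summarise() on a tsibble keeps the index (Month) and aggregates over all keys.
pbs_total <- PBS %>%
  summarise(TotalCost = sum(Cost))

autoplot(pbs_total, TotalCost) +
  labs(title = "Total monthly PBS cost", y = "Cost (AUD)")
```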

vic_elec contains half-hourly electricity demand for Victoria, Australia, from 2012 through 2014. The graph below shows the seasonal pattern in demand, with both the coldest and hottest months having the highest demand.
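A minimal sketch of the demand plot, assuming the Demand column was plotted directly (labels are my own):

```r
library(fpp3)

# vic_elec has a single series, so autoplot() draws one line of demand over time.
autoplot(vic_elec, Demand) +
  labs(title = "Electricity demand, Victoria", y = "Demand")
```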

pelt contains yearly pelt sales for Hare and Lynx pelts. The graph below shows total pelt sales by year from 1845 to 1935. There seems to be a 10-year cycle in pelt sales, with the peak occurring in the 5th year.
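A sketch of the total pelt series, assuming "total" means the sum of the Hare and Lynx columns; the column name Total is introduced here for illustration.

```r
library(fpp3)

# Add the two species together to get one total series per year.
pelt_total <- pelt %>%
  mutate(Total = Hare + Lynx)

autoplot(pelt_total, Total) +
  labs(title = "Total pelts traded per year", y = "Pelts")
```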

Part B:

What is the time interval of each series?
gafa_stock: Has daily stock prices, but only for days when the stock market is open (no weekends), so the series is irregularly spaced.
PBS: Monthly pharmaceutical sales.
vic_elec: Has data on demand for electricity every 30 minutes.
pelt: Has yearly pelt sales data.
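These intervals can also be checked programmatically rather than read off the printed header. A short sketch using interval() and is_regular() from the tsibble package (loaded by fpp3):

```r
library(fpp3)

# interval() reports the detected spacing of the index.
interval(PBS)
interval(vic_elec)
interval(pelt)

# gafa_stock only has trading days, so it is flagged as irregular
# (this is what the [!] in its printed header means).
is_regular(gafa_stock)
```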

Question 2.2:

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800
## 2 AMZN   2018-09-04 2026. 2050. 2013  2040.     2040.  5721100
## 3 FB     2018-07-25  216.  219.  214.  218.      218. 58954200
## 4 GOOG   2018-07-26 1251  1270. 1249. 1268.     1268.  2405600

AAPL: 2018-10-03 ($232)
AMZN: 2018-09-04 ($2040)
FB: 2018-07-25 ($218)
GOOG: 2018-07-26 ($1268)

Question 2.3:

Download the file tute1.csv here, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

Part A:

You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")
tute1
## # A tibble: 100 x 4
##    Quarter    Sales AdBudget   GDP
##    <date>     <dbl>    <dbl> <dbl>
##  1 1981-03-01 1020.     659.  252.
##  2 1981-06-01  889.     589   291.
##  3 1981-09-01  795      512.  291.
##  4 1981-12-01 1004.     614.  292.
##  5 1982-03-01 1058.     647.  279.
##  6 1982-06-01  944.     602   254 
##  7 1982-09-01  778.     531.  296.
##  8 1982-12-01  932.     608.  272.
##  9 1983-03-01  996.     638.  260.
## 10 1983-06-01  908.     582.  280.
## # ... with 90 more rows

Part B:

Convert the data to time series

mytimeseries <- tute1 %>%
  # The data are quarterly, so the index should be a yearquarter, not a yearmonth
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

Part C:

Construct time series plots of each of the three series

Without facet_grid(), all three series (AdBudget, GDP, Sales) are drawn on a single graph with a shared y-axis, rather than in three separate panels.
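The plotting code is not shown here. A sketch of the faceted version, using only the first eight rows of tute1 as printed earlier (values rounded as displayed) so that it runs without the CSV file; the names tute1_head, ts_head and long are introduced for illustration. With the real file, mytimeseries would take the place of ts_head.

```r
library(fpp3)

# First eight rows of tute1 as printed in the console output (rounded values).
tute1_head <- tibble(
  Quarter  = as.Date(c("1981-03-01", "1981-06-01", "1981-09-01", "1981-12-01",
                       "1982-03-01", "1982-06-01", "1982-09-01", "1982-12-01")),
  Sales    = c(1020, 889, 795, 1004, 1058, 944, 778, 932),
  AdBudget = c(659, 589, 512, 614, 647, 602, 531, 608),
  GDP      = c(252, 291, 291, 292, 279, 254, 296, 272)
)

ts_head <- tute1_head %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

# Reshape to long form so each series can get its own panel.
long <- ts_head %>%
  pivot_longer(-Quarter)

ggplot(long, aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")  # drop this line to overlay the series
```

Because the three series sit on different scales, scales = "free_y" lets each panel use its own y-axis.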

Question 2.4:

The USgas package contains data on the demand for natural gas in the US.

Part A:

Install the USgas package.

# install.packages("USgas")
library(USgas)

Part B:

Create a tsibble from us_total with year as the index and state as the key.

us_total <- us_total %>%
  as_tsibble(key = state,
             index = year)

Part C:

Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

The graph above shows the annual natural gas consumption by state in New England. We can see that the two biggest consumers are Connecticut and Massachusetts. They both seem to have an upward trend over the years, while the other states seem to be on a downward trend or have no trend at all.
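The code for this graph is not shown. A sketch of how it could be produced, assuming the consumption column in us_total is named y (as I believe it is in the USgas package) and that states are stored by full name; new_england and ne_gas are names introduced for illustration.

```r
library(fpp3)
library(USgas)

new_england <- c("Maine", "Vermont", "New Hampshire",
                 "Massachusetts", "Connecticut", "Rhode Island")

# Start from the package's data frame so the sketch does not depend on session state.
ne_gas <- USgas::us_total %>%
  as_tsibble(key = state, index = year) %>%
  filter(state %in% new_england)

# Assumption: `y` holds annual consumption.
autoplot(ne_gas, y) +
  labs(title = "Annual natural gas consumption, New England",
       y = "Consumption")
```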

Question 2.5:

Part A:

Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

tourism <- readxl::read_excel('tourism.xlsx')

Part B:

Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism_tsib <- tourism %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region,State,Purpose),
             index = Quarter)

Part C:

Find what combination of Region and Purpose had the maximum number of overnight trips on average.

head(tourism %>%
  group_by(Region,Purpose) %>%
  summarise(TripsAvg = mean(Trips)) %>%
  arrange(desc(TripsAvg)), n = 1)
## # A tibble: 1 x 3
## # Groups:   Region [1]
##   Region Purpose  TripsAvg
##   <chr>  <chr>       <dbl>
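## 1 Sydney Visiting     747.

The combination of the Sydney region and the Visiting purpose had the highest average number of overnight trips (TripsAvg of about 747).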

Part D:

Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism_tsib_2 <- tourism %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  select(-c(Region,Purpose)) %>%
  group_by(Quarter,State) %>%
  summarise(Trips = sum(Trips)) %>%
  as_tsibble(key = c(State),
             index = Quarter)

tourism_tsib_2
## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
## # Groups:    @ Quarter [80]
##    Quarter State Trips
##      <qtr> <chr> <dbl>
##  1 1998 Q1 ACT    551.
##  2 1998 Q2 ACT    416.
##  3 1998 Q3 ACT    436.
##  4 1998 Q4 ACT    450.
##  5 1999 Q1 ACT    379.
##  6 1999 Q2 ACT    558.
##  7 1999 Q3 ACT    449.
##  8 1999 Q4 ACT    595.
##  9 2000 Q1 ACT    600.
## 10 2000 Q2 ACT    557.
## # ... with 630 more rows

Question 2.8:

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

Using the autoplot function we can clearly see an upward trend over time. However, it is somewhat hard to see any kind of cyclical or seasonal pattern.

Looking at the gg_season and gg_subseries plots we can see a spike in sales during the holiday season, building from around November into December, followed by a dip in January. We also see a spike in sales around March, but I don’t have an explanation for that.

Finally, gg_lag and, even more so, the ACF graph confirm the trend: the ACF decreases slowly as the lags increase. We can also see slight seasonality in the small spikes every 12 months, which give the ACF graph a “scalloped” shape.
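The plots discussed above are not reproduced here. A sketch of the calls that would generate them, assuming Turnover is the column being examined; the lag settings (lags 1 to 12 for gg_lag, 36 lags for the ACF) are my own choices, not necessarily those used originally.

```r
library(fpp3)

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

autoplot(myseries, Turnover)       # overall trend
gg_season(myseries, Turnover)      # one line per year, by month
gg_subseries(myseries, Turnover)   # one panel per month
gg_lag(myseries, Turnover, geom = "point", lags = 1:12)

# Three years of lags make the 12-month seasonal spikes easier to see.
acf_tbl <- myseries %>%
  ACF(Turnover, lag_max = 36)
autoplot(acf_tbl)
```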