Load Libraries

library(tidyverse)
library(fpp3)
library(readxl)

2.1 Exercise

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

?aus_production  # quarterly time interval
?pelt            # annual time interval
?gafa_stock     # irregular trading days from 2014-2018
?vic_elec       # every 30 min daily

The time interval for aus_production is quarterly, covering Beer, Tobacco, Bricks, Cement, Electricity and Gas.

The time interval for pelt is annual, covering pelt trading records for Lynx and Hare.

The time interval for gafa_stock is irregular: it contains only trading days, from 2014 to 2018, for Google, Amazon, Facebook and Apple.

The time interval for vic_elec is every 30 minutes, tracking demand, temperature and whether the day is a holiday.
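
These intervals can also be confirmed programmatically; a quick sketch using tsibble's interval() accessor:

# confirm each tsibble's interval
interval(aus_production)  # quarterly
interval(pelt)            # annual
interval(gafa_stock)      # irregular (built with regular = FALSE)
interval(vic_elec)        # half-hourly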

Bricks

aus_production %>%
  autoplot(Bricks) +
  labs( y = "million units",
        title = "Australian clay brick production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

Lynx

pelt %>%
  autoplot(Lynx) +
  labs( y = "Number of pelts traded",
        title = "Annual Canadian Lynx pelt trading")

Close

gafa_stock %>%
  autoplot(Close) +
  labs( y = "$US",
        title = "Closing stock price")

Demand

vic_elec %>%
  autoplot(Demand) +
  labs( y = "Total electricity demand (MWh)",
        title = "Half-hourly electricity demand for Victoria, Australia")

2.2 Exercise

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

# peak closing price per stock: group by Symbol, keep rows equal to the max Close
gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  select(Symbol, Date, Close) %>%
  arrange(desc(Close))
## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AMZN   2018-09-04 2040.
## 2 GOOG   2018-07-26 1268.
## 3 AAPL   2018-10-03  232.
## 4 FB     2018-07-25  218.

Amazon had the highest peak closing price, $2,039.51 on 2018-09-04, followed by Google, Apple and Facebook.
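
An equivalent formulation, for comparison, drops the tsibble structure and uses dplyr's slice_max(); a sketch:

# keep the single highest Close per Symbol
gafa_stock %>%
  as_tibble() %>%
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%
  select(Symbol, Date, Close) %>%
  arrange(desc(Close))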

2.3 Exercise

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

  1. You can read the data into R with the following script:

tute1 <- read_csv("https://raw.githubusercontent.com/AnnaMoy/Data-624/main/tute1.csv")

  2. Convert the data to time series:

mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

  3. Construct time series plots of each of the three series.

Check what happens when you don't include facet_grid().
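
The plotting code for step 3 is not shown above; a minimal sketch following the book's pivot-and-facet approach, using the mytimeseries object from step 2:

# pivot the three series into long form, then draw one panel per series
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Dropping the final facet_grid() line gives the single-chart version discussed next.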

When you take out facet_grid(), all three series are drawn on a single chart instead of in separate panels with their own scales. Faceting is helpful when series share a time index but have very different ranges, since each one can be read on its own scale. With everything on one chart it is hard to judge the range of Sales, AdBudget and GDP individually.

2.4 Exercises

The USgas package contains data on the demand for natural gas in the US.

  1. Install the USgas package.

  2. Create a tsibble from us_total with year as the index and state as the key.

  3. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

library(USgas)
# create tsibble
#index = time, key = categorical information
us_total <- us_total %>%
  as_tsibble(key = state,
             index = year)

# filter for the six New England states
us_total %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
  autoplot(y) +
  labs( y = "Gas consumption",
        title = "Annual natural gas consumption by New England state")

2.5 Exercises

  1. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

# read the excel file from computer
tour <- read_excel("/Users/zhianna/Downloads/tourism.xlsx")

  2. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

# create tsibble identical to tourism tsibble
tour1 <- tour %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose),
             index = Quarter)

  3. Find what combination of Region and Purpose had the maximum number of overnight trips on average.

Based on the ranking below, the combination with the maximum number of overnight trips on average was Region: Melbourne and Purpose: Visiting (see the caveat after the output).

# combination of Purpose and Region with the highest average trips
purpose_region <- tour1 %>%
  group_by(Purpose, Region) %>%
  summarise(avg_trips = mean(Trips)) %>%
  arrange(desc(avg_trips))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Purpose`, `Region`, `Quarter` first.
purpose_region
## # A tsibble: 24,320 x 4 [1Q]
## # Key:       Purpose, Region [304]
## # Groups:    Purpose [4]
##    Purpose  Region          Quarter avg_trips
##    <chr>    <chr>             <qtr>     <dbl>
##  1 Visiting Melbourne       2017 Q4      985.
##  2 Business Sydney          2001 Q4      948.
##  3 Visiting Sydney          2016 Q4      921.
##  4 Visiting Sydney          2017 Q4      920.
##  5 Visiting Sydney          2017 Q1      916.
##  6 Holiday  South Coast     1998 Q1      915.
##  7 Holiday  North Coast NSW 2016 Q1      906.
##  8 Business Sydney          2017 Q3      892.
##  9 Business Sydney          2017 Q2      884.
## 10 Visiting Sydney          2013 Q4      882.
## # ℹ 24,310 more rows
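
One caveat: because tour1 is still a tsibble, summarise() keeps the Quarter index, so the table above ranks quarter-level averages rather than averages over the whole period. A sketch that drops the index first, assuming the goal is the mean across all quarters:

# average Trips over all quarters for each Region/Purpose combination
tour1 %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips), .groups = "drop") %>%
  arrange(desc(avg_trips)) %>%
  head(1)
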
  4. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

Ranked this way, the largest quarterly value is 985.28 (Visiting trips to Melbourne, Victoria, in 2017 Q4); the by-State aggregation the step asks for is sketched after the output below.

# New tsibble Purpose and Regions total trips by state
tour2 <- tour %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, Purpose),
             index = Quarter) %>%
  group_by(Purpose, Region, State) %>%
  summarise(sum_trips = sum(Trips)) %>%
  arrange(desc(sum_trips))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Purpose`, `Region`, `State`, `Quarter` first.
tour2
## # A tsibble: 24,320 x 5 [1Q]
## # Key:       Purpose, Region, State [304]
## # Groups:    Purpose, Region [304]
##    Purpose  Region          State           Quarter sum_trips
##    <chr>    <chr>           <chr>             <qtr>     <dbl>
##  1 Visiting Melbourne       Victoria        2017 Q4      985.
##  2 Business Sydney          New South Wales 2001 Q4      948.
##  3 Visiting Sydney          New South Wales 2016 Q4      921.
##  4 Visiting Sydney          New South Wales 2017 Q4      920.
##  5 Visiting Sydney          New South Wales 2017 Q1      916.
##  6 Holiday  South Coast     New South Wales 1998 Q1      915.
##  7 Holiday  North Coast NSW New South Wales 2016 Q1      906.
##  8 Business Sydney          New South Wales 2017 Q3      892.
##  9 Business Sydney          New South Wales 2017 Q2      884.
## 10 Visiting Sydney          New South Wales 2013 Q4      882.
## # ℹ 24,310 more rows
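
The step as worded asks to aggregate Purpose and Region away entirely, leaving one series per State. A sketch of that aggregation; on a tsibble, group_by() plus summarise() sums over the remaining key variables within each Quarter:

# total trips by State: a tsibble keyed by State with a Quarter index
state_trips <- tour1 %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

state_trips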

2.8 Exercises

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Employed

us_employment %>%
  filter(Title == "Total Private") %>%
  autoplot(Employed) +
  ggtitle("Total Private Employed")

us_employment %>%
  filter(Title == "Total Private") %>%
  gg_season(Employed) +
  ggtitle("Total Private Employed")

us_employment %>%
  filter(Title == "Total Private") %>%
  gg_subseries(Employed) +
  ggtitle("Total Private Employed")

us_employment %>%
  filter(Title == "Total Private") %>%
  gg_lag(Employed) +
  ggtitle("Total Private Employed")

us_employment %>%
  filter(Title == "Total Private") %>%
  ACF(Employed) %>%
  autoplot() +
  ggtitle("Total Private Employed")

The employment data shows a strong increasing trend and little seasonality; the pattern is much the same from month to month. Departing from the usual pattern, there is a large dip around 2008-2010. The autocorrelations are strong at every lag shown and decay slowly, which is consistent with the trend.
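
Those impressions can be quantified; a sketch using the STL-based features from feasts (feat_stl reports trend_strength and seasonal_strength_year on a 0-1 scale):

# trend and seasonal strength for the Total Private series
us_employment %>%
  filter(Title == "Total Private") %>%
  features(Employed, feat_stl)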

Bricks

aus_production %>%
  autoplot(Bricks) +
  ggtitle("Production on Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_season(Bricks) +
  ggtitle("Production on Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_subseries(Bricks) +
  ggtitle("Production on Bricks")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_lag(Bricks) +
  ggtitle("Production on Bricks")
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>%
  ACF(Bricks) %>%
  autoplot() +
  ggtitle("Production on Bricks")

Bricks shows cyclic behavior combined with seasonality. Production trended upward for decades but turns downward in more recent years. Within a year, production is typically slowest in Q1 and rises through Q2 and Q3. There is an unusual drop around 1982, where the series falls sharply. The autocorrelations are large and decrease slowly, reflecting the trend.
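
To look at the early-1980s drop more closely, the series can be windowed with tsibble's filter_index(); a small sketch:

# zoom in on the 1982 downturn
aus_production %>%
  filter_index("1980 Q1" ~ "1985 Q4") %>%
  autoplot(Bricks) +
  ggtitle("Brick production, 1980-1985")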

PBS

PBS %>%
  filter(ATC2 == "H02") %>%
  autoplot(Cost) +
  ggtitle("H02 cost")

PBS %>%
  filter(ATC2 == "H02") %>%
  gg_season(Cost) +
  ggtitle("H02 cost")

The cost of H02 drugs shows an upward trend with strong seasonality. The series fluctuate a lot depending on Concession and Type. For the general and concessional safety-net series, cost is high in January, drops sharply in February, and builds back up later in the year; other series instead increase in February. Beyond autoplot(), only gg_season() ran for this data, because Cost here contains four separate series (one per Concession and Type combination).
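
One way to get the remaining plots is to aggregate the four Cost series into a single monthly total first, the same summarise() pattern the book uses for the A10 drugs:

# collapse the Concession x Type series into one total-cost series
h02_total <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(TotalC = sum(Cost))

# lag and subseries plots now run, since there is a single series
h02_total %>%
  gg_lag(TotalC) +
  ggtitle("H02 total cost")

h02_total %>%
  gg_subseries(TotalC) +
  ggtitle("H02 total cost")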

Pelt

pelt %>%
  autoplot(Hare) +
  ggtitle("Hare Trading Records")

pelt %>%
  gg_subseries(Hare) +
  ggtitle("Hare Trading Records")

pelt %>%
  gg_lag(Hare) +
  ggtitle("Hare Trading Records")

pelt %>%
  ACF(Hare) %>%
  autoplot() +
  ggtitle("Hare Trading Records")

The pelt data rises and falls in cycles, peaking roughly every ten years, which produces alternating runs of high and low autocorrelation. There are unusually large peaks around 1863 and 1883, well above the rest of the series. Most cycles stay within a consistent range, with a few outlying peaks. gg_season() errors for pelt because annual data has no within-year seasonal period to plot. The ACF alternates between positive and negative values, a signature of cyclic behavior.
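
The cycle length can be read off the ACF directly; a sketch extending the number of lags so more than one full cycle is visible:

# with roughly ten-year cycles, expect troughs near lags 5, 15, 25
# and peaks near lags 10, 20, 30
pelt %>%
  ACF(Hare, lag_max = 30) %>%
  autoplot() +
  ggtitle("Hare Trading Records")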

Barrels

us_gasoline %>%
  autoplot(Barrels) +
  ggtitle("Barrel Gasoline")

us_gasoline %>%
  gg_season(Barrels) +
  ggtitle("Barrel Gasoline")

us_gasoline %>%
  gg_subseries(Barrels) +
  ggtitle("Barrel Gasoline")

us_gasoline %>%
  gg_lag(Barrels) +
  ggtitle("Barrel Gasoline")

us_gasoline %>%
  ACF(Barrels) %>%
  autoplot() +
  ggtitle("Barrel Gasoline")

The gasoline series shows an upward trend (Barrels measures millions of barrels of motor gasoline supplied per day, not a cost). Supply increases through the years, with a dip around 2011, and typically falls off around October each year. The autocorrelations are strong at short lags and fade as the lag increases. There does not appear to be consistent seasonality, as the weekly values bounce up and down constantly.
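
To check for a yearly pattern directly, the ACF can be extended past one year of weekly lags (52); a sketch:

# a bump near lag 52 would indicate annual seasonality in the weekly data
us_gasoline %>%
  ACF(Barrels, lag_max = 110) %>%
  autoplot() +
  ggtitle("Barrel Gasoline")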