Homework 1 Timeseries

Problems 2.1,2.2,2.3,2.4,2.5 and 2.8

Packages

library(fpp3)
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble      3.1.8     ✔ tsibble     1.1.3
## ✔ dplyr       1.1.0     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.0
## ✔ lubridate   1.9.1     ✔ fable       0.3.2
## ✔ ggplot2     3.4.0     ✔ fabletools  0.3.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(readr)

Problem 2.1

Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.

help(gafa_stock)

Help function gafa_stock

The help page gives a detailed description of the gafa_stock data set: historical stock prices from 2014 to 2018 for Apple, Amazon, Facebook (renamed Meta in October 2021) and Google. Google reorganized into Alphabet Inc., whose shares trade under the symbols GOOG and GOOGL; this data set uses the GOOG trading symbol.

The format of the data set is a time series of class tsibble.

The details state that gafa_stock is a tsibble containing data on irregular trading days with the fields “Open”, “High”, “Low”, “Close”, “Adj_Close” and “Volume”. Each stock is identified by one key, which is the ticker symbol for the stock.

The source of the data is Yahoo Finance Historical Data.

Plotting gafa_stock, Visual notes

This plot style does not suit this group of stocks because their price scales differ so much. GOOG and AMZN trade at far higher prices than AAPL and FB, so the plot shows detail for the former and flattens the latter, which makes it nearly useless for reading anything about AAPL and FB. For AMZN and GOOG there does seem to be a long-term upward trend in the prices. The data looks cyclical rather than seasonal. From the last quarter of 2015 into the first quarter of 2016, and again from 2017 into 2018, the prices of these two stocks sink and then rise, while at the end of 2014 and 2016 the movements are much less pronounced. The last half of 2018 shows a very steep drop.

autoplot(gafa_stock,Close)
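
One way to make all four stocks readable is to give each stock its own panel with an independent y-axis. The following is a minimal sketch of that idea, reusing the facet_grid() approach from Problem 2.3; the object name p and the plot labels are only for illustration.

```r
library(fpp3)

# One panel per stock; scales = "free_y" lets each panel choose its own
# price range, so AAPL and FB are no longer flattened by AMZN and GOOG.
p <- gafa_stock %>%
  ggplot(aes(x = Date, y = Close)) +
  geom_line() +
  facet_grid(Symbol ~ ., scales = "free_y") +
  labs(title = "Daily closing prices", y = "Close")

p
```

The shared x-axis keeps the dates aligned across panels, so the timing of rises and falls can still be compared between stocks.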

Help function PBS

The PBS data set contains monthly Medicare prescription data for Australia. It is a tsibble with two measured values: “Scripts” and “Cost” (in $AUD).

The format of the data set is a time series of class tsibble.

The data is disaggregated using four keys:

Concession: Concessional scripts are given to pensioners, unemployed, dependents, and other card holders

Type: Co-payments are made until an individual’s script expenditure hits a threshold ($290.00 for concession, $1141.80 otherwise). Safety net subsidies are provided to individuals exceeding this amount.

ATC1: Anatomical Therapeutic Chemical index (level 1)

ATC2: Anatomical Therapeutic Chemical index (level 2)

help(PBS)

Plotting PBS, Visual notes

Calling autoplot() directly on the PBS data set is not practical, because the data set holds many separate series. The data needs to be aggregated in some fashion first; the quickest and easiest aggregation is total cost by month.

The aggregated series looks seasonal: the pattern repeats, rising and falling at similar intervals.

PBS %>%
  summarise(TotalC = sum(Cost)) %>%
  autoplot(TotalC) +
  labs(title = "Total Costs of Scripts",
       y = "Total Cost")

Help function vic_elec

vic_elec is a half-hourly tsibble with three fields:

Demand is the total electricity demand in MWh.

Temperature is the temperature of Melbourne, Australia.

Holiday: Indicator for if that day is a public holiday.

The format of the data is a time series of class tsibble.

This data is for operational demand, which is the demand met by local scheduled generating units, semi-scheduled generating units, and non-scheduled intermittent generating units of aggregate capacity larger than 30 MWh, and by generation imports to the region. The operational demand excludes the demand met by non-scheduled non-intermittent generating units, non-scheduled intermittent generating units of aggregate capacity smaller than 30 MWh, exempt generation (e.g. rooftop solar, gas tri-generation, very small wind farms, etc), and demand of local scheduled loads. It also excludes some very large industrial users (such as mines or smelters).

The source of the data is Australian Energy Market Operator.

help(vic_elec)

Plotting vic_elec, Visual notes

This time series appears to have seasonality, as the data increases and decreases in regular intervals. There might be a trend forming with the peaks of each year using more power.

autoplot(vic_elec,Demand)

Help function pelt

This dataset contains the Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935.

The format is a time series of class tsibble.

The pelt data is an annual tsibble with two values, Hare and Lynx, that represent how many pelts were traded.

The source of the data is the Hudson Bay Company.

help(pelt)

Plotting pelt, Visual notes

This time series rises and falls at fairly regular intervals. Because the data is annual, this pattern is cyclic rather than seasonal: seasonality can only appear within a year. There also appears to be a long-term increasing trend followed by a long-term decreasing trend.
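
No plotting code accompanies the notes above, so here is a minimal sketch of one way to draw both pelt series on a single plot. The names pelt_long, Species and Pelts are hypothetical and introduced only for this example.

```r
library(fpp3)

# Reshape to one row per year and species so both series share one plot.
pelt_long <- pelt %>%
  pivot_longer(c(Hare, Lynx), names_to = "Species", values_to = "Pelts")

p <- ggplot(pelt_long, aes(x = Year, y = Pelts, colour = Species)) +
  geom_line() +
  labs(title = "Pelts traded per year", y = "Pelts")

p
```

Drawing both species together makes it easier to compare the timing of their peaks and troughs.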

Problem 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

During the 2014 to 2018 timeframe, all four stocks reached their peak closing price in 2018, between July and early October.

The peaks of Facebook and Google happened a day apart, with Facebook peaking on July 25, 2018 and Google peaking on July 26, 2018.

Amazon peaked on September 4, 2018, and Apple peaked the latest of the group, on October 3, 2018.

gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  select(Symbol, Date)
## # A tsibble: 4 x 2 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date      
##   <chr>  <date>    
## 1 AAPL   2018-10-03
## 2 AMZN   2018-09-04
## 3 FB     2018-07-25
## 4 GOOG   2018-07-26

Problem 2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

  1. You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)

  2. Convert the data to time series

tute_series <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

  3. Construct time series plots of each of the three series

tute_series %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

facet_grid() creates subplots, drawing every series in its own panel. The panels share the x-axis, and with scales = "free_y" each panel gets its own y-axis scale.

Without facet_grid(), all the series are drawn on the same plot with a single shared y-axis.

The subplots give more accurate information, because each series is shown on a scale suited to its own range.

tute_series %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

Problem 2.4

The USgas package contains data on the demand for natural gas in the US.

  1. Install the USgas package.

#install.packages('USgas')
library(USgas)

  2. Create a tsibble from us_total with year as the index and state as the key.

# as_tsibble() (not as_tibble()) is needed to register the key and index
us_total <- us_total %>%
  as_tsibble(key = state,
             index = year)

  3. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

    Connecticut has a strong upward trend, although there does not appear to be any seasonality. It also has the second-highest natural gas consumption.

    Maine had peak consumption in 2002. From then on Maine’s natural gas consumption has been on a downward trend. Maine consumes the second-least natural gas of the New England states.

    Massachusetts’ gas consumption has a slight upward trend. It uses the most natural gas of all the New England states.

    New Hampshire’s gas usage has no real increasing or decreasing trend, but it does seem to have a repeating pattern where consumption spikes every three years or so. With annual data this is cyclic behaviour rather than seasonality. New Hampshire consumes roughly 55,000 units, which puts it in the middle of the New England states.

    Rhode Island’s natural gas consumption peaked in 1998. It bottomed out in 2004 and has been increasing steadily since then. Rhode Island uses the third-most natural gas, behind Massachusetts and Connecticut.

    Vermont has the lowest natural gas consumption, at roughly 14,000 units. From 2012 to 2019, however, its consumption increased rapidly from roughly 8,000 units to roughly 14,000 units, almost doubling in seven years.

us_total %>%
  filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')) %>%
  ggplot(aes(x = year, y = y, colour = state)) +
  geom_line() +
  facet_grid(state ~., scales = "free_y") +
  labs(title = "Annual Natural Gas Consumption in New England area",
       y = "Consumption")

Problem 2.5

  1. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

library(readxl)
tourism <- readxl::read_excel('tourism.xlsx')

  2. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism_ts<- tourism %>%
  group_by(Region, State) %>%
  summarise(Total_Trips = sum(Trips))
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
tourism_ts
## # A tibble: 76 × 3
## # Groups:   Region [76]
##    Region                     State              Total_Trips
##    <chr>                      <chr>                    <dbl>
##  1 Adelaide                   South Australia         45906.
##  2 Adelaide Hills             South Australia          2299.
##  3 Alice Springs              Northern Territory       4529.
##  4 Australia's Coral Coast    Western Australia       15167.
##  5 Australia's Golden Outback Western Australia       15017.
##  6 Australia's North West     Western Australia       13067.
##  7 Australia's South West     Western Australia       41825.
##  8 Ballarat                   Victoria                11017.
##  9 Barkly                     Northern Territory       1388.
## 10 Barossa                    South Australia          3850.
## # … with 66 more rows
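
The summary above is a plain tibble of totals, so it does not yet match the tourism tsibble. A minimal sketch of the conversion follows. It assumes the imported data has Quarter, Region, State, Purpose and Trips columns. To stay self-contained it builds a stand-in for the spreadsheet import from the built-in tourism data; the names tourism_raw and tourism_tsbl are hypothetical.

```r
library(fpp3)

# Stand-in for readxl::read_excel("tourism.xlsx"): a plain tibble whose
# Quarter column is a Date (an assumption about how the file is imported).
tourism_raw <- tsibble::tourism %>%
  as_tibble() %>%
  mutate(Quarter = as.Date(Quarter))

# Convert Quarter to a yearquarter index and declare the three keys.
tourism_tsbl <- tourism_raw %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose), index = Quarter)

tourism_tsbl
```

Region, State and Purpose together identify each series, which is why all three are declared as keys.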

  3. Find what combination of Region and Purpose had the maximum number of overnight trips on average.

highest_avg_rg_purp <-  tourism %>%
  group_by(Region, Purpose) %>%
  mutate(Avg_Trips = mean(Trips)) %>%
  filter(Avg_Trips == max(Avg_Trips)) %>%
  distinct(Region, Purpose,Avg_Trips)%>%
  arrange(desc(Avg_Trips))

highest_avg_rg_purp
## # A tibble: 304 × 3
## # Groups:   Region, Purpose [304]
##    Region          Purpose  Avg_Trips
##    <chr>           <chr>        <dbl>
##  1 Sydney          Visiting      747.
##  2 Melbourne       Visiting      619.
##  3 Sydney          Business      602.
##  4 North Coast NSW Holiday       588.
##  5 Sydney          Holiday       550.
##  6 Gold Coast      Holiday       528.
##  7 Melbourne       Holiday       507.
##  8 South Coast     Holiday       495.
##  9 Brisbane        Visiting      493.
## 10 Melbourne       Business      478.
## # … with 294 more rows
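
The table above lists every Region and Purpose combination in descending order, so the answer is its first row. As a hedged alternative, the sketch below returns only the top combination by using summarise() and slice_max(). It uses the built-in tourism tsibble as a stand-in for the spreadsheet data, and the names avg_trips and top_combo are hypothetical.

```r
library(fpp3)

# Average trips per Region/Purpose pair, computed on a plain tibble so that
# summarise() collapses over time instead of keeping the quarterly index.
avg_trips <- tsibble::tourism %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarise(Avg_Trips = mean(Trips), .groups = "drop")

# Keep only the single combination with the largest average.
top_combo <- avg_trips %>%
  slice_max(Avg_Trips, n = 1, with_ties = FALSE)

top_combo
```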

  4. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

total_trips <- tourism %>%
  group_by(Region) %>%
  mutate(Total_Trips =sum(Trips)) %>%
  distinct(Region,Total_Trips)%>%
  arrange(desc(Total_Trips))
total_trips
## # A tibble: 76 × 2
## # Groups:   Region [76]
##    Region           Total_Trips
##    <chr>                  <dbl>
##  1 Sydney               161607.
##  2 Melbourne            136170.
##  3 Brisbane              98485.
##  4 North Coast NSW       87675.
##  5 Gold Coast            70404.
##  6 South Coast           65306.
##  7 Experience Perth      62743.
##  8 Sunshine Coast        57848.
##  9 Hunter                56837.
## 10 Adelaide              45906.
## # … with 66 more rows
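
The table above totals trips by Region over the whole period and is a plain tibble. A sketch of the by-State version that keeps the quarterly index is shown below. It again uses the built-in tourism tsibble as a stand-in for the imported data, and the name state_trips is hypothetical.

```r
library(fpp3)

# On a tsibble, group_by() + summarise() aggregates within each time point,
# so the result stays a tsibble keyed by State with Quarter as the index.
state_trips <- tsibble::tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

state_trips
```

Because the index is preserved, the result can be passed straight to autoplot(state_trips, Trips) to compare states over time.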

Problem 2.8

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

head(aus_retail,10)
## # A tsibble: 10 x 5 [1M]
## # Key:       State, Industry [1]
##    State                        Industry                Serie…¹    Month Turno…²
##    <chr>                        <chr>                   <chr>      <mth>   <dbl>
##  1 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Apr     4.4
##  2 Australian Capital Territory Cafes, restaurants and… A33498… 1982 May     3.4
##  3 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Jun     3.6
##  4 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Jul     4  
##  5 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Aug     3.6
##  6 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Sep     4.2
##  7 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Oct     4.8
##  8 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Nov     5.4
##  9 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Dec     6.9
## 10 Australian Capital Territory Cafes, restaurants and… A33498… 1983 Jan     3.8
## # … with abbreviated variable names ¹​`Series ID`, ²​Turnover
set.seed(2151)
aus_retail <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Explore your chosen retail time series using the following functions:

autoplot()

autoplot(aus_retail,Turnover)

gg_season()

aus_retail %>% gg_season(Turnover)

gg_subseries()

aus_retail %>% gg_subseries(Turnover)

gg_lag()

aus_retail %>% gg_lag(Turnover,geom='point')

ACF() %>% autoplot()

aus_retail %>% ACF(Turnover) %>% autoplot()

  1. Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

    In the autoplot a pattern can be seen. There was a long-term upward trend from 1982 until 2010. Then there was a drop of roughly 30% to 40% from 2010 to 2012. From 2012 until 2018 turnover has been on another long-term upward trend.

    Seasonality also shows in the autoplot, but becomes more apparent in the gg_season() plot, where turnover tends to drop from January to February and then rise in March. From March until June turnover tends to drop, then it rises until August. It then drops until November and December, when turnover makes sharp gains. Some years buck this seasonal pattern, but it is not easy to tell whether they do so at regular intervals, such as every three years.

    I would assume the years that buck the seasonal pattern reflect business cycles, national economic conditions or possibly weather cycles.

    The ACF plot shows strong correlations at every lag shown, up to lag 26. This is consistent with strong trend, seasonal and cyclic patterns in the data.