Objective

Assignment 1 involves answering questions 2.1, 2.2, 2.3, 2.4, 2.5, and 2.8 from the textbook Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos.

Exercise 2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.


Bricks from aus_production

The aus_production dataset contains the quarterly production of selected commodities in Australia: Beer, Tobacco, Bricks, Cement, Electricity, and Gas.

The time interval is quarterly.

aus_production
## # A tsibble: 218 x 7 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
##  1 1956 Q1   284    5225    189    465        3923     5
##  2 1956 Q2   213    5178    204    532        4436     6
##  3 1956 Q3   227    5297    208    561        4806     7
##  4 1956 Q4   308    5681    197    570        4418     6
##  5 1957 Q1   262    5577    187    529        4339     5
##  6 1957 Q2   228    5651    214    604        4811     7
##  7 1957 Q3   236    5317    227    603        5259     7
##  8 1957 Q4   320    6152    222    582        4735     6
##  9 1958 Q1   272    5758    199    554        4608     5
## 10 1958 Q2   233    5641    229    620        5196     7
## # ℹ 208 more rows
aus_production <- aus_production  %>%
  select(Quarter, Bricks)
autoplot(aus_production, Bricks) +
  labs(y = "Bricks (millions)",
       title = "Quarterly Production of Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

Lynx from pelt

The pelt dataset contains the Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs.

The time interval is yearly.

pelt
## # A tsibble: 91 x 3 [1Y]
##     Year  Hare  Lynx
##    <dbl> <dbl> <dbl>
##  1  1845 19580 30090
##  2  1846 19600 45150
##  3  1847 19610 49150
##  4  1848 11990 39520
##  5  1849 28040 21230
##  6  1850 58000  8420
##  7  1851 74600  5560
##  8  1852 75090  5080
##  9  1853 88480 10170
## 10  1854 61280 19600
## # ℹ 81 more rows
pelt_l <- pelt %>%
  select(Year, Lynx)
autoplot(pelt_l, Lynx) +
  labs(y = "Pelts traded",
       title = "Trading Numbers for Canadian Lynx")

Close from gafa_stock

The gafa_stock dataset looks at stock prices from 2014-2018 for Google, Amazon, Facebook and Apple.

The time interval is daily.

gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key:       Symbol [4]
##    Symbol Date        Open  High   Low Close Adj_Close    Volume
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
##  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
##  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
##  3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
##  4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
##  5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
##  6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
##  7 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
##  8 AAPL   2014-01-13  75.7  77.5  75.7  76.5      64.9  94623200
##  9 AAPL   2014-01-14  76.9  78.1  76.8  78.1      66.1  83140400
## 10 AAPL   2014-01-15  79.1  80.0  78.8  79.6      67.5  97909700
## # ℹ 5,022 more rows
gafa_stock <- gafa_stock %>%
  select(Date, Close)
autoplot(gafa_stock, Close) +
  labs(y = "Closing Price ($US)",
       title = "Closing Prices for GAFA Stocks")

Demand from vic_elec

The vic_elec dataset contains half-hourly electricity demand for Victoria, Australia.

The time interval is half-hourly.

vic_elec
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
##    Time                Demand Temperature Date       Holiday
##    <dttm>               <dbl>       <dbl> <date>     <lgl>  
##  1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
##  2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
##  3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
##  4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
##  5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
##  6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE   
##  7 2012-01-01 03:00:00  3694.        20.1 2012-01-01 TRUE   
##  8 2012-01-01 03:30:00  3562.        19.6 2012-01-01 TRUE   
##  9 2012-01-01 04:00:00  3433.        19.1 2012-01-01 TRUE   
## 10 2012-01-01 04:30:00  3359.        19.0 2012-01-01 TRUE   
## # ℹ 52,598 more rows
vic_elec <- vic_elec %>%
  select(Time, Demand)
autoplot(vic_elec, Demand) +
  labs(y = "Demand (MWh)",
       title = "Electricity Demand for Victoria")

Exercise 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

max_close <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close, na.rm = TRUE)) %>%
  select(Symbol, Date, Close)

max_close
## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AAPL   2018-10-03  232.
## 2 AMZN   2018-09-04 2040.
## 3 FB     2018-07-25  218.
## 4 GOOG   2018-07-26 1268.

Exercise 2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents.

  1. You can read the data into R with the following script:
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
  2. Convert the data to time series
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
  3. Construct time series plots of each of the three series
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

  4. Check what happens when you don’t include facet_grid().

When facet_grid() is not included, all three series are drawn on a single panel with a shared y-axis, which makes series on different scales harder to compare.

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() 

Exercise 2.4

The USgas package contains data on the demand for natural gas in the US.

  1. Install the USgas package.

  2. Create a tsibble from us_total with year as the index and state as the key.

  3. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

colnames(us_total)
## [1] "year"  "state" "y"
us_total <- us_total %>%
  as_tsibble(key = c(state),
             index = year)  
us_total_ne <- us_total %>%
  # %in% keeps every row matching any of the six states; == would recycle the vector and silently drop rows
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island"))
autoplot(us_total_ne, y) +
  labs(y = "Gas Consumption",
       title = "Annual Natural Gas Consumption: New England States")

Exercise 2.5

  1. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

  2. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

  3. Find what combination of Region and Purpose had the maximum number of overnight trips on average.

  4. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

The output below points to Melbourne (Region) and Visiting (Purpose), with a value of about 985 trips. Note that because summarise() on a tsibble keeps the Quarter index, that figure is the single highest quarterly value rather than an average over the whole period; a sketch of computing the per-combination average across all quarters follows the output.

tourismxl <- readxl::read_excel("tourism.xlsx")
tourismxl <- tourismxl %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose), 
             index = Quarter)
tourismxl
## # A tsibble: 24,320 x 5 [1Q]
## # Key:       Region, State, Purpose [304]
##    Quarter Region   State           Purpose  Trips
##      <qtr> <chr>    <chr>           <chr>    <dbl>
##  1 1998 Q1 Adelaide South Australia Business  135.
##  2 1998 Q2 Adelaide South Australia Business  110.
##  3 1998 Q3 Adelaide South Australia Business  166.
##  4 1998 Q4 Adelaide South Australia Business  127.
##  5 1999 Q1 Adelaide South Australia Business  137.
##  6 1999 Q2 Adelaide South Australia Business  200.
##  7 1999 Q3 Adelaide South Australia Business  169.
##  8 1999 Q4 Adelaide South Australia Business  134.
##  9 2000 Q1 Adelaide South Australia Business  154.
## 10 2000 Q2 Adelaide South Australia Business  169.
## # ℹ 24,310 more rows
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key:       Region, State, Purpose [304]
##    Quarter Region   State           Purpose  Trips
##      <qtr> <chr>    <chr>           <chr>    <dbl>
##  1 1998 Q1 Adelaide South Australia Business  135.
##  2 1998 Q2 Adelaide South Australia Business  110.
##  3 1998 Q3 Adelaide South Australia Business  166.
##  4 1998 Q4 Adelaide South Australia Business  127.
##  5 1999 Q1 Adelaide South Australia Business  137.
##  6 1999 Q2 Adelaide South Australia Business  200.
##  7 1999 Q3 Adelaide South Australia Business  169.
##  8 1999 Q4 Adelaide South Australia Business  134.
##  9 2000 Q1 Adelaide South Australia Business  154.
## 10 2000 Q2 Adelaide South Australia Business  169.
## # ℹ 24,310 more rows
tourismxl_2 <- tourismxl %>%
    group_by(Region, Purpose) %>%
    summarize(Average_trips = mean(Trips, na.rm = TRUE)) %>%
    ungroup() %>%
    filter(Average_trips == max(Average_trips))

print(tourismxl_2)
## # A tsibble: 1 x 4 [1Q]
## # Key:       Region, Purpose [1]
##   Region    Purpose  Quarter Average_trips
##   <chr>     <chr>      <qtr>         <dbl>
## 1 Melbourne Visiting 2017 Q4          985.
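
To average over the whole period rather than per quarter, the time index can be dropped first with as_tibble(). A minimal sketch (Avg_Trips is a column name I have made up):

tourismxl %>%
  as_tibble() %>%                          # drop the time index so the mean runs over all quarters
  group_by(Region, Purpose) %>%
  summarise(Avg_Trips = mean(Trips, na.rm = TRUE), .groups = "drop") %>%
  slice_max(Avg_Trips, n = 1)              # Region/Purpose pair with the highest average
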
state_total_tsibble <- tourismxl %>%
  group_by(State) %>%  
  summarize(Total_Trips = sum(Trips, na.rm = TRUE)) %>%
  as_tsibble(index = Quarter, key = State)

print(state_total_tsibble)
## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
##    State Quarter Total_Trips
##    <chr>   <qtr>       <dbl>
##  1 ACT   1998 Q1        551.
##  2 ACT   1998 Q2        416.
##  3 ACT   1998 Q3        436.
##  4 ACT   1998 Q4        450.
##  5 ACT   1999 Q1        379.
##  6 ACT   1999 Q2        558.
##  7 ACT   1999 Q3        449.
##  8 ACT   1999 Q4        595.
##  9 ACT   2000 Q1        600.
## 10 ACT   2000 Q2        557.
## # ℹ 630 more rows

Exercise 2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

“Total Private” Employed from us_employment

Looking specifically at “Total Private” employment, I can observe the following:

There are repeated spikes and drops at particular times of the year, overall long-term growth, and occasional cycles.

Seasonality, trend, and cyclic behaviour are all present. January, in particular, shows a dip each year.

An unusual period is the drop beginning in 2008, which corresponds to the global financial crisis (a zoomed-in view of this period is sketched after the first time plot below).

glimpse(us_employment)
## Rows: 143,412
## Columns: 4
## Key: Series_ID [148]
## $ Month     <mth> 1939 Jan, 1939 Feb, 1939 Mar, 1939 Apr, 1939 May, 1939 Jun, …
## $ Series_ID <chr> "CEU0500000001", "CEU0500000001", "CEU0500000001", "CEU05000…
## $ Title     <chr> "Total Private", "Total Private", "Total Private", "Total Pr…
## $ Employed  <dbl> 25338, 25447, 25833, 25801, 26113, 26485, 26481, 26848, 2746…
us_employment_tp <- us_employment %>%
  filter(Title == "Total Private")
autoplot(us_employment_tp,Employed) +
  labs(y = "Employment",
       title = "US Employment Data")

us_employment_tp %>%
  gg_season(Employed, labels ="both") +
  labs(y = "Employment",
       title = "Seasonal plot:US Employment Data")

us_employment_tp %>%
  gg_subseries(Employed) +
  labs(y = "Employment",
       title = "US Employment Data")

us_employment_tp %>%
  gg_lag(Employed, geom = "point") +
  labs(x = "lag(Employed, k)")

us_employment_tp %>%
  ACF(Employed, lag_max = 9)
## # A tsibble: 9 x 3 [1M]
## # Key:       Series_ID [1]
##   Series_ID          lag   acf
##   <chr>         <cf_lag> <dbl>
## 1 CEU0500000001       1M 0.997
## 2 CEU0500000001       2M 0.993
## 3 CEU0500000001       3M 0.990
## 4 CEU0500000001       4M 0.986
## 5 CEU0500000001       5M 0.983
## 6 CEU0500000001       6M 0.980
## 7 CEU0500000001       7M 0.977
## 8 CEU0500000001       8M 0.974
## 9 CEU0500000001       9M 0.971

Bricks from aus_production

There is seasonality in the data, with a consistent pattern of fluctuations within each year. This could be linked to weather conditions or general demand.

There isn’t a single clear trend across the whole series. From 1960 to 1980 there is a clear upward trend, but afterwards there are periods of sharp decline mixed with smaller upward movements.

Cyclic patterns are visible, especially after 1980, with declines and recoveries occurring at irregular intervals.

An unusual period is the early 1980s, where there is a sharp decline in production (a quick check of the largest quarterly falls is sketched below).
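
As a rough check on that early-1980s slump (a sketch; Change is a column name I have introduced), the largest quarter-on-quarter falls can be listed with tsibble's difference():

aus_production %>%
  mutate(Change = difference(Bricks)) %>%   # quarter-on-quarter change in brick production
  as_tibble() %>%
  slice_min(Change, n = 5)                  # the five largest quarterly declines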

glimpse(aus_production)
## Rows: 218
## Columns: 2
## $ Quarter <qtr> 1956 Q1, 1956 Q2, 1956 Q3, 1956 Q4, 1957 Q1, 1957 Q2, 1957 Q3,…
## $ Bricks  <dbl> 189, 204, 208, 197, 187, 214, 227, 222, 199, 229, 249, 234, 20…
autoplot(aus_production,Bricks) +
  labs(y = "Bricks",
       title = "Brick Production in Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_season(Bricks, labels ="both") +
  labs(y = "Bricks",
       title = "Brick Production in Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text()`).

aus_production %>%
  gg_subseries(Bricks) +
  labs(y = "Bricks",
       title = "Brick Production in Australia")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_lag(Bricks, geom = "point") +
  labs(x = "lag(Employed, k)")
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>%
  ACF(Bricks, lag_max = 9)
## # A tsibble: 9 x 2 [1Q]
##        lag   acf
##   <cf_lag> <dbl>
## 1       1Q 0.900
## 2       2Q 0.815
## 3       3Q 0.813
## 4       4Q 0.828
## 5       5Q 0.720
## 6       6Q 0.642
## 7       7Q 0.655
## 8       8Q 0.692
## 9       9Q 0.609

Hare from pelt

There is no evidence of seasonality: the data are annual, so there is no within-year pattern of fluctuations. There are, however, cycles that span multiple years.

The chart does not show a clear upward or downward trend.

There are noticeable cyclical patterns: long build-ups to a peak followed by sharp declines. This may be due to the availability of resources.

pelt_h <- pelt %>%
  select(Year, Hare)
autoplot(pelt_h, Hare) +
  labs(y = "Hare",
       title = "Trading Records for Snowshoe Hare")

gg_season() is not shown here because the data are annual, so there is no within-year seasonal period to plot; gg_subseries() is of similarly limited use for yearly data, since there are no seasonal categories to split the series into.

pelt_h %>%
  gg_subseries(Hare) +
  labs(y = "Hares",
       title = "Trading Records for Snowshoe Hare")

pelt_h %>%
  gg_lag(Hare, geom = "point") +
  labs(x = "lag(Hare, k)")

pelt_h %>%
  ACF(Hare, lag_max = 9)
## # A tsibble: 9 x 2 [1Y]
##        lag    acf
##   <cf_lag>  <dbl>
## 1       1Y  0.658
## 2       2Y  0.214
## 3       3Y -0.155
## 4       4Y -0.401
## 5       5Y -0.493
## 6       6Y -0.401
## 7       7Y -0.168
## 8       8Y  0.113
## 9       9Y  0.307
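
Nine lags only hint at the cycle; extending the ACF (lag_max = 20 is an arbitrary choice) and plotting it makes the multi-year swing in the correlations easier to see. A minimal sketch:

pelt_h %>%
  ACF(Hare, lag_max = 20) %>%   # longer window to cover a full cycle
  autoplot() +
  labs(title = "ACF of Snowshoe Hare pelt trading")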

“H02” Cost from PBS

All four series display seasonality, with rises and falls recurring at the same times each year.

There are indications of an upward trend for both the Concessional/Co-payments and Concessional/Safety net series.

There are no obvious long-term cycles beyond the seasonal pattern.

The upward trend in the Concessional series indicates growth in overall use; the Concessional category also has higher costs than the General category.

unique(PBS$ATC2)
##  [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A09" "A10" "A11" "A12" "A14"
## [13] "A15" "B01" "B02" "B03" "B05" "C01" "C02" "C03" "C04" "C05" "C07" "C08"
## [25] "C09" "C10" "D"   "D01" "D02" "D04" "D05" "D06" "D07" "D08" "D10" "D11"
## [37] "G01" "G02" "G03" "G04" "H01" "H02" "H03" "H04" "H05" "J01" "J02" "J04"
## [49] "J05" "J06" "J07" "L01" "L02" "L03" "L04" "M01" "M02" "M03" "M04" "M05"
## [61] "N02" "N03" "N04" "N05" "N06" "N07" "P01" "P02" "P03" "R"   "R01" "R03"
## [73] "R05" "R06" "S"   "S01" "S02" "S03" "V01" "V03" "V04" "V06" "V07" "Z"
PBS_H0 <- PBS %>%
  filter(ATC2 == "H02")
autoplot(PBS_H0,Cost) +
  labs(y = "Cost",
       title = "Monthly Medicare for Prescription in Australia")

PBS_H0 %>%
  gg_season(Cost, labels ="both") +
  labs(y = "Prescription Costs",
       title = "Monthly Medicare for Prescription in Australia")

PBS_H0 %>%
  gg_subseries(Cost) +
  labs(y = "Prescription Costs",
       title = "Monthly Medicare for Prescription in Australia")

PBS_H0 %>%
  ACF(Cost, lag_max = 9)
## # A tsibble: 36 x 6 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [4]
##    Concession   Type        ATC1  ATC2       lag   acf
##    <chr>        <chr>       <chr> <chr> <cf_lag> <dbl>
##  1 Concessional Co-payments H     H02         1M 0.834
##  2 Concessional Co-payments H     H02         2M 0.679
##  3 Concessional Co-payments H     H02         3M 0.514
##  4 Concessional Co-payments H     H02         4M 0.352
##  5 Concessional Co-payments H     H02         5M 0.264
##  6 Concessional Co-payments H     H02         6M 0.219
##  7 Concessional Co-payments H     H02         7M 0.253
##  8 Concessional Co-payments H     H02         8M 0.337
##  9 Concessional Co-payments H     H02         9M 0.464
## 10 Concessional Safety net  H     H02         1M 0.710
## # ℹ 26 more rows
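
gg_lag() is omitted for this dataset because PBS_H0 still contains four separate series (one per Concession/Type combination) and a lag plot needs a single series. A minimal sketch, filtering to one combination first (Concessional co-payments is an arbitrary choice):

PBS_H0 %>%
  filter(Concession == "Concessional", Type == "Co-payments") %>%   # keep a single series
  gg_lag(Cost, geom = "point") +
  labs(x = "lag(Cost, k)")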

Barrels from us_gasoline

There appears to be seasonality, with periodic fluctuations every year.

There is also an overall upward trend.

There appears to be some cyclic behaviour around the 2000s.

An unusual period is around 2008, where there is a large drop. This could be tied to the financial recession at that time.

autoplot(us_gasoline, Barrels) +
  labs(y = "Barrels",
       title = "US Finished Motor Gasoline Product Supplied")

us_gasoline %>%
  gg_season(Barrels, labels ="both") +
  labs(y = "Barrels",
       title = "US Finished Motor Gasoline Product Supplied")

us_gasoline %>%
  gg_subseries(Barrels) +
  labs(y = "Barrels",
       title = "US Finished Motor Gasoline Product Supplied")

us_gasoline %>%
  gg_lag(Barrels, geom = "point") +
  labs(x = "lag(Barrels, k)")

us_gasoline %>%
  ACF(Barrels, lag_max = 9)
## # A tsibble: 9 x 2 [1W]
##        lag   acf
##   <cf_lag> <dbl>
## 1       1W 0.893
## 2       2W 0.882
## 3       3W 0.873
## 4       4W 0.866
## 5       5W 0.847
## 6       6W 0.844
## 7       7W 0.832
## 8       8W 0.831
## 9       9W 0.822