library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble      3.3.0     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(tsibble)

2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

  • Use ? (or help()) to find out about the data in each series.
  • What is the time interval of each series?
  • Use autoplot() to produce a time plot of each series.
  • For the last plot, modify the axis labels and title.

Australia Production

This data contains quarterly production of beer, tobacco, bricks, cement, electricty, and gas in Austrailia from 1956 to 2010, sourced from the Australian Bureau of Statistics.

?aus_production
head(aus_production)
## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7
interval(aus_production)
## <interval[1]>
## [1] 1Q
aus_production %>%
  autoplot(Bricks) +
  labs(title = "Bricks Quarterly Production in Australia", 
       x = "Quarter", 
       y = "Bricks (in millions)")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

The chart shows a clear upward trend in brick production from late 1950s, reaching its peak in late 1980s. Following the peak, there is downward trend in brick production. There is evidence of seasonal pattern.

Annual Pelt Trading Reconds

This data contains Hudson Company annual trading records for Snowshoe Hare and Canadian furs from 1845 to 1935.

?pelt
head(pelt)
## # A tsibble: 6 x 3 [1Y]
##    Year  Hare  Lynx
##   <dbl> <dbl> <dbl>
## 1  1845 19580 30090
## 2  1846 19600 45150
## 3  1847 19610 49150
## 4  1848 11990 39520
## 5  1849 28040 21230
## 6  1850 58000  8420
interval(pelt)
## <interval[1]>
## [1] 1Y
pelt %>%
  autoplot(Lynx) +
  labs(title = "Canadian Lynx Pelts Annual Trading Records",
       x = "Year",
       y = "Number of Lynx Pelts")

There is no long term upward or downward trend. The Lynx trading data shows a strong cyclinal pattern with peaks about every 10 years.

Gafa_stock

This data contains historical daily stock prices in $USD from 2014 to 2018 for Google, Amazon, Facebook, and Apples.

?gafa_stock
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close    Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
## 1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
## 2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
## 3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
## 4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
## 5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
## 6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
gafa_stock %>%
  autoplot(Close) +
  labs(title = "Daily Stock Closing Prices from 2014 - 2018", 
       x = "Time", 
       y = "Closing Stock Prices")

All four stocks shows an upward trend, especially in the late 2010s.There is some dips, showing the volatility of the stocks.

Vic_elec

This data contains half-hourly electricity demand in Victoria, Australia.

?vic_elec
head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
##   Time                Demand Temperature Date       Holiday
##   <dttm>               <dbl>       <dbl> <date>     <lgl>  
## 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
## 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
## 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
## 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
## 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
## 6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE
interval(vic_elec)
## <interval[1]>
## [1] 30m
vic_elec %>%
  autoplot(Demand) +
  labs(title = "Electricity Demand in Victoria, Australia", 
       x = "Time",
       y = "Demand (in MWh)")

The chart is more “noisy” compared to the last 3 charts. Electricity demand in Australia shows a strong seasonal pattern with peaks mid year (summer) and between end of the year and beginning of the next year (winter).There is no clear no term trends over the three year data.

2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800
## 2 AMZN   2018-09-04 2026. 2050. 2013  2040.     2040.  5721100
## 3 FB     2018-07-25  216.  219.  214.  218.      218. 58954200
## 4 GOOG   2018-07-26 1251  1270. 1249. 1268.     1268.  2405600

2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a

You can read the data into R with the following script:

tute1 <- readr::read_csv("https://raw.githubusercontent.com/suswong/DATA-624/refs/heads/main/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)

b

Convert the data to time series

# Code provided by text
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

c

Construct time series plots of each of the three series.

# Code provided by text
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid()

When you do not include the facet_grid(), the three time series will be plotted on a single graph, instead of on three separate panels. This makes it harder to interpret trends as all three time series will be sharing the same y-axis. Different time series usually have different scales. Using the facet_grid() function allows a fair comparison between the time series as it allows the series to keep its scale.

# modifying the code provided by text
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() 

2.4

The USgas package contains data on the demand for natural gas in the US.

a

Install the USgas package.

#install.packages("USgas")
library(USgas)

b

Create a tsibble from us_total with year as the index and state as the key.

This dataset contains information on US annual total natual gas consumption by state between 1997 and 2019.

#?us_total
#head(us_total)

gas_total <- us_total %>%
  as_tsibble(index = year, key = state)

head(gas_total)
## # A tsibble: 6 x 3 [1Y]
## # Key:       state [1]
##    year state        y
##   <int> <chr>    <int>
## 1  1997 Alabama 324158
## 2  1998 Alabama 329134
## 3  1999 Alabama 337270
## 4  2000 Alabama 353614
## 5  2001 Alabama 332693
## 6  2002 Alabama 379343

c

Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

gas_total %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
  autoplot() +
  labs(title = "Annual Total Natural Gas Consumption from 1997-2019", 
       x = "Year", 
       y = "Consumption (Million Cubic Feet)")
## Plot variable not specified, automatically selected `.vars = y`

When the annual natural gas consumption by state for the New England area are plot on one graph, we can see Massachusetts shows the highest overall gas consumption among the New England states and Vermont has the lowest consumption.However, it was difficult to read trends in gas consumption for Vermont as it looks like a straight line with no trend

When we plot each state individually, we can see Vermont’s gas consumption has been very gradually and steadily increasing over time. There are no clear trends in Rhode Island consumption but its gas consumption is consistently lower than Connecticut and Massachusetts and higher than Maine, Vermont, and New Hampshire.

gas_total %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
  autoplot() +
  labs(title = "Annual Total Natural Gas Consumption from 1997-2019", 
       x = "Year", 
       y = "Consumption (Million Cubic Feet)") + 
  facet_wrap(~state, scales = "free_y", ncol = 2 ) + 
  theme(legend.position = "none")
## Plot variable not specified, automatically selected `.vars = y`

2.5

a

Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

#install.packages("readxl")
#library(readxl)

# I was unable to get the xlsx raw file link from my GitHub to work so I converted it to csv file. 

tourism_data <- read.csv("https://raw.githubusercontent.com/suswong/DATA-624/refs/heads/main/tourism.csv")

head(tourism_data)
##      Quarter   Region           State  Purpose    Trips
## 1 1998-01-01 Adelaide South Australia Business 135.0777
## 2 1998-04-01 Adelaide South Australia Business 109.9873
## 3 1998-07-01 Adelaide South Australia Business 166.0347
## 4 1998-10-01 Adelaide South Australia Business 127.1605
## 5 1999-01-01 Adelaide South Australia Business 137.4485
## 6 1999-04-01 Adelaide South Australia Business 199.9126

b

Create a tsibble which is identical to the tourism tsibble from the tsibble package.

# Note 
tourism_tsibble <- tourism_data %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

head(tourism_tsibble)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.

c

Find what combination of Region and Purpose had the maximum number of overnight trips on average.

Tourists visiting Sydney has the highest number of overnight trips on average.

tourism_data %>%
  group_by(Region, Purpose)  %>%
  summarise(avg_trips = mean(Trips)) %>%
  arrange(-avg_trips)
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 304 × 3
## # Groups:   Region [76]
##    Region          Purpose  avg_trips
##    <chr>           <chr>        <dbl>
##  1 Sydney          Visiting      747.
##  2 Melbourne       Visiting      619.
##  3 Sydney          Business      602.
##  4 North Coast NSW Holiday       588.
##  5 Sydney          Holiday       550.
##  6 Gold Coast      Holiday       528.
##  7 Melbourne       Holiday       507.
##  8 South Coast     Holiday       495.
##  9 Brisbane        Visiting      493.
## 10 Melbourne       Business      478.
## # ℹ 294 more rows

d

Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

state_tourism <- tourism_data %>%
  group_by(Quarter, State) %>%
  summarise(total_trips = sum(Trips)) %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(State))
## `summarise()` has grouped output by 'Quarter'. You can override using the
## `.groups` argument.
head(state_tourism)
## # A tsibble: 6 x 3 [1Q]
## # Key:       State [1]
## # Groups:    @ Quarter [6]
##   Quarter State total_trips
##     <qtr> <chr>       <dbl>
## 1 1998 Q1 ACT          551.
## 2 1998 Q2 ACT          416.
## 3 1998 Q3 ACT          436.
## 4 1998 Q4 ACT          450.
## 5 1999 Q1 ACT          379.
## 6 1999 Q2 ACT          558.

2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: Total Private Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

  • Can you spot any seasonality, cyclicity and trend?
  • What do you learn about the series?
  • What can you say about the seasonal patterns?
  • Can you identify any unusual years?

US Employment

There is a strong upward trend in employment over time. This is not surprising as population increase over time, employment will increase too. The gg_season()and gg_subseries() plots shows seasonal pattern of employment increasing from January and peak during the summer months. There is a large dip in employment in 2008-2010 (financial crisis).

?us_employment

total_employed <- us_employment %>%
  filter(Title == "Total Private")

autoplot(total_employed, Employed) +
  labs(title = "Autoplot of Total Private Employed", 
       x = "Month", 
       y = "Total Employed")

gg_season(total_employed, Employed) +
  labs(title = "Seasonal Plot of Total Private Employed", 
       x = "Month", 
       y = "Total Employed")
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

gg_subseries(total_employed, Employed) +
  labs(title = "Seasonal Subseries Plot of Total Private Employed")
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

gg_lag(total_employed, Employed) +
  labs(title = "Lag Plot of Total Private Employed", 
       y = "Total Employed")
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

total_employed %>%
  ACF(Employed) %>%
  autoplot()+
  labs(title = "Autocorrection Plot of Total Private Employed")

Australia Production

Over the years there is a clear overall upward trend in brick production from late 1950s, reaching its peak in late 1980s. Following the peak, there is overall downward trend in brick production.

At the same time, there is also evidence of seasonal pattern. The gg_season()and gg_subseries() plots shows brick production increases from quarter 1 to quarter 3 and decreases after. There are notable dips in mid 1970s and mid 1980s.

aus_production %>%
  autoplot(Bricks) +
  labs(title = "Bricks Quarterly Production in Australia", 
       x = "Quarter", 
       y = "Bricks (in millions)")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_subseries(aus_production, Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_lag(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>%
  ACF(Bricks) %>%
  autoplot()+
  labs(title = "Autocorrection Plot of Total Private Employed")

Annual Pelt Trading Reconds

There is no long term upward or downward trend. The hare trading data shows a strong cyclinal pattern with peaks about every 10 years. Some notable unusual years are high peaks aorund 1865 and 1885.

pelt %>%
  autoplot(Hare) +
  labs(title = "Canadian Hare Pelts Annual Trading Records",
       x = "Year",
       y = "Number of Lynx Pelts")

# This dataset contains yearly data so we cannot use this function here
#gg_season(pelt, Hare)
# This dataset contains yearly data so we cannot use this function here
gg_subseries(pelt, Hare)

gg_lag(pelt, Hare)

pelt %>%
  ACF(Hare) %>%
  autoplot() 

H02 Cost

There is an overall upward trend of the cost of the drug over the years. Seasonal pattern is evident in the gg_season() plot where the cost of the drug peak at the end of the year to the begginnig of the following year. There is sharp drop cost in February and the cost gradually increases to the end of the year. There is an unusual number of sales (dip) in March 2008 where in most cases there is an increase number of sales from February to March.

cost <- PBS %>%
  filter(ATC2 == "H02") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(TotalC = sum(Cost)) 

autoplot(cost) +
  labs(title = "Autoplot of Total Cost of Drug H02 (Monthly)")
## Plot variable not specified, automatically selected `.vars = TotalC`

gg_season(cost) +
  labs(title = "Seasonal Plot of Total Cost of Drug H02 (Monthly)")
## Plot variable not specified, automatically selected `y = TotalC`

gg_subseries(cost) +
  labs(title = "Seasonal Subseries Plot of Total Cost of Drug H02 (Monthly)")
## Plot variable not specified, automatically selected `y = TotalC`

gg_lag(cost) +
  labs(title = "Lag Plot of Total Cost of Drug H02 (Monthly)")
## Plot variable not specified, automatically selected `y = TotalC`

cost %>%
  ACF(TotalC) %>%
  autoplot() +
  labs(title = "Autcorrection Plot of Total Cost of Drug H02 (Monthly)")

US Gaslone

This time series shows a overall trend upward from 1990s to around 2006, trend downward until mid 2010s, and finally an upward trends. The gg_season()and gg_subseries() plots shows seasonal pattern of increases of gasoline supplied in the summer months and decreases as winter arrives. There are notable unusual dips from 2008 to 2010s (financial crisis).

us_gasoline %>%
  autoplot(Barrels)

gg_season(us_gasoline, Barrels)

gg_subseries(us_gasoline, Barrels)

gg_lag(us_gasoline, Barrels)

us_gasoline %>%
  ACF(Barrels) %>%
  autoplot()