Data624

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.4.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## Warning: package 'tsibble' was built under R version 4.4.3

## Warning: package 'tsibbledata' was built under R version 4.4.3

## Warning: package 'feasts' was built under R version 4.4.3

## Warning: package 'fabletools' was built under R version 4.4.3

## Warning: package 'fable' was built under R version 4.4.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

Exercise 2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series.
What is the time interval of each series?
Use autoplot() to produce a time plot of each series.
For the last plot, modify the axis labels and title.

a. Use ? (or help()) to find out about the data in each series.

?aus_production

## starting httpd help server ... done

Quarterly production of selected commodities in Australia.

Description: Quarterly estimates of selected indicators of manufacturing production in Australia.

Details: aus_production is a quarterly tsibble with six values:

Beer: Beer production in megalitres. Tobacco: Tobacco and cigarette production in tonnes. Bricks: Clay brick production in millions of bricks. Cement: Portland cement production in thousands of tonnes. Electricity: Electricity production in gigawatt hours. Gas: Gas production in petajoules.

?pelt

Pelt trading records

Description: Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. This data contains trade records for all areas of the company.

Details: pelt is an annual tsibble with two values:

Hare: The number of Snowshoe Hare pelts traded. Lynx: The number of Canadian Lynx pelts traded.

?gafa_stock

GAFA stock prices

Description: Historical stock prices from 2014-2018 for Google, Amazon, Facebook and Apple. All prices are in $USD.

Details: gafa_stock is a tsibble containing data on irregular trading days:

Open: The opening price for the stock. High: The stock’s highest trading price. Low: The stock’s lowest trading price. Close: The closing price for the stock. Adj_Close: The adjusted closing price for the stock. Volume: The amount of stock traded.

?vic_elec

Half-hourly electricity demand for Victoria, Australia

Description: vic_elec is a half-hourly tsibble with three values:

Demand: Total electricity demand in MWh. Temperature: Temperature of Melbourne (BOM site 086071). Holiday: Indicator for if that day is a public holiday.

Details: This data is for operational demand, which is the demand met by local scheduled generating units, semi-scheduled generating units, and non-scheduled intermittent generating units of aggregate capacity larger than 30 MWh, and by generation imports to the region. The operational demand excludes the demand met by non-scheduled non-intermittent generating units, non-scheduled intermittent generating units of aggregate capacity smaller than 30 MWh, exempt generation (e.g. rooftop solar, gas tri-generation, very small wind farms, etc), and demand of local scheduled loads. It also excludes some very large industrial users (such as mines or smelters).

b. What is the time interval of each series?

aus_production is quarterly. pelt is annual. gafa_stock is daily, but only on trading days, so the interval is irregular. vic_elec is half-hourly.
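These intervals can also be read from the objects themselves rather than from the help pages. The lines below are a minimal sketch, assuming fpp3 is installed; tsibble::interval() and is_regular() report what each tsibble has stored as its interval.

```r
library(fpp3)

# Ask each tsibble for its stored interval
tsibble::interval(aus_production)  # expected to report a quarterly interval
tsibble::interval(pelt)            # expected to report an annual interval
tsibble::interval(gafa_stock)      # irregular: trading days only
tsibble::interval(vic_elec)        # expected to report a 30-minute interval

# is_regular() separates evenly spaced series from irregular ones
regular_flags <- c(
  aus_production = is_regular(aus_production),
  pelt           = is_regular(pelt),
  gafa_stock     = is_regular(gafa_stock),
  vic_elec       = is_regular(vic_elec)
)
regular_flags
```

The irregular flag for gafa_stock matches the `[!]` shown in its tsibble header.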

c. Use autoplot() to produce a time plot of each series.

aus_production %>%
  select(Bricks) %>%
  autoplot(Bricks) +
  labs(title = "Quarterly production of selected commodities in Australia") +
  xlab("Time") +
  ylab("Clay brick production in millions of bricks")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

A warning (not an error) appeared, indicating 20 missing values: Removed 20 rows containing missing values or values outside the scale range (`geom_line()`).

I removed the missing values and reran the plot:

aus_production %>%
  filter(!is.na(Bricks)) %>%
  autoplot(Bricks) +
  labs(title = "Clay brick production in millions of bricks") +
  xlab("Time") +
  ylab("Brick Production")
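Before dropping rows, it can help to see which quarters are missing. This is a minimal sketch, assuming fpp3 is installed; the name missing_bricks is only illustrative.

```r
library(fpp3)

# Keep only the quarters where Bricks is missing
missing_bricks <- aus_production |>
  filter(is.na(Bricks)) |>
  select(Quarter, Bricks)

nrow(missing_bricks)           # how many quarters are missing
range(missing_bricks$Quarter)  # first and last missing quarter
```

If the missing quarters all sit at one end of the series, filtering them out simply shortens the series. If they fall in the middle, filtering leaves gaps in the tsibble.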

pelt %>%
  select(Lynx) %>%
  autoplot(Lynx) +
  labs(title = "Pelt trading records") +
  xlab("Time") +
  ylab("Number of Canadian Lynx pelts traded")

gafa_stock %>%
  select(Close) %>%
  autoplot(Close) +
  labs(title = "GAFA daily stock prices for Google, Amazon, Facebook and Apple") +
  xlab("Time") +
  ylab("Stock Prices")

d. For the last plot, modify the axis labels and title.

I did this for all plots.

vic_elec %>%
  select(Demand) %>%
  autoplot(Demand) +
  labs(title = "Half-hourly electricity demand for Victoria, Australia") +
  xlab("Time") +
  ylab("Total electricity demand in MWh")

Exercise 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

The first task is to identify the variable that classifies the stocks.

summary(gafa_stock)

##     Symbol               Date                 Open              High        
##  Length:5032        Min.   :2014-01-02   Min.   :  54.02   Min.   :  54.94  
##  Class :character   1st Qu.:2015-04-02   1st Qu.: 118.33   1st Qu.: 119.25  
##  Mode  :character   Median :2016-06-30   Median : 257.59   Median : 261.94  
##                     Mean   :2016-07-01   Mean   : 465.75   Mean   : 469.95  
##                     3rd Qu.:2017-09-29   3rd Qu.: 746.53   3rd Qu.: 750.96  
##                     Max.   :2018-12-31   Max.   :2038.11   Max.   :2050.50  
##       Low              Close           Adj_Close           Volume         
##  Min.   :  51.85   Min.   :  53.53   Min.   :  53.53   Min.   :     7900  
##  1st Qu.: 117.35   1st Qu.: 118.54   1st Qu.: 115.48   1st Qu.:  2519975  
##  Median : 256.89   Median : 259.51   Median : 258.61   Median : 10804400  
##  Mean   : 460.92   Mean   : 465.56   Mean   : 464.24   Mean   : 19493800  
##  3rd Qu.: 738.01   3rd Qu.: 744.79   3rd Qu.: 744.79   3rd Qu.: 29399250  
##  Max.   :2013.00   Max.   :2039.51   Max.   :2039.51   Max.   :266380800

Symbol is the variable. Next, I needed the unique symbol names.

unique(gafa_stock$Symbol)

## [1] "AAPL" "AMZN" "FB"   "GOOG"

For each symbol, I filtered the data to that stock and then kept the row where the closing price equals its maximum.

gafa_stock %>%
  filter(Symbol == "AAPL") %>%
  filter(Close == max(Close))

## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  filter(Close == max(Close))

## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close  Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>
## 1 AMZN   2018-09-04 2026. 2050.  2013 2040.     2040. 5721100

gafa_stock %>%
  filter(Symbol == "FB") %>%
  filter(Close == max(Close))

## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 FB     2018-07-25  216.  219.  214.  218.      218. 58954200

gafa_stock %>%
  filter(Symbol == "GOOG") %>%
  filter(Close == max(Close))

## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close  Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>
## 1 GOOG   2018-07-26  1251 1270. 1249. 1268.     1268. 2405600
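The four separate filters can also be written as one grouped filter. This is a minimal sketch of an alternative, assuming fpp3 is installed; grouping by Symbol makes max(Close) evaluate within each stock rather than over the whole table.

```r
library(fpp3)

# One row per stock: the day on which Close equals that stock's own maximum
peaks <- gafa_stock |>
  group_by(Symbol) |>
  filter(Close == max(Close)) |>
  ungroup() |>
  select(Symbol, Date, Close)

peaks
```

This should return the same four dates as the individual filters above, and it would not need editing if more symbols were added to the data.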

Exercise 2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")

## New names:
## Rows: 100 Columns: 4
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (3): Sales, AdBudget, GDP
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

View(tute1)
head(tute1)

## # A tibble: 6 × 4
##   ...1   Sales AdBudget   GDP
##   <chr>  <dbl>    <dbl> <dbl>
## 1 Mar-81 1020.     659.  252.
## 2 Jun-81  889.     589   291.
## 3 Sep-81  795      512.  291.
## 4 Dec-81 1004.     614.  292.
## 5 Mar-82 1058.     647.  279.
## 6 Jun-82  944.     602   254

Convert the data to time series.

This took several steps. When reading in the csv file, R did not find a name for the first column, so instead of “Quarter” it was read in as “...1”. I renamed the variable to Quarter. Inspecting its structure, I noticed it was a chr, so the quarters needed to be converted to a date format.

The labels are in a Mon-YY form (for example, Mar-81). To parse them with the as.Date function I needed a day component, so “01-” needed to be prepended to give a DD-Mon-YY date (format "%d-%b-%y").

As I was trying to set up years and quarters, I kept getting an error that there were NA values and the data could not be parsed. I reinspected the data and noticed that rows 81-100 were in a DD-MMM format; no years were entered in those Excel cells, so the yearquarter function failed. I used the stringr library to detect rows in this format. The pattern is challenging to read, but all it does is drop every row that fits the DD-MMM pattern. Because I could not tell from the import whether the separators between DD and MMM were hyphens, en dashes or em dashes, the pattern matches all three. This leaves 80 of the 100 rows.

I then converted the remaining labels to DD-Mon-YY dates as described above. Finally, applying the yearquarter function, I was able to create the tsibble with a YYYY Q# format in the Quarter column. I had to use AI to work out the more technical coding with the naming and stringr functions.

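library(stringr)
mytimeseries <- tute1 |>
  # the unnamed first column was imported as "...1"
  rename(Quarter = any_of("...1")) |>
  # drop rows whose label looks like DD-MMM (no year); allow hyphen, en dash or em dash
  filter(!str_detect(Quarter, "^\\s*\\d{1,2}\\s*[-\\u2013\\u2014]\\s*[A-Za-z]{3}\\s*$")) |>
  # prepend a day so that "Mar-81" parses as 1981-03-01
  mutate(Quarter = as.Date(paste0("01-", Quarter), format = "%d-%b-%y")) |>
  # convert the dates to a quarterly index and build the tsibble
  mutate(Quarter = tsibble::yearquarter(Quarter)) |>
  tsibble::as_tsibble(index = Quarter)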
mytimeseries

## # A tsibble: 80 x 4 [1Q]
##    Quarter Sales AdBudget   GDP
##      <qtr> <dbl>    <dbl> <dbl>
##  1 1981 Q1 1020.     659.  252.
##  2 1981 Q2  889.     589   291.
##  3 1981 Q3  795      512.  291.
##  4 1981 Q4 1004.     614.  292.
##  5 1982 Q1 1058.     647.  279.
##  6 1982 Q2  944.     602   254 
##  7 1982 Q3  778.     531.  296.
##  8 1982 Q4  932.     608.  272.
##  9 1983 Q1  996.     638.  260.
## 10 1983 Q2  908.     582.  280.
## # ℹ 70 more rows
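An alternative that keeps the rows whose labels lack a year is to ignore the labels and build the index from row position. This is only a sketch under stated assumptions: the rows are in time order with no gaps, and the first row is 1981 Q1, as the exercise text says. The toy tibble and its values are a hypothetical stand-in for tute1, used so the example runs without the csv file.

```r
library(tsibble)
library(tibble)
library(dplyr)

# Hypothetical stand-in for tute1: the last label has lost its year
toy <- tibble(
  label = c("Mar-81", "Jun-81", "Sep-81", "1-Dec"),
  Sales = c(1020, 889, 795, 1004)  # illustrative values only
)

# Assumption: rows are consecutive quarters starting at 1981 Q1,
# so the index is the start quarter plus the row offset
toy_ts <- toy |>
  mutate(Quarter = yearquarter("1981 Q1") + (row_number() - 1)) |>
  select(Quarter, Sales) |>
  as_tsibble(index = Quarter)

toy_ts
```

Applied to tute1, the same mutate() would give all 100 rows a quarter, but it is only valid if the file really is a gap-free quarterly sequence.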

Construct time series plots of each of the three series.

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

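Without facet_grid(), the three series are drawn in a single panel on a shared y-axis. The panel labels for the variables disappear, leaving only the colour legend to tell the series apart, and the order of presentation is altered. Because the series sit at different levels (in the rows shown above, GDP is below 300 while Sales is near 1000), the shared axis makes the variation in each series harder to see than with free scales.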

Exercise 2.6

The aus_arrivals data set comprises quarterly international arrivals to Australia from Japan, New Zealand, UK and the US.

Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries.

?aus_arrivals
aus_arrivals

## # A tsibble: 508 x 3 [1Q]
## # Key:       Origin [4]
##    Quarter Origin Arrivals
##      <qtr> <chr>     <int>
##  1 1981 Q1 Japan     14763
##  2 1981 Q2 Japan      9321
##  3 1981 Q3 Japan     10166
##  4 1981 Q4 Japan     19509
##  5 1982 Q1 Japan     17117
##  6 1982 Q2 Japan     10617
##  7 1982 Q3 Japan     11737
##  8 1982 Q4 Japan     20961
##  9 1983 Q1 Japan     20671
## 10 1983 Q2 Japan     12235
## # ℹ 498 more rows

International Arrivals to Australia

Description: Quarterly international arrivals to Australia from Japan, New Zealand, UK and the US. 1981Q1 - 2012Q3.

autoplot(aus_arrivals)

## Plot variable not specified, automatically selected `.vars = Arrivals`

library(forecast)

## Warning: package 'forecast' was built under R version 4.4.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

gg_season(aus_arrivals)

## Plot variable not specified, automatically selected `y = Arrivals`

gg_subseries(aus_arrivals)

## Plot variable not specified, automatically selected `y = Arrivals`

The autoplot shows that arrivals from NZ, the UK and the US trend upward from 1981 to around 2010, while Japan rises at first and then shows a downward trend starting around 1998. All countries appear to exhibit seasonal patterns. It appears that NZ has the most arrivals, followed by the UK, Japan and the US.

The gg_season plots inform us about seasonality. All four countries appear to exhibit seasonality, and it appears to be more pronounced as the level of arrivals increases. It appears that Japan and NZ have highs in Q3 and the UK has lows in Q2 and Q3. The US has highs in Q4.

The gg_subseries plots show how each quarter has performed over time compared to that quarter's mean. Japanese arrivals have fallen below their means in all four quarters. NZ has shown consistent increases across all four quarters. The UK shows levels above its means, but with downturns in Q1, Q3 and Q4 and a flat Q2. The US shows levels above its means in all quarters; arrivals are increasing in Q1 and Q4 but remain flat in Q2 and Q3.
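The quarterly means that gg_subseries() draws as horizontal lines can also be computed directly, which gives numbers to compare against the plots. This is a minimal sketch, assuming fpp3 is installed; the names quarter_means, Qtr and mean_arrivals are only illustrative.

```r
library(fpp3)

# Mean arrivals for each country in each calendar quarter
quarter_means <- aus_arrivals |>
  as_tibble() |>
  mutate(Qtr = quarter(as.Date(Quarter))) |>
  group_by(Origin, Qtr) |>
  summarise(mean_arrivals = mean(Arrivals), .groups = "drop")

quarter_means
```

Within each country, the quarter with the largest mean is the typical seasonal high over the whole sample, which may differ from the pattern in the most recent years.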

b. Can you identify any unusual observations?

What I think may be unusual is Japan’s decline across all four quarters. However, what I see as very unusual is the period after 9/11. It is not surprising that US arrivals would show a sharp decline during those early years. Japanese arrivals also appear to have been heavily affected after 9/11 in Q2, Q3 and Q4, and by the downturn in the market during the mid-2000s. UK and NZ travelers appear to have been less affected across all quarters: there is some evidence of an effect, but it is small relative to the US and Japan. UK travelers seemed to take a less cautious attitude to flying to Australia.
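Zooming in on the years around 2001 makes this easier to judge than the full-length plot. This is a minimal sketch, assuming fpp3 is installed; the 2000 Q1 to 2004 Q4 window is my own choice for illustration, not something given in the exercise.

```r
library(fpp3)

# Restrict to an assumed window around 2001
around_2001 <- aus_arrivals |>
  filter(Quarter >= yearquarter("2000 Q1"),
         Quarter <= yearquarter("2004 Q4"))

# One panel per country, each with its own y-axis
p_zoom <- around_2001 |>
  autoplot(Arrivals) +
  facet_grid(Origin ~ ., scales = "free_y") +
  labs(title = "Arrivals to Australia, 2000 Q1 to 2004 Q4",
       x = "Quarter", y = "Arrivals")

p_zoom
```

A drop in the quarters after 2001 Q3 that does not repeat at the same point in other years would support reading it as an unusual observation rather than as seasonality.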

Data624_Ch2

JF

2025-08-28