R Markdown

  1. Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
  1. Use ? (or help()) to find out about the data in each series.
  2. What is the time interval of each series?
  3. Use autoplot() to produce a time plot of each series.
  4. For the last plot, modify the axis labels and title.
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble      3.3.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.2     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.5.0
## ✔ ggplot2     4.0.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

The help function tells me the Quarterly production of selected commodities in Australia.

?aus_production
cat("The time interval for aus_production is:", format(interval(aus_production)), "\n")
## The time interval for aus_production is: 1Q
autoplot(aus_production, Bricks) +
  ggtitle("Australian Clay Brick Production (1956-2005)") +
  xlab("Quarter") +
  ylab("Production (millions of bricks)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

The help function tells me the Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935

?pelt

cat("The time interval for pelt is:", format(interval(pelt)), "\n")
## The time interval for pelt is: 1Y
autoplot(pelt, Lynx) +
  ggtitle("Annual Lynx Trappings by Hudson Bay Company (1845-1935)") +
  xlab("Year") +
  ylab("Number of Lynx Trapped") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

The help function shows me the Historical stock prices from 2014-2018 for Google, Amazon, Facebook and Apple. All prices are in $USD

?gafa_stock

cat("The time interval for gafa_stock is:", format(interval(gafa_stock)), "\n")
## The time interval for gafa_stock is: !
autoplot(gafa_stock, Close) +
  ggtitle("GAFA Stock Closing Prices (2014-2018)") +
  xlab("Year") +
  ylab("Closing Price (USD)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

The help function shows me the Half-hourly electricity demand for Victoria, Australia

?vic_elec

cat("The time interval for vic_elec is:", format(interval(vic_elec)), "\n")
## The time interval for vic_elec is: 30m
autoplot(vic_elec, Demand) +
  ggtitle("Half-Hourly Electricity Demand in Victoria, Australia (2012-2015)") +
  xlab("Date and Time") +
  ylab("Demand (Megawatts)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

  1. Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800
## 2 AMZN   2018-09-04 2026. 2050. 2013  2040.     2040.  5721100
## 3 FB     2018-07-25  216.  219.  214.  218.      218. 58954200
## 4 GOOG   2018-07-26 1251  1270. 1249. 1268.     1268.  2405600

The results are the following peak days closing price for each stock: APPLE: 2018-10-03
AMAZON: 2018-09-04
FACEBOOK: 2018-07-25
GOOGLE: 2018-07-26

  1. Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- read.csv("http://OTexts.com/fpp3/extrafiles/tute1.csv", header=TRUE)
view(tute1)
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() 

Without ’ facet_grid(name ~ ., scales = “free_y”)`, all three series are plotted on the same graph with the same y-axis, making it hard to see the patterns in AdBudget and GDP since Sales has much larger values.

  1. The USgas package contains data on the demand for natural gas in the US.
  1. Install the USgas package.
  2. Create a tsibble from us_total with year as the index and state as the key.
  3. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire,
  4. Massachusetts, Connecticut and Rhode Island).
#install.packages("USgas")
library(USgas)


us_tsibble <- us_total %>%
  as_tsibble(index = year, key = state)

us_tsibble %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", 
                      "Massachusetts", "Connecticut", "Rhode Island")) %>%
  autoplot(y) +
  ggtitle("Annual Natural Gas Consumption in New England States") +
  xlab("Year") +
  ylab("Natural Gas Consumption") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

    1. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
    2. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
    3. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
    4. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
library(readxl)
library(dplyr)

tourism_data <- read_excel("tourism.xlsx")

head(tourism_data)
## # A tibble: 6 × 5
##   Quarter    Region   State           Purpose  Trips
##   <chr>      <chr>    <chr>           <chr>    <dbl>
## 1 1998-01-01 Adelaide South Australia Business  135.
## 2 1998-04-01 Adelaide South Australia Business  110.
## 3 1998-07-01 Adelaide South Australia Business  166.
## 4 1998-10-01 Adelaide South Australia Business  127.
## 5 1999-01-01 Adelaide South Australia Business  137.
## 6 1999-04-01 Adelaide South Australia Business  200.
# b) Create a tsibble which is identical to the tourism tsibble from the tsibble package
tourism_tsibble <- tourism_data %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

print(tourism_tsibble)
## # A tsibble: 24,320 x 5 [1Q]
## # Key:       Region, State, Purpose [304]
##    Quarter Region   State           Purpose  Trips
##      <qtr> <chr>    <chr>           <chr>    <dbl>
##  1 1998 Q1 Adelaide South Australia Business  135.
##  2 1998 Q2 Adelaide South Australia Business  110.
##  3 1998 Q3 Adelaide South Australia Business  166.
##  4 1998 Q4 Adelaide South Australia Business  127.
##  5 1999 Q1 Adelaide South Australia Business  137.
##  6 1999 Q2 Adelaide South Australia Business  200.
##  7 1999 Q3 Adelaide South Australia Business  169.
##  8 1999 Q4 Adelaide South Australia Business  134.
##  9 2000 Q1 Adelaide South Australia Business  154.
## 10 2000 Q2 Adelaide South Australia Business  169.
## # ℹ 24,310 more rows
# c) Find what combination of Region and Purpose had the maximum number of overnight trips on averag
max_combo <- tourism_tsibble %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips, na.rm = TRUE)) %>%
  ungroup() %>%
  filter(avg_trips == max(avg_trips))

print("Max combo of overnight trips on average: ")
## [1] "Max combo of overnight trips on average: "
print(max_combo)
## # A tsibble: 1 x 4 [1Q]
## # Key:       Region, Purpose [1]
##   Region    Purpose  Quarter avg_trips
##   <chr>     <chr>      <qtr>     <dbl>
## 1 Melbourne Visiting 2017 Q4      985.
# d) Create a new tsibble which combines the Purposes and Regions, and just has total trips by State
tourism_by_state <- tourism_tsibble %>%
  index_by(Quarter) %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips, na.rm = TRUE)) %>%
  ungroup() %>%
  as_tsibble(index = Quarter, key = State)
print("Tourism by State:")
## [1] "Tourism by State:"
print(tourism_by_state)
## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
##    State Quarter Trips
##    <chr>   <qtr> <dbl>
##  1 ACT   1998 Q1  551.
##  2 ACT   1998 Q2  416.
##  3 ACT   1998 Q3  436.
##  4 ACT   1998 Q4  450.
##  5 ACT   1999 Q1  379.
##  6 ACT   1999 Q2  558.
##  7 ACT   1999 Q3  449.
##  8 ACT   1999 Q4  595.
##  9 ACT   2000 Q1  600.
## 10 ACT   2000 Q2  557.
## # ℹ 630 more rows
  1. Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
  1. Can you spot any seasonality, cyclicity and trend?
  2. What do you learn about the series?
  3. What can you say about the seasonal patterns?
  4. Can you identify any unusual years?
#install.packages("ggtime")

library(ggtime)
## Registered S3 methods overwritten by 'ggtime':
##   method                  from      
##   +.gg_tsensemble         feasts    
##   autolayer.tbl_ts        fabletools
##   autoplot.dcmp_ts        fabletools
##   autoplot.tbl_ts         fabletools
##   grid.draw.gg_tsensemble feasts    
##   print.gg_tsensemble     feasts
## 
## Attaching package: 'ggtime'
## The following objects are masked from 'package:feasts':
## 
##     gg_arma, gg_irf, gg_lag, gg_season, gg_subseries, gg_tsdisplay,
##     gg_tsresiduals
# Total Private Employed from us_employment

total_private <- us_employment %>%
  filter(Title == "Total Private")

autoplot(total_private, Employed) +
  ggtitle("Total Private Employment") +
  theme_minimal()

ggtime::gg_season(total_private, Employed) +
  ggtitle("Seasonal Plot: Total Private Employment") +
  theme_minimal()

gg_subseries(total_private, Employed) +
  ggtitle("Subseries Plot: Total Private Employment") +
  theme_minimal()

total_private %>%
  gg_lag(Employed, geom = "point") +
  ggtitle("Lag Plot: Total Private Employment") +
  theme_minimal()

total_private %>%
  ACF(Employed) %>%
  autoplot() +
  ggtitle("ACF: Total Private Employment") +
  theme_minimal()

a) Can you spot any seasonality, cyclicity and trend? Very strong upward trend b) What do you learn about the series? The series properties do not change over time there are some down turns but overall the behavior of the series does not change c) What can you say about the seasonal patterns? from the seasonal plot, employment is a bit lower in January and peaks in the mid to late year d) Can you identify any unusual years? The unusual years are from 1940 - 1945 which was possibly due to world war2, 2000 - 2002 possibly due to the dot com bubble, and finally everyone favorite modern crisis from 2008 - 2010 the Great recession era.

# Bricks from aus_production
autoplot(aus_production, Bricks) +
  ggtitle("Australian Brick Production") +
  theme_minimal()
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season(aus_production, Bricks) +
  ggtitle("Seasonal Plot: Brick Production") +
  theme_minimal()
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_subseries(aus_production, Bricks) +
  ggtitle("Subseries Plot: Brick Production") +
  theme_minimal()
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_lag(Bricks, geom = "point") +
  ggtitle("Lag Plot: Brick Production") +
  theme_minimal()
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>%
  ACF(Bricks) %>%
  autoplot() +
  ggtitle("ACF: Brick Production") +
  theme_minimal()

a) Can you spot any seasonality, cyclicity and trend? Clear seasonality, strong multi year cycles, and a rise then long run decline b) What do you learn about the series? The values change smoothly over time, with each period looking very similar to the recent past c) What can you say about the seasonal patterns? Production is lowest in Q1 and highest around Q3 d) Can you identify any unusual years? Large drops in the early 1980s and late 1990s stand out

# 3. Hare from pelt
autoplot(pelt, Hare) +
  ggtitle("Hare Trappings") +
  theme_minimal()

pelt %>%
  gg_lag(Hare, geom = "point") +
  ggtitle("Lag Plot: Hare Trappings") +
  theme_minimal()

pelt %>%
  ACF(Hare) %>%
  autoplot() +
  ggtitle("ACF: Hare Trappings") +
  theme_minimal()

a) Can you spot any seasonality, cyclicity and trend? No seasonality, strong multi year cycles, and no long term trend b) What do you learn about the series? The series shows regular boom and bust population cycles c) What can you say about the seasonal patterns? There are no seasonal patterns. d) Can you identify any unusual years? Extremely high peaks in the 1860s and late 1880s stand out as unusual

# 4. "H02" Cost from PBS
h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost))

autoplot(h02, Cost) +
  ggtitle("H02 Cost from PBS") +
  theme_minimal()

gg_season(h02, Cost) +
  ggtitle("Seasonal Plot: H02 Cost") +
  theme_minimal()

gg_subseries(h02, Cost) +
  ggtitle("Subseries Plot: H02 Cost") +
  theme_minimal()

h02 %>%
  gg_lag(Cost, geom = "point") +
  ggtitle("Lag Plot: H02 Cost") +
  theme_minimal()

h02 %>%
  ACF(Cost) %>%
  autoplot() +
  ggtitle("ACF: H02 Cost") +
  theme_minimal()

a) Can you spot any seasonality, cyclicity and trend? Strong upward trend and very strong seasonality, with no clear long cycles b) What do you learn about the series? Costs increase over time with large, regular within year swings c) What can you say about the seasonal patterns? Costs drop sharply in January - February and rise steadily to peak near the end of the year d) Can you identify any unusual years? No unusual years

# 5. Barrels from us_gasoline
autoplot(us_gasoline, Barrels) +
  ggtitle("US Gasoline Production") +
  theme_minimal()

gg_season(us_gasoline, Barrels) +
  ggtitle("Seasonal Plot: US Gasoline") +
  theme_minimal()

gg_subseries(us_gasoline, Barrels) +
  ggtitle("Subseries Plot: US Gasoline") +
  theme_minimal()

us_gasoline %>%
  gg_lag(Barrels, geom = "point") +
  ggtitle("Lag Plot: US Gasoline") +
  theme_minimal()

us_gasoline %>%
  ACF(Barrels) %>%
  autoplot() +
  ggtitle("ACF: US Gasoline") +
  theme_minimal()

  1. Can you spot any seasonality, cyclicity and trend? Strong seasonality, a long run rise then slight decline, and no clear multiyear cycles
  2. What do you learn about the series? Gasoline production is highly persistent with regular within year swings
  3. What can you say about the seasonal patterns? Production is lowest in winter and peaks in summer
  4. Can you identify any unusual years? A noticeable drop around the late 2000s stands out as unusual.