This assignment answers exercises from the Forecasting: Principles and Practice textbook, 3rd edition, Chapter 2, Section 10 (Exercises). The prompt for each question is included before its respective code/answers.
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(readxl)
library(tsibble)
library(dplyr)
library(ggplot2)
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.
The code below first selects the Bricks column from the aus_production data set and previews the head to see the data formatting. Next, the data documentation is explored with help(). Bricks is defined as ‘Clay brick production in millions of bricks.’ The interval of the data is quarterly. A basic autoplot() is produced by passing the data set and variable.
bricks_raw <- aus_production |> select(Bricks)
head(bricks_raw)
## # A tsibble: 6 x 2 [1Q]
## Bricks Quarter
## <dbl> <qtr>
## 1 189 1956 Q1
## 2 204 1956 Q2
## 3 208 1956 Q3
## 4 197 1956 Q4
## 5 187 1957 Q1
## 6 214 1957 Q2
help(aus_production)
# Bricks: Clay brick production in millions of bricks.
interval(aus_production)
## <interval[1]>
## [1] 1Q
# Quarter
autoplot(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
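The warning reflects the missing Bricks values at the end of the series. If the warning is a concern, the NA rows can be dropped explicitly first; a small optional tweak, not part of the exercise:
# Optional: drop the missing Bricks observations before plotting
aus_production |>
  filter(!is.na(Bricks)) |>
  autoplot(Bricks)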
The code below first selects the Lynx column from the pelt data set and previews the head to see the data formatting. Next, the data documentation is explored with help(). Lynx is defined as ‘The number of Canadian Lynx pelts traded.’ The interval of the data is annual. A basic autoplot() is produced by passing the data set and variable.
lynx_raw <- pelt |> select(Lynx)
head(lynx_raw)
## # A tsibble: 6 x 2 [1Y]
## Lynx Year
## <dbl> <dbl>
## 1 30090 1845
## 2 45150 1846
## 3 49150 1847
## 4 39520 1848
## 5 21230 1849
## 6 8420 1850
help(pelt)
# Lynx: The number of Canadian Lynx pelts traded.
interval(pelt)
## <interval[1]>
## [1] 1Y
# Year
autoplot(pelt, Lynx)
The code below first selects the Close column from the gafa_stock data set and previews the head to see the data formatting. Next, the data documentation is explored with help(). Close is defined as ‘The closing price for the stock.’ The interval() method returns ‘!’, meaning the intervals are irregular. To find out what intervals exist in the data, I take the difference between each row’s date and the previous row’s date using lag(Date) inside mutate(). The distinct intervals are 1, 2, 3, and 4 days (trading days skip weekends and holidays), plus -1824 days, which is an artifact of lag() crossing from the last date of one stock symbol to the first date of the next rather than a misalignment of the data set. A basic autoplot() follows.
close_raw <- gafa_stock |> select(Close)
head(close_raw)
## # A tsibble: 6 x 3 [!]
## # Key: Symbol [1]
## Close Date Symbol
## <dbl> <date> <chr>
## 1 79.0 2014-01-02 AAPL
## 2 77.3 2014-01-03 AAPL
## 3 77.7 2014-01-06 AAPL
## 4 77.1 2014-01-07 AAPL
## 5 77.6 2014-01-08 AAPL
## 6 76.6 2014-01-09 AAPL
help(gafa_stock)
# Close: The closing price for the stock.
interval(gafa_stock)
## <interval[1]>
## [1] !
# Returns '!' which suggests irregular intervals. Check what unique intervals there are:
gafa_stock |>
mutate(diff = Date - lag(Date)) |>
distinct(diff) |>
drop_na()
## # A tibble: 5 × 1
## diff
## <drtn>
## 1 1 days
## 2 3 days
## 3 4 days
## 4 2 days
## 5 -1824 days
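Because gafa_stock is keyed by Symbol, the -1824 day value comes from lag(Date) crossing a symbol boundary. A quick check (my addition, not part of the exercise) computes the gaps within each symbol, so only the positive trading-day gaps should remain:
gafa_stock |>
  group_by(Symbol) |>                 # keep date differences within each stock
  mutate(diff = Date - lag(Date)) |>
  ungroup() |>
  distinct(diff) |>
  drop_na()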
autoplot(gafa_stock, Close)
The code below first selects the Demand column from the vic_elec data set and previews the head to see the data formatting. Next, the data documentation is explored with help(). Demand is defined as ‘Total electricity demand in MWh.’ The interval of the data is 30 minutes. autoplot() is used with a descriptive title, x-axis label, and y-axis label. The plot is quite dense, since observations arrive every 30 minutes and the data spans several years.
demand_raw <- vic_elec |> select(Demand)
head(demand_raw)
## # A tsibble: 6 x 2 [30m] <Australia/Melbourne>
## Demand Time
## <dbl> <dttm>
## 1 4383. 2012-01-01 00:00:00
## 2 4263. 2012-01-01 00:30:00
## 3 4049. 2012-01-01 01:00:00
## 4 3878. 2012-01-01 01:30:00
## 5 4036. 2012-01-01 02:00:00
## 6 3866. 2012-01-01 02:30:00
help(vic_elec)
# Demand: Total electricity demand in MWh.
interval(vic_elec)
## <interval[1]>
## [1] 30m
# 30 minutes
autoplot(vic_elec, Demand) +
ggtitle("Electricity Demand") +
ylab("Demand (MWh)") +
xlab("Year")
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
The peak closing price for each stock is found by the calculations below. First, gafa_stock is grouped by Symbol. The data is then filtered to the rows where Close equals each group’s maximum, only the Symbol, Date, and Close columns are kept, and the result is ordered. The closing date and price at each of the four stocks’ peaks are shown below.
# Find the peak closing price for each stock
peak_days <- gafa_stock |>
group_by(Symbol) |>
filter(Close == max(Close, na.rm = TRUE)) |>
select(Symbol, Date, Close) |>
arrange(Symbol, Date)
print(peak_days)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
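An equivalent approach (my addition, not the textbook’s) uses dplyr’s slice_max() after dropping the tsibble structure; it should return the same four rows:
gafa_stock |>
  as_tibble() |>                 # drop the tsibble index so rows can be sliced freely
  group_by(Symbol) |>
  slice_max(Close, n = 1) |>     # keep the row with the highest Close per stock
  select(Symbol, Date, Close)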
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
After opening the data in Excel and looking at the columns and values, the data is read into this R Markdown document. The logic below is given by the textbook: tute1.csv, stored in this project’s repository, is read in and the data is previewed using the head() function.
# Use read_csv for tute1.csv and preview data
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(tute1)
## # A tibble: 6 × 4
## Quarter Sales AdBudget GDP
## <date> <dbl> <dbl> <dbl>
## 1 1981-03-01 1020. 659. 252.
## 2 1981-06-01 889. 589 291.
## 3 1981-09-01 795 512. 291.
## 4 1981-12-01 1004. 614. 292.
## 5 1982-03-01 1058. 647. 279.
## 6 1982-06-01 944. 602 254
Next, following the textbook, the data is converted to a tsibble.
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
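A quick optional check (my addition) confirms the index was recognised as quarterly:
interval(mytimeseries)   # should report 1Q once Quarter is a yearquarter index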
Then time series plots of each of the three series are constructed.
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
Without the facet_grid() line, all of the series are drawn on the same graph. Each measures something different, with different ranges of values, so on a shared y-axis the changes in GDP across time appear flattened.
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
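If the series must share one panel despite their different scales, one option (my addition, not from the textbook) is to index each series to 100 at the first quarter before plotting:
mytimeseries |>
  pivot_longer(-Quarter) |>
  group_by(name) |>
  mutate(indexed = 100 * value / first(value)) |>  # rescale each series to start at 100
  ggplot(aes(x = Quarter, y = indexed, colour = name)) +
  geom_line()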
The USgas package contains data on the demand for natural gas in the US.
This logic loads the USgas package, with the command for installing that package commented out. The tsibble us_gas_tsibble is created by passing the us_total data through as_tsibble(), assigning year as the index and state as the key. The New England states are saved in the vector new_england_states. Then autoplot() plots only those New England states, with a title, axis labels, and a facet wrap.
# install.packages("USgas")
library(USgas)
us_gas_tsibble <- us_total |>
as_tsibble(index = year, key = state)
new_england_states <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
us_gas_tsibble |>
filter(state %in% new_england_states) |>
autoplot(y) +
ggtitle("Annual Natural Gas Consumption in New England") +
ylab("Consumption") +
xlab("Year") +
facet_wrap(~ state, scales = "free_y")
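A quick structural check (my addition) confirms the key and interval of the new tsibble:
key_vars(us_gas_tsibble)   # should list "state"
interval(us_gas_tsibble)   # should report 1Y (annual)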
Data is read in using read_excel(). The data structure, column names, and data types are checked against the tourism tsibble from the tsibble package, the new tsibble is created to mirror that structure, and the two are compared with all.equal(). For part (c), the combination of Region and Purpose with the maximum number of overnight trips on average is sought via group_by(), summarise(), arrange(), and slice_head(). Note that because summarise() on a tsibble keeps the Quarter index, the result (Melbourne, Visiting) is actually the single busiest quarter rather than the highest average across all quarters; a corrected sketch follows the output below. For part (d), a new tsibble is created which combines the Purposes and Regions, leaving total trips. This is done by indexing by Quarter and using summarise() on Trips; a by-State variant, as the exercise asks, is sketched after that output.
tourism_data <- readxl::read_excel("tourism.xlsx")
head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
tourism_tsibble <- tourism_data |>
mutate(Quarter = yearquarter(Quarter)) |> # Ensure Quarter is in year-quarter format
as_tsibble(index = Quarter, key = c(Region, State, Purpose)) # Convert to tsibble
all.equal(tourism_tsibble, tourism)
## [1] TRUE
max_trips <- tourism_tsibble |>
group_by(Region, Purpose) |>
summarise(avg_trips = mean(Trips, na.rm = TRUE), .groups = "drop") |>
arrange(desc(avg_trips)) |>
slice_head(n = 1)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
print(max_trips)
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter avg_trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
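Because summarise() on a tsibble keeps the Quarter index, avg_trips above is a per-quarter value and the result is the single busiest quarter. To average across all quarters, the index can be dropped first; a sketch of the corrected calculation (my addition, output not shown):
tourism_tsibble |>
  as_tibble() |>                                     # drop the index so the mean spans all quarters
  group_by(Region, Purpose) |>
  summarise(avg_trips = mean(Trips, na.rm = TRUE), .groups = "drop") |>
  slice_max(avg_trips, n = 1)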
state_trips_tsibble <- tourism_tsibble |>
index_by(Quarter) |>
summarise(Total_Trips = sum(Trips, na.rm = TRUE)) |>
as_tsibble(index = Quarter)
print(state_trips_tsibble)
## # A tsibble: 80 x 2 [1Q]
## Quarter Total_Trips
## <qtr> <dbl>
## 1 1998 Q1 23182.
## 2 1998 Q2 20323.
## 3 1998 Q3 19827.
## 4 1998 Q4 20830.
## 5 1999 Q1 22087.
## 6 1999 Q2 21458.
## 7 1999 Q3 19914.
## 8 1999 Q4 20028.
## 9 2000 Q1 22339.
## 10 2000 Q2 19941.
## # ℹ 70 more rows
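As written, the summary above also collapses the State dimension. To keep total trips by State as the exercise asks, State can be added as a grouping before summarising; a sketch (my addition, output not shown):
state_trips_by_state <- tourism_tsibble |>
  group_by(State) |>
  summarise(Total_Trips = sum(Trips, na.rm = TRUE))  # one total per State per Quarter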
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
All of the data sets are loaded below, and each analysis is broken out by data set.
data("us_employment")
data("aus_production")
data("pelt")
data("PBS")
data("us_gasoline")
The graphs below help answer the prompt questions.
Can you spot any seasonality, cyclicity and trend? -> The Seasonality of Employment graph shows small seasonal bumps: slight increases in early summer, with the series holding steady or dipping in other months. Cyclicity comes from major economic events. The upward slope of the plot shows a positive overall trend.
What do you learn about the series? -> Overall, private employment grows steadily, with mild seasonality and occasional recession-driven interruptions.
What can you say about the seasonal patterns? -> The seasonality is mild, with increases in early summer and decreases in other months.
Can you identify any unusual years? -> There is an unusual dip in employment around 2008-2010. This is of course from the Great Recession.
employment_ts <- us_employment |> filter(Title == "Total Private")
# autoplot()
autoplot(employment_ts, Employed) +
ggtitle("Total Private Employment Over Time") +
ylab("Employment (thousands)") + xlab("Year")
# gg_season()
gg_season(employment_ts, Employed) + ggtitle("Seasonality of Employment")
# gg_subseries()
gg_subseries(employment_ts, Employed) + ggtitle("Employment Subseries Plot")
# gg_lag()
gg_lag(employment_ts, Employed, lags = 1:12) + ggtitle("Lag Plot for Employment")
# ACF()
employment_ts |> ACF(Employed) |> autoplot() + ggtitle("ACF of Employment")
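To inspect the recession dip more closely, the series can be filtered to the surrounding decade; a small optional addition, not required by the exercise:
employment_ts |>
  filter(year(Month) >= 2005, year(Month) <= 2015) |>  # zoom in around the Great Recession
  autoplot(Employed) +
  ggtitle("Total Private Employment, 2005-2015")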
The graphs below help answer the prompt questions.
Can you spot any seasonality, cyclicity and trend? -> Brick production shows strong seasonality, with Q1 typically the lowest quarter of each year. The trend increases until around 1980 and turns downward after, with fluctuations throughout.
What can you say about the seasonal patterns? -> Brick production dips in Q1, rises into Q2, and changes variably from Q2 to Q3 and from Q3 to Q4.
Can you identify any unusual years? -> There are unusual dips in brick production around 1975 and 1983-1984.
# autoplot()
autoplot(aus_production, Bricks) +
ggtitle("Brick Production Over Time") +
ylab("Millions of Bricks") + xlab("Year")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
# gg_season()
gg_season(aus_production, Bricks) + ggtitle("Seasonality of Brick Production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
# gg_subseries()
gg_subseries(aus_production, Bricks) + ggtitle("Brick Production Subseries Plot")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
# gg_lag()
gg_lag(aus_production, Bricks, lags = 1:12) + ggtitle("Lag Plot for Bricks")
## Warning: Removed 20 rows containing missing values (gg_lag).
# ACF()
aus_production |> ACF(Bricks) |> autoplot() + ggtitle("ACF of Bricks")
The graphs below help answer the prompt questions.
Can you spot any seasonality, cyclicity and trend? -> A seasonal plot could not be created, as the data does not support that drill-down. The data shows strong cyclic behavior of roughly 10-11 years, as seen in the ACF plot. The trend is fairly flat, with no clear increase or decrease over time.
What can you say about the seasonal patterns? -> Nothing can be said about seasonality! The data is annual rather than monthly or quarterly, so seasonality is not observable in this context.
Can you identify any unusual years? -> The cycle peaks in the mid-1860s and mid-1880s show much larger spikes in pelts traded than other peak years.
# autoplot()
autoplot(pelt, Hare) +
ggtitle("Hare Population Over Time") +
ylab("Number of Pelts") + xlab("Year")
# gg_season()
# gg_season() does not work here because the annual data has no recognized seasonal period:
# gg_season(pelt, Hare) + ggtitle("Seasonality of Hare Population")
# gg_subseries()
gg_subseries(pelt, Hare) + ggtitle("Hare Population Subseries Plot")
# gg_lag()
gg_lag(pelt, Hare, lags = 1:12) + ggtitle("Lag Plot for Hare Population")
# ACF()
pelt |> ACF(Hare) |> autoplot() + ggtitle("ACF of Hare Population")
The graphs below help answer the prompt questions.
Can you spot any seasonality, cyclicity and trend? -> Seasonality, cyclicity and trend all vary across the different series here. The safety net series show big drops in February and slow increases starting in May each year. Concessional co-payments increase seasonally in spring and decrease in other months, while general co-payments show no clear seasonality. All series have cycles lasting around 12 years. The trend is positive for concessional co-payments; general payments look relatively steady, with a slight downward trend for the general safety net.
What can you say about the seasonal patterns? -> As stated above, the safety net series drop sharply in February and climb slowly from May each year; concessional co-payments rise in spring and fall in other months; general co-payments show no clear seasonal pattern.
Can you identify any unusual years? -> Pre-1995 data shows higher values than the post-1995 trends would suggest.
# H02 data filters differently
PBS_H02 <- PBS |> filter(ATC2 == "H02")
glimpse(PBS_H02)
## Rows: 816
## Columns: 9
## Key: Concession, Type, ATC1, ATC2 [4]
## $ Month <mth> 1991 Jul, 1991 Aug, 1991 Sep, 1991 Oct, 1991 Nov, 1991 Dec,…
## $ Concession <chr> "Concessional", "Concessional", "Concessional", "Concession…
## $ Type <chr> "Co-payments", "Co-payments", "Co-payments", "Co-payments",…
## $ ATC1 <chr> "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H",…
## $ ATC1_desc <chr> "Systemic hormonal preparations, excl. sex hormones and ins…
## $ ATC2 <chr> "H02", "H02", "H02", "H02", "H02", "H02", "H02", "H02", "H0…
## $ ATC2_desc <chr> "CORTICOSTEROIDS FOR SYSTEMIC USE", "CORTICOSTEROIDS FOR SY…
## $ Scripts <dbl> 63261, 53528, 52822, 54016, 49281, 51798, 42436, 52913, 629…
## $ Cost <dbl> 317384.0, 269891.0, 269703.0, 280418.0, 268070.0, 277139.0,…
# autoplot()
autoplot(PBS_H02, Cost) +
ggtitle("H02 Prescription Costs Over Time") +
ylab("Cost ($AUD)") +
xlab("Year")
# gg_season()
gg_season(PBS_H02, Cost) + ggtitle("Seasonality of H02 Costs")
# gg_subseries()
gg_subseries(PBS_H02, Cost) + ggtitle("H02 Cost Subseries Plot")
# gg_lag()
PBS_H02_summarized <- PBS_H02 |>
index_by(Month) |> # Group by time index
summarise(Cost = sum(Cost, na.rm = TRUE))
gg_lag(PBS_H02_summarized, Cost, lags = 1:12) +
ggtitle("Lag Plot for H02 Costs (Summed Across Groups)")
# ACF()
PBS_H02 |> ACF(Cost) |> autoplot() + ggtitle("ACF of H02 Costs")
The graphs below help answer the prompt questions.
Can you spot any seasonality, cyclicity and trend? -> This data is quite volatile. Seasonally, consumption increases in the summer months and decreases in the winter months. The cycles are not clear; there do not seem to be any regular cycles in the data. The data trends upward overall, but has some decreases around and after 2009.
What can you say about the seasonal patterns? -> Seasonality roughly follows increases in the summer months and decreases in the winter months.
Can you identify any unusual years? -> The decreases around and after 2009 are likely due to the Great Recession.
# autoplot()
autoplot(us_gasoline, Barrels) +
ggtitle("Gasoline Consumption Over Time") +
ylab("Millions of Barrels") + xlab("Year")
# gg_season()
gg_season(us_gasoline, Barrels) + ggtitle("Seasonality of Gasoline Consumption")
# gg_subseries()
gg_subseries(us_gasoline, Barrels) + ggtitle("Gasoline Consumption Subseries Plot")
# gg_lag()
gg_lag(us_gasoline, Barrels, lags = 1:12) + ggtitle("Lag Plot for Gasoline Consumption")
# ACF()
us_gasoline |> ACF(Barrels) |> autoplot() + ggtitle("ACF of Gasoline Consumption")