Forecasting: Principles and Practice

Exercise 2.1

Exploring four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, and Demand from vic_elec.

library(fpp3)   # attaches tsibble, tsibbledata, feasts, dplyr, ggplot2 and friends
?aus_production
?pelt
?gafa_stock
?vic_elec

interval(aus_production)

## <interval[1]>
## [1] 1Q

interval(pelt)

## <interval[1]>
## [1] 1Y

interval(gafa_stock)

## <interval[1]>
## [1] !

interval(vic_elec)

## <interval[1]>
## [1] 30m

# Plot 1: Brick production
aus_production %>%
  autoplot(Bricks) +
  ggtitle("Australian Brick Production") + 
  theme_minimal()

# Plot 2: Lynx trappings  
pelt %>%
  autoplot(Lynx) +
  ggtitle("Canadian Lynx Trappings") + 
  theme_minimal()

# Plot 3: Stock prices
gafa_stock %>%
  autoplot(Close) +
  ggtitle("GAFA Stock Closing Prices") + 
  theme_minimal()

# Plot 4: Victoria electricity demand 
vic_elec %>%
  autoplot(Demand) +
  labs(title = "Half-hourly Electricity Demand - Victoria",
       x = "Date", 
       y = "Electricity Demand (MW)",
       caption = "Data: Australian Energy Market Operator") + 
  theme_minimal()

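The time intervals vary significantly: aus_production is quarterly (1Q), pelt is annual (1Y), and vic_elec is half-hourly (30m). gafa_stock is recorded on trading days only, so weekends and market holidays leave gaps and its interval is reported as irregular (!) rather than daily.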
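Because gafa_stock is irregular, functions that need a regular index (ACF(), gg_lag(), most models) won't work on it directly. One common workaround, sketched here for a single stock, is to re-index by trading-day number so the interval becomes regular. The column name trading_day is a hypothetical choice for illustration.

```r
library(fpp3)

# Re-index GOOG by trading-day count so the tsibble becomes regular.
# trading_day is a hypothetical column name chosen for this sketch.
goog_trading <- gafa_stock %>%
  filter(Symbol == "GOOG") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)

interval(goog_trading)   # should now report a regular step of 1
```

The trade-off is that the calendar gaps disappear: a Friday and the following Monday become consecutive observations.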

Exercise 2.2

Finding peak closing prices for each stock in the GAFA dataset.

peak_prices <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close, na.rm = TRUE)) %>%
  select(Symbol, Date, Close) %>%
  arrange(desc(Close))

peak_prices

## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AMZN   2018-09-04 2040.
## 2 GOOG   2018-07-26 1268.
## 3 AAPL   2018-10-03  232.
## 4 FB     2018-07-25  218.
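An alternative sketch of the same lookup drops the tsibble structure first and uses slice_max(). Setting with_ties = FALSE guarantees exactly one row per stock even if a maximum closing price repeats, which the filter(Close == max(Close)) approach does not.

```r
library(fpp3)

# Plain tibble avoids tsibble index/key constraints during slicing
peak_prices_alt <- gafa_stock %>%
  as_tibble() %>%
  group_by(Symbol) %>%
  slice_max(Close, n = 1, with_ties = FALSE) %>%  # one row per Symbol
  ungroup() %>%
  select(Symbol, Date, Close) %>%
  arrange(desc(Close))

peak_prices_alt
```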

Exercise 2.3

Working with the tutorial tute1.csv dataset from the book website.

tute1 <- readr::read_csv("tute1.csv")
#View(tute1)
head(tute1)

## # A tibble: 6 × 4
##   Quarter    Sales AdBudget   GDP
##   <date>     <dbl>    <dbl> <dbl>
## 1 1981-03-01 1020.     659.  252.
## 2 1981-06-01  889.     589   291.
## 3 1981-09-01  795      512.  291.
## 4 1981-12-01 1004.     614.  292.
## 5 1982-03-01 1058.     647.  279.
## 6 1982-06-01  944.     602   254

# Convert to proper time series format
my_ts <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

# Faceted plot (separate panels)
my_ts %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, color = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") +
  theme_minimal()

# Combined plot without facet_grid()
my_ts %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, color = name)) +
  geom_line() +
  labs(title = "All series on same scale") +
  theme_minimal()

The faceted plot allows better comparison of individual series patterns, while the combined plot shows relative magnitudes but makes it harder to see patterns in smaller series.

Exercise 2.4

Analyzing US natural gas consumption data.

# Create tsibble with state as key (us_total comes from the USgas package)
library(USgas)
gas_data <- us_total %>%
  as_tsibble(index = year, key = state)

# New England states only
ne_states <- c("Maine", "Vermont", "New Hampshire", 
               "Massachusetts", "Connecticut", "Rhode Island")

gas_data %>%
  filter(state %in% ne_states) %>%
  autoplot(y) +
  labs(title = "Natural Gas Consumption - New England",
       y = "Consumption") +
  # theme_minimal() replaces the whole theme, so it must come before theme() tweaks
  theme_minimal() +
  theme(legend.position = "bottom")

Massachusetts seems to have the highest natural gas consumption in New England, and Vermont the lowest.
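To check the ranking read off the plot, a small sketch can average annual consumption per state over all available years. Averaging over the full period is an assumption here; a single recent year could rank the states differently.

```r
library(fpp3)
library(USgas)

ne_states <- c("Maine", "Vermont", "New Hampshire",
               "Massachusetts", "Connecticut", "Rhode Island")

# Mean annual consumption per New England state, largest first
ne_ranking <- us_total %>%
  filter(state %in% ne_states) %>%
  group_by(state) %>%
  summarise(mean_y = mean(y, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(mean_y))

ne_ranking
```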

Exercise 2.5

Tourism data analysis from the Excel file.

# Read Excel file (read_excel() comes from the readxl package)
library(readxl)
tourism_xl <- read_excel("tourism.xlsx")

# Recreate tourism tsibble structure
tourism_clean <- tourism_xl %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

# Find the region/purpose combination with the highest average overnight trips
max_combo <- tourism_clean %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarise(mean_trips = mean(Trips, na.rm = TRUE), .groups = "drop") %>%
  slice_max(mean_trips, n = 1)

print("Highest average trips:")

## [1] "Highest average trips:"

max_combo

## # A tibble: 1 × 3
##   Region Purpose  mean_trips
##   <chr>  <chr>         <dbl>
## 1 Sydney Visiting       747.

# State-level aggregation: summarise() on a tsibble sums within each Quarter,
# leaving a tsibble keyed by State
state_tourism <- tourism_clean %>%
  group_by(State) %>%
  summarise(state_total = sum(Trips, na.rm = TRUE))
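Since tourism.xlsx isn't bundled with the packages, this sketch runs the same aggregation on the built-in tourism tsibble (which the Excel version is meant to reproduce) and inspects the structure of the result: State should be the only key and Quarter the index.

```r
library(fpp3)

# Same aggregation on the built-in tourism tsibble
state_totals <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

key_vars(state_totals)    # "State"
index_var(state_totals)   # "Quarter"
n_keys(state_totals)      # 8 states and territories
```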

Exercise 2.8

Time series feature exploration using multiple graphics functions.

# Total Private Employment
us_employment %>%
  filter(Title == "Total Private") %>%
  autoplot(Employed) +
  ggtitle("Total Private Employment - US") +
  theme_minimal()

us_employment %>%
  filter(Title == "Total Private") %>%
  gg_season(Employed) +
  theme_minimal()

us_employment %>%
  filter(Title == "Total Private") %>%
  gg_subseries(Employed) +
  theme_minimal()

us_employment %>%
  filter(Title == "Total Private") %>%
  gg_lag(Employed) +
  theme_minimal()

us_employment %>%
  filter(Title == "Total Private") %>%
  ACF(Employed) %>%
  autoplot() +
  theme_minimal()

# Bricks from aus_production
aus_production %>%
  autoplot(Bricks) +
  ggtitle("Australian Brick Production") +
  theme_minimal()

aus_production %>%
  gg_season(Bricks) +
  theme_minimal()

aus_production %>%
  gg_subseries(Bricks) +
  theme_minimal()

aus_production %>%
  gg_lag(Bricks) +
  theme_minimal()

aus_production %>%
  ACF(Bricks) %>%
  autoplot() +
  theme_minimal()

# Hare from pelt (annual data, so gg_season() and gg_subseries() don't apply)
pelt %>%
  autoplot(Hare) +
  ggtitle("Hare Trappings") +
  theme_minimal()

pelt %>%
  gg_lag(Hare) +
  theme_minimal()

pelt %>%
  ACF(Hare) %>%
  autoplot() +
  theme_minimal()

# H02 Cost from PBS
PBS %>%
  filter(ATC2 == "H02") %>%
  autoplot(Cost) +
  ggtitle("H02 Pharmaceutical Costs") +
  theme_minimal()

# Seasonal, subseries and lag plots need a single series, so pick one Concession/Type combination
PBS %>%
  filter(ATC2 == "H02", Concession == "Concessional", Type == "Co-payments") %>%
  gg_season(Cost) +
  theme_minimal()

PBS %>%
  filter(ATC2 == "H02", Concession == "Concessional", Type == "Co-payments") %>%
  gg_subseries(Cost) +
  theme_minimal()

PBS %>%
  filter(ATC2 == "H02", Concession == "Concessional", Type == "Co-payments") %>%
  gg_lag(Cost) +
  theme_minimal()

PBS %>%
  filter(ATC2 == "H02") %>%
  ACF(Cost) %>%
  autoplot() +
  theme_minimal()
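Instead of picking one Concession/Type combination, another option is to sum the H02 series into a single monthly total before using the seasonal graphics. This is a sketch of that alternative; it hides the differences between payment types that the Analysis below points out.

```r
library(fpp3)

# Collapse all H02 Concession x Type series into one monthly total
h02_total <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost))

h02_total %>% gg_season(Cost) + theme_minimal()
h02_total %>% gg_subseries(Cost) + theme_minimal()
```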

# Barrels from us_gasoline
us_gasoline %>%
  autoplot(Barrels) +
  ggtitle("US Finished Motor Gasoline Supplied") +
  theme_minimal()

us_gasoline %>%
  gg_season(Barrels) +
  theme_minimal()

us_gasoline %>%
  gg_subseries(Barrels) +
  theme_minimal()

us_gasoline %>%
  gg_lag(Barrels) +
  theme_minimal()

us_gasoline %>%
  ACF(Barrels) %>%
  autoplot() +
  theme_minimal()

Analysis

Employment: Clear upward trend with strong, predictable seasonality that bottoms out each winter. The 2008-2009 recession is obvious: employment drops sharply before recovering.

Bricks: Rises until around 1980 and declines after that, possibly because construction methods changed. Production peaks in the summer, since it's less desirable to make bricks in winter. The seasonal pattern is consistent across years.

Hare: Big population swings that follow a rough 10-year cycle, likely tied to predator-prey dynamics with the lynx. The cycles vary in height and timing, so they are hard to predict. No seasonality since it's annual data.

Pharmaceutical Costs: Costs keep increasing over time. Different payment types have different seasonal patterns: some spike mid-year, others at year-end. This is probably related to deductibles resetting.

Gasoline: The summer driving season is apparent in the data, with higher values during vacation travel. Overall upward trend until the late 2000s, after which it levels off. Economic downturns show up as drops in demand.
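To put rough numbers on the trend and seasonality described for employment, feat_stl() from feasts reports STL-based strengths between 0 (none) and 1 (very strong). This sketch covers only the employment series; the NA filter is a precaution, not something the data are known to need.

```r
library(fpp3)

# STL-based trend and seasonal strength for Total Private employment
employment_strength <- us_employment %>%
  filter(Title == "Total Private", !is.na(Employed)) %>%
  features(Employed, feat_stl)

employment_strength %>%
  select(trend_strength, seasonal_strength_year)
```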

Key observations:

* Employment and gasoline both show the 2008 recession (the Bricks series ends in 2005, before it)

* Seasonal patterns make intuitive sense - construction in summer, travel in summer, etc.

* Hare data is the most unpredictable: the cycles are irregular, and ecological systems can be chaotic

* Pharmaceutical costs seem driven more by policy than natural cycles

* Most series show strong, slowly decaying autocorrelation; the hare ACF is weaker and oscillates with the population cycle

The lag plots confirm that the economic data are fairly predictable in the short term, while the ecological data (hares) are much harder to predict.
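The autocorrelation comparison can also be quantified with feat_acf() from feasts, which returns acf1 (the lag-1 autocorrelation) and acf10 (the sum of squares of the first ten autocorrelations), among other features. This sketch compares hares with private employment.

```r
library(fpp3)

# Lag-1 and first-ten autocorrelation summaries for each series
hare_acf <- pelt %>%
  features(Hare, feat_acf)

employment_acf <- us_employment %>%
  filter(Title == "Total Private", !is.na(Employed)) %>%
  features(Employed, feat_acf)

hare_acf %>% select(acf1, acf10)
employment_acf %>% select(acf1, acf10)
```

A trending series like employment should have acf1 close to 1, while the cycling hare series should sit noticeably lower.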

Forecasting: Principles and Practice - Chapter 2 Exercise

Tai Chou-Kudu