#Chapter 3 Exercises

3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

To find the GDP per capita, we divide the total GDP by the population. This allows us to compare the economic output relative to the size of each country.

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.5.2

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

## ✔ tibble      3.3.0     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.5.0
## ✔ ggplot2     4.0.1

## Warning: package 'ggplot2' was built under R version 4.5.2

## Warning: package 'tsibble' was built under R version 4.5.2

## Warning: package 'tsibbledata' was built under R version 4.5.2

## Warning: package 'feasts' was built under R version 4.5.2

## Warning: package 'fabletools' was built under R version 4.5.2

## Warning: package 'fable' was built under R version 4.5.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

# 1. Plotting GDP per capita for all countries
global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  autoplot(GDP_per_capita, alpha = 0.3) +
  guides(colour = "none") +
  labs(title = "GDP per capita over time", y = "USD")

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

# 2. Identifying the country with the highest value
global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE))

## # A tsibble: 1 x 10 [1Y]
## # Key:       Country [1]
##   Country Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>   <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Monaco  MCO    2014 7060236168.   7.18    NA      NA      NA      38132
## # ℹ 1 more variable: GDP_per_capita <dbl>

Analysis

Highest GDP per capita: Monaco consistently shows the highest values in the dataset due to its small population and high concentration of wealth.

3.2 For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

# 1. US GDP (Scaling by population or inflation is common, but log helps variance)
global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) + labs(title = "US GDP")

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

# 2. Victorian Bulls (Count data often shows non-constant variance)
aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot(Count) + labs(title = "Victorian Bulls Slaughtered")

Victorian Electricity Demand from vic_elec.

# 3. Victorian Electricity (High frequency; log or Box-Cox is typical)
vic_elec %>%
  autoplot(Demand) + labs(title = "Victorian Electricity Demand")

Gas production from aus_production.

# 4. Gas Production (Clear seasonal variation that increases over time)
aus_production %>%
  autoplot(Gas) + labs(title = "Gas Production")

# 1. US GDP - Applying Logarithm
global_economy %>%
  filter(Country == "United States") %>%
  autoplot(log(GDP)) + 
  labs(title = "Log Transformed US GDP", y = "Log(USD)")

# 2. Victorian Bulls - Applying Logarithm
aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot(log(Count)) + 
  labs(title = "Log Transformed Victorian Bulls")

# 3. Electricity Demand - Applying Box-Cox (Automatic Lambda)
lambda_elec <- vic_elec %>%
  features(Demand, features = guerrero) %>%
  pull(lambda_guerrero)

vic_elec %>%
  autoplot(box_cox(Demand, lambda_elec)) +
  labs(title = "Box-Cox Transformed Electricity Demand", subtitle = paste("Lambda =", round(lambda_elec, 2)))

# 4. Gas Production - Applying Logarithm
aus_production %>%
  autoplot(log(Gas)) + 
  labs(title = "Log Transformed Gas Production")

### Effects of Transformations United States GDP: Applying the Logarithm removed the exponential curvature. The growth now appears as a straight line, which makes it easier to see the constant percentage growth over the decades.

Victorian Bulls: Applying the Logarithm stabilized the random fluctuations.

Electricity Demand: Applying the Box-Cox transformation evened out the extreme seasonal spikes. The high demand during summer and winter now has a comparable height, stabilizing the variance across the series.

Gas Production: Applying the Logarithm fixed the “fanning” effect. The seasonal “waves” used to get taller as production increased, but now the seasonal amplitude is constant from start to finish.

3.3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

The Box-Cox transformation is unhelpful for the canadian_gas data because the seasonal variation is not proportional to the level of the series.

Reasons

Inconsistent Variance: In this dataset, the “swings” (seasonal peaks) get very large in the middle of the series and then smaller again at the end, even though the production level stays high.

Mathematical Limitation: A Box-Cox transformation (using a single $\lambda$) only works if the variation increases or decreases linearly with the trend. Since the variance here fluctuates independently, the transformation cannot stabilize it.

3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Exercise 7 from section 2.10

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

set.seed(12345678) myseries <- aus_retail |> filter(Series ID == sample(aus_retail$Series ID,1)) Explore your chosen retail time series using the following functions:

autoplot(), gg_season(), gg_subseries(), gg_lag(),

ACF() |> autoplot()

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# 1. Exploration plots
myseries %>% autoplot(Turnover)

myseries %>% gg_season(Turnover)

## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

myseries %>% gg_subseries(Turnover)

## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

myseries %>% gg_lag(Turnover)

## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

myseries %>% ACF(Turnover) %>% autoplot()

# 2. Finding the optimal Lambda
lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# 3. Plotting the transformation
myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(title = paste("Box-Cox Transformation with Lambda =", round(lambda, 2)))

Selected Transformation

For this series, the optimal Lambda ($\lambda$) is usually around 0.1 to 0.3 (depending on the specific retail category).

Effect: This transformation is necessary because the seasonal “swings” (the December peaks) grow larger as the Turnover increases. Using Box-Cox stabilizes the variance, making the size of the seasonal spikes uniform across the entire timeline.

3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

# 1. Tobacco (Australian Production)
lambda_tobacco <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Tobacco, lambda_tobacco)) +
  labs(title = "Transformed Tobacco Production", subtitle = paste("Lambda =", round(lambda_tobacco, 2)))

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

# 2. Economy Class (Melbourne-Sydney)
economy_ansett <- ansett %>%
  filter(Airports == "MEL-SYD", Class == "Economy")

lambda_passengers <- economy_ansett %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

economy_ansett %>%
  autoplot(box_cox(Passengers, lambda_passengers)) +
  labs(title = "Transformed Economy Passengers", subtitle = paste("Lambda =", round(lambda_passengers, 2)))

# 3. Pedestrian Counts (Southern Cross Station)
southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")

lambda_pedestrian <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count, lambda_pedestrian)) +
  labs(title = "Transformed Pedestrian Counts", subtitle = paste("Lambda =", round(lambda_pedestrian, 2)))

Analysis

Tobacco: The $\lambda$ is usually close to 0.9, meaning very little transformation is needed as the variance is relatively stable.

Economy Passengers: This series has massive dips. A Box-Cox helps, but the extreme outliers make stabilization difficult.

Pedestrian Counts: Typically requires a low $\lambda$ (around -0.2 to 0.2). The transformation helps equalize the daily peaks (rush hours) throughout the weeks.

3.7 Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

# Initial Setup
gas <- tail(aus_production, 5*4) %>% select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>% autoplot(Gas) + labs(title = "Gas Production (Last 5 Years)")

Identification: The plot shows a clear upward trend and strong annual seasonality, with peaks usually occurring in winter (Q2/Q3) due to heating demand.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

gas_decomp %>% autoplot()

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).

Do the results support the graphical interpretation from part a?

Interpretation: Yes, the decomposition confirms the graphical intuition. The seasonal indices are consistent, and the trend-cycle shows steady growth.

Compute and plot the seasonally adjusted data.

# d. Seasonally Adjusted Data
gas_decomp %>%
  autoplot(season_adjust) + labs(title = "Seasonally Adjusted Gas Data")

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

# e. Adding an Outlier in the middle
gas_outlier_mid <- gas
gas_outlier_mid$Gas[10] <- gas_outlier_mid$Gas[10] + 300

gas_outlier_mid %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot(season_adjust)

Mid-Outlier Effect: An outlier in the middle distorts the trend-cycle significantly. Because classical decomposition uses moving averages, the “spike” leaks into surrounding periods, making the trend look like it has a hump.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

# f. Adding an Outlier at the end
gas_outlier_end <- gas
gas_outlier_end$Gas[20] <- gas_outlier_end$Gas[20] + 300

gas_outlier_end %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot(season_adjust)

End-Outlier Effect: If the outlier is at the end, the effect on the trend is harder to calculate because the moving average “runs out” of data. It primarily affects the last available trend values, often making the final direction of the series misleading.

3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(seasonal)

## Warning: package 'seasonal' was built under R version 4.5.2

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

library(fpp3)
# Using the same seed and series from Exercise 3.4
set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))



# Applying X-11 Decomposition

x11_dcmp <- myseries %>%
  model(x11 = feasts:::X11(Turnover)) %>%
  components()

## Warning: `X11()` was deprecated in feasts 0.2.0.
## ℹ Please use `X_13ARIMA_SEATS()` instead.
## ℹ You can specify the X-11 decomposition method by including x11() in the model
##   formula of `X_13ARIMA_SEATS()`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Plotting the decomposition
autoplot(x11_dcmp) +
  labs(title = "X11 Decomposition of Australian Retail Turnover")

Analysis:

Trend: There is a steady long-term increase, with a notable “hump” and dip around 1995-1996, indicating a specific economic cycle or structural shift.

Seasonality: The series shows a very strong, regular annual pattern. The X-11 method reveals that the seasonal amplitude (the size of the peaks) is increasing over time.

Irregular: The bottom panel shows significant outliers. Specifically, there is a massive spike around 1995 (matching the trend anomaly) and another notable peak near 2010.

3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The STL decomposition shows a dominant, nearly linear upward trend, with the labour force growing from approximately 6,500 to 9,000 thousand persons. The seasonal component is highly regular, with an annual range of about 200 units (from $-100$ to $+100$), as seen in the scale of the third panel. However, the remainder component scale is much larger (down to $-400$), indicating that significant irregular events have a greater impact on the data than the normal seasonal fluctuations. Figure 3.20 further reveals that the seasonal pattern is not static; for instance, the “August” seasonal effect shifted from a small dip to a much deeper one over the 15-year period.

b. Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession is clearly visible, primarily in the remainder component. There is a massive downward spike in 1991/1992, where the remainder drops toward $-400$, indicating a sharp, unexpected decrease in the labour force that cannot be explained by trend or seasonality. Additionally, the trend component shows a visible “plateau” or flattening during this same period, reflecting the stalled job growth caused by the economic downturn.

Data 624: Homework 2

Arutam Antunish

2026-02-16