#Chapter 3 Exercises
To find the GDP per capita, we divide the total GDP by the population. This allows us to compare the economic output relative to the size of each country.
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.5.2
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.0 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.5.0
## ✔ ggplot2 4.0.1
## Warning: package 'ggplot2' was built under R version 4.5.2
## Warning: package 'tsibble' was built under R version 4.5.2
## Warning: package 'tsibbledata' was built under R version 4.5.2
## Warning: package 'feasts' was built under R version 4.5.2
## Warning: package 'fabletools' was built under R version 4.5.2
## Warning: package 'fable' was built under R version 4.5.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
# 1. Plotting GDP per capita for all countries
global_economy %>%
mutate(GDP_per_capita = GDP / Population) %>%
autoplot(GDP_per_capita, alpha = 0.3) +
guides(colour = "none") +
labs(title = "GDP per capita over time", y = "USD")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
# 2. Identifying the country with the highest value
global_economy %>%
mutate(GDP_per_capita = GDP / Population) %>%
filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE))
## # A tsibble: 1 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7060236168. 7.18 NA NA NA 38132
## # ℹ 1 more variable: GDP_per_capita <dbl>
Analysis
Highest GDP per capita: Monaco consistently shows the highest values in the dataset due to its small population and high concentration of wealth.
United States GDP from global_economy.
# 1. US GDP (Scaling by population or inflation is common, but log helps variance)
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP) + labs(title = "US GDP")
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
# 2. Victorian Bulls (Count data often shows non-constant variance)
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
autoplot(Count) + labs(title = "Victorian Bulls Slaughtered")
Victorian Electricity Demand from vic_elec.
# 3. Victorian Electricity (High frequency; log or Box-Cox is typical)
vic_elec %>%
autoplot(Demand) + labs(title = "Victorian Electricity Demand")
Gas production from aus_production.
# 4. Gas Production (Clear seasonal variation that increases over time)
aus_production %>%
autoplot(Gas) + labs(title = "Gas Production")
# 1. US GDP - Applying Logarithm
global_economy %>%
filter(Country == "United States") %>%
autoplot(log(GDP)) +
labs(title = "Log Transformed US GDP", y = "Log(USD)")
# 2. Victorian Bulls - Applying Logarithm
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
autoplot(log(Count)) +
labs(title = "Log Transformed Victorian Bulls")
# 3. Electricity Demand - Applying Box-Cox (Automatic Lambda)
lambda_elec <- vic_elec %>%
features(Demand, features = guerrero) %>%
pull(lambda_guerrero)
vic_elec %>%
autoplot(box_cox(Demand, lambda_elec)) +
labs(title = "Box-Cox Transformed Electricity Demand", subtitle = paste("Lambda =", round(lambda_elec, 2)))
# 4. Gas Production - Applying Logarithm
aus_production %>%
autoplot(log(Gas)) +
labs(title = "Log Transformed Gas Production")
### Effects of Transformations United States GDP: Applying the Logarithm
removed the exponential curvature. The growth now appears as a straight
line, which makes it easier to see the constant percentage growth over
the decades.
Victorian Bulls: Applying the Logarithm stabilized the random fluctuations.
Electricity Demand: Applying the Box-Cox transformation evened out the extreme seasonal spikes. The high demand during summer and winter now has a comparable height, stabilizing the variance across the series.
Gas Production: Applying the Logarithm fixed the “fanning” effect. The seasonal “waves” used to get taller as production increased, but now the seasonal amplitude is constant from start to finish.
The Box-Cox transformation is unhelpful for the canadian_gas data because the seasonal variation is not proportional to the level of the series.
Reasons
Inconsistent Variance: In this dataset, the “swings” (seasonal peaks) get very large in the middle of the series and then smaller again at the end, even though the production level stays high.
Mathematical Limitation: A Box-Cox transformation (using a single \(\lambda\)) only works if the variation increases or decreases linearly with the trend. Since the variance here fluctuates independently, the transformation cannot stabilize it.
Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):
set.seed(12345678) myseries <- aus_retail |>
filter(Series ID ==
sample(aus_retail$Series ID,1)) Explore your chosen retail
time series using the following functions:
autoplot(), gg_season(), gg_subseries(), gg_lag(),
ACF() |> autoplot()
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
set.seed(12345678)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
# 1. Exploration plots
myseries %>% autoplot(Turnover)
myseries %>% gg_season(Turnover)
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
myseries %>% gg_subseries(Turnover)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
myseries %>% gg_lag(Turnover)
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
myseries %>% ACF(Turnover) %>% autoplot()
# 2. Finding the optimal Lambda
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
# 3. Plotting the transformation
myseries %>%
autoplot(box_cox(Turnover, lambda)) +
labs(title = paste("Box-Cox Transformation with Lambda =", round(lambda, 2)))
Selected Transformation
For this series, the optimal Lambda (\(\lambda\)) is usually around 0.1 to 0.3 (depending on the specific retail category).
Effect: This transformation is necessary because the seasonal “swings” (the December peaks) grow larger as the Turnover increases. Using Box-Cox stabilizes the variance, making the size of the seasonal spikes uniform across the entire timeline.
# 1. Tobacco (Australian Production)
lambda_tobacco <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco, lambda_tobacco)) +
labs(title = "Transformed Tobacco Production", subtitle = paste("Lambda =", round(lambda_tobacco, 2)))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
# 2. Economy Class (Melbourne-Sydney)
economy_ansett <- ansett %>%
filter(Airports == "MEL-SYD", Class == "Economy")
lambda_passengers <- economy_ansett %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
economy_ansett %>%
autoplot(box_cox(Passengers, lambda_passengers)) +
labs(title = "Transformed Economy Passengers", subtitle = paste("Lambda =", round(lambda_passengers, 2)))
# 3. Pedestrian Counts (Southern Cross Station)
southern_cross <- pedestrian %>%
filter(Sensor == "Southern Cross Station")
lambda_pedestrian <- southern_cross %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
southern_cross %>%
autoplot(box_cox(Count, lambda_pedestrian)) +
labs(title = "Transformed Pedestrian Counts", subtitle = paste("Lambda =", round(lambda_pedestrian, 2)))
Analysis
Tobacco: The \(\lambda\) is usually close to 0.9, meaning very little transformation is needed as the variance is relatively stable.
Economy Passengers: This series has massive dips. A Box-Cox helps, but the extreme outliers make stabilization difficult.
Pedestrian Counts: Typically requires a low \(\lambda\) (around -0.2 to 0.2). The transformation helps equalize the daily peaks (rush hours) throughout the weeks.
gas <- tail(aus_production, 5*4) |> select(Gas)
# Initial Setup
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas %>% autoplot(Gas) + labs(title = "Gas Production (Last 5 Years)")
Identification: The plot shows a clear upward trend and strong annual
seasonality, with peaks usually occurring in winter (Q2/Q3) due to
heating demand.
gas_decomp <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
gas_decomp %>% autoplot()
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
Interpretation: Yes, the decomposition confirms the graphical intuition. The seasonal indices are consistent, and the trend-cycle shows steady growth.
# d. Seasonally Adjusted Data
gas_decomp %>%
autoplot(season_adjust) + labs(title = "Seasonally Adjusted Gas Data")
# e. Adding an Outlier in the middle
gas_outlier_mid <- gas
gas_outlier_mid$Gas[10] <- gas_outlier_mid$Gas[10] + 300
gas_outlier_mid %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot(season_adjust)
Mid-Outlier Effect: An outlier in the middle distorts the trend-cycle
significantly. Because classical decomposition uses moving averages, the
“spike” leaks into surrounding periods, making the trend look like it
has a hump.
# f. Adding an Outlier at the end
gas_outlier_end <- gas
gas_outlier_end$Gas[20] <- gas_outlier_end$Gas[20] + 300
gas_outlier_end %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot(season_adjust)
End-Outlier Effect: If the outlier is at the end, the effect on the
trend is harder to calculate because the moving average “runs out” of
data. It primarily affects the last available trend values, often making
the final direction of the series misleading.
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.5.2
##
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
##
## view
library(fpp3)
# Using the same seed and series from Exercise 3.4
set.seed(12345678)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
# Applying X-11 Decomposition
x11_dcmp <- myseries %>%
model(x11 = feasts:::X11(Turnover)) %>%
components()
## Warning: `X11()` was deprecated in feasts 0.2.0.
## ℹ Please use `X_13ARIMA_SEATS()` instead.
## ℹ You can specify the X-11 decomposition method by including x11() in the model
## formula of `X_13ARIMA_SEATS()`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Plotting the decomposition
autoplot(x11_dcmp) +
labs(title = "X11 Decomposition of Australian Retail Turnover")
Analysis:
Trend: There is a steady long-term increase, with a notable “hump” and dip around 1995-1996, indicating a specific economic cycle or structural shift.
Seasonality: The series shows a very strong, regular annual pattern. The X-11 method reveals that the seasonal amplitude (the size of the peaks) is increasing over time.
Irregular: The bottom panel shows significant outliers. Specifically, there is a massive spike around 1995 (matching the trend anomaly) and another notable peak near 2010.
The STL decomposition shows a dominant, nearly linear upward trend, with the labour force growing from approximately 6,500 to 9,000 thousand persons. The seasonal component is highly regular, with an annual range of about 200 units (from \(-100\) to \(+100\)), as seen in the scale of the third panel. However, the remainder component scale is much larger (down to \(-400\)), indicating that significant irregular events have a greater impact on the data than the normal seasonal fluctuations. Figure 3.20 further reveals that the seasonal pattern is not static; for instance, the “August” seasonal effect shifted from a small dip to a much deeper one over the 15-year period.
Yes, the recession is clearly visible, primarily in the remainder component. There is a massive downward spike in 1991/1992, where the remainder drops toward \(-400\), indicating a sharp, unexpected decrease in the labour force that cannot be explained by trend or seasonality. Additionally, the trend component shows a visible “plateau” or flattening during this same period, reflecting the stalled job growth caused by the economic downturn.