Exercises 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Step-by-Step Explanation

  1. Load Required Packages: {fpp3} is loaded for time series analysis.

  2. Calculate GDP per Capita: The dataset is updated with a new GDP_per_capita column by dividing GDP by Population.

  3. Plot GDP per Capita Over Time: A line plot visualizes how GDP per capita changes for each country over time.

  4. Fixing the Grouping Issue: Since global_economy is a tsibble (time-series tibble), index_by(Year) is used instead of group_by(Year).

  5. Extracting Highest GDP per Capita Each Year: The dataset is filtered to retain only the country with the highest GDP per capita per year.

  6. Displaying the Results: The output shows how the leading country has changed over time.

library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.2
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## Warning: package 'tsibble' was built under R version 4.4.2
## Warning: package 'tsibbledata' was built under R version 4.4.2
## Warning: package 'feasts' was built under R version 4.4.2
## Warning: package 'fabletools' was built under R version 4.4.2
## Warning: package 'fable' was built under R version 4.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
# Calculate GDP per capita
global_economy <- global_economy %>%
  mutate(GDP_per_capita = GDP / Population)

# Plot GDP per capita over time for each country
global_economy %>%
  ggplot(aes(x = Year, y = GDP_per_capita, color = Country)) +
  geom_line(show.legend = FALSE) +
  labs(title = "GDP per Capita Over Time",
       x = "Year",
       y = "GDP per Capita") +
  theme_minimal()
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Identify the country with the highest GDP per capita each year
highest_GDP_per_capita <- global_economy %>%
  index_by(Year) %>%  # Use index_by instead of group_by for tsibble
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
  select(Year, Country, GDP_per_capita)

print(highest_GDP_per_capita)
## # A tsibble: 58 x 3 [1Y]
## # Key:       Country [263]
## # Groups:    @ Year [58]
##     Year Country       GDP_per_capita
##    <dbl> <fct>                  <dbl>
##  1  1965 Kuwait                 4429.
##  2  1966 Kuwait                 4556.
##  3  2013 Liechtenstein        173528.
##  4  2015 Liechtenstein        167591.
##  5  2017 Luxembourg           104103.
##  6  1970 Monaco                12480.
##  7  1971 Monaco                13813.
##  8  1972 Monaco                16734.
##  9  1973 Monaco                21423.
## 10  1974 Monaco                22707.
## # ℹ 48 more rows

Exercises 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

# Load required libraries
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ✔ purrr   1.0.2     ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tsibble)
library(tsibbledata)
library(feasts)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.4.2
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
# 1. United States GDP from global_economy
us_gdp <- global_economy %>%
  filter(Country == "United States") %>%
  select(Year, GDP)

# Original and log-transformed plots
p1_original <- autoplot(us_gdp, GDP) + 
  labs(title = "US GDP (Original)", x = "Year", y = "GDP (USD)")
p1_log <- autoplot(us_gdp, log(GDP)) + 
  labs(title = "US GDP (Log Transformed)", x = "Year", y = "Log(GDP)")
grid.arrange(p1_original, p1_log, ncol = 1)

# Log transformation converts exponential growth to linear trend, stabilizing variance and revealing consistent growth rates

# 2. Victorian Bulls Slaughter from aus_livestock
vic_bulls <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  select(Month, Count)

# Original and log-transformed plots
p2_original <- autoplot(vic_bulls, Count) + 
  labs(title = "Victorian Bulls Slaughter (Original)", x = "Month", y = "Count")
p2_log <- autoplot(vic_bulls, log(Count + 1)) +  # +1 to handle zero counts
  labs(title = "Victorian Bulls Slaughter (Log Transformed)", x = "Month", y = "Log(Count)")
grid.arrange(p2_original, p2_log, ncol = 1)

# Log transform stabilizes variance and emphasizes seasonal patterns while reducing outlier effects

# 3. Victorian Electricity Demand from vic_elec
# Aggregate to daily demand for better visualization
vic_elec_daily <- vic_elec %>%
  index_by(Date = as.Date(Time)) %>%
  summarise(Demand = sum(Demand, na.rm = TRUE))  # Sum demand by day, na.rm = TRUE to handle missing data

# Original and log-transformed plots
p3_original <- autoplot(vic_elec_daily, Demand) + 
  labs(title = "Victorian Electricity Demand (Original)", x = "Date", y = "Demand (MW)")
p3_log <- autoplot(vic_elec_daily, log(Demand + 1)) +  # Adding 1 to avoid log(0)
  labs(title = "Victorian Electricity Demand (Log Transformed)", x = "Date", y = "Log(Demand)")
grid.arrange(p3_original, p3_log, ncol = 1)

# Log transform reduces magnitude of daily fluctuations while preserving patterns

# 4. Gas Production from aus_production
gas_data <- aus_production %>%
  select(Quarter, Gas)

# Original and log-transformed plots
p4_original <- autoplot(gas_data, Gas) + 
  labs(title = "Gas Production (Original)", x = "Quarter", y = "Gas Production")
p4_log <- autoplot(gas_data, log(Gas)) + 
  labs(title = "Gas Production (Log Transformed)", x = "Quarter", y = "Log(Gas Production)")
grid.arrange(p4_original, p4_log, ncol = 1)

# Log transform reveals consistent growth rate and stabilizes seasonal fluctuations
  1. United States GDP from global_economy:
  1. Victorian Bulls, Bullocks, and Steers Slaughter from aus_livestock:
  1. Victorian Electricity Demand from vic_elec:
  1. Gas Production from aus_production:

Exercises 3.3

A Box-Cox transformation is unhelpful for the canadian_gas data because it requires the data to have positive values, and in this case, the dataset contains zero or negative values, which makes the transformation invalid. The Box-Cox transformation relies on the assumption that the data is continuous and strictly positive, so applying it to data with zero or negative values would either result in undefined values or distort the data. Furthermore, the canadian_gas data may already have a natural structure or trend that doesn’t require such a transformation. For data like this, other methods like adding a small constant or using a different transformation, such as the log transformation with a shift, may be more suitable.

Exercises 3.4

For the retail data from Exercise 7, the Box-Cox transformation that I would select depends on the distribution and behavior of the time series. After exploring the series using functions like autoplot(), gg_season(), and gg_subseries(), I’d assess the variance and trend characteristics. If the data shows exponential growth or has significant skewness, a positive Box-Cox transformation with a suitable lambda (estimated using methods like Guerrero or maximum likelihood) would help stabilize the variance and make the trend more linear. For example, after applying the Box-Cox transformation, I would choose the lambda that minimizes the variance and enhances the data’s linearity. This transformation would ensure that the data is better suited for modeling, especially when the series exhibits patterns that are not easily captured by simple linear models. However, if the series is already reasonably stable or doesn’t show significant skewness, a simpler transformation like a log or square root might be more appropriate.

Exercises 3.5

# Load required libraries
library(tidyverse)
library(tsibble)
library(tsibbledata)
library(feasts)
library(forecast)  # For box_cox function
## Warning: package 'forecast' was built under R version 4.4.2
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
# 1. Tobacco from aus_production
tobacco_data <- aus_production %>%
  select(Quarter, Tobacco)

# Find the Box-Cox transformation lambda for Tobacco
lambda_tobacco <- BoxCox.lambda(tobacco_data$Tobacco)

# Apply Box-Cox transformation to Tobacco data
tobacco_transformed <- tobacco_data %>%
  mutate(Transformed_Tobacco = BoxCox(Tobacco, lambda_tobacco))

# Plot original and transformed Tobacco data
p1_original <- autoplot(tobacco_data, Tobacco) + 
  labs(title = "Tobacco Production (Original)", x = "Quarter", y = "Tobacco")
p1_transformed <- autoplot(tobacco_transformed, Transformed_Tobacco) + 
  labs(title = "Box-Cox Transformed Tobacco Production", x = "Quarter", y = "Transformed Tobacco")

# Display the plots
grid.arrange(p1_original, p1_transformed, ncol = 1)
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

# 2. Economy class passengers between Melbourne and Sydney from ansett
ansett_data <- ansett %>%
  filter(Class == "Economy") %>%
  select(Week, Passengers)  # Corrected to use 'Week' instead of 'Month'

# Find the Box-Cox transformation lambda for Economy class passengers
lambda_ansett <- BoxCox.lambda(ansett_data$Passengers)
## Warning in guerrero(x, lower, upper): Guerrero's method for selecting a Box-Cox
## parameter (lambda) is given for strictly positive data.
# Apply Box-Cox transformation to Economy class passengers data
ansett_transformed <- ansett_data %>%
  mutate(Transformed_Passengers = BoxCox(Passengers, lambda_ansett))

# Plot original and transformed Economy passengers data
p2_original <- autoplot(ansett_data, Passengers) + 
  labs(title = "Economy Class Passengers (Original)", x = "Week", y = "Passengers")
p2_transformed <- autoplot(ansett_transformed, Transformed_Passengers) + 
  labs(title = "Box-Cox Transformed Economy Class Passengers", x = "Week", y = "Transformed Passengers")

# Display the plots
grid.arrange(p2_original, p2_transformed, ncol = 1)

# 3. Pedestrian counts at Southern Cross Station from pedestrian
pedestrian_data <- pedestrian %>%
  select(Date, Count)

# Find the Box-Cox transformation lambda for Pedestrian counts
lambda_pedestrian <- BoxCox.lambda(pedestrian_data$Count)
## Warning in guerrero(x, lower, upper): Guerrero's method for selecting a Box-Cox
## parameter (lambda) is given for strictly positive data.
# Apply Box-Cox transformation to Pedestrian counts data
pedestrian_transformed <- pedestrian_data %>%
  mutate(Transformed_Count = BoxCox(Count, lambda_pedestrian))

# Plot original and transformed Pedestrian counts data
p3_original <- autoplot(pedestrian_data, Count) + 
  labs(title = "Pedestrian Counts (Original)", x = "Date", y = "Count")
p3_transformed <- autoplot(pedestrian_transformed, Transformed_Count) + 
  labs(title = "Box-Cox Transformed Pedestrian Counts", x = "Date", y = "Transformed Count")

# Display the plots
grid.arrange(p3_original, p3_transformed, ncol = 1)

For the Tobacco series from aus_production, the optimal Box-Cox transformation was found by using the Guerrero method. This transformation stabilizes the variance and makes the data more linear, which is important for further analysis and forecasting. The lambda value for the Tobacco series was calculated and applied to transform the data. Similarly, for the Economy class passengers between Melbourne and Sydney from the ansett dataset, the Box-Cox transformation was applied to stabilize the variance and deal with any skewness in the passenger counts. The same approach was applied to the Pedestrian counts at Southern Cross Station from the pedestrian dataset, where the transformation helped adjust for any irregularities in the data. Each transformation stabilizes variance and makes the data more suitable for time series modeling by reducing the impact of non-linear trends and volatility. After performing these transformations, the data is now better prepared for modeling, where the trends are clearer and the variance more consistent.

Exercises 3.7

# Load required libraries
library(tidyverse)
library(tsibble)
library(feasts)
library(fable)

# a. Plot the time series for the last 5 years of the Gas data from aus_production
gas <- tail(aus_production, 5*4) |> select(Gas)

# Plot the time series
p1 <- autoplot(gas, Gas) +
  labs(title = "Gas Production (Last 5 Years)", x = "Time", y = "Gas Production")
print(p1)

# b. Classical decomposition using multiplicative model to calculate trend-cycle and seasonal indices
gas_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

# Extract components from the decomposition
gas_decomp_components <- components(gas_decomp)

# Plot the decomposition results (individual components)
p2 <- autoplot(gas_decomp_components) +
  labs(title = "Classical Decomposition of Gas Production")
print(p2)
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

# c. Does the decomposition support the graphical interpretation from part a?
# We will visually compare the trend-cycle and seasonal components from the decomposition plot to the time series plot.
# The decomposition should highlight if the data shows seasonal fluctuations (seen in the seasonal component) 
# and a trend-cycle (seen in the trend component).

# d. Compute and plot the seasonally adjusted data
seasonally_adjusted <- gas_decomp_components %>%
  mutate(seasonally_adjusted = Gas / seasonal)  # Seasonally adjusted = data / seasonal index

# Plot the seasonally adjusted data
p3 <- autoplot(seasonally_adjusted, seasonally_adjusted) +
  labs(title = "Seasonally Adjusted Gas Production", x = "Time", y = "Seasonally Adjusted Gas Production")
print(p3)

# e. Introduce an outlier (add 300 to the first observation) and recompute the seasonally adjusted data
gas_outlier <- gas
gas_outlier$Gas[1] <- gas_outlier$Gas[1] + 300  # Adding outlier

# Re-decompose the series with the outlier
gas_outlier_decomp <- gas_outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

# Extract components from the outlier-decomposed series
gas_outlier_decomp_components <- components(gas_outlier_decomp)

# Seasonally adjust the series with the outlier
seasonally_adjusted_outlier <- gas_outlier_decomp_components %>%
  mutate(seasonally_adjusted = Gas / seasonal)

# Plot the seasonally adjusted data with the outlier
p4 <- autoplot(seasonally_adjusted_outlier, seasonally_adjusted) +
  labs(title = "Seasonally Adjusted Gas Production with Outlier", x = "Time", y = "Seasonally Adjusted Gas Production")
print(p4)

# f. Compare the effect of the outlier if it is near the end of the series
gas_outlier_end <- gas
gas_outlier_end$Gas[20] <- gas_outlier_end$Gas[20] + 300  # Adding outlier at the end

# Re-decompose the series with the outlier at the end
gas_outlier_end_decomp <- gas_outlier_end %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

# Extract components from the outlier-at-the-end decomposed series
gas_outlier_end_decomp_components <- components(gas_outlier_end_decomp)

# Seasonally adjust the series with the outlier at the end
seasonally_adjusted_outlier_end <- gas_outlier_end_decomp_components %>%
  mutate(seasonally_adjusted = Gas / seasonal)

# Plot the seasonally adjusted data with the outlier at the end
p5 <- autoplot(seasonally_adjusted_outlier_end, seasonally_adjusted) +
  labs(title = "Seasonally Adjusted Gas Production with Outlier at End", x = "Time", y = "Seasonally Adjusted Gas Production")
print(p5)

a. Plotting the Time Series: The time series of gas production for the last 5 years displays fluctuations over time, with certain peaks and valleys. You can see that there is some level of seasonality, likely related to natural cycles or production patterns. Additionally, there might be an underlying upward or downward trend, although it’s not immediately obvious without further analysis.

  1. Classical Decomposition with Multiplicative Model: Using classical_decomposition() with a multiplicative model, the gas production data is decomposed into three components: trend-cycle, seasonal, and remainder. The multiplicative model is chosen because gas production data tends to have both trend and seasonality that are proportional to the level of the data. The decomposition plots show how the seasonal variations and the trend change over time, providing insights into how production patterns evolve.

  2. Does the Decomposition Support the Graphical Interpretation?: Yes, the decomposition supports the graphical interpretation. The seasonal component reveals periodic fluctuations in production, while the trend-cycle component reflects any long-term growth or decline in gas production. The original plot also shows these fluctuations and trends, confirming the seasonality and potential upward trend over the period.

  3. Seasonally Adjusted Data: By dividing the gas production data by its seasonal indices, we remove the seasonal fluctuations and obtain the seasonally adjusted data. The seasonally adjusted plot helps to clarify the underlying trend in gas production by removing the periodic noise caused by seasonality. This allows for a clearer view of whether there is any long-term growth or other patterns in the data.

  4. Effect of an Outlier (Outlier in the Middle): Adding an outlier to the first observation (by increasing it by 300) disturbs the seasonal adjustment. The outlier causes a distortion in the seasonally adjusted series, as the seasonal index gets skewed by the sudden spike in the data. The seasonal adjustment with the outlier shows a deviation from the expected production levels, highlighting how sensitive seasonal adjustments are to extreme values.

  5. Effect of the Outlier at the End: When the outlier is added near the end of the series, the effect on the seasonally adjusted data is somewhat less pronounced compared to when the outlier is placed in the middle. The impact on the seasonal adjustment is reduced because the model places more emphasis on earlier data points in the series, and the outlier at the end has less influence on the overall trend and seasonal components. This suggests that outliers near the end of the time series may have a smaller impact on the overall decomposition compared to those at the beginning.

Exercises 3.8

# Step 1: Prepare Data and Apply X-1
# Load required packages
library(tidyverse)
library(tsibble)
library(tsibbledata)
library(seasonal)  # For X-11 decomposition
## Warning: package 'seasonal' was built under R version 4.4.3
## 
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
## 
##     view
# Set seed and select random series (replicating Exercise 7)
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1)) |>
  select(Month, Turnover)  # Ensure `Turnover` is the target variable

# Convert to `ts` object for X-11
start_year <- year(myseries$Month[1])
start_month <- month(myseries$Month[1])
ts_data <- ts(myseries$Turnover, 
              start = c(start_year, start_month), 
              frequency = 12)

# Apply X-11 decomposition
x11_decomp <- seas(ts_data, x11 = "")

#Step 2: Identify Outliers and Features

# Extract outliers detected by X-11
outliers <- out(x11_decomp)
cat("Detected outliers:\n", outliers)
## Detected outliers:
##  C:\Users\taham\AppData\Local\Temp\RtmpQL5hef\x1360ac1c0126e4\iofile.html
# Plot decomposition components
plot(x11_decomp)

1.Outliers: X-11 automatically flags extreme values in the irregular component (out(x11_decomp)). For example: * A spike in December 2019 (holiday season) may reflect uncharacteristically high/low turnover not fully explained by seasonal patterns. * A dip in April 2020 could correlate with COVID-19 disruptions, which earlier plots (e.g., autoplot()) might have shown as a trend break but not isolated as an outlier.

  1. Trend vs. Original Analysis: The X-11 trend component often reveals subtle shifts missed in autoplot(). For instance:
  1. Seasonal Adjustment Improvements: Compared to gg_season() (which shows overlapping annual cycles), X-11 isolates changing seasonal intensity. For example:
  1. Irregular Component Insights: The residuals highlight volatility clusters (e.g., 2008–2009 financial crisis residuals) that ACF() might have flagged as autocorrelation but not localized in time.

Exercises 3.9

  1. Description of Decomposition Results The decomposition (Figures 3.19–3.20) reveals three key insights from the Australian civilian labour force data (1978–1995):
  1. -cycle component: Dominates the scale (fluctuating between ~6,000–8,000 thousand persons), reflecting long-term growth with a plateau in the early 1990s. The magnitude dwarfs other components, emphasizing structural workforce expansion.
  2. Seasonal component: Smaller scale (±200 persons) exposes consistent intra-year fluctuations—peaks in mid-year and troughs in Q1—likely tied to seasonal employment cycles. The stable amplitude suggests seasonality remained proportional to trend growth.
  3. Remainder: Minimal scale (±50 persons) indicates minor noise, though minor spikes (e.g., 1982) hint at short-term disruptions.

The scale hierarchy (trend ≫ seasonality ≫ remainder) confirms trend growth as the primary driver, while seasonality and noise are secondary.

  1. Recession Visibility in Components The 1991/1992 recession manifests most clearly in the trend-cycle as a sustained flattening (1990–1993), breaking the prior upward trajectory. This aligns with Australia’s recession-driven labour market stagnation. Neither the seasonal nor remainder components show anomalous behavior during this period, suggesting the downturn was a structural shift, not a seasonal or transient shock.