DATA 624 HW 2

Chapter 3 Exercises

```
library(fpp3)
library(tsibble)
library(dplyr)
library(ggplot2)
library(seasonal)
```

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
```
# Step 1: Find most recent year
latest_year <- max(global_economy$Year)

# Step 2: Identify top 10 countries in that year
top10 <- global_economy %>% 
  filter(Year == latest_year) %>% 
  mutate(GDP_per_capita = GDP / Population) %>% 
  slice_max(GDP_per_capita, n = 10) %>% 
  pull(Country)

# Step 3: Plot GDP per capita over time for those countries
global_economy %>% 
  filter(Country %in% top10) %>% 
  autoplot(GDP / Population) +
  labs(
    # paste0() avoids the stray space that paste() would put before ")"
    title = paste0("GDP per Capita Over Time (Top 10 Countries in ", latest_year, ")"),
    y = "GDP per capita ($US)"
  )
```
```
## Warning: Removed 32 rows containing missing values or values outside the scale range
## (`geom_line()`).
```
Over time, GDP per capita for the top 10 countries in 2017 has increased substantially, showing a strong long-term upward trend from the 1960s onward. Growth accelerates particularly after the 1990s and early 2000s, with several countries experiencing sharp rises during this period. There is a noticeable dip around 2008–2009 corresponding to the global financial crisis, followed by recovery. Some countries display greater volatility, while others grow more steadily, and the gap between the highest-income countries and the rest widens in recent decades. Overall, the pattern reflects sustained economic growth with increasing divergence at the top.
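The plot shows the overall pattern but does not name the leading country, so a direct lookup can help. The following is a minimal sketch, assuming the same fpp3 setup as above; the names `top_by_year` and `gdp_pc` are introduced here for illustration only and are not part of the original analysis.

```r
library(fpp3)

# Country with the highest GDP per capita in each year
# (rows where GDP per capita cannot be computed are dropped first)
top_by_year <- global_economy |>
  as_tibble() |>
  mutate(gdp_pc = GDP / Population) |>
  filter(!is.na(gdp_pc)) |>
  group_by(Year) |>
  slice_max(gdp_pc, n = 1) |>
  ungroup() |>
  select(Year, Country, gdp_pc)

# How many years each country spends at the top
count(top_by_year, Country, sort = TRUE)
```

Reading `top_by_year` from the earliest to the latest year shows whether the top position changes hands over time.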
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
- United States GDP from global_economy.
```
global_economy %>% 
  filter(Country == "United States") %>% 
  autoplot(GDP) +
  labs(title = "United States GDP",
   y = "USD (current)")
```
```
#log
global_economy %>% 
  filter(Country == "United States") %>% 
  autoplot(log(GDP)) +
  labs(title = "Log of United States GDP",
   y = "log(GDP)")
```
The original plot of US GDP curves upward, meaning the values increase by larger absolute amounts over time. After a log transformation the trend appears much closer to linear, because exponential growth becomes approximately a straight line on a log scale. The transformation also stabilizes variability by reducing the widening spread that occurs as values grow larger. Growth rates become easier to read, since a constant slope on the log scale corresponds to a constant percentage growth rate.
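The link between a constant slope on the log scale and a constant percentage growth rate can be checked with a small simulation. This is an illustrative sketch in base R using made-up values (a 5% growth rate and a starting level of 100), not the GDP data itself.

```r
# Simulated series growing at a constant 5% per period (illustrative values only)
t <- 0:49
x <- 100 * 1.05^t

# On the raw scale the period-to-period increments keep growing
range(diff(x))

# On the log scale every increment is identical and equals log(1.05)
log_steps <- diff(log(x))
all.equal(log_steps, rep(log(1.05), length(log_steps)))
```

Real series such as GDP only approximate this behaviour, so the logged plot is close to a straight line rather than exactly straight.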
- Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

```
aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria") %>%
  autoplot(Count) +
  # Count is the number of animals slaughtered, not thousands
  labs(title = "Slaughter of Bulls, Bullocks and Steers - Victoria",
       y = "Head")
```
```
# Log
aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria") %>%
  autoplot(log(Count)) +
  labs(title = "Log Slaughter - Victoria",
       y = "log(Head)")
```

Applying a log transformation can reduce seasonal amplitude when seasonal fluctuations grow larger over time. It helps stabilize the variance by compressing higher values more than lower ones, which prevents the seasonal swings from widening as the series increases. This often makes the underlying seasonal pattern easier to see and interpret. In many real-world datasets, such as livestock slaughter data, seasonality is accompanied by increasing variation as overall production rises. In those cases, a log transformation is often appropriate because it produces more consistent seasonal patterns and improves interpretability.
- Victorian Electricity Demand from vic_elec.
```
vic_elec %>%
  autoplot(Demand) +
  labs(title = "Victorian Electricity Demand")
```
```
#log
vic_elec %>%
  autoplot(log(Demand)) +
  labs(title = "Log Victorian Electricity Demand")
```

In most cases, a transformation is not necessary unless the variance clearly increases at higher demand levels or you specifically want to stabilize extreme peaks. If the series shows larger fluctuations as demand rises, a transformation such as a log can help make the variability more consistent. The main effect would be to reduce the impact of extreme peaks and slightly compress high-demand periods, while still preserving the underlying structure of the data. Importantly, daily and weekly seasonal patterns typically remain visible after transformation, though the amplitude of large spikes may appear less dramatic.
- Gas production from aus_production.
Gas production data typically show strong seasonality, a clear upward trend, and increasing seasonal amplitude over time. When the seasonal fluctuations grow as the level increases, a transformation such as taking the log is appropriate. This makes the seasonal variation more constant across time, reduces the widening spread, and often makes the overall trend appear more linear. As a result, the series becomes easier to model using additive methods, since both the trend and seasonal components behave more consistently.

```
aus_production %>%
  autoplot(Gas) +
  labs(title = "Australian Gas Production",
       y = "Petajoules")
```
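The transformation described for the gas series can be sketched as follows. Rather than fixing lambda at 0 (the log), this example lets the Guerrero feature suggest a value; the name `lambda_gas` is introduced here for illustration, and the resulting lambda should be checked against the plot rather than taken as definitive.

```r
library(fpp3)

# Estimate a Box-Cox lambda with the Guerrero method
lambda_gas <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

# Plot the transformed series to see whether the seasonal
# variation is now roughly constant over time
aus_production |>
  autoplot(box_cox(Gas, lambda_gas)) +
  labs(title = "Box-Cox Transformed Australian Gas Production",
       y = "Transformed gas production")
```

A lambda close to 0 would support the log transformation suggested above, while a value close to 1 would argue against transforming.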

Why is a Box-Cox transformation unhelpful for the canadian_gas data?
```
canadian_gas %>% 
  autoplot(box_cox(Volume, 0))
```
A Box–Cox transformation is not helpful for the canadian_gas series because the seasonal variation does not change monotonically with the level of the series. The seasonal swings are small in the early years, largest in the middle of the series (roughly 1975 to 1990), and smaller again in later years when production is highest.

Box–Cox transformations stabilize variance only when the variability rises or falls steadily with the level, as when multiplicative seasonality is converted into additive seasonality. No single power or log transformation can shrink the mid-series variation without distorting the variation at both ends. Applying one here (for example the log, lambda = 0, plotted above) therefore does not make the seasonal pattern uniform and does not meaningfully improve the series for modeling or interpretation.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

A Box–Cox transformation with lambda at or near 0, which is equivalent or close to taking the logarithm, would be appropriate. Retail sales data commonly show a rising trend and larger seasonal fluctuations as overall sales increase. Applying a log transformation helps stabilize the variance, keeps seasonal effects more consistent over time, and makes the trend appear more linear. This results in a series that is easier to interpret and better suited for additive time series models.
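The choice of lambda for the retail series can be checked directly. This is a sketch that assumes the same seed and series selection used for the X-11 exercise later in this document; `lambda_retail` is a name introduced here for illustration.

```r
library(fpp3)

# Same series selection as in the X-11 exercise
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Guerrero estimate of the Box-Cox parameter
lambda_retail <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

# Visual check that the seasonal swings are now of similar size throughout
myseries |>
  autoplot(box_cox(Turnover, lambda_retail)) +
  labs(title = "Box-Cox Transformed Retail Turnover",
       y = "Transformed turnover")
```

If the estimate lands near 0, the plain log is usually preferred because it is easier to explain and back-transform.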
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco
```
aus_production %>% 
  filter(!is.na(Tobacco)) %>% 
  features(Tobacco, features = guerrero)
```
```
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.926
```
The estimated lambda of 0.93 is very close to 1, indicating that the variance is already fairly stable. Since lambda = 1 corresponds to no transformation, this series does not require a Box–Cox transformation. The original scale is appropriate for modeling.

Passengers
```
ansett %>% 
  filter(Class == "Economy") %>% 
  features(Passengers, features = guerrero)
```
```
## # A tibble: 10 × 3
##    Airports Class   lambda_guerrero
##    <chr>    <chr>             <dbl>
##  1 ADL-PER  Economy           2.00 
##  2 MEL-ADL  Economy           2.00 
##  3 MEL-BNE  Economy           1.56 
##  4 MEL-OOL  Economy           1.51 
##  5 MEL-PER  Economy           0.969
##  6 MEL-SYD  Economy           2.00 
##  7 SYD-ADL  Economy           1.77 
##  8 SYD-BNE  Economy           1.80 
##  9 SYD-OOL  Economy           1.90 
## 10 SYD-PER  Economy           0.875
```
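The lambda values vary by route but are generally near or above 1. For the Melbourne–Sydney route named in the question (MEL-SYD), the estimate is 2.00, which sits at the upper limit (2) of the default search range in guerrero(), so it is better read as "at the boundary" than as a precise optimum. Values this far above 0 indicate that a log transformation (lambda = 0) is not suitable, because the variance does not increase in proportion to the level. Little to no transformation appears to be needed for this route, though a mild Box–Cox transformation could be used if modeling individual routes separately. Adding Airports == "MEL-SYD" to the filter would restrict the output to the requested route.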

Pedestrians
```
pedestrian %>% 
  filter(Sensor == "Southern Cross Station") |>
  features(Count, features = guerrero)
```
```
## # A tibble: 1 × 2
##   Sensor                 lambda_guerrero
##   <chr>                            <dbl>
## 1 Southern Cross Station          -0.250
```
The estimated lambda of -0.25 lies between a log transformation (lambda = 0) and an inverse transformation (lambda = -1), which indicates clear variance instability, with large peaks dominating the series. A transformation is recommended for this data. Applying the estimated Box–Cox transformation would reduce the impact of extreme values, stabilize the variance, and make seasonal patterns more consistent over time.
Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
```
# 3-term and 5-term moving average weights
w3 <- rep(1/3, 3)
w5 <- rep(1/5, 5)

# Convolve the weights
w_combined <- convolve(w5, rev(w3), type = "open")

# Display weights
round(w_combined, 3)
```
```
## [1] 0.067 0.133 0.200 0.200 0.200 0.133 0.067
```
```
# Check they sum to 1
sum(w_combined)
```
```
## [1] 1
```
A 3×5 moving average applies a 3-term moving average to a 5-term moving average. Since moving averages are linear filters, this is equivalent to convolving their weights. When the 3-term weights (1/3 each) are combined with the 5-term weights (1/5 each), the resulting 7-term weighted moving average has weights:

0.067, 0.133, 0.200, 0.200, 0.200, 0.133, 0.067.

These weights are symmetric and sum to 1, confirming the equivalence.
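The equivalence can also be checked on data rather than on the weights alone. The sketch below uses base R and an arbitrary simulated series (not one of the textbook datasets): it applies a 5-term MA followed by a 3-term MA, then compares the result with a single 7-term weighted MA whose weights are 1/15, 2/15, 3/15, 3/15, 3/15, 2/15, 1/15.

```r
# Illustrative random-walk series (values are arbitrary)
set.seed(1)
x <- cumsum(rnorm(40))

# Two-step approach: 5-term MA, then a 3-term MA of the result
ma5   <- stats::filter(x, rep(1/5, 5), sides = 2)
ma3x5 <- stats::filter(ma5, rep(1/3, 3), sides = 2)

# One-step approach: a single 7-term weighted MA
w7     <- c(1, 2, 3, 3, 3, 2, 1) / 15
direct <- stats::filter(x, w7, sides = 2)

# TRUE if both approaches agree, including the positions of the
# missing values at each end of the series
isTRUE(all.equal(as.numeric(ma3x5), as.numeric(direct)))
```

Both versions lose three observations at each end, which is what a centred 7-term window implies.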
Consider the last five years of the Gas data from aus_production.
```
gas <- tail(aus_production, 5*4) |> select(Gas)
```
1. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
  
  The time series plot clearly shows strong quarterly seasonal fluctuations, with a consistent pattern of peaks and troughs repeating each year. In addition to seasonality, there is a noticeable upward trend-cycle over the five-year period, indicating that overall gas production is increasing over time. The seasonal swings appear somewhat proportional to the level of the series, suggesting multiplicative seasonality.
```
    library(fpp3)

    gas <- tail(aus_production, 5*4) %>% 
      select(Gas)

    autoplot(gas) +
      labs(title = "Quarterly Gas Production (Last 5 Years)")
```
```
## Plot variable not specified, automatically selected `.vars = Gas`
```
2. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
  
  The multiplicative classical decomposition separates the series into trend-cycle, seasonal, and remainder components. The trend-cycle component shows a smooth upward movement, confirming the long-term growth observed in the plot. The seasonal component reveals a stable quarterly pattern that repeats consistently each year. These results support the graphical interpretation from part (a), as both strong seasonality and an upward trend are clearly identified.
```
    gas_decomp <- gas %>% 
      model(classical_decomposition(Gas, type = "multiplicative"))

    components(gas_decomp) %>% 
      autoplot()
```
```
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
```
3. Do the results support the graphical interpretation from part a?
  
  Yes, the results support the graphical interpretation from part (a). The time series plot shows clear quarterly seasonal fluctuations and an overall upward trend-cycle. The multiplicative classical decomposition confirms this by separating the series into a steadily increasing trend-cycle component and a strong seasonal component with consistent quarterly patterns. The seasonally adjusted data remove the regular seasonal variation and reveal a smoother upward trend, which aligns with what was observed visually in the original plot.
4. Compute and plot the seasonally adjusted data.
  
  The seasonally adjusted series removes the recurring quarterly fluctuations, leaving a smoother representation of the underlying movement in gas production. Without the seasonal effects, the upward trend becomes easier to see and interpret. This adjusted series highlights the underlying growth pattern more clearly and is more suitable for analyzing long-term changes or forecasting.
```
    gas_components <- components(gas_decomp)

    gas_sa <- gas_components %>% 
      mutate(season_adjust = Gas / seasonal)

    autoplot(gas_sa, season_adjust) +
      labs(title = "Seasonally Adjusted Gas Production")
```
5. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
  
  The outlier produces a sharp spike in the seasonally adjusted series. It also distorts the trend-cycle component because classical decomposition uses moving averages. Nearby observations are affected, so the impact spreads around the outlier rather than remaining isolated.
```
    # Add outlier in the middle
    gas_outlier_mid <- gas %>% 
      mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))

    # Decompose
    decomp_mid <- gas_outlier_mid %>% 
      model(classical_decomposition(Gas, type = "multiplicative"))

    # Seasonally adjusted data
    sa_mid <- components(decomp_mid) %>% 
      mutate(season_adjust = Gas / seasonal)

    # Plot
    autoplot(sa_mid, season_adjust) +
      labs(title = "Seasonally Adjusted Gas (Outlier in Middle)")
```
6. Does it make any difference if the outlier is near the end rather than in the middle of the time series?
  
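  Yes, the location matters. Classical decomposition leaves the trend-cycle undefined for the first and last two quarters, so an outlier in the final observation has no detrended value of its own and contributes very little to the seasonal indices. Its effect shows up mainly as a spike in the seasonally adjusted value for that one quarter. An outlier in the middle feeds into the moving-average trend for the neighbouring quarters and inflates the seasonal index for its own quarter, which distorts the seasonally adjusted values for that quarter in every year.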
```
    # Add outlier at the end
    gas_outlier_end <- gas %>% 
      mutate(Gas = if_else(row_number() == n(), Gas + 300, Gas))

    # Decompose
    decomp_end <- gas_outlier_end %>% 
      model(classical_decomposition(Gas, type = "multiplicative"))

    # Seasonally adjusted data
    sa_end <- components(decomp_end) %>% 
      mutate(season_adjust = Gas / seasonal)

    # Plot
    autoplot(sa_end, season_adjust) +
      labs(title = "Seasonally Adjusted Gas (Outlier at End)")
```
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

The selected retail series likely exhibits strong monthly seasonality and a long-term upward trend. X-11 decomposition confirms these features and often reveals outliers or temporary shocks that were less obvious in the original plot, particularly in the irregular component.
```
# Select random retail series
set.seed(12345678)

myseries <- aus_retail %>% 
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# X-11 decomposition
retail_x11 <- myseries %>% 
  model(X_13ARIMA_SEATS(Turnover ~ x11()))

components(retail_x11) %>% 
  autoplot()
```
```
components(retail_x11) %>% 
  autoplot(irregular)
```
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Figure 3.20: Seasonal component from the decomposition shown in the previous figure.
1. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
  
  The decomposition shows a clear and steady upward trend in the civilian labour force from 1978 to 1995. The seasonal component is relatively small in magnitude (about ±100 thousand) compared to the overall level of the series, which ranges between roughly 6,400 and 9,000 thousand persons, indicating that seasonality is modest relative to the trend. The seasonal pattern is stable and consistent across years, while the remainder component is generally small except for a few noticeable deviations.
2. Is the recession of 1991/1992 visible in the estimated components?
  
  Yes, the recession of 1991/1992 is visible mainly in the remainder component as a large negative spike and in a temporary slowing or flattening of the trend component. Although the overall trend continues upward, the downturn during this period reflects the economic impact of the recession.
This exercise uses the canadian_gas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).
1. Plot the data using autoplot(), gg_subseries() and gg_season() to look at the effect of the changing seasonality over time.
```
# Time plot
autoplot(canadian_gas, Volume) +
  labs(title = "Monthly Canadian Gas Production",
       y = "Billion cubic metres")

# Seasonal plot
gg_season(canadian_gas, Volume) +
  labs(title = "Seasonal Plot: Canadian Gas Production",
       y = "Billion cubic metres")
```
```
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
```
```
# Subseries plot
gg_subseries(canadian_gas, Volume) +
  labs(title = "Subseries Plot: Canadian Gas Production",
       y = "Billion cubic metres")
```
```
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
```

The time plot shows a strong upward trend in Canadian gas production, along with pronounced annual seasonality. The seasonal fluctuations are small in the early years, grow markedly through the middle of the series, and appear to shrink again in later years, so the seasonal amplitude is not simply proportional to the level. The seasonal and subseries plots show that winter months consistently have higher production and summer months lower, while the size and shape of the seasonal peaks change gradually over time.

2. Do an STL decomposition of the data. You will need to choose a seasonal window to allow for the changing shape of the seasonal component.
```
# Plots (explicit variable to remove message)
autoplot(canadian_gas, Volume)

gg_season(canadian_gas, Volume)

gg_subseries(canadian_gas, Volume)
```
```
# STL with fixed seasonal pattern
gas_stl_periodic <- canadian_gas %>%
  model(STL(Volume ~ season(window = "periodic")))

components(gas_stl_periodic) %>%
  autoplot()
```
```
# STL allowing changing seasonal shape
gas_stl_flexible <- canadian_gas %>%
  model(STL(Volume ~ season(window = 13)))

components(gas_stl_flexible) %>%
  autoplot()
```

3. How does the seasonal shape change over time? [Hint: Try plotting the seasonal component using gg_season().]

The seasonal pattern remains strongly annual, with peaks in winter and troughs in summer, but its size and shape are not constant. The seasonal swings are small at the start of the series, become larger and sharper as production grows through the middle years, and appear to moderate again toward the end. Plotting the seasonal component with gg_season() makes this gradual change in amplitude and shape easier to see.

4. Can you produce a plausible seasonally adjusted series?

The STL decomposition provides a reasonable seasonally adjusted series by removing the time-varying seasonal component. The adjusted series shows a smoother upward trend and highlights medium-term fluctuations without the strong winter–summer pattern. Because STL allows seasonality to change over time, the adjusted series appears realistic and stable.
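The seasonally adjusted series described here can be plotted from the STL components. This is a minimal sketch assuming the same seasonal window of 13 used for the flexible fit above; `gas_sa` is a name introduced for illustration, and the colours are arbitrary.

```r
library(fpp3)

# Refit the flexible STL and keep its components as a tsibble
gas_sa <- canadian_gas |>
  model(STL(Volume ~ season(window = 13))) |>
  components() |>
  as_tsibble()

# Original series in grey with the seasonally adjusted series overlaid
gas_sa |>
  autoplot(Volume, colour = "grey60") +
  geom_line(aes(y = season_adjust), colour = "#0072B2") +
  labs(title = "Canadian Gas Production: Seasonally Adjusted (STL)",
       y = "Billion cubic metres")
```

If visible annual wiggles remain in the adjusted line, a smaller seasonal window may be worth trying.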
5. Compare the results with those obtained using SEATS and X-11. How are they different?

STL is flexible and allows the seasonal pattern to evolve smoothly over time, making it well suited for changing seasonality. In contrast, X-11 and SEATS assume more structured seasonal behavior and may produce slightly smoother or more rigid seasonal estimates. SEATS is model-based (ARIMA-based), while X-11 is moving-average based. As a result, the seasonally adjusted series from SEATS and X-11 may differ slightly in how quickly they respond to structural changes, especially during periods of rapid growth or volatility.
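The comparison can be made concrete by fitting both alternatives. The sketch below assumes the seasonal package is installed (it supplies the X-13ARIMA-SEATS program that feasts calls) and uses default settings for both methods; the object names `x11_dcmp` and `seats_dcmp` are introduced here for illustration.

```r
library(fpp3)
library(seasonal)

# X-11 decomposition with default settings
x11_dcmp <- canadian_gas |>
  model(x11 = X_13ARIMA_SEATS(Volume ~ x11())) |>
  components()

# SEATS decomposition with default settings
seats_dcmp <- canadian_gas |>
  model(seats = X_13ARIMA_SEATS(Volume ~ seats())) |>
  components()

# Component plots to set beside the STL plots above
autoplot(x11_dcmp) +
  labs(title = "X-11 Decomposition of Canadian Gas Production")

autoplot(seats_dcmp) +
  labs(title = "SEATS Decomposition of Canadian Gas Production")
```

Comparing the seasonal and irregular panels across the three methods shows where they disagree most.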

Rebecca Bronstein

2026-02-14