624_M3

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

data("global_economy")

# GDP per capita
global_economy <- global_economy %>%
  mutate(GDP_per_capita = GDP / Population)

highest_gdp_per_capita <- global_economy %>%
  index_by(Year) %>%   
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(GDP_per_capita))

print(highest_gdp_per_capita %>% select(Year, Country, GDP_per_capita))

## # A tsibble: 58 x 3 [1Y]
## # Key:       Country [263]
##     Year Country       GDP_per_capita
##    <dbl> <fct>                  <dbl>
##  1  2014 Monaco               185153.
##  2  2008 Monaco               180640.
##  3  2013 Liechtenstein        173528.
##  4  2016 Monaco               168011.
##  5  2015 Liechtenstein        167591.
##  6  2007 Monaco               167125.
##  7  2011 Monaco               162155.
##  8  2012 Monaco               152000.
##  9  2009 Monaco               149221.
## 10  2010 Monaco               144569.
## # ℹ 48 more rows

# plot 
global_economy %>%
  ggplot(aes(x = Year, y = GDP_per_capita, group = Country, color = Country)) +
  geom_line(alpha = 0.7, size = 0.8) +  
  labs(title = "GDP per Capita Over Time for Each Country",
       x = "Year",
       y = "GDP per Capita (USD)",
       color = "Country") +
  theme_minimal() +
  theme(legend.position = "none")

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

# plot for change of the top country over time
highest_gdp_per_capita %>%
  ggplot(aes(x = Year, y = GDP_per_capita, label = Country)) +
  geom_line(color = "blue", size = 1) +
  geom_point(size = 2, color = "red") +
  geom_text(nudge_y = 5000, size = 3, check_overlap = TRUE) +
  labs(title = "Highest GDP per Capita Over Time",
       x = "Year",
       y = "GDP per Capita (USD)") +
  theme_minimal()

The latest data available seems to show that Luxembourg has the highest GDP. Over time, though, it seems Monaco had the highest GDP since data collection began for this dataset, while Liechtenstein took that spot once in the last ten years or so.

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

Starting with:

United States GDP from global_economy

log transformation

library(fpp3)

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) +
  labs(title = "United States GDP Over Time", 
       x = "Year", y = "GDP (USD)") +
  theme_minimal()

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(log(GDP)) +
  labs(title = "Log-Transformed United States GDP Over Time",
       x = "Year", y = "Log(GDP)") +
  theme_minimal()

The log transformation made the trend more linear and stabilized varianc somewhat.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

Box-Cox transformation

aus_livestock %>%
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
  autoplot(Count) +
  labs(title = "Slaughter of Victorian Bulls, Bullocks, and Steers",
       x = "Year", y = "Count") +
  theme_minimal()

aus_livestock %>%
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
  mutate(Transformed_Count = box_cox(Count, lambda = 0)) %>%
  autoplot(Transformed_Count) +
  labs(title = "Log-Transformed Slaughter of Victorian Bulls, Bullocks, and Steers",
       x = "Year", y = "Log(Count)") +
  theme_minimal()

aus_livestock %>%
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
  mutate(Diff_Count = difference(Count)) %>%
  autoplot(Diff_Count) +
  labs(title = "First-Order Differenced Slaughter of Victorian Bulls, Bullocks, and Steers",
       x = "Year", y = "Differenced Count") +
  theme_minimal()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

Box-cox transformation stabilized the variance slightly, but the first-order differencing removed the long-term trend but may have also caused some noise and large fluctuations.

Victorian Electricity Demand from vic_elec.

log transformation

vic_elec %>%
  autoplot(Demand) +
  labs(title = "Victorian Electricity Demand Over Time",
       x = "Time", y = "Demand (MW)") +
  theme_minimal()

vic_elec %>%
  mutate(Transformed_Demand = log(Demand)) %>%
  autoplot(Transformed_Demand) +
  labs(title = "Log-Transformed Victorian Electricity Demand",
       x = "Time", y = "Log(Demand)") +
  theme_minimal()

The log transformation here made the variance less pronounced and more stabilized (look at the Y axis).

Gas production from aus_production.

differencing

aus_production %>%
  autoplot(Gas) +
  labs(title = "Australian Gas Production Over Time",
       x = "Year", y = "Gas Production") +
  theme_minimal()

aus_production %>%
  mutate(Diff_Gas = difference(Gas)) %>%
  autoplot(Diff_Gas) +
  labs(title = "Differenced Australian Gas Production",
       x = "Year", y = "Differenced Gas Production") +
  theme_minimal()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

The first-order differencing removed the strong upward trend, though the plot still hows increasing variance and strong seasonality.

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Canadian Gas Production Over Time",
       x = "Year", y = "Gas Production (Volume)") +
  theme_minimal()

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  mutate(BoxCox_Volume = box_cox(Volume, lambda)) %>%
  autoplot(BoxCox_Volume) +
  labs(title = "Box-Cox Transformed Canadian Gas Production",
       x = "Year", y = "Transformed Gas Production") +
  theme_minimal()

print(paste("Optimal lambda:", lambda))

## [1] "Optimal lambda: 0.576764791258997"

Box-cox transformation has not removed the strong upward trend or the seasonal patterns in Canadian gas production. It does not make the series stationary.

## What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

aus_retail %>%
  filter(Industry == "Department stores") %>% 
  autoplot(Turnover) +
  labs(title = "Australian Retail Turnover (Department Stores)",
       x = "Year", y = "Turnover (Million AUD)") +
  theme_minimal()

# Specifying only one:

set.seed(123)

myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries |> 
  autoplot(Turnover) +
  labs(title = "Randomly Selected Retail Time Series",
       x = "Year", y = "Turnover (Million AUD)") +
  theme_minimal()

There is an upward trend as well as some seasonable variation. let’s try Box-cox transformation to stabilize the increasing fluctuations over time:

lambda <- myseries |> features(Turnover, features = guerrero) |> pull(lambda_guerrero)

myseries |> 
  mutate(BoxCox_Turnover = box_cox(Turnover, lambda)) |> 
  autoplot(BoxCox_Turnover) +
  labs(title = "Box-Cox Transformed Retail Turnover",
       x = "Year", y = "Transformed Turnover") +
  theme_minimal()

I think it worked and it is somewhat better for the increasing fluctuations, but there is still an upward trend!

So to try to remove trend, we can add first order differencing:

myseries |> 
  mutate(BoxCox_Turnover = box_cox(Turnover, lambda),
         Diff_BoxCox = difference(BoxCox_Turnover),
         Seasonal_Diff_BoxCox = difference(Diff_BoxCox, lag = 12)) |> 
  autoplot(Seasonal_Diff_BoxCox) +
  labs(title = "Fully Transformed Retail Turnover (Box-Cox + Differencing)",
       x = "Year", y = "Transformed Turnover") +
  theme_minimal()

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian

Tobacco Production

Using Box-Cox transformation with optimal lambda

lambda_tobacco <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  mutate(BoxCox_Tobacco = box_cox(Tobacco, lambda_tobacco)) %>%
  autoplot(BoxCox_Tobacco) +
  labs(title = "Box-Cox Transformed Tobacco Production",
       x = "Year", y = "Transformed Tobacco Production") +
  theme_minimal()

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

print(paste("Optimal lambda for Tobacco:", lambda_tobacco))

## [1] "Optimal lambda for Tobacco: 0.926463585274373"

Economy Class Passengers

Using Box-Cox transformation with optimal lambda

# Compute optimal lambda for economy class passengers (Melbourne-Sydney)
lambda_passengers <- ansett %>%
  filter(Class == "Economy", Airports == "MEL-SYD") %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

# Apply Box-Cox transformation
ansett %>%
  filter(Class == "Economy", Airports == "MEL-SYD") %>%
  mutate(BoxCox_Passengers = box_cox(Passengers, lambda_passengers)) %>%
  autoplot(BoxCox_Passengers) +
  labs(title = "Box-Cox Transformed Economy Class Passengers (MEL-SYD)",
       x = "Year", y = "Transformed Passenger Count") +
  theme_minimal()

# Print the optimal lambda
print(paste("Optimal lambda for Economy Class Passengers:", lambda_passengers))

## [1] "Optimal lambda for Economy Class Passengers: 1.9999267732242"

Pedestrian Counts at Southern Cross Station

Using Box-Cox transformation with optimal lambda (before and after)

pedestrian_filtered <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")

pedestrian_filtered %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(Count) +
  labs(title = "Southern Cross Pedestrians")

lambda_pedestrian <- pedestrian_filtered %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero) %>%
  first() 

pedestrian_filtered %>%
  mutate(BoxCox_Count = box_cox(Count, lambda_pedestrian)) %>%
  autoplot(BoxCox_Count) +
  labs(title = "Box-Cox Transformed Pedestrian Counts at Southern Cross Station",
       x = "Year", y = "Transformed Pedestrian Count") +
  theme_minimal()

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas <- tail(aus_production, 5 * 4) |> select(Gas)

# Plot
autoplot(gas) +
  labs(title = "Australian Gas Production (Last 5 Years)",
       x = "Year", y = "Gas Production") +
  theme_minimal()

## Plot variable not specified, automatically selected `.vars = Gas`

So there does seem to be a clear seasonal component that is repeating every year, with an upward trend element too. The change upward every year seems relatively the same in magnitude as well.

# classical decomposition
gas_decomp <- gas |> 
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

# Plot 
autoplot(gas_decomp) +
  labs(title = "Multiplicative Decomposition of Australian Gas Production",
       x = "Year") +
  theme_minimal()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

After decomposition, the trend component confirms a graduate increase in gas production over time. And the seasonal element shows the quarterly pattern seen in the first plot.

Now, to compute the seasonally adjusted data:

gas_sa <- gas_decomp |>
  mutate(Seasonally_Adjusted = Gas / seasonal)

# Plot
autoplot(gas_sa, Seasonally_Adjusted) +
  labs(title = "Seasonally Adjusted Australian Gas Production",
       x = "Year", y = "Seasonally Adjusted Gas Production") +
  theme_minimal()

The seasonally adjusted time series removes seasonal fluctuations, making it easier to observe underlying trends. The trend component suggests a gradual increase in gas production over time, with some short-term variations.

Now introducing the outlier component and recomputing:

gas_outlier <- gas |> 
  mutate(Gas = ifelse(row_number() == 10, Gas + 300, Gas))

gas_decomp_outlier <- gas_outlier |> 
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

gas_sa_outlier <- gas_decomp_outlier |> 
  mutate(Seasonally_Adjusted = Gas / seasonal)

# Plot
autoplot(gas_sa_outlier, Seasonally_Adjusted) +
  labs(title = "Seasonally Adjusted Gas Production (With Outlier in Middle)",
       x = "Year", y = "Seasonally Adjusted Gas Production") +
  theme_minimal()

Sharp spike in 2008 clearly reflects the artificially introduced outlier. Seasonal adjustment is significantly distorted. The outlier overestimates the true trend.

To test how the effect changes when the outlier is at the end of the time series:

gas_outlier_end <- gas |> 
  mutate(Gas = ifelse(row_number() == n(), Gas + 300, Gas)) 

gas_decomp_outlier_end <- gas_outlier_end |> 
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

gas_sa_outlier_end <- gas_decomp_outlier_end |> 
  mutate(Seasonally_Adjusted = Gas / seasonal)

# Plot
autoplot(gas_sa_outlier_end, Seasonally_Adjusted) +
  labs(title = "Seasonally Adjusted Gas Production (With Outlier at End)",
       x = "Year", y = "Seasonally Adjusted Gas Production") +
  theme_minimal()

Sharp spike at the last observation confirms that the outlier significantly impacts the trend estimation. This overestimates the true production levels. Previous values remain stable.

Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

x11_decomp <- myseries |> 
  model(X_11 = X_13ARIMA_SEATS(Turnover)) |> 
  components()

# Plot
autoplot(x11_decomp) +
  labs(title = "X-11 Decomposition of Retail Turnover",
       x = "Year") +
  theme_minimal()

Shows a gradual increase over time, indicating growth in retail turnover. There are no sharp discontinuities. Displays a consistent repeating pattern, confirming strong seasonality in retail sales. Peaks likely correspond to holiday shopping periods. A significant spike around 2000 stands out (anomaly? outlier? error? policy or industry change?)

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The decomposition of the Australian civilian labor force from February 1978 to August 1995 reveals several key patterns. The trend component shows a steady increase in labor force participation over time, with a noticeable dip around 1991–1992, suggesting the impact of the early 1990s recession. The seasonal component highlights consistent fluctuations, indicating recurring labor force patterns across months, with peaks and troughs that remain relatively stable over time. The remainder component, which captures irregular fluctuations, shows a significant negative deviation around 1991, further supporting the presence of economic disruption during the recession.

The second figure provides a detailed view of the seasonal component across different months, showing that March and December tend to have the highest seasonal increases, while September and January exhibit significant declines. The blue lines represent stable seasonal averages, while the black lines show how seasonal effects evolved over time.

Is the recession of 1991/1992 visible in the estimated components?

This decomposition confirms that the 1991–1992 recession had a measurable effect on labor force participation, particularly evident in the trend and remainder components.