Homework2

library(tidyverse)
library(forecast)
library(fpp3)
library(tsibbledata)
library(seasonal)

3.1

global_economy <- global_economy %>%
  mutate(GDP_per_capita = GDP / Population)

ggplot(global_economy, aes(x = Year, y = GDP_per_capita, color = Country, group = Country)) +
  geom_line() +
  labs(
    title = "GDP per Capita Over Time by Country",
    x = "Year",
    y = "GDP per Capita"
  ) +
  theme_minimal() +
  theme(legend.position = "none")  # Hide legend if too many countries

highest_gdp_per_capita <- global_economy %>%
  index_by(Year) %>%
  slice_max(order_by = GDP_per_capita, n = 1)

ggplot(highest_gdp_per_capita, aes(x = Year, y = GDP_per_capita, label = Country)) +
  geom_line() +
  geom_point() +
  geom_text(nudge_y = 5000, check_overlap = TRUE) +
  labs(
    title = "Country with the Highest GDP per Capita Over Time",
    x = "Year",
    y = "GDP per Capita"
  ) +
  theme_minimal()

Monaco has consistently been recognized as the country with the highest GDP per capita over time. However, in the most recent ranking, Luxembourg has emerged as the nation with the highest GDP per capita.

3.2

United States GDP from global_economy

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) +
  labs(title = "United States GDP Over Time",
       x = "Year", y = "GDP (US dollars)")

Apply log transformation to stabilize the variance and show proportional changes more clearly.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(log(GDP)) +
  labs(title = "Log-Transformed United States GDP Over Time",
       x = "Year", y = "Log of GDP")

Slaughter of Victorian “Bulls, bullocks and steers” from aus_livestock

aus_livestock %>%
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
  autoplot(Count) +
  labs(title = "Slaughter of Bulls, Bullocks, and Steers in Victoria",
       x = "Year", y = "Number Slaughtered")

Victorian Electricity Demand from vic_elec

vic_elec %>%
  autoplot(Demand) +
  labs(title = "Victorian Electricity Demand Over Time",
       x = "Time", y = "Electricity Demand (MW)")

Gas Production from aus_production

aus_production %>%
  autoplot(Gas) +
  labs(title = "Australian Gas Production Over Time",
       x = "Year", y = "Gas Production")

Apply log transformation to stabilize the variance and show proportional changes more clearly.

aus_production %>%
  autoplot(log(Gas)) +
  labs(title = "Log-Transformed Australian Gas Production",
       x = "Year", y = "Log of Gas Production")

3.3

A Box-Cox transformation is typically used to stabilize variance and make a time series more normally distributed.

However, for Canadian gas production (canadian_gas), applying a Box-Cox transformation is unhelpful for the following reason:

The Data is Strongly Seasonal

canadian_gas %>%
  gg_season(Volume) +
  labs(title = "Seasonal Plot of Canadian Gas Production")

Each year’s pattern aligns closely, it confirms strong seasonal effects.

canadian_gas %>%
  model(STL(Volume ~ season(window = "periodic"))) %>%
  components() %>%
  autoplot()

There is strong seasonality in the data, the seasonal plot show clear and consistent peaks and valleys that repeat over time.

3.4

set.seed(1225)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot(Turnover)

lambda <- BoxCox.lambda(myseries$Turnover)
lambda

## [1] -0.231131

When Lambda is close than zero, a log transformation is recommended.

myseries %>%
  autoplot(log(Turnover))

3.5

Tobacco from aus_production

aus_production %>%
  autoplot(Tobacco) +
  labs(title = "Tobacco from aus_production")

tobacco_series <- aus_production$Tobacco
lambda_tobacco <- BoxCox.lambda(tobacco_series)
lambda_tobacco

## [1] 0.7099451

When Lambda is greater than 0 and less than 1, it is recommended to use a power transformation; in this case, I will utilize the square root transformation.

tobacco_series <- na.omit(tobacco_series)
tobacco_sqrt <- tobacco_series^(1/2)
tobacco_sqrt <- na.omit(tobacco_sqrt)

par(mfrow = c(1, 2))
plot(tobacco_series, main = "Original Tobacco Production", type = "o")
plot(tobacco_sqrt, main = "Square Root Transformed Tobacco", type = "o")

var(tobacco_series)

## [1] 1140807

var(tobacco_sqrt)

## [1] 45.71294

sd(tobacco_series)

## [1] 1068.086

sd(tobacco_sqrt)

## [1] 6.761135

The square root transformation proved to be very effective, as demonstrated by the results.

Economy class passengers between Melbourne and Sydney on ansett

passengers <- ansett %>% filter(Airports == "MEL-SYD" & Class == "Economy")

pedestrian_series <- pedestrian$Count + 1

passengers %>%
  autoplot(Passengers) +
  labs(title = "Economy class passengers between Melbourne and Sydney on ansett")

passengers_series <- passengers$Passengers
summary(passengers_series)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0   19694   21524   21509   24049   32468

passengers_series_shifted <- passengers_series + 1

lambda_passengers <- BoxCox.lambda(passengers_series_shifted)
lambda_passengers

## [1] 1.999953

When Lambda is almost equal to 2, it is recommended to use a power transformation. However, since this dataset has extreme values, I use log transformation to achieve better effects.

passengers_series_shifted <- na.omit(passengers_series_shifted)
passengers_log <- log(passengers_series_shifted)

par(mfrow = c(1, 2))
plot(passengers_series_shifted, main = "Original Passengers Count", type = "o")
plot(passengers_log, main = "Log Transformed Passengers", type = "o")

var(passengers_series_shifted)

## [1] 26982184

var(passengers_log)

## [1] 2.458385

sd(passengers_series_shifted)

## [1] 5194.438

sd(passengers_log)

## [1] 1.567924

The Log transformation proved to be very effective, as demonstrated by the results.

Pedestrian counts at Southern Cross Station from pedestrian

pedestrian_selected <- pedestrian %>% filter(Sensor == "Southern Cross Station")
summary(pedestrian_selected)

##     Sensor            Date_Time                           Date           
##  Length:17539       Min.   :2015-01-01 00:00:00.00   Min.   :2015-01-01  
##  Class :character   1st Qu.:2015-07-02 16:30:00.00   1st Qu.:2015-07-02  
##  Mode  :character   Median :2016-01-01 10:00:00.00   Median :2016-01-01  
##                     Mean   :2016-01-01 11:24:48.93   Mean   :2015-12-31  
##                     3rd Qu.:2016-07-02 05:30:00.00   3rd Qu.:2016-07-02  
##                     Max.   :2016-12-31 23:00:00.00   Max.   :2016-12-31  
##       Time          Count       
##  Min.   : 0.0   Min.   :   0.0  
##  1st Qu.: 6.0   1st Qu.:  36.0  
##  Median :12.0   Median : 124.0  
##  Mean   :11.5   Mean   : 495.9  
##  3rd Qu.:18.0   3rd Qu.: 696.0  
##  Max.   :23.0   Max.   :3743.0

pedestrian_series <- pedestrian_selected$Count + 1

pedestrian_selected %>%
  autoplot(Count) +
  labs(title = "Pedestrian counts at Southern Cross Station")

lambda_pedestrian <- BoxCox.lambda(pedestrian_series)
lambda_pedestrian

## [1] 0.03624172

When Lambda is approximately, it is recommended to use log transformation.

pedestrian_log <- log(pedestrian_series)

par(mfrow = c(1, 2))
plot(pedestrian_series, main = "Original Pedestrian Count", type = "o")
plot(pedestrian_log, main = "Log Transformed Pedestrian", type = "o")

var(pedestrian_series)

## [1] 512490.2

var(pedestrian_log)

## [1] 3.867562

sd(pedestrian_series)

## [1] 715.8842

sd(pedestrian_log)

## [1] 1.966612

The Log transformation proved to be very effective, as demonstrated by the results.

3.7

gas <- tail(aus_production, 5*4) |> select(Gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>% autoplot(Gas)

gas %>% gg_season(Gas)

I have identified seasonal fluctuations where gas production increases from Quarter 1 to Quarter 3 and then decreases. Additionally, gas production also rises when comparing the same quarters of previous years.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_ts <- ts(gas$Gas, frequency = 4)

gas_decomposed <- decompose(gas_ts, type = "multiplicative")

trend_cycle <- gas_decomposed$trend
trend_cycle

##      Qtr1    Qtr2    Qtr3    Qtr4
## 1      NA      NA 200.500 203.500
## 2 207.000 210.250 213.000 216.125
## 3 218.625 218.875 218.750 219.000
## 4 219.000 220.375 221.875 223.125
## 5 225.125 226.000      NA      NA

seasonal_indices <- gas_decomposed$seasonal
seasonal_indices

##        Qtr1      Qtr2      Qtr3      Qtr4
## 1 1.1256812 0.9250656 0.8752824 1.0739708
## 2 1.1256812 0.9250656 0.8752824 1.0739708
## 3 1.1256812 0.9250656 0.8752824 1.0739708
## 4 1.1256812 0.9250656 0.8752824 1.0739708
## 5 1.1256812 0.9250656 0.8752824 1.0739708

c. Do the results support the graphical interpretation from part a?

The results confirm a seasonal trend in gas production, with seasonal indices consistently repeating each year, indicating a quarterly pattern.

d. Compute and plot the seasonally adjusted data.

seasonally_adjusted <- gas_ts / gas_decomposed$seasonal

gas_tibble <- as_tsibble(gas_ts) |> mutate(Seasonally_Adjusted = seasonally_adjusted)

autoplot(gas_ts, series = "Original Data") +
  autolayer(seasonally_adjusted, series = "Seasonally Adjusted", color = "darkblue") +
  labs(title = "Seasonally Adjusted Gas Data", y = "Gas Production") +
  theme_minimal()

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_ts_outlier <- gas_ts
gas_ts_outlier[10] <- gas_ts_outlier[10] + 300

gas_decomposed_outlier <- decompose(gas_ts_outlier, type = "multiplicative")

seasonally_adjusted_outlier <- gas_ts_outlier / gas_decomposed_outlier$seasonal

p1 <- autoplot(gas_ts, series = "Original Data") +
      autolayer(seasonally_adjusted, series = "Seasonally Adjusted", color = "darkblue") +
      labs(title = "Before Outlier", y = "Gas Production") +
      theme_minimal() +
      theme(legend.position = "bottom")

p2 <- autoplot(gas_ts_outlier, series = "Original Data with Outlier") +
      autolayer(seasonally_adjusted_outlier, series = "Seasonally Adjusted", color = "darkgreen") +
      labs(title = "After Outlier", y = "Gas Production") +
      theme_minimal() +
      theme(legend.position = "bottom")

gridExtra::grid.arrange(p1, p2, ncol = 2)

Introducing an outlier can distort the trend by pulling it upward, resulting in inaccurate long-term trend estimates. Additionally, the multiplicative decomposition method assumes stable seasonality; the presence of an outlier can inflate the seasonal component, rendering it unreliable. Classical decomposition is sensitive to outliers, and a single extreme value can skew the entire analysis.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_ts_outlier2 <- gas_ts
gas_ts_outlier2[20] <- gas_ts_outlier2[20] + 300

gas_decomposed_outlier2 <- decompose(gas_ts_outlier2, type = "multiplicative")

seasonally_adjusted_outlier2 <- gas_ts_outlier2 / gas_decomposed_outlier2$seasonal

p3 <- autoplot(gas_ts, series = "Original Data") +
      autolayer(seasonally_adjusted, series = "Seasonally Adjusted", color = "darkblue") +
      labs(title = "Before Outlier", y = "Gas Production") +
      theme_minimal() +
      theme(legend.position = "bottom")

p4 <- autoplot(gas_ts_outlier2, series = "Original Data with Outlier") +
      autolayer(seasonally_adjusted_outlier2, series = "Seasonally Adjusted", color = "darkgreen") +
      labs(title = "After Outlier", y = "Gas Production") +
      theme_minimal() +
      theme(legend.position = "bottom")

gridExtra::grid.arrange(p3, p4, ncol = 2)

The trend-cycle component at the end is significantly distorted due to the limited number of data points available for smoothing. Classical decomposition relies on historical data to establish the trend, so an outlier at the end of the series can heavily skew future trend projections, rendering them unreliable. If the last data point is an outlier, the trend-cycle may overreact, leading to unrealistic predictions of a sudden increase.

3.8

myseries_ts <- ts(myseries$Turnover, start = c(year(min(myseries$Month)), month(min(myseries$Month))), frequency = 12)

x11_decomp <- seas(myseries_ts, x11 = "")

trend <- trendcycle(x11_decomp)
seasonal <- seasonal(x11_decomp)
irregular <- remainder(x11_decomp)

autoplot(myseries_ts, series = "Original Data") +
  autolayer(trend, series = "Trend") +
  autolayer(seasonal, series = "Seasonal") +
  autolayer(irregular, series = "Irregular") +
  labs(title = "X-11 Decomposition of Retail Series", y = "Turnover") +
  theme_minimal() +
  theme(legend.position = "bottom")

autoplot(irregular) +
  labs(title = "Irregular Component (Checking for Outliers)", y = "Irregular Component") +
  theme_minimal()

The plot displays several positive and negative spikes, which indicate sudden deviations from the overall pattern. These sharp spikes represent unexpected increases or decreases in retail sales. The early years show more extreme fluctuations, suggesting the presence of either significant events or outliers in the data. In contrast, the recent years appear more stable, indicating that retail trends have become more predictable.

3.9

The decomposition results show clear trends, seasonal patterns, and irregular elements in the data. The trend indicates a significant drop around 1991/1992, reflecting the economic downturn from the recession. The seasonal element remains steady, suggesting consistent periodic changes that were not impacted by the recession. Meanwhile, the irregular element shows more variability during the recession, indicating greater economic uncertainty. The graphs highlight the noticeable trend drop and irregular changes during the 1991/1992 recession.