Time Series Decomposition

Required Libraries

library(fpp3)
library(gridExtra)

Problem 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

The country with the highest GDP is Monaco, followed by Liechtenstein and Luxembourg. Since the start of the year 2000, Monaco and Liechtenstein have become closer in comparison in terms of GDP per capita, and since 2010, the gap between them and the rest of the countries has widened.

global_economy |> 
  mutate(gdp_per_capita = GDP / Population) |>
  ggplot(aes(x=Year, y=gdp_per_capita, colour=Country, label=Country)) +
  geom_line() +
  geom_text(data = . %>% 
               group_by(Country) %>%
               filter(gdp_per_capita == max(gdp_per_capita, na.rm = TRUE)) %>%
               arrange(desc(gdp_per_capita)) %>%
               head(3), 
             aes(label = Country),
            vjust="bottom") + 
  theme(legend.position="none") +
  labs(title = "Countries' GDP per Capita",
       y = "GDP per Capita")

Country	gdp_per_capita	Year
Monaco	185152.53	2014
Liechtenstein	179308.08	2014
Luxembourg	119225.38	2014
Norway	103059.25	2013
Macao SAR, China	94004.39	2014
Bermuda	93605.75	2008
San Marino	90682.58	2008
Isle of Man	89941.64	2014
Qatar	88564.82	2012
Switzerland	88415.63	2011

Problem 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy |>
  filter(Country == "United States") |>
  autoplot(GDP) +
  labs(title = "United States Yearly GDP")

We observe an exponential curve as the years progress. A transformation is necessary where we can align this growth more closely with the population. A GDP per Capita measure will be established. We can see the curve smoothes out more compared to the drastic change we were seeing earlier.

  global_economy |>
  filter(Country == "United States") |>
  mutate(gdp_per_capita = GDP / Population) |>
  autoplot(gdp_per_capita) +
  labs(y = "",
       title = "United States GDP per Capita")

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria") |>
  autoplot(Count) +
  labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers")

We can observe some larger variations across the seasons in this time series, especially in the early years. Initially, a Box-Cox transformation could be considered, but these variations did not persist within the seasons.

Victorian Electricity Demand from vic_elec.

vic_elec |>
  autoplot(Demand) +
  labs(title = "Victorian Electricity Demand",
       y = 'MWh')

The graph illustrates the seasonal trends between the months, with significant increases in demand during the early months followed by a smoother rise around mid-year. However, since we cannot discern what might be occurring within these months, and considering that we are plotting in thirty-minute intervals, it would be preferable to scale it up to a daily basis.

vic_elec |>
  group_by_key() |>
  index_by(Date = ~ as_date(.)) |>
  summarise(daily_demand = sum(Demand)) |>
  autoplot(daily_demand) +
  labs(title = "Victorian Electricity Demand",
       y = 'MWh')

Gas production from aus_production.

aus_production |>
  autoplot(Gas) +
  labs(title = "Gas Manufacturing Production",
       y = 'Petajoules')

We can observe a significant variation from the earlier years, suggesting that a Box-Cox transformation would be appropriate to use here. Upon applying it, we notice the seasonal variation becoming more consistent across the time series.

lambda <- 
  aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(
        paste0("Transformed Gas Manufacturing Production with $\\lambda$ = ",
        round(lambda,2))))

Problem 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

We can see that the Box-Cox transformation did not help with the larger seasonal variation between 1975-1990.

plot_1 <-
  canadian_gas |>
  autoplot(Volume)+
  labs(title = 'Canadian Gas')

lambda <- 
  canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

plot_2 <-
  canadian_gas |>
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Gas with $\\lambda$ = ",
         round(lambda,2))))

grid.arrange(plot_1, plot_2, ncol=2)

Problem 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(42)

retail <- 
  aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

retail |>
  autoplot(Turnover) +
  labs(title = "Retail Trade Turnover", 
       subtitle = "Western Australia Newspaper and Book",
       y = "Million (AUD $)")

lambda <- 
  retail |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)
  
retail |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Retail Trade Turnover with $\\lambda$ = ",
         round(lambda,2))))

Problem 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.

Tobacco from aus_production

aus_production |>
  autoplot(Tobacco) +
  labs(title = "Tobacco Production")

lambda <- 
  aus_production |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)
  
aus_production |>
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Tobacco Production with $\\lambda$ = ",
         round(lambda,2))))

Economy class passengers between Melbourne and Sydney from ansett

ansett |>
  filter(Class == "Economy",
         Airports == "MEL-SYD") |>
  autoplot(Passengers) +
  labs(title = "MEL-SYD Passenger Counts")

lambda <- 
  ansett |>
  filter(Class == "Economy",
         Airports == "MEL-SYD") |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)
  
ansett |>
  filter(Class == "Economy",
         Airports == "MEL-SYD") |>
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed MEL-SYD Passenger Counts with $\\lambda$ = ",
         round(lambda,2))))

Pedestrian counts at Southern Cross Station from pedestrian.

The original hourly time period makes this graph difficult to interpret the variance issues. We will look at this at a daily level and then transform it to see the differeces.

pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  autoplot(Count) +
  labs(title = "Southern Cross Station Pedestrian Counts")

pedestrian_daily <-
  pedestrian |>
    filter(Sensor == "Southern Cross Station") |>
    group_by_key() |>
    index_by(Date = ~ as_date(.)) |>
    summarise(daily_count = sum(Count)) 

pedestrian_daily |>
  autoplot(daily_count) +
  labs(title = "Southern Cross Station Daily Pedestrians")

lambda <- 
  pedestrian_daily |>
  filter(Sensor == "Southern Cross Station") |>
  features(daily_count, features = guerrero) |>
  pull(lambda_guerrero)
  
pedestrian_daily |>
  filter(Sensor == "Southern Cross Station") |>
  autoplot(box_cox(daily_count, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Southern Cross Station Daily Pedestrians with $\\lambda$ = ",
         round(lambda,2))))

Problem 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)
knitr::kable(head(gas))

Gas	Quarter
221	2005 Q3
180	2005 Q4
171	2006 Q1
224	2006 Q2
233	2006 Q3
192	2006 Q4

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

We see every six months there are peaks during Q3 and troughs during Q1. As for a trend-cycle, there’s a small positive increase as the time series progresses.

gas |>
  autoplot(Gas)

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  autoplot() +
  labs(title = "Classical multiplicative decomposition of gas production")

Do the results support the graphical interpretation from part a?

We see the same results when we decompose the trend-cycle and seasonal components.

Compute and plot the seasonally adjusted data.

dcmp <-
  gas |>
    model(
      classical_decomposition(Gas, type = "multiplicative")) |>
  components(dcmp) |>
  as_tsibble()

dcmp |>
  autoplot(season_adjust) +
  labs(
    y = "",
    title = "Seasonally Adjusted Gas Production"
  )

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

The outlier changes other observations to be about -10 to +3 more different when the last value is an outlier.

gas_outlier_last <- 
  gas

gas_outlier_last[nrow(gas_outlier_last), 1] <- gas_outlier_last[nrow(gas_outlier_last), 1] + 500

dcmp_outlier_last <-
  gas_outlier_last |>
  model(
      classical_decomposition(Gas, type = "multiplicative")) |>
  components(dcmp) |>
  as_tsibble() 

season_adjust_diff <- dcmp['season_adjust'] - dcmp_outlier_last['season_adjust']

knitr::kable(season_adjust_diff, align='l')

season_adjust
2.470092
-8.605846
2.458013
2.624169
2.604215
-9.179569
2.688002
2.741319
2.738337
-9.801102
2.788623
2.682744
2.783045
-9.705482
2.817372
2.788180
2.816575
-10.040154
2.946741
-456.939722

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

There’s not much of a difference where the outlier is located

gas_outlier_middle <- 
  gas

gas_outlier_middle[nrow(gas_outlier_middle), 1] <- gas_outlier_middle[nrow(gas_outlier_middle) / 2, 1] + 500

dcmp_outlier_middle <-
  gas_outlier_middle |>
  model(
      classical_decomposition(Gas, type = "multiplicative")) |>
  components(dcmp) |>
  as_tsibble() 

season_adjust_diff_2 <- dcmp['season_adjust'] - dcmp_outlier_middle['season_adjust']

knitr::kable(list(season_adjust_diff, season_adjust_diff_2), align='l')

season_adjust
2.470092
-8.605846
2.458013
2.624169
2.604215
-9.179569
2.688002
2.741319
2.738337
-9.801102
2.788623
2.682744
2.783045
-9.705482
2.817372
2.788180
2.816575
-10.040154
2.946741
-456.939722

season_adjust
2.348490
-8.159074
2.337006
2.494982
2.476010
-8.703012
2.555673
2.606365
2.603530
-9.292278
2.651340
2.550673
2.646036
-9.201622
2.678673
2.650918
2.677916
-9.518919
2.801674
-428.844638

Problem 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

We noticed a positive trend from 1975-2005 and some possible yearly seasonal trends. Running x11 we see that the trend went further out to 2007. We also see the seasonal trend shrank in the middle of the time series and then grew to larger variances than before. As for the irregular series, we also notice a few outliers not noticeable from 2010-2013.

set.seed(42)

x11_dcmp <-
  aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()

autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Western AUS Turnover using X-11.")

Problem 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Figure 3.20: Seasonal component from the decomposition shown in the previous figure.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

We can see a strong, positive trend throughout the series. This accounts for a major part of the gains throughout the time series. The seasonal fluctuations are about -100 to 100. The major change is the drop that was around 1991-1992. It was removed as irregular patterns and not included in the trend model.

Is the recession of 1991/1992 visible in the estimated components?

We can see this recession only in the irregular graph and was removed from the trend series.