Time Series Decomposition

Chapter notes:

Trend-cycle component (often just called the trend for simplicity): When decomposing a time series into components, the trend and the cycle are usually combined into a single component.

Transformations and adjustments:

Transformations and adjustments are worth applying before decomposing a time series, to make the decomposition as simple as possible. They simplify the patterns in the data by removing known sources of variation or by making the pattern more consistent across the whole dataset.

Types of adjustments:

Calendar adjustments: Removing variation that is due only to calendar effects, such as the different number of days in each month.
Population adjustments: Adjusting data that is affected by population changes. It is best to use per-capita data rather than totals.
Inflation adjustments: Adjusting data that is affected by changes in the value of money.
Mathematical transformations: Transforming data whose variation increases or decreases with the level of the series, for example with a logarithm or a Box-Cox transformation.
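
As a rough illustration of the population and inflation adjustments listed above, the base R sketch below works on a small made-up data frame. The column names (turnover, population, cpi) and every number are hypothetical and are not taken from any dataset used in these notes.

```r
# Hypothetical yearly data: every value here is invented for illustration.
sales <- data.frame(
  year       = 2001:2004,
  turnover   = c(100, 110, 121, 133.1),  # nominal $ millions
  population = c(2.0, 2.1, 2.2, 2.3),    # millions of people
  cpi        = c(100, 105, 110, 121)     # price index, base year 2001 = 100
)

# Population adjustment: express the series per person.
sales$turnover_per_capita <- sales$turnover / sales$population

# Inflation adjustment: express the series in base-year (2001) dollars.
sales$turnover_real <- sales$turnover / sales$cpi * 100

print(sales)
```

In this toy example the nominal turnover rises every year, but after dividing by the price index the last three years are flat, which is the kind of pattern an inflation adjustment is meant to expose.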

Time series components/Classical decomposition

Additive decomposition: The magnitude of the seasonal fluctuations, or the variation around the trend-cycle, does not vary with the level of the time series. The seasonal component is constant from year to year.
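Multiplicative decomposition: Appropriate when the variation in the seasonal pattern, or the variation around the trend-cycle, appears to be proportional to the level of the time series. An alternative is to transform the data until the variation in the series appears to be stable over time, and then use an additive decomposition. For multiplicative seasonality, the values that form the seasonal component are sometimes called the “seasonal indices”.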
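
The sketch below is one way to see a multiplicative classical decomposition and its seasonal indices in practice. It uses decompose() from base R and the built-in AirPassengers series rather than the classical_decomposition() model from feasts used in the exercises, so treat it as an illustration of the idea, not of the fpp3 workflow.

```r
# AirPassengers ships with base R; its seasonal swings grow with the level
# of the series, so a multiplicative model is plausible.
fit_mult <- decompose(AirPassengers, type = "multiplicative")

# Seasonal indices: one multiplier per month, scaled to average 1.
round(fit_mult$figure, 2)

# Multiplicative model: y = trend * seasonal * remainder.
# The trend is NA for the first and last six months, so those are skipped.
rebuilt <- fit_mult$trend * fit_mult$seasonal * fit_mult$random
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[ok]), as.numeric(AirPassengers[ok]))

# A closely related route: take logs, then decompose additively.
fit_log <- decompose(log(AirPassengers), type = "additive")
```

An index above 1 marks a month that typically sits above the trend-cycle, and an index below 1 marks a month that typically sits below it.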

Moving average smoothing

Moving average smoothing reduces the volatility in a data series, which allows us to identify important trends. The moving average method offers a simple way to smooth data. However, it may obscure the latest changes in the trend because it relies on data from past time periods.
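
A minimal base R sketch of moving average smoothing follows. The two short series are invented for illustration, and stats::filter() is used here as a stand-in for whatever smoother you prefer. It shows a centred 5-MA and a 2x4-MA of the kind used to estimate the trend-cycle of quarterly data.

```r
# Toy series, invented for illustration.
y <- c(2, 4, 6, 8, 10, 12, 14)

# Centred 5-term moving average (5-MA): each value is the mean of the
# observation, the two before it and the two after it.
# stats::filter is spelled out because dplyr masks filter().
ma5 <- stats::filter(y, rep(1 / 5, 5), sides = 2)
as.numeric(ma5)   # the first two and last two values are NA

# 2x4-MA for quarterly data: a centred average with half weights at the ends.
q <- c(10, 20, 30, 40, 12, 22, 32, 42)
ma2x4 <- stats::filter(q, c(0.5, 1, 1, 1, 0.5) / 4, sides = 2)
as.numeric(ma2x4)
```

The missing values at both ends are the price of a centred average: the smoother has no estimate for the most recent observations, which is one reason the latest changes in the trend are hard to see.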

Methods used by official statistics agencies

X-11 method
SEATS method

STL decomposition

Seasonal and Trend decomposition using Loess (STL) is a versatile and robust method for decomposing time series.

STL has several advantages over classical decomposition and the SEATS and X-11 methods.
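
To make the method concrete, here is a small sketch using stl() from base R on the built-in AirPassengers series. This is an assumption made for portability: the exercises below use the STL() model from feasts instead. The argument s.window controls how quickly the seasonal component is allowed to change, and robust = TRUE down-weights unusual observations.

```r
# stl() is the base R implementation; the exercises below use STL() from feasts.
# Logs are taken first because STL is an additive decomposition.
fit_stl <- stl(log(AirPassengers), s.window = 13, robust = TRUE)

# The three components are stored as columns of a multivariate time series.
head(fit_stl$time.series)

# Additive identity: seasonal + trend + remainder reproduces the input.
all.equal(as.numeric(rowSums(fit_stl$time.series)),
          as.numeric(log(AirPassengers)))

# Seasonally adjusted series on the log scale.
adjusted <- log(AirPassengers) - fit_stl$time.series[, "seasonal"]
```

Unlike the classical method, the trend estimate has no missing values at the ends of the series, and the seasonal pattern is allowed to drift from year to year.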

Chapter Exercises

Time Series Decomposition exercises from Forecasting: Principles and Practice (3rd Ed.)

https://otexts.com/fpp3/decomposition-exercises.html

#Libraries required for the exercises
library(tsibble)
library(dplyr)
library(ggplot2)
library(tsibbledata)
library(feasts)
library(zoo)
library(fpp3)
library(seasonal)
library(imager)

Question 1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy %>%
  tsibble(key = Code, index = Year)%>%
  autoplot(GDP/Population, show.legend =  FALSE) +
  labs(title= "GDP per capita", y = "$US")

## Warning: Removed 3242 row(s) containing missing values (geom_path).

To find which country has the highest GDP per capita, we filter for observations where GDP per capita is greater than 100000 USD.

#Filter for the nations in the top tier, using a GDP_per_Capita greater than 100000 USD.

global_economy %>%
  tsibble(key = Code, index = Year)%>%
  filter(GDP/Population > 100000) %>%
  autoplot(GDP/Population)+
  labs(title= "GDP per capita", y = "$US")

Monaco has the highest GDP per capita, followed by Liechtenstein.
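
To look at how the leading country changes over time, one option is to pick the country with the largest GDP per capita in each year. The helper below, top_per_year(), is a hypothetical name introduced here, written in base R and demonstrated on an invented toy data frame. It assumes only the column names Country, Year, GDP and Population, which global_economy provides.

```r
# Return, for each year, the row with the largest GDP per capita.
# Expects columns Country, Year, GDP and Population (as in global_economy).
top_per_year <- function(df) {
  df$gdp_pc <- df$GDP / df$Population
  df <- df[!is.na(df$gdp_pc), ]            # drop years with missing data
  by_year <- split(df, df$Year)
  out <- lapply(by_year, function(d) {
    d[which.max(d$gdp_pc), c("Year", "Country", "gdp_pc")]
  })
  do.call(rbind, out)
}

# Toy input, invented for illustration only.
toy <- data.frame(
  Country    = c("A", "B", "A", "B"),
  Year       = c(2000, 2000, 2001, 2001),
  GDP        = c(100, 300, 500, 320),
  Population = c(10, 20, 25, 20)
)
top_per_year(toy)

# With the real data (requires fpp3 to be loaded):
# top_per_year(as.data.frame(global_economy))
```

Tabulating the Country column of the result shows how often each country led, which speaks to the second half of the question.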

Question 2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

Answer

- United States GDP from global_economy.

#Filter Country to United States
global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) +
  labs(title= "United States GDP per Year", y = "$US")

- Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria")%>%
  autoplot(Count) +
  labs(title = "Slaughter of Victorian “Bulls, bullocks and steers”")

- Victorian Electricity Demand from vic_elec.

vic_elec %>% autoplot(Demand) +
    labs(title= "Half-hourly Electricity Demand for Victoria, Australia", y = "MW")

- Gas production from aus_production.

aus_production %>%
  autoplot(Gas)+
  labs(title = "Gas production")

Question 3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

#Autoplot without box-cox
canadian_gas %>%
  autoplot(Volume)

##Autoplot with box-cox

lambda_cangas <- canadian_gas %>%
                  features(Volume, features = guerrero) %>%
                  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda = lambda_cangas))

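Comparing the first plot (without Box-Cox) with the second (with Box-Cox), the results are almost the same. The Box-Cox transformation is not helpful for this data because the variation does not change steadily with the level of the series: the seasonal variation is small in the early years, larger in the middle of the series, and smaller again towards the end, while the level keeps rising. A Box-Cox transformation can only stabilise variation that increases or decreases consistently with the level.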
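
For reference, the transformation itself is simple. The sketch below writes it out for positive data as a hypothetical helper named bc(), chosen so that it does not mask box_cox() from fabletools, which is the function used in the exercise code. It only illustrates the formula and is not a replacement for the library function.

```r
# Box-Cox for positive y: log(y) when lambda is 0, otherwise (y^lambda - 1) / lambda.
# bc() is a hypothetical helper written for this note.
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 10, 100, 1000)
bc(y, 0)     # natural log
bc(y, 1)     # y - 1: shape unchanged, only shifted down
bc(y, 0.5)   # in between: compresses large values less than the log does
```

Because a single lambda compresses or stretches values according to their size only, it cannot even out a spread that first grows and then shrinks as the level rises.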

Question 4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

#Autoplot without box-cox

set.seed(12272018)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries, Turnover)

#Autoplot with box-cox
lambda <- myseries %>%
                  features(Turnover, features = guerrero) %>%
                  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda))

The guerrero feature is used to extract the optimal lambda, and the Box-Cox transformation of Turnover with that lambda is then plotted.

The Box-Cox transformation appears to have evened out some of the seasonal variation across the series.

Question 5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production:

lambda_tobacco <- aus_production %>%
                   features(Tobacco, features = guerrero) %>%
                   pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Tobacco, lambda_tobacco))

## Warning: Removed 24 row(s) containing missing values (geom_path).

Economy class passengers between Melbourne and Sydney from ansett:

lambda_class <- ansett %>%
                 filter(Class == "Economy",
                        Airports == "MEL-SYD")%>%
                 features(Passengers, features = guerrero) %>%
                 pull(lambda_guerrero)
ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")%>%
  mutate(Passengers = Passengers/1000) %>%
  autoplot(box_cox(Passengers, lambda = lambda_class))

Pedestrian counts at Southern Cross Station from pedestrian:

lambda_count <- pedestrian %>%
                filter(Sensor == "Southern Cross Station") %>%
                 features(Count, features = guerrero) %>%
                 pull(lambda_guerrero)
pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(box_cox(Count,lambda_count))

Question 7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

#Plot the time series
gas %>%
    autoplot(Gas)

There are strong seasonal fluctuations and an upward trend.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>% model(classical_decomposition(Gas, type = "mult")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition")

## Warning: Removed 2 row(s) containing missing values (geom_path).

c. Do the results support the graphical interpretation from part a?

Yes, the decomposition confirms the earlier observations of an upward trend and clear seasonality.

d. Compute and plot the seasonally adjusted data.

# STL decomposition
Decomp <- gas %>%
  model(stl = STL(Gas))

#Compute and plot the seasonally adjusted data
components(Decomp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas production",
       title = "Australian Gas Production")

The gray line shows the original data and the blue line shows the seasonally adjusted data.

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

#change one observation to be an outlier
GAS1 <- gas
GAS1$Gas[5] <- GAS1$Gas[5] + 300

Recompute the seasonally adjusted data:

# STL decomposition
Decomp1 <- GAS1 %>%
  model(stl = STL(Gas))

#Compute and plot the seasonally adjusted data
components(Decomp1) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas production",
       title = "Australian Gas Production")

The outlier was very influential. It changed the level and shape of the seasonally adjusted data plot. The average Gas production level was raised and a major (early) peak was created by this one outlier.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

#change one observation to be an outlier  near the end:
GAS2 <- gas
GAS2$Gas[17] <- GAS2$Gas[17] + 300

# STL decomposition
Decomp2 <- GAS2 %>%
  model(stl = STL(Gas))

#Compute and plot the seasonally adjusted data
components(Decomp2) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas production",
       title = "Australian Gas Production")

I don’t see any real difference whether the outlier is near the beginning or near the end of the time series. Wherever the outlier occurs, it is noticeable and it alters the seasonally adjusted series. We conclude that seasonally adjusted data are sensitive to outliers.
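
One way to put numbers on this sensitivity is to compare the seasonally adjusted series with and without the outlier. The sketch below does that on a synthetic quarterly series using stl() from base R, not the STL() model from feasts used above. The series, the helper name seas_adj() and the choice of s.window = "periodic" are all assumptions made for illustration. It also tries robust fitting, which is designed to down-weight unusual observations.

```r
# Synthetic quarterly series (invented): linear trend plus a fixed seasonal pattern.
set.seed(1)
clean <- ts(200 + 2 * (1:20) + rep(c(-20, 10, 25, -15), 5) + rnorm(20, sd = 2),
            frequency = 4)

# Add 300 to one observation, as in the exercise.
spiked <- clean
spiked[5] <- spiked[5] + 300

# Seasonally adjusted series = data minus the estimated seasonal component.
# seas_adj() is a hypothetical helper written for this note.
seas_adj <- function(x, robust = FALSE) {
  fit <- stl(x, s.window = "periodic", robust = robust)
  x - fit$time.series[, "seasonal"]
}

# How far the outlier moves the adjusted series, with and without robust fitting.
shift_plain  <- seas_adj(spiked) - seas_adj(clean)
shift_robust <- seas_adj(spiked, robust = TRUE) - seas_adj(clean, robust = TRUE)
round(cbind(plain = shift_plain, robust = shift_robust), 1)
```

Nonzero entries away from observation 5 show where the outlier has leaked into the seasonal estimate and so distorted the adjusted values of other quarters. Changing the index from 5 to 17 repeats the comparison for an outlier near the end.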

Question 8

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

set.seed(555)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Australian Retail Turnover using X-11.")

We observe an unusual feature around 1987: a pronounced spike in the irregular component. It belongs to neither the trend nor the seasonal component, so it is a one-off irregularity.

Question 9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

image <- load.image("C:/Users/gigig/OneDrive/Desktop/DATA 624/labour1.png")
plot(image)

image2 <- load.image("C:/Users/gigig/OneDrive/Desktop/DATA 624/labour2.png")
plot(image2)

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

First, we see that the figures range from 6500 to less than 9000 from before January of 1980 through January of 1995.

The number of people in the civilian labour force in Australia rose steadily over the period, which shows up as an upward trend component. There was a significant recession in 1991 and 1992. The monthly breakdown of the seasonal component shows that the seasonal effect varies more over time in some months than in others.

b. Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession is visible and it is obvious that the data has been seasonally adjusted.