library(fpp3)
library(zoo) #to convert Quarter to type quarter in Q3
library(USgas)
#library(sna) #components
library(seasonal) #x11

Background

The purpose of this assignment was to explore the Time Series Decomposition exercises from Forecasting: Principles and Practice.


3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

First, I calculate and plot the GDP per capita for all nations to observe general trends. Being that color-coding would not have worked for the number of nations under consideration, we simply used plot():

#calculate GDP per capita
global_economy$GDP_per_Capita <- global_economy$GDP / global_economy$Population
plot(x = global_economy$Year, y = global_economy$GDP_per_Capita, main = 'GDP per Capita') 

We see that there’s a general upward trend to the data, signaling that the global GDP per capita has risen from the 1960s thru the 2010s. We also observe that there appear to be tiers to the data that become a bit more pronounced in later decades. We may take it to mean that certain nations outperform the global average resulting in a higher GDP per capita with time.

To explore this question further and determine which nation has the highest GDP per capita, we filter for the nations in the top tier, using a GDP_per_Capita greater than 90000 USD as our filter:

global_economy %>%
  filter(GDP_per_Capita > 90000) %>%
  autoplot(GDP_per_Capita) +
  labs(title= "GDP per capita", y = "$US")

Monaco, a sovereign city-state of just 499 acres and approximately 35000 people, has the highest GDP. Whereas Liechtenstein is a very close second.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

  • United States GDP from global_economy.
  • Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
  • Victorian Electricity Demand from vic_elec.
  • Gas production from aus_production.

United States GDP:

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) +
  labs(title= "United States GDP per Year", y = "$US")

#Square root transformation
global_economy %>%
  filter(Country == "United States") %>%
  autoplot(sqrt(GDP)) +
  labs(title= "United States GDP per Year", y = "$US")

Being that the data showed variation with the level of the series, I thought that transformation coul be useful. We see that, as expected, the quadratic trend was made linear by applying a square root transformation to the GDP field.


Slaughter of Victorian “Bulls, bullocks and steers”:

aus_livestock %>%
    filter(Animal == "Bulls, bullocks and steers") %>%
    group_by(Animal) %>%
    index_by(Month) %>%
    summarise(Count = sum(Count)) -> bbs

bbs %>% autoplot(Count) +
    labs(title= "Slaughter of Australian Bulls, bullocks and steers per Month", y = "Count")

Filtering for “Bulls, bullocks and steers”, grouping by Animal, and taking the sum of animals slaughtered for the given Month produced the plot above. There doesn’t appear to be an upward or downward trend but there does appear to be cyclicity to when bulls, bullocks, and steers are taken to slaughter. Being that the data did not show variation with the level of the series, I didn’t feel that transformation would be useful.


Victorian Electricity Demand:

vic_elec %>%
    index_by(Date) %>%
    summarise(Demand = sum(Demand)) -> dd

dd %>% autoplot(Demand) +
    labs(title= "Daily Electricity Demand for Victoria, Australia", y = "MW")

There does not appear to be a trend, but there appears to be clear seasonality to the data. I took the sum of demand for each date and plotted the result. Being that the data did not show variation with the level of the series, I didn’t feel that transformation would be useful. Decomposition could be valuable with this data set to provide insight regarding seasonality.


Gas production:

#Box Cox transformation

aus_production %>%
    autoplot(Gas) +
    labs(title= "Australian Quarterly Gas Production", y = "petajoules")

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

Being that the data showed variation with the level of the series, I felt that transformation would be useful. The text provided a Box Cox transformation example and so I extended it here.

We use the guererro feature to select the optimal lambda. The lambda which makes the size of the seasonal variation about the same across the whole series. We then plot the box cox transformation of our Gas data (provided our optimal lambda) and see that our seasonal variation is normalized.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
    autoplot()

Applying a box cox transformation to the canadian_gas dataset results in an output that is nearly the same as just plotting via autoplot().

Thus, the box cox transformation is not helpful for this data because the max() / min() value is relatively small and the variation does not increase with the level of the series. Not all data needs to be transformed and sometimes keeping existing scale is fine.

summary(canadian_gas$Volume)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.966   6.453   8.831   9.777  14.429  19.528

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(333)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
    autoplot(Turnover) +
    labs(title= "Australian Retail Turnover", y = "%")

#hist(myseries$Turnover)

Due to the small max(y) / min(y) value and the fact that the variation of values appears to be more random than following the levels, I don’t believe a Box Cox transformation would work with this dataset.

I explore further by using the guerrero feature to extract the optimal lambda though and plot the resulting box cox transformation of Turnover provided this lambda:

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed retail turnover with $\\lambda$ = ",
         round(lambda,2))))

The box cox transformation (as explained in the course text) appears to have normalized some of the seasonal variation within our dataset with regard to Turnover. While there is still a relatively noticeable variation to the data, I would eat my words here and say that the transformation was useful / successful.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production:

lambda_ap <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Tobacco, lambda_ap)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Australian Tobacco Production with $\\lambda$ = ",
         round(lambda_ap,2))))

The variance appears to be semi-stabile with a lambda of 0.93.


Economy class passengers between Melbourne and Sydney from ansett:

#filter for economy class between Melbourne and Sydney
ansett %>%
    filter(Class == 'Economy') %>%
    filter(Airports == 'MEL-SYD') -> economy_ms

lambda_econ <- economy_ms %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

economy_ms %>%
  autoplot(box_cox(Passengers, lambda_econ)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Economy passengers MEL-SYD with $\\lambda$ = ",
         round(lambda_econ,2))))

The variance appears to be semi-stabile with a lambda of 2 with notable drops / variation around 1987, late 1989, and early 1992.


Pedestrian counts at Southern Cross Station from pedestrian:

pedestrian %>%
    filter(Sensor == 'Southern Cross Station') -> sc_stn

lambda_sc <- sc_stn %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

sc_stn %>%
  autoplot(box_cox(Count, lambda_sc)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Pedestrian counts at Southern Cross Station with $\\lambda$ = ",
         round(lambda_sc,2))))

The variance appears to be stable, although the scales and data frequency make it difficult to read.

3.7

Consider the last five years of Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)

  1. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
  2. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
  3. Do the results support the graphical interpretation from part a?
  4. Compute and plot the seasonally adjusted data.
  5. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
  6. Does it make any difference if the outlier is near the end rather than in the middle of the time series?
#filter for the last 5 yrs of Gas data
gas <- tail(aus_production, 5*4) %>% select(Gas)

#plot the time series
gas %>%
    autoplot(Gas)

Considering the last 5 years of Australian gas production we observe major seasonal fluctuations and an upward trend.

Next, we apply classical decomposition:

gas %>% model(classical_decomposition(Gas, type = "mult")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Australian gas production")

These components confirm our earlier observations regarding an upward trend and clear seasonality.

Next, we compute and plot seasonally adjusted data:

# STL decomposition
dcmp <- gas %>%
  model(stl = STL(Gas))

#Compute and plot the seasonally adjusted data
components(dcmp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas production",
       title = "Australian Gas Production")

The gray line in the plot above shows our original autoplot() output while the blue line represents seasonally-adjusted data. Seasonal ajustment results in a clear “blunting” of what are otherwise significant crests and troughs.

#change one observation to be an outlier
gas2 <- gas
gas2$Gas[3] <- gas2$Gas[3] + 1000

#recompute the seasonally adjusted data

# STL decomposition
dcmp <- gas2 %>%
  model(stl = STL(Gas))

#Compute and plot the seasonally adjusted data
components(dcmp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas production",
       title = "Australian Gas Production")

The outlier was very influential. It changed the level and shape of the seasonally adjusted data plot. The average Gas production level was raised and a major (early) peak was created by this one outlier.

Being that this observation was near the start, I’ll now observe the impact of this outlier near the end:

#change one observation to be an outlier
gas3 <- gas
gas3$Gas[18] <- gas3$Gas[18] + 1000

#recompute the seasonally adjusted data

# STL decomposition
dcmp <- gas3 %>%
  model(stl = STL(Gas))

#Compute and plot the seasonally adjusted data
components(dcmp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas production",
       title = "Australian Gas Production")

It doesn’t appear to make a difference whether the outlier is near the middle, end, or beginning of the time series. If the outlier is present, especially at the magnitude I elected (+ 1000) then it’s presence will be felt and the seasonally adjusted plot will be altered. The addition of a dramatic outlier appears to have made the original autoplot() smoother than the seasonally adjusted plot.

From this we might extend, that seasonally adjusted plots are sensitive to outliers.

3.8

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

set.seed(333)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Australian Retail Turnover using X-11.")

We observe an outlier / unusual feature ~1987 with a significant spike in the irregular component. This component was not a trend and it was not seasonal, thus it was an irregularity that only occurred one time.

The cause of this outlier event would be interesting to explore …

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

  1. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
  2. Is the recession of 1991/1992 visible in the estimated components?

In the value we observe a general upward trend with some variation that we really can’t reach much into based on the plot. We see that the figures range from less than 7000 to less than 9000 from before January of 1980 through January of 1995.

When we then consider the components of the decomposition, we observe:

  • There’s a clear upward trend in the trend component.
  • In the season-year component we observe numerous peaks and valleys. There’s an uptick to start every year followed by a sharp decline, a second major uptick with a more gradual decline into a steeper decline, a smaller uptick, a small uptick, and finally the start of the next year. This pattern appears to repeat seasonally.
  • The remainder component captures relatively minor employment figures, hovering between -100 and 100 (in comparison to 7000 to 9000 in values), with a more major drop off in employment figures noted by two clearly visible downward spikes / outliers that reach -400.

Yes, the recession is visible in the estimated components (as described in the final point above).