Homework 2 - Time Series Decomposition

library(fpp3)
library(tidyverse)
library(cowplot)

Exercises 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Solution 3.1

length(unique(global_economy$Country))

## [1] 263

There are 263 Countries in this dataset.

global_economy$GDP_Per_Capita <- global_economy$GDP / global_economy$Population
global_economy %>%
  autoplot(GDP / Population, show.legend =  FALSE) +
  labs(title= "GDP per capita", y = "$US")

## Warning: Removed 3242 rows containing missing values (`geom_line()`).

The country with he highest GDP per capita is Monaco. This was achieved in 2014. The GDP per capita has an increasing trend when compare with majority of the countries.

global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
  select(Country, GDP_per_capita)

## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country GDP_per_capita  Year
##   <fct>            <dbl> <dbl>
## 1 Monaco         185153.  2014

global_economy %>%
  filter(Country == "Monaco") %>%
  autoplot(.vars = GDP_Per_Capita) +
  labs(title = "Monaco GDP per Capita")

## Warning: Removed 11 rows containing missing values (`geom_line()`).

Exercise 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

United States GDP from `global_economy`.

After i transformed the data to look at GDP based on the population (GDP per capita) for the US. This adjustment did not change the overall shape of the graph, just the y scaling. It didn’t have any effect on the GDP.

Before Transformation

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(.vars = GDP) +
  labs(title = "United States GDP")

After Transformation

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(.vars = GDP_Per_Capita) +
  labs(title = "United States GDP")

Slaughter of Victorian “Bulls, bullocks and steers” in `aus_livestock`.

A zig-zag downward trend is noticed in the graph.

aus_livestock %>%
  filter(State == "Victoria",
         Animal == "Bulls, bullocks and steers") %>%
  autoplot(.vars = Count) +
  labs(title = "Slaughter of Victorian Bulls, bullocks and steers")

Victorian Electricity Demand from `vic_elec`.

The data was transformed such to reflect daily, weekly and monthly electricity demand instead of half-hourly demand. It is somehow difficult to discern what is occuring in the non tranformed data. The aggregate transformations make it easier to see the underlying seasonality. There appears to be an increase in electricity in the winter and summer months / weeks. Summer is when electricity demand is at its highest.

Before Transformation

vic_elec %>%
  autoplot(.vars = Demand)

After Transformation

daily <- vic_elec %>%
  group_by(Date) %>%
  mutate(Daily_Demand = sum(Demand)) %>%
  distinct(Date, Daily_Demand)

weekly <- vic_elec %>%
  group_by(Date) %>%
  mutate(Daily_Demand = sum(Demand)) %>%
  distinct(Date, Daily_Demand) %>%
  mutate(week = yearweek(Date)) %>%
  group_by(week) %>%
  mutate(Weekly_Demand = sum(Daily_Demand)) %>%
  distinct(week, Weekly_Demand)


monthly <- vic_elec %>%
  group_by(Date) %>%
  mutate(Daily_Demand = sum(Demand)) %>%
  distinct(Date, Daily_Demand) %>%
  mutate(month = yearmonth(Date)) %>%
  group_by(month) %>%
  mutate(Monthly_Demand = sum(Daily_Demand)) %>%
  distinct(month, Monthly_Demand)

p1 <- daily %>% 
  as_tsibble(index = Date) %>%
  autoplot(Daily_Demand) +
  labs(title = "Daily Victorian Electricity Demand")

p2 <- weekly %>% 
  as_tsibble(index = week) %>%
  autoplot(Weekly_Demand) +
  labs(title = "Weekly Victorian Electricity Demand")

p3 <- monthly %>% 
  as_tsibble(index = month) %>%
  autoplot(Monthly_Demand) +
  labs(title = "Monthly Victorian Electricity Demand")

plot_grid(p1, p2, p3,
          nrow = 2, ncol = 2)

Gas production from `aus_production`.

The seasonal variation changes in the original plot. To adjust for this a boxcox transformation can be used. The transformed plot has a seasonal variation that is similar to the rest of series.

Before Transformation

aus_production %>%
  autoplot(.vars = Gas) +
  labs(title = "Gas Production")

After Transformation

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

Exercise 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Solution 3.3

The box-cox transformation is unhelpful because it does not make the seasonal variation about the same across he whole series.

p1 <- canadian_gas %>%
  autoplot(.vars = Volume) +
  labs(title = "Non-Transformed")

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

p2 <- canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed with $\\lambda$ = ",
         round(lambda,2)))) 

plot_grid(p1, p2,
          nrow = 1, ncol = 2)

Exercise 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

Solution 3.4

With the seed I choose to use, a good box-cox transformation \(\lambda\) would be 0.23.

set.seed(227)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

p1 <- myseries %>%
  autoplot(Turnover) +
  labs(title = "Non-Transformed")

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

p2 <- myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed with $\\lambda$ = ",
         round(lambda,2)))) 


plot_grid(p1, p2,
          nrow = 1, ncol = 2)

Exercise 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Solution 3.5

Tobacco from `aus_production`

Ideal box-cox transformation value is 0.93.

p1 <- aus_production %>%
  autoplot(.vars = Tobacco) +
  labs(title = "Non-Transformed")

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

p2 <- aus_production %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed with $\\lambda$ = ",
         round(lambda,2)))) 

plot_grid(p1, p2,
          nrow = 1, ncol = 2)

## Warning: Removed 24 rows containing missing values (`geom_line()`).
## Removed 24 rows containing missing values (`geom_line()`).

Economy class passengers between Melbourne and Sydney from `ansett`

Ideal box-cox transformation value is 2.

p1 <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD") %>%
  autoplot(.vars = Passengers) +
  labs(title = "Non-Transformed")

lambda <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD") %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

p2 <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD") %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed with $\\lambda$ = ",
         round(lambda,2)))) 

plot_grid(p1, p2,
          nrow = 1, ncol = 2)

Pedestrian counts at Southern Cross Station from `pedestrian`

Ideal box-cox transformation value is -0.25.

p1 <- pedestrian %>%
  filter(Sensor=='Southern Cross Station') %>%
  autoplot(.vars = Count) +
  labs(title = "Non-Transformed")

lambda <- pedestrian %>%
  filter(Sensor=='Southern Cross Station') %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

p2 <- pedestrian %>% 
  filter(Sensor=='Southern Cross Station') %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed with $\\lambda$ = ",
         round(lambda,2)))) 

plot_grid(p1, p2,
          nrow = 1, ncol = 2)

Exercise 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)**

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Solution 3.7

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

There is a seasonality fluctation that occurs every year. The cycle is always lowest at quarter 1 and peaks at quarter 3. There is an positive overall trend in this timeseries.

gas <- tail(aus_production, 5*4) %>% select(Gas)
autoplot(gas, .vars = Gas)

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>% 
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Gas")

## Warning: Removed 2 rows containing missing values (`geom_line()`).

Do the results support the graphical interpretation from part a?

The results do support the graphical interpretation from part a. The trend shows that there is a positive increase over time and the seasonality repeats every year.

Compute and plot the seasonally adjusted data.

gas %>% 
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  select(season_adjust) %>%
  autoplot() +
  labs(title = "Seasonally Adjusted Gas")

## Plot variable not specified, automatically selected `.vars = season_adjust`

q3.7 <- gas %>% 
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components()

autoplot(q3.7, .vars = season_adjust) +
  labs(title = "Seasonally Adjusted Classical Multiplicative Decomposition")

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

The outlier creates a large spike in the timeseries. The outlier also changes the seasonally adjusted values compared to the original data.

gas[10,1] <- gas[10,1] + 300

gas %>% 
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  select(season_adjust) %>%
  autoplot() +
  labs(title = "Seasonally Adjusted Gas with Outlier") + 
  geom_line(aes(y = q3.7$season_adjust, color = "Original")) +
  scale_color_manual(name='',
                     breaks=c('Outlier', 'Original'),
                     values=c('Outlier'='black', 'Original'='blue'))

## Plot variable not specified, automatically selected `.vars = season_adjust`

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

The outlier’s location appears to make a difference in the end results of the seasonality adjustment calculations.

gas <- tail(aus_production, 5*4) %>% select(Gas)

gas[2,1] <- gas[2,1] + 300

p1 <- gas %>% 
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  select(season_adjust) %>%
  autoplot() +
  labs(title = "Outlier Near Front") + 
  geom_line(aes(y = q3.7$season_adjust, color = "Original")) +
  scale_color_manual(name='',
                     breaks=c('Outlier', 'Original'),
                     values=c('Outlier'='black', 'Original'='blue'))

## Plot variable not specified, automatically selected `.vars = season_adjust`

gas <- tail(aus_production, 5*4) %>% select(Gas)

gas[10,1] <- gas[10,1] + 300

p2 <- gas %>% 
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  select(season_adjust) %>%
  autoplot() +
  labs(title = "Outlier Near Middle") + 
  geom_line(aes(y = q3.7$season_adjust, color = "Original")) +
  scale_color_manual(name='',
                     breaks=c('Outlier', 'Original'),
                     values=c('Outlier'='black', 'Original'='blue'))

## Plot variable not specified, automatically selected `.vars = season_adjust`

gas <- tail(aus_production, 5*4) %>% select(Gas)

gas[18,1] <- gas[18,1] + 300

p3 <- gas %>% 
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  select(season_adjust) %>%
  autoplot() +
  labs(title = "Outlier Near Back") + 
  geom_line(aes(y = q3.7$season_adjust, color = "Original")) +
  scale_color_manual(name='',
                     breaks=c('Outlier', 'Original'),
                     values=c('Outlier'='black', 'Original'='blue'))

## Plot variable not specified, automatically selected `.vars = season_adjust`

plot_grid(p1, p2, p3,
          nrow = 2, ncol = 2)

Exercise 3.8

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Solution 3.8

There appears to be a huge spike in the early 2000s, indicating the presence of an outlier.

set.seed(227)
x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of total AUS retail employment using X-11.")

Exercise 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
Is the recession of 1991/1992 visible in the estimated components?

Solution 3.9

It is observe that the civilian labor force in Australia has had a consistent upward trend for the entire duration of the timeseries (Feb 1978 to Aug 1995). The seasonality shows to be slightly constant, though it should be noted that the variability has increased in the later years in the series. There are large outliers in the remainder portion of the STL decomposition in the early 1990s. This could be due to a recession in Australia in 1991. The seasonal plot shows that a major recession occurred in the early 1990s as most months have a large dip at the 1990 mark.
Due to the outliers observe in the period 1991/1992, the recession of that period is quite visible due to the component observe

Homework 2 - Time Series Decomposition

Exercises 3.1

Solution 3.1

Exercise 3.2

United States GDP from global_economy.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

Victorian Electricity Demand from vic_elec.

Gas production from aus_production.

Exercise 3.3

Solution 3.3

Exercise 3.4

Solution 3.4

Exercise 3.5

Solution 3.5

Tobacco from aus_production

Economy class passengers between Melbourne and Sydney from ansett

Pedestrian counts at Southern Cross Station from pedestrian

Exercise 3.7

Solution 3.7

Exercise 3.8

Solution 3.8

Exercise 3.9

Solution 3.9

United States GDP from `global_economy`.

Slaughter of Victorian “Bulls, bullocks and steers” in `aus_livestock`.

Victorian Electricity Demand from `vic_elec`.

Gas production from `aus_production`.

Tobacco from `aus_production`

Economy class passengers between Melbourne and Sydney from `ansett`

Pedestrian counts at Southern Cross Station from `pedestrian`