D624 Homework 2: Time series decomposition

Exercise 3.7.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

The USA had the highest total GDP every year from 1960 to 2017, but the story changes when you look at GDP per capita. By that measure the USA led from 1960 to 1969, except for the two years when Kuwait took the top spot (1965-1966). Monaco had the longest run, from 1970 to 2016, interrupted only by the United Arab Emirates (1976-1977) and Liechtenstein (2013 and 2015). Luxembourg took the top spot in 2017; however, it’s not visible on the graph below since we don’t have data for 2018.

Also not captured in the graph below is the complicated code where I created a table with the top country by year after eliminating all of the aggregated categories like EU region, World, etc. You can view this code in the appendix at the end.

Notably, Kuwait reported no data during the Gulf War.

Exercise 3.7.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

Part a

For “United States GDP from global_economy” there is no seasonality, so we don’t need a Box-Cox transformation to stabilise seasonal variation. The original curve grows roughly exponentially, so we can take the log (which is the Box-Cox transformation with lambda = 0) to make it closer to linear, as reflected in the graph below:

Part b

For “Slaughter of Victorian ‘Bulls, bullocks and steers’ in aus_livestock” it’s not clear that we would use a transformation. The seasonal variation in the last eight years looks lower than in previous years. However, before using a Box-Cox transformation I would expect to see a smooth transition from low to high, or high to low, seasonal variation. An alternative would be to re-represent the tsibble as yearly instead of monthly data.
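The yearly re-representation could look something like the sketch below. It assumes the fpp3 package is installed; the name `vic_bulls_yearly` is made up for illustration, and summing the monthly counts is only one possible way to aggregate.

```r
library(fpp3)

# Collapse the monthly Victorian series to yearly totals.
# `vic_bulls_yearly` is an illustrative name, not part of the exercise.
vic_bulls_yearly <- aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") |>
  index_by(Year = year(Month)) |>   # regroup the index by calendar year
  summarise(Count = sum(Count))     # one total per year

# Any year that is only partly observed will show an artificially low total.
autoplot(vic_bulls_yearly, Count)
```

A yearly view removes the seasonal pattern entirely, so it answers a different question than a variance-stabilising transformation would.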

Part c

For “Victorian Electricity Demand from vic_elec” the graph is a cloud of detail. I would change the index period from every thirty minutes to daily.
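One way to move from half-hourly to daily data is sketched below, assuming fpp3 is installed. The name `vic_elec_daily` is my own, and summing demand over each day is an assumption about what "daily" should mean here (a daily mean or maximum would be equally defensible).

```r
library(fpp3)

# Aggregate half-hourly demand to one observation per day.
# `vic_elec_daily` is an illustrative name.
vic_elec_daily <- vic_elec |>
  index_by(Date = date(Time)) |>     # new daily index derived from the timestamp
  summarise(Demand = sum(Demand))    # total demand for each day

autoplot(vic_elec_daily, Demand)
```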

Part d

For “Gas production from aus_production” we can see the seasonal variation increasing across the time series, so a Box-Cox transformation is appropriate. We could use the guerrero feature to identify the lambda that gives the most consistent seasonal variation; however, after experimenting with slightly positive values, a lambda of 0.1 seems to work well.
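For comparison with the hand-picked value, the guerrero route follows the same pattern used for the retail series in Exercise 3.7.4. This is a sketch that assumes fpp3 is installed; `lambda_gas` is an illustrative variable name.

```r
library(fpp3)

# Let the guerrero feature choose lambda for the Gas series.
lambda_gas <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

# Plot the series transformed with the selected lambda.
aus_production |>
  autoplot(box_cox(Gas, lambda_gas))
```

Plotting this next to `box_cox(Gas, 0.1)` is a quick way to judge whether the two choices differ in any way that matters.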

Exercise 3.7.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

A Box-Cox transformation is unhelpful for the canadian_gas data because the seasonal variation bulges up in the middle of the series: it grows and then shrinks again. (If it had legs it could be a giraffeasaur.) We would need the seasonal variance to consistently increase or decrease with the level of the series for a Box-Cox transformation to stabilise it.

Exercise 3.7.4

What Box-Cox transformation would you select for your retail data generated from aus_retail using a particular seed value?

We generated a time series from aus_retail using a seed value of 20240915. Then we used the guerrero feature to identify an optimal lambda value of 0.03543504. I will echo the code so you can view it without going to the appendix.

### Exercise 3.7.4

set.seed(20240915)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)
myseries |>
  autoplot(box_cox(Turnover, lambda))

Exercise 3.7.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

Part a

For “Tobacco from aus_production” we did a Box-Cox transformation using a guerrero feature-generated lambda of 0.9264636.

Part b

For “Economy class passengers between Melbourne and Sydney from ansett” we did a Box-Cox transformation using a guerrero feature-generated lambda of 1.999927.

Without deeper prior experience I’m worried that the Box-Cox transformation is using a lambda greater than 1. A lambda of 1 leaves the shape of the series unchanged and a lambda of 0 is equivalent to the natural log, so a lambda near 2 stretches the larger values instead of compressing them. Also, there was a period in 1989 when no passengers were carried. According to the textbook (Section 2.2) this was due to an industrial dispute. Without being asked to do a Box-Cox transformation I wouldn’t have done one, because that one event distorts the seasonal variance for just a segment of the data.
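To make the role of lambda concrete, here is a minimal base R sketch of the Box-Cox formula for strictly positive data. The helper `box_cox_sketch` is hypothetical and written only for illustration; it is not the `box_cox()` implementation used elsewhere in this assignment.

```r
# Hypothetical helper: Box-Cox transform for strictly positive y.
# lambda = 0 is treated as the natural log.
box_cox_sketch <- function(y, lambda) {
  if (abs(lambda) < 1e-8) {
    log(y)
  } else {
    (y^lambda - 1) / lambda
  }
}

y <- c(10, 100, 1000)          # made-up values spanning two orders of magnitude

bc_log   <- box_cox_sketch(y, 0)  # same as log(y): large values are compressed
bc_shift <- box_cox_sketch(y, 1)  # same as y - 1: shape is unchanged
bc_sq    <- box_cox_sketch(y, 2)  # (y^2 - 1) / 2: large values are stretched

print(rbind(bc_log, bc_shift, bc_sq))
```

The gaps between consecutive values shrink relative to the original for lambda = 0, stay the same for lambda = 1, and grow for lambda = 2, which is why a lambda near 2 is unusual for variance stabilisation.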

Part c

For “Pedestrian counts at Southern Cross Station from pedestrian” we did a Box-Cox transformation using a guerrero feature-generated lambda of -0.2501616.

However, a negative lambda from the guerrero feature makes me doubt that a Box-Cox transformation is appropriate here. (A negative lambda is a valid inverse power transformation, but it is stronger than a log.) It could be because the tsibble needs to be adjusted so that all hourly pedestrian counts on a given day are collapsed into one record.
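Collapsing the hourly counts into one record per day and re-running the guerrero feature could be sketched as follows. This assumes fpp3 is installed; `sc_daily` and `lambda_daily` are illustrative names, and I have not checked what lambda this produces.

```r
library(fpp3)

# One record per day for Southern Cross Station.
# `sc_daily` is an illustrative name.
sc_daily <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  index_by(Day = as_date(Date_Time)) |>   # daily index from the hourly timestamp
  summarise(Count = sum(Count))           # total pedestrians per day

# Re-estimate lambda on the daily totals.
lambda_daily <- sc_daily |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

sc_daily |>
  autoplot(box_cox(Count, lambda_daily))
```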

Exercise 3.7.7

Considering the last five years of the Gas data from aus_production

Part a

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

Yes, there is a seasonal fluctuation. aus_production is indexed by calendar quarter, so gas production is lowest in Q4 and Q1 (the Australian summer) and highest in Q2 and Q3 (the Australian winter), presumably because of heating demand.

Additionally it looks like there is an increasing trend.

Part b

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

You can clearly see the trend and seasonality, and tease out the unexplained randomness in the data, to possibly look for events that would explain higher-than-normal production in the Australian winter of 2006 and lower-than-normal production in the Australian winter of 2008 (also the time of the global financial crisis).

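Notice, however, that the head and tail of the trend and random components are absent! For quarterly data the trend-cycle is estimated with a centred 2x4 moving average, which cannot be computed for the first two and last two observations.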
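The missing ends can be reproduced with a few lines of base R. The sketch below applies a centred 2x4 moving average to a toy quarterly series; the numbers are made up for illustration and are not the Gas data.

```r
# Toy quarterly series (made-up values, three years of data).
y <- c(171, 224, 233, 192, 187, 234, 238, 196, 199, 246, 252, 210)

# Weights of a 2x4-MA: a 4-term average followed by a 2-term average.
w <- c(1/8, 1/4, 1/4, 1/4, 1/8)

# stats::filter is called explicitly because dplyr masks filter().
trend <- as.numeric(stats::filter(y, filter = w, sides = 2))

# The first two and last two entries are NA because the window
# would need observations before the start or after the end.
print(trend)
```

Since the random component is computed from the trend, it inherits the same gaps.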

Part c

Do the results support the graphical interpretation from part a?

The results do support the graphical interpretation from Part a.

Part d

Compute and plot the seasonally adjusted data.

You can verify this is the seasonally adjusted data because it looks like the trend component and random component from Part b are combined into one line.
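The appendix code for this part uses STL, which is an additive decomposition. As an alternative that stays with the multiplicative classical decomposition from Part b, the seasonally adjusted series can be computed by dividing the observed data by the seasonal index. This is a sketch assuming fpp3 is installed; `gas_sa` and `sa_manual` are illustrative names.

```r
library(fpp3)

# Last five years of quarterly gas production.
gas <- tail(aus_production, 5 * 4) |>
  select(Gas)

# Multiplicative classical decomposition, then divide out the seasonal index.
gas_sa <- gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  as_tsibble() |>
  mutate(sa_manual = Gas / seasonal)   # illustrative column name

# Original data in gray, seasonally adjusted data in orange.
gas_sa |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y = sa_manual), colour = "#D55E00")
```

Because the seasonal index is defined for every quarter, this version has no gaps at the ends of the series.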

Part e

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

It looks like the +300 went almost entirely to the random component and so is included in the seasonally adjusted data almost entirely.

Part f

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

It doesn’t seem to make much difference whether the outlier is near the end rather than the middle of the time series. The first time I placed the outlier 30% of the way through (row 6); below I place it 60% and 90% of the way through (rows 12 and 18). With the outlier in the middle the seasonally adjusted line doesn’t go up as high, but I believe that is because row 12 falls in Q2, a high-production quarter at the start of the Australian winter, whereas rows 6 and 18 fall in Q4, a low-production quarter at the start of summer.

However it may be that when the outlier is in the beginning or end of the series it tips the trend line, compared to being in the middle.
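Moving the outlier around is less error-prone if each experiment starts from an untouched copy of the data. The helper below is hypothetical (the name `add_outlier` and its arguments are my own) and is demonstrated on a plain data frame with made-up values; I am assuming, without having verified it, that the same assignment works on the `gas` tsibble.

```r
# Hypothetical helper: return a copy of `data` with `bump` added to one row.
# The original object is left unchanged, so no manual resetting is needed.
add_outlier <- function(data, row, bump = 300, column = "Gas") {
  out <- data
  out[[column]][row] <- out[[column]][row] + bump
  out
}

# Toy data frame with made-up values.
toy <- data.frame(Gas = c(200, 210, 220, 230))

toy_early <- add_outlier(toy, row = 1)
toy_late  <- add_outlier(toy, row = 4)

print(toy_early$Gas)
print(toy_late$Gas)
print(toy$Gas)   # unchanged
```

With this pattern, something like `add_outlier(gas, 6)`, `add_outlier(gas, 12)` and `add_outlier(gas, 18)` would each be decomposed separately.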

Exercise 3.7.8

Recall your retail time series data (from Exercise 3.7.4). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

We didn’t run a classical decomposition on the retail time series data last time, so we can’t compare the trend, seasonality and irregularity. However, we know from Section 3.5 that the advantages of the X-11 method over classical decomposition are that (1) trend-cycle estimates are available for the endpoints, (2) the seasonal component is allowed to vary slowly over time, (3) it can handle more complex irregularities, and (4) it tends to be robust to outliers and level shifts. (A level shift is a jump in the time series to a new baseline level.)

What I’d expect from the reading is that the trend line tells a more nuanced tale that we can mine for events. For example, why was December 2009 a peak in the retail data?

I also wonder if the allowance for slow adjustment in seasonality makes it less important to transform your data to adjust for variance in the seasonality.

My understanding from further reading to help with 3.7.9 is that we would use the X-11 method if there are calendar effects like trading days to handle, and STL otherwise.

Exercise 3.7.9

Two figures from the text show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Part a

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

We show an increase in the variance in seasonality over time (with seasonal maximums changing from ~75 to ~100) which confirms we’re not using classical decomposition.

We do show a flattening of the trend curve around the 1991/1992 recession; however, most of that decrease in the civilian labour force has leaked into the remainder. That suggests we could have used a smaller trend window to get a less smoothed trend that captures more of the short-term variation due to the recession.

The seasonal sub-series plot should help us visualize the variation in the seasonal component over time, however it looks like there are only small changes over time. Most of the changes are less than 25 and there’s not a consistent pattern in their changes or from month to month.

Part b

Is the recession of 1991/1992 visible in the estimated components?

Yes, it’s clearly visible in the remainder component.

References

These exercises come from ‘Forecasting: Principles and Practice’ by Rob J Hyndman and George Athanasopoulos, 3rd Ed
https://otexts.com/fpp3/

The introductory video at the beginning of Section 3.1, Transformations and adjustments, was incredibly helpful for understanding the thrust of the assignment.

No other work was referenced in the production of this assignment (for better or for worse).

Code Appendix

### Exercise 3.7.1

# Load Libraries
library(fpp3)

# Find out the countries with the highest per capita GDP per year
global_economy_df <- global_economy %>%
  as_tibble()
highest_gdppc_per_year <- global_economy_df %>%
  filter(!Code %in% c("WLD","HIC","OED","PST","NAC","ECS","EUU","EMU","EAS","IBT","LMY","IBD","MIC","UMC","LTE")) %>%
  group_by(Year) %>%
  slice_max(order_by = GDP / Population, n = 1) %>%
  ungroup() %>%
  mutate(GDPPC = GDP / Population)
highest_gdppc_tsibble <- highest_gdppc_per_year %>%
  as_tsibble(index = Year)
#print(highest_gdppc_tsibble)

# Graph not pretty enough with only the highest years
#autoplot(highest_gdppc_tsibble, GDPPC) +
#  aes(color = Code)

# Graphing the six countries who at one point had the top spot
global_economy %>%
  filter(Code %in% c("USA","KWT","MCO","ARE","LIE","LUX")) %>%
  autoplot(GDP / Population)



### Exercise 3.7.2

global_economy %>%
  filter(Code == "USA") %>%
  autoplot(log(GDP))

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers") %>%
  filter(State == "Victoria") %>%
  autoplot(Count)

vic_elec %>%
  autoplot(Demand)

aus_production %>%
  autoplot(box_cox(Gas, 0.1))



### Exercise 3.7.3

canadian_gas %>%
  autoplot(Volume)



### Exercise 3.7.4

set.seed(20240915)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)
myseries |>
  autoplot(box_cox(Turnover, lambda))



### Exercise 3.7.5

# Tobacco from aus_production
lambda <- aus_production |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)
aus_production |>
  autoplot(box_cox(Tobacco, lambda))


# Economy class passengers between Melbourne and Sydney from ansett
lambda <- ansett |>
  filter(Class == "Economy") |>
  filter(Airports == "MEL-SYD") |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)
ansett |>
  filter(Class == "Economy") |>
  filter(Airports == "MEL-SYD") |>
  autoplot(box_cox(Passengers, lambda))


# Pedestrian counts at Southern Cross Station from pedestrian
lambda <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)
pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  autoplot(box_cox(Count, lambda))



### Exercise 3.7.7

# Part a
gas <- tail(aus_production, 5*4) |>
  select(Gas)
autoplot(gas)


# Part b
gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  autoplot()

# Part d
dcmp <- gas |>
  model(stl = STL(Gas))
components(dcmp) |>
  as_tsibble() |>
  autoplot(Gas, color="gray") +
  geom_line(aes(y=season_adjust), color="#D55E00")


# Part e
gas[6,1] <- 192
gas[6,1] <- gas[6,1]+300

dcmp <- gas |>
  model(stl = STL(Gas))
components(dcmp) |>
  as_tsibble() |>
  autoplot(Gas, color="gray") +
  geom_line(aes(y=season_adjust), color="#D55E00")


# Part f
gas[6,1] <- 192
gas[12,1] <- 229
gas[12,1] <- gas[12,1]+300   # add the outlier to row 12 itself

dcmp <- gas |>
  model(stl = STL(Gas))
components(dcmp) |>
  as_tsibble() |>
  autoplot(Gas, color="gray") +
  geom_line(aes(y=season_adjust), color="#D55E00")

gas[6,1] <- 192
gas[12,1] <- 229
gas[18,1] <- 210
gas[18,1] <- gas[18,1]+300   # add the outlier to row 18 itself

dcmp <- gas |>
  model(stl = STL(Gas))
components(dcmp) |>
  as_tsibble() |>
  autoplot(Gas, color="gray") +
  geom_line(aes(y=season_adjust), color="#D55E00")


### Exercise 3.7.8

library(seasonal)

x11_myseries <- myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()
autoplot(x11_myseries)


PK O’Flaherty

2024-09-15
