library(fpp3)
In this document, we work through exercises 3.1 through 3.5 and 3.7 through 3.9 from Forecasting: Principles and Practice (3rd ed).
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
#help("global_economy")
#head(global_economy)
# Build an annotation layer labelling the country with the highest GDP per capita
labels <- global_economy |>
  slice_max(order_by = GDP / Population, n = 1) |>
  with(annotate("label", label = Country, x = Year, y = 1.05 * (GDP / Population - 1)))
global_economy |>
  autoplot(GDP / Population, show.legend = FALSE, na.rm = TRUE) +
  labs(title = "GDP per capita", y = "$US") +
  labels
The country with the highest GDP per capita is Monaco. This makes sense: a small, wealthy city-state has high spending per person, while countries with larger populations dilute the effect of rich outliers and end up with a lower GDP per capita.
The country at the top has remained fairly constant since data collection started. The overall trend in GDP per capita is a near-exponential increase, though cyclical dips from recessions are still visible; the 2008 recession stands out given the peak right before it.
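To see how the top spot evolves year by year (a quick sketch beyond what the exercise requires), we can list the country with the highest GDP per capita in each year:
# Sketch: country with the highest GDP per capita in each year.
# Converting to a tibble drops the tsibble structure so we can group by Year.
global_economy |>
  as_tibble() |>
  mutate(GDPpc = GDP / Population) |>
  filter(!is.na(GDPpc)) |>
  group_by(Year) |>
  slice_max(GDPpc, n = 1, with_ties = FALSE) |>
  ungroup() |>
  select(Year, Country, GDPpc)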
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
global_economy |>
  filter(Country == "United States") |>
  autoplot(GDP) +
  labs(title = "US GDP", y = "$US")

global_economy |>
  filter(Country == "United States") |>
  autoplot(GDP / CPI * 100) +
  labs(title = "US GDP Adjusted to 2010 Dollars", y = "$US (Adjusted to 2010)")
For US GDP, I deflated the series by the CPI (which is indexed so that 2010 = 100). Plotting raw GDP alone would not account for inflation, which by itself pushes the series upward. It is important to note that how we transform the data depends on what we are trying to present: to decouple population growth from GDP instead, we could plot GDP per capita, as sketched below.
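Here is that alternative as a sketch (my own addition, not part of the exercise):
global_economy |>
  filter(Country == "United States") |>
  autoplot(GDP / Population) +
  labs(title = "US GDP per Capita", y = "$US per person")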
#head(aus_livestock)
aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers") |>
  autoplot(Count) +
  labs(title = "Australian Bulls, Bullocks and Steers Slaughtered Over Time", y = "Count")

aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers") |>
  index_by(Year = year(Month)) |>
  group_by(State) |>
  summarise(Count = sum(Count)) |>
  autoplot(Count) +
  labs(title = "Australian Bulls, Bullocks and Steers Slaughtered per Year", y = "Count")
Graphing the data at its existing monthly index gives a lot of noise unless we greatly enlarge the graph. One way around this, assuming we only want the general yearly trend in slaughter counts, is to re-index by year and sum the monthly counts. This makes the trend much easier to read.
#head(vic_elec)
vic_elec |>
  autoplot(Demand) +
  labs(title = "Victorian Electricity Demand Over Time", y = "Demand (MWh)")

vic_elec |>
  index_by(Month = yearmonth(Time)) |>
  summarise(Demand = sum(Demand)) |>
  autoplot(Demand) +
  labs(title = "Monthly Victorian Electricity Demand Over Time", y = "Demand (MWh)")
For Victorian electricity demand we apply a very similar transformation, re-indexing from half-hourly observations to monthly sums. This lets us examine the trend at a higher level.
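As an intermediate option (a sketch, not required by the exercise), aggregating to daily totals cuts the half-hourly noise while keeping the within-year pattern visible:
# vic_elec already has a Date column, so name the new index Day to avoid a clash
vic_elec |>
  index_by(Day = as_date(Time)) |>
  summarise(Demand = sum(Demand)) |>
  autoplot(Demand) +
  labs(title = "Daily Victorian Electricity Demand", y = "Demand (MWh)")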
#head(aus_production)
aus_production |>
  autoplot(Gas) +
  labs(title = "Australian Gas Production Over Time", y = "Gas Produced (petajoules)")

# Estimate the Box-Cox parameter with the Guerrero method
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(title = "Transformed Australian Gas Production Over Time", y = "Box-Cox transformed Gas")
For gas production in Australia, we can control the heteroscedasticity with a Box-Cox transformation. Using the Guerrero method we get a lambda of about 0.11, which we then pass to box_cox() as the transformation parameter.
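For reference, for a positive series the Box-Cox transform with non-zero lambda is (y^lambda - 1)/lambda (and log(y) when lambda = 0), so we can reproduce it by hand. This is a sketch, with 0.11 standing in for the Guerrero estimate above:
# Manual Box-Cox (sketch): lambda_hat approximates the Guerrero estimate
lambda_hat <- 0.11
aus_production |>
  mutate(Gas_bc = (Gas^lambda_hat - 1) / lambda_hat) |>
  autoplot(Gas_bc) +
  labs(title = "Manually Box-Cox Transformed Gas Production", y = "Transformed Gas")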
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
#head(canadian_gas)
canadian_gas |>
  autoplot(Volume) +
  labs(title = "Canadian Gas Production Over Time", y = "Gas Produced (Billions of Cubic Metres)")

lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

canadian_gas |>
  autoplot(box_cox(Volume, lambda)) +
  labs(title = "Transformed Canadian Gas Production Over Time", y = "Box-Cox transformed Volume")
The reason a Box-Cox transformation does not work for the Canadian gas data is that the variation peaks in the middle of the series rather than growing with its level. We cannot shrink that middle portion with Box-Cox without also squashing the later years, since the values at the ends of the series are affected more strongly as lambda moves away from 1.
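To see this rise-and-fall in variation directly (a sketch using an STL decomposition, beyond what the exercise asks), plot the components; the seasonal component grows until around the middle of the series and then shrinks again, which no single Box-Cox lambda can flatten:
canadian_gas |>
  model(STL(Volume)) |>
  components() |>
  autoplot()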
What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
This actually refers to Exercise 7, which we have not done, so I load the data from it below:
set.seed(1234567)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
#head(myseries)
df1 <- myseries
f1 <- "Turnover"
df1 |>
  autoplot(Turnover) +
  labs(title = sprintf("%s Over Time", f1), y = f1)

lambda <- df1 |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

df1 |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(title = sprintf("Transformed %s Over Time", f1), y = f1)
print(sprintf("The auto tuned box-cox transform with a lambda of %s is what we would use to transform this data", lambda))
## [1] "The auto tuned box-cox transform with a lambda of 0.176110327606469 is what we would use to transform this data"
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance:
df1 <- aus_production
f1 <- "Tobacco"
#head(df1)
df1 |>
  autoplot(Tobacco, na.rm = TRUE) +
  labs(title = sprintf("%s Over Time", f1), y = f1)

lambda <- df1 |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

df1 |>
  autoplot(box_cox(Tobacco, lambda), na.rm = TRUE) +
  labs(title = sprintf("Transformed %s Over Time", f1), y = f1)
print(sprintf("The auto tuned box-cox transform with a lambda of %s is what we would use to transform this data", lambda))
## [1] "The auto tuned box-cox transform with a lambda of 0.926463585274373 is what we would use to transform this data"
However, note that the transformation has little effect here: the lambda of about 0.93 is close to 1, and the changes in variance do not follow a consistent enough pattern for Box-Cox to remove them cleanly.
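As a quick sanity check of that "close to 1" point (my own sketch): with lambda exactly 1, the Box-Cox transform reduces to shifting the data down by one, so a lambda near 1 barely changes the series' shape.
# With lambda = 1, box_cox(y, 1) equals y - 1 for a positive series
all.equal(
  box_cox(aus_production$Tobacco, 1),
  aus_production$Tobacco - 1
)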
df1 <- ansett |>
  filter(Airports == "MEL-SYD", Class == "Economy")
f1 <- "Passengers"

df1 |>
  autoplot(Passengers) +
  labs(title = sprintf("Economy %s Over Time", f1), y = f1)

lambda <- df1 |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

df1 |>
  autoplot(box_cox(Passengers, lambda)) +
  labs(title = sprintf("Transformed Economy %s Over Time", f1), y = f1)
print(sprintf("The auto-tuned Box-Cox transform with a lambda of %s is what we would use to transform this data", lambda))
## [1] "The auto-tuned Box-Cox transform with a lambda of 1.9999267732242 is what we would use to transform this data"
Note that in this case the automatic tuning is trying to cancel out black swan events (such as the 1989 pilots' strike, when passenger numbers fell to zero), which is not something we should do. In my opinion it is better not to transform this series, since the increase in variance at certain points does not come from a trend of changing seasonal magnitude.
df1 <- pedestrian |>
  filter(Sensor == "Southern Cross Station")
f1 <- "Count"
#head(df1)
df1 |>
  autoplot(Count) +
  labs(title = sprintf("%s Over Time", f1), y = f1)

lambda <- df1 |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

df1 |>
  autoplot(box_cox(Count, lambda)) +
  labs(title = sprintf("Transformed %s Over Time", f1), y = f1)
print(sprintf("The auto-tuned Box-Cox transform with a lambda of %s is what we would use to transform this data", lambda))
## [1] "The auto-tuned Box-Cox transform with a lambda of -0.2501615623911 is what we would use to transform this data"
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
gas |>
  autoplot(Gas)
There is very clear seasonality: gas production is lowest in Q4 and Q1 and peaks in Q2 and Q3. There also appears to be a slight upward trend.
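A seasonal plot (my own addition) makes that within-year pattern easier to read than the time plot:
gas |>
  gg_season(Gas, labels = "both") +
  labs(title = "Seasonal plot: Australian gas production, last five years")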
gas_dc <- gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  )

gas_dc |>
  components() |>
  autoplot(na.rm = TRUE) +
  labs(title = "Classical multiplicative decomposition of Australian Gas Production")
Yes, the decomposition matches the interpretation above: the trend component rises gradually and the seasonal component shows the Q2/Q3 peak and Q4/Q1 trough we described.
gas_dc |>
  components() |>
  autoplot(season_adjust)
# Add a +300 outlier near the end of the series (observation 19 of 20)
gas2 <- gas
gas2$Gas[19] <- gas$Gas[19] + 300
gas2 |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  autoplot(season_adjust)

# Repeat with the outlier in the middle of the series (observation 10)
gas2 <- gas
gas2$Gas[10] <- gas$Gas[10] + 300
gas2 |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  autoplot(season_adjust)
The outlier shows up almost in full in the seasonally adjusted series: it passes straight through rather than being smoothed away.
No, it makes little difference whether the outlier sits near the end or in the middle: either way it is not absorbed by the seasonal component and appears directly in the seasonally adjusted series.
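As a numeric spot check (my own sketch), compare the modified observation with its seasonally adjusted value; because the decomposition is multiplicative, the +300 spike carries through scaled by the seasonal factor rather than being absorbed:
gas2 |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  as_tibble() |>
  filter(Quarter == gas2$Quarter[10]) |>
  select(Quarter, Gas, seasonal, season_adjust)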
Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components() |>
  autoplot() +
  labs(title = "Decomposition of retail time series using X-11")
The most unusual feature uncovered by the X-11 decomposition is that the seasonality has changed over time: the magnitude of the seasonal component actually decreases in later years, even though the raw series, with its growing level, seems to suggest the opposite.
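To examine this more directly (a sketch beyond the exercise), plot just the seasonal component from the X-11 decomposition:
myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components() |>
  autoplot(seasonal) +
  labs(title = "X-11 seasonal component of the retail series")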
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Figure 3.20: Seasonal component from the decomposition shown in the previous figure.
From the decomposition we can see a strong increasing trend in the number of people in the labour force, only slightly slowed by events such as recessions. However, some of the decline that is apparent in the full series seems to have been pushed into the remainder: its values are mostly negative around those events. Since the remainder should capture random noise rather than large explainable movements, the trend window should arguably be narrowed so the trend can adapt to these events. Finally, the seasonal component shows slightly increasing heteroscedasticity, and the subseries plot makes it clear that December and March are peak months for the labour force.
The 1991/92 recession is visible in the remainder component, where there is a large drop around that time. The trend also shows some effect of the recession: its slope decreases during that period, though it never quite turns negative. The seasonal component in both figures appears to be properly isolated from these effects.