Forecasting: Principles and Practice - Chapter 3 Exercises: 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9
global_economy |>
mutate(GDP_per_capita = GDP / Population) |>
arrange(desc(GDP_per_capita)) |>
select(Country, Year, GDP_per_capita) |>
head(10)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
## # A tsibble: 10 x 3 [1Y]
## # Key: Country [2]
## Country Year GDP_per_capita
## <fct> <dbl> <dbl>
## 1 Monaco 2014 185153.
## 2 Monaco 2008 180640.
## 3 Liechtenstein 2014 179308.
## 4 Liechtenstein 2013 173528.
## 5 Monaco 2013 172589.
## 6 Monaco 2016 168011.
## 7 Liechtenstein 2015 167591.
## 8 Monaco 2007 167125.
## 9 Liechtenstein 2016 164993.
## 10 Monaco 2015 163369.
global_economy |>
filter(Country == "Monaco") |>
mutate(GDP_per_capita = GDP / Population) |>
autoplot(GDP_per_capita) +
labs(
title = "Monaco GDP per Capita Over Time",
y = "GDP per capita"
)
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).
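Monaco has the highest GDP per capita in the dataset, peaking in 2014 at approximately US$185,153. Monaco and Liechtenstein together account for all ten of the highest country-year values in the table above. Monaco’s GDP per capita remains extremely high throughout the series and generally increases over time, but not steadily: the 2015 and 2016 values (about 163,369 and 168,011) are below the 2014 peak. The plot warning shows that 11 years are missing for Monaco, so the line does not cover the full sample period.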
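To see how the leading country changes over time, one option is to pick the country with the highest GDP per capita in each year. This is a minimal sketch, assuming the fpp3 package is installed. Converting to a tibble first is my own choice, so that regrouping by year does not trigger the temporal-ordering warning, and `top_by_year` is a hypothetical name.

```r
library(fpp3)

# Country with the highest GDP per capita in each year.
# as_tibble() drops the tsibble structure, so grouping by Year is allowed.
top_by_year <- global_economy |>
  as_tibble() |>
  mutate(GDP_per_capita = GDP / Population) |>
  filter(!is.na(GDP_per_capita)) |>
  group_by(Year) |>
  slice_max(GDP_per_capita, n = 1) |>
  ungroup() |>
  select(Year, Country, GDP_per_capita)

# Number of years each country spends in first place
top_by_year |>
  count(Country, sort = TRUE)
```

The count at the end summarizes which countries have held first place and for how many years, which answers the "how has this changed over time" part more directly than a single ranking.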
2. For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy.
global_economy |>
filter(Country == "United States") |>
autoplot(GDP) +
labs(title = "United States GDP")
global_economy |>
filter(Country == "United States") |>
autoplot(log(GDP)) +
labs(
title = "Log Transformed United States GDP",
y = "log(GDP)"
)
The original GDP series grows roughly exponentially, so the curve becomes steeper over time. Applying a log transformation makes the growth pattern close to linear, which means the slope can be read as an approximately constant growth rate.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
aus_livestock |>
filter(
State == "Victoria",
Animal == "Bulls, bullocks and steers"
) |>
autoplot(Count) +
labs(
title = "Victorian Bulls, Bullocks and Steers Slaughter",
y = "Count"
)
The series displays seasonal fluctuations and some long-term variation. A transformation is not strongly necessary because the variance remains relatively stable across time.
Victorian Electricity Demand from vic_elec.
vic_elec |>
autoplot(Demand) +
labs(
title = "Victorian Electricity Demand",
y = "Demand"
)
vic_elec |>
autoplot(log(Demand)) +
labs(
title = "Log Transformed Victorian Electricity Demand",
y = "log(Demand)"
)
The electricity demand series contains strong daily and seasonal patterns. The log transformation slightly stabilizes the variation, although the original series is already fairly stable.
Gas production from aus_production.
aus_production |>
autoplot(Gas) +
labs(
title = "Australian Gas Production",
y = "Gas"
)
aus_production |>
autoplot(log(Gas)) +
labs(
title = "Log Transformed Gas Production",
y = "log(Gas)"
)
The gas production series shows increasing seasonal variation as production rises over time. The log transformation stabilizes the variance and reduces the size of seasonal swings.
A Box-Cox transformation is unhelpful for the canadian_gas data because the seasonal variation does not change in step with the level of the series. Box-Cox transformations stabilize variance only when the variance rises or falls with the level, so no single lambda can even out seasonal behavior that changes for other reasons.
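This claim can be checked visually by plotting the series before and after transformation. This is a minimal sketch, assuming fpp3 is installed and that the production column in canadian_gas is named `Volume`; `lambda_gas` is a hypothetical name.

```r
library(fpp3)

# Original series: look at how the size of the seasonal swings changes over time
canadian_gas |>
  autoplot(Volume) +
  labs(title = "Canadian Gas Production", y = "Volume")

# Guerrero-selected lambda, then the transformed series for comparison
lambda_gas <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

canadian_gas |>
  autoplot(box_cox(Volume, lambda_gas)) +
  labs(title = "Box-Cox Transformed Canadian Gas Production")
```

If the seasonal swings in the second plot are still uneven across the sample, the transformation has not stabilized the variance.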
The Box-Cox transformation selected for the retail series was based on the Guerrero method. It stabilizes the variance across seasonal periods. This is useful for retail data because variability often increases as turnover increases over time, and the transformation makes the series easier to analyze and forecast.
# Tobacco
aus_production |>
features(Tobacco, guerrero)
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.926
# Economy passengers
ansett |>
filter(Airports == "MEL-SYD",
Class == "Economy") |>
features(Passengers, guerrero)
## # A tibble: 1 × 3
## Airports Class lambda_guerrero
## <chr> <chr> <dbl>
## 1 MEL-SYD Economy 2.00
# Southern Cross pedestrians
pedestrian |>
filter(Sensor == "Southern Cross Station") |>
features(Count, guerrero)
## # A tibble: 1 × 2
## Sensor lambda_guerrero
## <chr> <dbl>
## 1 Southern Cross Station -0.250
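I would use the Guerrero method to select the Box-Cox transformation for each series because it finds a lambda value that stabilizes variance across seasonal periods. If the lambda value is close to 0, I would use a log transformation. If it is close to 1, little or no transformation is needed. For Tobacco, lambda is about 0.93, so the series can be left essentially untransformed. For MEL-SYD economy passengers, lambda is 2.00, which is far from both 0 and 1 and is worth checking against a plot before relying on it. For Southern Cross Station pedestrian counts, lambda is -0.25, which is close enough to 0 that a log transformation is a reasonable, more interpretable choice.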
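The rule of thumb for lambda near 0 and near 1 follows from the Box-Cox formula itself. This is a minimal base R sketch: `box_cox_manual` is a hypothetical helper written for illustration (the fpp3 packages provide their own `box_cox()`), and the input values are made up.

```r
# Box-Cox transformation: log(y) when lambda is 0,
# otherwise (sign(y) * |y|^lambda - 1) / lambda.
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) {
    log(y)
  } else {
    (sign(y) * abs(y)^lambda - 1) / lambda
  }
}

y <- c(10, 100, 1000)  # made-up values for illustration

# lambda = 1 only shifts the data down by one, so the shape is unchanged
box_cox_manual(y, 1)

# lambda = 0 is the natural log
box_cox_manual(y, 0)

# a lambda very close to 0 gives almost the same values as the log
box_cox_manual(y, 1e-6) - log(y)
```

With lambda equal to 1 the series is only shifted, which is why a Guerrero value near 1 suggests no transformation is needed.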
gas <- tail(aus_production, 5*4) |>
select(Gas)
gas |>
autoplot(Gas) +
labs(
title = "Australian Gas Production: Last Five Years",
y = "Gas production"
)
The plot shows clear seasonal fluctuations, with gas production rising and falling in a repeating quarterly pattern. There also appears to be a slight trend-cycle.
dcmp <- gas |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components()
dcmp |>
autoplot()
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
Yes, the decomposition supports the graphical interpretation. The seasonal component shows a repeating quarterly pattern, while the trend-cycle component captures the smoother long-term movement in the series.
# Part c: Seasonally adjusted data
dcmp |>
autoplot(season_adjust) +
labs(
title = "Seasonally Adjusted Gas Production",
y = "Seasonally adjusted gas"
)
# Part d: Add an outlier in the middle
gas_outlier_middle <- gas |>
mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))
dcmp_outlier_middle <- gas_outlier_middle |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components()
dcmp_outlier_middle |>
autoplot(season_adjust) +
labs(
title = "Seasonally Adjusted Gas with Middle Outlier",
y = "Seasonally adjusted gas"
)
The outlier creates a large spike in the seasonally adjusted data. It can also distort the estimated trend-cycle and seasonal indices because classical decomposition is sensitive to extreme values.
# Part e: Add an outlier near the end
gas_outlier_end <- gas |>
mutate(Gas = if_else(row_number() == n(), Gas + 300, Gas))
dcmp_outlier_end <- gas_outlier_end |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components()
dcmp_outlier_end |>
autoplot(season_adjust) +
labs(
title = "Seasonally Adjusted Gas with End Outlier",
y = "Seasonally adjusted gas"
)
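Yes, the position of the outlier matters. With quarterly data, classical decomposition estimates the trend-cycle with a centered 2x4 moving average, so an outlier in the middle enters five consecutive trend estimates and also distorts the seasonal index for its quarter. An outlier at the last observation enters only one trend estimate, and because no trend is estimated for the final two quarters, it has little effect on the seasonal indices. It therefore shows up mainly as a spike at the end of the seasonally adjusted series.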
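The difference in reach between the two outlier positions can be illustrated with the moving average alone. This is a minimal base R sketch on a made-up flat series of 20 quarters, not the gas data; `ma_2x4` is a hypothetical helper that applies the centered 2x4 moving average weights.

```r
# Centered 2x4 moving average with weights 1/8, 2/8, 2/8, 2/8, 1/8.
# The first two and last two values are NA because the window is incomplete.
ma_2x4 <- function(y) {
  w <- c(1, 2, 2, 2, 1) / 8
  as.numeric(stats::filter(y, w, sides = 2))
}

# Made-up quarterly series: 5 years, flat at 100
y <- rep(100, 20)

y_mid <- y
y_mid[10] <- y_mid[10] + 300  # outlier in the middle

y_end <- y
y_end[20] <- y_end[20] + 300  # outlier at the last observation

trend_base <- ma_2x4(y)
trend_mid  <- ma_2x4(y_mid)
trend_end  <- ma_2x4(y_end)

# Positions where the trend estimate changes because of the outlier
which(trend_mid != trend_base)  # 8 9 10 11 12
which(trend_end != trend_base)  # 18

# Trend values that cannot be estimated at the ends of the series
sum(is.na(trend_base))  # 4
```

The middle outlier changes five trend values, while the end outlier changes only one. The four missing trend values also explain the warning from the decomposition plot: the trend and remainder panels each lose four points.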
library(seasonal)
library(x13binary)
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
# X-11 decomposition
myseries |>
model(
x11 = X_13ARIMA_SEATS(Turnover ~ x11())
) |>
components() |>
autoplot()
For Exercise 3.9, the decomposition shows a strong upward trend in the Australian civilian labour force from 1978 to 1995. The trend component increases steadily over time, indicating long-term growth in the number of people participating in the labour force. The seasonal component is small relative to the overall level of the series, with fluctuations generally between about -100 and 100, while the total series ranges from roughly 6500 to 9000. This indicates that seasonality exists but has a much smaller effect than the long-term trend.
Yes, the recession of 1991/1992 is visible in the decomposition, particularly in the remainder component and the trend component. Around 1991–1992 there are several unusually large negative residuals, and the upward trend temporarily slows during this period.