The following RMD contains answers to exercises in chapter 3 of the Forecasting: Principles and Practice textbook for CUNY SPS DATA 624 Spring 2025 https://otexts.com/fpp3/decomposition-exercises.html. This chapter focuses on Time Series Decomposition, its applications, uses, and methods. The 3.7 Exercises answered here include 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9.
library(fpp3)
library(tidyverse)
library(seasonal)
Prompt: Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
# Calculate GDP per capita and remove missing values
global_gdp <- global_economy |>
select(Country, Year, GDP, Population) |>
mutate(GDP_per_capita = GDP / Population) |>
drop_na(GDP_per_capita)
# Plot GDP per capita over time for each country
ggplot(global_gdp, aes(x = Year, y = GDP_per_capita, color = Country)) +
geom_line() +
labs(title = "GDP per Capita Over Time",
x = "Year",
y = "GDP per Capita") +
scale_y_continuous(labels = scales::label_dollar()) +
theme_minimal() +
theme(legend.position = "none")
# Find the country with the highest GDP per capita each year
highest_gdp_per_capita <- global_gdp |>
index_by(Year) |>
slice_max(order_by = GDP_per_capita, n = 1) |>
ungroup()
# Display the top GDP per capita countries over time
print(highest_gdp_per_capita)
## # A tsibble: 58 x 5 [1Y]
## # Key: Country [263]
## Country Year GDP Population GDP_per_capita
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 United States 1960 5.43e11 180671000 3007.
## 2 United States 1961 5.63e11 183691000 3067.
## 3 United States 1962 6.05e11 186538000 3244.
## 4 United States 1963 6.39e11 189242000 3375.
## 5 United States 1964 6.86e11 191889000 3574.
## 6 Kuwait 1965 2.10e 9 473554 4429.
## 7 Kuwait 1966 2.39e 9 524856 4556.
## 8 United States 1967 8.62e11 198712000 4336.
## 9 United States 1968 9.42e11 200706000 4696.
## 10 United States 1969 1.02e12 202677000 5032.
## # ℹ 48 more rows
# Visualize the top GDP per capita countries over time
ggplot(highest_gdp_per_capita, aes(x = Year, y = GDP_per_capita, fill = Country)) +
geom_col() +
labs(title = "Highest GDP per Capita by Year",
x = "Year",
y = "GDP per Capita",
fill = "Country") +
scale_y_continuous(labels = scales::label_dollar()) +
scale_fill_brewer(palette = "Set2") +
theme_minimal()
The country that has the highest GDP per capita is most recently Luxemborg. Across time in this data, the highest GDP per capita has been the United States, Kuwait, Monaco, the UAE, Liechtenstein, and Luxemborg. Monaco held the title most often, with a total of 43 years. The highest GDP per capita seems to indicate global cycles with dips around 1995 and 2000. Other than these dips, the data consists of an upward trend in global maximum GDP per capita.
It is worth noting that there are some rows in the data that contain non-country values such as “Middle income.” They were not discluded from the data because they did not affect the final Highest GDP per Capita by Year graph and calculation.
Prompt: For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock
Victorian Electricity Demand from vic_elec
Gas production from aus_production
# Load United States GDP
us_gdp <- global_economy |>
filter(Country == "United States")|>
select(Country, Year, GDP)
# Plot Year and GDP as a line graph
ggplot(us_gdp, aes(x = Year, y = GDP)) +
geom_line() +
scale_y_continuous("Total GDP",
breaks = scales::breaks_extended(8),
labels = scales::label_currency()
) +
theme_minimal()
# Load Slaughter of Victorian “Bulls, bullocks and steers”
slaughter_bbs <- aus_livestock |>
filter(Animal == "Bulls, bullocks and steers")|>
select(Month, Animal, Count)
# Plot Month and Count as a line graph
ggplot(slaughter_bbs, aes(x = Month, y = Count)) +
geom_line() +
scale_y_continuous("Total Slaughtered Bulls, Bullocks and Steers", labels = scales::label_comma()
) +
theme_minimal()
# Plot Month and Count with geom smooth to visualize changes over time
ggplot(slaughter_bbs, aes(x = Month, y = Count)) +
geom_smooth() +
scale_y_continuous("Total Slaughtered Bulls, Bullocks and Steers", labels = scales::label_comma()
) +
theme_minimal()
This data is quite concentrated as it spans several decades and includes monthly-level cuts. Using geom_smooth(), it is possible to see the trends that the data experiences at an aggregated level.
# Load Victorian Electricity Demand
vic_elec_demand <- vic_elec |>
select(Time, Demand)
# Plot Month and Count as a line graph
ggplot(vic_elec_demand, aes(x = Time, y = Demand)) +
geom_line() +
scale_y_continuous("Victorian Electricity Demand (MWh)", labels = scales::label_comma()
) +
theme_minimal()
# Plot Month and Count as a line graph
ggplot(vic_elec_demand, aes(x = Time, y = Demand)) +
geom_smooth() +
scale_y_continuous("Victorian Electricity Demand (MWh)", labels = scales::label_comma()
) +
theme_minimal()
Once again, this data is too concentrated to meaningfully parse trends. It uses a scale of 30 minutes and spans several years. Using geom_smooth(), it is possible to see the trends that the data experiences at an aggregated level.
# Compute Box-Cox lambda using the Guerrero method
lambda <- aus_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
# Load Gas Production, extract quarter for graphing, and apply Box-Cox transformation
gas_prod <- aus_production |>
select(Quarter, Gas) |>
mutate(
Quarter_Num = as.integer(str_sub(Quarter, 7, 7)),
Transformed_Gas = box_cox(Gas, lambda) # Use stored lambda
)
# Plot Gas Production as a transformed line graph by Quarter and Gas
ggplot(gas_prod, aes(x = Quarter, y = Transformed_Gas)) +
geom_line() +
geom_point(aes(color = factor(Quarter_Num))) + # No need to specify y again
scale_color_brewer(palette = "Set2") +
labs(title = paste0("Gas Production & Box-Cox Transformed Gas (lambda = ", round(lambda, 2), ")"),
x = "Year and Quarter",
y = "Transformed Gas Production",
color = "Quarter") +
theme_minimal()
This data includes quarterly observations. To see that seasonality, I added colored points to the line graph based on the quarter. Now it is easy to see that the increased variance within each year favors higher values for Q3 and Q2 and lower values for Q1 and Q4. As far as transformation goes, the box_cox transformation is appropriate for this data. This transformation was chosen as it “makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler” (Hyndman).
Prompt: Why is a Box-Cox transformation unhelpful for the canadian_gas data?
# Plot Month and Volume as a line graph to visualize data
ggplot(canadian_gas, aes(x = Month, y = Volume)) +
geom_smooth() +
ylab("Gas Volume (billions of cubic metres)") +
xlab("Year and Month") +
theme_minimal()
The data above is quite smooth. This means the Box-Cox transformation would not regulate any variance, as the variance is already largely steady.
Prompt: What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(64)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
head(myseries)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Other specialised food retailing A3349799R 1982 Apr 34.9
## 2 Victoria Other specialised food retailing A3349799R 1982 May 34.6
## 3 Victoria Other specialised food retailing A3349799R 1982 Jun 34.6
## 4 Victoria Other specialised food retailing A3349799R 1982 Jul 35.2
## 5 Victoria Other specialised food retailing A3349799R 1982 Aug 33.8
## 6 Victoria Other specialised food retailing A3349799R 1982 Sep 35.4
# Compute Guerrero lambda for the selected series (Turnover)
myseries_lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
# Apply the Box-Cox transformation using the computed lambda
myseries <- myseries |>
mutate(Turnover_Transformed = box_cox(Turnover, myseries_lambda))
# Plot the transformed turnover over time
ggplot(myseries, aes(x = Month, y = Turnover_Transformed)) +
geom_line() +
labs(title = paste0("Box-Cox Transformed Retail Turnover (lambda = ", round(myseries_lambda, 2), ")"),
x = "Year and Month",
y = "Transformed Turnover") +
theme_minimal()
Based on the guerrera calculation, -0.35 would be the ideal lambda to stabilize the variance of the retail data. Looking at the graph, I agree that this lambda does cause a relatively standard variance across time.
Prompt: For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.
Tobacco from aus_production
Economy class passengers between Melbourne and Sydney from ansett
Pedestrian counts at Southern Cross Station from pedestrian
Each of these transformations will follow the same logic flow:
# Compute Guerrero lambda for Tobacco series
lambda_tobacco <- aus_production |>
select(Quarter, Tobacco) |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
# Apply Box-Cox transformation to Tobacco column
tobacco_transformed <- aus_production |>
mutate(Tobacco_Transformed = box_cox(Tobacco, lambda_tobacco))
# Plot the transformed series
ggplot(tobacco_transformed, aes(x = Quarter, y = Tobacco_Transformed)) +
geom_line() +
labs(title = paste("Box-Cox Transformed Tobacco Series (lambda =", round(lambda_tobacco, 2), ")"),
x = "Quarter", y = "Transformed Tobacco") +
theme_minimal()
# Filter data for Economy class passengers between Melbourne and Sydney
econ_mel_syd <- ansett |>
filter(Airports %in% c("MEL-SYD", "SYD-MEL")) |>
filter(Class == "Economy")
# Compute Guerrero lambda for Economy Medbourne Sydney data
lambda_econ_ms <- econ_mel_syd |>
select(Week, Passengers) |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
# Apply Box-Cox transformation to Passengers column
econ_ms_transformed <- econ_mel_syd |>
mutate(Econ_MS_Transformed = box_cox(Passengers, lambda_econ_ms))
# Plot the transformed series
ggplot(econ_ms_transformed, aes(x = Week, y = Econ_MS_Transformed)) +
geom_line() +
labs(title = paste("Box-Cox Transformed Passenger Series (lambda =", round(lambda_econ_ms, 2), ")"),
x = "Week", y = "Transformed Passengers") +
theme_minimal()
pedestrian
## # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne>
## # Key: Sensor [4]
## Sensor Date_Time Date Time Count
## <chr> <dttm> <date> <int> <int>
## 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630
## 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826
## 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567
## 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264
## 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139
## 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77
## 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01 6 44
## 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01 7 56
## 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01 8 113
## 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01 9 166
## # ℹ 66,027 more rows
# Filter data for pedestrian counts at Southern Cross Station
scs_peds <- pedestrian |>
filter(Sensor == "Southern Cross Station") |>
select(Date, Count)
# Compute Guerrero lambda for pedestrian counts at Southern Cross Station data
lambda_scs_peds <- scs_peds |>
features(Count, features = guerrero) |>
pull(lambda_guerrero)
# Apply Box-Cox transformation to Passengers column
scs_peds_transformed <- scs_peds |>
mutate(SCS_Peds_Transformed = box_cox(Count, lambda_scs_peds))
# Plot the transformed series
ggplot(scs_peds_transformed, aes(x = Date, y = SCS_Peds_Transformed)) +
geom_line() +
labs(title = paste("Box-Cox Transformed Pedestrian Count Series (lambda =", round(lambda_scs_peds, 2), ")"),
x = "Date", y = "Transformed Passengers") +
theme_minimal()
Wow, what a graph. In this case, the data needs more work to become
insightful. To fix the readability of this graph, I would take a
smoothing algorithm like geom_smooth(). This would allow for trends to
be read. Also, the seasonality of the data can give insights, so
gg_season() would come in handy here.
Prompt: Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
head(gas)
## # A tsibble: 6 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 221 2005 Q3
## 2 180 2005 Q4
## 3 171 2006 Q1
## 4 224 2006 Q2
## 5 233 2006 Q3
## 6 192 2006 Q4
a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
c. Do the results support the graphical interpretation from part a?
d. Compute and plot the seasonally adjusted data.
e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?
# Extract quarter number for graphing
gas <- gas |> mutate(Quarter_Num = as.integer(str_sub(Quarter, 7, 7)))
# Plot with colored points by quarter
ggplot(gas, aes(x = Quarter, y = Gas)) +
geom_line() +
geom_point(aes(color = factor(Quarter_Num)), size = 3) +
scale_color_brewer(palette = "Set2") +
labs(title = "Gas Production Over Last 5 Years",
x = "Year and Quarter",
y = "Gas Production",
color = "Quarter") +
theme_minimal()
The plot fluctuates by quarter, where Q1 < Q2 < Q3 > Q4. It
seems April to September are the months containing the highest gas
production based on their quarter averages. The overall trend seems to
be increasing over time, with many quarters greater than the previous
observation of that quarter.
# Perform classical decomposition with multiplicative type
gas_decomposed <- gas |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components()
# Plot decomposition results with autoplot
autoplot(gas_decomposed) +
labs(title = "Classical Decomposition of Gas Production (Multiplicative)")
Using this time series decomposition, it is easy to see the trend is
increasing over time, the seasonality is consistent over quarters and
completes a loop after 4 quarters (yearly). The remainder graph shows
the variation in the remainder is much lower than the other
components.
My analysis in question a still stands - gentle upward trend with mostly consistent quarterly data as seasons. After looking at the decomposition graphs in b, it is easier to see the actual significance of the upward trend, the seasonality by quarter, and how much variability affects these.
# Calculate multiplicative seasonal adjustment
gas_decomposed <- gas_decomposed |>
mutate(Seasonally_Adjusted = Gas / seasonal)
# Plot this seasonally adjusted data
autoplot(gas_decomposed, Seasonally_Adjusted) +
labs(title = "Seasonally Adjusted Gas Production",
x = "Year and Quarter",
y = "Seasonally Adjusted Gas Production") +
theme_minimal()
After seasonally adjusting the data, the upward trend and its randomness
become easier to see. There are some peaks and dips in the time series
above, but the trend is still overall upward.
# Increase the sixth observation by 300
gas_outlier <- gas
gas_outlier$Gas[6] <- gas_outlier$Gas[6] + 300
# Recompute multiplicative decomposition
gas_outlier_decomposed <- gas_outlier |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
mutate(Seasonally_Adjusted = Gas / seasonal)
# Plot seasonally adjusted data with outlier
autoplot(gas_outlier_decomposed, Seasonally_Adjusted) +
labs(title = "Seasonally Adjusted Gas Production (With Outlier)",
x = "Year and Quarter",
y = "Seasonally Adjusted Gas Production") +
theme_minimal()
Oh my, it seems the outlier has really upset the seasonality
adjustment’s ability to determine the quarterly consistancies. This
result likely comes from the fact that the sixth observation was
changed, which corresponds to Q4, one of the lower seasonal quarters.
The seasonaly adjusted graph includes the Q3 and Q4 dips that would
otherwise be seasonally adjusted for.
# Outlier near the end
gas_outlier_latter <- gas
gas_outlier_latter$Gas[18] <- gas_outlier_latter$Gas[18] + 300
# Recompute decomposition
gas_outlier_latter_decomposed <- gas_outlier_latter |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
mutate(Seasonally_Adjusted = Gas / seasonal)
# Plot seasonally adjusted data with outlier near the end
autoplot(gas_outlier_latter_decomposed, Seasonally_Adjusted) +
labs(title = "Seasonally Adjusted Gas Production for Latter Outlier",
x = "Year and Quarter",
y = "Seasonally Adjusted Gas Production") +
theme_minimal()
There does not seem to be much of a difference between the near-start
outlier and the near-end outlier as far as seasonality adjustment
ability is affected. If anything, the latter outlier graph has slightly
decreased variability in the seasonal trend.
Prompt: Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
# Mirroring question 4's logic, get retail time series data from Exercise 7
set.seed(64)
myseries_x11 <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
# X-11 Decomposition to myseries_x11
x11_dcmp <- myseries_x11 |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
# Autoplot the X-11 decomposition
autoplot(x11_dcmp) +
labs(title =
"Decomposition of Australian retail employment using X-11.")
The data does have an upward trend, but it is easier to see the
different bumps in that trend with this X-11 trend graph. Seasonality
shows that there is a large increase in a single season with the other
seasons’ variabilities not as extreme. For the irregularity graph, the
most variability seems to happen around 1994, 2000-2002, and 2009-2015.
These jumps likely correlate with the Australian economic cycles at
these times.
Prompt: Figures 3.19 and 3.20 (see textbook) show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
a. Figure 3.20 is remarkable helpful when analysing the trends of seasonality across years. Through this breakdown, we can see the seasons that go against the upward trend at the second half of the graph - March, August and November all end below their averages - and the months that have the most variability overall - March, July, and August are the only months having greater than 25 units of variance.
b. Is the recession of 1991/1992 visible in the estimated components?
b. The Figure 3.19 is much more helpful in visualizing the recession of 91/92. The remainder graph here shows the major dip where the recession happened, in the otherwise relatively smooth variability line graph. Also, the value line graph that shows the full data does show a dip in its upward trend during this recession.