library(fpp3)
library(tidyverse)
library(ggrepel)
library(seasonal)
3.1 Consider the GDP information in global_economy. Plot
the GDP per capita for each country over time. Which country has the
highest GDP per capita? How has this changed over time?
max_gdp = global_economy |>
filter(Year == 2017) |>
slice_max(order_by = GDP / Population, n = 3)
global_economy |>
autoplot(GDP / Population) +
theme_bw() +
theme(legend.position = "none") +
geom_label_repel(data = max_gdp, aes(label = Country), min.segment.length = 0,
direction = "y", nudge_x = 10) +
labs(x = "Year", y = "GDP per Capita ($USD)", title = "GDP per Capita Since 1960")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
Based on the plot above, the top country in terms of GPD per Capita in 2017 was Luxembourg. However, these labels don’t include the 2 highest countries from previous years. As a result, the data was looked at further to find the maximum instances, regardless of year.
global_economy |>
mutate(GDP_per_Capita = GDP / Population) |>
arrange(desc(GDP_per_Capita))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
## Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
Looking at the above tsibble, 2 of the top 3 years overall in
terms of GDP per Capita occurred in 2014, one by Monaco and one by
Liechtenstein. The reason neither of these countries were included in
the original plot, given that they were clearly the top 2 countries in
terms of GDP per Capita over the previous 20+ years, is that both Monaco
and Liechtenstein have an NA value for the GDP, as seen in
the tsibble below. Thus, the GDP per Capita could not be calculated for
these 2 countries in 2017.
global_economy |>
filter(Year == 2017, Country %in% c("Monaco", "Liechtenstein"))
In general, the GDP per Capita has increased over time for most countries. Additionally, the spread between the highest and lowest countries has increased over time. Finally, based on the tsibble below, the country with the highest GDP per Capita was the United States in 1960, whereas in 2017, as previously discussed, it was Luxembourg.
global_economy |>
filter(Year == 1960) |>
mutate(GDP_per_Capita = GDP / Population) |>
arrange(desc(GDP_per_Capita))
3.2 For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
global_economy.aus_livestock.vic_elec.aus_production.global_economy |>
filter(Country == "United States") |>
autoplot(GDP) +
labs(x = "Year", y = "GDP ($USD)", title = "GDP in US Since 1960") +
theme_bw()
global_economy |>
filter(Country == "United States") |>
autoplot(GDP / Population) +
labs(x = "Year", y = "GDP per Capita ($USD)", title = "GDP per Capita in US Since 1960") +
theme_bw()
The first plot above shows the US ADP since 1960, whereas the second plot shows the US GDP per Capita since 1960, where the GDP was divided by the US population. Based on the two plots above, the general trend does not change. That is, they both have a positive trend, with a slight dip around 2008, followed by the continued positive trend.
aus_livestock |>
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") |>
autoplot(Count / 1000) +
labs(x = "Month", y = "Number of Animals ('000s)",
title = "Number of Bulls, Bullocks, and Steers Slaugtered in Victoria since 1976") +
theme_bw()
The above plot shows a general decreasing trend in total number of bulls, bullocks, and steers slaugtered in Victoria since 1976. The total count was divided by 1000, simply for readability on the plot. This only affects the scale of the plot, not the general trends or seasonality.
vic_elec |>
autoplot(Demand) +
labs(y = "Demand (MWh)",
title = "Half-Hourly Electricity Demand in Victoria since 2012") +
theme_bw()
elec_lambda = vic_elec |>
features(Demand, features = guerrero) |>
pull(lambda_guerrero)
vic_elec |>
autoplot(box_cox(Demand, elec_lambda)) +
labs(y = "",title = latex2exp::TeX(paste0(
"Transformed Half-Hourly Electricity Demand in Victoria with $\\lambda$ = ",
round(elec_lambda,2)))) +
theme_bw()
The first plot above shows the half-hourly electricity demand for Victoria since 2012. The second plot above shows the same data, but transformed using a Box-Cox transformation and a \(\lambda\) value of 0.1. Comparing the 2 plots, the higher peaks in the first plot are reduced to be more in line with the other peaks. Otherwise, the rest of the transformed plot is similar to the original plot.
aus_production |>
autoplot(Gas) +
labs(y = "Production (petajoules)",
title = "Quarterly Gas Production in Australia since 1956") +
theme_bw()
gas_lambda = aus_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Gas, gas_lambda)) +
labs(y = "", title = latex2exp::TeX(paste0(
"Transformed Quarterly Gas Production in Australia since 1956 using $\\lambda$ = ",
round(gas_lambda,2)))) +
theme_bw()
The first plot above shows the quarterly gas production in Australia since 1956. The second plot above shows the same data, but transformed using a Box-Cox transformation and a \(\lambda\) value of 0.11. Comparing the 2 plots, the higher peaks and lower troughs in the first plot are brought more in line with one another. For example, in the original plot, the data before 1970 has very small peaks, and the data after 1990 has larger peaks. In the transformed plot, the peaks throughout all of the data are about the same.
3.3 Why is a Box-Cox transformation unhelpful for the
canadian_gas data?
canadian_gas |>
autoplot() +
labs(y = "Production (billion cubic metres)",
title = "Monthly Gas Production in Canada sinc 1960") +
theme_bw()
## Plot variable not specified, automatically selected `.vars = Volume`
can_gas_lambda = canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
canadian_gas |>
autoplot(box_cox(Volume, can_gas_lambda)) +
labs(y = "", title = latex2exp::TeX(paste0(
"Transformed Monthly Gas Production in Canada since 1960 using $\\lambda$ = ",
round(can_gas_lambda,2)))) +
theme_bw()
Above are 2 plots of the monthly Canadian gas production. The
first shows the data as is, whereas the second plot shows the data
that’s been transformed using a Box-Cox transformation. Based on the two
plots above, the Box-Cox transformation did not modify seasonality
enough to bring the peaks in line with one another. As a result, a
Box-Cox transformation is not useful for the canadian_gas
data.
3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(9304)
aus_retail_sample <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
series_id = aus_retail_sample$`Series ID`[1]
aus_retail_sample |>
autoplot(Turnover) +
labs(y = "Turnover ($Million AUD)", title = latex2exp::TeX(paste0(
"Monthly Retail Turnover for Series ID ", series_id))) +
theme_bw()
retail_lambda = aus_retail_sample |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
aus_retail_sample |>
autoplot(box_cox(Turnover, retail_lambda)) +
labs(y = "", title = latex2exp::TeX(paste0(
"Transformed Monthly Retail Turnover for Series ID ", series_id,
" using $\\lambda$ = ", round(retail_lambda,2)))) +
theme_bw()
Based on the analysis above using the guerrero
feature, a \(\lambda\)
value of -0.03. The above 2 plots show the differences between
the non-transformed and transformed data. The second plot with the
transformed data shows that the variation in seasonality has been
minimized over time.
3.5 For the following series, find an appropriate Box-Cox
transformation in order to stabilise the variance. Tobacco from
aus_production, Economy class passengers between Melbourne
and Sydney from ansett, and Pedestrian counts at Southern
Cross Station from pedestrian.
tobacco_lambda = aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
tobacco_lambda
## [1] 0.9264636
For the Tobacco series from aus_production,
using the guerrero method, the lambda value for the Box-Cox
transformation is about 0.93.
economy_lambda = ansett |>
filter(Airports == "MEL-SYD", Class == "Economy") |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
economy_lambda
## [1] 1.999927
For the Economy series from ansett, filtered for
flights between Melbourne and Sydney, using the guerrero
method, the lambda value for the Box-Cox transformation is about
2.
pedestrian_lambda = pedestrian |>
filter(Sensor == "Southern Cross Station") |>
features(Count, features = guerrero) |>
pull(lambda_guerrero)
pedestrian_lambda
## [1] -0.2501616
For the Pedestrian series from pedestrian,
filtered by Southern Cross Station, using the guerrero
method, the lambda value for the Box-Cox transformation is about
-0.25.
3.7 Consider the last five years of the Gas data from
aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
gas |>
autoplot() +
labs(x = "Quarter", y = "Production (petajoules",
title = "Quarterly Gas Production since Q3 2005") +
theme_bw()
## Plot variable not specified, automatically selected `.vars = Gas`
Based on the above plot, there is seasonality throughout the year, with a peak in Q3, and a trough in Q1. There is also a positive trend over the past 5 years.
classical_decomposition with
type=multiplicative to calculate the trend-cycle and
seasonal indices.gas |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
autoplot() +
labs(title = "Classical Decomposition of Gas Production") +
theme_bw()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
The second plot above shows the calculated trend-cycle, whereas the third plot above shows the seasonal component, which includes the seasonal indices.
Yes, the seasonal component in the plot above shows a low in Q1, followed by a peak in Q3, which is consistent over time.
gas |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
as_tsibble() |>
autoplot(Gas, color = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(x = "Quarter", y = "Production (petajoules)",
title = "Seasonally Adjusted Gas Production") +
theme_bw()
gas_outlier = gas |>
mutate(Outlier = case_when(
Quarter == yearquarter("2008 Q2") ~ Gas + 300,
.default = Gas)
)
gas_outlier |>
autoplot(Outlier) +
labs(x = "Quarter", y = "Production (petajoules",
title = "Quarterly Gas Production since Q3 2005 with Outlier in 2008 Q2") +
theme_bw()
gas_outlier |>
model(classical_decomposition(Outlier, type = "multiplicative")) |>
components() |>
as_tsibble() |>
autoplot(Outlier, color = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(x = "Quarter", y = "Production (petajoules)",
title = "Seasonally Adjusted Gas Production with Outlier in 2008 Q2") +
theme_bw()
When adding an outlier into the dataset, not only is there a large spike in the seasonally adjusted plot, but the remainder of the seasonally adjusted plot is not as smooth as the original data. Additionally, there is not a clear positive trend like there was in the original plot.
gas_outlier = gas |>
mutate(Outlier = case_when(
Quarter == yearquarter("2010 Q1") ~ Gas + 300,
.default = Gas)
)
gas_outlier |>
model(classical_decomposition(Outlier, type = "multiplicative")) |>
components() |>
as_tsibble() |>
autoplot(Outlier, color = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(x = "Quarter", y = "Production (petajoules)",
title = "Seasonally Adjusted Gas Production with Outlier in 2010 Q1") +
theme_bw()
Looking at the above plot with the outlier towards the end, it shows that there is a difference in the seasonally adjusted plot. That is, there is still a large peak near the outlier, but the remainder of the plot is smoother than the plot with the outlier in the middle. Additionally, the plot above with the outlier at the end shows more of a positive trend than the plot with the outlier in the middle.
3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
aus_retail_sample |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components() |>
autoplot() +
labs(title = latex2exp::TeX(paste0(
"X11 Decomposed Retail Data for Series ID ", series_id))) +
theme_bw()
Looking at the X-11 decomposition above, it appears there are
some outliers, based on the irregular portion of the plot.
That is, there are high peaks some time in 1994, 2000-2001, and then a
low peak in 2002-2003. These were not evident in the original plots
generated.
3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Looking at the trend portion of
the STL decomposition above, there is a clear positive trend. Based on
the scales of both the original plot and the trend
plot, this increases from about 6500 to about 9000 from 1978 to
1995. The season_year portion of the STL
decomposition above shows there is annual seasonality within the data,
with the lowest civilian labor numbers being in January and July, and
the highest being in March and December. Based on the scale of the
seasonality plot, this accounts for +/- 100 civilians from the general
trend. The remainder portion of the STL
decomposition shows that the remainder of the variation throughout the
data accounted for roughly +/- 100 civilians after accounting for both
the trend and seasonality. The exception to this is in approximately
1991-1992, where the remainder accounts for a drop in around 400
civilians.
Yes, the 1991/1992 recession is visible in the remainder portion of the STC decomposition, where there was a large drop in civilian labor numbers of close to 400 civilians. In all other years, the most the civilian labor numbers dropped–after accounting for both the trend and seasonality–was by about 125 civilians.