library(tidyverse)
library(janitor)
library(fpp3)
library(USgas)
library(readxl)
library(feasts)
library(ggplot2)
library(scales)
library(latex2exp)
library(seasonal)
library(lubridate)
set.seed(9219)Homework 2
3.1
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
global_gdp_per_cap <- global_economy |>
mutate(GDP_per_cap = GDP / Population / 1000) |>
arrange(desc(GDP_per_cap)) |>
filter(!(
# Removing non-countries
Code %in% c(
"WLD",
"IBT",
"LMY",
"MIC",
"IBD",
"EAR",
"LMC",
"UMC",
"EAS",
"LTE",
"EAP",
"TEA",
"SAS",
"TSA",
"IDA",
"HIC",
"OED",
"PST",
"ECS",
"NAC",
"EUU",
"EMU",
"LCN",
"TLA",
"LAC",
"TEC",
"ECA",
"MEA",
"ARB"
)
)) |>
select(Country, GDP_per_cap)
# Top per-capita GDPs all time
global_gdp_per_cap |> head()# A tsibble: 6 x 3 [1Y]
# Key: Country [2]
Country GDP_per_cap Year
<fct> <dbl> <dbl>
1 Monaco 185. 2014
2 Monaco 181. 2008
3 Liechtenstein 179. 2014
4 Liechtenstein 174. 2013
5 Monaco 173. 2013
6 Monaco 168. 2016
# Top per-capita GDPs in 2017
global_gdp_per_cap |> filter(Year == 2017) |> head()# A tsibble: 6 x 3 [1Y]
# Key: Country [6]
Country GDP_per_cap Year
<fct> <dbl> <dbl>
1 Luxembourg 104. 2017
2 Macao SAR, China 80.9 2017
3 Switzerland 80.2 2017
4 Norway 75.5 2017
5 Iceland 70.1 2017
6 Ireland 69.3 2017
# Top country each year
global_gdp_summ <- global_gdp_per_cap |>
as_tibble() |>
group_by(Year) |>
slice_max(order_by = GDP_per_cap, n = 1, with_ties = FALSE) |>
ungroup()
global_gdp_summ |> head()# A tibble: 6 × 3
Country GDP_per_cap Year
<fct> <dbl> <dbl>
1 United States 3.01 1960
2 United States 3.07 1961
3 United States 3.24 1962
4 United States 3.37 1963
5 United States 3.57 1964
6 Kuwait 4.43 1965
# Plot
global_gdp_per_cap |> autoplot() +
theme_classic() +
theme(legend.position = "none") +
labs(title = "GDP per capita by Country", x = "Year", y = "GDP per capita (1k 2019 USD)", subtitle = "Monaco has the highest GDP per capita, historically.")Monaco has generally been the world leader in per-capita GDP throughout history. In the 60’s, the US had the highest per-capita GDP, but this is due to data missing for Monaco prior to 1970. Monaco had the highest GDP per capita of all time in 2014, with Liechtenstein in second, and Luxembourg in third that year. However, in the most recent year of data, 2017, Luxembourg is highest, with Macao SAR, China in second, and Switzerland in third. The UAE beat Monaco in ’76 and ’77, and Liechtenstein beat Monaco in 2013 and 2015.
3.2
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from
global_economy.# Plot global_economy |> filter(Country == "United States") |> mutate(GDP = GDP/1e12) |> autoplot() + theme_classic() + theme(legend.position = "none") + labs(title = "U.S. GDP trends up, with slight dip during the Great Recession.", x = "Year", y = "GDP (trillions 2019 USD)")I transformed the units of the y-axis to trillions to make the large numbers easier to read.
Slaughter of Victorian “Bulls, bullocks and steers” in
aus_livestock.bulls <- aus_livestock |> filter(Animal == "Bulls, bullocks and steers") |> select(-Animal) |> mutate(Count = Count/10000) |> filter(Count > 0) bulls |> head()# A tsibble: 6 x 3 [1M] # Key: State [1] Month State Count <mth> <fct> <dbl> 1 1976 Jul Australian Capital Territory 0.23 2 1976 Aug Australian Capital Territory 0.21 3 1976 Sep Australian Capital Territory 0.21 4 1976 Oct Australian Capital Territory 0.19 5 1976 Nov Australian Capital Territory 0.21 6 1976 Dec Australian Capital Territory 0.18bulls |> autoplot() + theme_classic() + labs(title = "The Northern Territory leads in killing bulls for meat.", x = "Year", y = "10k bulls, bullocks, or steers slaughtered", subtitle = "There are distinct and consistent seasons of meat production.")I transformed the units of the y-axis to 10k bulls to make the large numbers easier to read.
Victorian Electricity Demand from
vic_elec.# Aggregate to weekly data electricity_demand_weekly <- vic_elec |> index_by(Week = yearweek(Time)) |> summarize(Demand = sum(Demand) / 100000) |> slice(-1, -n()) # Remove the first and last incomplete weeks showing misleadingly low totals # Plot weekly data electricity_demand_weekly |> autoplot() + theme_classic() + scale_x_yearweek(date_breaks = "2 month", date_labels = "%b %Y") + labs(title = "Weekly electricity demand in Victoria, Australia", x = "", y = "Total electricity demand (100k MWh)") + theme(axis.text.x = element_text(angle = 45, hjust = 1))I transformed the data to remove weekly variations by transforming the y-axis to show weekly total demand (in 100k MWh). There is a consistent annual seasonal pattern, which drops during the December holiday, then spikes in January and February (summer), and remains elevated in June and July (winter).
Gas production from
aus_production.# Plot un-transformed data: gas <- aus_production |> select(Gas) gas |> autoplot() + theme_classic() + labs(subtitle = "Production and magnitude of seasonal variation start going up in the 70's.", x = "", y = "Gas production (petajoules)") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_x_yearquarter(date_breaks = "2 year", date_labels = "%Y")I chose not to transform this data. Annual seasonality is evident, and the magnitude of its variability increases in the 70’s, along with the overall level of production. I considered transforming the data using the box-cox transformation (lambda = 0.11), to adjust the size of the seasonal variation to be about the same across the whole series, since originally, the variation that increased over the time. However, this made the plot more difficult to interpret.
3.3
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
# Graph raw data
canadian_gas |> head()# A tsibble: 6 x 2 [1M]
Month Volume
<mth> <dbl>
1 1960 Jan 1.43
2 1960 Feb 1.31
3 1960 Mar 1.40
4 1960 Apr 1.17
5 1960 May 1.12
6 1960 Jun 1.01
canadian_gas |> autoplot() +
labs(x = "",
y = "Gas production (billions of cubic meters)",
title = "Raw Monthly Canadian gas production") +
theme_classic() # Determine appropriate lambda for box cox transformation
lambda <- canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
# Plot box cox transformed data
canadian_gas |>
autoplot(box_cox(Volume, lambda)) +
labs(y = "Gas production (billions of cubic meters)",
x = "",
title = "Box-Cox transformed monthly Canadian gas production") +
theme_classic()The Box-Cox transformation is less helpful for this data series because the change in magnitude of the seasonality is not increasing or decreasing constantly with time. Instead, as you see in the graph of the raw data, seasonal variation increases in the late 70s and 80s, then goes back down in the 90s. The shape of the transformed data looks very similar to the raw, and the transformation doesn’t improve consistency in seasonality over time.
3.4
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
# get random retail series
random_id <- sample(aus_retail$`Series ID`, 1)
myseries <- aus_retail |>
filter(`Series ID` == random_id)
industry_name <- myseries$Industry[1]
# Determine appropriate lambda for box cox transformation
lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
# Plot box cox transformed data
myseries |>
autoplot(box_cox(Turnover, lambda)) +
labs(
x = "",
title = "Australian retail trade turnover",
subtitle = industry_name,
y = latex2exp::TeX(paste0(
"Transformed monthly retail turnover ($\\lambda$ = ", round(lambda, 2), ")"
))) +
theme_classic() +
scale_x_yearmonth(breaks = "2 year", labels = label_date("%Y")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))I would use the Box-Cox transformation with lambda = 0.18 for the cafes, restaurants and takeaway food services industry, based on the guerrero feature. The trend is increasing, with slowed turnover in the early 90’s and a slight drop in the early 2010’s.
3.5
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
# Extract the series
Tobacco <- aus_production |> select(Tobacco)
Passengers <- ansett |> filter(Class == "Economy", Airports == "SYD-MEL" | Airports == "MEL-SYD") |> select(Passengers)
Count <- pedestrian |> filter(Sensor == "Southern Cross Station") |> select(Count)
series_list <- list(Tobacco, Passengers, Count)
# Loop through each series to print plot
for (s in series_list) {
# Determine appropriate lambda for Box-Cox transformation
lambda <- s |>
features(!!sym(names(s)[1]), features = guerrero) |>
pull(lambda_guerrero)
print(
paste0(
round(lambda, 5),
" is the lambda to stabilize the variance in ",
names(s)[1] |> str_to_lower(),
"."
)
)
# Plot Box-Cox transformed data
p <- s |>
autoplot(box_cox(!!sym(names(s)[1]), lambda)) +
labs(
title = latex2exp::TeX(
paste0(
"Transformed ",
names(s)[1],
" ($\\lambda$ = ",
round(lambda, 2),
")"
)
)
) +
theme_classic()
print(p)
}[1] "0.92646 is the lambda to stabilize the variance in tobacco."
[1] "1.99993 is the lambda to stabilize the variance in passengers."
[1] "-0.25016 is the lambda to stabilize the variance in count."
I determine the appropriate Box-Cox transformation parameter using the guererro feature, printing lambdas and visualizations of transformed series above.
3.7
Consider the last five years of the Gas data from
aus_production.gas <- tail(aus_production, 5*4) |> select(Gas)- Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
gas |> autoplot() + theme_classic() + labs(subtitle = "Gas production is seasonal (low: Q4-Q1, high Q2-Q3), trending up.", x = "", y = "Gas production (petajoules)") + scale_x_yearquarter(breaks = unique(gas$Quarter), labels = label_date("%Y Q%q")) + theme(axis.text.x = element_text(angle = 45, hjust = 1))Yes. Gas production appears seasonal (low in Q1 and Q4, high in Q2 and Q3). The trend cycle goes up over time.
- Use
classical_decompositionwithtype=multiplicativeto calculate the trend-cycle and seasonal indices.
gas_decomp <- gas |> model(classical_decomposition(Gas, type = "multiplicative")) |> components() gas_decomp |> select(trend, seasonal) |> head(5*4)# A tsibble: 20 x 3 [1Q] trend seasonal Quarter <dbl> <dbl> <qtr> 1 NA 1.13 2005 Q3 2 NA 0.925 2005 Q4 3 200. 0.875 2006 Q1 4 204. 1.07 2006 Q2 5 207 1.13 2006 Q3 6 210. 0.925 2006 Q4 7 213 0.875 2007 Q1 8 216. 1.07 2007 Q2 9 219. 1.13 2007 Q3 10 219. 0.925 2007 Q4 11 219. 0.875 2008 Q1 12 219 1.07 2008 Q2 13 219 1.13 2008 Q3 14 220. 0.925 2008 Q4 15 222. 0.875 2009 Q1 16 223. 1.07 2009 Q2 17 225. 1.13 2009 Q3 18 226 0.925 2009 Q4 19 NA 0.875 2010 Q1 20 NA 1.07 2010 Q2gas_decomp |> autoplot() + labs( title = "Classical multiplicative decomp: Australian gas production", subtitle = "Gas production is seasonal (low: Q4-Q1, high Q2-Q3), trending up.", x = "", y = "Gas production (petajoules)" ) + theme_classic() + scale_x_yearquarter(breaks = unique(gas$Quarter), labels = label_date("%Y Q%q")) + theme(axis.text.x = element_text(angle = 45, hjust = 1))I display the trend-cycle and seasonal indices in tabular form before graphing all components of the decomposition above.
- Do the results support the graphical interpretation from part a?
Yes, the trend-cycle is trending upward, as interpreted in part a. The seasonality displayed is also consistent with part a (low in Q1 and Q4, high in Q2 and Q3).
- Compute and plot the seasonally adjusted data.
gas_decomp |> select(season_adjust) |> autoplot() + labs( title = "Seasonally adjusted gas production", x = "", y = "Gas production (petajoules)" ) + theme_classic() + scale_x_yearquarter(breaks = unique(gas$Quarter), labels = label_date("%Y Q%q")) + theme(axis.text.x = element_text(angle = 45, hjust = 1))- Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas_outlier <- gas |> mutate(Gas = case_when(Quarter == yearquarter("2010 Q2") ~ Gas + 300, TRUE ~ Gas)) gas_outlier |> model(classical_decomposition(Gas, type = "multiplicative")) |> components() |> select(season_adjust) |> autoplot() + labs( title = "Seasonally adjusted gas production, with outlier in Q2 of 2010.", x = "", y = "Gas production (petajoules)" ) + theme_classic() + scale_x_yearquarter(breaks = unique(gas$Quarter), labels = label_date("%Y Q%q")) + theme(axis.text.x = element_text(angle = 45, hjust = 1))The outlier can be seen clearly in the seasonally-adjusted data, since it contains the remainder component (noise from outliers are not removed by seasonal adjustments–just the variation due to seasonality).
- Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas_outlier_end <- gas |>
mutate(Gas = case_when(Quarter == yearquarter("2008 Q1") ~ Gas + 300, TRUE ~ Gas))
gas_outlier_end |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
select(season_adjust) |>
autoplot() +
labs(
title = "Seasonally adjusted gas production, with outlier in Q1 of 2008.",
subtitle = "Regardless if in the middle or end of the data, trend is skewed by outlier.",
x = "",
y = "Gas production (petajoules)"
) +
theme_classic() +
scale_x_yearquarter(breaks = unique(gas$Quarter), labels = label_date("%Y Q%q")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))No, it does not make any difference whether the outlier is at the end or in the middle of the data–data is still skewed by outliers, because the seasonally adjusted series contains the remainder component as well as the trend-cycle.
3.8
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
myseries |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components() |> autoplot() +
labs(
title = "X-11 decomposition of Australian retail trade turnover",
subtitle = industry_name) +
theme_classic() +
scale_x_yearmonth(breaks = "2 year", labels = label_date("%Y")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))Previously, from the box-cox data, I noted the trend was increasing, with slowed turnover in the early 90’s and a slight drop in the early 2010’s. This observation is consistent with the trend-cycle graph depicted above. Something that I could only see clearly with this decomposed version is that the seasonality has relatively higher peaks in the 80’s and early 90s, then the height of peaks decrease in the mid-90’s, and seasonal troughs get lower in 2009 and onward.
3.9
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Figure 3.20: Seasonal component from the decomposition shown in the previous figure.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
The trend in the Australian labor force goes upwards from 1978 to 1995. There is seasonality: January and August have the lowest counts, corresponding to holiday periods and reduced employment activity. There is a shock in 1991, which can be seen as a dip in the overall data values and a major low outlier in the remainder component (this aligns with the early 1990s recession in Australia). The magnitude of seasonal variation increases with time according to the season_year component graphed above.
Is the recession of 1991/1992 visible in the estimated components?
Yes, it is visible in the dip in the raw data, as well as the much larger dip in the remainder.