library(fpp3)
library(gridExtra)
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
The country with the highest GDP is Monaco, followed by Liechtenstein and Luxembourg. Since the start of the year 2000, Monaco and Liechtenstein have become closer in comparison in terms of GDP per capita, and since 2010, the gap between them and the rest of the countries has widened.
global_economy |>
mutate(gdp_per_capita = GDP / Population) |>
ggplot(aes(x=Year, y=gdp_per_capita, colour=Country, label=Country)) +
geom_line() +
geom_text(data = . %>%
group_by(Country) %>%
filter(gdp_per_capita == max(gdp_per_capita, na.rm = TRUE)) %>%
arrange(desc(gdp_per_capita)) %>%
head(3),
aes(label = Country),
vjust="bottom") +
theme(legend.position="none") +
labs(title = "Countries' GDP per Capita",
y = "GDP per Capita")
Country | gdp_per_capita | Year |
---|---|---|
Monaco | 185152.53 | 2014 |
Liechtenstein | 179308.08 | 2014 |
Luxembourg | 119225.38 | 2014 |
Norway | 103059.25 | 2013 |
Macao SAR, China | 94004.39 | 2014 |
Bermuda | 93605.75 | 2008 |
San Marino | 90682.58 | 2008 |
Isle of Man | 89941.64 | 2014 |
Qatar | 88564.82 | 2012 |
Switzerland | 88415.63 | 2011 |
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
global_economy |>
filter(Country == "United States") |>
autoplot(GDP) +
labs(title = "United States Yearly GDP")
We observe an exponential curve as the years progress. A transformation is necessary where we can align this growth more closely with the population. A GDP per Capita measure will be established. We can see the curve smoothes out more compared to the drastic change we were seeing earlier.
global_economy |>
filter(Country == "United States") |>
mutate(gdp_per_capita = GDP / Population) |>
autoplot(gdp_per_capita) +
labs(y = "",
title = "United States GDP per Capita")
aus_livestock |>
filter(Animal == "Bulls, bullocks and steers",
State == "Victoria") |>
autoplot(Count) +
labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers")
We can observe some larger variations across the seasons in this time series, especially in the early years. Initially, a Box-Cox transformation could be considered, but these variations did not persist within the seasons.
vic_elec |>
autoplot(Demand) +
labs(title = "Victorian Electricity Demand",
y = 'MWh')
The graph illustrates the seasonal trends between the months, with significant increases in demand during the early months followed by a smoother rise around mid-year. However, since we cannot discern what might be occurring within these months, and considering that we are plotting in thirty-minute intervals, it would be preferable to scale it up to a daily basis.
vic_elec |>
group_by_key() |>
index_by(Date = ~ as_date(.)) |>
summarise(daily_demand = sum(Demand)) |>
autoplot(daily_demand) +
labs(title = "Victorian Electricity Demand",
y = 'MWh')
aus_production |>
autoplot(Gas) +
labs(title = "Gas Manufacturing Production",
y = 'Petajoules')
We can observe a significant variation from the earlier years, suggesting that a Box-Cox transformation would be appropriate to use here. Upon applying it, we notice the seasonal variation becoming more consistent across the time series.
lambda <-
aus_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title = latex2exp::TeX(
paste0("Transformed Gas Manufacturing Production with $\\lambda$ = ",
round(lambda,2))))
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
We can see that the Box-Cox transformation did not help with the larger seasonal variation between 1975-1990.
plot_1 <-
canadian_gas |>
autoplot(Volume)+
labs(title = 'Canadian Gas')
lambda <-
canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
plot_2 <-
canadian_gas |>
autoplot(box_cox(Volume, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Gas with $\\lambda$ = ",
round(lambda,2))))
grid.arrange(plot_1, plot_2, ncol=2)
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(42)
retail <-
aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
retail |>
autoplot(Turnover) +
labs(title = "Retail Trade Turnover",
subtitle = "Western Australia Newspaper and Book",
y = "Million (AUD $)")
lambda <-
retail |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
retail |>
autoplot(box_cox(Turnover, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Retail Trade Turnover with $\\lambda$ = ",
round(lambda,2))))
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.
aus_production |>
autoplot(Tobacco) +
labs(title = "Tobacco Production")
lambda <-
aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Tobacco, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tobacco Production with $\\lambda$ = ",
round(lambda,2))))
ansett |>
filter(Class == "Economy",
Airports == "MEL-SYD") |>
autoplot(Passengers) +
labs(title = "MEL-SYD Passenger Counts")
lambda <-
ansett |>
filter(Class == "Economy",
Airports == "MEL-SYD") |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
ansett |>
filter(Class == "Economy",
Airports == "MEL-SYD") |>
autoplot(box_cox(Passengers, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed MEL-SYD Passenger Counts with $\\lambda$ = ",
round(lambda,2))))
The original hourly time period makes this graph difficult to interpret the variance issues. We will look at this at a daily level and then transform it to see the differeces.
pedestrian |>
filter(Sensor == "Southern Cross Station") |>
autoplot(Count) +
labs(title = "Southern Cross Station Pedestrian Counts")
pedestrian_daily <-
pedestrian |>
filter(Sensor == "Southern Cross Station") |>
group_by_key() |>
index_by(Date = ~ as_date(.)) |>
summarise(daily_count = sum(Count))
pedestrian_daily |>
autoplot(daily_count) +
labs(title = "Southern Cross Station Daily Pedestrians")
lambda <-
pedestrian_daily |>
filter(Sensor == "Southern Cross Station") |>
features(daily_count, features = guerrero) |>
pull(lambda_guerrero)
pedestrian_daily |>
filter(Sensor == "Southern Cross Station") |>
autoplot(box_cox(daily_count, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Southern Cross Station Daily Pedestrians with $\\lambda$ = ",
round(lambda,2))))
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
knitr::kable(head(gas))
Gas | Quarter |
---|---|
221 | 2005 Q3 |
180 | 2005 Q4 |
171 | 2006 Q1 |
224 | 2006 Q2 |
233 | 2006 Q3 |
192 | 2006 Q4 |
We see every six months there are peaks during Q3 and troughs during Q1. As for a trend-cycle, there’s a small positive increase as the time series progresses.
gas |>
autoplot(Gas)
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
autoplot() +
labs(title = "Classical multiplicative decomposition of gas production")
We see the same results when we decompose the trend-cycle and seasonal components.
dcmp <-
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")) |>
components(dcmp) |>
as_tsibble()
dcmp |>
autoplot(season_adjust) +
labs(
y = "",
title = "Seasonally Adjusted Gas Production"
)
The outlier changes other observations to be about -10 to +3 more different when the last value is an outlier.
gas_outlier_last <-
gas
gas_outlier_last[nrow(gas_outlier_last), 1] <- gas_outlier_last[nrow(gas_outlier_last), 1] + 500
dcmp_outlier_last <-
gas_outlier_last |>
model(
classical_decomposition(Gas, type = "multiplicative")) |>
components(dcmp) |>
as_tsibble()
season_adjust_diff <- dcmp['season_adjust'] - dcmp_outlier_last['season_adjust']
knitr::kable(season_adjust_diff, align='l')
season_adjust |
---|
2.470092 |
-8.605846 |
2.458013 |
2.624169 |
2.604215 |
-9.179569 |
2.688002 |
2.741319 |
2.738337 |
-9.801102 |
2.788623 |
2.682744 |
2.783045 |
-9.705482 |
2.817372 |
2.788180 |
2.816575 |
-10.040154 |
2.946741 |
-456.939722 |
There’s not much of a difference where the outlier is located
gas_outlier_middle <-
gas
gas_outlier_middle[nrow(gas_outlier_middle), 1] <- gas_outlier_middle[nrow(gas_outlier_middle) / 2, 1] + 500
dcmp_outlier_middle <-
gas_outlier_middle |>
model(
classical_decomposition(Gas, type = "multiplicative")) |>
components(dcmp) |>
as_tsibble()
season_adjust_diff_2 <- dcmp['season_adjust'] - dcmp_outlier_middle['season_adjust']
knitr::kable(list(season_adjust_diff, season_adjust_diff_2), align='l')
|
|
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
We noticed a positive trend from 1975-2005 and some possible yearly seasonal trends. Running x11 we see that the trend went further out to 2007. We also see the seasonal trend shrank in the middle of the time series and then grew to larger variances than before. As for the irregular series, we also notice a few outliers not noticeable from 2010-2013.
set.seed(42)
x11_dcmp <-
aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of Western AUS Turnover using X-11.")
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Figure 3.20: Seasonal component from the decomposition shown in the previous figure.
We can see a strong, positive trend throughout the series. This accounts for a major part of the gains throughout the time series. The seasonal fluctuations are about -100 to 100. The major change is the drop that was around 1991-1992. It was removed as irregular patterns and not included in the trend model.
We can see this recession only in the irregular graph and was removed from the trend series.