Plotting the GDP per capita for all countries over time generates a rather cluttered graph with a legend that must be omitted due to its sheer size. There are many ways to simplify this graph, but for the purpose of this question we only care about the countries with the “highest” GDP per capita. Since this can be interpreted in different ways depending on whether we want to track changes over time or simply take the highest value ever achieved, we will define it as the countries with the highest GDP per capita in the final year with data.
gdp_per_cap <- global_economy |>
  mutate(gdp_per_capita = GDP / Population) |>   # GDP per capita for every country
  select(Country, Year, gdp_per_capita)

gdp_per_cap |>
  ggplot() +
  geom_line(aes(x = Year, y = gdp_per_capita, color = Country), show.legend = FALSE)
Restricting our plot to only these countries, we can see that Luxembourg has the highest GDP per capita both in terms of peak value and value in the final year. While the early years of the data saw a fair amount of jockeying for position, Luxembourg has held the top spot since the early 1990s.
# Top five countries by GDP per capita in 2017, the final year with data
top_five <- gdp_per_cap |>
  filter(Year == 2017) |>
  slice_max(gdp_per_capita, n = 5) |>
  as_tibble() |>
  select(Country)

inner_join(gdp_per_cap, top_five) |>
  ggplot() +
  geom_line(aes(x = Year, y = gdp_per_capita, color = Country))
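The hard-coded 2017 works because it is the last year in global_economy. If we wanted the same logic to survive a data update, a small variation (just a sketch; top_five_latest is an illustrative name) would compute the final year instead:

top_five_latest <- gdp_per_cap |>
  filter(Year == max(Year)) |>        # final year present in the data, rather than hard-coding 2017
  slice_max(gdp_per_capita, n = 5) |>
  as_tibble() |>
  select(Country)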
The raw GDP series for the United States looks extremely smooth over the period in question, with only a small drop during the 2008 financial crisis. This is deceptive, though, and the series benefits from a few adjustments.
global_economy |>
  filter(Country == "United States") |>
  ggplot(aes(x = Year, y = GDP / 1000000000)) +
  geom_line() +
  labs(title = "United States GDP",
       y = "GDP (Billions of Dollars)")
Looking at the GDP for the United States, there are two adjustments we can make to get a much better understanding of changes in the economy. The first is to adjust for inflation to get real GDP in 2010 dollars, and the second is to then calculate real GDP per capita. Calculating these values gives a clearer picture of the behavior of the economy by adjusting for both inflation and population growth. The adjusted series reveals economic shocks beyond the 2008 financial crisis, such as the dot-com bust and the 1970s oil crises.
global_economy |>
  filter(Country == "United States") |>
  mutate(real_gdp = GDP / CPI * 100,              # deflate nominal GDP by the CPI (base year 2010)
         real_gdp_per_capita = real_gdp / Population) |>
  ggplot(aes(x = Year, y = real_gdp_per_capita)) +
  geom_line() +
  labs(title = "United States Real GDP per Capita",
       y = "Real GDP per Capita (2010 Dollars)")
Looking at the slaughter numbers for bulls, bullocks and steers in Victoria, we see large month-to-month changes in the data. We could decompose the series and look at the trend, but a simpler transformation over such a long window is to sum the total counts per year.
aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") |>
  select(-Animal) |>
  ggplot(aes(x = Month, y = Count)) +
  geom_line() +
  labs(title = "Bulls Slaughtered in Victoria, Australia",
       y = "Total Slaughtered")
It is arguable whether it is better to decompose out the seasonal component or simply annualize the data. I posit that it is better to start with the annualized data, as the slaughter numbers are seasonal in both a monthly and a quarterly manner. That can make choosing the “best” decomposition a bit subjective, and the annual totals still show interesting year-to-year trends.
aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") |>
  select(-Animal) |>
  index_by(Year = ~ year(.)) |>         # collapse the monthly index to years
  summarise(sum_count = sum(Count)) |>
  ggplot(aes(x = Year, y = sum_count)) +
  geom_line() +
  labs(title = "Bulls Slaughtered in Victoria, Australia",
       y = "Total Slaughtered per Year")
The raw vic_elec data is pretty hard to interpret due to the sheer number of data points. We can see some rough seasonality, but the scale of the half-hourly variation masks any further analysis.
vic_elec |>
  select(Time, Demand, Date) |>
  ggplot(aes(x = Date, y = Demand)) +
  geom_line()
If we aggregate the data to weekly totals we get a clearer picture. The weekly window was chosen to eliminate the daily fluctuations as well as the weekday-weekend variation. We can still see seasonality that tracks the change from summer to winter, but the anomalous periods are now easier to spot. Transforming this way highlights periods that may merit further exploration, such as the peak in early 2014 or the troughs at the end of each year.
vic_elec |>
  select(Time, Demand, Date) |>
  index_by(YearWeek = yearweek(Date)) |>   # aggregate the half-hourly observations to weeks
  summarise(weekly_demand = sum(Demand)) |>
  ggplot(aes(x = YearWeek, y = weekly_demand)) +
  geom_line()
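To dig into one of those features, here is a quick sketch of the daily totals around the early-2014 peak (the January 2014 window is just an illustrative choice):

vic_elec |>
  filter(yearmonth(Date) == yearmonth("2014 Jan")) |>
  index_by(Date) |>                        # daily totals for January 2014
  summarise(daily_demand = sum(Demand)) |>
  ggplot(aes(x = Date, y = daily_demand)) +
  geom_line()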
Looking at the aus_production gas data, we see an obvious candidate for a Box-Cox transformation: the seasonal variation grows roughly in proportion to the level of the series.
aus_production |>
  select(Quarter, Gas) |>
  ggplot(aes(x = Quarter, y = Gas)) +
  geom_line()
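As a reminder of what the transformation does, the standard Box-Cox family is

$$
w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0, \\ \left(y_t^{\lambda} - 1\right)/\lambda & \text{otherwise,} \end{cases}
$$

so lambda values below 1 compress the larger observations more strongly, which is what stabilizes variance that grows with the level of the series (the box_cox() function used below follows a slightly modified form that also copes with negative values).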
Transforming with a suitable lambda stabilizes that seasonal variance, which makes later forecasting models on this data simpler.
# Guerrero's method selects a lambda aimed at making the variation uniform across the series
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

aus_production |>
  autoplot(box_cox(Gas, lambda))
Looking at the raw canadian_gas data, we can immediately see that a Box-Cox transformation will be unhelpful: the seasonal variation does not increase or decrease consistently over the period in question, but instead has stretches of both increasing and decreasing variance.
canadian_gas |>
  select(Month, Volume) |>
  ggplot(aes(x = Month, y = Volume)) +
  geom_line()
Plotting the transformed series shows the data is largely unchanged, as this kind of variance cannot be effectively stabilized with a Box-Cox transformation.
lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

canadian_gas |>
  autoplot(box_cox(Volume, lambda))
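One way to make the claim about the changing seasonal variance concrete is to decompose the series; a rough sketch using STL (default settings) shows the seasonal component widening and then narrowing over time:

canadian_gas |>
  model(STL(Volume)) |>
  components() |>
  autoplot()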
Selecting a random series from the aus_retail data with a fixed seed, as specified in exercise 2.7, we get a series with rapidly increasing variance in its seasonality.
# Pick one retail series at random, as in exercise 2.7
set.seed(38472)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

myseries |>
  autoplot()
Using the guerrero feature we can easily find a lambda value for the built-in Box-Cox transformation that effectively normalizes the variance of the data. The chosen lambda does, however, compress the upper end of the series so much that the trend appears to flatten out, which may not be desirable. Increasing the lambda value to mitigate this would leave more variance in the series but would reduce the distortion of the trend. Which outcome is preferable depends entirely on the goal of the transformation, and without further context I would conclude that the value of -0.22 is reasonable.
lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

myseries |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Queensland Retail Data with $\\lambda$ = ",
         round(lambda, 2))))
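To illustrate that trade-off, here is a quick sketch that overrides the Guerrero choice with a hand-picked larger lambda (0, i.e. a log transformation, chosen purely for comparison):

myseries |>
  autoplot(box_cox(Turnover, 0)) +   # lambda fixed at 0 for comparison with the Guerrero choice
  labs(y = "",
       title = latex2exp::TeX("Transformed Queensland Retail Data with $\\lambda$ = 0"))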
Looking at the aus_production tobacco data, we can see that the variance has no significant increasing or decreasing trend. The amplitude is relatively stable, so we would expect any Box-Cox transformation to use a lambda value close to 1.
tobacco <- aus_production |>
  select(Quarter, Tobacco) |>
  drop_na()

tobacco |>
  autoplot()
Using the guerrero feature we get a value of 0.93, which is so close to 1 that I would likely use the untransformed data instead.
lambda_tob <- tobacco |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

tobacco |>
  autoplot(box_cox(Tobacco, lambda_tob)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Tobacco Production with $\\lambda$ = ",
         round(lambda_tob, 2))))
The economy passenger counts from Melbourne to Sydney show broadly consistent variance, except for some major anomalies caused by a pilots' strike and a promotional upgrade program that ran temporarily. If we ignored these periods (which we would likely want to do for modeling purposes), we would expect another lambda value close to 1, since there is no major increasing or decreasing trend in the variance.
economy_passengers <- ansett |>
  filter(Airports == "MEL-SYD", Class == "Economy") |>
  select(Week, Passengers) |>
  drop_na()

economy_passengers |>
  autoplot()
When we run the guerrero feature we receive a lambda of 2, which would be reasonable if we were set on including all of the data for the period. We can see that the massive drop to zero in the late 1980s has technically been normalized, but the variance of every other period has actually been exaggerated. If we are strictly trying to normalize variance then the lambda of 2 is a defensible choice, but a better approach for any practical application would first address the period of the pilots' strike, either by excluding it from the sample or by applying some kind of adjustment toward expected values.
lambda_econ <- economy_passengers |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

economy_passengers |>
  autoplot(box_cox(Passengers, lambda_econ)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Airline Passenger Counts with $\\lambda$ = ",
         round(lambda_econ, 2))))
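As a rough sketch of that alternative, we could drop the weeks in which the strike drove passenger numbers to zero before estimating lambda (using zero counts as a crude proxy for the strike period; a more careful treatment would use the actual strike dates or impute expected values):

economy_passengers |>
  filter(Passengers > 0) |>                 # crude proxy: remove the zero-passenger strike weeks
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)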
Looking at the pedestrian counts for Southern Cross Station, we get a plot that is largely unintelligible due to the volume of data and the high variance. In the context of the question we will not window or otherwise aggregate the data to make it more informative. We can see a rather large variance, as one might expect from transit use, so we would expect a rather large adjustment to be needed.
southern_cross <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  select(-Sensor) |>
  drop_na()

southern_cross |>
  ggplot(aes(x = Date_Time, y = Count)) +
  geom_line()
Using the guerrero feature we end up with a rather nicely normalized plot, considering the initial input. With a lambda of -0.25 the overall variance has been decreased significantly.
lambda_pass <- southern_cross |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

southern_cross |>
  autoplot(box_cox(Count, lambda_pass)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Southern Cross Traffic with $\\lambda$ = ",
         round(lambda_pass, 2))))
There is a positive trend in the gas data over the period in question with strong annual seasonality.
gas <- tail(aus_production, 5 * 4) |>   # last five years (20 quarters) of the quarterly data
  select(Gas)

gas |>
  autoplot()
Using a multiplicative classical decomposition we can break out the trend and seasonal indices. A plot is also included to make the components easier to understand.
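As a reminder, the multiplicative model assumes the series is the product of its components,

$$
y_t = T_t \times S_t \times R_t,
$$

where $T_t$ is the trend-cycle, $S_t$ the seasonal component, and $R_t$ the remainder, so the seasonal indices reported below multiply the trend rather than add to it.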
decomposed_gas <- gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components()

decomposed_gas |>
  select(Quarter, trend, seasonal)
## # A tsibble: 20 x 3 [1Q]
## Quarter trend seasonal
## <qtr> <dbl> <dbl>
## 1 2005 Q3 NA 1.13
## 2 2005 Q4 NA 0.925
## 3 2006 Q1 200. 0.875
## 4 2006 Q2 204. 1.07
## 5 2006 Q3 207 1.13
## 6 2006 Q4 210. 0.925
## 7 2007 Q1 213 0.875
## 8 2007 Q2 216. 1.07
## 9 2007 Q3 219. 1.13
## 10 2007 Q4 219. 0.925
## 11 2008 Q1 219. 0.875
## 12 2008 Q2 219 1.07
## 13 2008 Q3 219 1.13
## 14 2008 Q4 220. 0.925
## 15 2009 Q1 222. 0.875
## 16 2009 Q2 223. 1.07
## 17 2009 Q3 225. 1.13
## 18 2009 Q4 226 0.925
## 19 2010 Q1 NA 0.875
## 20 2010 Q2 NA 1.07
decomposed_gas |>
  autoplot()
The results support the original interpretation that there is a positive overall trend with annual seasonality.
decomposed_gas |>
  ggplot(aes(x = Quarter, y = season_adjust)) +
  geom_line() +
  labs(title = "Seasonally Adjusted Australian Gas Production",
       y = "Gas Production (PJ)")
By changing the value for the first quarter of 2008 to a large outlier, we can see that the decomposition changes drastically. The outlier causes the seasonally adjusted series to retain an obvious degree of seasonality and to no longer reflect the general positive trend as clearly.
# Replace the 2008 Q1 observation with an artificial outlier of 500
gas_update_middle <- gas |>
  rows_update(y = tsibble(Quarter = yq("2008 Q1"), Gas = 500))

gas_update_middle |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  ggplot(aes(x = Quarter, y = season_adjust)) +
  geom_line() +
  labs(title = "Seasonally Adjusted Australian Gas Production",
       y = "Gas Production (PJ)")
When we move the outlier to the end, we can see that the seasonally adjusted data is altered in a different manner than before. While we still see some hints of seasonality in the seasonally adjusted data, it is less pronounced than when the outlier is in the middle. The general positive trend is still masked by the large outlier, however.
gas_update_end <- gas |>
  rows_update(y = tsibble(Quarter = yq("2010 Q1"), Gas = 550))

gas_update_end |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  ggplot(aes(x = Quarter, y = season_adjust)) +
  geom_line() +
  labs(title = "Seasonally Adjusted Australian Gas Production",
       y = "Gas Production (PJ)")
There is a particularly interesting change in the trend that occurred in the late 2000s that I did not notice before. This relatively sharp decline does not recover until a little over five years later, and it appears to coincide with an increase in the variance of the seasonal component. Looking at the plot of the raw data, I would have assumed the increased seasonal variation started much later than it actually did.
myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components() |>
  autoplot()
The trend makes up the vast majority of the decomposition and is largely positive, with a period of stagnation in the early 1990s. While there is a clear and relatively consistent seasonal component, it is by far the smallest of the components. The remainder has an interesting outlier in the early 1990s that coincides with the beginning of the plateau in the trend. Looking at the seasonal component of the decomposition in a sub-series view like figure 3.20 of the text, we see some interesting patterns, with the spring months peaking in the mid 1980s before declining sharply. Other months show interesting movements as well, which appear to indicate a gradual shift in the seasonal pattern.
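A seasonal sub-series view like the one referenced above can be produced directly from the X-11 components; a small sketch, refitting the same model as before:

myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components() |>
  gg_subseries(seasonal)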
Yes, the 1991/1992 recession is clearly visible in the remainder component of the decomposition, indicating that it was an anomalous event unrelated to the overall seasonal pattern.