First, let’s take a look at our global_economy
dataset:
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
First, we’ll need to calculate the GDP per capita by time period
global_economy <- global_economy %>% mutate(gdp_per_capita = GDP / Population)
# Plot GDP per capita per country over time
autoplot(global_economy, gdp_per_capita)
## Warning: Removed 3242 rows containing missing values (`geom_line()`).
Now let’s find the countrieswith the highest GDP per capita. Looking at
our sorted result, it looks like micro-nations like Monaco,
Liechenstein, and Luxembourg top the list.
# Find GDP per Capita by Country, then sort on result
highest_gdps <- global_economy %>%
group_by(Country) %>%
arrange(desc(gdp_per_capita))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
highest_gdps
## # A tsibble: 15,150 x 10 [1Y]
## # Key: Country [263]
## # Groups: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7060236168. 7.18 NA NA NA 38132
## 2 Monaco MCO 2008 6476490406. 0.732 NA NA NA 35853
## 3 Liechtenstein LIE 2014 6657170923. NA NA NA NA 37127
## 4 Liechtenstein LIE 2013 6391735894. NA NA NA NA 36834
## 5 Monaco MCO 2013 6553372278. 9.57 NA NA NA 37971
## 6 Monaco MCO 2016 6468252212. 3.21 NA NA NA 38499
## 7 Liechtenstein LIE 2015 6268391521. NA NA NA NA 37403
## 8 Monaco MCO 2007 5867916781. 14.4 NA NA NA 35111
## 9 Liechtenstein LIE 2016 6214633651. NA NA NA NA 37666
## 10 Monaco MCO 2015 6258178995. 4.94 NA NA NA 38307
## # ℹ 15,140 more rows
## # ℹ 1 more variable: gdp_per_capita <dbl>
Not we can plot our results, let’s look at Monaco’s GDP per capita over time. We see a strong upward trend since the start of our dataset (1970)
monaco <- global_economy %>%
filter(Country == "Monaco")
# Plot results for Monaco
autoplot(monaco) + labs(x="Year", y="GDP per Capita (USD)", title="Monaco GDP per Capita over Time")
## Plot variable not specified, automatically selected `.vars = GDP`
## Warning: Removed 11 rows containing missing values (`geom_line()`).
First, plotting the USA GDP over time. There’s not much seasonal variation in this dataset, so a transformation isn’t likely necessary.
usa <- global_economy %>% filter(Country == "United States")
autoplot(usa)
## Plot variable not specified, automatically selected `.vars = GDP`
Next, we can plot from aus_livestock
bulls <- aus_livestock %>% filter (Animal == "Bulls, bullocks and steers" & State=="Victoria")
autoplot(bulls, Count)
We see much more seasonal variation from this time series unlike before.
We can us e a classic decomposition method from our text to break out
the seasonal, trend, and cyclic components of this time series.
bulls %>%
model(
classical_decomposition(Count, type = "additive")
) %>%
components() %>%
autoplot() +
labs(title = "Bulls, bullock and steer slaughters in Australia")
## Warning: Removed 6 rows containing missing values (`geom_line()`).
We can now
autoplot(vic_elec)
## Plot variable not specified, automatically selected `.vars = Demand`
Let’s transform the australian gas data using the X ARIMA method used in
the text
autoplot(aus_production, Gas)
aus_x11_dcmp <- aus_production %>%
model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) %>%
components()
autoplot(aus_x11_dcmp) +
labs(title = "X-11 Decomposition of total Australian Gas Production")
We can see the different time series components as a result of the X-11
decomposition.
We can use the SEATS method to decompose this dataset
# Use SEATS decomposition on Australian gas data
aus_gas_seats <- aus_production %>%
model(seats = X_13ARIMA_SEATS(Gas ~ seats())) %>%
components()
autoplot(aus_gas_seats) +
labs(title = "Decomposition of total Australian gas production")
First, let’s look at the canadian_gas time series data
with a basic autoplot
autoplot(canadian_gas)
## Plot variable not specified, automatically selected `.vars = Volume`
Let’s run and plot a Box-Cox transform to see if this transformation is
appropriate for this data.
lambda <- canadian_gas %>%
features(Volume, features=guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume, lambda))
In this case, the Box-cox transformation didn’t change much with a
returned value of \(\lambda=0.5767...\). Additionally, the
seasonal variation within this isn’t constant over our timeframe,
indicating our selected value of \(\lambda\) may not be optimal.
# Let's pick a single series value
retail <- aus_retail %>%
filter(`Series ID` == "A3349849A")
autoplot(retail)
## Plot variable not specified, automatically selected `.vars = Turnover`
Now we can apply Box-Cox transform to this dataset, we can try this with
varying values of lambda to see which
lambdas = seq(0,5, 0.5)
# Try BC transform for varying values of lambda, then plot
for (l in lambdas){
title <- glue("Box-Cox Transformation with lambda = {l}")
p <- retail %>% autoplot(box_cox(Turnover, l)) + labs(x="Month", y="Monthly Turnover", title=title)
print(p)
}
From our plots, we see a lower value of \(\lambda\) is a better transform for our data, as towards the higher end we see more variance in our seasonal variations (a “wider” cycle). We can verify this using the Guerrero feature listed in the book.
# Find ideal lambda for retail data
lambda <-retail %>%
features(Turnover, features=guerrero) %>%
pull(lambda_guerrero)
print(lambda)
## [1] 0.5054374
# Running Box-Cox on Tobacco series from aus production
tobacco_lambda <- aus_production %>% features(Tobacco, features=guerrero) %>%
pull(lambda_guerrero)
aus_production %>% autoplot(box_cox(Tobacco, tobacco_lambda)) + labs(x="Week", y="Tobacco Production (tonnes)")
## Warning: Removed 24 rows containing missing values (`geom_line()`).
Now let’s plot the
Economy passengers series transformed.
We’ll use the same Box-Cox transformation method used above. For this
data, we see some large dips in 1989 and 1993
# Filter and select down to economy passnegers between Melbourne and Sydney
economy <- ansett %>% filter(Class == "Economy" & Airports == "MEL-SYD") %>% select(Passengers)
economy_lambda <- economy %>% features(Passengers, features=guerrero) %>%
pull(lambda_guerrero)
print(economy_lambda)
## [1] 1.999927
# Plot the box cox transform!
economy %>% autoplot(box_cox(Passengers, economy_lambda))
Now we can run a Box-Cox transformation against the
pedestrian dtaa at Southern Cross Station
scs <- pedestrian %>% filter(Sensor == "Southern Cross Station")
scs_lambda <- scs %>% features(Count, features=guerrero) %>%
pull(lambda_guerrero)
print(scs_lambda)
## [1] -0.2501616
# Plot the box cox transform!
scs %>% autoplot(box_cox(Count, scs_lambda))
# time series def from the book
gas <- tail(tsibbledata::aus_production, 5*4) %>% select(Gas)
autoplot(gas, Gas)
There are quarterly seasonal fluctuations in this data, as well as a general upward trend throughout the timespan of this data (5 years). We can run a classical multiplicative decomposition on this
gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
autoplot() +
labs(title = "Classical multiplicative decomposition of Australian gas production (2006 - 2010)")
## Warning: Removed 2 rows containing missing values (`geom_line()`).
From our classical decomposition we see an upward trend in gas
production over the past 5 years and quarterly cycles. This is in line
with our observations above. We can take a 5-MA moving average to adjust
this data for seasonal cycles
# Run X-11 to make seasonal adjustment
gas_smoothed <- gas %>%
model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) %>% components()
# plot seasonally adjusted data
gas_smoothed |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = season_adjust))
In our seasonally-adjusted plot we see a clearer picture of the overall trend in our data for quarterly gas production
Now we can add an outlier point to our data and recompute the seasonal adjustment:
# Add 300 to a quarter gas value
gas_outlier <- gas %>% mutate(Gas = ifelse(`Quarter` == yearquarter("2005 Q3"), 471, Gas))
# Recompute saesonal adjustment (X-11 again) and plot
gas_smoothed <- gas_outlier %>%
model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) %>% components()
# plot seasonally adjusted data
gas_smoothed |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted (with Outlier)"))
We see a huge spike on our outlier value for Q3 2005.
This is close to the end of our series, let’s try one in the middle of
our series (Q1 2008)
# Add 300 to a quarter gas value
gas_outlier <- gas %>% mutate(Gas = ifelse(`Quarter` == yearquarter("2008 Q1"), 471, Gas))
# Recompute seasonal adjustment (X-11 again) and plot
gas_smoothed <- gas_outlier %>%
model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) %>% components()
# plot seasonally adjusted data
gas_smoothed |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted (with Outlier)"))
Here we see a huge spike in the middle, which makes it difficult to get a sense of scale with our trend cycle.
First, let’s visualize the raw data present in
aus_retail
autoplot(aus_retail)
## Plot variable not specified, automatically selected `.vars = Turnover`
Let’s first select a single series, using the method from Chapter 2 of the text
# Select single series ID
myseries <- aus_retail %>%
filter(`Series ID` == "A3349849A")
Now let’s decompose the series using the X-11 method, again, we’ll
mirror the methods used in the text using the feasts
library and it’s X-13 ARIMA SEATS method for seasonal adjustment:
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) + labs(title ="Decomposition of total Australian retail turnover using X-11.")
The decomposition in these charts shows a steady upwards trend in the data, with a seasonality of a month. There’s some outlier periods, which is emphasized by the remainder plot around 1991 - 1992. The seasonal component plot shows a wide variance in the level of the time series by month, with peaks in springtime (March) and December, and some strong dips. There’s also less of a lag pattern we can see from the
The recession is visible from the remainder plot below. This makes sense as a large recession would generally not be considered a seasonal event and would break the seasonal and trend cycles within this time series.