Exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.0 ✔ feasts 0.4.1
## ✔ lubridate 1.9.2 ✔ fable 0.4.1
## ✔ ggplot2 3.5.0
## Warning: package 'dplyr' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'tsibble' was built under R version 4.3.3
## Warning: package 'tsibbledata' was built under R version 4.3.3
## Warning: package 'feasts' was built under R version 4.3.3
## Warning: package 'fabletools' was built under R version 4.3.3
## Warning: package 'fable' was built under R version 4.3.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(dplyr)
library(tsibble)
global_economy %>%
autoplot(GDP / Population, show.legend = FALSE) +
labs(title= "GDP per Capita", x="Year")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
# The years range from 1960-2017
# unique(global_economy$Year)
# Country with highest GDP per Capita
global_economy %>%
mutate(GDP_per_Cap = GDP/Population) %>%
slice_max(GDP_per_Cap, n = 10) %>%
distinct()
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
## # A tibble: 10 × 10
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7060236168. 7.18 NA NA NA 38132
## 2 Monaco MCO 2008 6476490406. 0.732 NA NA NA 35853
## 3 Liechtenstein LIE 2014 6657170923. NA NA NA NA 37127
## 4 Liechtenstein LIE 2013 6391735894. NA NA NA NA 36834
## 5 Monaco MCO 2013 6553372278. 9.57 NA NA NA 37971
## 6 Monaco MCO 2016 6468252212. 3.21 NA NA NA 38499
## 7 Liechtenstein LIE 2015 6268391521. NA NA NA NA 37403
## 8 Monaco MCO 2007 5867916781. 14.4 NA NA NA 35111
## 9 Liechtenstein LIE 2016 6214633651. NA NA NA NA 37666
## 10 Monaco MCO 2015 6258178995. 4.94 NA NA NA 38307
## # ℹ 1 more variable: GDP_per_Cap <dbl>
# Plotting the 2 countries
global_economy %>%
filter(Country == c("Monaco", "Liechtenstein")) %>%
autoplot(GDP/Population) +
labs(title= "GDP per Capita", subtitle ="Monaco & Liechtenstein")
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).
The country with the greatest GDP per Capita is Monaco in 2014/2008, followed closely by Liechtenstein in 2014/2013. The dataset ranges from the time-period of 1960 to 2017, and from looking at the trends for those 2 countries, which started in 1970, that Monaco has had a higher GDP/Capita for most of the time, with a brief time period from 2010-2013 where Liechtenstein was higher as Monaco experienced a drop.
[United States GDP from global_economy.]
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP) +
labs(title="GDP", subtitle="United States")
global_economy %>%
filter(Country == "United States") %>%
autoplot(sqrt(GDP)) +
ggtitle("GDP: Square-Root Transform", subtitle="United States")
The trend seems to be an obvious quadratic increase, with a drop in 2008/2009 due to the Housing Crisis Bubble, and the Square-Root transform seemed to straighten out the original plot a bit.
[Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.]
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
autoplot(Count) +
ggtitle("Slaughter of Victorian Bulls, Bullocks and Steers", subtitle="State of Victoria")
# Normalize with Log transform?
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
autoplot(log(Count)) +
ggtitle("Slaughter of Victorian Bulls, Bullocks and Steers: Log-Transform", subtitle="State of Victoria")
# Trying a Box-Cox transform
lambda <- aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
features(Count,features = guerrero) %>%
pull(lambda_guerrero) # -0.04461887
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
autoplot(box_cox(Count,lambda)) +
labs(title = paste("Slaughter of Victorian Bulls, Bullocks and Steers: Box-Cox-Transform, λ = ", round(lambda,3)), subtitle="State of Victoria")
The slaughter of Victorian bulls, bullocks, and steers is rather erratic with quite a bit of white noise, although there does appear be cyclical trends of increases and decreases. Performing a log transform didn’t help make it more interpretable. When I run a Box-Cox transform, I get a λ of -0.0444619, which means it is a Power Transform followed by scaling. A good value of λ is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler.
[Victorian Electricity Demand from vic_elec.]
vic_elec %>%
autoplot(Demand) +
ggtitle("Half-Hour Electricity Demand for Australia", subtitle="State of Victoria")
vic_elec %>%
autoplot(log(Demand)) +
ggtitle("Half-Hour Electricity Demand for Australia: Log Transform", subtitle="State of Victoria")
# Daily
vic_elec %>%
index_by(Date) %>%
summarise(Total_Demand = sum(Demand)) %>%
autoplot(Total_Demand) +
ggtitle("Daily Electricity Demand for Australia", subtitle="State of Victoria")
# Weekly
vic_elec %>%
mutate(Week = yearweek(Time)) %>%
index_by(Week) %>%
summarise(Total_Demand = sum(Demand)) %>%
autoplot(Total_Demand) +
ggtitle("Weekly Electricity Demand for Australia", subtitle="State of Victoria")
The most effective transformation would be to change it from half-hourly to a different time period, in this case daily or weekly to remove much of the noise. However, there is still no effective transformation, and there is no immediate noticeable trend as electricity is always in high demand.
[Gas production from aus_production.]
aus_production %>%
autoplot(Gas) +
labs(title = "Australian Gas Production")
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) +
labs(title = paste("Australian Gas Production: Box-Cox-Transform, λ = ", round(lambda,3)))
The auto-plot immediately reveals that a Box-Cox transformation would be effective here, as it looks like the variation increases with the level of series. The λ value is 0.10951
canadian_gas %>%
autoplot(Volume) +
labs(title = "Canadian Gas Volume")
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume, lambda)) +
labs(title = paste("Canadian Gas Volume: Box-Cox-Transform, λ = ", round(lambda,3)))
A Box-Cox transformation is unhelpful for this data because the variation doesn’t only increase or decrease with the level of the series. It starts off with small variations, then larger ones from 1970-1990, and drops back down to small fluctuations until 2010. Here, λ is a larger value (0.5767648).
set.seed(22397)
my_series <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
autoplot(my_series, Turnover) +
labs(title = "Retail Turnover")
lambda <- my_series %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
my_series %>%
autoplot(box_cox(Turnover, lambda)) +
labs(title = paste("Retail Turnover: Box-Cox Transform, λ = ", round(lambda,3)))
My Box-Cox transformation would have a λ of-0.05317855 to help normalize the auto-plot
Tobacco from aus_production
aus_production %>%
autoplot(Tobacco) +
labs(title = "Tobacco Production")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco, lambda)) +
labs(title = paste("Tobacco Production: Box-Cox Transform, λ = ", round(lambda,3)))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
Economy class passengers between Melbourne and Sydney from ansett
ansett %>%
filter(Class == "Economy" & Airports == "MEL-SYD") %>%
autoplot(Passengers) +
labs(title = "Economy Class Passengers", subtitle = "Melbourne and Sydney")
lambda <- ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
features(Passengers,features = guerrero) %>%
pull(lambda_guerrero)
ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
autoplot(box_cox(Passengers, lambda)) +
labs(title = paste("Economy Class Passengers: Box-Cox Transform, λ = ", round(lambda,3)), subtitle = "Melbourne and Sydney")
Pedestrian counts at Southern Cross Station from pedestrian
southern_cross <- pedestrian %>%
filter(Sensor == "Southern Cross Station")
southern_cross %>%
autoplot(Count) +
labs(title="Hourly Pedestrian Counts", subtitle = "Southern Cross Station")
lambda1 <- southern_cross %>%
features(Count,features = guerrero) %>%
pull(lambda_guerrero)
southern_cross %>%
autoplot(box_cox(Count,lambda1)) +
labs(title = paste("Hourly Pedestrian Counts: Box-Cox Transform, λ = ", round(lambda1,2)), subtitle = "Southern Cross Station")
# Still un-interpretable, change the time scale
southern_cross <- southern_cross %>%
mutate(Week = yearweek(Date)) %>%
index_by(Week) %>%
summarise(Count = sum(Count))
autoplot(southern_cross, Count)+
labs(title="Weekly Pedestrian Counts", subtitle = "Southern Cross Station")
lambda2 <- southern_cross %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
southern_cross %>%
autoplot(box_cox(Count, lambda2)) +
labs(title = paste("Weekly Pedestrian Counts: Box-Cox Transform, λ = ", round(lambda2,3)), subtitle = "Southern Cross Station")
gas <- tail(aus_production, 5*4) %>%
select(Gas)
gas %>%
autoplot(Gas) +
labs(title = "Quarterly Gas Production")
There is a clear trend showing positive growth, along with seasonal fluctuations that show a drop before rising again to a higher level than when the drop first started. The time-span of each of these is 1 year, as we can see the Gas volume in 2005Q3 was 221, then dropping to 171 in 2006Q1, and rising again to 233 in 2006Q3 for example.
gas_dcmp <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
gas_dcmp %>% autoplot() +
labs(title = "Quarterly Gas Production: Classical Multiplicative Decomposition")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
The results confirm my notes above, there is a clear rising trend, along with seasonal fluctuations of near identical volume. Thanks to the decomposition chart, we can also see there are random fluctuations as well, although they are minuscule in amplitude and have a minor impact on the overall visual.
gas_dcmp <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative"))
# Extract components
gas_dcmp_components <- components(gas_dcmp)
# Plot the seasonally adjusted data
gas_dcmp_components %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y = Gas / seasonal), colour = "green") + # Seasonally adjusted is Gas / seasonal component
labs(title = "Quarterly Gas Production: Seasonally Adjusted")
# Add 300 to the last row (236 -> 536)
gas_outlier <- gas
gas_outlier[nrow(gas_outlier), "Gas"] <- gas_outlier[nrow(gas_outlier), "Gas"] + 300
# Classical decomposition
gas_dcmp_outlier <- gas_outlier %>%
model(classical_decomposition(Gas, type = "multiplicative"))
gas_dcmp_outlier_components <- components(gas_dcmp_outlier)
# Regular plot
gas_dcmp_outlier_components %>% autoplot() +
labs(title = "Quarterly Gas Production + End Outlier: Classical Multiplicative Decomposition")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Seasonally adjusted data
gas_dcmp_outlier_components %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y = Gas / seasonal), colour = "green") + # Seasonally adjusted (Gas / seasonal component)
labs(title = "Quarterly Gas Production + End Outlier: Seasonally Adjusted")
After adding the outlier value of 300 to the last observation, which is 236 in 2010Q2, the classical decomposition model shows the impact in the trend having an overall large spike. The seasonal has no impact, but interestingly enough, the random component has a large drop as the outlier is very large relative to the rest of the observations in gas.
gas_outlier_middle <- gas
# Classical decomposition
gas_dcmp_outlier <- gas_outlier_middle %>%
model(classical_decomposition(Gas, type = "multiplicative"))
gas_dcmp_outlier_components <- components(gas_dcmp_outlier)
# Regular plot
gas_dcmp_outlier_components %>% autoplot() +
labs(title = "Quarterly Gas Production + Middle Outlier: Classical Multiplicative Decomposition")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Seasonal
gas_outlier_middle %>%
mutate(Gas = if_else(Quarter==yearquarter("2007Q4"), Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust) +
labs(title = "Quarterly Gas Production + Middle Outlier: Seasonally Adjusted")
After adding the outlier value of 300 to the middle observation, in the decomposition plot, we can see that the trend no longer has a spike at the end, but a small bump in the middle, before rising again towards the end. The random portion of the plot also shows an uptick, while the seasonal trend stays the same. However, the seasonally adjusted plot shows the outlier in the middle as expected.
set.seed(22397)
x11_dcmp <- my_series %>%
model(X11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title = "Retail Turnover: X-11 Decomposition")
After decomposing my series, I notice that the seasonality does actually vary with time- there is a decrease in 1995, then a rise in 2005, after which it seems to be rising and dropping again at odd intervals. There are also 4 large spikes in the Irregularities section, along with an overall increase in the trend apart from the peaks and valleys from 2005-2015. My original plot for my series was very irregular, but this X-11 decomposition helps to see where the changes are occurring, even after my Box-Cox transformation.
The decomposition shows a clear positive trend with the number of civilians in the labor force, with consistent seasonal fluctuations occurring over a period of 1 year. There is a significant drop from 1991/1992, which shows the recession in the remainder component. The scale shows that the seasonal fluctuations and remainder are rather small in scale in relation to the overall trend-cycle chart. This implies that seasonality does not have much bearing on the labor force, as it grows at larger rates. The seasonal component plot implies that the average on-boarding of people joining the labor force is higher in the months of March and December.
Absolutely it is, as we can see a large drop in the remainder plot.