library(fpp3)
library(seasonal)
library(USgas)
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
autoplot(global_economy, GDP/Population) +
labs(title= "GDP per Capita", y = "$US") +
theme(legend.position = "none")
The above plot illustrates the GDP per capita for each country over time. The legend has been removed because the large number of countries was rendering the graph imperceptible. In order to then identify the country with the highest GDP per capita, the code below was performed.
global_economy %>%
mutate(GDP_capita = GDP/Population) %>%
arrange(desc(GDP_capita))
## # A tsibble: 15,150 x 10 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population GDP_capita
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7.06e9 7.18 NA NA NA 38132 185153.
## 2 Monaco MCO 2008 6.48e9 0.732 NA NA NA 35853 180640.
## 3 Liecht… LIE 2014 6.66e9 NA NA NA NA 37127 179308.
## 4 Liecht… LIE 2013 6.39e9 NA NA NA NA 36834 173528.
## 5 Monaco MCO 2013 6.55e9 9.57 NA NA NA 37971 172589.
## 6 Monaco MCO 2016 6.47e9 3.21 NA NA NA 38499 168011.
## 7 Liecht… LIE 2015 6.27e9 NA NA NA NA 37403 167591.
## 8 Monaco MCO 2007 5.87e9 14.4 NA NA NA 35111 167125.
## 9 Liecht… LIE 2016 6.21e9 NA NA NA NA 37666 164993.
## 10 Monaco MCO 2015 6.26e9 4.94 NA NA NA 38307 163369.
## # ℹ 15,140 more rows
global_economy %>%
mutate(GDP_capita = GDP/Population) %>%
filter(Code == "MCO" | Code == "LIE", Year == 2016) %>%
arrange(desc(GDP_capita))
## # A tsibble: 2 x 10 [1Y]
## # Key: Country [2]
## Country Code Year GDP Growth CPI Imports Exports Population GDP_capita
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2016 6.47e9 3.21 NA NA NA 38499 168011.
## 2 Liechte… LIE 2016 6.21e9 NA NA NA NA 37666 164993.
The above tsibbles indicate Monaco to be the country with the highest GDP per capita; a trend that has held true since at least the 1970’s with Liechtenstein coming in at a close second.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.
autoplot(global_economy %>%
filter(Country == "United States"), GDP) +
labs(title= "US GDP")
The plot of United States GDP from global_economy is pretty smooth and shouldn’t require any transforming since there isn’t any variation that would need to be made uniform.
autoplot(aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers",
State == "Victoria"), Count) +
labs(title= "Slaughter of Victorian “Bulls, bullocks and steers”")
lambda_livestock <- aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers",
State == "Victoria") %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers",
State == "Victoria") %>%
autoplot(box_cox(Count, lambda_livestock)) +
labs(title= "Lambda Transformed Slaughter of Victorian “Bulls, bullocks and steers”")
The plot of Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock did require some transforming since the variation was not constant throughout. Interestingly enough, the value of lambda recommended using the guerrero feature was negative. The plot did seem more uniform after transforming.
autoplot(vic_elec, Demand) +
labs(title = "Victorian Electricity Demand")
lambda_elec <- vic_elec %>%
features(Demand, features = guerrero) %>%
pull(lambda_guerrero)
vic_elec %>%
autoplot(box_cox(Demand, lambda_elec)) +
labs(title = "Lambda Transformed Victorian Electricity Demand")
The plot of Victorian Electricity Demand from vic_elec was clearly seasonal and so one would expect the variation to also change depending on the time of year. A transformation was able to squeeze in the degree of variation slightly, however, not enough to make a notable difference. When it comes to repetitive data sets such as this one, there doesn’t seem to be such an obvious benefit to transformation.
autoplot(aus_production, Gas) +
labs(title= "Australian Gas Production")
lambda_gas <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda_gas)) +
labs(title = "Lambda Transformed Australian Gas Production")
The plot of Gas production from aus_production was apparent in it’s need for transformation. The beginning had way less variation than the end. In order to address this, a fitting lambda was calculated using the guerrero feature and a subsequent plot was graphed. The second plot is way more uniform in variation than the first.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
autoplot(canadian_gas, Volume) +
labs(title= "Canadian Gas")
lambda_canada <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume, lambda_canada)) +
labs(title = "Lambda Transformed Canadian Gas")
Box-Cox transformations seem to work by either exaggerating or minimizing already existing variations throughout a plot. This approach is perfect for graphs where there are two predominant patterns of variation within the data set and manipulation of said variation could bring them closer together. However, as is the case with the canadian_gas data, there seems to be no benefit in using Box-Cox transformations when there are more than 2 predominant patterns of variation. The canadian_gas data seems to have one variation in the first third of the data, another variation in the second, and yet another variation in the last third. Applying a Box-Cox transformation seems to squeeze in the variations without really causing them to come any closer to each other. This is to be expected since whatever transformation one chooses to perform is done to the entire data set and not just a portion.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(1)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries, Turnover) +
labs(title = "Retail Turnovers")
lambda_turnover <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda_turnover
## [1] -0.009186879
myseries %>%
autoplot(box_cox(Turnover, lambda_turnover)) +
labs(title = "Lambda Transformed Retail Turnovers")
Given the preceding guerrero lambda value and plot shown above, a natural log Box-Cox transformation would be ideal. Although we obtained a lambda of -0.009, this value is much closer to 0 than it is to -1, leading to the use of natural log rather than the inverse plus one.
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
autoplot(aus_production, Tobacco) +
labs(title = "Tobacco and Cigarette Production")
lambda_tobacco <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
lambda_tobacco
## [1] 0.9264636
aus_production %>%
autoplot(box_cox(Tobacco, lambda_tobacco)) +
labs(title = "Lambda Transformed Tobacco and Cigarette Production")
autoplot(ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD"), Passengers) +
labs(title = "Economy Class Passengers between Melbourne and Sydney")
lambda_econ <- ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
lambda_econ
## [1] 1.999927
ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
autoplot(box_cox(Passengers, lambda_econ)) +
labs(title = "Lambda Transformed Economy Class Passengers between Melbourne and Sydney")
autoplot(pedestrian %>%
filter(Sensor == "Southern Cross Station"), Count) +
labs(title = "Pedestrians at Southern Cross Station")
lambda_ped <- pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
lambda_ped
## [1] -0.2501616
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
autoplot(box_cox(Count, lambda_ped)) +
labs(title = "Lambda Transformed Pedestrians at Southern Cross Station")
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle? Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?
autoplot(gas)
The above plot illustrates strong seasonal fluctuation where gas production in Australia seems to dip Q4 and Q1 just to bounce back up for Q2 and Q3. The plot also has an upward trend where all quarters for next year’s cycle have a higher value than their respective quarters for the preceding year.
gas %>%
model(cd = classical_decomposition(Gas, type = "multiplicative")) %>%
components()
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend * seasonal * random
## .model Quarter Gas trend seasonal random season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 cd 2005 Q3 221 NA 1.13 NA 196.
## 2 cd 2005 Q4 180 NA 0.925 NA 195.
## 3 cd 2006 Q1 171 200. 0.875 0.974 195.
## 4 cd 2006 Q2 224 204. 1.07 1.02 209.
## 5 cd 2006 Q3 233 207 1.13 1.00 207.
## 6 cd 2006 Q4 192 210. 0.925 0.987 208.
## 7 cd 2007 Q1 187 213 0.875 1.00 214.
## 8 cd 2007 Q2 234 216. 1.07 1.01 218.
## 9 cd 2007 Q3 245 219. 1.13 0.996 218.
## 10 cd 2007 Q4 205 219. 0.925 1.01 222.
## 11 cd 2008 Q1 194 219. 0.875 1.01 222.
## 12 cd 2008 Q2 229 219 1.07 0.974 213.
## 13 cd 2008 Q3 249 219 1.13 1.01 221.
## 14 cd 2008 Q4 203 220. 0.925 0.996 219.
## 15 cd 2009 Q1 196 222. 0.875 1.01 224.
## 16 cd 2009 Q2 238 223. 1.07 0.993 222.
## 17 cd 2009 Q3 252 225. 1.13 0.994 224.
## 18 cd 2009 Q4 210 226 0.925 1.00 227.
## 19 cd 2010 Q1 205 NA 0.875 NA 234.
## 20 cd 2010 Q2 236 NA 1.07 NA 220.
Above is the resulting ‘dable’ from running the Gas data through classical decomposition with type = multiplicative. As we can see, values in the Gas field are correctly represented by the values found in the trend, seasonal, and random fields; these three fields multiplied give us what we see for their respective Gas value. This support the graphical interpretation from part a.
gas %>%
model(cd = classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust)
The above plot is an illustration of the seasonally adjusted data. One must first convert the resulting dable into a tsibble to then be able to autoplot the season_adjust field as a time series.
gas_outlier <- gas
gas_outlier$Gas[7] <- gas_outlier$Gas[7] + 300
gas_outlier %>%
model(cd = classical_decomposition(Gas, type = "multiplicative")) %>%
components()
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend * seasonal * random
## .model Quarter Gas trend seasonal random season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 cd 2005 Q3 221 NA 1.04 NA 212.
## 2 cd 2005 Q4 180 NA 0.867 NA 208.
## 3 cd 2006 Q1 171 200. 1.08 0.789 158.
## 4 cd 2006 Q2 224 204. 1.01 1.09 222.
## 5 cd 2006 Q3 233 244. 1.04 0.913 223.
## 6 cd 2006 Q4 192 285. 0.867 0.776 221.
## 7 cd 2007 Q1 487 288 1.08 1.56 450.
## 8 cd 2007 Q2 234 291. 1.01 0.798 232.
## 9 cd 2007 Q3 245 256. 1.04 0.916 235.
## 10 cd 2007 Q4 205 219. 0.867 1.08 236.
## 11 cd 2008 Q1 194 219. 1.08 0.820 179.
## 12 cd 2008 Q2 229 219 1.01 1.04 227.
## 13 cd 2008 Q3 249 219 1.04 1.09 238.
## 14 cd 2008 Q4 203 220. 0.867 1.06 234.
## 15 cd 2009 Q1 196 222. 1.08 0.817 181.
## 16 cd 2009 Q2 238 223. 1.01 1.06 236.
## 17 cd 2009 Q3 252 225. 1.04 1.07 241.
## 18 cd 2009 Q4 210 226 0.867 1.07 242.
## 19 cd 2010 Q1 205 NA 1.08 NA 190.
## 20 cd 2010 Q2 236 NA 1.01 NA 234.
gas_outlier %>%
model(cd = classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust)
Choosing an outlier towards the center of the data set results in more volatility throughout the rest of the seasonally adjusted data. Although data points farthest from the outlier seem to be closer to the values of the original, as we get closer to the outlier, so do the data points get farther from the original values.
gas_outlier1 <- gas
gas_outlier1$Gas[20] <- gas_outlier1$Gas[20] + 300
gas_outlier1 %>%
model(cd = classical_decomposition(Gas, type = "multiplicative")) %>%
components()
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend * seasonal * random
## .model Quarter Gas trend seasonal random season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 cd 2005 Q3 221 NA 1.14 NA 195.
## 2 cd 2005 Q4 180 NA 0.899 NA 200.
## 3 cd 2006 Q1 171 200. 0.883 0.966 194.
## 4 cd 2006 Q2 224 204. 1.08 1.02 207.
## 5 cd 2006 Q3 233 207 1.14 0.992 205.
## 6 cd 2006 Q4 192 210. 0.899 1.02 213.
## 7 cd 2007 Q1 187 213 0.883 0.995 212.
## 8 cd 2007 Q2 234 216. 1.08 1.00 216.
## 9 cd 2007 Q3 245 219. 1.14 0.987 216.
## 10 cd 2007 Q4 205 219. 0.899 1.04 228.
## 11 cd 2008 Q1 194 219. 0.883 1.00 220.
## 12 cd 2008 Q2 229 219 1.08 0.966 211.
## 13 cd 2008 Q3 249 219 1.14 1.00 219.
## 14 cd 2008 Q4 203 220. 0.899 1.02 226.
## 15 cd 2009 Q1 196 222. 0.883 1.00 222.
## 16 cd 2009 Q2 238 223. 1.08 0.985 220.
## 17 cd 2009 Q3 252 225. 1.14 0.986 222.
## 18 cd 2009 Q4 210 264. 0.899 0.886 233.
## 19 cd 2010 Q1 205 NA 0.883 NA 232.
## 20 cd 2010 Q2 536 NA 1.08 NA 495.
gas_outlier1 %>%
model(cd = classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust)
As can be seen in the dable and plot above, the position of the outlier does seem to influence its impact on the rest of the data points. Choosing an outlier towards the ends of the data set, or plot, appears to align much better with the original data set up until the arrival of the outlier as oppose to the volatility we saw in part e.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
## # A dable: 441 x 9 [1M]
## # Key: State, Industry, .model [1]
## # : Turnover = trend * seasonal * irregular
## State Industry .model Month Turnover trend seasonal irregular
## <chr> <chr> <chr> <mth> <dbl> <dbl> <dbl> <dbl>
## 1 Queensland Clothing retail… x11 1982 Apr 28.9 29.1 0.963 1.03
## 2 Queensland Clothing retail… x11 1982 May 30.6 28.9 1.05 1.01
## 3 Queensland Clothing retail… x11 1982 Jun 30.5 28.8 0.966 1.10
## 4 Queensland Clothing retail… x11 1982 Jul 27.9 28.9 0.984 0.981
## 5 Queensland Clothing retail… x11 1982 Aug 27.4 29.3 0.964 0.969
## 6 Queensland Clothing retail… x11 1982 Sep 29.1 30.1 0.997 0.969
## 7 Queensland Clothing retail… x11 1982 Oct 33.4 31.2 1.04 1.03
## 8 Queensland Clothing retail… x11 1982 Nov 35.5 32.3 1.08 1.02
## 9 Queensland Clothing retail… x11 1982 Dec 48.8 33.3 1.46 1.01
## 10 Queensland Clothing retail… x11 1983 Jan 29.7 33.9 0.881 0.994
## # ℹ 431 more rows
## # ℹ 1 more variable: season_adjust <dbl>
myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components() %>%
autoplot()
myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust)
The retail time series data (from Exercise 7 in Section 2.10), also known as ‘myseries’, perfectly illustrated how the number of turnovers had an upward trend throughout the data set with only a slight downturn between the years of 2005 and 2010. The above decomposed data, however, brings to light a few inconspicuous outliers between the years 1985 and 1990 and directly after 2000 in the irregularities component. We also see that the variability in seasonality grew throughout the years and that the downturn between 2005 and 2010 appear in the trend and irregularities components. In addition, the plot for the seasonally_adjusted data also clarifies that the 1999-2001 dip seen in the irregularities was an actual downturn which would have otherwise been attributed as part of seasonality in the original plot.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. Is the recession of 1991/1992 visible in the estimated components?
knitr::include_graphics('Figure3.19.png')
Image 1: Chapter 3 Exercise 9 STL Decomposition (3.7 Exercises | Forecasting: Principles and Practice (3rd Ed), 2025)
knitr::include_graphics('Figure3.20.png')
Image 2: Chapter 3 Exercise 9 Seasonal Component (3.7 Exercises | Forecasting: Principles and Practice (3rd Ed), 2025)
Looking at the STL decomposition plots, one can see an obvious upward trend throughout the data only slightly plateauing somewhere between the years 1990 and 1995. The seasonality held the same pattern throughout the series but had an increased distance between maximums towards the final years. The remainder showed an exaggerated dip somewhere between 1990 and 1992.
Observing Figures 3.19 and 3.20, one could say the recession of 1991/1992 is visible in the estimated components. This is especially true when looking at the remainder component of the decomposition and is slightly detectable in the trend component. The seasonal plot also illustrates the recession in that all months were going either downward or plateauing from 1990 to 1995 with the exceptions of February, September and October.
3.7 Exercises | Forecasting: Principles and Practice (3rd ed). (2025). Otexts.com. https://otexts.com/fpp3/decomposition-exercises.html