library(fpp3)
library(tidyverse)
library(ggrepel)
library(seasonal)
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
The country with the highest GDP per capita is Luxembourg in 2017 but just barely overall it looks like Macao SAR, China has had the highest historically. Over time the standings seems to have been pretty consistent throughout the graph with very little movement in the standings.
gdp_per_capita <- global_economy |>
mutate(GPC = GDP/Population)
data_ends <- gdp_per_capita |>
filter(Year == max(Year)) |>
arrange(desc(GPC)) |>
head(5)
gdp_per_capita |>
ggplot(aes(x=Year, y=GPC, color=Country)) +
geom_line() +
theme(legend.position = "none") +
geom_text_repel(aes(label = Country), data = data_ends)
For each of the following series, make a graph of the data. If
transforming seems appropriate, do so and describe the effect. -United
States GDP from global_economy
. -Slaughter of Victorian
“Bulls, bullocks and steers” in aus_livestock
. -Victorian
Electricity Demand from vic_elec
. -Gas production
fromaus_production
.
The most useful transformation for GDP is always to normalize it per capita. Although this is relatively simple and intuitive it is the best way to transform the data to provide the most data. The data also isn’t granular enough to produce any seasonal trends from it since it is only yearly data.
global_economy |>
filter(Code == 'USA') |>
ggplot(aes(x=Year, y=GDP/Population)) +
geom_line()
I was unsure how to approach transforming this data but there was a clear underlying seasonality to it so I decided that doing a STL decomposition would probably lead to the most informative graph. By applying this model to the data you can very clearly see the seasonal aspect of the data with a handful of outlier patches causing weirdness in the residuals. Overall though looking at the data this way is very informative.
aus_livestock |>
filter(State == 'Victoria',
Animal == 'Bulls, bullocks and steers') |>
model(
STL(Count ~ trend(window = 7) +
season(window = "periodic"),
robust = TRUE)) |>
components() |>
autoplot()
The data in vic_elec
being every 30 minutes was way too
granular and ended up creating a very noisy curve. In order to avoid
that it is possible to bucket by day, this way you have a much better
view of any trends without having to worry about all the noise. Another
view of vic_elec
I made was to average the data for every
hour so you can see the average Demand per hour throughout the day. This
allows for a different but still very useful view of the data which can
be used to predict how much Demand there will be based on the time.
There are many other ways to parse up this data into useful views but
these are just the ones I chose.
vic_elec |>
ggplot(aes(x=Time, y=Demand)) +
geom_line()
vic_elec_day <- vic_elec |>
as_tibble() |>
mutate(Time = floor_date(Time, unit = "1 day")) |>
group_by(Time) |>
summarise(Demand = sum(Demand)) |>
as_tsibble(index = Time) |>
mutate(Time = ymd(Time))
vic_elec_avghour <- vic_elec |>
as_tibble() |>
mutate(Time = hour(Time)) |>
group_by(Time) |>
summarise(Demand = mean(Demand)) |>
as_tsibble(index = Time)
vic_elec_day |>
ggplot(aes(x=Time, y=Demand)) +
geom_line()
vic_elec_day |>
select(Demand) |>
model(
STL(Demand ~ trend(window = 7) +
season(window = "periodic"),
robust = TRUE)) |>
components() |>
autoplot()
vic_elec_avghour |>
ggplot(aes(x=Time, y=Demand)) +
geom_line()
While it is probably best to perform the mathematical transformation on the gas data, the book does this exact example and I already did what the book did for GDP so I decided to change it up and look at it with a moving average approach. This basically shows the trend line within the data which is very clear in the resulting graph which helps remove some of the seasonality out of the data.
gas_ma <- aus_production |>
mutate(
`4-MA` = slider::slide_dbl(Gas, mean,
.before = 1, .after = 2, .complete = TRUE),
`2x4-MA` = slider::slide_dbl(`4-MA`, mean,
.before = 1, .after = 0, .complete = TRUE)
)
gas_ma |>
ggplot(aes(x=Quarter, y=Gas)) +
geom_line(aes(y = `2x4-MA`), colour = "#D55E00") +
geom_line()
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
Unfortunately for this data set the Box-Cox transformation does very little to change the variance. If you look at the original graph and the Box-Cox side-by-side you see very little difference. It seems that in trying to normalize the variance it doesn’t do much to change it at all. My theory is that the variance isn’t uniformly changing with time. As you can see the distance between the peaks and troughs start small at the beginning of the period, they then begin to grow toward the middle of the period but then begin to shrink at the end of the period. I believe this is causing issues because when the transform attempts to normalize one end of the spectrum it exacerbates the problem on the other end leading to the model choosing a \(\lambda\) so the curve stays nearly identical.
canadian_gas |> autoplot()
lambda <- canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
canadian_gas |>
autoplot(box_cox(Volume, lambda))
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
Using the modified Box-Cox model from Section 3.1 we can allow the
guerrero
feature to select a \(\lambda \approx -0.0788\) which definitely
does well to normalize the data to have the variance much more
equalized.
set.seed(87654321)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) |>
print()
## # A tsibble: 441 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Footwear and other personal accessory… A3349722T 1982 Apr 26.3
## 2 Victoria Footwear and other personal accessory… A3349722T 1982 May 27.1
## 3 Victoria Footwear and other personal accessory… A3349722T 1982 Jun 24.3
## 4 Victoria Footwear and other personal accessory… A3349722T 1982 Jul 25.6
## 5 Victoria Footwear and other personal accessory… A3349722T 1982 Aug 23.5
## 6 Victoria Footwear and other personal accessory… A3349722T 1982 Sep 24.3
## 7 Victoria Footwear and other personal accessory… A3349722T 1982 Oct 25.8
## 8 Victoria Footwear and other personal accessory… A3349722T 1982 Nov 29
## 9 Victoria Footwear and other personal accessory… A3349722T 1982 Dec 39.8
## 10 Victoria Footwear and other personal accessory… A3349722T 1983 Jan 25
## # ℹ 431 more rows
myseries |>
autoplot()
lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero) |>
print()
## [1] -0.07876808
myseries |>
autoplot(box_cox(Turnover, lambda))
## 3.5 For the following series, find an appropriate Box-Cox
transformation in order to stabilise the variance. Tobacco from
aus_production, Economy class passengers between Melbourne and Sydney
from ansett, and Pedestrian counts at Southern Cross Station from
pedestrian.
aus_production |>
autoplot(Tobacco)
lambda <- aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero) |>
print()
## [1] 0.9264636
aus_production |>
autoplot(box_cox(Tobacco, lambda))
economy <- ansett |>
filter(Airports == "MEL-SYD", Class == 'Economy') |>
print()
## # A tsibble: 282 x 4 [1W]
## # Key: Airports, Class [1]
## Week Airports Class Passengers
## <week> <chr> <chr> <dbl>
## 1 1987 W26 MEL-SYD Economy 20167
## 2 1987 W27 MEL-SYD Economy 20161
## 3 1987 W28 MEL-SYD Economy 19993
## 4 1987 W29 MEL-SYD Economy 20986
## 5 1987 W30 MEL-SYD Economy 20497
## 6 1987 W31 MEL-SYD Economy 20770
## 7 1987 W32 MEL-SYD Economy 21111
## 8 1987 W33 MEL-SYD Economy 20675
## 9 1987 W34 MEL-SYD Economy 22092
## 10 1987 W35 MEL-SYD Economy 20772
## # ℹ 272 more rows
economy |>
autoplot(Passengers)
lambda <- economy |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero) |>
print()
## [1] 1.999927
economy |>
autoplot(box_cox(Passengers, lambda))
ped <- pedestrian |>
filter(Sensor == "Southern Cross Station") |>
print()
## # A tsibble: 17,539 x 5 [1h] <Australia/Melbourne>
## # Key: Sensor [1]
## Sensor Date_Time Date Time Count
## <chr> <dttm> <date> <int> <int>
## 1 Southern Cross Station 2015-01-01 00:00:00 2015-01-01 0 746
## 2 Southern Cross Station 2015-01-01 01:00:00 2015-01-01 1 312
## 3 Southern Cross Station 2015-01-01 02:00:00 2015-01-01 2 180
## 4 Southern Cross Station 2015-01-01 03:00:00 2015-01-01 3 133
## 5 Southern Cross Station 2015-01-01 04:00:00 2015-01-01 4 44
## 6 Southern Cross Station 2015-01-01 05:00:00 2015-01-01 5 16
## 7 Southern Cross Station 2015-01-01 06:00:00 2015-01-01 6 13
## 8 Southern Cross Station 2015-01-01 07:00:00 2015-01-01 7 21
## 9 Southern Cross Station 2015-01-01 08:00:00 2015-01-01 8 39
## 10 Southern Cross Station 2015-01-01 09:00:00 2015-01-01 9 36
## # ℹ 17,529 more rows
ped |>
autoplot(Count)
lambda <- ped |>
features(Count, features = guerrero) |>
pull(lambda_guerrero) |>
print()
## [1] -0.2501616
ped |>
autoplot(box_cox(Count, lambda))
ped <- pedestrian |>
filter(Sensor == "Southern Cross Station") |>
as_tibble() |>
mutate(Time = floor_date(Date_Time, unit = "1 day")) |>
group_by(Time) |>
summarise(Count = sum(Count)) |>
as_tsibble(index = Time) |>
mutate(Time = ymd(Time)) |>
print()
## # A tsibble: 731 x 2 [1D]
## Time Count
## <date> <int>
## 1 2015-01-01 2813
## 2 2015-01-02 4648
## 3 2015-01-03 1428
## 4 2015-01-04 1347
## 5 2015-01-05 11483
## 6 2015-01-06 11513
## 7 2015-01-07 11059
## 8 2015-01-08 11577
## 9 2015-01-09 11315
## 10 2015-01-10 1703
## # ℹ 721 more rows
ped |>
autoplot(Count)
lambda <- ped |>
features(Count, features = guerrero) |>
pull(lambda_guerrero) |>
print()
## [1] 0.2726316
ped |>
autoplot(box_cox(Count, lambda))
aus_production |>
autoplot(Tobacco)
lambda <- aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero) |>
print()
## [1] 0.9264636
aus_production |>
autoplot(box_cox(Tobacco, lambda))
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas) |>print()
## # A tsibble: 20 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 221 2005 Q3
## 2 180 2005 Q4
## 3 171 2006 Q1
## 4 224 2006 Q2
## 5 233 2006 Q3
## 6 192 2006 Q4
## 7 187 2007 Q1
## 8 234 2007 Q2
## 9 245 2007 Q3
## 10 205 2007 Q4
## 11 194 2008 Q1
## 12 229 2008 Q2
## 13 249 2008 Q3
## 14 203 2008 Q4
## 15 196 2009 Q1
## 16 238 2009 Q2
## 17 252 2009 Q3
## 18 210 2009 Q4
## 19 205 2010 Q1
## 20 236 2010 Q2
It seems pretty consistent that the seasonal trend is a massive increase from Q1 to Q2, then a slight increase from Q2 to Q3, then a drastic decrease from Q3 to Q4 and then a slight descrease from Q4 to Q1.
gas |> autoplot()
b) Use classical_decomposition with type=multiplicative to calculate the
trend-cycle and seasonal indices.
gas |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
print()
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend * seasonal * random
## .model Quarter Gas trend seasonal random season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposition(G… 2005 Q3 221 NA 1.13 NA 196.
## 2 "classical_decomposition(G… 2005 Q4 180 NA 0.925 NA 195.
## 3 "classical_decomposition(G… 2006 Q1 171 200. 0.875 0.974 195.
## 4 "classical_decomposition(G… 2006 Q2 224 204. 1.07 1.02 209.
## 5 "classical_decomposition(G… 2006 Q3 233 207 1.13 1.00 207.
## 6 "classical_decomposition(G… 2006 Q4 192 210. 0.925 0.987 208.
## 7 "classical_decomposition(G… 2007 Q1 187 213 0.875 1.00 214.
## 8 "classical_decomposition(G… 2007 Q2 234 216. 1.07 1.01 218.
## 9 "classical_decomposition(G… 2007 Q3 245 219. 1.13 0.996 218.
## 10 "classical_decomposition(G… 2007 Q4 205 219. 0.925 1.01 222.
## 11 "classical_decomposition(G… 2008 Q1 194 219. 0.875 1.01 222.
## 12 "classical_decomposition(G… 2008 Q2 229 219 1.07 0.974 213.
## 13 "classical_decomposition(G… 2008 Q3 249 219 1.13 1.01 221.
## 14 "classical_decomposition(G… 2008 Q4 203 220. 0.925 0.996 219.
## 15 "classical_decomposition(G… 2009 Q1 196 222. 0.875 1.01 224.
## 16 "classical_decomposition(G… 2009 Q2 238 223. 1.07 0.993 222.
## 17 "classical_decomposition(G… 2009 Q3 252 225. 1.13 0.994 224.
## 18 "classical_decomposition(G… 2009 Q4 210 226 0.925 1.00 227.
## 19 "classical_decomposition(G… 2010 Q1 205 NA 0.875 NA 234.
## 20 "classical_decomposition(G… 2010 Q2 236 NA 1.07 NA 220.
Yes it does seem to support the graphical interpretation because the trend has a steady increase like the graph but the seasonal and season_adjust seems to follow what is expected as I described in my answer to part a.
gas |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
autoplot()
The outlier doesn’t do much to the graph aside from change the ends of the the trend a variability aspects. But at the end of the day that doesn’t change much about the graph.
gas2 <- gas
gas2[20,1] = gas2[20,1] + 300
gas2 |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
print()
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend * seasonal * random
## .model Quarter Gas trend seasonal random season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposition(G… 2005 Q3 221 NA 1.14 NA 195.
## 2 "classical_decomposition(G… 2005 Q4 180 NA 0.899 NA 200.
## 3 "classical_decomposition(G… 2006 Q1 171 200. 0.883 0.966 194.
## 4 "classical_decomposition(G… 2006 Q2 224 204. 1.08 1.02 207.
## 5 "classical_decomposition(G… 2006 Q3 233 207 1.14 0.992 205.
## 6 "classical_decomposition(G… 2006 Q4 192 210. 0.899 1.02 213.
## 7 "classical_decomposition(G… 2007 Q1 187 213 0.883 0.995 212.
## 8 "classical_decomposition(G… 2007 Q2 234 216. 1.08 1.00 216.
## 9 "classical_decomposition(G… 2007 Q3 245 219. 1.14 0.987 216.
## 10 "classical_decomposition(G… 2007 Q4 205 219. 0.899 1.04 228.
## 11 "classical_decomposition(G… 2008 Q1 194 219. 0.883 1.00 220.
## 12 "classical_decomposition(G… 2008 Q2 229 219 1.08 0.966 211.
## 13 "classical_decomposition(G… 2008 Q3 249 219 1.14 1.00 219.
## 14 "classical_decomposition(G… 2008 Q4 203 220. 0.899 1.02 226.
## 15 "classical_decomposition(G… 2009 Q1 196 222. 0.883 1.00 222.
## 16 "classical_decomposition(G… 2009 Q2 238 223. 1.08 0.985 220.
## 17 "classical_decomposition(G… 2009 Q3 252 225. 1.14 0.986 222.
## 18 "classical_decomposition(G… 2009 Q4 210 264. 0.899 0.886 233.
## 19 "classical_decomposition(G… 2010 Q1 205 NA 0.883 NA 232.
## 20 "classical_decomposition(G… 2010 Q2 536 NA 1.08 NA 495.
gas2 |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
autoplot()
In the middle it has much more of an impact where the trend line and variability get extremely distorted around the outlier but the seasonality remains pretty intact.
gas3 <- gas
gas3[11,1] = gas3[11,1] + 300
gas3 |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
print()
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend * seasonal * random
## .model Quarter Gas trend seasonal random season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposition(G… 2005 Q3 221 NA 1.05 NA 211.
## 2 "classical_decomposition(G… 2005 Q4 180 NA 0.868 NA 207.
## 3 "classical_decomposition(G… 2006 Q1 171 200. 1.08 0.792 159.
## 4 "classical_decomposition(G… 2006 Q2 224 204. 1.01 1.09 222.
## 5 "classical_decomposition(G… 2006 Q3 233 207 1.05 1.08 223.
## 6 "classical_decomposition(G… 2006 Q4 192 210. 0.868 1.05 221.
## 7 "classical_decomposition(G… 2007 Q1 187 213 1.08 0.815 174.
## 8 "classical_decomposition(G… 2007 Q2 234 216. 1.01 1.07 232.
## 9 "classical_decomposition(G… 2007 Q3 245 256. 1.05 0.915 234.
## 10 "classical_decomposition(G… 2007 Q4 205 294. 0.868 0.804 236.
## 11 "classical_decomposition(G… 2008 Q1 494 294. 1.08 1.56 459.
## 12 "classical_decomposition(G… 2008 Q2 229 294 1.01 0.771 227.
## 13 "classical_decomposition(G… 2008 Q3 249 256. 1.05 0.928 238.
## 14 "classical_decomposition(G… 2008 Q4 203 220. 0.868 1.06 234.
## 15 "classical_decomposition(G… 2009 Q1 196 222. 1.08 0.820 182.
## 16 "classical_decomposition(G… 2009 Q2 238 223. 1.01 1.06 236.
## 17 "classical_decomposition(G… 2009 Q3 252 225. 1.05 1.07 241.
## 18 "classical_decomposition(G… 2009 Q4 210 226 0.868 1.07 242.
## 19 "classical_decomposition(G… 2010 Q1 205 NA 1.08 NA 190.
## 20 "classical_decomposition(G… 2010 Q2 236 NA 1.01 NA 234.
gas3 |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
autoplot()
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
There is a huge component of variability that I previously did not notice in the early 2000’s which now that I know it’s there it’s easier to see where it is but it’s hard to notice if it’s not immediately pointed out to you.
myseries |>
autoplot()
myseries |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components() |>
autoplot()
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. Is the recession of 1991/1992 visible in the estimated components?
From the decomposition it is very easy to see the trend and seasonality of the labor force from the STL. The upward trend line is extremely distinct in the data but the trend part of the graph helps to see it without any of the seasonality. The seasonality can be clearly seen in the graph in the seasonality part of the graph and it really shows how regular the change from month to month is. In the remainder section there is a very clear delineation for when the 1990/1991 recession. You can also see the seasonality in the month to month average in Figure 3.20 and the distinct variability from year to year for each month with interesting trends for each month being different across the years.