DATA624_HW2_CameronGray

library(fpp3)
library(tidyverse)
library(ggrepel)
library(seasonal)

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

The country with the highest GDP per capita is Luxembourg in 2017 but just barely overall it looks like Macao SAR, China has had the highest historically. Over time the standings seems to have been pretty consistent throughout the graph with very little movement in the standings.

gdp_per_capita <- global_economy |>
  mutate(GPC = GDP/Population)

data_ends <- gdp_per_capita |> 
  filter(Year == max(Year)) |>
  arrange(desc(GPC)) |> 
  head(5)

gdp_per_capita |>
  ggplot(aes(x=Year, y=GPC, color=Country)) +
  geom_line() +
  theme(legend.position = "none") +
  geom_text_repel(aes(label = Country), data = data_ends)

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect. -United States GDP from global_economy. -Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. -Victorian Electricity Demand from vic_elec. -Gas production fromaus_production.

The most useful transformation for GDP is always to normalize it per capita. Although this is relatively simple and intuitive it is the best way to transform the data to provide the most data. The data also isn’t granular enough to produce any seasonal trends from it since it is only yearly data.

global_economy |>
  filter(Code == 'USA') |>
  ggplot(aes(x=Year, y=GDP/Population)) +
  geom_line()

I was unsure how to approach transforming this data but there was a clear underlying seasonality to it so I decided that doing a STL decomposition would probably lead to the most informative graph. By applying this model to the data you can very clearly see the seasonal aspect of the data with a handful of outlier patches causing weirdness in the residuals. Overall though looking at the data this way is very informative.

aus_livestock |>
  filter(State == 'Victoria',
         Animal == 'Bulls, bullocks and steers') |>
  model(
    STL(Count ~ trend(window = 7) +
                   season(window = "periodic"),
    robust = TRUE)) |>
  components() |>
  autoplot()

The data in vic_elec being every 30 minutes was way too granular and ended up creating a very noisy curve. In order to avoid that it is possible to bucket by day, this way you have a much better view of any trends without having to worry about all the noise. Another view of vic_elec I made was to average the data for every hour so you can see the average Demand per hour throughout the day. This allows for a different but still very useful view of the data which can be used to predict how much Demand there will be based on the time. There are many other ways to parse up this data into useful views but these are just the ones I chose.

vic_elec |>
  ggplot(aes(x=Time, y=Demand)) +
  geom_line()

vic_elec_day <- vic_elec |>
  as_tibble() |>
  mutate(Time = floor_date(Time, unit = "1 day")) |>
  group_by(Time) |>
  summarise(Demand = sum(Demand)) |>
  as_tsibble(index = Time) |>
  mutate(Time = ymd(Time))

vic_elec_avghour <- vic_elec |>
  as_tibble() |>
  mutate(Time = hour(Time)) |>
  group_by(Time) |>
  summarise(Demand = mean(Demand)) |>
  as_tsibble(index = Time) 

vic_elec_day |>
  ggplot(aes(x=Time, y=Demand)) +
  geom_line()

vic_elec_day |>
  select(Demand) |>
  model(
    STL(Demand ~ trend(window = 7) +
                   season(window = "periodic"),
    robust = TRUE)) |>
  components() |>
  autoplot()

vic_elec_avghour |>
  ggplot(aes(x=Time, y=Demand)) +
  geom_line()

While it is probably best to perform the mathematical transformation on the gas data, the book does this exact example and I already did what the book did for GDP so I decided to change it up and look at it with a moving average approach. This basically shows the trend line within the data which is very clear in the resulting graph which helps remove some of the seasonality out of the data.

gas_ma <- aus_production |>
  mutate(
    `4-MA` = slider::slide_dbl(Gas, mean, 
                .before = 1, .after = 2, .complete = TRUE),
    `2x4-MA` = slider::slide_dbl(`4-MA`, mean,
                .before = 1, .after = 0, .complete = TRUE)
  )

gas_ma |>
  ggplot(aes(x=Quarter, y=Gas)) +
  geom_line(aes(y = `2x4-MA`), colour = "#D55E00") +
  geom_line()

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Unfortunately for this data set the Box-Cox transformation does very little to change the variance. If you look at the original graph and the Box-Cox side-by-side you see very little difference. It seems that in trying to normalize the variance it doesn’t do much to change it at all. My theory is that the variance isn’t uniformly changing with time. As you can see the distance between the peaks and troughs start small at the beginning of the period, they then begin to grow toward the middle of the period but then begin to shrink at the end of the period. I believe this is causing issues because when the transform attempts to normalize one end of the spectrum it exacerbates the problem on the other end leading to the model choosing a \(\lambda\) so the curve stays nearly identical.

canadian_gas |> autoplot()

lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)
canadian_gas |>
  autoplot(box_cox(Volume, lambda))

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Using the modified Box-Cox model from Section 3.1 we can allow the guerrero feature to select a \(\lambda \approx -0.0788\) which definitely does well to normalize the data to have the variance much more equalized.

set.seed(87654321)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) |>
  print()

## # A tsibble: 441 x 5 [1M]
## # Key:       State, Industry [1]
##    State    Industry                               `Series ID`    Month Turnover
##    <chr>    <chr>                                  <chr>          <mth>    <dbl>
##  1 Victoria Footwear and other personal accessory… A3349722T   1982 Apr     26.3
##  2 Victoria Footwear and other personal accessory… A3349722T   1982 May     27.1
##  3 Victoria Footwear and other personal accessory… A3349722T   1982 Jun     24.3
##  4 Victoria Footwear and other personal accessory… A3349722T   1982 Jul     25.6
##  5 Victoria Footwear and other personal accessory… A3349722T   1982 Aug     23.5
##  6 Victoria Footwear and other personal accessory… A3349722T   1982 Sep     24.3
##  7 Victoria Footwear and other personal accessory… A3349722T   1982 Oct     25.8
##  8 Victoria Footwear and other personal accessory… A3349722T   1982 Nov     29  
##  9 Victoria Footwear and other personal accessory… A3349722T   1982 Dec     39.8
## 10 Victoria Footwear and other personal accessory… A3349722T   1983 Jan     25  
## # ℹ 431 more rows

myseries |>
  autoplot()

lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero) |>
  print()

## [1] -0.07876808

myseries |>
  autoplot(box_cox(Turnover, lambda))

## 3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

aus_production |>
  autoplot(Tobacco)

lambda <- aus_production |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero) |>
  print()

## [1] 0.9264636

aus_production |>
  autoplot(box_cox(Tobacco, lambda))

economy <- ansett |>
  filter(Airports == "MEL-SYD", Class == 'Economy') |>
  print()

## # A tsibble: 282 x 4 [1W]
## # Key:       Airports, Class [1]
##        Week Airports Class   Passengers
##      <week> <chr>    <chr>        <dbl>
##  1 1987 W26 MEL-SYD  Economy      20167
##  2 1987 W27 MEL-SYD  Economy      20161
##  3 1987 W28 MEL-SYD  Economy      19993
##  4 1987 W29 MEL-SYD  Economy      20986
##  5 1987 W30 MEL-SYD  Economy      20497
##  6 1987 W31 MEL-SYD  Economy      20770
##  7 1987 W32 MEL-SYD  Economy      21111
##  8 1987 W33 MEL-SYD  Economy      20675
##  9 1987 W34 MEL-SYD  Economy      22092
## 10 1987 W35 MEL-SYD  Economy      20772
## # ℹ 272 more rows

economy |>
  autoplot(Passengers)

lambda <- economy |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero) |>
  print()

## [1] 1.999927

economy |>
  autoplot(box_cox(Passengers, lambda))

ped <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  print()

## # A tsibble: 17,539 x 5 [1h] <Australia/Melbourne>
## # Key:       Sensor [1]
##    Sensor                 Date_Time           Date        Time Count
##    <chr>                  <dttm>              <date>     <int> <int>
##  1 Southern Cross Station 2015-01-01 00:00:00 2015-01-01     0   746
##  2 Southern Cross Station 2015-01-01 01:00:00 2015-01-01     1   312
##  3 Southern Cross Station 2015-01-01 02:00:00 2015-01-01     2   180
##  4 Southern Cross Station 2015-01-01 03:00:00 2015-01-01     3   133
##  5 Southern Cross Station 2015-01-01 04:00:00 2015-01-01     4    44
##  6 Southern Cross Station 2015-01-01 05:00:00 2015-01-01     5    16
##  7 Southern Cross Station 2015-01-01 06:00:00 2015-01-01     6    13
##  8 Southern Cross Station 2015-01-01 07:00:00 2015-01-01     7    21
##  9 Southern Cross Station 2015-01-01 08:00:00 2015-01-01     8    39
## 10 Southern Cross Station 2015-01-01 09:00:00 2015-01-01     9    36
## # ℹ 17,529 more rows

ped |>
  autoplot(Count)

lambda <- ped |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero) |>
  print()

## [1] -0.2501616

ped |>
  autoplot(box_cox(Count, lambda))

ped <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  as_tibble() |>
  mutate(Time = floor_date(Date_Time, unit = "1 day")) |>
  group_by(Time) |>
  summarise(Count = sum(Count)) |>
  as_tsibble(index = Time) |>
  mutate(Time = ymd(Time)) |>
  print()

## # A tsibble: 731 x 2 [1D]
##    Time       Count
##    <date>     <int>
##  1 2015-01-01  2813
##  2 2015-01-02  4648
##  3 2015-01-03  1428
##  4 2015-01-04  1347
##  5 2015-01-05 11483
##  6 2015-01-06 11513
##  7 2015-01-07 11059
##  8 2015-01-08 11577
##  9 2015-01-09 11315
## 10 2015-01-10  1703
## # ℹ 721 more rows

ped |>
  autoplot(Count)

lambda <- ped |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero) |>
  print()

## [1] 0.2726316

ped |>
  autoplot(box_cox(Count, lambda))

aus_production |>
  autoplot(Tobacco)

lambda <- aus_production |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero) |>
  print()

## [1] 0.9264636

aus_production |>
  autoplot(box_cox(Tobacco, lambda))

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas) |>print()

## # A tsibble: 20 x 2 [1Q]
##      Gas Quarter
##    <dbl>   <qtr>
##  1   221 2005 Q3
##  2   180 2005 Q4
##  3   171 2006 Q1
##  4   224 2006 Q2
##  5   233 2006 Q3
##  6   192 2006 Q4
##  7   187 2007 Q1
##  8   234 2007 Q2
##  9   245 2007 Q3
## 10   205 2007 Q4
## 11   194 2008 Q1
## 12   229 2008 Q2
## 13   249 2008 Q3
## 14   203 2008 Q4
## 15   196 2009 Q1
## 16   238 2009 Q2
## 17   252 2009 Q3
## 18   210 2009 Q4
## 19   205 2010 Q1
## 20   236 2010 Q2

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

It seems pretty consistent that the seasonal trend is a massive increase from Q1 to Q2, then a slight increase from Q2 to Q3, then a drastic decrease from Q3 to Q4 and then a slight descrease from Q4 to Q1.

gas |> autoplot()

b) Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  print()

## # A dable: 20 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend * seasonal * random
##    .model                      Quarter   Gas trend seasonal random season_adjust
##    <chr>                         <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
##  1 "classical_decomposition(G… 2005 Q3   221   NA     1.13  NA              196.
##  2 "classical_decomposition(G… 2005 Q4   180   NA     0.925 NA              195.
##  3 "classical_decomposition(G… 2006 Q1   171  200.    0.875  0.974          195.
##  4 "classical_decomposition(G… 2006 Q2   224  204.    1.07   1.02           209.
##  5 "classical_decomposition(G… 2006 Q3   233  207     1.13   1.00           207.
##  6 "classical_decomposition(G… 2006 Q4   192  210.    0.925  0.987          208.
##  7 "classical_decomposition(G… 2007 Q1   187  213     0.875  1.00           214.
##  8 "classical_decomposition(G… 2007 Q2   234  216.    1.07   1.01           218.
##  9 "classical_decomposition(G… 2007 Q3   245  219.    1.13   0.996          218.
## 10 "classical_decomposition(G… 2007 Q4   205  219.    0.925  1.01           222.
## 11 "classical_decomposition(G… 2008 Q1   194  219.    0.875  1.01           222.
## 12 "classical_decomposition(G… 2008 Q2   229  219     1.07   0.974          213.
## 13 "classical_decomposition(G… 2008 Q3   249  219     1.13   1.01           221.
## 14 "classical_decomposition(G… 2008 Q4   203  220.    0.925  0.996          219.
## 15 "classical_decomposition(G… 2009 Q1   196  222.    0.875  1.01           224.
## 16 "classical_decomposition(G… 2009 Q2   238  223.    1.07   0.993          222.
## 17 "classical_decomposition(G… 2009 Q3   252  225.    1.13   0.994          224.
## 18 "classical_decomposition(G… 2009 Q4   210  226     0.925  1.00           227.
## 19 "classical_decomposition(G… 2010 Q1   205   NA     0.875 NA              234.
## 20 "classical_decomposition(G… 2010 Q2   236   NA     1.07  NA              220.

Do the results support the graphical interpretation from part a?

Yes it does seem to support the graphical interpretation because the trend has a steady increase like the graph but the seasonal and season_adjust seems to follow what is expected as I described in my answer to part a.

Compute and plot the seasonally adjusted data.

gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  autoplot()

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

The outlier doesn’t do much to the graph aside from change the ends of the the trend a variability aspects. But at the end of the day that doesn’t change much about the graph.

gas2 <- gas
gas2[20,1] = gas2[20,1] + 300

gas2 |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  print()

## # A dable: 20 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend * seasonal * random
##    .model                      Quarter   Gas trend seasonal random season_adjust
##    <chr>                         <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
##  1 "classical_decomposition(G… 2005 Q3   221   NA     1.14  NA              195.
##  2 "classical_decomposition(G… 2005 Q4   180   NA     0.899 NA              200.
##  3 "classical_decomposition(G… 2006 Q1   171  200.    0.883  0.966          194.
##  4 "classical_decomposition(G… 2006 Q2   224  204.    1.08   1.02           207.
##  5 "classical_decomposition(G… 2006 Q3   233  207     1.14   0.992          205.
##  6 "classical_decomposition(G… 2006 Q4   192  210.    0.899  1.02           213.
##  7 "classical_decomposition(G… 2007 Q1   187  213     0.883  0.995          212.
##  8 "classical_decomposition(G… 2007 Q2   234  216.    1.08   1.00           216.
##  9 "classical_decomposition(G… 2007 Q3   245  219.    1.14   0.987          216.
## 10 "classical_decomposition(G… 2007 Q4   205  219.    0.899  1.04           228.
## 11 "classical_decomposition(G… 2008 Q1   194  219.    0.883  1.00           220.
## 12 "classical_decomposition(G… 2008 Q2   229  219     1.08   0.966          211.
## 13 "classical_decomposition(G… 2008 Q3   249  219     1.14   1.00           219.
## 14 "classical_decomposition(G… 2008 Q4   203  220.    0.899  1.02           226.
## 15 "classical_decomposition(G… 2009 Q1   196  222.    0.883  1.00           222.
## 16 "classical_decomposition(G… 2009 Q2   238  223.    1.08   0.985          220.
## 17 "classical_decomposition(G… 2009 Q3   252  225.    1.14   0.986          222.
## 18 "classical_decomposition(G… 2009 Q4   210  264.    0.899  0.886          233.
## 19 "classical_decomposition(G… 2010 Q1   205   NA     0.883 NA              232.
## 20 "classical_decomposition(G… 2010 Q2   536   NA     1.08  NA              495.

gas2 |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  autoplot()

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

In the middle it has much more of an impact where the trend line and variability get extremely distorted around the outlier but the seasonality remains pretty intact.

gas3 <- gas
gas3[11,1] = gas3[11,1] + 300

gas3 |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  print()

## # A dable: 20 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend * seasonal * random
##    .model                      Quarter   Gas trend seasonal random season_adjust
##    <chr>                         <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
##  1 "classical_decomposition(G… 2005 Q3   221   NA     1.05  NA              211.
##  2 "classical_decomposition(G… 2005 Q4   180   NA     0.868 NA              207.
##  3 "classical_decomposition(G… 2006 Q1   171  200.    1.08   0.792          159.
##  4 "classical_decomposition(G… 2006 Q2   224  204.    1.01   1.09           222.
##  5 "classical_decomposition(G… 2006 Q3   233  207     1.05   1.08           223.
##  6 "classical_decomposition(G… 2006 Q4   192  210.    0.868  1.05           221.
##  7 "classical_decomposition(G… 2007 Q1   187  213     1.08   0.815          174.
##  8 "classical_decomposition(G… 2007 Q2   234  216.    1.01   1.07           232.
##  9 "classical_decomposition(G… 2007 Q3   245  256.    1.05   0.915          234.
## 10 "classical_decomposition(G… 2007 Q4   205  294.    0.868  0.804          236.
## 11 "classical_decomposition(G… 2008 Q1   494  294.    1.08   1.56           459.
## 12 "classical_decomposition(G… 2008 Q2   229  294     1.01   0.771          227.
## 13 "classical_decomposition(G… 2008 Q3   249  256.    1.05   0.928          238.
## 14 "classical_decomposition(G… 2008 Q4   203  220.    0.868  1.06           234.
## 15 "classical_decomposition(G… 2009 Q1   196  222.    1.08   0.820          182.
## 16 "classical_decomposition(G… 2009 Q2   238  223.    1.01   1.06           236.
## 17 "classical_decomposition(G… 2009 Q3   252  225.    1.05   1.07           241.
## 18 "classical_decomposition(G… 2009 Q4   210  226     0.868  1.07           242.
## 19 "classical_decomposition(G… 2010 Q1   205   NA     1.08  NA              190.
## 20 "classical_decomposition(G… 2010 Q2   236   NA     1.01  NA              234.

gas3 |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  autoplot()

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

There is a huge component of variability that I previously did not notice in the early 2000’s which now that I know it’s there it’s easier to see where it is but it’s hard to notice if it’s not immediately pointed out to you.

myseries |>
  autoplot()

myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components() |> 
  autoplot()

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. Is the recession of 1991/1992 visible in the estimated components?

From the decomposition it is very easy to see the trend and seasonality of the labor force from the STL. The upward trend line is extremely distinct in the data but the trend part of the graph helps to see it without any of the seasonality. The seasonality can be clearly seen in the graph in the seasonality part of the graph and it really shows how regular the change from month to month is. In the remainder section there is a very clear delineation for when the 1990/1991 recession. You can also see the seasonality in the month to month average in Figure 3.20 and the distinct variability from year to year for each month with interesting trends for each month being different across the years.

DATA624_HW2_CameronGray

Cameron Gray

2025-02-12

3.1

3.2

3.3

3.4

3.7

3.8

3.9