Assignment 2

Load libraries

library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.3
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## Warning: package 'lubridate' was built under R version 4.3.3
## Warning: package 'tsibble' was built under R version 4.3.3
## Warning: package 'feasts' was built under R version 4.3.3
## Warning: package 'fabletools' was built under R version 4.3.3
## Warning: package 'fable' was built under R version 4.3.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

Question 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Inspect the global_economy dataset

global_economy
## # A tsibble: 15,150 x 9 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # ℹ 15,140 more rows

Create the GDP Per capita col

global_economy <- global_economy %>%
  mutate(gdp_per_capita = GDP / Population)

plot the gdp per capita

global_economy %>%
  autoplot(gdp_per_capita, show.legend= FALSE) +
  labs(title = "GDP Per Capita by Country") +
  xlab("Year")+
  ylab("GDP Per Capita")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

Find the country with the highest GDP Per Capita

highest_avg_gdp_country <- global_economy %>%
  group_by(Country) %>%
  summarise(avg_gdp_per_capita = mean(gdp_per_capita)) %>%
  slice_max(avg_gdp_per_capita, n = 1)

highest_avg_gdp_country
## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country  Year avg_gdp_per_capita
##   <fct>   <dbl>              <dbl>
## 1 Monaco   2014            185153.

The country with the higest average GDP per capita is Monaco.
So now lets plot Monacos GDP over time.

global_economy %>%
  filter(Country == "Monaco") %>%
  autoplot(gdp_per_capita) +
  labs(title = "Monaco's GDP Per Capita") +
  xlab("Year") +
  ylab("GDP Per Capita")
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).

From the plot above we can see that Monaco’s GDP has risen consistently over the years since the data began in 1970. We can see a steady rise until the 1980s where there was a slight dip but then in 1985 there was a strong recovery. Then we see another steady rise until about the mid 90s where there was a small dip until the early 2000s where we then see a huge recovery and spike until the 2008 recession. Then finally we see one last recovery. So overall we see Monacos GDP rising over time.

Question 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy

US_gdp <- global_economy %>%
  filter(Country == "United States")

US_gdp %>%
  autoplot(gdp_per_capita) + 
  labs(title = "US GDP Per Capita Over Time") +
  xlab("Year")+
  ylab("GDP Per Capita")

The GDP is rising fast over the years with what looks like an exponential curve so I will try a log and a box-cox transformation on it to try and flatten the curve a bit. US GDP per capita with log transformation

US_gdp %>%
  autoplot(log(gdp_per_capita)) + 
  labs(title = "US GDP Per Capita Over Time (Log transformation)") +
  xlab("Year")+
  ylab("GDP Per Capita")

Try a box cox

US_gdp %>%
  features(gdp_per_capita, features=guerrero)
## # A tibble: 1 × 2
##   Country       lambda_guerrero
##   <fct>                   <dbl>
## 1 United States           0.393
US_gdp %>%
  autoplot(box_cox(gdp_per_capita, 0.3929072)) + 
  labs(title = "US GDP Per Capita Over Time (Box-Cox transformation)") +
  xlab("Year")+
  ylab("GDP Per Capita")

We can see that both the log and box-cox transformations help to flatten out the plot to change the plot from the original exponential growth to now a much more linear growth. This shows that the US GDP per capita has been growing pretty steadily over the years and this should help to stabilize any decomposition or forecast model produced from this data.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock

vic_data <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") 
vic_data %>%
  autoplot(Count) + 
  labs("Slaughter of Victorian Bulls, bullocks and steers") +
  xlab("Number of Slaughters") +
  ylab("Year Month")

Since the variance seems to be all over the place with a lot of flucuations lets try a box-cox transformation

vic_data %>%
  features(Count, features = guerrero)
## # A tibble: 1 × 3
##   Animal                     State    lambda_guerrero
##   <fct>                      <fct>              <dbl>
## 1 Bulls, bullocks and steers Victoria         -0.0446
vic_data %>%
  autoplot(box_cox(Count, -0.04461887   )) + 
  labs("Box-Cox Transformed Slaughter of Victorian Bulls, bullocks and steers") +
  xlab("Number of Slaughters") +
  ylab("Year Month")

After the box-cox transformation we can see while there are still a lot of flucuations the variance is much more stable than in the original plot.

Victorian Electricity Demand from vic_elec

vic_elec %>%
  autoplot(Demand) +
  labs(title ="Half-hourly electricity demand for Victoria, Australia" , x="Time (30 Minute Intervals)", y="Total Electricity Demand in MWh.")

Apply a box-cox transformation

vic_elec %>%
  features(Demand, features = guerrero)
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1          0.0999
vic_elec %>%
  autoplot(box_cox(Demand, 0.09993089)) +
  labs(title ="Box-Cox Transformed Half-hourly electricity demand for Victoria, Australia" , x="Time (30 Minute Intervals)", y="Total Electricity Demand in MWh.")

The box cox transformation does not to seemed to have much of an effect so a transformation does not seem appropriate.

Gas production from aus_production

aus_production %>%
  autoplot(Gas) +
  labs(title = "Quartly Austrialian Gas Production", x = "Time Quarter", y="Gas Production in Petajoules")

Because we can see that the variance is not stable and we have more variance at the end than at the beginning I will apply a box-cox transformation to try and stablize the variance.

aus_production %>%
  features(Gas, features = guerrero)
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.110
aus_production %>%
  autoplot(box_cox(Gas, 0.1095171)) + 
  labs(title = "Box-Cox Transformed Quartly Austrialian Gas Production", x = "Time Quarter", y="Gas Production in Petajoules")

Question 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume) +
  labs(title="Monthly Canadian Gas Production", x = "Month Year", y="Gas Production (Billions of cubic metres)")

Box-cox transformation

canadian_gas %>%
  features(Volume, features = guerrero)
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.577
canadian_gas %>%
  autoplot(box_cox(Volume, 0.5767648    )) +
    labs(title="Box-Cox Transformed Monthly Canadian Gas Production", x = "Month Year", y="Gas Production (Billions of cubic metres)")

A Box-Cox transformation is unhelpful for this data because as we can see from the first plot the variance is already stable across the plot. When I did the Box-Cox transformation just to see the results we can see that the plot is very similar to the original as it does not remove the trend so it is really just unhelpful in this case.

Question 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(777)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>% 
  features(Turnover, features = guerrero)
## # A tibble: 1 × 3
##   State           Industry                                      lambda_guerrero
##   <chr>           <chr>                                                   <dbl>
## 1 South Australia Cafes, restaurants and takeaway food services           0.371

Optimal lambda value is 0.3713155

myseries %>% 
  autoplot(box_cox(Turnover, 0.3713155)) +
  labs(title = "Box-Cox Transformed Australian Retail Turnover" , x="Month Year", y="Turnover")

I used the features function with parameters guerrero to find the optimal lambda value for the box-cox transformation and then I used that lambda in the box-cox transformation to produce the plot above. We can see it stabilizes the variance across the time series.

###Question 3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian

Tobacco from aus_production

aus_production %>%
  autoplot(Tobacco) +
  labs(title = "Quarterly Australian Tobacco Production", x="Year Quarter", y="Tobacco and Cigarette Production in Tonnes")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Find the optimal lambda for box-cox transformation

aus_production %>%
  features(Tobacco, features = guerrero)
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.926

Optimal lambda value is 0.9264636 so now perform the box-cox to see if it stabilizes the variance.

aus_production %>%
  autoplot(box_cox(Tobacco, 0.9264636)) +
  labs(title = "Box-Cox Transformed Quarterly Australian Tobacco Production" , x="Quarter Year", y="Tobacco and Cigarette Production in Tonnes")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Since the lambda value is close to 1 the transformation does not do too much because the variance is already pretty stable.

Economy class passengers between Melbourne and Sydney from ansett

econ_data <- ansett %>%
  filter(Class == "Economy",Airports == "MEL-SYD")
econ_data
## # A tsibble: 282 x 4 [1W]
## # Key:       Airports, Class [1]
##        Week Airports Class   Passengers
##      <week> <chr>    <chr>        <dbl>
##  1 1987 W26 MEL-SYD  Economy      20167
##  2 1987 W27 MEL-SYD  Economy      20161
##  3 1987 W28 MEL-SYD  Economy      19993
##  4 1987 W29 MEL-SYD  Economy      20986
##  5 1987 W30 MEL-SYD  Economy      20497
##  6 1987 W31 MEL-SYD  Economy      20770
##  7 1987 W32 MEL-SYD  Economy      21111
##  8 1987 W33 MEL-SYD  Economy      20675
##  9 1987 W34 MEL-SYD  Economy      22092
## 10 1987 W35 MEL-SYD  Economy      20772
## # ℹ 272 more rows
econ_data %>%
  autoplot(Passengers) + 
  labs(title="Economy Passenger numbers on Ansett airline flights", x="Year Week", y="Number of Economy Passengers")

econ_data %>%
  features(Passengers, features = guerrero)
## # A tibble: 1 × 3
##   Airports Class   lambda_guerrero
##   <chr>    <chr>             <dbl>
## 1 MEL-SYD  Economy            2.00
econ_data %>%
  autoplot(box_cox(Passengers, 1.999927)) + 
  labs(title="Box-Cox Transformed Economy Passenger numbers on Ansett airline flights", x="Year Week", y="Number of Economy Passengers")

With a lambda of 1.97 it is performing a transformation raising it to 1.97 and as you can see above it is able to somewhat stabilize the variance.

Pedestrian counts at Southern Cross Station from pedestrian

scs_data <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")
scs_data
## # A tsibble: 17,539 x 5 [1h] <Australia/Melbourne>
## # Key:       Sensor [1]
##    Sensor                 Date_Time           Date        Time Count
##    <chr>                  <dttm>              <date>     <int> <int>
##  1 Southern Cross Station 2015-01-01 00:00:00 2015-01-01     0   746
##  2 Southern Cross Station 2015-01-01 01:00:00 2015-01-01     1   312
##  3 Southern Cross Station 2015-01-01 02:00:00 2015-01-01     2   180
##  4 Southern Cross Station 2015-01-01 03:00:00 2015-01-01     3   133
##  5 Southern Cross Station 2015-01-01 04:00:00 2015-01-01     4    44
##  6 Southern Cross Station 2015-01-01 05:00:00 2015-01-01     5    16
##  7 Southern Cross Station 2015-01-01 06:00:00 2015-01-01     6    13
##  8 Southern Cross Station 2015-01-01 07:00:00 2015-01-01     7    21
##  9 Southern Cross Station 2015-01-01 08:00:00 2015-01-01     8    39
## 10 Southern Cross Station 2015-01-01 09:00:00 2015-01-01     9    36
## # ℹ 17,529 more rows
scs_data %>%
  autoplot(Count) +
  labs(title="Hourly Pedestrian Counts in Southern Cross Station", x="Date Time", y="Pedestrian Counts")

The variance seems to be pretty stable already with just slightly higher towards the end of the time series.

scs_data %>%
  features(Count, features = guerrero)
## # A tibble: 1 × 2
##   Sensor                 lambda_guerrero
##   <chr>                            <dbl>
## 1 Southern Cross Station          -0.250
scs_data %>%
  autoplot(box_cox(Count, -0.2501616    )) +
  labs(title="Box-Cox Transformed Hourly Pedestrian Counts in Southern Cross Station", x="Date Time", y="Pedestrian Counts")

With a lambda value of -0.25 it does seem to have had an effect on the data and it does seem to have stabilized the variance across the time series.

Question 3.7

Consider the last five years of the Gas data from aus_production

####Part A Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas <- tail(aus_production, 5*4) |> select(Gas)
gas %>%
  autoplot(Gas) + 
  labs(title = "Quarterly Australian Gas Production", x="Year Quarter", y="Gas production in Petajoules")

The plot above shows that Australian gas production does follow a seasonal pattern with consistent dips in Q1 then rises until the end of Q2 where it then falls until the start of the next Q1 for the next year. There also seems to be a upward trend in production over the last 5 years.

Part B

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Australian Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

#### Part C Do the results support the graphical interpretation from part a? Yes the results do support the graphical interpretation from part a. In the first plot we saw a consistent seasonal pattern that the seasonal component in the decomposition above confirms. We also saw a slight upward trend that is also supported by the trend component above.

Part D

Compute and plot the seasonally adjusted data.

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Australian Gas Production", x="Year Quarter", y="Gas production in Petajoules")

Part E

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

Add in the outlier at 2006 Q2 by adding 300 to it

gas_out <- gas %>%
  mutate(Gas = replace(Gas, row_number() == 4, Gas + 300))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Gas = replace(Gas, row_number() == 4, Gas + 300)`.
## Caused by warning in `x[list] <- values`:
## ! number of items to replace is not a multiple of replacement length
gas_out %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Australian Gas Production with outlier added", x="Year Quarter", y="Gas production in Petajoules")

The outlier has a huge effect on the seasonal decomposition. From the plot above we can see it completely alters the decomposition adding in a huge spike where the outlier is and also changing the rest of the graph as well with much more peaks and valleys that were not present in the outlier free plot.

Part F

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_out_end <- gas %>%
  mutate(Gas = replace(Gas, row_number() == 16, Gas + 300))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Gas = replace(Gas, row_number() == 16, Gas + 300)`.
## Caused by warning in `x[list] <- values`:
## ! number of items to replace is not a multiple of replacement length
gas_out_middle <- gas %>%
    mutate(Gas = replace(Gas, row_number() == 8, Gas + 300))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Gas = replace(Gas, row_number() == 8, Gas + 300)`.
## Caused by warning in `x[list] <- values`:
## ! number of items to replace is not a multiple of replacement length
gas_out_end %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Australian Gas Production with outlier added", x="Year Quarter", y="Gas production in Petajoules")

gas_out_middle %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Australian Gas Production with outlier added", x="Year Quarter", y="Gas production in Petajoules")

From comparing the two graphs the first has the outlier at the end and the second has the outlier in the middle of the time series. We can see that it does matter where the outlier is because that is where the spike in the trend will be however it will not alter the other parts of the plot much but it will have a impact on where the spike is.

###Question 3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

myseries %>%
  autoplot(Turnover)

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
x11_dcmp %>%
  autoplot() +
  labs(title ="Decomposition of Australian Retail Trade Turnover")

Looking at the original and then the decomposed plots we can see very clearly that there is a strong seasonal pattern that the time series follows. While a seasonal trend was noticed originally I did not realize how strong it was until seeing the decomposition. We can also easily see cyclic effects like the .com bubble in the early 2000s causing a spike in the trend and the 2008 recession and 2019 covid pandemic causing dips in the trend.

###Question 3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labor force in Australia each month from February 1978 to August 1995.

####Part A Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Looking at the trend from the STL decomposition we can see that there is a clear upward trend in the Australian labor workforce, there are some drop offs but the drop offs are following by rallies keeping the overall upward trend. Then the seasonal component shows a strong annual seasonal pattern. Then looking at the monthly seasonal subplot we can see that it shows dips in the workforce in January and August and spikes in December and March this could be due to seasonal hires beginning and ending.