library(fpp3)
library(tsibbledata)
library(tidyverse)
library(seasonal)

3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

Plot of the GDP per capita for each country over time

# Creating a variable called GDP per capita
gdp_pc <- global_economy %>%
  mutate(GDP_per_capita = GDP / Population)

# The legend was in the way so, it was removed
gdp_pc %>%
  autoplot(GDP_per_capita) +
  labs(title = "GDP per capita for each country over time",
       y = "GDP per capita") +
   guides(colour = "none")  

The plot is visible now but the countries are unidentifable so I will just filter out the top 10 gdp/capita and plot them

top10_gdp_per_cap <- gdp_pc %>%
  as_tibble() %>%                      
  arrange(desc(GDP_per_capita)) %>%    
  slice_head(n = 10)                   

top10_gdp_per_cap
## # A tibble: 10 × 10
##    Country       Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>         <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Monaco        MCO    2014 7060236168.  7.18     NA      NA      NA      38132
##  2 Monaco        MCO    2008 6476490406.  0.732    NA      NA      NA      35853
##  3 Liechtenstein LIE    2014 6657170923. NA        NA      NA      NA      37127
##  4 Liechtenstein LIE    2013 6391735894. NA        NA      NA      NA      36834
##  5 Monaco        MCO    2013 6553372278.  9.57     NA      NA      NA      37971
##  6 Monaco        MCO    2016 6468252212.  3.21     NA      NA      NA      38499
##  7 Liechtenstein LIE    2015 6268391521. NA        NA      NA      NA      37403
##  8 Monaco        MCO    2007 5867916781. 14.4      NA      NA      NA      35111
##  9 Liechtenstein LIE    2016 6214633651. NA        NA      NA      NA      37666
## 10 Monaco        MCO    2015 6258178995.  4.94     NA      NA      NA      38307
## # ℹ 1 more variable: GDP_per_capita <dbl>

Monaco and Liechtenstein seem to have the highest gdp per capita. We can filter the original tstibble to just these two countries.

top_countries <- top10_gdp_per_cap %>%
  pull(Country) %>% unique()

# Filter gdp_pc back to just those countries
gdp_pc %>%
  filter(Country %in% top_countries) %>%
  autoplot(GDP_per_capita) +
  labs(title = "GDP per capita over time (Top countries)",
       y = "GDP per capita (US$)")
## Warning: Removed 22 rows containing missing values or values outside the scale range
## (`geom_line()`).

Based on this plot, Monaco has the highest GDP per capita followed by Liechtenstein. Over time, their GDP per capita has scaled drastically compared to other countries. Based on the plot with all the countires, Monaco and Liechtenstein are the only two countries that have a GDP per capita over $150,000 after 2010. All of the other groups have steadily grown in GDP per capita but none as fast as Monaco and Liechtenstein.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect. - United States GDP from global_economy. - Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. - Victorian Electricity Demand from vic_elec. - Gas production from aus_production.

United States GDP

us_gdp <- global_economy %>%
  filter(Country == "United States") 

autoplot(us_gdp, GDP) +
  labs(title = "US GDP", y = "US$")

This time series shows expontenial growth so performing a log transformation would be appropraite here.

# Log transform
autoplot(us_gdp, log(GDP)) +
  labs(title = "US GDP (log scale)", y = "log(US$)")

The log transformed plot looks more linear, more smooth compared to the previous plot of the US GDP. In this case, a log transformation was appropriate.

Slaughter of Victorian of Bulls, bullocks and steers

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot(Count) +
  labs(title = "Victorian Slaughter of Bulls, bullocks and steers", y = "Count")

This is count data so the y-axis is fairly steady. Unlike previously where there was exponentially growth over time, count data is more stable and to scale so transformation is not necessary.

Victorian Elecricity Demand

vic_elec %>%
  autoplot(Demand) +
  labs(title = "Victorian Electricity Demand", y = "MW")

There seems some seasonality into be a spike in electricity demand at the beginning of every year. There doesn’t seem to be much variation so transformation is not necessary here.

Gas production

aus_production %>%
  autoplot(Gas) +
  labs(title = "Australian Gas Production", y = "Terajoules")

There is strong upward growth and changes in variance in gas production. From 1960 Q1 to 1970, there does not seem to be much change in gas production. After 1970, there is a significant increase in gas production, which may warrant a log transformation.

aus_production %>%
  mutate(log_Gas = log(Gas)) %>%
  autoplot(log_Gas) +
  labs(title = "Log of Australian Gas Production", y = "log(Terajoules)")

Performing a log variation of gas production stabilizes and smooths the strong upward growth.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Canadian Gas Production", y = "Volume")

# Box-Cox transformation (automatic lambda)
lambda <- canadian_gas %>% 
  features(Volume, features = guerrero) %>% 
  pull(lambda_guerrero)

canadian_gas %>% 
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

The original plot shows a strong upward trend and clear seasonality. A Box–Cox transformation is unhelpful because the variance is stable. Box-Cox transformation is useful when the seasonal variation changes in the series. The plot of the Box-Cox transformation looks almost identical to the orginial plot.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Exercise 7 in Section 2.10 looks at Monthly Australian retail data in aus_retail

head(aus_retail)
## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State                        Industry            `Series ID`    Month Turnover
##   <chr>                        <chr>               <chr>          <mth>    <dbl>
## 1 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Apr      4.4
## 2 Australian Capital Territory Cafes, restaurants… A3349849A   1982 May      3.4
## 3 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Jun      3.6
## 4 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Jul      4  
## 5 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Aug      3.6
## 6 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Sep      4.2
STATE    <- "Victoria"
INDUSTRY <- "Cafes, restaurants and takeaway food services"

retail <- aus_retail %>%
  filter(State == STATE, Industry == INDUSTRY)

autoplot(retail, Turnover) +
  labs(title = paste(STATE, "—", INDUSTRY),
       y = "Million")

I plotted one series from Exercise 7. The time plot shows a strong upward trend over time. There is a clear seasonal pattern, with spikes in sales every December. The size of these seasonal peaks increases as the series grows, indicating changing seasonal variance. A Box–Cox transformation with λ ≈ 0 (log transform) is appropriate here as it stabilizes the variance and makes the seasonal fluctuations more proportional.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco

aus_production %>%
  autoplot(Tobacco) +
  ggtitle("Tobacco Production")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

This time plot does not show a distinct change in seasonal variance. It is hard to tell if a Box-Cox transformation will have any benefit since the variance seems constant.

lambda <- aus_production %>% 
  features(Tobacco, features = guerrero) %>% 
  pull(lambda_guerrero)

aus_production %>% 
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Here we can see that the box-cox transformation had very little effect on the orginal plot. Lambda is 0.93, indicating a Box-Cox transformation is not necessary.

Economy Class Passengers

ansett %>%
  filter(Class == "Economy", Airports == "MEL-SYD") %>%
  autoplot(Passengers) +
  labs(title = "Economy Class Passengers MEL–SYD", y = "Passengers")

This plot shows that variance is not constant so a log transformation would be useful here.

x <- ansett %>%
  filter(Class == "Economy", Airports == "MEL-SYD")

lambda <- x %>%
  features(Passengers, guerrero) %>%
  pull(lambda_guerrero)

x %>%
  mutate(Passengers_bc = box_cox(Passengers, lambda)) %>%
  autoplot(Passengers_bc) +
  labs(y = "Box–Cox(Passengers)",
       title = paste0("MEL–SYD Economy (λ = ", round(lambda, 2), ")"))

The Box-Cox transformation did not improve the variation in the plot. For some reason the lambda is equal to two, which may be due to the zeros in the data.

x %>%
  mutate(Passengers_log = box_cox(Passengers + 1, 0)) %>%
  autoplot(Passengers_log) +
  labs(title = "MEL–SYD Economy (Box–Cox λ = 0, log transform)",
       y = "log(Passengers + 1)")

I forced the Box–Cox parameter to λ = 0, which corresponds to a log transformation, and added 1 passenger to each value to remove the zeros. If we exclude the couple of weeks before 1990, we can see that the transformation stabilized the variance. Thus, a log box-cox transformation is appropriate in this series.

Pedestrian

ped <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")

autoplot(ped, Count) +
  labs(title = "Pedestrian counts — Southern Cross Station",
       y = "Count")

This time series is a bit hard to see as it is hourly data over 2 years, Let’s shorten the time span to the first four weeks in Jan 2015.

ped %>%
  filter_index("2015-01-01" ~ "2015-01-28") %>%
  autoplot(Count) +
  labs(title = "Southern Cross pedestrians — Jan 2015 (four weeks)")

There seems to be five spikes consistently. Since this is pedestrian data from the Southern Cross Station, we can assume that the peaks are during weekdays rush hour. The variance is not constant so we can use a log box-cox transforamtion.

lambda_ped <- ped %>%
  features(Count, guerrero) %>%
  pull(lambda_guerrero)

ped %>%
  filter_index("2015-01-01" ~ "2015-01-28") %>% 
  mutate(Count_bc = box_cox(Count, lambda_ped)) %>%
  autoplot(Count_bc) +
  labs(title = paste0("Southern Cross Pedestrian counts (λ = ",
                      round(lambda_ped, 2), ")"),
       y = "Box–Cox(Count)")

For this pedestrian series, the plot improved slightly with a box-cox transformation.The variation in spikes during the weekends increased slightly but still not at the peaks during the weekdays.

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

a.Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(gas, Gas) + labs(title="Gas: last 5 years", y="Terajoules")

There is a upward trend and clear seasonality. There seems to be a spike the terajoules during the summer quarters and a drop in the colder quarters.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>% 
  model(classical_decomposition(Gas, type = "multiplicative")) %>% 
  components() %>% 
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Gas")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

There is a clear upward trend in gas production. The seasonal plot shows troughs at the beginning of the year and peaks around the middle of the year, indicating clear seasonality.

c. Do the results support the graphical interpretation from part a?

Part a showed upward movement in gas production and clear quarterly seasonality. This multiplicative classical decomposition confirms the gradual uptrend and seasonal fluctuations. The remainder is small compared to the total gas production, suggesting that most of the composition is explained by the trend and seasonal component.

d. Compute and plot the seasonally adjusted data.

dcmp <- gas %>% 
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

# seasonally adjusted series = Gas / seasonal
autoplot(dcmp, season_adjust) +
  labs(title = "Seasonally adjusted Gas production",
       y = "Seasonally adjusted Gas")

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_outlier <- gas |> mutate(Gas = if_else(row_number()==1, Gas + 300, Gas))

dcmp_outlier <- gas_outlier |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

autoplot(dcmp_outlier, season_adjust) +
  labs(title = "Seasonally adjusted Gas (with outlier)")

I added the outlier to the first row, which caused a spike in the beginning of the data series. This outlier distorted the trend component. It no longer shows a steady increase in gas production over time. However, the seasonal component does not show much of a change.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Yes, it would make a difference. An outlier near the end of a series has a large impact on the trend of the data.Classical decomposition uses moving averages to estimate the trend. If the outlier was at the end, it would not be included in the moving average window at the end so it will cause a sudden spike or drop on the trend plot. If the outlier was in the middle, that one outlier would be averaged out with the other numbers.

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

retail <- aus_retail %>% 
  filter(State == "Victoria",
         Industry == "Cafes, restaurants and takeaway food services")

x11_dcmp <- retail %>% 
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()

autoplot(x11_dcmp) +
  labs(title = "X-11 decomposition of Victorian retail turnover")

The decomposed series plot shows a long term trend over time. The seasonal component shows spikes in every December. The remainder shows irregular drops in turn at the beginning for 2000, which is not shown in the trend and seasonal component.

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The STL decomposition shows a upward trend in labor force from 1978 to 1995. The seasonal component also shows the same peak and trough around the same time, indicating strong seasonality and low variations. The remainder shows a sharp, irregular drop in 1991 to 1992. However, the drop is only -400 which is small when compared to the trend which is around 9000. There this decline does not have a significant impact on the upward trend in civilian labor.

b. Is the recession of 1991/1992 visible in the estimated components?

The recession of 1991/1992 is extremely noticeable in the remainder component as a large negative deviation. However, the trend continues upward despite the recession because only a small portion of the trend is attributed to the remainder.