Homework 1

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your RPubs link along with a .pdf file of your rendered code.

Exercise 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Solution

First, we load the necessary libraries and inspect the columns of the dataset.

#if (!require("fpp3")) install.packages("fpp3")
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## Warning: package 'tsibble' was built under R version 4.4.3
## Warning: package 'tsibbledata' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
global_economy |> glimpse()
## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country    <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code       <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year       <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP        <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports    <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports    <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…

From the glimpse, we can see that the dataset contains the columns Country, Code, Year, GDP, Growth, CPI, Imports, Exports, and Population. There is no GDP per capita column, so we will compute one.

In global_economy, GDP is stored in current US dollars and Population in persons. The first record is 537777811, which means that the GDP for Afghanistan in 1960 was about 538 million US dollars.
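
As a quick sanity check (a small optional sketch using only the values shown in the glimpse above), the first record implies a 1960 GDP per capita of roughly $60 for Afghanistan:

# GDP is in US dollars and Population in persons,
# so GDP / Population is already GDP per capita in US dollars
global_economy |>
  filter(Country == "Afghanistan", Year == 1960) |>
  transmute(GDP, Population, GDP_per_capita = GDP / Population)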

  1. Let's compute GDP per capita.

Because GDP is already in US dollars and Population is already in persons, no rescaling is needed: GDP per capita is simply GDP divided by Population.

gdp_pc <- global_economy |>
  mutate(GDP_per_capita = GDP / Population)
  2. Now we can plot the GDP per capita for each country over time.
gdp_pc |>
  ggplot(aes(x = Year, y = GDP_per_capita, group = Country)) +
  geom_line(alpha = 0.2) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "GDP per capita for all countries",
       y = "GDP per capita (US$)", x = NULL)

  3. To find the country with the highest GDP per capita, we can sort the dataset in descending order of GDP per capita (missing values are placed last by arrange()).
gdp_pc |>
  arrange(desc(GDP_per_capita))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
## # A tsibble: 15,150 x 10 [1Y]
## # Key:       Country [263]
##    Country       Code   Year     GDP Growth   CPI Imports Exports Population
##    <fct>         <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Monaco        MCO    2014  7.06e9  7.18     NA      NA      NA      38132
##  2 Monaco        MCO    2008  6.48e9  0.732    NA      NA      NA      35853
##  3 Liechtenstein LIE    2014  6.66e9 NA        NA      NA      NA      37127
##  4 Liechtenstein LIE    2013  6.39e9 NA        NA      NA      NA      36834
##  5 Monaco        MCO    2013  6.55e9  9.57     NA      NA      NA      37971
##  6 Monaco        MCO    2016  6.47e9  3.21     NA      NA      NA      38499
##  7 Liechtenstein LIE    2015  6.27e9 NA        NA      NA      NA      37403
##  8 Monaco        MCO    2007  5.87e9 14.4      NA      NA      NA      35111
##  9 Liechtenstein LIE    2016  6.21e9 NA        NA      NA      NA      37666
## 10 Monaco        MCO    2015  6.26e9  4.94     NA      NA      NA      38307
## # ℹ 15,140 more rows
## # ℹ 1 more variable: GDP_per_capita <dbl>

From the output, we can see that Monaco had the highest GDP per capita in 2014.

  4. Visualize the highest GDP per capita each year over time.

First, I will keep only the yearly winners.

leaders_all <- gdp_pc |>
  index_by(Year) |>
  slice_max(order_by = GDP_per_capita, n = 1)

leaders_all |>
  ggplot(aes(Year, GDP_per_capita, colour = Country)) +
  geom_line(linewidth = 1.2) +
  geom_point() +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "Highest GDP per capita each year (all countries)", y = "US$")

In the graph above, we can see that Monaco has been the country with the highest GDP per capita for most of the years. However, there are some years where other countries like Liechtenstein and Luxembourg have taken the lead.
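
To quantify "most of the years", a quick tally (a small sketch reusing the leaders_all object from above) counts how many years each country holds the top spot:

# Number of years each country leads in GDP per capita
leaders_all |>
  as_tibble() |>
  count(Country, sort = TRUE)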

Exercise 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

Solution

  • United States GDP from global_economy.
library(fpp3)

us_gdp <- global_economy |>
  filter(Country == "United States") |>
  select(Year, GDP)        
  • Let’s plot the US GDP over time.
us_gdp |>
  autoplot(GDP) +
  labs(title = "US GDP Over Time",
       y = "Gross Domestic Product",
       x = "Year")

In this case, a log (or Box-Cox) transformation is appropriate. US GDP grows roughly exponentially, so the raw scale is dominated by recent years; a log transformation compresses the scale and turns approximately exponential growth into an approximately linear trend.
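
As a quick check of this reasoning (an optional sketch), plotting the series on a log scale shows the compression directly:

# The same series on a log scale; growth looks roughly linear
us_gdp |>
  autoplot(log(GDP)) +
  labs(title = "US GDP (log scale)", y = "log(GDP)", x = "Year")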

  • Estimate the optimal Box-Cox transformation parameter for the US GDP data.
lambda <- us_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)
print(lambda)  
## [1] 0.2819443

The optimal Box-Cox transformation parameter (lambda) for the US GDP data is approximately 0.28. This indicates that a power transformation with an exponent of 0.28 would be suitable for stabilizing the variance in the GDP data.
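
For reference, the Box-Cox transformation (for positive data) is defined as

$$
w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0, \\ \dfrac{y_t^{\lambda} - 1}{\lambda} & \text{otherwise,} \end{cases}
$$

so λ = 1 leaves the shape of the series essentially unchanged, λ = 0 is a log transformation, and intermediate values compress large observations progressively more.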

if (!require("latex2exp")) install.packages("latex2exp")
## Loading required package: latex2exp
library(latex2exp)
us_gdp |>
  autoplot(box_cox(GDP, lambda)) +
  labs(
    title = TeX(paste0("US GDP: Box Cox transform ( $\\lambda$ = ", round(lambda, 2), " )")),
    y = "Transformed units", x = NULL
  )

In the plot above, we can see the US GDP data after applying the Box-Cox transformation with the estimated lambda value. The transformation stabilizes the variance and makes the trend more nearly linear.

  • Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

First, we load the dataset and filter for the specific animal and state.

library(fpp3)
library(dplyr)

livestock_bulls <- tsibbledata::aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") 

print(livestock_bulls)
## # A tsibble: 510 x 4 [1M]
## # Key:       Animal, State [1]
##       Month Animal                     State     Count
##       <mth> <fct>                      <fct>     <dbl>
##  1 1976 Jul Bulls, bullocks and steers Victoria 109200
##  2 1976 Aug Bulls, bullocks and steers Victoria  94700
##  3 1976 Sep Bulls, bullocks and steers Victoria  95500
##  4 1976 Oct Bulls, bullocks and steers Victoria  94800
##  5 1976 Nov Bulls, bullocks and steers Victoria  94100
##  6 1976 Dec Bulls, bullocks and steers Victoria  98300
##  7 1977 Jan Bulls, bullocks and steers Victoria  93500
##  8 1977 Feb Bulls, bullocks and steers Victoria 102000
##  9 1977 Mar Bulls, bullocks and steers Victoria 102600
## 10 1977 Apr Bulls, bullocks and steers Victoria  91500
## # ℹ 500 more rows

Now, we can plot the data.

autoplot(livestock_bulls, Count) +
  labs(title = "Victoria: Bulls, bullocks & steers slaughter ",
       y = "Number of animals", x = NULL)

Now we can estimate the optimal Box-Cox transformation parameter for the livestock data.

lambda_hat <- livestock_bulls |>
  features(Count, guerrero) |>
  pull(lambda_guerrero)

print(lambda_hat)
## [1] -0.04461887

Since λ ≈ −0.045 is very close to 0, the suggested transformation is essentially a log transformation.

Now, we will plot the transformed series.

livestock_bulls |>
  autoplot(box_cox(Count, lambda_hat)) +
  labs(
    title = TeX(paste0("Victoria slaughter (Box-Cox, $\\lambda$ = ",
                       round(lambda_hat, 3), ")")),
    y = "Transformed units", x = NULL
  )

This plot shows strong seasonality with a long-run downward trend.
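
A season plot (an optional sketch using gg_season() from feasts, which fpp3 loads) makes the within-year pattern explicit:

# Overlay each year to expose the monthly seasonal pattern
livestock_bulls |>
  gg_season(Count) +
  labs(title = "Victoria slaughter: seasonal pattern",
       y = "Number of animals")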

  • Victorian Electricity Demand from vic_elec.

The first step is to load the dataset.

library(fpp3)
vic_elec <- tsibbledata::vic_elec
print(vic_elec)
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
##    Time                Demand Temperature Date       Holiday
##    <dttm>               <dbl>       <dbl> <date>     <lgl>  
##  1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
##  2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
##  3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
##  4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
##  5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
##  6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE   
##  7 2012-01-01 03:00:00  3694.        20.1 2012-01-01 TRUE   
##  8 2012-01-01 03:30:00  3562.        19.6 2012-01-01 TRUE   
##  9 2012-01-01 04:00:00  3433.        19.1 2012-01-01 TRUE   
## 10 2012-01-01 04:30:00  3359.        19.0 2012-01-01 TRUE   
## # ℹ 52,598 more rows

Now, we can plot the data.

autoplot(vic_elec, Demand) +
  labs(title = "Victoria: Electricity demand ",
       y = "Demand (MW)", x = NULL)

Choose the optimal Box-Cox transformation parameter for the electricity demand data.

lambda_vic <- vic_elec |>
  features(Demand, guerrero) |>
  pull(lambda_guerrero)
print(lambda_vic)
## [1] 0.09993089

The optimal Box-Cox transformation parameter (lambda) for the Victorian electricity demand data is approximately 0.10. Since this is close to 0, the transformation is nearly a log transform, which stabilizes the variance in the demand data.

Now, we can plot the transformed series.

vic_elec |>
  autoplot(box_cox(Demand, lambda_vic)) +
  labs(
    title = TeX(paste0("Victoria electricity demand: Box--Cox ($\\lambda = ",
                       round(lambda_vic, 2), "$)")),
    y = "Transformed units", x = NULL
  )

The plot shows strong seasonality at several time scales: annual at this zoom level, with daily and weekly patterns hidden inside the half-hourly data.
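
Zooming in on a short window (an optional sketch; the two-week window is an arbitrary choice) makes the daily and weekly cycles visible:

# Two weeks of half-hourly demand
vic_elec |>
  filter_index("2014-07-01" ~ "2014-07-14") |>
  autoplot(Demand) +
  labs(title = "Victoria: demand, 1-14 July 2014",
       y = "Demand (MW)", x = NULL)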

  • Gas production from aus_production.

First, we load the dataset.

library(fpp3)
aus_production <- tsibbledata::aus_production
print(aus_production)
## # A tsibble: 218 x 7 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
##  1 1956 Q1   284    5225    189    465        3923     5
##  2 1956 Q2   213    5178    204    532        4436     6
##  3 1956 Q3   227    5297    208    561        4806     7
##  4 1956 Q4   308    5681    197    570        4418     6
##  5 1957 Q1   262    5577    187    529        4339     5
##  6 1957 Q2   228    5651    214    604        4811     7
##  7 1957 Q3   236    5317    227    603        5259     7
##  8 1957 Q4   320    6152    222    582        4735     6
##  9 1958 Q1   272    5758    199    554        4608     5
## 10 1958 Q2   233    5641    229    620        5196     7
## # ℹ 208 more rows

We can plot the gas production data.

aus_production |>
  autoplot(Gas) +
  labs(title = "Australia: Gas production ",
       y = "Gas (millions of tonnes)", x = NULL)

Now, we can estimate the optimal Box-Cox transformation parameter for the gas production data.

lambda_gas <- aus_production |>
  features(Gas, guerrero) |>
  pull(lambda_guerrero)
print(lambda_gas)
## [1] 0.1095171

The optimal Box-Cox transformation parameter (lambda) for the gas production data is approximately 0.11. This indicates that a power transformation with an exponent of 0.11 (close to a log) is suitable for stabilizing the variance in the gas production data.

Now, we can plot the transformed series.

aus_production |>
  autoplot(box_cox(Gas, lambda_gas)) +
  labs(
    title = TeX(paste0("Australia gas production: Box-Cox ($\\lambda = ",
                       round(lambda_gas, 2), "$)")),
    y = "Transformed units", x = NULL
  )

Clear quarterly seasonality with a strong long-run rise; after the transformation, the size of the seasonal swings is roughly constant across the series.

Exercise 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Solution

Let’s first load the necessary libraries and the dataset to understand the data.

library(fpp3)
glimpse(canadian_gas)
## Rows: 542
## Columns: 2
## $ Month  <mth> 1960 Jan, 1960 Feb, 1960 Mar, 1960 Apr, 1960 May, 1960 Jun, 196…
## $ Volume <dbl> 1.4306, 1.3059, 1.4022, 1.1699, 1.1161, 1.0113, 0.9660, 0.9773,…

The canadian_gas dataset contains monthly gas production data in billions of cubic metres.

autoplot(canadian_gas, Volume) +
  labs(title = "Canadian Gas Production (Monthly)",
       y = "Billions of cubic metres")

The plot shows a clear upward trend and seasonal patterns in the gas production data. Next, let's apply a log transformation and see how it behaves.

canadian_gas <- canadian_gas |>
  mutate(log_Volume = log(Volume))

autoplot(canadian_gas, log_Volume) +
  labs(title = "Canadian Gas Production (Log-transformed)",
       y = "log(Billions of cubic metres)")

The log transformation straightens the trend, but the seasonal variation remains uneven: it is small at the start of the series, largest in the middle, and smaller again at the end.

Next, we estimate the Box-Cox transformation parameter via Guerrero's method.

features(canadian_gas, Volume, features = guerrero)
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.577

Although Guerrero suggests λ ≈ 0.58, applying a Box-Cox transform does not materially improve the series. A Box-Cox transformation is monotonic in the level of the series, so it can only shrink variation uniformly as the level grows; here the seasonal variation increases and then decreases over time (it is largest in the middle of the series), so no single λ can make it constant. That is why a Box-Cox transformation is unhelpful for canadian_gas.
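
We can see this directly by plotting the transformed series at the Guerrero value (a quick sketch); the seasonal variation is still largest in the middle of the series:

# Box-Cox with the Guerrero estimate; the mid-series bulge remains
canadian_gas |>
  autoplot(box_cox(Volume, 0.577)) +
  labs(title = "Canadian gas: Box-Cox (lambda = 0.58)",
       y = "Transformed units", x = NULL)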

Exercise 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Solution

To determine the appropriate Box-Cox transformation for retail data, we first need to load the necessary libraries and the dataset.

library(fpp3)
retail <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1)) # no set.seed(), so each knit draws a different series
glimpse(retail)
## Rows: 441
## Columns: 5
## Key: State, Industry [1]
## $ State       <chr> "New South Wales", "New South Wales", "New South Wales", "…
## $ Industry    <chr> "Liquor retailing", "Liquor retailing", "Liquor retailing"…
## $ `Series ID` <chr> "A3349627V", "A3349627V", "A3349627V", "A3349627V", "A3349…
## $ Month       <mth> 1982 Apr, 1982 May, 1982 Jun, 1982 Jul, 1982 Aug, 1982 Sep…
## $ Turnover    <dbl> 41.7, 43.1, 40.3, 40.9, 42.1, 42.0, 46.1, 46.5, 53.8, 43.8…

Now I will use Guerrero’s method to estimate the optimal Box-Cox transformation parameter for the retail data.

lambda <- retail |>
  features(Turnover, guerrero) |>
  pull(lambda_guerrero)

print(lambda)
## [1] -0.0344145

The result is approximately −0.03, which is very close to 0. This indicates that a log transformation (the λ = 0 case) is suitable for stabilizing the variance in the retail data.

Now, we can plot the transformed series.

retail |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(title = sprintf("Retail turnover: Box–Cox (λ = %.2f)", lambda),
       y = "Transformed units", x = NULL)

The plot shows the retail turnover data after applying the Box-Cox transformation with the estimated lambda value. The transformation helps to stabilize the variance and makes the trend more linear.

In summary, I would select a Box-Cox transformation with λ ≈ 0 (effectively a log transformation) for the retail data to stabilize the variance.
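
Since λ is essentially 0, a plain log transform (an optional sketch) gives a nearly identical picture and is easier to interpret:

# log() is the lambda = 0 case of the Box-Cox family
retail |>
  autoplot(log(Turnover)) +
  labs(title = "Retail turnover (log scale)",
       y = "log(Turnover)", x = NULL)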

Exercise 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Before starting this exercise, we need a rule for selecting a convenient transformation: if λ = 1, no transformation is needed; if λ = 0, a log transformation is used; and if λ = 0.5, a square root transformation is used.

Now we will create a function to help us pick the right transformation.

selectlambda <- function(x) {
  # Estimate lambda with Guerrero's method; the series column must be named y
  lambda <- features(x, y, features = guerrero) |>
    pull(lambda_guerrero)
  # Snap to a convenient value when the estimate is within 0.15 of it
  lambdaused <- dplyr::case_when(
    abs(lambda - 0)   < 0.15 ~ 0,     # log transformation
    abs(lambda - 0.5) < 0.15 ~ 0.5,   # square root
    abs(lambda - 1)   < 0.15 ~ 1,     # no transformation
    TRUE ~ lambda
  )
  list(lambda_hat = lambda, lambda_used = lambdaused)
}
  • Tobacco from aus_production
tobacco <- aus_production |>
  select(Quarter, Tobacco) |>
  filter(!is.na(Tobacco)) |>
  rename(y = Tobacco)

lam_tob <- selectlambda(tobacco)
lam_tob
## $lambda_hat
## [1] 0.9264636
## 
## $lambda_used
## [1] 1

The estimated λ ≈ 0.93 is close to 1, so the rule selects λ = 1. This indicates the variance is already stable, and no transformation is needed for the tobacco data.

  • Economy class passengers between Melbourne and Sydney from ansett
mel_syd_econ_wk <- ansett |>
  filter(Airports == "MEL-SYD", Class == "Economy") |>
  select(Week, Passengers)|>
  rename(y = Passengers)

lam_mel_syd <- selectlambda(mel_syd_econ_wk)
lam_mel_syd
## $lambda_hat
## [1] 1.999927
## 
## $lambda_used
## [1] 1.999927

Guerrero suggests λ ≈ 2 here, i.e. squaring the data rather than taking a square root. The estimate is likely distorted by the stretch of weeks with zero passengers (the 1989 pilots' dispute), so this suggestion should be treated with caution.
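
A quick check (a small sketch) confirms that the series contains zero-passenger weeks, which help explain the extreme estimate:

# Count weeks with zero Economy passengers on MEL-SYD
mel_syd_econ_wk |>
  as_tibble() |>
  summarise(zero_weeks = sum(y == 0))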

  • Pedestrian counts at Southern Cross Station from pedestrian
library(dplyr)
library(tsibble)
library(fpp3)

pedestrian_sc <-  pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  select(Sensor, Date_Time, Count)|>
  rename(y = Count)

 
lam_ped_sc <- selectlambda(pedestrian_sc)
lam_ped_sc
## $lambda_hat
## [1] -0.2501616
## 
## $lambda_used
## [1] -0.2501616

Guerrero suggests λ ≈ −0.25 for the pedestrian counts, which lies between a log (λ = 0) and an inverse (λ = −1) transformation; a square root is not what the method recommends here.
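
One caution before applying this (a small sketch): overnight hourly counts can be zero, and box_cox() with λ ≤ 0 is not finite at zero, so an offset such as box_cox(y + 1, λ) may be needed in practice.

# Check whether zero counts occur in the series
pedestrian_sc |>
  as_tibble() |>
  summarise(zero_hours = sum(y == 0), min_count = min(y))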

Exercise 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

library(dplyr)
library(tsibble)
library(fpp3)

gas <- tail(aus_production, 5*4) |> select(Quarter, Gas)
autoplot(gas, Gas) + labs(title="Gas Production: last 5 years", y="Petajoules")

In the plot, we can observe clear seasonal fluctuations in the gas production data, with peaks and troughs recurring each year, along with a slight upward trend-cycle.
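
A season plot (an optional sketch) confirms this by overlaying the five years:

# Each line is one year; the Q3 peaks and Q1 troughs line up
gas |>
  gg_season(Gas) +
  labs(title = "Gas production: seasonal plot", y = "Petajoules")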

b. Use classical_decomposition with type = "multiplicative" to calculate the trend-cycle and seasonal indices.
dc <- gas |> model(classical_decomposition(Gas, type = "multiplicative"))
autoplot(components(dc))

comp <- components(dc)

trend_cycle <- comp |> select(Quarter, trend)         
seasonal_idx <- comp |>
  mutate(Q = quarter(Quarter)) |>
  group_by(Q) |>
  summarise(seasonal_index = mean(seasonal))          

trend_cycle
## # A tsibble: 20 x 2 [1Q]
##    Quarter trend
##      <qtr> <dbl>
##  1 2005 Q3   NA 
##  2 2005 Q4   NA 
##  3 2006 Q1  200.
##  4 2006 Q2  204.
##  5 2006 Q3  207 
##  6 2006 Q4  210.
##  7 2007 Q1  213 
##  8 2007 Q2  216.
##  9 2007 Q3  219.
## 10 2007 Q4  219.
## 11 2008 Q1  219.
## 12 2008 Q2  219 
## 13 2008 Q3  219 
## 14 2008 Q4  220.
## 15 2009 Q1  222.
## 16 2009 Q2  223.
## 17 2009 Q3  225.
## 18 2009 Q4  226 
## 19 2010 Q1   NA 
## 20 2010 Q2   NA
seasonal_idx
## # A tsibble: 20 x 3 [1Q]
## # Key:       Q [4]
##        Q Quarter seasonal_index
##    <int>   <qtr>          <dbl>
##  1     1 2006 Q1          0.875
##  2     1 2007 Q1          0.875
##  3     1 2008 Q1          0.875
##  4     1 2009 Q1          0.875
##  5     1 2010 Q1          0.875
##  6     2 2006 Q2          1.07 
##  7     2 2007 Q2          1.07 
##  8     2 2008 Q2          1.07 
##  9     2 2009 Q2          1.07 
## 10     2 2010 Q2          1.07 
## 11     3 2005 Q3          1.13 
## 12     3 2006 Q3          1.13 
## 13     3 2007 Q3          1.13 
## 14     3 2008 Q3          1.13 
## 15     3 2009 Q3          1.13 
## 16     4 2005 Q4          0.925
## 17     4 2006 Q4          0.925
## 18     4 2007 Q4          0.925
## 19     4 2008 Q4          0.925
## 20     4 2009 Q4          0.925
c. Do the results support the graphical interpretation from part a?

Yes, the multiplicative decomposition shows a stable quarterly seasonal pattern and a gently rising trend-cycle, matching the visual impression from part a.

d. Compute and plot the seasonally adjusted data.
sa <- components(dc) |> select(Quarter, SA = season_adjust)
autoplot(sa, SA) + labs(title="Gas: seasonally adjusted", y="SA")

The seasonally adjusted data shows the underlying trend in gas production without the seasonal fluctuations.

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas_ol <- gas |> mutate(Gas = if_else(row_number()==10, Gas+300, Gas))
dc_ol  <- gas_ol |> model(classical_decomposition(Gas, type="multiplicative"))
sa_ol  <- components(dc_ol) |> select(Quarter, SA = season_adjust)

autoplot(sa_ol, SA) + labs(title="Gas Production: seasonally adjusted with outlier", y="SA")

In this case, the outlier has a significant impact on the seasonally adjusted data, causing a noticeable spike in the adjusted values around the time of the outlier.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?
library(dplyr)
n <- nrow(gas)


gas_end <- gas |> mutate(Gas = if_else(row_number() == n, Gas + 300, Gas))
dc_end  <- gas_end |> model(classical_decomposition(Gas, type = "multiplicative"))
sa_end  <- components(dc_end) |> select(Quarter, SA = season_adjust) |> mutate(case = "end")

combined <- dplyr::bind_rows(
  as_tibble(sa)    |> mutate(case = "baseline"),       # no outlier
  as_tibble(sa_ol) |> mutate(case = "middle outlier"),
  as_tibble(sa_end)                                    # already labelled "end"
)

ggplot(combined, aes(Quarter, SA, colour = case)) +
  geom_line() +
  labs(title = "Seasonally adjusted (baseline vs outliers)", y = "SA", x = NULL)

Yes, the location matters. An outlier in the middle feeds into the centred moving-average trend and hence into the seasonal indices, so it distorts the seasonally adjusted values for every year of that quarter. An outlier at the very end falls where the moving-average trend is undefined, so it contributes nothing to the seasonal indices and shows up mainly as a single spike at that point.
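
We can verify this by comparing the estimated seasonal indices from the two runs (a quick sketch reusing dc_ol and dc_end from above); the end-of-series outlier should leave them essentially unchanged:

# Average seasonal index per quarter under each outlier scenario
dplyr::bind_rows(
  components(dc_ol)  |> as_tibble() |> mutate(case = "middle outlier"),
  components(dc_end) |> as_tibble() |> mutate(case = "end outlier")
) |>
  mutate(Q = quarter(Quarter)) |>
  group_by(case, Q) |>
  summarise(seasonal_index = mean(seasonal, na.rm = TRUE), .groups = "drop")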

Exercise 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Solution

retail <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1)) # random draw each run
glimpse(retail)
## Rows: 441
## Columns: 5
## Key: State, Industry [1]
## $ State       <chr> "South Australia", "South Australia", "South Australia", "…
## $ Industry    <chr> "Other retailing", "Other retailing", "Other retailing", "…
## $ `Series ID` <chr> "A3349433W", "A3349433W", "A3349433W", "A3349433W", "A3349…
## $ Month       <mth> 1982 Apr, 1982 May, 1982 Jun, 1982 Jul, 1982 Aug, 1982 Sep…
## $ Turnover    <dbl> 34.2, 34.4, 32.7, 36.2, 36.1, 36.0, 38.4, 41.2, 57.6, 36.6…

Now, we can decompose the retail time series using the X-11 method and extract the components. Note: to run X-11, we need to convert the data to a plain monthly ts object first and call seas() with x11 = "".

library(fpp3)
library(seasonal)
## 
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
## 
##     view
retail <- aus_retail |>
  filter(`Series ID` == sample(unique(`Series ID`), 1)) |> # note: a fresh random series, not the one glimpsed above
  arrange(Month) |>
  select(Month, Turnover)

ret_ts <- ts(retail$Turnover,
             start = c(lubridate::year(min(retail$Month)),
                       lubridate::month(min(retail$Month))),
             frequency = 12)

fit <- seas(ret_ts, x11 = "")  

sa    <- series(fit, "d11")  
trend <- series(fit, "d12")  
irr   <- series(fit, "d13")  
sfac  <- series(fit, "d10")  


irreg_z <- scale(as.numeric(irr))
outliers <- tibble::tibble(Month = retail$Month,
                           original = as.numeric(ret_ts),
                           sa = as.numeric(sa),
                           trend = as.numeric(trend),
                           irregular = as.numeric(irr),
                           z = as.numeric(irreg_z)) |>
  dplyr::filter(abs(z) > 3)

outliers   
## # A tibble: 8 × 6
##      Month original    sa trend irregular     z
##      <mth>    <dbl> <dbl> <dbl>     <dbl> <dbl>
## 1 1982 Jun      9.9  11.3  9.78     1.15   3.75
## 2 1986 Jan     13.4  17.1 14.1      1.21   5.14
## 3 1986 May     18.8  17.3 14.6      1.18   4.54
## 4 1987 Oct     14.1  14.4 16.5      0.874 -3.15
## 5 1993 Jul     20    20.1 23.2      0.865 -3.37
## 6 1999 Jan     24.6  29.2 24.1      1.21   5.14
## 7 2000 Jun     29.3  32.3 26.3      1.23   5.62
## 8 2000 Jul     20.8  21.2 26.4      0.804 -4.89

The output shows the months where the irregular component has an absolute z-score greater than 3, flagging potential outliers in the retail series that are not obvious in the raw plot.
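
Plotting the standardised irregular component (an optional sketch reusing irreg_z from above) shows where these months sit relative to the ±3 threshold:

# z-scores of the X-11 irregular; dashed lines mark |z| = 3
tibble::tibble(Month = retail$Month, z = as.numeric(irreg_z)) |>
  ggplot(aes(Month, z)) +
  geom_line() +
  geom_hline(yintercept = c(-3, 3), linetype = "dashed") +
  labs(title = "X-11 irregular component (z-scores)",
       y = "z-score", x = NULL)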

Exercise 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The decomposition is dominated by the trend-cycle, which rises steadily over the whole period and has by far the widest scale of the panels, so it accounts for most of the variation in the labour force. The seasonal component is small relative to the trend (its panel has a much narrower scale), and its shape changes gradually over time. The remainder is also small in scale, but it contains a cluster of unusually large negative values around 1991/1992. Because each panel is drawn on its own scale, the seasonal and remainder panels look more dramatic than their actual contribution to the series.

b. Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession of 1991/1992 is visible in the estimated components. The trend component shows a noticeable dip during this period before resuming its rise, and the remainder shows a large negative spike around the same time.