Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your RPubs link along with a .pdf file of your rendered code.
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
First, we load the necessary libraries and inspect the dataset's columns.
#if (!require("fpp3")) install.packages("fpp3")
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## Warning: package 'tsibble' was built under R version 4.4.3
## Warning: package 'tsibbledata' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
global_economy|> glimpse()
## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…
We can see that the dataset contains nine columns: Country, Code, Year, GDP, Growth, CPI, Imports, Exports, and Population. GDP per capita is not included, so we compute it ourselves.
In global_economy, GDP is recorded in current US dollars and Population is the number of people. The first record therefore means that Afghanistan's GDP in 1960 was $537,777,811. No rescaling is needed: GDP per capita is simply GDP divided by Population.
gdp_pc <- global_economy |>
  mutate(GDP_per_capita = GDP / Population)
gdp_pc |>
ggplot(aes(x = Year, y = GDP_per_capita, group = Country)) +
geom_line(alpha = 0.2) +
scale_y_continuous(labels = scales::dollar_format()) +
labs(title = "GDP per capita for all countries",
y = "GDP per capita (US$)", x = NULL)
gdp_pc |>
arrange(desc(GDP_per_capita))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
## # A tsibble: 15,150 x 10 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Monaco        MCO    2014 6.26e9  7.18     NA      NA      NA    38132
##  2 Monaco        MCO    2008 6.48e9  0.732    NA      NA      NA    35853
##  3 Liechtenstein LIE    2014 6.66e9 NA        NA      NA      NA    37127
##  4 Liechtenstein LIE    2013 6.39e9 NA        NA      NA      NA    36834
##  5 Monaco        MCO    2013 6.55e9  9.57     NA      NA      NA    37971
##  6 Monaco        MCO    2016 6.47e9  3.21     NA      NA      NA    38499
##  7 Liechtenstein LIE    2015 6.27e9 NA        NA      NA      NA    37403
##  8 Monaco        MCO    2007 5.87e9 14.4      NA      NA      NA    35111
##  9 Liechtenstein LIE    2016 6.21e9 NA        NA      NA      NA    37666
## 10 Monaco        MCO    2015 6.26e9  4.94     NA      NA      NA    38307
## # ℹ 15,140 more rows
## # ℹ 1 more variable: GDP_per_capita <dbl>
From the output, we can see that the highest GDP per capita on record belongs to Monaco in 2014.
Next, I keep only the yearly winners to see how the leader has changed over time.
leaders_all <- gdp_pc |>
index_by(Year) |>
slice_max(order_by = GDP_per_capita, n = 1)
leaders_all |>
ggplot(aes(Year, GDP_per_capita, colour = Country)) +
geom_line(linewidth = 1.2) +
geom_point() +
scale_y_continuous(labels = scales::dollar_format()) +
labs(title = "Highest GDP per capita each year (all countries)", y = "US$")
In the graph above, we can see that Monaco has been the country with the highest GDP per capita for most of the years. However, there are some years where other countries like Liechtenstein and Luxembourg have taken the lead.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
library(fpp3)
us_gdp <- global_economy |>
filter(Country == "United States") |>
select(Year, GDP)
us_gdp |>
autoplot(GDP) +
labs(title = "US GDP Over Time",
y = "Gross Domestic Product",
x = "Year")
In this case, a transformation of the GDP series is appropriate. GDP grows roughly exponentially, so the variation increases with the level of the series; a log or Box-Cox transformation compresses the scale and makes the growth pattern closer to linear.
lambda <- us_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
print(lambda)
## [1] 0.2819443
The optimal Box-Cox transformation parameter (lambda) for the US GDP data is approximately 0.28. This indicates that a power transformation with an exponent of 0.28 would be suitable for stabilizing the variance in the GDP data.
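For reference, the Box-Cox family used by box_cox(), in the modified form given in the fpp3 book, is

$$
w_t =
\begin{cases}
\log(y_t) & \text{if } \lambda = 0,\\[4pt]
\dfrac{\operatorname{sign}(y_t)\left(|y_t|^{\lambda} - 1\right)}{\lambda} & \text{otherwise,}
\end{cases}
$$

so λ = 1 leaves the shape of the series unchanged, λ = 0 is the log, and intermediate values interpolate between them.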
if (!require("latex2exp")) install.packages("latex2exp")
## Loading required package: latex2exp
library(latex2exp)
us_gdp |>
autoplot(box_cox(GDP, lambda)) +
labs(
title = TeX(paste0("US GDP: Box Cox transform ( $\\lambda$ = ", round(lambda, 2), " )")),
y = "Transformed units", x = NULL
)
In the plot above, we can see the US GDP data after applying the Box-Cox transformation with the estimated lambda value. The transformation stabilizes the variance and makes the trend close to linear.
First, we load the dataset and filter for the specific animal and state.
library(fpp3)
library(dplyr)
livestock_bulls <- tsibbledata::aus_livestock |>
filter(Animal == "Bulls, bullocks and steers", State == "Victoria")
print(livestock_bulls)
## # A tsibble: 510 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Victoria 109200
## 2 1976 Aug Bulls, bullocks and steers Victoria 94700
## 3 1976 Sep Bulls, bullocks and steers Victoria 95500
## 4 1976 Oct Bulls, bullocks and steers Victoria 94800
## 5 1976 Nov Bulls, bullocks and steers Victoria 94100
## 6 1976 Dec Bulls, bullocks and steers Victoria 98300
## 7 1977 Jan Bulls, bullocks and steers Victoria 93500
## 8 1977 Feb Bulls, bullocks and steers Victoria 102000
## 9 1977 Mar Bulls, bullocks and steers Victoria 102600
## 10 1977 Apr Bulls, bullocks and steers Victoria 91500
## # ℹ 500 more rows
Now, we can plot the data.
autoplot(livestock_bulls, Count) +
labs(title = "Victoria: Bulls, bullocks & steers slaughter ",
y = "Number of animals", x = NULL)
Now we can estimate the optimal Box-Cox transformation parameter for the livestock data.
lambda_hat <- livestock_bulls |>
features(Count, guerrero) |>
pull(lambda_guerrero)
print(lambda_hat)
## [1] -0.04461887
The estimated λ ≈ −0.045 is very close to zero, so the suggested Box-Cox transformation is essentially a log transformation.
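As a quick sanity check (using the box_cox() helper that fpp3 attaches from fabletools), λ = 0 reproduces the natural log exactly:
# box_cox() with lambda = 0 is defined as log(); this returns TRUE
all.equal(box_cox(livestock_bulls$Count, 0),
          log(livestock_bulls$Count))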
Now, we will plot the transformed series.
livestock_bulls |>
autoplot(box_cox(Count, lambda_hat)) +
labs(
title = TeX(paste0("Victoria slaughter (Box-Cox, $\\lambda$ = ",
round(lambda_hat, 3), ")")),
y = "Transformed units", x = NULL
)
The transformed series shows strong seasonality around a long-run downward trend, with the seasonal variation now more uniform in size.
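To inspect the seasonal pattern more directly, a seasonal plot can be drawn; a sketch using gg_season() from feasts (attached with fpp3):
livestock_bulls |>
  gg_season(Count) +
  labs(title = "Victoria slaughter: seasonal plot",
       y = "Number of animals")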
The first step is to load the dataset.
library(fpp3)
vic_elec <- tsibbledata::vic_elec
print(vic_elec)
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
## 7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
## 8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
## 9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
## 10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
## # ℹ 52,598 more rows
Now, we can plot the data.
autoplot(vic_elec, Demand) +
labs(title = "Victoria: Electricity demand ",
y = "Demand (MW)", x = NULL)
Next, we estimate the optimal Box-Cox transformation parameter for the electricity demand data.
lambda_vic <- vic_elec |>
features(Demand, guerrero) |>
pull(lambda_guerrero)
print(lambda_vic)
## [1] 0.09993089
The optimal Box-Cox transformation parameter (lambda) for the Victorian electricity demand data is approximately 0.10. Since this is close to zero, the suggested transformation is nearly a log transformation.
Now, we can plot the transformed series.
vic_elec |>
autoplot(box_cox(Demand, lambda_vic)) +
labs(
title = TeX(paste0("Victoria electricity demand: Box--Cox ($\\lambda = ",
round(lambda_vic, 2), "$)")),
y = "Transformed units", x = NULL
)
Even after the transformation, the plot shows strong seasonality at several levels: daily, weekly, and annual.
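The daily pattern can be isolated with a seasonal plot; a sketch using gg_season() with period = "day":
vic_elec |>
  gg_season(Demand, period = "day") +
  labs(title = "Victoria electricity demand: daily pattern",
       y = "Demand (MW)")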
First, we load the dataset.
library(fpp3)
aus_production <- tsibbledata::aus_production
print(aus_production)
## # A tsibble: 218 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
## 7 1957 Q3 236 5317 227 603 5259 7
## 8 1957 Q4 320 6152 222 582 4735 6
## 9 1958 Q1 272 5758 199 554 4608 5
## 10 1958 Q2 233 5641 229 620 5196 7
## # ℹ 208 more rows
We can plot the gas production data.
aus_production |>
autoplot(Gas) +
labs(title = "Australia: Gas production",
y = "Gas (petajoules)", x = NULL)
Now, we can estimate the optimal Box-Cox transformation parameter for the gas production data.
lambda_gas <- aus_production |>
features(Gas, guerrero) |>
pull(lambda_guerrero)
print(lambda_gas)
## [1] 0.1095171
The optimal Box-Cox transformation parameter (lambda) for the gas production data is approximately 0.11. Since this is close to zero, the transformation behaves much like a log and stabilizes the growing seasonal variance.
Now, we can plot the transformed series.
aus_production |>
autoplot(box_cox(Gas, lambda_gas)) +
labs(
title = TeX(paste0("Australia gas production: Box-Cox ($\\lambda = ",
round(lambda_gas, 2), "$)")),
y = "Transformed units", x = NULL
)
The transformed series shows clear quarterly seasonality and a strong long-run rise, with the seasonal fluctuations now roughly constant in size.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
Let’s first load the necessary libraries and the dataset to understand the data.
library(fpp3)
glimpse(canadian_gas)
## Rows: 542
## Columns: 2
## $ Month <mth> 1960 Jan, 1960 Feb, 1960 Mar, 1960 Apr, 1960 May, 1960 Jun, 196…
## $ Volume <dbl> 1.4306, 1.3059, 1.4022, 1.1699, 1.1161, 1.0113, 0.9660, 0.9773,…
The canadian_gas dataset contains monthly gas production data in billions of cubic metres.
autoplot(canadian_gas, Volume) +
labs(title = "Canadian Gas Production (Monthly)",
y = "Billions of cubic metres")
The plot shows a clear upward trend and seasonal patterns whose size changes across the series. First, let's try a log transformation of the canadian_gas data.
canadian_gas <- canadian_gas |>
mutate(log_Volume = log(Volume))
autoplot(canadian_gas, log_Volume) +
labs(title = "Canadian Gas Production (Log-transformed)",
y = "log(Billions of cubic metres)")
The log transformation makes the trend more linear, but it does not make the seasonal variation uniform: the fluctuations are largest in the middle of the series and smaller at both ends, a pattern that no monotonic transformation can remove.
Next, we estimate the Box-Cox transformation parameter via Guerrero's method.
features(canadian_gas, Volume, features = guerrero)
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.577
Although Guerrero suggests λ ≈ 0.58, applying a Box-Cox transformation does not materially improve the series. The difficulty is not variance that grows with the level of the series: the seasonal variation increases and then decreases over the sample, so no single power transformation can stabilise it. The real modelling challenge is the strong trend and the changing seasonal pattern.
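An STL decomposition makes the changing seasonal shape visible; a sketch using STL() from feasts with its default settings:
canadian_gas |>
  model(STL(Volume)) |>
  components() |>
  autoplot() +
  labs(title = "STL decomposition of Canadian gas production")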
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
To determine the appropriate Box-Cox transformation for retail data, we first need to load the necessary libraries and the dataset.
library(fpp3)
retail <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
glimpse(retail)
## Rows: 441
## Columns: 5
## Key: State, Industry [1]
## $ State <chr> "New South Wales", "New South Wales", "New South Wales", "…
## $ Industry <chr> "Liquor retailing", "Liquor retailing", "Liquor retailing"…
## $ `Series ID` <chr> "A3349627V", "A3349627V", "A3349627V", "A3349627V", "A3349…
## $ Month <mth> 1982 Apr, 1982 May, 1982 Jun, 1982 Jul, 1982 Aug, 1982 Sep…
## $ Turnover <dbl> 41.7, 43.1, 40.3, 40.9, 42.1, 42.0, 46.1, 46.5, 53.8, 43.8…
Now I will use Guerrero’s method to estimate the optimal Box-Cox transformation parameter for the retail data.
lambda <- retail |>
features(Turnover, guerrero) |>
pull(lambda_guerrero)
print(lambda)
## [1] -0.0344145
The result is approximately −0.03, which is very close to zero, so the suggested Box-Cox transformation is essentially a log transformation.
Now, we can plot the transformed series.
retail |>
autoplot(box_cox(Turnover, lambda)) +
labs(title = sprintf("Retail turnover: Box–Cox (λ = %.2f)", lambda),
y = "Transformed units", x = NULL)
The plot shows the retail turnover data after applying the Box-Cox transformation with the estimated lambda value. The transformation helps to stabilize the variance and makes the trend more linear.
In summary, I would select a Box-Cox transformation with λ ≈ 0 (effectively a log transformation) for the retail data to stabilize the variance.
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
Before starting this exercise, recall the rule of thumb for selecting a convenient transformation: if λ = 1, no transformation is used; if λ = 0, a log transformation is used; if λ = 0.5, a square root transformation is used.
Now we will create a function to help us pick the right transformation.
# Estimate lambda with Guerrero's method, then snap it to the nearest
# convenient value (0 = log, 0.5 = square root, 1 = none) when within 0.15.
selectlambda <- function(x) {
  lambda <- features(x, y, features = guerrero) |>
    pull(lambda_guerrero)
  lambdaused <- dplyr::case_when(
    abs(lambda - 0) < 0.15 ~ 0,
    abs(lambda - 0.5) < 0.15 ~ 0.5,
    abs(lambda - 1) < 0.15 ~ 1,
    TRUE ~ lambda
  )
  list(lambda_hat = lambda, lambda_used = lambdaused)
}
tobacco <- aus_production |>
select(Quarter, Tobacco) |>
filter(!is.na(Tobacco)) |>
rename(y = Tobacco)
lam_tob <- selectlambda(tobacco)
lam_tob
## $lambda_hat
## [1] 0.9264636
##
## $lambda_used
## [1] 1
The estimated lambda for the tobacco series is approximately 0.93, which our rule snaps to 1. The variance is already stable, so no transformation is needed.
mel_syd_econ_wk <- ansett |>
filter(Airports == "MEL-SYD", Class == "Economy") |>
select(Week, Passengers)|>
rename(y = Passengers)
lam_mel_syd <- selectlambda(mel_syd_econ_wk)
lam_mel_syd
## $lambda_hat
## [1] 1.999927
##
## $lambda_used
## [1] 1.999927
Guerrero suggests λ ≈ 2, which corresponds to squaring the data, not a square root. This extreme value is driven mainly by the weeks of zero passengers during the 1989 pilots' dispute rather than by variance that grows with the level of the series, so no simple power transformation is particularly helpful here.
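A quick plot of the raw weekly series (using the y column created above) shows the zero-passenger stretch:
mel_syd_econ_wk |>
  autoplot(y) +
  labs(title = "Ansett economy passengers: MEL-SYD",
       y = "Passengers")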
library(dplyr)
library(tsibble)
library(fpp3)
pedestrian_sc <- pedestrian |>
filter(Sensor == "Southern Cross Station") |>
select(Sensor, Date_Time, Count)|>
rename(y = Count)
lam_ped_sc <- selectlambda(pedestrian_sc)
lam_ped_sc
## $lambda_hat
## [1] -0.2501616
##
## $lambda_used
## [1] -0.2501616
Guerrero suggests λ ≈ −0.25 for the pedestrian counts. This is not close to any of the convenient values, so we would apply the Box-Cox transformation with λ = −0.25 directly (roughly an inverse fourth-root).
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
library(dplyr)
library(tsibble)
library(fpp3)
gas <- tail(aus_production, 5*4) |> select(Quarter, Gas)
autoplot(gas, Gas) + labs(title="Gas Production: last 5 years", y="Units")
In the plot, we can observe clear seasonal fluctuations, with peaks and troughs recurring at the same quarters each year, along with a gentle upward trend-cycle.
dc <- gas |> model(classical_decomposition(Gas, type = "multiplicative"))
autoplot(components(dc))
comp <- components(dc)
trend_cycle <- comp |> select(Quarter, trend)
seasonal_idx <- comp |>
  as_tibble() |>   # drop the tsibble index so we get one row per quarter
  mutate(Q = quarter(Quarter)) |>
  group_by(Q) |>
  summarise(seasonal_index = mean(seasonal))
trend_cycle
## # A tsibble: 20 x 2 [1Q]
## Quarter trend
## <qtr> <dbl>
## 1 2005 Q3 NA
## 2 2005 Q4 NA
## 3 2006 Q1 200.
## 4 2006 Q2 204.
## 5 2006 Q3 207
## 6 2006 Q4 210.
## 7 2007 Q1 213
## 8 2007 Q2 216.
## 9 2007 Q3 219.
## 10 2007 Q4 219.
## 11 2008 Q1 219.
## 12 2008 Q2 219
## 13 2008 Q3 219
## 14 2008 Q4 220.
## 15 2009 Q1 222.
## 16 2009 Q2 223.
## 17 2009 Q3 225.
## 18 2009 Q4 226
## 19 2010 Q1 NA
## 20 2010 Q2 NA
seasonal_idx
## # A tibble: 4 × 2
##       Q seasonal_index
##   <int>          <dbl>
## 1     1          0.875
## 2     2          1.07
## 3     3          1.13
## 4     4          0.925
Yes, the multiplicative decomposition shows a stable quarterly seasonal pattern and a flat to slightly rising trend cycle, matching the visual impression from part a.
sa <- components(dc) |> select(Quarter, SA = season_adjust)
autoplot(sa, SA) + labs(title="Gas: seasonally adjusted", y="SA")
The seasonally adjusted data shows the underlying trend in gas production without the seasonal fluctuations.
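For a multiplicative decomposition, the seasonally adjusted series is obtained by dividing out the estimated seasonal component:

$$
y_t = \hat{T}_t \times \hat{S}_t \times \hat{R}_t,
\qquad
y^{\text{SA}}_t = \frac{y_t}{\hat{S}_t} = \hat{T}_t \hat{R}_t .
$$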
gas_ol <- gas |> mutate(Gas = if_else(row_number()==10, Gas+300, Gas))
dc_ol <- gas_ol |> model(classical_decomposition(Gas, type="multiplicative"))
sa_ol <- components(dc_ol) |> select(Quarter, SA = season_adjust)
autoplot(sa_ol, SA) + labs(title="Gas Production: seasonally adjusted with outlier", y="SA")
In this case, the outlier has a significant impact on the seasonally adjusted data, causing a noticeable spike in the adjusted values around the time of the outlier.
library(dplyr)
n <- nrow(gas)
gas_end <- gas |> mutate(Gas = if_else(row_number() == n, Gas + 300, Gas))
dc_end <- gas_end |> model(classical_decomposition(Gas, type = "multiplicative"))
sa_end <- components(dc_end) |> select(Quarter, SA = season_adjust) |> mutate(case = "end")
combined <- dplyr::bind_rows(
  as_tibble(sa) |> mutate(case = "baseline"),
  as_tibble(sa_ol) |> mutate(case = "middle outlier"),
  as_tibble(sa_end)   # already labelled "end"
)
ggplot(combined, aes(Quarter, SA, colour = case)) +
geom_line() +
labs(title = "Seasonally adjusted (baseline vs outliers)", y = "SA", x = NULL)
Yes. An outlier near the end has a bigger, more persistent impact on the seasonally adjusted series than one in the middle. Classical decomposition cannot estimate the trend-cycle at the ends of the series (the centred moving average is undefined there), so an end outlier is not smoothed away and passes almost directly into the adjusted values.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
retail <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
glimpse(retail)
## Rows: 441
## Columns: 5
## Key: State, Industry [1]
## $ State <chr> "South Australia", "South Australia", "South Australia", "…
## $ Industry <chr> "Other retailing", "Other retailing", "Other retailing", "…
## $ `Series ID` <chr> "A3349433W", "A3349433W", "A3349433W", "A3349433W", "A3349…
## $ Month <mth> 1982 Apr, 1982 May, 1982 Jun, 1982 Jul, 1982 Aug, 1982 Sep…
## $ Turnover <dbl> 34.2, 34.4, 32.7, 36.2, 36.1, 36.0, 38.4, 41.2, 57.6, 36.6…
Now, we can decompose the retail time series using the X-11 method and extract the components. Note: to run X-11 with the seasonal package, we convert the data to a plain monthly ts object first and call seas() with x11 = "".
library(fpp3)
library(seasonal)
##
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
##
## view
retail <- aus_retail |>
filter(`Series ID` == sample(unique(`Series ID`), 1)) |>
arrange(Month) |>
select(Month, Turnover)
ret_ts <- ts(retail$Turnover,
start = c(lubridate::year(min(retail$Month)),
lubridate::month(min(retail$Month))),
frequency = 12)
fit <- seas(ret_ts, x11 = "")
sa <- series(fit, "d11")
trend <- series(fit, "d12")
irr <- series(fit, "d13")
sfac <- series(fit, "d10")
irreg_z <- scale(as.numeric(irr))
outliers <- tibble::tibble(Month = retail$Month,
original = as.numeric(ret_ts),
sa = as.numeric(sa),
trend = as.numeric(trend),
irregular = as.numeric(irr),
z = as.numeric(irreg_z)) |>
dplyr::filter(abs(z) > 3)
outliers
## # A tibble: 8 × 6
## Month original sa trend irregular z
## <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1982 Jun 9.9 11.3 9.78 1.15 3.75
## 2 1986 Jan 13.4 17.1 14.1 1.21 5.14
## 3 1986 May 18.8 17.3 14.6 1.18 4.54
## 4 1987 Oct 14.1 14.4 16.5 0.874 -3.15
## 5 1993 Jul 20 20.1 23.2 0.865 -3.37
## 6 1999 Jan 24.6 29.2 24.1 1.21 5.14
## 7 2000 Jun 29.3 32.3 26.3 1.23 5.62
## 8 2000 Jul 20.8 21.2 26.4 0.804 -4.89
The output shows the months where the irregular component has a z-score greater than 3, indicating potential outliers in the retail time series data.
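For reference, the same X-11 decomposition can also be run without leaving the tsibble framework, via the X_13ARIMA_SEATS() wrapper in feasts (the approach used in the fpp3 book); a sketch applied to the retail series selected above:
x11_dcmp <- retail |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()
autoplot(x11_dcmp) +
  labs(title = "X-11 decomposition of retail turnover")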
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
The trend rises steadily from around 6,000 to about 9,000 thousand persons (roughly 6 to 9 million), so the long-run movement dominates the series.
Paying attention to the different y-axis scales makes it clear that the trend is the main driver; the seasonal component is consistent over time but comparatively small.
The remainder is usually modest, but there is a large negative spike in the early 1990s; this corresponds to the 1991/1992 recession rather than mere noise.
b. Is the recession of 1991/1992 visible in the estimated components?
Yes, the recession of 1991/1992 is visible in the estimated components. The trend component shows a noticeable dip during this period before resuming its rise, and the remainder shows a large negative spike around the same time.