Objective:

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## Warning: package 'tsibble' was built under R version 4.4.3
## Warning: package 'tsibbledata' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(scales)

ge_pc <- global_economy |>
  mutate(gdp_pc = GDP / Population)
# Plotting all countries at once here
ge_pc |>
  ggplot(aes(Year, gdp_pc, group = Country)) +
  geom_line(alpha = 0.15) +
  scale_y_continuous(labels = label_dollar()) +
  labs(title = "GDP per capita", y = "$US per person", x = NULL)
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Highest year
leaders <- ge_pc %>%
  as_tibble() %>%
  dplyr::group_by(Year) %>%
  dplyr::slice_max(gdp_pc, n = 1, with_ties = FALSE) %>%
  dplyr::ungroup()
leaders
## # A tibble: 58 × 10
##    Country    Code   Year     GDP Growth   CPI Imports Exports Population gdp_pc
##    <fct>      <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>  <dbl>
##  1 United St… USA    1960 5.43e11  NA     13.6    4.20    4.97  180671000  3007.
##  2 United St… USA    1961 5.63e11   2.30  13.7    4.03    4.90  183691000  3067.
##  3 United St… USA    1962 6.05e11   6.10  13.9    4.13    4.81  186538000  3244.
##  4 United St… USA    1963 6.39e11   4.40  14.0    4.09    4.87  189242000  3375.
##  5 United St… USA    1964 6.86e11   5.80  14.2    4.10    5.10  191889000  3574.
##  6 Kuwait     KWT    1965 2.10e 9  NA     NA     23.1    67.7      473554  4429.
##  7 Kuwait     KWT    1966 2.39e 9  12.3   NA     24.4    65.7      524856  4556.
##  8 United St… USA    1967 8.62e11   2.50  15.3    4.63    5.05  198712000  4336.
##  9 United St… USA    1968 9.42e11   4.80  16.0    4.94    5.08  200706000  4696.
## 10 United St… USA    1969 1.02e12   3.10  16.8    4.95    5.09  202677000  5032.
## # ℹ 48 more rows
latest_leader <- leaders |>
  filter(Year == max(Year))
latest_leader
## # A tibble: 1 × 10
##   Country    Code   Year      GDP Growth   CPI Imports Exports Population gdp_pc
##   <fct>      <fct> <dbl>    <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>  <dbl>
## 1 Luxembourg LUX    2017  6.24e10   2.30  111.    194.    230.     599449 1.04e5
# Here are the highest GDP countries (only) and their changes over time
ge_pc |>
  filter(Country %in% leaders$Country) |>
  ggplot(aes(Year, gdp_pc, colour = Country)) +
  geom_line(linewidth = 0.8) +
  scale_y_continuous(labels = label_dollar()) +
  labs(title = "Countries that topped GDP per capita (by year)",
       y = "$US per person", x = NULL)
## Warning: Removed 42 rows containing missing values or values outside the scale range
## (`geom_line()`).

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect. United States GDP from global_economy.

us_gdp <- global_economy |>
  filter(Country == "United States")
autoplot(us_gdp, GDP) +
  labs(title = "United States GDP", y = "$US (current)")

# Log Transformation here because the differences are shown in percentage growth compared to the previous graph.
us_gdp <- us_gdp |> mutate(GDP_log = log(GDP))
autoplot(us_gdp, GDP_log) +
  labs(title = "United States GDP (log)", y = "log($US)")

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

vic_bulls <- aus_livestock |>
  filter(State == "Victoria",
         Animal == "Bulls, bullocks and steers")
autoplot(vic_bulls, Count) +
  labs(title = "Victoria Bulls, bullocks and steers slaughtered",
       y = "Head", x = NULL)

Victorian Electricity Demand from vic_elec.

daily <- vic_elec |>
  index_by(Date = as_date(Time)) |>
  summarise(Demand = sum(Demand))
autoplot(daily, Demand) +
  labs(title = "Victoria electricity demand (Daily)", y = "MWh", x = NULL)

Gas production from aus_production.

gas <- aus_production |> select(Gas)
autoplot(gas, Gas) +
  labs(title = "Aus gas production (quarterly)",
       y = "Petajoules", x = NULL)

# Box-Cox transformation is needed here because the seasonal swings grow as the level rises from before 1970s towards the 2000s, the Box-Cox transformation makes the seasonal polarity more even and leveled compared to the rest of the data.
lambda <- gas |>
  features(Gas, guerrero, period = 4) |>
  dplyr::pull(lambda_guerrero)
lambda
## [1] 0.1095171
gas |>
  mutate(Gas_bc = box_cox(Gas, lambda)) |>
  autoplot(Gas_bc) +
  labs(title = paste0("AusGas production Box–Cox", round(lambda, 2), ")"),
       y = "Transformed", x = NULL)

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Box-Cox transformations would be unhelpful for the canadian_gas data because the point of the transformation is when the data’s variance grows with the mean, but here in the example below, seasonal amplitude is roughly constant over time nor does the variability doesn’t increase with the level.

autoplot(canadian_gas, Volume) +
  labs(title = "Canadian gas production",
       x = "Month", y = "Volume")

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

I would select a Box-Cox transformation with lambda = 0.2151641 found from Guerrero’s method

set.seed(123)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1)) |>
  select(Month, Turnover)

# using Guerrero's method
lambda <- myseries |>
  features(Turnover, guerrero, period = 12) |>
  dplyr::pull(lambda_guerrero)
lambda
## [1] 0.2151641
myseries_bc <- myseries |>
  mutate(Turnover_bc = box_cox(Turnover, lambda))
autoplot(myseries_bc, Turnover_bc) +
  labs(title = paste0("Aus Retail Box–Cox (lambda = ", round(lambda, 2), ")"),
       y = "Transformed", x = NULL)

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

# Tobacco from aus_production
tobacco <- aus_production |> select(Tobacco)
lambda_tob <- tobacco |>
  features(Tobacco, guerrero, period = 4) |>
  dplyr::pull(lambda_guerrero)
lambda_tob # No Transformation as variance is near stable 
## [1] 0.9264636
# Economy class passengers
ans_melsyd <- ansett |>
  filter(Class == "Economy", Airports %in% c("MEL-SYD","SYD-MEL")) |>
  summarise(Passengers = sum(Passengers))   # combine both directions each week
lambda_ans <- ans_melsyd |>
  features(Passengers, guerrero, period = 52) |>
  dplyr::pull(lambda_guerrero)
lambda_ans
## [1] 1.999927
# appropriate Box-Cox transformation
ans_tr <- ans_melsyd |> mutate(Pax_bc = box_cox(Passengers, lambda_ans))

# Pedestrian counts
sx <- pedestrian |> filter(Sensor == "Southern Cross Station")

# choosing period
lambda_sx_10  <- sx |> features(Count, guerrero, period = 10)  |> dplyr::pull(lambda_guerrero)
lambda_sx_150 <- sx |> features(Count, guerrero, period = 150) |> dplyr::pull(lambda_guerrero)
lambda_sx_10; lambda_sx_150
## [1] -0.2501616
## [1] -0.2501616
# appropriate Box-Cox transformation
sx_tr <- sx |> mutate(Count_bc = box_cox(Count, lambda_sx_10))

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(gas, Gas) +
  labs(title = "Australian gas production",
       x = "Quarter", y = "Petajoules")

According to the graph, yes there appears to be strong quarterly seasonality in peaking Q3 and low Q4, and an upward trend

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

dcmp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()
trend_cycle <- dcmp %>% select(Quarter, trend)

seasonal_idx <- dcmp %>%
  mutate(Qtr = lubridate::quarter(Quarter)) %>%
  group_by(Qtr) %>%
  summarise(seasonal_index = mean(seasonal, na.rm = TRUE)) %>%
  arrange(Qtr)
trend_cycle
## # A tsibble: 20 x 2 [1Q]
##    Quarter trend
##      <qtr> <dbl>
##  1 2005 Q3   NA 
##  2 2005 Q4   NA 
##  3 2006 Q1  200.
##  4 2006 Q2  204.
##  5 2006 Q3  207 
##  6 2006 Q4  210.
##  7 2007 Q1  213 
##  8 2007 Q2  216.
##  9 2007 Q3  219.
## 10 2007 Q4  219.
## 11 2008 Q1  219.
## 12 2008 Q2  219 
## 13 2008 Q3  219 
## 14 2008 Q4  220.
## 15 2009 Q1  222.
## 16 2009 Q2  223.
## 17 2009 Q3  225.
## 18 2009 Q4  226 
## 19 2010 Q1   NA 
## 20 2010 Q2   NA
seasonal_idx
## # A tsibble: 20 x 3 [1Q]
## # Key:       Qtr [4]
##      Qtr Quarter seasonal_index
##    <int>   <qtr>          <dbl>
##  1     1 2006 Q1          0.875
##  2     1 2007 Q1          0.875
##  3     1 2008 Q1          0.875
##  4     1 2009 Q1          0.875
##  5     1 2010 Q1          0.875
##  6     2 2006 Q2          1.07 
##  7     2 2007 Q2          1.07 
##  8     2 2008 Q2          1.07 
##  9     2 2009 Q2          1.07 
## 10     2 2010 Q2          1.07 
## 11     3 2005 Q3          1.13 
## 12     3 2006 Q3          1.13 
## 13     3 2007 Q3          1.13 
## 14     3 2008 Q3          1.13 
## 15     3 2009 Q3          1.13 
## 16     4 2005 Q4          0.925
## 17     4 2006 Q4          0.925
## 18     4 2007 Q4          0.925
## 19     4 2008 Q4          0.925
## 20     4 2009 Q4          0.925

c. Do the results support the graphical interpretation from part a?

Yes because the results show an increasing trend cycle and quarterly seasonal indices as Q1 is the lowest, while Q2 through Q3 is the highest, matching the upward drift and strong quarterly peaks seen in the plot.

d. Compute and plot the seasonally adjusted data.

dcmp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()
# Seasonal
sa <- dcmp %>% select(Quarter, season_adjust)
autoplot(sa, season_adjust) +
  labs(title = "Australian gas production (seasonally adjusted)",
       x = "Quarter", y = "Petajoules")

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

dcmp0 <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()
# adding outlier
gas_out <- gas %>% mutate(Gas = if_else(Quarter == yearquarter("2008 Q2"), Gas + 300, Gas))
dcmp1 <- gas_out %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()
# Compare seasonally adjusted series
bind_rows(
  dcmp0 %>% as_tibble() %>% transmute(Quarter, series = "original", SA = season_adjust),
  dcmp1 %>% as_tibble() %>% transmute(Quarter, series = "outlier",  SA = season_adjust)
) %>%
  ggplot(aes(Quarter, SA, colour = series)) +
  geom_line() +
  labs(title = "Gas: seasonally adjusted (original vs outlier)",
       y = "Petajoules", x = NULL)

# Comapare seasonal indices
cmp_idx <- function(d) {
  d %>%
    as_tibble() %>%
    mutate(Qtr = lubridate::quarter(Quarter)) %>%
    group_by(Qtr) %>%
    summarise(seasonal_index = mean(seasonal, na.rm = TRUE), .groups = "drop")
}
left_join(cmp_idx(dcmp0), cmp_idx(dcmp1),
          by = "Qtr", suffix = c("_orig", "_outlier"))
## # A tibble: 4 × 3
##     Qtr seasonal_index_orig seasonal_index_outlier
##   <int>               <dbl>                  <dbl>
## 1     1               0.875                  0.820
## 2     2               1.07                   1.27 
## 3     3               1.13                   1.06 
## 4     4               0.925                  0.859

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Yes it does make a difference that the outlier is near the end rather than in the middle of the time series because the outlier near the end mainly wraps the last few few trends while the outlier in the middle of the time series distorts both sides and the season indices more.

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

set.seed(123)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
fit_x11 <- myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11()))

cmp <- components(fit_x11)

# Plot the decomposition
autoplot(cmp) +
  labs(title = "X-11 decomposition")

outliers <- cmp |>
  as_tibble() |>
  mutate(z = scale(if ("irregular" %in% names(cmp)) irregular else remainder)[,1]) |>
  filter(!is.na(z), abs(z) > 3) |>
  select(Month, Turnover, z)
outliers
## # A tibble: 7 × 3
##      Month Turnover     z
##      <mth>    <dbl> <dbl>
## 1 1983 Mar     213   4.59
## 2 1984 Mar     228.  3.40
## 3 1986 Mar     225. -3.05
## 4 1988 Jan     264. -3.97
## 5 1993 Nov     425.  3.04
## 6 2000 Jun     592.  8.93
## 7 2008 Dec    1187.  3.23

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995. a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Throughout 1978-1995 the trend clearly increases from about 6,500 to 9,000, so most of the movement is long-run growth (note the value panel uses a much bigger scale than the others). The seasonal part is small with peaks around March, September–October, and December and lows in June and July. The remainder is near zero however, it does show one big negative dip between 1991–92.

b. Is the recession of 1991/1992 visible in the estimated components?

Yes because the 1991-1992 recession is shown as a big negative remainder and a plateau in the trend following.