library(fpp3)
## Warning: package 'fpp3' was built under R version 4.5.2
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble      3.3.0     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.5.0
## ✔ ggplot2     4.0.0
## Warning: package 'tsibble' was built under R version 4.5.2
## Warning: package 'tsibbledata' was built under R version 4.5.2
## Warning: package 'feasts' was built under R version 4.5.2
## Warning: package 'fabletools' was built under R version 4.5.2
## Warning: package 'fable' was built under R version 4.5.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(scales)
library(patchwork)
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.5.2
## 
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
## 
##     view

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

glimpse(global_economy)
## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country    <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code       <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year       <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP        <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports    <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports    <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…
# Convert to tibble for dataset-level
global_economy %>%
  as_tibble() %>%
  summarise(
    min_year = min(Year, na.rm = TRUE),
    max_year = max(Year, na.rm = TRUE),
    n_countries = n_distinct(Country)
  )
# filter out missing GDP/Population and guard against Population = 0
gdp_pc <- global_economy %>%
  as_tibble() %>%
  filter(!is.na(GDP), !is.na(Population), Population > 0) %>%
  mutate(gdp_per_capita = GDP / Population)

# check to see for reasonable values and coverage
gdp_pc %>%
  summarise(
    min_year = min(Year),
    max_year = max(Year),
    n_countries = n_distinct(Country),
    n_rows = n(),
    missing_gdp_pc = sum(is.na(gdp_per_capita))
  )
# plot GDP per capita over time for each country
gdp_pc %>%
  ggplot(aes(x = Year, y = gdp_per_capita, group = Country)) +
  geom_line(alpha = 0.12) +
  labs(
    title = "GDP per capita over time (all countries)",
    x = NULL,
    y = "GDP per capita (GDP / Population)"
  ) +
  theme_minimal()

# which country has the highest GDP per capita (overall max)?
overall_top <- gdp_pc %>%
  slice_max(gdp_per_capita, n = 1, with_ties = FALSE) %>%
  select(Country, Year, gdp_per_capita)

overall_top
# how has the highest GDP per capita changed over time?
leaders_by_year <- gdp_pc %>%
  group_by(Year) %>%
  slice_max(gdp_per_capita, n = 1, with_ties = FALSE) %>%
  ungroup()

# which countries appear as the  #1 leader?
leaders_by_year %>%
  count(Country, sort = TRUE)
# years when the leader changes
leader_changes <- leaders_by_year %>%
  arrange(Year) %>%
  mutate(prev_country = lag(Country),
         changed = is.na(prev_country) | Country != prev_country) %>%
  filter(changed) %>%
  select(Year, Country, gdp_per_capita)

leader_changes
# plot with the leader highlighted
gdp_pc %>%
  ggplot(aes(x = Year, y = gdp_per_capita, group = Country)) +
  geom_line(alpha = 0.07) +
  geom_line(
    data = leaders_by_year,
    aes(group = 1),
    linewidth = 1
  ) +
  geom_point(
    data = leaders_by_year,
    size = 1.6
  ) +
  labs(
    title = "Yearly highest GDP per capita (highlighted)",
    x = NULL,
    y = "GDP per capita (GDP / Population)"
  ) +
  theme_minimal()

I calculated GDP per person by dividing GDP by population for each country and year, after removing missing values. The highest value in the data is Monaco in 2014. When I check who ranks first each year, the top country is not always the same. Monaco is the leader most often, but the lead shifts over time, with countries like the United States, Kuwait, the United Arab Emirates, Liechtenstein, and Luxembourg taking the top spot in some years. This shows the highest GDP per person changes across the timeline instead of staying fixed.

3.2

For each of the following series, make a graph of the data. IF transforming seems appropriate, do so and describe the effect. - United State GDP from global_economy.

# filter the global economy dataset to keep only the US series
us_gdp <- global_economy %>%
  filter(Country == "United States")

# plot the original GDP values
us_gdp %>%
  autoplot(GDP) +
  labs(
    title = "United States GDP (raw)",
    y = "GDP"
  )

# create a log-transform 
us_gdp %>%
  mutate(log_GDP = log(GDP)) %>%
  autoplot(log_GDP) +
  labs(
    title = "United States GDP (log transformed)",
    y = "log(GDP)"
  )

Raw GDP usually rises strongly over time, and the ups and downs tend to get bigger as the economy grows. After taking the log, the large values are squeezed down, so the trend looks more like a straight line and year to year changes are easier to read as rough percent changes.

# filter the series + plot raw counts
vic_bulls <- aus_livestock %>%
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers")

vic_bulls %>%
  autoplot(Count) +
  labs(
    title = 'Victoria slaughter: "Bulls, bullocks and steers" (raw)',
    y = "Count"
  )

# Box-Cox + plot transformed
lambda_bulls <- vic_bulls %>%
  features(Count, guerrero) %>%
  pull(lambda_guerrero)

lambda_bulls
## [1] -0.04461887
# apply the Box-Cox transformation and plot the result
vic_bulls %>%
  mutate(Count_bc = box_cox(Count, lambda_bulls)) %>%
  autoplot(Count_bc) +
  labs(
    title = "Victoria slaughter (Box-Cox transformed)",
    y = "Box-Cox(count)"
  )

When the counts are high, the series usually swings more. Using a Box Cox or log transform evens out the ups and downs, so the variation looks more consistent and the seasonal and long term pattern is easier to compare across years.

# plot raw demand
vic_elec %>%
  autoplot(Demand) +
  labs(
    title = "Victorian electricity demand (raw)",
    y = "Demand"
  )

# Box-Cox transform + plot
lambda_demand <- vic_elec %>%
  features(Demand, guerrero) %>%
  pull(lambda_guerrero)

lambda_demand
## [1] 0.09993089
vic_elec %>%
  mutate(Demand_bc = box_cox(Demand, lambda_demand)) %>%
  autoplot(Demand_bc) + 
  labs(
    title = "Victorian electricity demand (Box-Cox transformed)",
    y = "Box-Cox (Demand)"
  )

Electricity demand often has big spikes during high use periods, and those spikes can take over the plot. A Box Cox or log transform tones down the extremes and makes the spread more even over time.

# plot raw gas production
aus_production %>%
  autoplot(Gas) +
  labs(
    title = "Australian gas production (raw)",
    y = "Gas"
  )

# Box-Cox transform + plot
lambda_gas <- aus_production %>%
  features(Gas, guerrero) %>%
  pull(lambda_guerrero)

lambda_gas
## [1] 0.1095171
# apply the Box-Cox transformation and plot the result
aus_production %>%
  mutate(Gas_bc = box_cox(Gas, lambda_gas)) %>%
  autoplot(Gas_bc) +
  labs(
    title = "Australian gas production (Box-Cox transformed)",
    y = "Box-Cox (Gas)"
  )

Production data often grows over time, and the size of the swings can change as the level increases. A Box Cox or log transform helps keep the variation more consistent, so the later years do not dominate the plot and it is easier to compare the trend and seasonality across the whole period.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

# estimate a Box-Cox lambda 
lambda_cg <- canadian_gas %>%
  features(Volume, guerrero) %>%
  pull(lambda_guerrero)

# apply the Box-Cox transformation
cg_bc <- canadian_gas %>%
  mutate(Volume_bc = box_cox(Volume, lambda_cg))

# plot the original series
p_raw <- canadian_gas %>%
  autoplot(Volume) +
  labs(title = "canadian_gas Volume (raw)", y = "Volume")

# plot the transformed series
p_bc <- cg_bc %>%
  autoplot(Volume_bc) +
  labs(title = paste0("Box-Cox transformed Volume, lambda = ", round(lambda_cg, 3)),
       y = "Box-Cox Volume")

p_raw / p_bc

Box Cox is not very helpful for this dataset because the spread of the series is already fairly steady over time. The transformation changes the scale but does not clearly make the ups and downs more consistent, so it does not add much beyond the original plot.

3.4

What Box-Cox transformation would you select for you retail data (from Exercise 7 in Section 2.10)?

# randomly select one retail time series using Series ID
myseries <- aus_retail %>%
  filter(`Series ID` == sample(unique(aus_retail$`Series ID`), 1))

# structure checking
myseries %>% glimpse()
## Rows: 369
## Columns: 5
## Key: State, Industry [1]
## $ State       <chr> "Northern Territory", "Northern Territory", "Northern Terr…
## $ Industry    <chr> "Hardware, building and garden supplies retailing", "Hardw…
## $ `Series ID` <chr> "A3349766V", "A3349766V", "A3349766V", "A3349766V", "A3349…
## $ Month       <mth> 1988 Apr, 1988 May, 1988 Jun, 1988 Jul, 1988 Aug, 1988 Sep…
## $ Turnover    <dbl> 0.6, 0.8, 0.4, 0.4, 0.5, 0.6, 0.6, 0.8, 1.5, 0.8, 0.8, 0.6…
# plot the raw data
myseries %>%
  autoplot(Turnover) +
  labs(
    title = "Selected Australian retail series (raw Turnover)",
    y = "Turnover"
  )

# check if Box-Cox is valid
myseries %>%
  ungroup()%>%
  summarise(
    min_turnover = min(Turnover, na.rm = TRUE),
    zeros = sum(Turnover == 0, na.rm = TRUE), 
    nonpositive = sum(Turnover <= 0, na.rm = TRUE)
  )
# estimate the Box-COx lambda using Guerrero's method
lambda_retail <- myseries %>%
  features(Turnover, guerrero) %>%
  pull(lambda_guerrero)

lambda_retail
## [1] 0.4348939
# apply the Box-Cox transformation and plot the series
myseries_bc <- myseries %>%
  mutate(Turnover_bc = box_cox(Turnover, lambda_retail))

myseries_bc %>%
  autoplot(Turnover_bc) +
  labs(
    title = paste0("Box-Cox transformed Turnover (lambda = ", round(lambda_retail, 3), ")"),
    y = "Box-Cox(Turnover)"
  )

I would choose a Box-Cox transform with λ about -0.014, which is basically the same as a log transform. This helps because the seasonal spikes get larger as turnover increases, and the log-like transform makes the variation more even over time.

3.5

For the following Series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

# select the Tobacco series and keep the time index 
tobacco <- aus_production %>%
  select(Quarter, Tobacco)

# clean the data 
tobacco_clean <- tobacco %>%
  filter(!is.na(Tobacco), Tobacco > 0)

# estimate the Box-Cox lambda with Guerrero's method
lambda_tobacco <- tobacco_clean %>%
  features(Tobacco, guerrero) %>%
  pull(lambda_guerrero)

# apply Box-Cox transformation
tobacco_bc <- tobacco_clean %>%
  mutate(Tobacco_bc = box_cox(Tobacco, lambda_tobacco))   # <-- create Tobacco_bc

# plot raw vs transformed for comparison
p_raw <- tobacco_clean %>%
  autoplot(Tobacco) +
  labs(title = "Tobacco production (raw)", y = "Tobacco")

p_bc <- tobacco_bc %>%
  autoplot(Tobacco_bc) +
  labs(
    title = paste0("Tobacco production (Box-Cox, λ=", round(lambda_tobacco, 3), ")"),
    y = "Box-Cox(Tobacco)"
  )

p_raw / p_bc

# filter to MEL-SYD economy passenger series
melsyd_econ <- ansett %>%
  filter(Airports == "MEL-SYD", Class == "Economy") %>%
  select(Week, Passengers)

# check positivity for Box-Cox requirement
melsyd_econ %>%
  ungroup()%>%
  summarise(
    min_val = min(Passengers, na.rm = TRUE),
    nonpositive = sum(Passengers <= 0, na.rm = TRUE)
  )
# estimate lambda using Guerrero's method
lambda_melsyd <- melsyd_econ %>%
  filter(Passengers > 0) %>%
  features(Passengers, guerrero) %>%
  pull(lambda_guerrero)

lambda_melsyd
## [1] 0.2199838
# apply Box-Cox transformation
melsyd_bc <- melsyd_econ %>%
  filter(Passengers > 0) %>%
  mutate(Passengers_bc = box_cox(Passengers, lambda_melsyd))

# plot raw vs transformed
p_raw <- melsyd_econ %>%
  autoplot(Passengers) +
  labs(
    title = "MEL-SYD Economy Passengers (raw)", y = "Passengers"
  )

p_bc <- melsyd_bc %>%
  autoplot(Passengers_bc) +
  labs(
    title = paste0("MEL-SYD Economy Passengers (Box-Cox, λ=", round(lambda_melsyd, 3), ")"),
    y = "Box-Cox(Passengers)"
  )

p_raw / p_bc

# filter to southern cross station pedestrian series
southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  arrange(Date_Time)

# one-row check for zeros/nonpositive values
southern_cross %>%
  ungroup() %>%
  summarise(
    min_count = min(Count, na.rm = TRUE),
    zeros = sum(Count == 0, na.rm = TRUE),
    nonpositive = sum(Count <= 0, na.rm = TRUE),
    na_rows = sum(is.na(Count))
  )
# Clean NAs for plotting/transforms
southern_cross_clean <- southern_cross %>%
  filter(!is.na(Count))

# choose transforms
if (min(southern_cross_clean$Count) > 0) {
  # Box–Cox path
  lambda_sc <- southern_cross_clean %>%
    features(Count, guerrero) %>%
    pull(lambda_guerrero)
  
  southern_cross_tr <- southern_cross_clean %>%
    mutate(Count_tr = box_cox(Count, lambda_sc))
  
  p_raw <- southern_cross_clean %>%
    autoplot(Count) +
    labs(title = "Southern Cross pedestrian count (raw)", y = "Count")
  
  p_tr <- southern_cross_tr %>%
    autoplot(Count_tr) +
    labs(
      title = paste0("Southern Cross pedestrian count (Box-Cox, λ=", round(lambda_sc, 3), ")"),
      y = "Box-Cox(Count)"
    )
  
} else {
  southern_cross_tr <- southern_cross_clean %>%
    mutate(Count_tr = log1p(Count))
  
  p_raw <- southern_cross_clean %>%
    autoplot(Count) +
    labs(title = "Southern Cross pedestrian count (raw)", y = "Count")
  
  p_tr <- southern_cross_tr %>%
    autoplot(Count_tr) +
    labs(title = "Southern Cross pedestrian count (log1p fallback)", y = "log(1 + Count)")
}

p_raw / p_tr

3.7

Consider the last five years of the Gas data from aus_production. gas <- tail(aus_production, 5*4) |> select(Gas)

# keep only the last 5 years of quarterly gas data
gas <- aus_production %>%
  tail(5*4) %>%
  select(Gas)

gas
  1. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
# plot the last 5 years to visually check for seasonality and trend
gas %>%
  autoplot(Gas) +
  labs(
    title = "Australian gas production (last 5 years)",
    y = "Gas"
  )

  1. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
# fit classical decomposition using a multiplicative model
gas_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

gas_decomp
# plot the decomposed components
components(gas_decomp) %>%
  autoplot() +
  labs(title = "Classical decomposition (multiplicative): Gas (last 5 years)")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

  1. Do the results support the graphical interpretation from part a?
# extract decomposition components for checking seasonal indices
gas_components <- components(gas_decomp)

# summarise seasonal index range in one row
gas_components %>%
  as_tibble() %>%
  summarise(
    seasonal_min = min(seasonal, na.rm = TRUE),
    seasonal_max = max(seasonal, na.rm = TRUE)
  )
  1. Compute and plot the seasonally adjusted data.
# compute seasonally adjusted series by dividing out the seasonal component
gas_sa <- gas_components %>%
  as_tsibble() %>%
  mutate(Gas_SA = Gas / seasonal)

# plot original vs seasonally adjusted
p1 <- gas %>%
  autoplot(Gas) +
  labs(title = "Original Gas (last 5 years)", y = "Gas")

p2 <- gas_sa %>%
  autoplot(Gas_SA) +
  labs(title = "Seasonally adjusted Gas", y = "Gas (SA)")

p1 / p2

  1. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
# create an artificial outlier in the middle of the series and re-run decomposition
n <- nrow(gas %>% as_tibble())
mid_idx <- ceiling(n/2)

gas_out_mid <- gas %>%
  mutate(Gas = if_else(row_number() == mid_idx, Gas + 300, Gas))

# Recompute decomposition & SA
decomp_mid <- gas_out_mid %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

comp_mid <- components(decomp_mid)

gas_sa_mid <- comp_mid %>%
  as_tsibble() %>%
  mutate(Gas_SA = Gas / seasonal)

# compare seasonally adjusted series with and withou the outlier
p_sa_base <- gas_sa %>%
  autoplot(Gas_SA) +
  labs(title = "Seasonally adjusted (no outlier)", y = "Gas (SA)")

p_sa_mid <- gas_sa_mid %>%
  autoplot(Gas_SA) +
  labs(title = "Seasonally adjusted (outlier in middle)", y = "Gas (SA)")

p_sa_base / p_sa_mid

  1. Does it make any difference if the outlier is near the end rather than in the middle of the time series?
# place the outlier at the end and compare impact
end_idx <- n

gas_out_end <- gas %>%
  mutate(Gas = if_else(row_number() == end_idx, Gas + 300, Gas))

decomp_end <- gas_out_end %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

comp_end <- components(decomp_end)

gas_sa_end <- comp_end %>%
  as_tsibble() %>%
  mutate(Gas_SA = Gas / seasonal)

p_sa_end <- gas_sa_end %>%
  autoplot(Gas_SA) +
  labs(title = "Seasonally adjusted (outlier near end)", y = "Gas (SA)")

p_sa_base / p_sa_end

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

# Plot the raw data
myseries  %>%
  autoplot(Turnover) +
  labs(
    title = "Retail turnover (selected series)",
    y = "Turnover"
  )

# choose a transformation before x-11
lambda_retail <- myseries %>%
  features(Turnover, guerrero) %>%
  pull(lambda_guerrero)

lambda_retail
## [1] 0.4348939
# apply the transformation and keep only valid values for Box-Cox
myseries_tr <- myseries %>%
  filter(!is.na(Turnover), Turnover > 0) %>%
  mutate(Turnover_tr = box_cox(Turnover, lambda_retail))

# fit an x-11 decomposition
fit_x11 <- myseries_tr %>%
  model(
    x11 = X_13ARIMA_SEATS(Turnover_tr ~ x11())
  )

fit_x11
# plot component 
components(fit_x11) %>%
  autoplot() +
  labs(title = "X-11 decomposition of retail turnover")

# extract the decomposition output and find the irregular column.
retail_comp <- components(fit_x11)

names(retail_comp)
## [1] "State"         "Industry"      ".model"        "Month"        
## [5] "Turnover_tr"   "trend"         "seasonal"      "irregular"    
## [9] "season_adjust"
# find the largest irregular values to flag potential outliers
comp_tbl <- retail_comp %>% as_tibble()

irr_col <- names(comp_tbl)[stringr::str_detect(names(comp_tbl), "remainder|irregular")][1]

irr_col
## [1] "irregular"
outliers <- retail_comp %>%
  as_tibble() %>%
  mutate(irregular = .data[[irr_col]]) %>%
  arrange(desc(abs(irregular))) %>%
  select(Month, Turnover_tr, irregular) %>%
  slice(1:10)

outliers
# mark the outliers on the time plot
top_dates <- outliers$Month

myseries_tr %>%
  mutate(is_outlier = Month %in% top_dates) %>%
  ggplot(aes(x = Month, y = Turnover_tr)) +
  geom_line() +
  geom_point(data = ~ dplyr::filter(.x, is_outlier), size = 2) +
  labs(
    title = "Retail series with top irregular points highlighted",
    y = "Transformed Turnover"
  ) +
  theme_minimal()

I decomposed my selected aus_retail series using an X-11 decomposition (via X-13ARIMA-SEATS). The trend component captures the long-run movement in retail turnover, while the seasonal component shows a stable monthly pattern. The decomposition highlights potential outliers through spikes in the irregular (remainder) component. Compared with my earlier visual exploration, the X-11 results made months like 1989 Dec, 1998 Oct, and 2005 Jan stand out as unusually high/low relative to the expected trend and seasonality, suggesting temporary shocks (e.g., policy changes, one-off events, or data anomalies) that were not as obvious in the raw plot.

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the Civilian labour force in Australian each month from February 1978 to August 1995.

  1. Write about 3-5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The observed labor force series shows a strong upward movement from the late 1970s to 1995, and this is captured almost entirely by the trend component. The seasonal component is relatively small by comparison. It’s swings are only around ±100, which is tiny relative to an overall level near 7,000-9,000. The remainder is usually close to zero, but there is a very large negative spike around the early 1990s, indicating an unusual shock that is not explained by the trend or seasonality. The seasonal subseries plot confirms a stable within-year pattern. Some months are consistently above average while others are consistently below average.

  1. Is the recession of 1991/1992 visible in the estimated components?

Yes—partly. You can see a noticeable slowing/flattening in the trend around 1991–1992 compared with the faster growth before and after, and the remainder shows an unusually large negative movement in the early 1990s, consistent with an economic shock during the recession period. The seasonal pattern itself stays fairly stable, so the recession signal shows up more in the trend and especially the remainder, not in seasonality.