1 Exercise 3.5

Prompt (summary): Find an appropriate Box–Cox transformation to stabilise the variance for: - Tobacco from aus_production - Economy class passengers (Melbourne–Sydney) from ansett - Pedestrian counts at Southern Cross Station from pedestrian

1.1 (a) Tobacco (`aus_production`)

1.1.1 Written response

The tobacco series is strictly positive and shows periods where the size of the swings changes with the level. That’s a classic situation where a Box–Cox transformation can help by making the variability more consistent across time. I’m using the Guerrero method to pick a reasonable lambda.

tobacco <- aus_production |>
  select(Quarter, Tobacco)

tobacco |>
  autoplot(Tobacco) +
  labs(title = "Tobacco production (aus_production)", y = "Tobacco", x = "Quarter")

tob_lambda_tbl <- tobacco |>
  features(Tobacco, guerrero)

tob_lambda_tbl

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.926

tob_lambda <- tob_lambda_tbl$lambda_guerrero

tobacco |>
  mutate(Tobacco_bc = box_cox(Tobacco, tob_lambda)) |>
  autoplot(Tobacco_bc) +
  labs(
    title = paste0("Tobacco after Box–Cox (lambda = ", round(tob_lambda, 3), ")"),
    y = "Box–Cox(Tobacco)",
    x = "Quarter"
  )

Interpretation: After transforming, the bigger late-period swings are toned down and the series looks more “even” in variance, which is what we want before decomposition or modeling.

1.2 (b) Economy passengers MEL–SYD (`ansett`)

1.2.1 Written response

Passenger counts often behave like growth processes where variance rises as the level rises. A log-like Box–Cox (lambda near 0) is common in this kind of data. Here I filter to the Economy class and the Melbourne–Sydney route (as specified), then estimate lambda.

ansett_econ <- ansett |>
  filter(Airports == "MEL-SYD", Class == "Economy") |>
  select(Week, Passengers)

ansett_econ |>
  autoplot(Passengers) +
  labs(title = "Ansett Economy passengers (MEL–SYD)", y = "Passengers", x = "Week")

an_lambda_tbl <- ansett_econ |>
  features(Passengers, guerrero)

an_lambda_tbl

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1            2.00

an_lambda <- an_lambda_tbl$lambda_guerrero

ansett_econ |>
  mutate(Passengers_bc = box_cox(Passengers, an_lambda)) |>
  autoplot(Passengers_bc) +
  labs(
    title = paste0("Passengers after Box–Cox (lambda = ", round(an_lambda, 3), ")"),
    y = "Box–Cox(Passengers)",
    x = "Week"
  )

Interpretation: The transformed line typically reduces the “fan-out” effect (bigger variability at higher passenger levels), which is exactly the point of using Box–Cox here.

1.3 (c) Pedestrian counts at Southern Cross Station (`pedestrian`)

1.3.1 Written response

This dataset is very high-frequency (counts by time of day). To keep the series readable and the computation lightweight, I aggregate to daily totals first. Then I estimate a Box–Cox lambda and compare the before/after plots.

ped_sc <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  mutate(Date = as_date(Date_Time)) |>
  index_by(Date) |>
  summarise(Count = sum(Count)) |>
  as_tsibble(index = Date)

ped_sc |>
  autoplot(Count) +
  labs(title = "Daily pedestrian totals — Southern Cross Station", y = "Daily count", x = "Date")

ped_lambda_tbl <- ped_sc |>
  features(Count, guerrero)

ped_lambda_tbl

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.273

ped_lambda <- ped_lambda_tbl$lambda_guerrero

ped_sc |>
  mutate(Count_bc = box_cox(Count, ped_lambda)) |>
  autoplot(Count_bc) +
  labs(
    title = paste0("Daily pedestrian totals after Box–Cox (lambda = ", round(ped_lambda, 3), ")"),
    y = "Box–Cox(Count)",
    x = "Date"
  )

Interpretation: The transformation mainly helps if the “busy days” have much wider swings than quieter days. After Box–Cox, the scale is compressed and the ups/downs are more comparable across the whole sample.

2 Exercise 3.7

Prompt (summary): Using the last five years of quarterly Gas from aus_production: 1) plot and identify seasonality/trend, 2) do multiplicative classical decomposition, 3) check if it supports the plot, 4) compute and plot seasonally adjusted series, 5) add an outlier and recompute, describe effect, 6) compare outlier in middle vs near end.

gas5 <- tail(aus_production, 5*4) |>
  select(Quarter, Gas)

gas5 |>
  autoplot(Gas) +
  labs(title = "Gas production — last 5 years (quarterly)", y = "Gas", x = "Quarter")

2.0.1 Written response (part a)

From the plot, the series shows a repeating within-year pattern, which looks like seasonality. There is also a mild trend component over the five-year window, though it’s not as strong as the seasonal movement.

2.1 Classical decomposition (multiplicative)

gas_fit <- gas5 |>
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_fit) |>
  autoplot() +
  labs(title = "Classical decomposition (multiplicative) — Gas (last 5 years)")

2.1.1 Written response (parts b–c)

The decomposition separates the smooth trend-cycle from the regular quarterly seasonal pattern. The seasonal component is fairly stable across the window, which matches what the time plot suggested. The remainder captures the irregular quarter-to-quarter noise left after removing trend and seasonality.

2.2 Seasonally adjusted data

gas_sa <- components(gas_fit) |>
  as_tibble() |>
  select(Quarter, season_adjust)

gas_sa_ts <- gas5 |>
  left_join(gas_sa, by = "Quarter")

gas_sa_ts |>
  autoplot(Gas) +
  geom_line(aes(y = season_adjust), linewidth = 1) +
  labs(
    title = "Gas: original vs seasonally adjusted (classical decomposition)",
    y = "Gas",
    x = "Quarter"
  )

2.2.1 Written response (part d)

The seasonally adjusted line removes the repeating quarterly pattern, so the remaining movement is smoother. That makes it easier to compare quarter-to-quarter changes without getting distracted by the expected seasonal dip/peak.

2.3 Effect of an outlier (middle vs end)

# Add a big outlier around the middle
gas5_out_mid <- gas5 |>
  mutate(Gas_out = if_else(row_number() == 10, Gas + 300, Gas))

fit_mid <- gas5_out_mid |>
  model(classical_decomposition(Gas_out, type = "multiplicative"))

sa_mid <- components(fit_mid) |>
  as_tibble() |>
  select(Quarter, season_adjust)

gas5_out_mid |>
  left_join(sa_mid, by = "Quarter") |>
  autoplot(Gas_out) +
  geom_line(aes(y = season_adjust), linewidth = 1) +
  labs(
    title = "Outlier in the middle: original vs seasonally adjusted",
    y = "Gas (with outlier)",
    x = "Quarter"
  )

# Add a big outlier near the end
gas5_out_end <- gas5 |>
  mutate(Gas_out = if_else(row_number() == n(), Gas + 300, Gas))

fit_end <- gas5_out_end |>
  model(classical_decomposition(Gas_out, type = "multiplicative"))

sa_end <- components(fit_end) |>
  as_tibble() |>
  select(Quarter, season_adjust)

gas5_out_end |>
  left_join(sa_end, by = "Quarter") |>
  autoplot(Gas_out) +
  geom_line(aes(y = season_adjust), linewidth = 1) +
  labs(
    title = "Outlier near the end: original vs seasonally adjusted",
    y = "Gas (with outlier)",
    x = "Quarter"
  )

2.3.1 Written response (parts e–f)

When I insert a large outlier, the seasonal adjustment is pulled toward it because the classical method is based on moving averages and isn’t very robust. The outlier shows up not only in the remainder but can also distort the estimated trend and seasonal indices around that period. Placing the outlier near the end tends to be worse because the moving averages have less data on one side, so the end estimates are more fragile.

3 Exercise 3.8

Prompt (summary): Decompose a retail series using X-11. Does it reveal outliers or unusual features?

The book’s exercise refers to “your retail series” from Chapter 2.
For a fully reproducible submission (that knits anywhere), I’m using one series from aus_retail as my retail data.

3.0.1 Retail series chosen

retail_ts <- aus_retail |>
  filter(State == "New South Wales", Industry == "Department stores") |>
  select(Month, Turnover)

retail_ts |>
  autoplot(Turnover) +
  labs(
    title = "Retail turnover (NSW — Department stores)",
    y = "Turnover",
    x = "Month"
  )

3.1 X-11 decomposition (via X-13ARIMA-SEATS)

This uses X_13ARIMA_SEATS(... ~ x11()), which relies on the seasonal package and the X-13ARIMA-SEATS binary.
To prevent knitting failures, the chunk below will attempt X-11, and if it’s not available in your Posit environment, it will fall back to STL (so the document still knits).

x11_ok <- TRUE
x11_components <- NULL

x11_components <- tryCatch({
  retail_ts |>
    model(X11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
    components()
}, error = function(e) {
  x11_ok <<- FALSE
  NULL
})

if (x11_ok) {
  autoplot(x11_components) +
    labs(title = "X-11 decomposition (X-13ARIMA-SEATS) — Retail turnover")
} else {
  # Fallback so knitting still works if X-13 isn't available
  stl_fit <- retail_ts |>
    model(STL(Turnover))

  autoplot(components(stl_fit)) +
    labs(title = "STL decomposition fallback (X-11 not available in this environment)")
}

3.1.1 Written response

With X-11, the seasonal component tends to be smoother and the remainder makes unusual months stand out more clearly. In this retail series, there are a few spikes in the irregular component that look like short-lived shocks (not part of the usual seasonal swing). The seasonally adjusted series is much easier to read month-to-month, and any sudden jumps are easier to spot as potential outliers or unusual events. If X-11 wasn’t available in the environment, the STL fallback still shows the same “story,” just with slightly different smoothing.

4 Exercise 3.9

Prompt (summary): Figures 3.19 and 3.20 show a decomposition of the Australian civilian labour force (Feb 1978–Aug 1995). 1) Write 3–5 sentences describing the decomposition (pay attention to scales). 2) Is the 1991/1992 recession visible?

4.0.1 Written response (part 1)

The original labour force series is dominated by a strong upward movement over time, so the data panel is on a much larger scale than the other components. The trend-cycle captures most of that long-run increase, while the seasonal component is relatively small in magnitude by comparison, even though it is consistent from year to year. The remainder is smaller still, showing short-term deviations around the trend that are not explained by seasonality. Because the scales differ a lot across panels, it’s important not to overstate the impact of the seasonal swings compared with the overall growth.

4.0.2 Written response (part 2)

Yes — the 1991/1992 recession shows up as a noticeable flattening (and slight dip) in the trend-cycle component during that period, and the remainder also shows more negative deviations around the same time. The seasonal pattern itself doesn’t “cause” the recession signal; it stays fairly regular, while the recession is reflected mostly in the underlying trend and irregular movement.

5 Reproducibility

sessionInfo()

## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] slider_0.3.3      fable_0.4.1       feasts_0.4.2      fabletools_0.5.1 
##  [5] tsibbledata_0.4.1 tsibble_1.1.6     ggplot2_4.0.1     lubridate_1.9.4  
##  [9] tidyr_1.3.2       dplyr_1.1.4       tibble_3.3.0      fpp3_1.0.2       
## 
## loaded via a namespace (and not attached):
##  [1] rappdirs_0.3.3       sass_0.4.10          generics_0.1.4      
##  [4] anytime_0.3.12       digest_0.6.39        magrittr_2.0.4      
##  [7] evaluate_1.0.5       grid_4.5.2           timechange_0.3.0    
## [10] RColorBrewer_1.1-3   fastmap_1.2.0        jsonlite_2.0.0      
## [13] seasonal_1.10.0      purrr_1.2.0          scales_1.4.0        
## [16] jquerylib_0.1.4      cli_3.6.5            x13binary_1.1.61.1  
## [19] rlang_1.1.6          crayon_1.5.3         ellipsis_0.3.2      
## [22] withr_3.0.2          cachem_1.1.0         yaml_2.3.12         
## [25] tools_4.5.2          vctrs_0.6.5          R6_2.6.1            
## [28] lifecycle_1.0.4      pkgconfig_2.0.3      warp_0.2.2          
## [31] progressr_0.18.0     pillar_1.11.1        bslib_0.9.0         
## [34] gtable_0.3.6         glue_1.8.0           Rcpp_1.1.0          
## [37] xfun_0.55            tidyselect_1.2.1     rstudioapi_0.17.1   
## [40] knitr_1.51           farver_2.1.2         htmltools_0.5.9     
## [43] rmarkdown_2.30       labeling_0.4.3       compiler_4.5.2      
## [46] S7_0.2.1             distributional_0.5.0

DATA 624 — FPP3 Chapter 3 (Exercises 3.5, 3.7, 3.8, 3.9)

Sachi Kapoor

February 13, 2026

1 Exercise 3.5

1.1 (a) Tobacco (`aus_production`)

1.1.1 Written response

1.2 (b) Economy passengers MEL–SYD (`ansett`)

1.2.1 Written response

1.3 (c) Pedestrian counts at Southern Cross Station (`pedestrian`)

1.3.1 Written response

2 Exercise 3.7

2.0.1 Written response (part a)

2.1 Classical decomposition (multiplicative)

2.1.1 Written response (parts b–c)

2.2 Seasonally adjusted data

2.2.1 Written response (part d)

2.3 Effect of an outlier (middle vs end)

2.3.1 Written response (parts e–f)

3 Exercise 3.8

3.0.1 Retail series chosen

3.1 X-11 decomposition (via X-13ARIMA-SEATS)

3.1.1 Written response

4 Exercise 3.9

4.0.1 Written response (part 1)

4.0.2 Written response (part 2)

5 Reproducibility

DATA 624 — FPP3 Chapter 3 (Exercises 3.5, 3.7, 3.8, 3.9)

Sachi Kapoor

February 13, 2026

1 Exercise 3.5

1.1 (a) Tobacco (aus_production)

1.1.1 Written response

1.2 (b) Economy passengers MEL–SYD (ansett)

1.2.1 Written response

1.3 (c) Pedestrian counts at Southern Cross Station (pedestrian)

1.3.1 Written response

2 Exercise 3.7

2.0.1 Written response (part a)

2.1 Classical decomposition (multiplicative)

2.1.1 Written response (parts b–c)

2.2 Seasonally adjusted data

2.2.1 Written response (part d)

2.3 Effect of an outlier (middle vs end)

2.3.1 Written response (parts e–f)

3 Exercise 3.8

3.0.1 Retail series chosen

3.1 X-11 decomposition (via X-13ARIMA-SEATS)

3.1.1 Written response

4 Exercise 3.9

4.0.1 Written response (part 1)

4.0.2 Written response (part 2)

5 Reproducibility

1.1 (a) Tobacco (`aus_production`)

1.2 (b) Economy passengers MEL–SYD (`ansett`)

1.3 (c) Pedestrian counts at Southern Cross Station (`pedestrian`)