DATA 624 Homework 2

Forecasting: Principles and Practice - Chapter 3 Exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy |>
  mutate(GDP_per_capita = GDP / Population) |>
  arrange(desc(GDP_per_capita)) |>
  select(Country, Year, GDP_per_capita) |>
  head(10)

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

## # A tsibble: 10 x 3 [1Y]
## # Key:       Country [2]
##    Country        Year GDP_per_capita
##    <fct>         <dbl>          <dbl>
##  1 Monaco         2014        185153.
##  2 Monaco         2008        180640.
##  3 Liechtenstein  2014        179308.
##  4 Liechtenstein  2013        173528.
##  5 Monaco         2013        172589.
##  6 Monaco         2016        168011.
##  7 Liechtenstein  2015        167591.
##  8 Monaco         2007        167125.
##  9 Liechtenstein  2016        164993.
## 10 Monaco         2015        163369.

global_economy |>
  filter(Country == "Monaco") |>
  mutate(GDP_per_capita = GDP / Population) |>
  autoplot(GDP_per_capita) +
  labs(
    title = "Monaco GDP per Capita Over Time",
    y = "GDP per capita"
  )

## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).

Monaco has the highest GDP per capita in the dataset, peaking in 2014 at approximately 185,153 US dollars. Liechtenstein is the only other country in the top ten, with values close to Monaco’s from 2013 to 2016. Monaco’s GDP per capita generally increases over the series, but not in every year: the 2008 value (about 180,640) is higher than the values for 2013, 2015 and 2016. The plot omits 11 years in which Monaco’s GDP per capita is missing.
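The question also asks about every country and about how the leader changes over time. A minimal sketch of one way to check that is below; the object names `gdp_pc` and `leaders` are my own, and the series is converted to a plain tibble first so that reordering rows does not trigger the temporal-ordering warning.

```r
library(fpp3)  # loads dplyr, ggplot2, tsibble and the global_economy data

# Plot every country at once; the legend is dropped because there are too many keys
global_economy |>
  mutate(GDP_per_capita = GDP / Population) |>
  autoplot(GDP_per_capita) +
  theme(legend.position = "none") +
  labs(title = "GDP per Capita by Country", y = "GDP per capita")

# Work on a plain tibble so rows can be regrouped freely
gdp_pc <- global_economy |>
  as_tibble() |>
  mutate(GDP_per_capita = GDP / Population) |>
  filter(!is.na(GDP_per_capita))

# Country with the highest GDP per capita in each year
leaders <- gdp_pc |>
  group_by(Year) |>
  slice_max(GDP_per_capita, n = 1, with_ties = FALSE) |>
  ungroup() |>
  select(Year, Country, GDP_per_capita)

# Number of years each country held the top position
leaders |> count(Country, sort = TRUE)
```

The `leaders` table has one row per year, so scanning its `Country` column shows when the top position changes hands.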

2. For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy |>
  filter(Country == "United States") |>
  autoplot(GDP) +
  labs(title = "United States GDP")

global_economy |>
  filter(Country == "United States") |>
  autoplot(log(GDP)) +
  labs(
    title = "Log Transformed United States GDP",
    y = "log(GDP)"
  )

The original GDP series shows strong exponential growth over time with increasing variation. Applying a log transformation makes the growth pattern more linear.
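The reason a log makes exponential growth look linear can be checked on a small made-up series. The values below are hypothetical (a start of 100 and 3% growth per period) and are only there to illustrate the arithmetic, not to describe US GDP.

```r
# Hypothetical series: starts at 100 and grows 3% per period (illustrative values only)
t <- 0:9
y <- 100 * 1.03^t

# On the original scale the step between periods keeps growing
diff(y)

# On the log scale every step is the same size, log(1.03), so the series is a straight line
diff(log(y))
```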

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock |>
  filter(
    State == "Victoria",
    Animal == "Bulls, bullocks and steers"
  ) |>
  autoplot(Count) +
  labs(
    title = "Victorian Bulls, Bullocks and Steers Slaughter",
    y = "Count"
  )

The series displays seasonal fluctuations and some long-term variation. A transformation is not strongly necessary because the variance remains relatively stable across time.

Victorian Electricity Demand from vic_elec.

vic_elec |>
  autoplot(Demand) +
  labs(
    title = "Victorian Electricity Demand",
    y = "Demand"
  )

vic_elec |>
  autoplot(log(Demand)) +
  labs(
    title = "Log Transformed Victorian Electricity Demand",
    y = "log(Demand)"
  )

The electricity demand series contains strong daily and seasonal patterns. The log transformation slightly stabilizes the variation, although the original series is already fairly stable.

Gas production from aus_production.

aus_production |>
  autoplot(Gas) +
  labs(
    title = "Australian Gas Production",
    y = "Gas"
  )

aus_production |>
  autoplot(log(Gas)) +
  labs(
    title = "Log Transformed Gas Production",
    y = "log(Gas)"
  )

The gas production series shows increasing seasonal variation as production rises over time. The log transformation stabilizes the variance and reduces the size of seasonal swings.
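A log is the special case of a Box-Cox transformation with lambda equal to 0. As a sketch of an alternative, the Guerrero feature can pick lambda for the gas series and `box_cox()` can apply it; the name `lambda_gas` is my own, and I have not recorded the value it returns here.

```r
library(fpp3)

# Let the Guerrero method choose the Box-Cox parameter for Gas
lambda_gas <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

# Plot the series transformed with that parameter
aus_production |>
  autoplot(box_cox(Gas, lambda_gas)) +
  labs(
    title = "Box-Cox Transformed Gas Production",
    y = "Transformed gas"
  )
```

If the selected lambda is close to 0, this plot should look much like the log-transformed plot above.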

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

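A Box-Cox transformation is unhelpful for the canadian_gas data because the seasonal variation does not change in step with the level of the series. Box-Cox transformations stabilize variance only when the variance grows (or shrinks) steadily as the level rises. In canadian_gas the seasonal swings are small in the early years, larger in the middle of the series, and smaller again later, even though the level keeps increasing. No single monotonic transformation can even out that pattern.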
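One rough way to check this is to compare the yearly level of the series with its within-year spread. The sketch below assumes the `Month` and `Volume` columns of canadian_gas; the summary names `level` and `spread` are my own.

```r
library(fpp3)

# Yearly mean (level) and within-year standard deviation (spread)
yearly <- canadian_gas |>
  as_tibble() |>
  mutate(Year = year(Month)) |>
  group_by(Year) |>
  summarise(level = mean(Volume), spread = sd(Volume))

# If a Box-Cox transformation could help, spread would rise steadily with level
ggplot(yearly, aes(x = level, y = spread)) +
  geom_path() +
  geom_point() +
  labs(
    title = "canadian_gas: Within-Year Spread vs Yearly Level",
    x = "Yearly mean volume",
    y = "Within-year standard deviation"
  )
```

A path that rises and then falls back as the level keeps increasing would indicate that no power transformation can make the spread constant.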

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

The Box-Cox transformation selected for the retail series was based on the Guerrero method. It stabilizes the variance across seasonal periods. This is useful for retail data because variability often increases as turnover increases over time, and the transformation makes the series easier to analyze and forecast.
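A sketch of how that selection could be reproduced is below. It reuses the seed and sampling code from the X-11 exercise later in this homework, so it assumes the same retail series; `lambda_retail` is my own name and its value is not recorded here.

```r
library(fpp3)

# Same series selection as in the X-11 exercise
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Guerrero estimate of the Box-Cox parameter for Turnover
lambda_retail <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

lambda_retail

# Transformed series, to check that the seasonal swings are of similar size throughout
myseries |>
  autoplot(box_cox(Turnover, lambda_retail)) +
  labs(
    title = "Box-Cox Transformed Retail Turnover",
    y = "Transformed turnover"
  )
```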

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

# Tobacco
aus_production |>
  features(Tobacco, guerrero)

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.926

# Economy passengers
ansett |>
  filter(Airports == "MEL-SYD",
         Class == "Economy") |>
  features(Passengers, guerrero)

## # A tibble: 1 × 3
##   Airports Class   lambda_guerrero
##   <chr>    <chr>             <dbl>
## 1 MEL-SYD  Economy            2.00

# Southern Cross pedestrians
pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  features(Count, guerrero)

## # A tibble: 1 × 2
##   Sensor                 lambda_guerrero
##   <chr>                            <dbl>
## 1 Southern Cross Station          -0.250

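I would use the Guerrero method to select the Box-Cox transformation for each series because it finds a lambda value that stabilizes variance across seasonal periods. If the lambda value is close to 0, I would use a log transformation. If it is close to 1, little or no transformation is needed.

For Tobacco, lambda is about 0.93, which is close to 1, so the series needs little or no transformation. For Melbourne-Sydney economy passengers, lambda is about 2.00, which is the upper limit of the range that guerrero searches by default, so the estimate should be treated with caution rather than taken as a precise value. For Southern Cross Station pedestrian counts, lambda is -0.25, which is below 0 and therefore a somewhat stronger transformation than a log.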
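To make the role of lambda concrete, here is a minimal base-R sketch of the Box-Cox formula. The function name `box_cox_manual` and the test vector are my own; in practice `box_cox()` from fpp3 does this job.

```r
# Box-Cox transformation: log(y) when lambda is 0,
# otherwise (sign(y) * |y|^lambda - 1) / lambda
box_cox_manual <- function(y, lambda) {
  if (abs(lambda) < 1e-8) {
    log(y)
  } else {
    (sign(y) * abs(y)^lambda - 1) / lambda
  }
}

y <- c(1, 10, 100, 1000)  # illustrative values only

box_cox_manual(y, 0)     # identical to log(y)
box_cox_manual(y, 1)     # y - 1: the shape of the series is unchanged
box_cox_manual(y, 0.5)   # between the two: large values are compressed, but less than by a log
```

With lambda equal to 1 the series is only shifted down by 1, which is why a value near 1 means no transformation is needed.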

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas <- tail(aus_production, 5*4) |>
  select(Gas)

gas |>
  autoplot(Gas) +
  labs(
    title = "Australian Gas Production: Last Five Years",
    y = "Gas production"
  )

The plot shows clear seasonal fluctuations, with gas production rising and falling in a repeating quarterly pattern. There also appears to be a slight trend-cycle.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

dcmp <- gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

dcmp |>
  autoplot()

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).

Do the results support the graphical interpretation from part a?

Yes, the decomposition supports the graphical interpretation. The seasonal component shows a repeating quarterly pattern, while the trend-cycle component captures the smoother long-term movement in the series.
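The seasonal indices themselves can be read out of the components table rather than off the plot. This is a sketch that rebuilds `gas` and `dcmp` so it runs on its own; `seasonal_indices` and `Qtr` are my own names.

```r
library(fpp3)

gas <- tail(aus_production, 5 * 4) |>
  select(Gas)

dcmp <- gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

# One seasonal index per quarter (classical decomposition repeats the same four values every year)
seasonal_indices <- dcmp |>
  as_tibble() |>
  mutate(Qtr = quarters(as.Date(Quarter))) |>
  distinct(Qtr, seasonal) |>
  arrange(Qtr)

seasonal_indices
```

In a multiplicative decomposition an index above 1 marks a quarter that runs above the trend-cycle, and an index below 1 marks a quarter that runs below it.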

Compute and plot the seasonally adjusted data.

# Part d: Seasonally adjusted data
dcmp |>
  autoplot(season_adjust) +
  labs(
    title = "Seasonally Adjusted Gas Production",
    y = "Seasonally adjusted gas"
  )

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_outlier_middle <- gas |>
  mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))

dcmp_outlier_middle <- gas_outlier_middle |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

dcmp_outlier_middle |>
  autoplot(season_adjust) +
  labs(
    title = "Seasonally Adjusted Gas with Middle Outlier",
    y = "Seasonally adjusted gas"
  )

The outlier creates a large spike in the seasonally adjusted data. It can also distort the estimated trend-cycle and seasonal indices because classical decomposition is sensitive to extreme values.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

# Part f: Add an outlier near the end
gas_outlier_end <- gas |>
  mutate(Gas = if_else(row_number() == n(), Gas + 300, Gas))

dcmp_outlier_end <- gas_outlier_end |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

dcmp_outlier_end |>
  autoplot(season_adjust) +
  labs(
    title = "Seasonally Adjusted Gas with End Outlier",
    y = "Seasonally adjusted gas"
  )

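Yes, the position of the outlier matters. Classical decomposition estimates the trend-cycle with a centred 2x4 moving average, so an outlier in the middle of the series raises the trend-cycle at the five quarters around it. It also inflates the seasonal index for its quarter, which pushes the seasonally adjusted values for that quarter in the other years down. The trend-cycle is not estimated for the first two and last two quarters, so an outlier at the final observation touches only one trend-cycle estimate and is left out of the seasonal index calculation. It therefore shows up as a single spike at the end of the seasonally adjusted series and leaves the rest of the series almost unchanged.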
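The difference in reach can be seen directly from the moving-average weights. The sketch below uses a hypothetical flat series of 20 quarters (not the gas data) and base R's `stats::filter`; the helper names `ma_2x4` and `changed` are my own.

```r
# Weights of the centred 2x4 moving average used for quarterly trend-cycles
w <- c(1, 2, 2, 2, 1) / 8
ma_2x4 <- function(x) as.numeric(stats::filter(x, filter = w, sides = 2))

flat <- rep(100, 20)            # hypothetical flat series, 5 years of quarters
mid  <- replace(flat, 10, 400)  # add 300 to observation 10
end  <- replace(flat, 20, 400)  # add 300 to the last observation

# Positions where the trend-cycle estimate differs from the flat series
changed <- function(x) which(abs(ma_2x4(x) - ma_2x4(flat)) > 1e-9)

changed(mid)  # 8 9 10 11 12
changed(end)  # 18

# The end outlier enters that single estimate with weight 1/8
ma_2x4(end)[18] - ma_2x4(flat)[18]  # 37.5
```

The trend-cycle is `NA` at positions 1, 2, 19 and 20, which is why the last observation can only influence the estimate at position 18.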

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(seasonal)
library(x13binary)
set.seed(12345678)

myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# X-11 decomposition
myseries |>
  model(
    x11 = X_13ARIMA_SEATS(Turnover ~ x11())
  ) |>
  components() |>
  autoplot()

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The decomposition shows a strong upward trend in the Australian civilian labour force from 1978 to 1995. The trend component increases steadily over time, indicating long-term growth in the number of people participating in the labour force. The seasonal component is small compared to the overall level of the series, with fluctuations generally between about -100 and 100, while the total series ranges from roughly 6500 to 9000, so the seasonal swings amount to only about 1 to 1.5 percent of the level. Because each panel is drawn on its own scale, the seasonal panel looks more prominent than it is: seasonality exists but has a much smaller effect than the long-term trend.

Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession of 1991/1992 is visible in the decomposition, particularly in the remainder component and the trend component. Around 1991–1992 there are several unusually large negative residuals, and the upward trend temporarily slows during this period.

Chanice Mckenzie