library(fpp3)
library(seasonal)

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

We can see that there’s a general upwards trend for the GDP data. The country that has the highest GDP per capita is Monaco, and it has been steadily increasing over time.

#Plotting GDP per capita for each country over time
global_economy |> 
  autoplot(GDP/Population, show.legend=FALSE, na.rm=TRUE) +
  labs(
    title = "GDP per capita for countries over the world", x = "Year", y = "GDP per capita ($USD)"
  )
## Warning: `autoplot.tbl_ts()` was deprecated in fabletools 0.6.0.
## ℹ Please use `ggtime::autoplot.tbl_ts()` instead.
## ℹ Graphics functions have been moved to the {ggtime} package. Please use
##   `library(ggtime)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

#Which country has the highest GDP per capita - Monaco
gdp_per_capita <- global_economy |>
  mutate(gdp_per_capita = GDP/Population)

max_gdp_per_capita <- gdp_per_capita[which.max(gdp_per_capita$gdp_per_capita), ]

print(max_gdp_per_capita)
## # A tsibble: 1 x 10 [1Y]
## # Key:       Country [1]
##   Country Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>   <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Monaco  MCO    2014 7060236168.   7.18    NA      NA      NA      38132
## # ℹ 1 more variable: gdp_per_capita <dbl>
#Showing how Monaco's GDP per capita has changed over time
gdp_per_capita |>
  filter(Country == "Monaco") |>
  autoplot(gdp_per_capita, na.rm=TRUE) +
  labs(
    title = "GDP per capita for Monaco", x = "Year", y = "GDP per capita ($USD)")

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.

Based on the readings, this looked like it could use a square root transformation, but we know that GDP has grown exponentially, leading me to think I should use log. Ultimately, I calculated the lambda with Box Cox, and got 0.282, which is between 0.0 and 0.5 for the log and sqrt transformations. Because the lambda value is in the middle, I decided to do the Box Cox transformation with the lambda I got and it made the trend more linear and the distribution more uniform.

#United States GDP from global_economy

usa_economy <- global_economy |>
  filter(Country == "United States")

#Initial plots
usa_economy |>
  autoplot(GDP) +
  labs(title = "United States GDP")

ggplot(usa_economy, aes(x = GDP)) + geom_histogram()

#Calculating optimal lambda with guerrero
lambda <- usa_economy |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

print(lambda)
## [1] 0.2819443
#Apply Box Cox transformation
usa_economy <- usa_economy |>
  mutate(boxcox_gdp = box_cox(GDP, lambda))

#Transformed plots
usa_economy |>
  autoplot(boxcox_gdp) +
  labs(title = "Box Cox transformed GDP")

ggplot(usa_economy, aes(x = boxcox_gdp)) + geom_histogram()

Lambda was close to 0, so I performed a log transformation. Doing log transform slightly stabilized the variance, but the effect was relatively minor.

#Slaughter of Victorian Bulls, bullocks and steers in aus_livestock
victoria_livestock <- aus_livestock |> 
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers")

#Initial plot
victoria_livestock |>
  autoplot(Count) +
  labs(title = "Slaughter of Victorian Bulls, bullocks and steers")

#Using log
victoria_livestock <- victoria_livestock |>
  mutate(log_count = log(Count))

victoria_livestock |>
  autoplot(log_count) +
  labs(title = "Log Transformed Slaughter of Victorian Bulls, bullocks and steers")

I got a lambda close to 0, indicating that I need a log transformation. After performing a log transformation on Victorian Electricity Demand, it seems to have stabilized the variance.

#Victorian Electricity Demand from vic_elec.

#Initial plot
vic_elec |>
  autoplot(Demand) + labs(title = "Victorian Electricity Demand from vic_elec")

#Apply log transformation
vic_elec <- vic_elec |>
  mutate(log_demand = log(Demand))

vic_elec |>
  autoplot(log_demand) + labs(title = "Log transformed Victorian Electricity Demand from vic_elec")

Seasonality and variance became more consistent (same size), even if the trend became more curved. This essentially followed the book’s example.

#Gas production from aus_production.
aus_production |> autoplot(Gas) + labs(title = "Australia Gas production")

#Calculating optimal lambda with guerrero
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

print(lambda)
## [1] 0.1095171
#Apply Box Cox transformation
aus_production <- aus_production |>
  mutate(boxcox_gas = box_cox(Gas, lambda))

#Transformed plot
aus_production |>
  autoplot(boxcox_gas) +
  labs(title = "Box Cox transformed Australia Gas Production")

# 3.3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

A Box-Cox transformation is unhelpful for the canadian_gas data because the seasonal variance remains the same both before and after the transformation.

canadian_gas |> autoplot(Volume) + labs(title = "Canada Gas")

#Calculating optimal lambda with guerrero
lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

print(lambda)
## [1] 0.5767648
#Apply Box Cox transformation
canadian_gas <- canadian_gas |>
  mutate(boxcox_volume = box_cox(Volume, lambda))

#Transformed plot
canadian_gas |>
  autoplot(boxcox_volume) +
  labs(title = "Box Cox transformed Canadian Gas")

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Again, the Lambda from the Box Cox test here was so low that is it essentially a log transformation. But I ended up using a Box Cox transformation and this worked to stabilize the variance.

set.seed(33)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

#Initial plot
myseries |> 
  autoplot(Turnover) + labs(title = "Australian Retail Trade Turnover")

#Calculating optimal lambda with guerrero
lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

print(lambda)
## [1] 0.07636928
#Apply Box Cox transformation
myseries <- myseries |>
  mutate(boxcox_turnover = box_cox(Turnover, lambda))

#Transformed plot
myseries |>
  autoplot(boxcox_turnover) +
  labs(title = "Box Cox transformed Australian Retail Trade Turnover")

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

The lambda for Australian tobacco production was almost 1, meaning that it requires almost no transformation. Very little change.

aus_production |> autoplot(Tobacco) + labs(title = "Australian Tobacco Production")

#Calculating optimal lambda with guerrero
lambda <- aus_production |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

print(lambda)
## [1] 0.9264636
#Apply Box Cox transformation
aus_production <- aus_production |>
  mutate(boxcox_tobacco = box_cox(Tobacco, lambda))

#Transformed plot
aus_production |>
  autoplot(boxcox_tobacco) +
  labs(title = "Box Cox transformed Australian Tobacco Production")

mel_syd <- ansett |>
  filter(Class == "Economy", Airports == "MEL-SYD")

mel_syd |> autoplot(Passengers) + labs(title = "Economy class passengers between Melbourne and Sydney")

#Calculating optimal lambda with guerrero
lambda <- mel_syd |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

print(lambda)
## [1] 1.999927
#Apply Box Cox transformation
mel_syd <- mel_syd |>
  mutate(boxcox_passengers = box_cox(Passengers, lambda))

#Transformed plot
mel_syd |>
  autoplot(boxcox_passengers) +
  labs(title = "Box Cox transformed Economy class passengers between Melbourne and Sydney")

southern <- pedestrian |>
  filter(Sensor == "Southern Cross Station")

southern |> autoplot(Count) + labs(title = "Pedestrian counts at Southern Cross Station")

#Calculating optimal lambda with guerrero
lambda <- southern |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

print(lambda)
## [1] -0.2501616
#Apply Box Cox transformation
southern <- southern |>
  mutate(boxcox_count = box_cox(Count, lambda))

#Transformed plot
southern |>
  autoplot(boxcox_count) +
  labs(title = "Box Cox transformed Pedestrian counts at Southern Cross Station")

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas) Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle? Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series

There is a seasonal variability and a slight upwards trend. The variations are between quarters, with the biggest dips being in Q1, and the peaks being Q2.

gas <- tail(aus_production, 5*4) |> select(Gas)
gas |> autoplot(Gas)

gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  autoplot() +
  labs(title = "Multiplicative Decomposition of total Australian Gas Production")

Yes, I believe the decomposition results do support the initial observation that there is a slight upwards trend and seasonal variability.

seasonal_gas <- gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |> components()

seasonal_gas |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(title = "Gas Production") + scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

#Beginning outlier
abnormal_gas <- gas
abnormal_gas$Gas[1] <- abnormal_gas$Gas[1] + 300

abnormal_seasonal_gas <- abnormal_gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |> components()

abnormal_seasonal_gas |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(title = "Gas Production") + scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

abnormal_seasonal_gas |> autoplot()

#Middle outlier
abnormal_gas <- gas
abnormal_gas$Gas[11] <- abnormal_gas$Gas[11] + 300

abnormal_seasonal_gas <- abnormal_gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |> components()

abnormal_seasonal_gas |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(title = "Gas Production") + scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

abnormal_seasonal_gas |> autoplot()

#Outlier in the end
abnormal_gas <- gas
abnormal_gas$Gas[20] <- abnormal_gas$Gas[20] + 300

abnormal_seasonal_gas <- abnormal_gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |> components()

abnormal_seasonal_gas |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(title = "Gas Production") + scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

abnormal_seasonal_gas |> autoplot()

The outlier being in the beginning and in the end doesn’t seem to have an impact on the seasonal pattern, but it does impact the trend. On the other hand, placing the outlier in the middle impacts both the seasonal and the trend patterns. This is because of how moving averages work. Outliers at the beginning or end primarily affect the trend at those endpoints where the moving average lacks observations on both sides. Middle outliers affect both trend and seasonality, making them more damaging to the decomposition.

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

set.seed(123)

myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

x11_dcmp <- myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()


autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Australian retail turnover using X-11.")

X-11 does reveal some outliers that may be affecting the trend around 2000 and 2010, perhaps related to the dotcom bubble and the Great Recession. But the seasonal pattern has remained the same, despite the outliers.

3.9

(a) The decomposition reveals a strong upward trend in the civilian labor force. The seasonal chart shows a consistent annual pattern which has remained stable throughout - note the scale is much smaller at around 100 compared to the overall values (6500 - 9000). The remainder component shows relatively small random fluctuations around zero for most of the period, but displays a dramatic negative spike around 1991-1992, dropping to approximately -400. The second plot shows that the seasonal pattern peaks in December and dips around February/March.

(b) Yes, the recession is visible in the remainder component as a sharp negative spike dropping to -400. This represents a significant deviation from the expected trend and seasonal pattern, indicating the labor force was substantially lower than what would be predicted by the normal trend and seasonal patterns.