DATA 624 RB HW2

Exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.4.1
## ✔ lubridate   1.9.2     ✔ fable       0.4.1
## ✔ ggplot2     3.5.0

## Warning: package 'dplyr' was built under R version 4.3.3

## Warning: package 'ggplot2' was built under R version 4.3.3

## Warning: package 'tsibble' was built under R version 4.3.3

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(dplyr)
library(tsibble)

——————————————————————————–

(1) Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy %>%
  autoplot(GDP / Population, show.legend =  FALSE) +
  labs(title= "GDP per Capita", x="Year")

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

# The years range from 1960-2017
# unique(global_economy$Year)

# Country with highest GDP per Capita
global_economy %>%
  mutate(GDP_per_Cap = GDP/Population) %>%
  slice_max(GDP_per_Cap, n = 10) %>%
  distinct()

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

## # A tibble: 10 × 10
##    Country       Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>         <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Monaco        MCO    2014 7060236168.  7.18     NA      NA      NA      38132
##  2 Monaco        MCO    2008 6476490406.  0.732    NA      NA      NA      35853
##  3 Liechtenstein LIE    2014 6657170923. NA        NA      NA      NA      37127
##  4 Liechtenstein LIE    2013 6391735894. NA        NA      NA      NA      36834
##  5 Monaco        MCO    2013 6553372278.  9.57     NA      NA      NA      37971
##  6 Monaco        MCO    2016 6468252212.  3.21     NA      NA      NA      38499
##  7 Liechtenstein LIE    2015 6268391521. NA        NA      NA      NA      37403
##  8 Monaco        MCO    2007 5867916781. 14.4      NA      NA      NA      35111
##  9 Liechtenstein LIE    2016 6214633651. NA        NA      NA      NA      37666
## 10 Monaco        MCO    2015 6258178995.  4.94     NA      NA      NA      38307
## # ℹ 1 more variable: GDP_per_Cap <dbl>

# Plotting the 2 countries
global_economy %>%
  filter(Country == c("Monaco", "Liechtenstein")) %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per Capita", subtitle ="Monaco & Liechtenstein")

## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).

The country with the greatest GDP per Capita is Monaco in 2014/2008, followed closely by Liechtenstein in 2014/2013. The dataset ranges from the time-period of 1960 to 2017, and from looking at the trends for those 2 countries, which started in 1970, that Monaco has had a higher GDP/Capita for most of the time, with a brief time period from 2010-2013 where Liechtenstein was higher as Monaco experienced a drop.

——————————————————————————–

(2) For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

[United States GDP from global_economy.]

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) +
  labs(title="GDP", subtitle="United States")

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(sqrt(GDP)) + 
  ggtitle("GDP: Square-Root Transform", subtitle="United States")

The trend seems to be an obvious quadratic increase, with a drop in 2008/2009 due to the Housing Crisis Bubble, and the Square-Root transform seemed to straighten out the original plot a bit.

[Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.]

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot(Count) + 
  ggtitle("Slaughter of Victorian Bulls, Bullocks and Steers", subtitle="State of Victoria")

# Normalize with Log transform?
aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot(log(Count)) + 
  ggtitle("Slaughter of Victorian Bulls, Bullocks and Steers: Log-Transform", subtitle="State of Victoria")

# Trying a Box-Cox transform
lambda <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  features(Count,features = guerrero) %>%
  pull(lambda_guerrero) # -0.04461887

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot(box_cox(Count,lambda)) + 
  labs(title = paste("Slaughter of Victorian Bulls, Bullocks and Steers: Box-Cox-Transform, λ = ", round(lambda,3)), subtitle="State of Victoria")

The slaughter of Victorian bulls, bullocks, and steers is rather erratic with quite a bit of white noise, although there does appear be cyclical trends of increases and decreases. Performing a log transform didn’t help make it more interpretable. When I run a Box-Cox transform, I get a λ of -0.0444619, which means it is a Power Transform followed by scaling. A good value of λ is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler.

[Victorian Electricity Demand from vic_elec.]

vic_elec %>%
  autoplot(Demand) + 
  ggtitle("Half-Hour Electricity Demand for Australia", subtitle="State of Victoria")

vic_elec %>%
  autoplot(log(Demand)) + 
  ggtitle("Half-Hour Electricity Demand for Australia: Log Transform", subtitle="State of Victoria")

# Daily
vic_elec %>%
  index_by(Date) %>%
  summarise(Total_Demand = sum(Demand)) %>%
  autoplot(Total_Demand) + 
  ggtitle("Daily Electricity Demand for Australia", subtitle="State of Victoria")

# Weekly
vic_elec %>%
  mutate(Week = yearweek(Time)) %>%
  index_by(Week) %>%
  summarise(Total_Demand = sum(Demand)) %>%
  autoplot(Total_Demand) + 
  ggtitle("Weekly Electricity Demand for Australia", subtitle="State of Victoria")

The most effective transformation would be to change it from half-hourly to a different time period, in this case daily or weekly to remove much of the noise. However, there is still no effective transformation, and there is no immediate noticeable trend as electricity is always in high demand.

[Gas production from aus_production.]

aus_production %>% 
  autoplot(Gas) +
  labs(title = "Australian Gas Production")

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(title = paste("Australian Gas Production: Box-Cox-Transform, λ = ", round(lambda,3)))

The auto-plot immediately reveals that a Box-Cox transformation would be effective here, as it looks like the variation increases with the level of series. The λ value is 0.10951

——————————————————————————–

(3) Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>% 
  autoplot(Volume) +
  labs(title = "Canadian Gas Volume")

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(title = paste("Canadian Gas Volume: Box-Cox-Transform, λ = ", round(lambda,3)))

A Box-Cox transformation is unhelpful for this data because the variation doesn’t only increase or decrease with the level of the series. It starts off with small variations, then larger ones from 1970-1990, and drops back down to small fluctuations until 2010. Here, λ is a larger value (0.5767648).

——————————————————————————–

(4) What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(22397)

my_series <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

autoplot(my_series, Turnover) +
  labs(title = "Retail Turnover")

lambda <- my_series %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

my_series %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(title = paste("Retail Turnover: Box-Cox Transform, λ = ", round(lambda,3)))

My Box-Cox transformation would have a λ of-0.05317855 to help normalize the auto-plot

——————————————————————————–

(5) For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

Tobacco from aus_production

aus_production %>% 
  autoplot(Tobacco) +
  labs(title = "Tobacco Production")

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(title = paste("Tobacco Production: Box-Cox Transform, λ = ", round(lambda,3)))

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Economy class passengers between Melbourne and Sydney from ansett

ansett %>% 
  filter(Class == "Economy" & Airports == "MEL-SYD") %>% 
  autoplot(Passengers) +
  labs(title = "Economy Class Passengers", subtitle = "Melbourne and Sydney")

lambda <- ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  features(Passengers,features = guerrero) %>%
  pull(lambda_guerrero)

ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(title = paste("Economy Class Passengers: Box-Cox Transform, λ = ", round(lambda,3)), subtitle = "Melbourne and Sydney")

Pedestrian counts at Southern Cross Station from pedestrian

southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") 

southern_cross %>% 
  autoplot(Count) + 
  labs(title="Hourly Pedestrian Counts", subtitle = "Southern Cross Station")

lambda1 <- southern_cross %>%
  features(Count,features = guerrero) %>%
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count,lambda1)) + 
  labs(title = paste("Hourly Pedestrian Counts: Box-Cox Transform, λ = ", round(lambda1,2)), subtitle = "Southern Cross Station")

# Still un-interpretable, change the time scale
southern_cross <- southern_cross %>%
  mutate(Week = yearweek(Date)) %>%
  index_by(Week) %>%
  summarise(Count = sum(Count))

autoplot(southern_cross, Count)+
  labs(title="Weekly Pedestrian Counts", subtitle = "Southern Cross Station")

lambda2 <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count, lambda2)) +
  labs(title = paste("Weekly Pedestrian Counts: Box-Cox Transform, λ = ", round(lambda2,3)), subtitle = "Southern Cross Station")

——————————————————————————–

(7) Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>%
  select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>%
  autoplot(Gas) + 
  labs(title = "Quarterly Gas Production")

There is a clear trend showing positive growth, along with seasonal fluctuations that show a drop before rising again to a higher level than when the drop first started. The time-span of each of these is 1 year, as we can see the Gas volume in 2005Q3 was 221, then dropping to 171 in 2006Q1, and rising again to 233 in 2006Q3 for example.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_dcmp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

gas_dcmp %>% autoplot() +
  labs(title = "Quarterly Gas Production: Classical Multiplicative Decomposition")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

Do the results support the graphical interpretation from part a?
The results confirm my notes above, there is a clear rising trend, along with seasonal fluctuations of near identical volume. Thanks to the decomposition chart, we can also see there are random fluctuations as well, although they are minuscule in amplitude and have a minor impact on the overall visual.
Compute and plot the seasonally adjusted data.

gas_dcmp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

# Extract components
gas_dcmp_components <- components(gas_dcmp)

# Plot the seasonally adjusted data
gas_dcmp_components %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y = Gas / seasonal), colour = "green") +  # Seasonally adjusted is Gas / seasonal component
  labs(title = "Quarterly Gas Production: Seasonally Adjusted")

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

# Add 300 to the last row (236 -> 536)
gas_outlier <- gas
gas_outlier[nrow(gas_outlier), "Gas"] <- gas_outlier[nrow(gas_outlier), "Gas"] + 300

# Classical decomposition
gas_dcmp_outlier <- gas_outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

gas_dcmp_outlier_components <- components(gas_dcmp_outlier)

# Regular plot
gas_dcmp_outlier_components %>% autoplot() +
  labs(title = "Quarterly Gas Production + End Outlier: Classical Multiplicative Decomposition")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Seasonally adjusted data
gas_dcmp_outlier_components %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y = Gas / seasonal), colour = "green") +  # Seasonally adjusted (Gas / seasonal component)
  labs(title = "Quarterly Gas Production + End Outlier: Seasonally Adjusted")

After adding the outlier value of 300 to the last observation, which is 236 in 2010Q2, the classical decomposition model shows the impact in the trend having an overall large spike. The seasonal has no impact, but interestingly enough, the random component has a large drop as the outlier is very large relative to the rest of the observations in gas.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_outlier_middle <- gas


# Classical decomposition
gas_dcmp_outlier <- gas_outlier_middle %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

gas_dcmp_outlier_components <- components(gas_dcmp_outlier)

# Regular plot
gas_dcmp_outlier_components %>% autoplot() +
  labs(title = "Quarterly Gas Production + Middle Outlier: Classical Multiplicative Decomposition")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Seasonal
gas_outlier_middle %>%
  mutate(Gas = if_else(Quarter==yearquarter("2007Q4"), Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(season_adjust) +
  labs(title = "Quarterly Gas Production + Middle Outlier: Seasonally Adjusted")

After adding the outlier value of 300 to the middle observation, in the decomposition plot, we can see that the trend no longer has a spike at the end, but a small bump in the middle, before rising again towards the end. The random portion of the plot also shows an uptick, while the seasonal trend stays the same. However, the seasonally adjusted plot shows the outlier in the middle as expected.

——————————————————————————–

(8) Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

set.seed(22397)

x11_dcmp <- my_series %>%
  model(X11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components() 

autoplot(x11_dcmp) + 
  labs(title = "Retail Turnover: X-11 Decomposition")

After decomposing my series, I notice that the seasonality does actually vary with time- there is a decrease in 1995, then a rise in 2005, after which it seems to be rising and dropping again at odd intervals. There are also 4 large spikes in the Irregularities section, along with an overall increase in the trend apart from the peaks and valleys from 2005-2015. My original plot for my series was very irregular, but this X-11 decomposition helps to see where the changes are occurring, even after my Box-Cox transformation.

——————————————————————————–

(9) Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995

Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Figure 3.20: Seasonal component from the decomposition shown in the previous figure

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The decomposition shows a clear positive trend with the number of civilians in the labor force, with consistent seasonal fluctuations occurring over a period of 1 year. There is a significant drop from 1991/1992, which shows the recession in the remainder component. The scale shows that the seasonal fluctuations and remainder are rather small in scale in relation to the overall trend-cycle chart. This implies that seasonality does not have much bearing on the labor force, as it grows at larger rates. The seasonal component plot implies that the average on-boarding of people joining the labor force is higher in the months of March and December.

Is the recession of 1991/1992 visible in the estimated components?

Absolutely it is, as we can see a large drop in the remainder plot.

DATA 624 RB HW2

Ron Balaban

2025-02-05

——————————————————————————–

(1) Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

——————————————————————————–

(2) For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

——————————————————————————–

(3) Why is a Box-Cox transformation unhelpful for the canadian_gas data?

——————————————————————————–

(4) What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

——————————————————————————–

(5) For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

——————————————————————————–

(7) Consider the last five years of the Gas data from aus_production.

——————————————————————————–

(8) Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

——————————————————————————–

(9) Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995