DATA 624: Assignment 02

Author

Curtis Elsasser

Published

February 16, 2025

Setup

library(fpp3)

Assignment

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code.

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

data <- global_economy |>
  filter(is.na(GDP) == FALSE & is.na(Population) == FALSE) |>
  mutate(
    GDP_PC = GDP/Population
  )

data |>
  autoplot(GDP_PC, show.legend = FALSE) +
  facet_wrap(~Country, ncol = 8)

It looks as if the countries with the highest and most recent GDP per capita are Liechtenstein and Monaco. Let’s run a little query to see if this is true.

as_tibble(data) |>
  group_by(Year) |>
  summarize(
    Max_GDP_PC = max(GDP_PC),
    Country = Country[which.max(GDP_PC)]
  ) |>
  arrange(desc(Year)) |>
  head(n = 20)

# A tibble: 20 × 3
    Year Max_GDP_PC Country      
   <dbl>      <dbl> <fct>        
 1  2017    104103. Luxembourg   
 2  2016    168011. Monaco       
 3  2015    167591. Liechtenstein
 4  2014    185153. Monaco       
 5  2013    173528. Liechtenstein
 6  2012    152000. Monaco       
 7  2011    162155. Monaco       
 8  2010    144569. Monaco       
 9  2009    149221. Monaco       
10  2008    180640. Monaco       
11  2007    167125. Monaco       
12  2006    133195. Monaco       
13  2005    124374. Monaco       
14  2004    123382. Monaco       
15  2003    108978. Monaco       
16  2002     89061. Monaco       
17  2001     82553. Monaco       
18  2000     82535. Monaco       
19  1999     91384. Monaco       
20  1998     93093. Monaco

Well, low lying Luxembourg had the highest GDP per capita in 2017. It’s a blip on the radar that wasn’t apparent in our plot.

Let’s see who has the highest average GDP per capita.

as_tibble(data) |>
  group_by(Country) |>
  summarize(
    Mean_GDP_PC = mean(GDP_PC)
  ) |>
  arrange(desc(Mean_GDP_PC)) |>
  head(n = 5)

# A tibble: 5 × 2
  Country         Mean_GDP_PC
  <fct>                 <dbl>
1 Monaco               86474.
2 Liechtenstein        68092.
3 San Marino           59263.
4 Cayman Islands       55868.
5 Channel Islands      50912.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy |>
  filter(Country == "United States" & is.na(GDP) == FALSE) |>
  autoplot(GDP) +
  labs(title = "United States GDP")

The GDP is certainly exponential. We could straighten it out with a log transformation. I’m not totally sure why we would want to do this, but I have no doubt understanding will follow.

global_economy |>
  filter(Country == "United States" & is.na(GDP) == FALSE) |>
  autoplot(log(GDP)) +
  labs(
    title = "United States GDP",
    y = "Log(GDP)"
  )

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers" & is.na(Count) == FALSE) |>
  autoplot(Count, show.legend = FALSE) +
  labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers") +
  facet_wrap(~State, ncol = 2, scales = "free_y")

If we were isolating “Northern Territory” then I would try an inverted power plot. But I have doubts that it would be fruitful, because the data is not strictly increasing or decreasing.

Victorian Electricity Demand from vic_elec.

vic_elec |>
  filter(is.na(Demand) == FALSE) |>
  autoplot(Demand) +
  labs(title = "Victorian Electricity Demand")

We can clearly an annual seasonal pattern, but being that the data is sampled every 30 minutes over the course of three years, it’s extremely dense. Their summers appear to be very noisy. Let’s zoom in on the summer of 2013.

vic_elec |>
  filter(is.na(Demand) == FALSE
    & Date >= date("2012-11-01")
    & Date < date("2013-03-01")
    ) |>
  autoplot(Demand) +
  labs(
    title = "Victorian Electricity Demand (Summer 2012-2013)"
  )

There is a lot that is interesting in this plot, but I don’t think that there is a transformation that would help. But being zoomed in, we can clearly see a daily seasonal interval.

Gas production from aus_production.

aus_production |>
  filter(is.na(Gas) == FALSE) |>
  autoplot(Gas) +
  labs(title = "Gas Production")

The gas production’s seasonal component’s variation is clearly increasing with time. Let’s try applying a log to it.

aus_production |>
  filter(is.na(Gas) == FALSE) |>
  autoplot(log(Gas)) +
  labs(
    title = "Gas Production",
    y = "Log(Gas)"
  )

Ah, that has worked nicely. The seasonal variation is a lot more constant now.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas |>
  filter(is.na(Volume) == FALSE) |>
  autoplot(Volume) +
  labs(title = "Canadian Gas")

I believe that it’s not helpful because the data is not strictly increasing or decreasing. It is fairly constant for the first third of the series. The second third increases and then decreases and the final third if fairly constant again. The box-cox wants to either apply a log or power transformation to the whole series. It will not help this series.

Let’s see what querrero suggests as a lambda value and plot it.

lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

canadian_gas |>
  filter(is.na(Volume) == FALSE) |>
  mutate(Volume = box_cox(Volume, lambda)) |>
  autoplot(Volume) +
  labs(
    title = "Canadian Gas with Box-Cox Transformation",
    subtitle = paste("Lambda =", round(lambda, 3))
  )

It compressed the data, but did not help with the variance. The variance is still increasing and decreasing as in the non-transformed state.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

2.10.7: Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

set.seed(314159)
data <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

data |>
  filter(is.na(Turnover) == FALSE) |>
  autoplot(Turnover) +
  labs(title = "Retail Data")

The series is very clearly exhibits a seasonal pattern, with a variance that increases with the level of the series. I would apply a log transformation. Let’s try calculating a \(\lambda\) value using the Guerrero method.

lambda <- data |>
  filter(is.na(Turnover) == FALSE) |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

data |>
  filter(is.na(Turnover) == FALSE) |>
  mutate(Turnover = box_cox(Turnover, lambda)) |>
  autoplot(Turnover) +
  labs(
    title = "Retail Data with Box-Cox Transformation",
    subtitle = paste("Lambda =", round(lambda, 3))
  )

That has stabilised the variance very nicely. We now have a fairly constant variance across the series.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance:

Tobacco from aus_production,

aus_production |>
  filter(is.na(Tobacco) == FALSE) |>
  autoplot(Tobacco) +
  labs(title = "Tobacco Production")

lambda <- aus_production |>
  filter(is.na(Tobacco) == FALSE) |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

aus_production |>
  filter(is.na(Tobacco) == FALSE) |>
  mutate(Tobacco = box_cox(Tobacco, lambda)) |>
  autoplot(Tobacco) +
  labs(
    title = "Tobacco Production with Box-Cox Transformation",
    subtitle = paste("Lambda =", round(lambda, 3))
  )

That changed the scale but did not change the variance. I cannot think of a transformation that would help with this series. I think seasonal decomposition with a noise component would help this series.

Economy class passengers between Melbourne and Sydney from ansett,

data <- ansett |>
  filter(
    Class == "Economy" 
    & Airports == "MEL-SYD" 
    & is.na(Passengers) == FALSE
  )
data |>
  autoplot(Passengers) +
  labs(title = "Economy Class Passengers between Melbourne and Sydney")

lambda <- data |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

data |>
  mutate(Passengers = box_cox(Passengers, lambda)) |>
  autoplot(Passengers) +
  labs(
    title = "Economy Class Passengers between Melbourne and Sydney with Box-Cox Transformation",
    subtitle = paste("Lambda =", round(lambda, 3))
  )

Pedestrian counts at Southern Cross Station from pedestrian.

data <- pedestrian |>
  filter(
    Sensor == "Southern Cross Station" 
    & is.na(Count) == FALSE
  )
data |>
  autoplot(Count) +
  labs(title = "Pedestrian Counts at Southern Cross Station")

lambda <- data |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)
data |>
  mutate(Count = box_cox(Count, lambda)) |>
  autoplot(Count) +
  labs(
    title = "Pedestrian Counts at Southern Cross Station with Box-Cox Transformation",
    subtitle = paste("Lambda =", round(lambda, 3))
  )

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas |>
  autoplot(Gas) +
  labs(
    title = "Gas Production"
  )

Yes, I can definitely see a seasonal pattern. There very clearly is a yearly seasonal pattern. The trend-cycle in increasing with a slight logarithmic downturn.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

data <- gas |>
  filter(is.na(Gas) == FALSE) |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components()
data |>
  autoplot() +
  labs(
    title = "Gas Production"
  )

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_line()`).

Do the results support the graphical interpretation from part a?

Partly. It also has identified a strong yearly seasonal pattern. And it also has identified the trend-cycle as increasing, but it describes the trend-cycle as being step-wise linear where I described it as increasing with a logarithmic downturn.

Compute and plot the seasonally adjusted data.

data |>
  mutate(
    Gas = Gas / seasonal
  ) |>
  select(Gas) |>
  autoplot(Gas) +
  labs(
    title = "Seasonally Adjusted Gas Production"
  )

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_outlier <- gas
gas_outlier$Gas[3] <- gas_outlier$Gas[3] + 300
gas_outlier |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  mutate(
    Gas = Gas / seasonal
  ) |>
  select(Gas) |>
  autoplot(Gas) +
  labs(
    title = "Seasonally Adjusted Gas Production with Outlier"
  )

The outlier has a very large effect on the seasonally adjusted data. First, the outlier remains very visible in the seasonally adjusted data. I think the more interesting side effect is that the outlier has also changed the seasonal component. I believe the seasonal component has been attenuated, which is causing the seasonal component to be much more pronounced in the seasonally adjusted data.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_outlier_b <- gas
gas_outlier_e <- gas
gas_outlier_b$Gas[1] <- gas$Gas[1] + 300
gas_outlier_e$Gas[20] <- gas$Gas[20] + 300

plot <- function(data, title) {
  data |>
    model(
      classical_decomposition(Gas, type = "multiplicative")
    ) |>
    components() |>
    mutate(
      Gas = Gas / seasonal
    ) |>
    select(Gas) |>
    autoplot(Gas) +
    labs(
      title = title
    )
}

plot(gas_outlier_b, "Seasonally Adjusted Gas Production with Outlier at Beginning")

plot(gas_outlier_e, "Seasonally Adjusted Gas Production with Outlier at End")

Yes, putting the outlier at the extremes has a different effect than placing the outlier within the series. When the effect of putting an outlier at the beginning is compared to the effect of putting an outlier at the end, the effects are more similar than when compared to the outlier in the middle. Nonetheless, they do differ. The outlier at the beginning of the series has a larger effect on the seasonal component. The outlier at the end of the series has a larger effect on the trend-cycle component.

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

2.10.7: Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

set.seed(314159)
data <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

data |>
  model(
    x11 = X_13ARIMA_SEATS(Turnover ~ x11())
  ) |>
  components() |>
  autoplot() +
  labs(
    title = "Retail Data with X-11 Decomposition"
  )

I don’t think that it revealed anything that I had not noticed previously. But it does exaggerate outliers in the trend-cycle component that were not nearly as clear as they are when isolated. It also does a great job of amplifying the seasonal component from the early part of the series in which the seasonal component was not very clear.

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The trend-cycle component is increasing and is very linear in its increase. The seasonal component is a lot more clear than in the original time series, though it’s amplitude is very small when compared to the trend-cycle’s range. The seasonal component has a clear annual pattern, but I think it also has a biannual pattern that splits a year into two parts that approximate a 60%/40% division (in time). The remainder component exhibits the same variation as the seasonal component does, but this might be due to a radical outlier around 1992. Which I see is in the next question.

Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession of 1991/1992 is very visible in the remainder component. It appears as two enormous, downward facing spikes around 1991/1992. I find it interesting that the trend-cycle component shows a very slight hump that peaks just before the recession strikes. But, for the most part, the trend cycle component is not very affected by the recession. The seasonal component does not appear to be affected by the recession.