library(tsibble)

## Warning: package 'tsibble' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## 
## Attaching package: 'tsibble'

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ ggplot2     3.5.1
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.0

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()     masks base::date()
## ✖ dplyr::filter()       masks stats::filter()
## ✖ tsibble::intersect()  masks base::intersect()
## ✖ lubridate::interval() masks tsibble::interval()
## ✖ dplyr::lag()          masks stats::lag()
## ✖ tsibble::setdiff()    masks base::setdiff()
## ✖ tsibble::union()      masks base::union()

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

The graph below shows the GDP per capita for each year for each country in the dataset.

global_economy |>
  mutate(gdp_per_cap = GDP / Population) |>
  drop_na(gdp_per_cap) |>
  select(Country, Year, gdp_per_cap) |>
  autoplot(gdp_per_cap) + guides(col = "none") +
  labs(title = 'GDP Per Capita by Country Over Time',
       y = 'GDP Per Capita',
       x = 'Year')

The plot below shows all of the countries that had the highest GDP per capita at any point in the data set.

annual_leaders <- global_economy |>
  mutate(gdp_per_cap = GDP / Population) |>
  select(Country, Year, gdp_per_cap) |>
  as_tibble() |>
  drop_na(gdp_per_cap) |>
  group_by(Year) |>
  filter(gdp_per_cap == max(gdp_per_cap)) |>
  ungroup() |>
  arrange(Year)

global_economy |>
  mutate(gdp_per_cap = GDP / Population) |>
  drop_na(gdp_per_cap) |>
  select(Country, Year, gdp_per_cap) |>
  filter(Country %in% annual_leaders$Country) |>
  autoplot(gdp_per_cap) + guides(col = "none") +
  labs(title = 'GDP Per Capita Leaders Over Time',
       y = 'GDP Per Capita',
       x = 'Year') +
  guides(colour = guide_legend(title = "Country"))

In the most recent year in the data set (2017) the GDP-per-capita leader was Monaco. Monaco has held the lead consistently since 1970, which is when the series for that country begins. It has been challenged for this lead a few times, early on by the United Arab Emirates and then recently by Liechtenstein. Prior to Monaco’s inclusion in the data set, the lead was mostly held by the United States.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

US GDP

global_economy |> filter(Code == "USA") |> autoplot(GDP) + labs(title = 'US GDP')

global_economy |>
  filter(Code == "USA") |>
  mutate(gdp_per_cap = GDP / Population) |>
  autoplot(gdp_per_cap) + labs(title = 'US GDP Per Capita')

A transformation of the data to be GDP per capita instead of GDP felt appropriate…but it seems to have barely any effect on the shape of the graph. This is an unexpected (and frankly amazing) result, so much so that it has me second-guessing my code.

Victorian Livestock

aus_livestock |>
  filter(Animal == 'Bulls, bullocks and steers') |>
  filter(State == 'Victoria') |>
  autoplot(Count) + labs(title = 'Victorian Bulls, Bullocks, and Steers Slaughtered')

as_tibble(aus_livestock) |>
  filter(Animal == 'Bulls, bullocks and steers') |>
  group_by(Month) |>
  mutate(share_of_total = Count / sum(Count)) |>
  filter(State == 'Victoria') |>
  ungroup() |>
  as_tsibble(index = Month) |>
  autoplot(share_of_total) + labs(title = 'Victorian Share of Bulls, Bullocks, and Steers Slaughtered')

Transforming the data to represent what percentage of the total “Bulls, bullocks and steers” slaughtered in Australia were from Victoria, rather than a simple count, changes the data somewhat. The count plot shows a dramatic drop-off in the late 1970’s, after which the data never fully recovers (and indeed, eventually drops further).

The transformed data shows a similar pattern but the initial decline is more gradual.

Victorian Electricity Demand

vic_elec |> autoplot(Demand) + labs(title = 'Victorian Electricity Demand')

I do not believe that it is appropriate to apply a transformation to this data.

Gas Production

aus_production |>
  select(Quarter, Gas) |>
  autoplot(Gas) + labs('Australian Gas Production')

# Filter original data
gas_data <- aus_production |>
  select(Quarter, Gas)

# Compute a value for lambda
lambda2d <- gas_data |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

# Apply a Box-Cox transformation
gas_data <- gas_data |>
  mutate(b_c = box_cox(Gas, lambda2d))

# Plot the transformed series
gas_data |> autoplot(b_c) + labs(title = 'Transformed Data')

A Box-Cox transformation is appropriate to stabilize the variance in this data.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

autoplot(canadian_gas, Volume)

A Box-Cox transformation is unhelpful for this data-set because the variation in the data is already very consistent. As shown below, the transformation will not substantially alter the shape of the data.

# Determine a value for lambda
lambda3 <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

# Apply a Box-Cox transformation
canadian_gas_bc <- canadian_gas |>
  mutate(b_c = box_cox(Volume, lambda3))

# Plot the transformed data
canadian_gas_bc |> autoplot(b_c)

Notice that the transformed graph has lower values but the shape of the resulting data is nearly identical to the original data set.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

# Recreate the retail time series data (code provided by the textbook)
set.seed(31415)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

# Select relevant columns
myseries <- myseries |> select(Month, Turnover)

# Determine a value for lambda
lambda4 <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

print(lambda4)

## [1] -0.0264685

# Apply a Box-Cox transformation
myseries_bc <- myseries |>
  mutate(b_c = box_cox(Turnover, lambda4))

# Plot the original series and transformed series
myseries_bc |> autoplot(Turnover) + labs(title = 'Original Data')

myseries_bc |> autoplot(b_c) + labs(title = 'Transformed Data')

I would select a Box-Cox transformation with a lambda value of approximately -0.0265. As desired, the Box-Cox transformation makes the variance more consistent.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco

# Filter data
tobacco_data <- aus_production |> select(Quarter, Tobacco)

# Determine a value for lambda
lambda5a <- tobacco_data |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

# Apply a Box-Cox transformation
tobacco_data <- tobacco_data |>
  mutate(b_c = box_cox(Tobacco, lambda5a))

# Plot the original series and transformed series
tobacco_data |> autoplot(Tobacco) + labs(title = 'Original Data')

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

tobacco_data |> autoplot(b_c) + labs(title = 'Transformed Data')

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Print the value of lambda
print(lambda5a)

## [1] 0.9264636

For the tobacco data, the appropriate Box-Cox transformation has a \(\lambda\) value of approximately 0.9265. However, the shape of the data does not appear to be substantially altered.

Passengers

# Filter data
passenger_data <- ansett |>
  filter(Class == 'Economy') |>
  filter(Airports == 'MEL-SYD') |>
  select(Week, Passengers)

# Determine a value for lambda
lambda5b <- passenger_data |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

# Apply a Box-Cox transformation
passenger_data <- passenger_data |>
  mutate(b_c = box_cox(Passengers, lambda5b))

# Plot the original series and transformed series
passenger_data |> autoplot(Passengers) + labs('Original Data')

passenger_data |> autoplot(b_c) + labs('Transformed Data')

# Print the value of lambda
print(lambda5b)

## [1] 1.999927

For the full passengers data, the appropriate Box-Cox transformation has a \(\lambda\) value of approximately 2. However I believe that this value of lambda is skewed by the outliers visible in the original data, particularly those where a 0 is recorded. I’d remove these outliers and replace them with the average value of the data set before computing a value for \(\lambda\), as follows:

# Filter data
passenger_data_2 <- ansett |>
  filter(Class == 'Economy') |>
  filter(Airports == 'MEL-SYD') |>
  select(Week, Passengers)

# Replace outliers
passenger_data_2$Passengers[passenger_data_2$Passengers == 0] <- mean(passenger_data_2$Passengers)

# Determine a value for lambda
lambda5b_2 <- passenger_data_2 |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

# Apply a Box-Cox transformation
passenger_data_2 <- passenger_data_2 |>
  mutate(b_c = box_cox(Passengers, lambda5b_2))

# Plot the original series and transformed series
passenger_data_2 |> autoplot(Passengers) + labs('Adjusted Original Data')

passenger_data_2 |> autoplot(b_c) + labs('Transformed Adjusted Data')

# Print the value of lambda
print(lambda5b_2)

## [1] 0.1539777

The new value is \(\lambda \approx 0.154\)

Pedestrians

# Filter data
pedestrian_data <- pedestrian |>
  filter(Sensor == 'Southern Cross Station') |>
  select(Date_Time, Count)

# Determine a value for lambda
lambda5c <- pedestrian_data |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

# Apply a Box-Cox transformation
pedestrian_data <- pedestrian_data |>
  mutate(b_c = box_cox(Count, lambda5c))

# Plot the original series and transformed series
pedestrian_data |> autoplot(Count) + labs(title = 'Original Data')

pedestrian_data |> autoplot(b_c) + labs(title = 'Transformed Data')

# Print the value of lambda
print(lambda5c)

## [1] -0.2501616

For the pedestrians data, the appropriate Box-Cox transformation has a \(\lambda\) value of approximately -0.2502.

3.7

Consider the last five years of the Gas data from aus_production.

# Code provided by the textbook
gas <- tail(aus_production, 5*4) |> select(Gas)

3.7(a)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas |> autoplot(Gas)

There are clear seasonal fluctuations, with elevated levels in Q2 and Q3 (the fall and winter months in Australia) and lower levels in Q4 and Q1 (the spring and summer months in Australia). In each year for which the data exists, Q1 is the lowest, then Q4, then Q2, then Q3 (respectively: Summer, spring, fall, and winter).

3.7(b)

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas |>
  model(classical_decomposition(Gas, type = 'multiplicative')) |>
  components() |>
  autoplot()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

3.7(c)

Do the results support the graphical interpretation from part a?

Yes, these results show a strong seasonal component with peaks in Q2 and Q3 and troughs in Q1 and Q4 every year, and a clear overall trend upward.

3.7(d)

Compute and plot the seasonally adjusted data.

gas |>
  model(classical_decomposition(Gas, type = 'multiplicative')) |>
  components() |>
  as_tsibble() |>
  autoplot(Gas, color = 'black') +
  geom_line(aes(y=season_adjust, color = 'red'))

3.7(e)

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_mod <- gas
gas_mod$Gas[13] <- gas_mod$Gas[13] + 300
gas_mod |> autoplot(Gas)

gas_mod |>
  model(classical_decomposition(Gas, type = 'multiplicative')) |>
  components() |>
  as_tsibble() |>
  autoplot(Gas, color = 'black') +
  geom_line(aes(y=season_adjust, color = 'red'))

The outlier creates a spike in the seasonally adjusted data at the point of the outlier, but it also impacts the data at other times. Specifically, it appears to “push down” the seasonally adjusted data throughout. In the original seasonally adjusted plot, the seasonally adjusted data was under 200 until Q1 of 2006, but after that it never drops below 200. However, with the high-end Q3 outlier included, the seasonally adjust data drops below 200 in Q3 of every year through 2009.

3.7(f)

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_mod_2 <- gas
gas_mod_2$Gas[17] <- gas_mod_2$Gas[17] + 300
gas_mod_2 |> autoplot(Gas)

gas_mod_2 |>
  model(classical_decomposition(Gas, type = 'multiplicative')) |>
  components() |>
  as_tsibble() |>
  autoplot(Gas, color = 'black') +
  geom_line(aes(y=season_adjust, color = 'red'))

Changing the outlier to be near the end of the series instead of closer to the middle does not seem to have a big impact on the seasonally adjusted data, except of course for a change in the location of the spike corresponding to the outlier.

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

# Code provided by the textbook, with the seed modified
set.seed(31415)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries |>
  select(Month, Turnover) |>
  model(X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components() |>
  autoplot()

Mostly this confirms my previous observations that there is a recognizable but not particularly impactful seasonal component to the data and an overall trend upward that reverses near the end of the series.

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

fig3.19 fig3.20

3.9(a)

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The decomposition shows four plots. The “value” plot shows the original data, which varies slightly but demonstrates a clear trend upward over time. The “trend” plot has a scale similar to that of the overall graph, and appears to be strictly (although not uniformly) increasing over time. The “season_year” plot shows that there is a clear seasonal component to the data, but the scale indicates that it does not result in significant variation from month to month. This plot also shows the seasonal impact becoming slightly more pronounced in more recent years, with higher peaks and lower troughs. Finally, the “remainder” plot shows the variation that is not accounted for by the trend or seasonal elements. Although the plot does not appear to show substantial variation, that’s because it has a different scale than the seasonaal plot. In fact, the initial peaks of the “remainder” plot match the magnitude of the initial peaks of the “season_year” plot (between 100 and -100). The “remainder” plot also shows some substantial low-end outliers later in the plot.

3.9(b)

Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession is visible in two different ways: First, the “trend” plot flattens significantly around then. Second, the “remainder” plot (which shows all variation not accounted for by the trend or season plots) shows several low outliers for the data points corresponding to those years, in fact these outliers are the most remarkable feature of the entire image.

DATA 624 HW 2

Marley Myrianthopoulos

2024-09-15

3.1

3.2