624 hw2

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.4.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.3     ✔ fable       0.5.0
## ✔ ggplot2     3.5.1

## Warning: package 'dplyr' was built under R version 4.4.2

## Warning: package 'ggplot2' was built under R version 4.4.2

## Warning: package 'tsibble' was built under R version 4.4.3

## Warning: package 'tsibbledata' was built under R version 4.4.3

## Warning: package 'feasts' was built under R version 4.4.3

## Warning: package 'fabletools' was built under R version 4.4.3

## Warning: package 'fable' was built under R version 4.4.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(tsibble)

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy |>
  mutate(adj_GDP = GDP / Population) |>
  drop_na(adj_GDP) |>
  select(Country, Year, adj_GDP) |>
  autoplot(adj_GDP) + guides(col="none") +
  labs(title = "GDP per capita over time", 
       x = "Year",
       y = "GDP per capita")

The plot has too many countries to differentiate each from each other effectively. I will create another plot of the 10 countries with the highest GDP.

top_GDP <- global_economy |>
  mutate(adj_GDP = GDP / Population) |>
  filter(!is.na(adj_GDP)) |>
  group_by(Country) |>
  filter(adj_GDP == max(adj_GDP)) |>
  ungroup() |>
  slice_max(adj_GDP, n=10) |>
  pull(Country)

global_economy |>
  mutate(adj_GDP = GDP/Population) |>
  filter(Country %in% top_GDP) |> 
  autoplot(adj_GDP) +
  labs(title = "Top 10 countries GDP per capita", 
       x = "Year",
       y = "GDP per capita")

## Warning: Removed 133 rows containing missing values or values outside the scale range
## (`geom_line()`).

We can see that Monaco and Liechtenstein have the highest GDP with Monaco historically higher. However, Liechtenstein has data from 1960 while Monaco has data starting from ~1970 and would have the highest GDP if Monaco was not added to the mix.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

US GDP

global_economy |>
  filter(Code == "USA") |>
  autoplot(GDP) + 
  labs(title = "US GDP")

global_economy |> 
  filter(Code == "USA") |>
  mutate(adj_gdp = GDP/Population) |>
  autoplot(adj_gdp) +
  labs(title = "US GDP per capita", x = "Year", y = "GDP per capita")

It seemed appropriate to transform the GDP to GDP per capita but the shape of the data looks nearly identical to the original plot.

Victorian livestock

aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers") |>
  filter(State == "Victoria") |>
  autoplot(Count) + labs(title = "Victoria Bulls, Bullocks, and Steers Slaughtered")

as_tibble(aus_livestock) |>
  filter(Animal== "Bulls, bullocks and steers") |>
  group_by(Month) |>
  mutate(total_perc = Count/sum(Count)) |>
  filter(State == "Victoria") |>
  ungroup() |>
  as_tsibble(index =Month ) |>
  autoplot(total_perc) +
  labs(title = "Share of Victorian Bulls, bullocks and steers slaughered", 
       x= "Date", 
       y= "Percent of total")

The bulls, bullocks, and steers data was transformed to reflect the percentage of total bulls, bullocks, and steers slaughtered in Victoria vs Australia. The shape of the plots are similar, showing a an overall trend down but the transformed plot is less sharp in regards to change in direction.

Victoria electricity

vic_elec |>
  autoplot(Demand) +
  labs(title = "Victorian electricity demand")

I don’t think this data requires transformation so I will keep the original plot.

Gas production

aus_production |>
  select(Quarter, Gas) |>
  autoplot(Gas) + labs(title = "Australian gas production")

gas_data <-aus_production |>
  select(Quarter, Gas)

lambda <-gas_data |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

gas_data <- gas_data |>
  mutate(bc = box_cox(Gas, lambda))

gas_data |>
  autoplot(bc) +
  labs(title = "Box cox transformed AUS gas production")

After a box cox transformation, the overall shape of the data is similar to the original plot but it helped to stabilize the data.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

autoplot(canadian_gas, Volume)

A box-cox transformation is unhelpful because the data variation is relatively consistent and the result will be similar to the original plot.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

# Code from section 2.10
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

# filter data
myseries <-myseries |>
  select(Month, Turnover)

# Find lambda value
lambda1 <-myseries |>
  features(Turnover,features = guerrero) |>
  pull(lambda_guerrero)

print(lambda1)

## [1] 0.08303631

# Box cox transformation
myseries_bc <- myseries |>
  mutate(bc = box_cox(Turnover, lambda1))

# Plot data
myseries_bc |> 
  autoplot(Turnover) +
  labs(title = "Original data")

myseries_bc |>
  autoplot(bc) +
  labs(title ="Transformed data")

I prefer the 2nd transformed plot since it seems a bit more stable. The spike around 1995 is still prominent but the overall data seems more normalized.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian

Tobacco from aus_production

# Filter data to time and tobacco
tobacco_df <- aus_production |>
  select(Quarter, Tobacco)

# Find lambda value
lambda2 <- tobacco_df |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)
print(lambda2)

## [1] 0.9264636

# Box-cox transform data
tobacco_df <- tobacco_df |>
  mutate(bc = box_cox(Tobacco, lambda2))

# Plot 
tobacco_df |>
  autoplot(Tobacco) +
  labs (title = "Original data")

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

tobacco_df |>
  autoplot(bc) +
  labs (title = "Transformed data")

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Although lambda is close to 1 and data has been transformed using box-cox, the plot is very similar to the original.

Economy class passengers between Melbourne and Sydney from ansett

# filter data
passenger_df <- ansett |>
  filter(Class == "Economy") |>
  filter(Airports == "MEL-SYD") |>
  select(Week, Passengers)

# lambda value
lambda3 <- passenger_df |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)
print(lambda3)

## [1] 1.999927

# box-cox transformation
passenger_df <- passenger_df |>
  mutate(bc = box_cox(Passengers, lambda3))

# plot 
passenger_df |>
  autoplot(Passengers) +
  labs(Title = "Original")

passenger_df |>
  autoplot(bc) +
  labs(Title = "Transformed")

The lambda is 1.999 however there seems to be a large outlier right before 1990.

Pedestrian counts at Southern Cross Station from pedestrian

# filter data
pedestrian_df <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  select(Date_Time, Count)

# lambda value
lambda4 <- pedestrian_df |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)
print(lambda4)

## [1] -0.2501616

# box cox transformation 
pedestrian_df <- pedestrian_df |>
  mutate(bc = box_cox(Count, lambda4))

# plot
pedestrian_df |>
  autoplot(Count) + 
  labs(title= "Original")

pedestrian_df |>
  autoplot(bc) +
  labs(title = "Transformed")

The lambda for this data set is around -0.25.

3.7

Consider the last five years of the Gas data from aus_production.

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas <- tail(aus_production, 5*4) |> select(Gas)
gas |> 
  autoplot(Gas)

There are seasonal fluctuations with an upward trend. There is an increase of gas between Q1 and Q3. There are decreases during Q3 and Q4.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices

gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  autoplot()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

### c. Do the results support the graphical interpretation from part a? Yes, the results seem to support the plot in a since we can see the clear seasonal patterns with the increase in Q1 and Q2 and decrease in Q3 and Q4.

d. Compute and plot the seasonally adjusted data.

gas |> 
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  as_tsibble() |>
  autoplot(Gas) +
  geom_line(aes(y=season_adjust))

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_adj <- gas
gas_adj$Gas[5] <-gas_adj$Gas[5] + 100

gas_adj |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  as_tsibble() |>
  autoplot(Gas) +
  geom_line(aes(y=season_adjust, 
                color = "red"))

The outlier seems to impact the seasonally adjusted data by slightly pushing the values up. For example, at 2009 Q1, the gas seems to be around 225 in the original plot but with the outlier, the value is over 225.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_adj2 <- gas
gas_adj2$Gas[10] <-gas_adj$Gas[10] + 100

gas_adj2 |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  as_tsibble() |>
  autoplot(Gas) +
  geom_line(aes(y=season_adjust, 
                color = "red"))

Since my original outlier plot had the outlier closer to the end, I moved it closer to the middle in this version. It does not appear to make a big difference where the outlier is.

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

# Code from section 2.10
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

# X-11 
myseries |>
  select(Month, Turnover) |>
  model(X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components() |>
  autoplot()

This plot doesn’t appear to show anything new but does confirm and help make the patterns more clean. There is a clear upward trend and there is a somewhat seasonal component to the data.

3.9

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The value plot shows the original data and presents a clear trend upward.

The trend plot has the same scale as the value plot, and appears to be more clearly increasing over time.

The season_year plot shows that there is a seasonal component to the data, but the scale is much smaller compared to the previous graphs and suggests that seasonality does not result in significant variation. This plot also shows the seasonal impact becoming slightly more pronounced in more recent years but the impact will be small since the scale is so small.

The remainder plot shows the variation not accounted for by the trend or seasonal elements. The plot does not appear to show substantial variation but it does have an outlier around 1991. The scale here is different from the other plots as well.

b.Is the recession of 1991/1992 visible in the estimated components?

Yes the recesssion is clearly visible in the remainder plot as a major outlier and less clearly in the value and trend plots where the upward trend becomes less steep and flattens slightly.