DATA 624 Homework 2: Time Series Decomposition

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.2     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.2
## ✔ lubridate   1.9.2     ✔ fable       0.3.4
## ✔ ggplot2     3.5.1     ✔ fabletools  0.4.2

## Warning: package 'tsibble' was built under R version 4.3.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(latex2exp)
library(dplyr)
library(ggplot2)

#1. Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy <- global_economy %>%
  mutate("gdp_per_capita" = GDP/ Population)

global_economy %>%
  autoplot(gdp_per_capita) +
  labs(title= "GDP per capita", y = "$US") +
  theme(legend.position = "none")

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

In the most recently recorded year, the country with the highest GDP per capita was Liechtenstein. However, prior to 2014, it had consistently been Monaco.

#2. For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) +
  labs(title= "GDP", y = "$US")

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per capita", y = "$US")

I transformed the data by dividing by the population to show the GDP per capita. This transformation created a very similar graph, only slightly more linear, and the scale decreased significantly.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers") %>%
  filter(State == "Victoria") %>%
  autoplot(Count) +
  labs(title= "Victorian bulls slaughter by year", y = "Count")

I don’t think a transformation is necessary for the quantities in this time series.

Victorian Electricity Demand from vic_elec.

vic_elec %>%
  autoplot(Demand) +
  labs(title= "Victorian electricity demand", y = "Demand")

vic_elec %>%
  group_by(Date) %>%
  summarize(average_demand = mean(Demand)) %>%
  ggplot(aes(x = Date, y = average_demand)) + 
  geom_line() + 
  labs(title = "Victorial Electricity Demand (Daily Average)",
       y = "Average Demand",
       x = "Date") +
  theme_minimal()

I tried to do a calendar adjustment by computing the daily average demand since the original graph was rather busy, but it did not significantly change the visualization.

Gas production from aus_production.

aus_production %>%
  autoplot(Gas)

lambda_1 <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)
aus_production |>
  autoplot(box_cox(Gas, lambda_1)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda_1,2))))

Since the variation in the gas production increases over time, I used a Box Cox transformation, which stabilized the variance.

#3. Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume)

lambda_2 <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)
canadian_gas |>
  autoplot(box_cox(Volume, lambda_2)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Canadian gas production with $\\lambda$ = ",
         round(lambda_2,2))))

A Box-Cox transformation is unhelpful for the Canadian gas data, because the variance is not increasing or decreasing consistently over time, it increases and then decreases again. We can see that a Box-Cox transformation in this case makes the scale smaller, but otherwise leaves the shape of the graph relatively unchanged.

#4. What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(1989)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries, Turnover)

lambda_3 <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)
myseries |>
  autoplot(box_cox(Turnover, lambda_3)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed retail turnover with $\\lambda$ = ",
         round(lambda_3,2))))

I used the Guerrero feature to choose the ideal value of lambda for the Box-Cox transformation, in this case, -0.06.

#5. For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production

autoplot(aus_production, Tobacco)

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambda_4 <- aus_production |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)
aus_production |>
  autoplot(box_cox(Tobacco, lambda_4)) +
  labs(y = "Tobacco",
       title = latex2exp::TeX(paste0(
         "Transformed tobacco production with $\\lambda$ = ",
         round(lambda_4,2))))

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Economy class passengers between Melbourne and Sydney from ansett

economy_mel_syd <- ansett %>%
  filter(Class == "Economy") %>%
  filter(Airports == "MEL-SYD")

autoplot(economy_mel_syd, Passengers)

lambda_5 <- economy_mel_syd |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)
economy_mel_syd |>
  autoplot(box_cox(Passengers, lambda_5)) +
  labs(y = "Passengers",
       title = latex2exp::TeX(paste0(
         "Transformed economy passengers between Melbourne and Sydney with $\\lambda$ = ",
         round(lambda_5,2))))

Pedestrian counts at Southern Cross Station from pedestrian

ped_s_cross_station <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")

autoplot(ped_s_cross_station, Count)

lambda_6 <- ped_s_cross_station |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)
ped_s_cross_station |>
  autoplot(box_cox(Count, lambda_6)) +
  labs(y = "Count",
       title = latex2exp::TeX(paste0(
         "Transformed pedestrian counts at Southern Cross Station with $\\lambda$ = ",
         round(lambda_6,2))))

#7. Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

a.) Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(gas, Gas)

There is a seasonal fluctuation that peaks in Q3 and dips in Q1. There is also a trend cycle; the amount of gas produced is increasing over the five years shown.

b.) Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_classical_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_classical_decomp)

## # A dable: 20 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend * seasonal * random
##    .model                      Quarter   Gas trend seasonal random season_adjust
##    <chr>                         <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
##  1 "classical_decomposition(G… 2005 Q3   221   NA     1.13  NA              196.
##  2 "classical_decomposition(G… 2005 Q4   180   NA     0.925 NA              195.
##  3 "classical_decomposition(G… 2006 Q1   171  200.    0.875  0.974          195.
##  4 "classical_decomposition(G… 2006 Q2   224  204.    1.07   1.02           209.
##  5 "classical_decomposition(G… 2006 Q3   233  207     1.13   1.00           207.
##  6 "classical_decomposition(G… 2006 Q4   192  210.    0.925  0.987          208.
##  7 "classical_decomposition(G… 2007 Q1   187  213     0.875  1.00           214.
##  8 "classical_decomposition(G… 2007 Q2   234  216.    1.07   1.01           218.
##  9 "classical_decomposition(G… 2007 Q3   245  219.    1.13   0.996          218.
## 10 "classical_decomposition(G… 2007 Q4   205  219.    0.925  1.01           222.
## 11 "classical_decomposition(G… 2008 Q1   194  219.    0.875  1.01           222.
## 12 "classical_decomposition(G… 2008 Q2   229  219     1.07   0.974          213.
## 13 "classical_decomposition(G… 2008 Q3   249  219     1.13   1.01           221.
## 14 "classical_decomposition(G… 2008 Q4   203  220.    0.925  0.996          219.
## 15 "classical_decomposition(G… 2009 Q1   196  222.    0.875  1.01           224.
## 16 "classical_decomposition(G… 2009 Q2   238  223.    1.07   0.993          222.
## 17 "classical_decomposition(G… 2009 Q3   252  225.    1.13   0.994          224.
## 18 "classical_decomposition(G… 2009 Q4   210  226     0.925  1.00           227.
## 19 "classical_decomposition(G… 2010 Q1   205   NA     0.875 NA              234.
## 20 "classical_decomposition(G… 2010 Q2   236   NA     1.07  NA              220.

gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  autoplot() +
  labs(title = "Classical multiplicative decomposition of total
                  Australian gas production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

c.) Do the results support the graphical interpretation from part a?

Yes, we can see from the classical decomposition that the trend component is increasing over time, and there is a clear seasonal component as well, as in the graphical interpretation from part a).

d.) Compute and plot the seasonally adjusted data.

components(gas_classical_decomp) |>
  as_tsibble() |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas",
       title = "Australian gas production (seasonally adjusted)")

e.) Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas[3, "Gas"] <- 471

gas_classical_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_classical_decomp) |>
  as_tsibble() |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas",
       title = "Australian gas production (seasonally adjusted with outlier)")

The outlier significantly affects the overall trend in its vicinity.

f.) Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas[3, "Gas"] <- 171
gas[10, "Gas"] <- 505

gas_classical_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_classical_decomp) |>
  as_tsibble() |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas",
       title = "Australian gas production (seasonally adjusted with outlier)")

gas[10, "Gas"] <- 205
gas[18, "Gas"] <- 510

gas_classical_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_classical_decomp) |>
  as_tsibble() |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas",
       title = "Australian gas production (seasonally adjusted with outlier)")

No, the effect is similar whether the outlier is place at the low end, the high end, or the center of the data.

#8. Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

x11_dcmp <- myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of total US retail turnover using X-11.")

The decomposition does not reveal any apparent outliers. It reveals an upward trend with a clear seasonal component.

#9. Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

knitr::include_graphics("/Users/mollysiebecker/Desktop/Screenshot 2024-09-17 at 4.32.32 PM.png")

knitr::include_graphics("/Users/mollysiebecker/Desktop/Screenshot 2024-09-17 at 4.32.45 PM.png")

a.) Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

We can see from the decomposition that there is a clear upward trend from 1978 to 1995. The trend accounts for the most variation in the time series, given the scale on the left. There is also seasonal variation shown by the clear cycle visible on the season_year graph, although this is on a much smaller scale. In most years, the remainder accounts for a small amount of variation, similar to the seasonal variation, except for in 1991/1992, when there was a significant drop in the number of persons in the civilian labor force that could not be accounted for by the trend or the seasonality. The seasonality, which, again, is not very significant, does not follow a pattern that could be explained by someone without specialized knowledge of the Australian civilian labor force. It seems like there is an uptick in the spring, leading to a decrease in summer with sudden increases in September and December that don’t fit in with the overall direction of the seasonal variation.

b.) Is the recession of 1991/1992 visible in the estimated components?

Yes, as previously stated there is a significant dip in 1991/1992 in the remainder component, which is due to the recession, since this dip could not be accounted for by the overall trend or by the seasonality. On the trend line, you can see that the overall increase slows at 1991/1992.

DATA 624 Homework 2: Time Series Decomposition

Molly Siebecker

2024-09-17