DATA 624 HW2

suppressPackageStartupMessages(library(fpp2))
suppressPackageStartupMessages(library(fpp3))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(feasts))
suppressPackageStartupMessages(library(seasonal))
suppressPackageStartupMessages(library(latex2exp))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tibble))
suppressPackageStartupMessages(library(tsibble))
suppressPackageStartupMessages(library(tsibbledata))

Home Work #2

Question 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy %>%
  tsibble(key = Code, index = Year)%>%
  autoplot(GDP/Population, show.legend =  FALSE) +
  labs(title= "GDP per capita",
       y = "US Dollars")

## Warning: Removed 3242 row(s) containing missing values (geom_path).

global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
  select(Country, GDP_per_capita)

## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country GDP_per_capita  Year
##   <fct>            <dbl> <dbl>
## 1 Monaco         185153.  2014

global_economy %>%
  filter(Country == "Monaco") %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per capita for Monaco", y = "US Dollars")

## Warning: Removed 11 row(s) containing missing values (geom_path).

global_economy %>%
  filter(Country == "Liechtenstein") %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per capita for Liechtenstein", y = "US Dollars")

## Warning: Removed 11 row(s) containing missing values (geom_path).

According to the data, The current highest GPD per capita is Liechtenstein. Liechtenstein recently passed Monaco, which is probably the smallest in area but has held the title of highest GDP per capita for the longest period.

Question 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP)+
  labs(title = "United States GDP", y = "US Dollars")

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria") %>%
  autoplot(Count)+
  labs(title = "Slaughter of Victorian “Bulls, bullocks and steers”")

Observed downward trend

vic_elec %>%
  autoplot(Demand)+
  labs(title = "Victorian Electricity Demand",
       y = "Demand")

Observed seasonality

aus_production %>%
  autoplot(Gas)+
  labs(title = "Gas production")

Question 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Non-Transformed Gas Production")

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Gas Production with $\\lambda$ = ",
         round(lambda,2))))

The Box-Cox transformation doesn’t have much of an effect on the canadian_gas because the data variation is not consistent through the dataset

Question 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(1111)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

autoplot(myseries, Turnover)+
  labs(title = "Retail Turnover", y = "Million")

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Retail Turnover with $\\lambda$ = ",
         round(lambda,2))))

I would select transformation with \(\\lambda\) since it enhanced the seasonality and readability

Question 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

lambda_tobacco <- aus_production %>%
                   features(Tobacco, features = guerrero) %>%
                   pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Tobacco, lambda_tobacco)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda_tobacco,2))))

## Warning: Removed 24 row(s) containing missing values (geom_path).

lambda_class <- ansett %>%
                 filter(Class == "Economy",
                        Airports == "MEL-SYD")%>%
                 features(Passengers, features = guerrero) %>%
                 pull(lambda_guerrero)
ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")%>%
  mutate(Passengers = Passengers/1000) %>%
  autoplot(box_cox(Passengers, lambda = lambda_class)) +
  labs(y = "Passengers ('000)",
       title = latex2exp::TeX(paste0(
         "Transformed Ansett Airlines Economy Class with $\\lambda$ = ",
         round(lambda_class,2))),
       subtitle = "Melbourne-Sydney")

lambda_count <- pedestrian %>%
                filter(Sensor == "Southern Cross Station") %>%
                 features(Count, features = guerrero) %>%
                 pull(lambda_guerrero)
pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(box_cox(Count,lambda_count))+
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Pedestrian Counts at Southern Cross Station with $\\lambda$ = ",
         round(lambda_count,2))))

Question 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?

(A)

gas %>%
  autoplot()+
  labs(title = "Last Five Years of The Gas Data")

## Plot variable not specified, automatically selected `.vars = Gas`

There’s an upward trend and annual seasonality

(B)

gas %>%
  model(classical_decomposition(Gas,type = "multiplicative")) %>%
  components() %>%
  autoplot() + 
  ggtitle("Last Five Years of The Gas Data")

## Warning: Removed 2 row(s) containing missing values (geom_path).

(C)

The results do support the graphical interpretation from part a - we see the annual seasonality and the upward trend

(D)

gas_decom <- gas %>%
             model(classical_decomposition(Gas,type = "multiplicative")) %>%
             components()
gas_decom %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "",
       title = "Last Five Years of The Gas Data") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 row(s) containing missing values (geom_path).

(E)

new_gas <- gas
new_gas$Gas[10] <- new_gas$Gas[10]+300

new_gas %>%
  model(classical_decomposition(Gas,type = "multiplicative")) %>%
  components() %>%
  autoplot() + 
  ggtitle("Last Five Years of The Gas Data with 300 added to the 10th observation")

## Warning: Removed 2 row(s) containing missing values (geom_path).

new_gas %>%
  model(classical_decomposition(Gas,type = "multiplicative")) %>%
  components() %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "",
       title = "Last Five Years of The Gas Data with 300 added to the 10th observation") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 row(s) containing missing values (geom_path).

The +300 caused a spike in the seasonally adjusted data. +300 to the 10th observation had little impact on the seasonal component.

(F)

new_gas2 %>%
  model(classical_decomposition(Gas,type = "multiplicative")) %>%
  components() %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "",
       title = "Last Five Years of The Gas Data with 300 added to the last observation")

## Error in model(., classical_decomposition(Gas, type = "multiplicative")): object 'new_gas2' not found

+300 to the last entry causes a spike on the seasonally adjusted data. The seasonal data is less affected - little to no difference if the outlier is near the end or in the middle.

Question 3.8

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  model(classical_decomposition(Turnover,type = "multiplicative")) %>%
  components() %>%
  autoplot() + 
  ggtitle("Multiplicative decomposition of my retail time series data")

## Warning: Removed 6 row(s) containing missing values (geom_path).

myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components() %>%
  autoplot()+
  labs(title = "X-11 decomposition of my retail time series data")

Decomposing the series using X-11 does capture more noise and removes smoothing which allows us to see more outliers by making them more pronounced

Question 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

(image)

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The results show a steady increase in the civilian labor force over each month from February 1978 to August 1995. A great deal of seasonality is also observed (scale is much smaller than the remainder) - Seasonality is not as important in the labor force data. The recession is also observed.

Is the recession of 1991/1992 visible in the estimated components?

Isolating the trend component from the seasonal component shows that the trend has increased through the majority of the time frame. The monthly breakdown of the seasonal component shows that a few months show greater velocities in their variations than other months.

Yes - Theres a dip in employment during 1991/1992 not explained by seasonality or the positive trend.