DATA 624 Homework 2

library(fpp3)
library(tidyverse)
library(readxl)

Question 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time.

gdp_pc <- global_economy %>%
  mutate(GDP_per_capita = GDP / Population)
gdp_pc %>%
  autoplot(GDP_per_capita, show.legend =  FALSE) +
  labs(title = "GDP per Capita Over Time (1960–2017)",
       y = "GDP per Capita (USD)",
       x = "Year")

Which country has the highest GDP per capita? How has this changed over time?

latest_year <- max(global_economy$Year)

highest_gdp_pc <- global_economy %>%
  filter(Year == latest_year) %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  arrange(desc(GDP_per_capita))
highest_gdp_pc %>%
  select(Country, Year, GDP_per_capita) %>%
  slice_head(n = 10)
## # A tsibble: 10 x 3 [1Y]
## # Key:       Country [10]
##    Country           Year GDP_per_capita
##    <fct>            <dbl>          <dbl>
##  1 Luxembourg        2017        104103.
##  2 Macao SAR, China  2017         80893.
##  3 Switzerland       2017         80190.
##  4 Norway            2017         75505.
##  5 Iceland           2017         70057.
##  6 Ireland           2017         69331.
##  7 Qatar             2017         63249.
##  8 United States     2017         59532.
##  9 North America     2017         58070.
## 10 Singapore         2017         57714.

The most recent year in the dataset is 2017 and the country with the highest gdp in 2017 was Luxemborg.

How has this changed over time?

top_country <- highest_gdp_pc$Country[1]

gdp_pc %>%
  filter(Country == top_country) %>%
  autoplot(GDP_per_capita) +
  labs(title = paste("GDP per Capita Over Time:", top_country),
       y = "GDP per Capita (USD)",
       x = "Year")

The country’s gdp has been steadily increasing over time with very significant increase in the 2000 onwards.

Question 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

us_gdp <- global_economy %>%
  filter(Country == "United States")

us_gdp %>%
  autoplot(GDP) +
  labs(title = "United States GDP", y = "GDP (USD billions)", x = "Year")

us_gdp %>%
  mutate(log_GDP = log(GDP)) %>%
  autoplot(log_GDP) +
  labs(title = "Log-transformed United States GDP", y = "log(GDP)", x = "Year")

The orginal graph shows exponential growth. The log transformation makes it a bit more linear and easier to model with simple linear regression or ARIMA models.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

vic_cattle <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria")

vic_cattle %>%
  autoplot(Count) +
  labs(title = "Victorian Cattle Slaughter", y = "Count", x = "Year")

Victorian Electricity Demand from vic_elec.

vic_elec %>%
  autoplot(Demand) +
  labs(title = "Victorian Electricity Demand", y = "Megawatts")

vic_elec %>%
  autoplot(log(Demand)) +
  labs(title = "Log of Victorian Electricity Demand", y = "log(Megawatts)")

The electricity demand shows season pattern where Summer and Winter has peaks. A log transformation seems to slightly stabilize variance which should make the seasonal effects easier to compare across time.

Gas production from aus_production.

gas <- aus_production %>%
  select(Quarter, Gas)

autoplot(gas) +
  labs(title = "Australian Gas Production", y = "Petajoules")
## Plot variable not specified, automatically selected `.vars = Gas`

autoplot(gas, log(Gas)) +
  labs(title = "Log of Australian Gas Production", y = "log(Petajoules)")

The log transform helps stabilize the variance and makes the seasonal fluctuation approximately constant in relative size.

Question 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot() +
  labs(title = "Canadian Gas Production", y = "Volume", x = "Year")
## Plot variable not specified, automatically selected `.vars = Volume`

Box-Cox transformation is unnecessary because there is no exponential growth and is steady. There is also strong seasonality with no visible heteroscedasctity. A Box-Cox transformation may compressthe values unnecessarily.

Question 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

retail_data <- aus_retail %>%
  filter(
    State == "New South Wales",
    Industry == "Department stores"
  )

retail_data %>%
  features(Turnover, features = guerrero)
## # A tibble: 1 × 3
##   State           Industry          lambda_guerrero
##   <chr>           <chr>                       <dbl>
## 1 New South Wales Department stores           0.219
lambda <- 0.2193207

retail_data %>%
  mutate(Transformed = box_cox(Turnover, lambda)) %>%
  autoplot(Transformed) +
  labs(title = paste("Box-Cox Transformed Retail Turnover (lambda =", round(lambda, 2), ")"),
       y = "Transformed Turnover", x = "Month")

For the New South Wales (random selected), the optimal Box-Cox lambda estimated using the guerrero method is approximately lambda = 0.219. This indicates that a Box-Cox transformatin is appropraite to stabilize variance, and improving the performance of some models like ARIMA or ETS.

Question 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production

tobacco_data <- aus_production %>%
  select(Quarter, Tobacco)

tobacco_data %>%
  features(Tobacco, features = guerrero)
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.926

The guerrero function picked a lambda of 0.93

Economy from ansett

ansett_data <- ansett %>%
  filter(Airports == "MEL-SYD", Class == "Economy")

ansett_data %>%
  features(Passengers, features = guerrero)
## # A tibble: 1 × 3
##   Airports Class   lambda_guerrero
##   <chr>    <chr>             <dbl>
## 1 MEL-SYD  Economy            2.00

The guerrero function chose lambda of 1.99

Pedestrian from pedestrian

pedestrian_data <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  index_by(Date = date(Date_Time)) %>%
  summarise(Count = sum(Count, na.rm = TRUE))

pedestrian_data %>%
  features(Count, features = guerrero)
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.273

The guerrero function picked a lambda of 0.27

Question 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas <- tail(aus_production, 5 * 4) %>%
  select(Gas)

gas %>%
  autoplot(Gas) +
  labs(title = "Australian Gas Production (Last 5 Years)",
       y = "Million Cubic Metres", x = "Year")

There is clear seasonla pattern in the data with dips at Q1 and peaks at Q3. There is a slight upwards trand where the dips and peaks are lower in 2006 then they are in 2009.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data.

gas_decomp <- gas %>%
  model(decomp = classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

gas_decomp %>%
  autoplot() +
  labs(title = "Classical Multiplicative Decomposition of Gas Production",
       y = "Million Cubic Metres")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

Yes the results do support the graphical interpretation of part a.

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_outlier_mid <- gas %>%
  mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))

gas_decomp_mid <- gas_outlier_mid %>%
  model(decomp = classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

gas_decomp_mid %>%
  autoplot(season_adjust) +
  labs(title = "Seasonally Adjusted Gas Production with Outlier in Middle",
       y = "Million Cubic Metres (Adjusted)")

gas_outlier_end <- gas %>%
  mutate(Gas = if_else(row_number() == n(), Gas + 300, Gas))

gas_decomp_end <- gas_outlier_end %>%
  model(decomp = classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

gas_decomp_end %>%
  autoplot(season_adjust) +
  labs(title = "Seasonally Adjusted Gas Production with Outlier Near End",
       y = "Million Cubic Metres (Adjusted)")

Interpretation of the effects of the Outlier

The middle outlier creates a large spike in the data around the middle of the plot whereas the end outlier create a large spike in the end of the plot. These spikes The classical decomposition algorithm tries to fit a smooth trend and season pattern, which distorts the trend-cycle estimate. However in the end outlier it has less impact on the overall trend and seasonal pattern. It seems that decomposition is more robust for outlier in the end then it is for ones in the middle of the series.

Question 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

retail_data <- aus_retail %>%
  filter(State == "New South Wales",
         Industry == "Department stores")

X11 was not working I tried fixing it and it didnt work. Apparently I needed to download something which I couldn’t. So I just tried another STL.

retail_decomp <- retail_data %>%
  model(stl = STL(Turnover))

components(retail_decomp) %>%
  autoplot()

There is a slight upwards trend and clear seasonal pattern. I dont see any clear outliers except maybe in Febuary 2000.

Question 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The decomposition shows that the number of persons in the Australian civilian labor force increased steadily from 1978 to 1995, with the trend component rising from 6500 to 9000. The seasonal component shows strong within-year patterns, with alternating dip and peaks with dips in January. The seasonal fluctuations are relatively small compared to the overall trend. The remainder component captures irregular fluctuations, with noticeable deviations around 1991-92, reflecting economic disruptions. Overall, the main drivers are the long-term growth trend and stable recurring seasonal patterns.

Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession of 1991/1992 is visible in the decomposition. In the trend component, growth temporarily flattens and slighlty dips during that period. In the remainder component, there are large negative deviations around 1991-92, indicating that the actual labor force was lower than expected from the trend and seasonality. These features show the economic downturn.