Data-624 Homework 2

#Loading required R package and datasets:
library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.2

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.4
## ✔ dplyr       1.1.3     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.1
## ✔ lubridate   1.9.3     ✔ fable       0.3.3
## ✔ ggplot2     3.4.4     ✔ fabletools  0.3.4

## Warning: package 'tsibble' was built under R version 4.3.2

## Warning: package 'tsibbledata' was built under R version 4.3.2

## Warning: package 'feasts' was built under R version 4.3.2

## Warning: package 'fabletools' was built under R version 4.3.2

## Warning: package 'fable' was built under R version 4.3.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(ggplot2)
library(seasonal)

## Warning: package 'seasonal' was built under R version 4.3.2

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

data("global_economy")

Question 3.1:

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Answer:

Plotting the GDP per capita:

global_economy %>%
  autoplot(GDP / Population, show.legend =  FALSE) +
  labs(title= "GDP per capita", y = "$US")

## Warning: Removed 3242 rows containing missing values (`geom_line()`).

Calculating highest GDP per capita country-

global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
  select(Country, GDP_per_capita)

## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country GDP_per_capita  Year
##   <fct>            <dbl> <dbl>
## 1 Monaco         185153.  2014

We can see that Monaco has the highest GDP per capita, with the value of $185153 in the year 2014.

Finding the changes over the time-

global_economy %>%
  filter(Country == "Monaco") %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per capita for Monaco", y = "$US")

## Warning: Removed 11 rows containing missing values (`geom_line()`).

The chart shows detailed trend over the time of Monaco’s GDP per capita. It starts with a low value in the 1960s and shows a significant increase over the decades, with some fluctuations. There is remarkable dip and recovery around the late 1990s and early 2000s, and the chart ends with an upward trend.

Question 3.2:

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

Answer:

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP / 10 ^ 12) +
  labs(title= "GDP, United States", y = "$US (in trillions)")

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria") %>%
  autoplot(Count) +
  labs(title= "Slaughter of Victoria Bulls, Bullocks, and Steers")

The chart provided appears to depict the count of bulls, bullocks, and steers slaughtered over time in Victoria. There are cyclical patterns that appear to repeat approximately every 10 years, where the count rises and falls in a somewhat regular pattern. The highest peaks seem to occur in the late 1970s and early 1980s, with counts reaching up to approximately 125,000, and in the most recent years on the graph, there is a decline that appears to be steeper than previous periods.

v <- vic_elec %>%
  group_by(Date) %>%
  mutate(Demand = sum(Demand)) %>%
  distinct(Date, Demand)

v %>% 
  as_tsibble(index = Date) %>%
  autoplot(Demand) +
  labs(title= "Daily Victorian Electricity Demand", y = "$US (in trillions)")

v %>%
  mutate(Date = yearmonth(Date)) %>%
  group_by(Date) %>%
  summarise(Demand = sum(Demand)) %>%
  as_tsibble(index = Date) %>%
  autoplot(Demand) +
  labs(title= "Monthly Victorian Electricity Demand", y = "$US (in trillions)")

Electricity demand is highly variable on a day-to-day basis but follows a more predictable seasonal pattern on a monthly basis. There’s no evident long-term trend towards increasing or decreasing demand over the years provided, suggesting that any growth in demand due to population or economic growth might be balanced by improvements in energy efficiency or other factors.

aus_production %>%
  autoplot(Gas) +
  labs(title = "Non-Transformed Gas Production")

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "", title = paste0("Transformed Gas Production with lambda = ",
         round(lambda,2)))

When data is transformed using the Box-Cox power transformation, the variance is reduced and the data becomes more appropriate for regression modeling. It is clear that while the transformation appears to make the variance more uniform over time and the cyclical pattern more regular, it has no effect on the overall upward trend in production.

Question 3.3:

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Answer:

canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Non-Transformed Gas Production")

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "", title = paste0("Transformed Gas Production with lambda = ",
         round(lambda,2)))

The seasonal variance is not consistent by the Box-Cox transformation, which renders it ineffective. This could be because Australian gas output only had an increase in variation, but there was an increase in variation around 1978 and a drop around 1989. The seasonal peaks and troughs are still visible and have not been smoothed out, suggesting that modeling the altered data will not be any easier.

Question 3.4:

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Answer:

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

autoplot(myseries, Turnover)+
  labs(title = "Retail Turnover", y = "$AUD (in millions)")

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "", title = paste0("Transformed Retail Turnover with lambda = ",
         round(lambda,2)))

Box-Cox transformation with lambda = 0.27 appears to be somewhat effective, as it has made the variance of the data look more constant over time.

Question 3.5:

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Answer:

Box-Cox transformation for Tobacco from aus_production:

autoplot(aus_production, Tobacco)+
  labs(title = "Tobacco and Cigarette Production in Tonnes")

## Warning: Removed 24 rows containing missing values (`geom_line()`).

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "", title = paste0("Transformed Tobacco Production with lambda = ",
         round(lambda,2)))

## Warning: Removed 24 rows containing missing values (`geom_line()`).

Since lambda is close to 1 here, the transformed data is mostly just shifted downwards with little change in the shape of the time series. The Box-Cox transformation is not effective on the tobacco production data.

Box-Cox transformation for Economy class passengers between Melbourne and Sydney from ansett:

mel_syd <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")

autoplot(mel_syd, Passengers)+
  labs(title = "Economy class Passengers Between Melbourne and Sydney")

lambda <- mel_syd %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

mel_syd %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "", title = paste0("Transformed Number of Passengers with lambda = ",
         round(lambda,2)))

With a lambda of 2, it is essentially a transformation of Y2 or Passengers2. It shows the variation a little more clear.

Box-Cox transformation for Pedestrian counts at Southern Cross Station from pedestrian:

southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") 

southern_cross <- southern_cross %>%
  mutate(Week = yearweek(Date)) %>%
  index_by(Week) %>%
  summarise(Count = sum(Count))

autoplot(southern_cross, Count)+
  labs(title = "Weekly Pedestrian Counts at Southern Cross Station")

lambda <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "", title = paste0("Transformed Weekly Pedestrian Counts with lambda = ",
         round(lambda,2)))

Question 3.7:

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Answer:

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas <- tail(aus_production, 5*4) |> select(Gas)

gas %>% autoplot(Gas)

There is a clear seasonal pattern, with production peaking and troughing regularly every year. From 2006 to the peak around 2008, there is an upward trend, indicating that the level of gas production is increasing over time. After 2008, there appears to be a downward trend, indicating a decrease in production levels.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

class_decomp <- gas %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components()
class_decomp %>% autoplot() +
  labs(title = "Classical multiplicative decomposition of Australia
                  Gas Production")

## Warning: Removed 2 rows containing missing values (`geom_line()`).

Do the results support the graphical interpretation from part a?

Yes, as there is upward trend and the seasonality in the decomposition components, the results support the graphical interpretation from part a.

Compute and plot the seasonally adjusted data.

as_tsibble(class_decomp) %>%
  autoplot(season_adjust) +
  labs(title = "Seasonally Adjusted Data")

It appears that there is much less volatility in the seasonally adjusted data, as we would anticipate.

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas %>%
  mutate(Gas = if_else(Quarter==yearquarter("2007Q2"), Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(season_adjust) +
  labs(title = 'Seasonally Adjusted Data with 300 added to "2007 Q2"')

Seasonally adjusted data is altered in a way that essentially restores seasonality in the opposite direction as the outlier when an observation with 300 added to it becomes an outlier. Since this anomaly was included in the seasonally adjusted data, I think the data is sort of balancing itself.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas %>%
  mutate(Gas = if_else(Quarter==yearquarter("2010Q1"), Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(season_adjust) +
  labs(title = 'Seasonally Adjusted Data with 300 added to "2010 Q1"')

Seasonally adjusted data show no seasonality compared to the prior plot when the outlier is included at the conclusion of the data. This can be because the outlier is located in a region of the data where it is impossible to estimate the pattern.

Question 3.8:

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Answer:

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Australian retail data using X-11.")

Previously seasonality was significantly greater or more variable from 1982 to around 1990. Additionally, I’ve noticed a significant increase around the middle of the year 2000, which may be an anomaly in retail data as higher turnover is typically seen around the end of the year.

Question 3.9:

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
Is the recession of 1991/1992 visible in the estimated components?

Answer:

A time series is broken down into three parts using the STL decomposition charts: trend, seasonal, and remainder. The original data is shown in the ‘value’ graph, which shows an overall increasing trend over time with some seasonal swings. The time series has increased with time, as seen by the ‘trend’ graph, suggesting that there has been a positive underlying trend throughout the duration. The seasonal pattern, which exhibits regular oscillations within each year and appears stable between years, is captured by the’season_year’ graph. Ultimately, the residuals are displayed on the’remainder’ graph following the removal of the trend and seasonal components.

The ‘trend’ part of the decomposition may show signs of the 1991–1992 recession. A turning point or tiny drop in the trend line during this time suggests this. The’remainder’ component, which exhibits more variations or departures from zero around the early 1990s, may also be indicative of the recession.

Data-624 Homework 2

Mohammed Rahman

2024-02-10

Question 3.1:

Answer:

Question 3.2:

Answer:

Question 3.3:

Answer:

Question 3.4:

Answer:

Question 3.5:

Answer:

Question 3.7:

Answer:

Question 3.8:

Answer:

Question 3.9:

Answer: