Homework 2

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code

3.1)

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.4.2

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## Warning: package 'dplyr' was built under R version 4.4.2

## Warning: package 'tidyr' was built under R version 4.4.2

## Warning: package 'ggplot2' was built under R version 4.4.2

## Warning: package 'tsibble' was built under R version 4.4.2

## Warning: package 'tsibbledata' was built under R version 4.4.2

## Warning: package 'feasts' was built under R version 4.4.2

## Warning: package 'fabletools' was built under R version 4.4.2

## Warning: package 'fable' was built under R version 4.4.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

# Check the structure of the dataset
help("global_economy")

## starting httpd help server ...

##  done

head(global_economy)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

# Calculate GDP per capita and find the country with the highest GDP per capita
labels <- global_economy |>
  slice_max(order_by = GDP / Population, n = 1) |>
  with(annotate('label', label = Country, x = Year, y = 1.05 * (GDP / Population)))

# Create the plot of GDP per capita over time with annotation
gdp_per_capita_plot <- global_economy |>
  ggplot(aes(x = Year, y = GDP / Population, color = Country)) +
  geom_line(show.legend = FALSE, na.rm = TRUE) +
  labs(title = "GDP Per Capita Over Time",
       y = "$US") +
  labels

print(gdp_per_capita_plot)

3.2)

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.

# Load necessary libraries
library(fpp3)

# United States GDP from global_economy
us_gdp <- global_economy %>%
  filter(Country == "United States") %>%
  select(Year, GDP)

us_gdp_plot <- us_gdp %>%
  ggplot(aes(x = Year, y = GDP)) +
  geom_line() +
  theme_minimal() +
  labs(title = "United States GDP Over Time",
       x = "Year",
       y = "GDP (in USD)")

print(us_gdp_plot)

# Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock
victorian_slaughter <- aus_livestock %>%
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
  select(Month, Count)

victorian_slaughter_plot <- victorian_slaughter %>%
  ggplot(aes(x = Month, y = Count)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers",
       x = "Month",
       y = "Count")

print(victorian_slaughter_plot)

# Victorian Electricity Demand from vic_elec
vic_electricity <- vic_elec %>%
  select(Date, Demand)

vic_electricity_plot <- vic_electricity %>%
  ggplot(aes(x = Date, y = Demand)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Victorian Electricity Demand",
       x = "Date",
       y = "Demand (in MW)")

print(vic_electricity_plot)

# Gas production from aus_production
gas_production <- aus_production %>%
  select(Quarter, Gas)

gas_production_plot <- gas_production %>%
  ggplot(aes(x = Quarter, y = Gas)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Gas Production in Australia",
       x = "Quarter",
       y = "Gas Production")

print(gas_production_plot)

# Describe transformations and their effects if applied

# Transformation for United States GDP (Log Transformation)
us_gdp_log <- us_gdp %>%
  mutate(Log_GDP = log(GDP))

us_gdp_log_plot <- us_gdp_log %>%
  ggplot(aes(x = Year, y = Log_GDP)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Log of United States GDP Over Time",
       x = "Year",
       y = "Log(GDP)")

print(us_gdp_log_plot)

# Effect: The log transformation helps in stabilizing the variance and making the trend more linear.

# Transformation for Victorian Slaughter (Log Transformation)
victorian_slaughter_log <- victorian_slaughter %>%
  mutate(Log_Count = log(Count))

victorian_slaughter_log_plot <- victorian_slaughter_log %>%
  ggplot(aes(x = Month, y = Log_Count)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Log of Slaughter of Victorian Bulls, Bullocks and Steers",
       x = "Month",
       y = "Log(Count)")

print(victorian_slaughter_log_plot)

# Effect: The log transformation helps in stabilizing the variance and making the trend more linear.

# No transformation needed for Victorian Electricity Demand and Gas Production as they do not exhibit clear non-linear trends that require transformation.

3.3)

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

library(fpp3)

# Load the canadian_gas dataset
head(canadian_gas)

## # A tsibble: 6 x 2 [1M]
##      Month Volume
##      <mth>  <dbl>
## 1 1960 Jan   1.43
## 2 1960 Feb   1.31
## 3 1960 Mar   1.40
## 4 1960 Apr   1.17
## 5 1960 May   1.12
## 6 1960 Jun   1.01

# Plot original Canadian gas production data
canadian_gas_plot <- canadian_gas |>
  autoplot(Volume) +
  labs(title = "Canadian Gas Production Over Time", 
       y = "Gas Produced (Billions of Cubic Metres)")

print(canadian_gas_plot)

# Calculate the optimal Box-Cox transformation parameter using the Guerrero method
lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

# Print the optimal lambda value
print(lambda)

## [1] 0.5767648

# Apply the Box-Cox transformation and plot the transformed data
canadian_gas_boxcox_plot <- canadian_gas |>
  autoplot(box_cox(Volume, lambda)) +
  labs(title = "Canadian Gas Production (Box-Cox Transformed)", 
       y = "Transformed Gas Produced (Billions of Cubic Metres)")

print(canadian_gas_boxcox_plot)

This transformed data plot and determine whether the Box-Cox transformation is helpful for the canadian_gas data. This comparison will show that, due to the strong seasonality and trend components, the Box-Cox transformation might not be very effective in stabilizing the variance or improving the model fit.

3.4)

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Exercisde 7 in section 2.10: Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

#set.seed(12345678) #myseries <- aus_retail |> # filter(Series ID == sample(aus_retail$Series ID,1)) Explore your chosen retail time series using the following functions:

#autoplot(), gg_season(), gg_subseries(), gg_lag(),

#ACF() |> autoplot()

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

# Load the necessary libraries
library(fpp3)

# Set the seed and select one time series from the aus_retail dataset
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Explore the chosen retail time series using autoplot(), gg_season(), gg_subseries(), gg_lag(), and ACF()

# Autoplot of the series
autoplot(myseries, Turnover) +
  labs(title = "Monthly Australian Retail Turnover",
       y = "Turnover (in AUD)")

# Seasonal plot
gg_season(myseries, Turnover) +
  labs(title = "Seasonal Plot: Monthly Australian Retail Turnover",
       y = "Turnover (in AUD)")

# Subseries plot
gg_subseries(myseries, Turnover) +
  labs(title = "Subseries Plot: Monthly Australian Retail Turnover",
       y = "Turnover (in AUD)")

# Lag plot
gg_lag(myseries, Turnover) +
  labs(title = "Lag Plot: Monthly Australian Retail Turnover",
       y = "Turnover (in AUD)")

# ACF plot
ACF(myseries, Turnover) |>
  autoplot() +
  labs(title = "Autocorrelation Function: Monthly Australian Retail Turnover")

# Calculate the optimal Box-Cox transformation parameter using the Guerrero method
lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

# Print the optimal lambda value
print(lambda)

## [1] 0.08303631

# Apply the Box-Cox transformation and plot the transformed data
myseries_transformed <- myseries |>
  mutate(Turnover_BoxCox = box_cox(Turnover, lambda))

autoplot(myseries_transformed, Turnover_BoxCox) +
  labs(title = "Box-Cox Transformed Monthly Australian Retail Turnover",
       y = "Transformed Turnover (in AUD)")

3.5)

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

# Load necessary libraries
library(fpp3)

# Tobacco from aus_production
df1 <- aus_production
f1 <- "Tobacco"

# Plot original data for Tobacco
df1 |>
  autoplot(Tobacco, na.rm = TRUE) +
  labs(title = sprintf("%s Over Time", f1), y = sprintf("%s", f1))

# Calculate the optimal Box-Cox transformation parameter for Tobacco
lambda_tobacco <- df1 |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

# Apply the Box-Cox transformation for Tobacco and plot transformed data
df1 |>
  autoplot(box_cox(Tobacco, lambda_tobacco), na.rm = TRUE) +
  labs(title = sprintf("Transformed %s Over Time", f1), y = sprintf("%s", f1))

# Print the lambda value for Tobacco
print(sprintf("The auto tuned Box-Cox transform with a lambda of %s is what we would use to transform this data", lambda_tobacco))

## [1] "The auto tuned Box-Cox transform with a lambda of 0.926463585274373 is what we would use to transform this data"

# Economy class passengers between Melbourne and Sydney from ansett
df1 <- ansett
f1 <- "Passengers"

df1 |>
  filter(Airports == "MEL-SYD", Class == "Economy") -> df1

# Plot original data for Economy Passengers
df1 |>
  autoplot(Passengers) +
  labs(title = sprintf("Economy %s Over Time", f1), y = sprintf("%s", f1))

# Calculate the optimal Box-Cox transformation parameter for Economy Passengers
lambda_economy_passengers <- df1 |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

# Apply the Box-Cox transformation for Economy Passengers and plot transformed data
df1 |>
  autoplot(box_cox(Passengers, lambda_economy_passengers)) +
  labs(title = sprintf("Transformed Economy %s Over Time", f1), y = sprintf("%s", f1))

# Print the lambda value for Economy Passengers
print(sprintf("The auto tuned Box-Cox transform with a lambda of %s is what we would use to transform this data", lambda_economy_passengers))

## [1] "The auto tuned Box-Cox transform with a lambda of 1.9999267732242 is what we would use to transform this data"

# Pedestrian counts at Southern Cross Station from pedestrian
df1 <- pedestrian |>
  filter(Sensor == "Southern Cross Station")
f1 <- "Count"

# Plot original data for Pedestrian Counts
df1 |>
  autoplot(Count) +
  labs(title = sprintf("%s Over Time", f1), y = sprintf("%s", f1))

# Calculate the optimal Box-Cox transformation parameter for Pedestrian Counts
lambda_pedestrian_counts <- df1 |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

# Apply the Box-Cox transformation for Pedestrian Counts and plot transformed data
df1 |>
  autoplot(box_cox(Count, lambda_pedestrian_counts)) +
  labs(title = sprintf("Transformed %s Over Time", f1), y = sprintf("%s", f1))

# Print the lambda value for Pedestrian Counts
print(sprintf("The auto tuned Box-Cox transform with a lambda of %s is what we would use to transform this data", lambda_pedestrian_counts))

## [1] "The auto tuned Box-Cox transform with a lambda of -0.2501615623911 is what we would use to transform this data"

3.7)

Consider the last five years of the Gas data from aus_production.

#gas <- tail(aus_production, 5*4) |> select(Gas) Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle? Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?

#1 Load necessary libraries
library(fpp3)

# Extract the last five years of the Gas data
gas <- aus_production %>%
  tail(5 * 4) %>%
  select(Gas)

# Plot the time series
autoplot(gas, Gas) +
  labs(title = "Gas Production Over the Last Five Years",
       y = "Gas Produced (millions of cubic metres)")

#2 Identify seasonal fluctuations and/or trend-cycle using STL decomposition
# Use STL decomposition to calculate the trend-cycle and seasonal components
gas_stl <- gas %>%
  model(STL(Gas ~ season(window = "periodic")))

# Extract the components
gas_stl_components <- components(gas_stl)

# Plot the decomposed components
autoplot(gas_stl_components) +
  labs(title = "STL Decomposition of Gas Production",
       y = "Gas Produced (millions of cubic metres)")

#3 Compute and plot the seasonally adjusted data
# Compute seasonally adjusted data
gas_seasonally_adjusted <- gas_stl_components %>%
  select(Quarter, season_adjust)

# Plot the seasonally adjusted data
autoplot(gas_seasonally_adjusted, season_adjust) +
  labs(title = "Seasonally Adjusted Gas Production",
       y = "Gas Produced (millions of cubic metres)")

#4 Introduce an outlier, recompute, and analyze the effect
# Introduce an outlier by adding 300 to one observation (e.g., the first observation)
gas_outlier <- gas
gas_outlier$Gas[1] <- gas_outlier$Gas[1] + 300

# Use STL decomposition for the gas data with the outlier
gas_outlier_stl <- gas_outlier %>%
  model(STL(Gas ~ season(window = "periodic")))

# Extract the components with the outlier
gas_outlier_stl_components <- components(gas_outlier_stl)

# Compute seasonally adjusted data with the outlier
gas_outlier_seasonally_adjusted <- gas_outlier_stl_components %>%
  select(Quarter, season_adjust)

# Plot the seasonally adjusted data with the outlier
autoplot(gas_outlier_seasonally_adjusted, season_adjust) +
  labs(title = "Seasonally Adjusted Gas Production with Outlier",
       y = "Gas Produced (millions of cubic metres)")

#5 Analyze the effect of the outlier near the end of the time series
# Introduce an outlier near the end of the time series
gas_outlier_end <- gas
gas_outlier_end$Gas[nrow(gas_outlier_end)] <- gas_outlier_end$Gas[nrow(gas_outlier_end)] + 300

# Use STL decomposition for the gas data with the outlier at the end
gas_outlier_end_stl <- gas_outlier_end %>%
  model(STL(Gas ~ season(window = "periodic")))

# Extract the components with the outlier at the end
gas_outlier_end_stl_components <- components(gas_outlier_end_stl)

# Compute seasonally adjusted data with the outlier at the end
gas_outlier_end_seasonally_adjusted <- gas_outlier_end_stl_components %>%
  select(Quarter, season_adjust)

# Plot the seasonally adjusted data with the outlier at the end
autoplot(gas_outlier_end_seasonally_adjusted, season_adjust) +
  labs(title = "Seasonally Adjusted Gas Production with Outlier at End",
       y = "Gas Produced (millions of cubic metres)")

Interpretation: The plot clearly illustrates the effect of the outlier on the seasonally adjusted gas production data. The outlier at the end causes a sharp increase, which could be due to an anomaly or a significant positive increase in production. Analyzing the outlier’s position helps understand its impact on the overall trend and seasonally adjusted series.

3.8)

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

# Install the 'seasonal' package if not already installed
if (!requireNamespace("seasonal", quietly = TRUE)) {
  install.packages("seasonal")
}

#1 Load necessary libraries
library(fpp3)
library(seasonal)

## Warning: package 'seasonal' was built under R version 4.4.2

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

# Set the seed and select one time series from the aus_retail dataset
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Plot the time series
autoplot(myseries, Turnover) +
  labs(title = "Monthly Australian Retail Turnover",
       y = "Turnover (in AUD)")

#2 Decompose the series using X-11
myseries_decomposed <- myseries |>
  model(X11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()

# Plot the decomposed components
autoplot(myseries_decomposed) +
  labs(title = "X-11 Decomposition of Monthly Australian Retail Turnover",
       y = "Turnover (in AUD)")

#3 Analyze the decomposed components for outliers and unusual features
# Print the decomposed components to identify outliers or unusual features
print(myseries_decomposed)

## # A dable: 369 x 9 [1M]
## # Key:     State, Industry, .model [1]
## # :        Turnover = trend * seasonal * irregular
##    State              Industry .model    Month Turnover trend seasonal irregular
##    <chr>              <chr>    <chr>     <mth>    <dbl> <dbl>    <dbl>     <dbl>
##  1 Northern Territory Clothin… X11    1988 Apr      2.3  2.64    0.852     1.02 
##  2 Northern Territory Clothin… X11    1988 May      2.9  2.63    0.958     1.15 
##  3 Northern Territory Clothin… X11    1988 Jun      2.6  2.64    1.01      0.974
##  4 Northern Territory Clothin… X11    1988 Jul      2.8  2.67    1.20      0.874
##  5 Northern Territory Clothin… X11    1988 Aug      2.9  2.72    1.09      0.982
##  6 Northern Territory Clothin… X11    1988 Sep      3    2.80    1.09      0.986
##  7 Northern Territory Clothin… X11    1988 Oct      3.1  2.92    1.07      0.995
##  8 Northern Territory Clothin… X11    1988 Nov      3    3.04    0.972     1.02 
##  9 Northern Territory Clothin… X11    1988 Dec      4.2  3.13    1.36      0.991
## 10 Northern Territory Clothin… X11    1989 Jan      2.7  3.16    0.840     1.02 
## # ℹ 359 more rows
## # ℹ 1 more variable: season_adjust <dbl>

# Seasonally adjusted data
autoplot(myseries_decomposed, season_adjust) +
  labs(title = "Seasonally Adjusted Monthly Australian Retail Turnover",
       y = "Turnover (in AUD)")

Interpretation; Analysis of Australian retail turnover from 1985 to 2020 using X-11 decomposition reveals a strong, long-term growth trend, increasing from approximately 4 AUD to around 10 AUD. After removing seasonal fluctuations, the trend component clearly illustrates this upward trajectory. While the data initially showed more volatility, the irregular component, representing random variations, stabilized over time, indicating a more consistent retail market in recent years. In essence, the analysis confirms significant growth in the retail sector and a decrease in market volatility.

3.9)

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. Is the recession of 1991/1992 visible in the estimated components?

#Interpretation: Analysis of the Australian civilian labor force (1978-1995) using decomposition reveals a clear upward trend, with the labor force growing from approximately 6.5 to 8 million. This growth is accompanied by consistent monthly seasonal patterns. The decomposition also isolates an irregular component, representing random fluctuations and deviations not explained by the trend or seasonal variations.

The 1991/1992 recession is clearly visible in the decomposed data. The recession’s impact is seen as a noticeable dip in the long-term growth trend and significant fluctuations in the irregular component, reflecting the economic shock to the labor market beyond normal seasonal variations. This analysis effectively separates long-term growth, seasonal changes, and the specific impact of the recession.

Assignment2 D624

Jose Fuentes

2025-02-15