Data 624 Week 3

Instructions

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code

# Load libraries
library(fpp3)
library(dplyr)
library(gridExtra)
library(seasonal)

Exercise 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Luxembourg has the highest GDP per Capita over time. In normal plot of the data shows that it and Norway have the highest increases in GDP per capital in absolute gains since about 2000 with marked decreases towards the end of the series at 2015.

The log GDP per Capita shows that Louxembourg has the highest GDP per Capita – but the percentage growth relative to its peers is very similar and roughly linear since the 1960’s.

# Calculate GDP per capita
gdp_pc <- global_economy |>
  mutate(GDP_per_Capita = GDP / Population)

# Top 7 countries based on latest GDP per capita
top_countries <- gdp_pc |>
  filter(Year == max(Year)) |>
  arrange(desc(GDP_per_Capita)) |>
  slice(1:7) |>
  pull(Country)

# Plot GDP per capita for top 7 countries
p1 <- gdp_pc |>
  filter(Country %in% top_countries) |>
  group_by(Country) |>
  autoplot(GDP_per_Capita) +
  labs(
    title = "GDP per Capita Over Time for Top 7 Countries",
    x = "Year",
    y = "GDP per Capita (USD)"
  ) +
  scale_color_discrete(name = "Country") +
  theme_minimal()


# Calculate GDP per capita with log transformation
gdp_pc <- global_economy |>
  mutate(GDP_per_Capita = GDP / Population,
         log_GDP_per_Capita = log(GDP_per_Capita))

# Top 7 countries based on latest log-transformed GDP per capita
top_countries <- gdp_pc |>
  filter(Year == max(Year)) |>
  arrange(desc(GDP_per_Capita)) |>
  slice(1:7) |>
  pull(Country)

# Plot log-transformed GDP per capita for top 7 countries
p2 <- gdp_pc |>
  filter(Country %in% top_countries) |>
  group_by(Country) |>
  autoplot(log_GDP_per_Capita) +
  labs(
    title = "Log of GDP per Capita Over Time for Top 7 Countries",
    x = "Year",
    y = "Log(GDP per Capita)"
  ) +
  scale_color_discrete(name = "Country") +
  theme_minimal()

# Display plots
grid.arrange(p1, p2, ncol = 1)

Exercise 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

The US GDP per Capita over time plots generally linear growth with very little variance. The normal data plot would not require a log transformation on its own as the variance is steady over the period of time in both absolute and relative percentage increases.

# Calculate United States GDP per Capita
us_gdp_per_capita <- global_economy |>
  filter(Country == "United States") |>
  mutate(GDP_per_Capita = GDP / Population)

# Plot US GDP per Capita 
p1 <- us_gdp_per_capita |>
  autoplot(GDP_per_Capita) +
  labs(
    title = "United States GDP per Capita Over Time",
    x = "Year",
    y = "GDP per Capita (USD)"
  ) +
  theme_minimal()

# Calculate GDP per capita for the United States with log transformation
us_gdp_per_capita <- global_economy |>
  filter(Country == "United States") |>
  mutate(GDP_per_Capita = log(GDP / Population))


# Plot log US GDP per Capita 
p2 <- us_gdp_per_capita |>
  autoplot(GDP_per_Capita) +
  labs(
    title = "Log of United States GDP per Capita Over Time",
    x = "Year",
    y = "Log of GDP per Capita (USD)"
  ) +
  theme_minimal()

# Display plots
grid.arrange(p1, p2, ncol = 1)

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

The normal data plot of the livestock data set shows a downward trend over team (changing mean), fluctuating variance and possibly seasonality. A seasonal monthly difference (12 month) was applied to normalize the mean and variance.

# Victorian slaughter data for "Bulls, bullocks and steers"
vic_slaughter <- aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria")

# Plot Victorian slaughter data
p1 <- vic_slaughter |>
  autoplot(Count) +
  labs(
    title = "Livestock Slaughter Over Time",
    x = "Month",
    y = "Number of Animals Slaughtered"
  ) +
  theme_minimal()


# Seasonal differencing with lag = 12 (monthly data)
vic_slaughter_diff <- vic_slaughter |>
  mutate(Diff_Count = difference(Count, lag = 12))

# Plot the seasonally differenced data
p2 <- vic_slaughter_diff |>
  autoplot(Diff_Count) +
  labs(
    title = "Seasonally Differenced Livestock Slaughter (Lag = 12 months)",
    y = "Differenced Count",
    x = "Year"
  )+
  theme_minimal()

# Display plots
grid.arrange(p1, p2, ncol = 1)

Victorian Electricity Demand from vic_elec.

The normal data plot of the electrical demand data set fluctuating variance and daily seasonality. A seasonal hourly difference (24 hours) was applied to normalize the mean and variance.

# Plot Victorian Electricity Demand
p1 <- vic_elec |>
  autoplot(Demand) +
  labs(
    title = "Victorian Electricity Demand Over Time",
    x = "Time",
    y = "Electricity Demand (MW)"
  ) +
  theme_minimal()

# seasonal differencing with lag = 24 (hourly data)
vic_elec_diff <- vic_elec |>
  mutate(Diff_Demand = difference(Demand, lag = 24))

# Plot the seasonally differenced data
p2 <- vic_elec_diff |>
  autoplot(Diff_Demand) +
  labs(
    title = "Seasonally Differenced Victorian Electricity Demand (Lag = 24 hours)",
    y = "Differenced Demand",
    x = "Time"
  )+
  theme_minimal()

# Display plots
grid.arrange(p1, p2, ncol = 1)

Gas production from aus_production.

The normal data plot for gas production shows an upward trend with fluctuating variance. Variance at the start of the time series is small but increases steadily over time. The Box Cox stabilizes the variance so the variance is consistent throughout.

# Plot Gas Production
p1 <- aus_production |>
  autoplot(Gas) +
  labs(
    title = "Gas Production Over Time",
    x = "Quarter",
    y = "Gas Production"
  ) +
  theme_minimal() # Optional: To keep the minimal theme

# Optimal lambda for the Box-Cox transformation
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

# Box-Cox transformation using the optimal lambda
gas_production_trans <- aus_production |>
  mutate(BoxCox_Gas = box_cox(Gas, lambda))

# Plot the transformed Gas Production data
p2 <- gas_production_trans |>
  autoplot(BoxCox_Gas) +
  labs(
    title = paste("Box-Cox Transformation Gas Production (Lambda =", round(lambda, 3), ")"),
    x = "Quarter",
    y = "(Box-Cox)Gas Production"
  ) +
  theme_minimal()

# Display plots
grid.arrange(p1, p2, ncol = 1)

Exercise 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

The Box Cox transformation is intended to stabilize the variance given the optimal lambda. In normal data plot shows fluctuating variance; this fluctuating variance remains aftet Box Cox transformation - perhaps due to strong seasonality or other non normality of the data.

# Plot gas production data
p1 <- autoplot(canadian_gas, Volume) +
  labs(title = "Canadian Gas Production",
       y = "Volume",
       x = "Year")+
  theme_minimal()

# Optimal Box-Cox lamdba
lambda <- canadian_gas |> 
  features(Volume, features = guerrero) |> 
  pull(lambda_guerrero)

# Box-Cox transformation to the data
canadian_gas_transformed <- canadian_gas |> 
  mutate(Transformed_Volume = box_cox(Volume, lambda))

# Plot the Box-Cox transformed data
p2 <- autoplot(canadian_gas_transformed, Transformed_Volume) +
  labs(title = paste("Box-Cox Transformed Canadian Gas Production (Lambda =", round(lambda, 3), ")"),
       y = "(Box Cox) Volumne",
       x = "Year")+
  theme_minimal()

# Display plots
grid.arrange(p1, p2, ncol = 1)

Exercise 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

The time series shows fluctuating variance - low at the start and steadily increasing over time. A Box Cox with Lambda .445 was able to stabilize the variance.

set.seed(123456789)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

p1 <- autoplot(myseries, Turnover) +
  labs(title = "Time Series of Turnover",
       y = "Turnover",
       x = "Time")+
  theme_minimal()

# Optimal lambda using Guerrero method
lambda_guerrero <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

# Box-Cox transformation
myseries_transformed <- myseries |>
  mutate(Turnover_BoxCox = box_cox(Turnover, lambda_guerrero))


# Plot the Box-Cox transformed data
p2 <- autoplot(myseries_transformed, Turnover_BoxCox) +
  labs(title = paste("Box-Cox Transformed Time Series of Turnover (Lambda =", round(lambda_guerrero, 3), ")"),
       y = "Box Cox Turnover",
       x = "Time")+
  theme_minimal()

# Display plots
grid.arrange(p1, p2, ncol = 1)

Exercise 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

# 1. Tobacco from aus_production 
aus_prod <- aus_production |>
  as_tsibble(index = Quarter)

# Tobacco plot
p1<- aus_prod |>
  autoplot(Tobacco) +
  ggtitle("Tobacco Production Series") +
  xlab("Year") + ylab("Tobacco Production")+
  theme_minimal()

# Box-Cox transformation using Guerrero method
lambda_tobacco <- aus_prod |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

# Box-Cox transformation
aus_prod <- aus_prod |>
  mutate(Tobacco_BoxCox = box_cox(Tobacco, lambda_tobacco))

# Plot with Box Cox Transformation
p2 <- aus_prod |>
  autoplot(Tobacco_BoxCox) +
  ggtitle(paste("Tobacco Production with Box-Cox Transformation (Lambda =", round(lambda_tobacco, 3), ")")) +
  xlab("Year") + ylab("(Box Cox) Tobacco Production")+
  theme_minimal()

# Display plots
grid.arrange(p1, p2, ncol = 1)

# 2. Economy class passengers for Melbourne-Sydney
ansett_data <- ansett |>
  filter(Airports == "MEL-SYD", Class == "Economy") |>
  as_tsibble(index = Week)

# Passenger plot
p3 <- ansett_data |>
  autoplot(Passengers) +
  ggtitle("Economy Class Passengers (Melbourne-Sydney)") +
  xlab("Year") + ylab("Passengers")+
  theme_minimal()


# Box-Cox transformation using Guerrero method
lambda_passengers <- ansett_data |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

# Box-Cox transformation
ansett_data <- ansett_data |>
  mutate(Passengers_BoxCox = box_cox(Passengers, lambda_passengers))

# Plot Box Cox transformation
p4 <- ansett_data |>
  autoplot(Passengers_BoxCox) +
  ggtitle(paste("Economy Class Passengers with Box-Cox Transformation (Lambda =", round(lambda_passengers, 3), ")")) +
  xlab("Year") + ylab("(Box Cox) Passengers")+
  theme_minimal()

# Display plots
grid.arrange(p3, p4, ncol = 1)

# 3. Pedestrian counts from Southern Cross Station
pedestrian_data <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  as_tsibble(index = Date_Time)

# Pedestrian plot
p5 <- pedestrian_data |>
  autoplot(Count) +
  ggtitle("Pedestrian Counts at Southern Cross Station") +
  xlab("Year") + ylab("Pedestrian Counts")+
  theme_minimal()

# Box-Cox transformation using Guerrero method
lambda_pedestrian <- pedestrian_data |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

# Box-Cox transformation
pedestrian_data <- pedestrian_data |>
  mutate(Count_BoxCox = box_cox(Count, lambda_pedestrian))

# Plot Box Cox transformation
p6 <- pedestrian_data |>
  autoplot(Count_BoxCox) +
  ggtitle(paste("Pedestrian Counts with Box-Cox Transformation (Lambda =", round(lambda_pedestrian, 3), ")")) +
  xlab("Year") + ylab("(Box Cox) Pedestrian Counts")+
  theme_minimal()

# Display plots
grid.arrange(p5, p6, ncol = 1)

Exercise 3.7

Consider the last five years of the Gas data from aus_production,

**gas <- tail(aus_production, 5*4) |> select(Gas)**

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

In the normal data plot for gas production, there is upward trend over time and clear indication of regular/consistent seasonality.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.Do the results support the graphical interpretation from part a?

Yes. The decomposition components show the upward trend and the regular seasonal pattern. The residuals appear to be random and relatively very small.

Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Generally, I would expect the outlier occurring near the beginning to affect the trend locally but the decomposition will smooth it out over time with minimal impact to overall shape except for the start of the plot. The effect would be the same at the other end where the shape may shift to reflect the outlier but otherwise has no impact on the shape. Outliers in the middle may be more noticeable where they occur but it will also be smoothed out by the decomposition. The outliers would appear in the residuals but the decomposition preserves the trend and seasonality with outliers present.

# Gas Production Data for Last 5 Years
gas <- tail(aus_production, 5*4) |> dplyr::select(Gas)

# Plot last 5 years of Gas production
p1 <- autoplot(gas) + 
  ggtitle("Gas Production - Last 5 Years") +
  ylab("Gas Production")

# Classical decomposition with multiplicative type
decomp <- gas |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

# Extract components
components <- components(decomp)

# Plot the decomposition components
p6 <- autoplot(components) +
  ggtitle("Classical Multiplicative Decomposition")

# Compute the seasonally adjusted data
seasonally_adjusted <- components |>
  mutate(Seasonally_Adjusted = Gas / season_adjust)

# Plot the seasonally adjusted data
p2 <- autoplot(seasonally_adjusted, Seasonally_Adjusted) +
  ggtitle("Seasonally Adjusted Gas Production") +
  ylab("Seasonally Adjusted")

# Outlier near the beginning (add 300 to observation)
gas_outlier_start <- gas
gas_outlier_start[1, "Gas"] <- gas_outlier_start[1, "Gas"] + 300

# Apply decomposition to outlier data
decomp_outlier_start <- gas_outlier_start |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

# Extract components
components_outlier_start <- components(decomp_outlier_start)

# Compute seasonally adjusted data with the outlier
seasonally_adjusted_outlier_start <- components_outlier_start |>
  mutate(Seasonally_Adjusted = Gas / season_adjust)

# Plot the seasonally adjusted data with the outlier
p3 <- autoplot(seasonally_adjusted_outlier_start, Seasonally_Adjusted) +
  ggtitle("Seasonally Adjusted Gas Production with Beginning Outlier") +
  ylab("Seasonally Adjusted")

# Outlier near the middle (add 300 to observation)
gas_outlier_middle <- gas
gas_outlier_middle[10, "Gas"] <- gas_outlier_middle[10, "Gas"] + 300

# Apply decomposition
decomp_outlier_middle <- gas_outlier_middle |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

#Extract components
components_outlier_middle <- components(decomp_outlier_middle)

# Compute seasonally adjusted data with the outlier
seasonally_adjusted_outlier_middle <- components_outlier_middle |>
  mutate(Seasonally_Adjusted = Gas / season_adjust)

# Plot the seasonally adjusted data with the outlier
p4 <- autoplot(seasonally_adjusted_outlier_middle, Seasonally_Adjusted) +
  ggtitle("Seasonally Adjusted Gas Production with Middle Outlier") +
  ylab("Seasonally Adjusted")

# Outlier near the end (add 300 to observation)
gas_outlier_end <- gas
gas_outlier_end[nrow(gas_outlier_end), "Gas"] <- gas_outlier_end[nrow(gas_outlier_end), "Gas"] + 300

# Apply decomposition
decomp_outlier_end <- gas_outlier_end |> 
  model(classical_decomposition(Gas, type = "multiplicative"))

# Extract components
components_outlier_end <- components(decomp_outlier_end)

# Compute seasonally adjusted data with the outlier
seasonally_adjusted_outlier_end <- components_outlier_end |>
  mutate(Seasonally_Adjusted = Gas / season_adjust)

# Plot the seasonally adjusted data with the outlier
p5 <- autoplot(seasonally_adjusted_outlier_end, Seasonally_Adjusted) +
  ggtitle("Seasonally Adjusted Gas Production with End Outlier") +
  ylab("Seasonally Adjusted")

# Display plots
grid.arrange(p1, p2, ncol = 1)

print(p6)

grid.arrange(p3, p4, p5, ncol = 1)

Exercise 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

The trend closely follows tha shape of the data include a large dip around 2012. The seasonality is strong and consistent but the magnitude decrease over time - possibly indicating external factors affecting the seasonality - like changes to consumer habits, impact of technology to retail data, etc. The residuals indicate a possible outlier in the late 1980’s but otherwise is stable and smaller fluctations – possibly indicatre stable retail markets.

# Decomposition using the X-11 method
decomp <- myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()

# Plots the components
autoplot(decomp) +
  labs(title = "X-11 Decomposition of My Time Series",
       y = "Turnover")

Exercise 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Is the recession of 1991/1992 visible in the estimated components?

The trend component of the Australian civilian labor force decomposition shows a steady increase over time, indicating long-term growth in the labor force, The seasonal component fluctuates between plus/minus 100, showing a regular, cyclical pattern with some months experiencing stronger seasonal effects than others. The remainder component remains relatively small but shows a significant dip around 1991, suggesting an economic disruption.

The monthly subseries plot shows notable seasonal variations for different months. January, August, October & November consistently show lower levels in the labor force sitting below the baseline (zero). This could indicate seasonal dips in workforce participation during these months. March exhibits more complex behavior with both significant spikes and drops possibly indicating variability due to external labor conditions (outside of trend and seasonality). December consistently shows higher labor force participation with a steady upward trend. This month does not show the volatility seen in March, indicating a more predictable seasonal increase during the end of the year. This might reflect the typical end-of-year economic activities like retail increase triggered by the holiday season.

The recession of 1991/1992 is visible in the remainder component. The sharp drop in the residuals indicates that external factors, such as the recession,impacted the labor force beyond what cam be explained by the trend and seasonal patterns. The trend also slows down slightly during this time (getting flatter), reflecting the affects of the recession.