Chapter 7

Exercise 1

A

jan14_vic_elec <- vic_elec |>
  filter(yearmonth(Time) == yearmonth("2014 Jan")) |>
  index_by(Date = as_date(Time)) |>
  summarise(
    Demand = sum(Demand),
    Temperature = max(Temperature)
  )


jan14_vic_elec |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = Demand`

model1 <- lm(Demand ~ Temperature, data = jan14_vic_elec)
summary(model1)

## 
## Call:
## lm(formula = Demand ~ Temperature, data = jan14_vic_elec)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -49978 -10219   -121  18533  35441 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  59083.9    17424.8   3.391  0.00203 ** 
## Temperature   6154.3      601.3  10.235 3.89e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24540 on 29 degrees of freedom
## Multiple R-squared:  0.7832, Adjusted R-squared:  0.7757 
## F-statistic: 104.7 on 1 and 29 DF,  p-value: 3.89e-11

The coefficient on temperature is positive and statistically significant. The coefficient being positive in the days of a month like January makes sense. In a summer month, electricity may be used more intensely on warmer days for the purposes of air conditioning.

B

plot(model1)

hist(jan14_vic_elec$Demand)

hist(jan14_vic_elec$Temperature)

The model is performing fairly well upon looking at the residual plots. Some observations are having significant influence on the results, though, including numbers 5, 1 and 26.

C

jan14_vic_elec |>
  model(TSLM(Demand ~ Temperature)) |>
  forecast(
    new_data(jan14_vic_elec, 1) |>
      mutate(Temperature = 15)
  ) |>
  autoplot(jan14_vic_elec)

## Warning: Computation failed in `stat_interval()`.
## Caused by error in `Ops.Date()`:
## ! * not defined for "Date" objects

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

jan14_vic_elec |>
  model(TSLM(Demand ~ Temperature)) |>
  forecast(
    new_data(jan14_vic_elec, 1) |>
      mutate(Temperature = 35)
  ) |>
  autoplot(jan14_vic_elec)

## Warning: Computation failed in `stat_interval()`.
## Caused by error in `Ops.Date()`:
## ! * not defined for "Date" objects

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

mean(jan14_vic_elec$Temperature)

## [1] 28.03548

median(jan14_vic_elec$Temperature)

## [1] 26

min(jan14_vic_elec$Temperature)

## [1] 19.6

These forecasts are reliant on the assumptions of temperature. Therefore the predictions of demand are only as good as the predictions of temperature. Based on the history, a temperature of 35 degrees Celsius (7 degrees above average) is more realistic than 15 degrees (13 below average). There is no precedent in the data for a 15 degree day - the previous minimum is ~20 degrees.

D

forecast_data <- jan14_vic_elec |> 
  model(TSLM(Demand ~ Temperature)) |> 
  forecast(
    new_data(jan14_vic_elec, 1) |> 
      mutate(Temperature = 35),
    level = c(80, 95)  # Request prediction intervals at 80% and 95%
  )

forecast_data

## # A fable: 1 x 5 [1D]
## # Key:     .model [1]
##   .model                     Date                   Demand   .mean Temperature
##   <chr>                      <date>                 <dist>   <dbl>       <dbl>
## 1 TSLM(Demand ~ Temperature) 2014-02-01 N(274484, 6.4e+08) 274484.          35

# Plot the forecast with intervals
forecast_data |> 
  autoplot(jan14_vic_elec) + 
  ggtitle("TSLM Forecast with Prediction Intervals")

## Warning: Computation failed in `stat_interval()`.
## Caused by error in `Ops.Date()`:
## ! * not defined for "Date" objects

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

Chapter 2

Question 8

all_vic_elec <- vic_elec |>
  index_by(Date = as_date(Time)) |>
  summarise(
    Demand = sum(Demand),
    Temperature = max(Temperature)
  )

# Calculate the correlation between Demand and Temperature
cor_value <- cor(all_vic_elec$Demand, all_vic_elec$Temperature)

# Create the plot and add the correlation value as text
ggplot(all_vic_elec, aes(x = Temperature, y = Demand)) +
  geom_point() +  # Scatter plot for Demand vs. Temperature
  geom_smooth(method = "lm", color = "blue", se = FALSE) +  # Optional: Add a regression line
  geom_text(
    aes(x = max(Temperature), y = max(Demand), 
        label = paste("Correlation: ", round(cor_value, 2))), 
    hjust = 1.1, vjust = 1.1, size = 5, color = "black"
  ) +
  theme_minimal() +
  ggtitle("Demand vs. Temperature with Correlation")

## Warning in geom_text(aes(x = max(Temperature), y = max(Demand), label = paste("Correlation: ", : All aesthetics have length 1, but the data has 1096 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.

## `geom_smooth()` using formula = 'y ~ x'

cor(jan14_vic_elec$Temperature,jan14_vic_elec$Demand)

## [1] 0.8849706

The scatterplot of the two variables, and their low correlation coefficient (0.04), suggest that the relationship is not very strong when looking at the entire dataset. Interestingly, the correlation is much higher in January 2014. This suggests that the model is not generalizable to all months.

Chapter 7

Question 5

A

# Ensure the Week variable is in the correct format


us_gas <- us_gasoline |> 
  filter(Week <= yearweek("2004 W52")) 

us_gas$Barrels <- as.numeric(us_gas$Barrels)

#us_gas <- us_gas |> 
#  mutate(Week = as.yearweek(Week))

us_gas <- us_gas |> 
  mutate(trend = row_number())

us_gas <- us_gas |> 
  filter(!is.na(Barrels))


fit_gas1 <- us_gas |>
  model(TSLM(Barrels ~ trend + fourier(K = 1)))

fit_gas2 <- us_gas |>
  model(TSLM(Barrels ~ trend + fourier(K = 2)))

fit_gas3 <- us_gas |>
  model(TSLM(Barrels ~ trend + fourier(K = 3)))


# Extract AIC from the model
aic_value1 <- fit_gas1 |> 
  glance() |>
  dplyr::select(AIC)
print(aic_value1)

## # A tibble: 1 × 1
##      AIC
##    <dbl>
## 1 -1808.

aic_value2 <- fit_gas2 |> 
  glance() |>
  dplyr::select(AIC)
print(aic_value2)

## # A tibble: 1 × 1
##      AIC
##    <dbl>
## 1 -1813.

aic_value3 <- fit_gas3 |> 
  glance() |>
  dplyr::select(AIC)
print(aic_value3)

## # A tibble: 1 × 1
##      AIC
##    <dbl>
## 1 -1852.

Lower AIC in model with 1 fourier term suggests that is the better fit among the three.

fit_gas1 |>
  gg_tsresiduals()

augment(fit_gas1) |>
  features(.innov, ljung_box, lag = 10)

## # A tibble: 1 × 3
##   .model                                 lb_stat lb_pvalue
##   <chr>                                    <dbl>     <dbl>
## 1 TSLM(Barrels ~ trend + fourier(K = 1))    16.3    0.0924

The p value of 0.09 and the ACF thresholds remaining within the thresholds (mostly) suggests that there is not autocorrelation in the residuals of the model. This further suggests that the model is well fit.

# Generate a new data frame for forecasting (52 weeks ahead)
future_data <- us_gasoline |> 
  filter(Week > max(us_gas$Week)) |> 
  head(52) |> 
  mutate(trend = seq(max(us_gas$trend) + 1, by = 1, length.out = 52))

# If future_data is unavailable in the original dataset, create a new one
if (nrow(future_data) < 52) {
  future_data <- tibble(
    Week = seq(max(us_gas$Week) + 1, by = 1, length.out = 52) |> as.yearweek(),
    trend = seq(max(us_gas$trend) + 1, by = 1, length.out = 52)
  )
}

# Forecast
forecast_gas <- fit_gas1 |> forecast(new_data = future_data)

us_gasoline <- us_gasoline |>
  filter(Week < yearweek("2005 W52"))

# Print the forecast
print(forecast_gas)

## # A fable: 52 x 5 [1W]
## # Key:     .model [1]
##    .model                                     Week       Barrels .mean trend
##    <chr>                                    <week>        <dist> <dbl> <dbl>
##  1 TSLM(Barrels ~ trend + fourier(K = 1)) 2004 W53 N(8.9, 0.083)  8.88   726
##  2 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W01 N(8.9, 0.083)  8.87   727
##  3 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W02 N(8.9, 0.083)  8.87   728
##  4 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W03 N(8.9, 0.083)  8.87   729
##  5 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W04 N(8.9, 0.083)  8.87   730
##  6 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W05 N(8.9, 0.083)  8.88   731
##  7 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W06 N(8.9, 0.083)  8.89   732
##  8 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W07 N(8.9, 0.083)  8.91   733
##  9 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W08 N(8.9, 0.083)  8.93   734
## 10 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W09   N(9, 0.083)  8.95   735
## # ℹ 42 more rows

# Visualize the forecast
forecast_gas |> 
  autoplot(us_gasoline, level = NULL) +
  ggtitle("1-year Forecast of Gasoline Barrels") +
  xlab("Week") +
  ylab("Barrels") +
  geom_line(data = us_gasoline, aes(x = Week, y = Barrels), color = "gray") +
  labs(color = "Legend")

One glaring feature of the forecast is how smooth it is compared to the history. The variance in the history is pronounced and likely affected by seasonality. The forecast does not consider this given its simple specification.

Chapter 8

Question 6

A

China_GDP <- global_economy %>%
  filter(Country =="China") %>%
  dplyr::select(Year, GDP) %>%
  mutate(GDP_bc = (GDP^0.5 - 1) / 0.5)

ETS_fit1 <- China_GDP %>%
  model(ses = ETS(GDP ~ error('A') + trend('A')))

ETS_fit1 |> forecast(h = 10) |>
  autoplot(China_GDP)

This model includes an additive trend. It assumes that the trend remains constant over time.

ETS_fit2 <- China_GDP %>%
  model(damped_trend = ETS(GDP ~ error('A') + trend('Ad')))

ETS_fit2 |> forecast(h = 10) |>
  autoplot(China_GDP)

This version includes a damped additive trend. The trend will decrease over time, which might reflect more realistic growth expectations as growth slows.

ETS_fit3 <- China_GDP %>%
  model(multiplicative_errors = ETS(GDP ~ error('M') + trend('A')))

ETS_fit3 |> forecast(h = 10) |>
  autoplot(China_GDP)

This configuration assumes multiplicative errors, where the size of the error scales with the level of GDP.

ETS_fit4 <- China_GDP %>%
  model(boxcox = ETS(GDP_bc ~ error('A') + trend('A')))

ETS_fit4 |> forecast(h = 10) |>
  autoplot(China_GDP)

Box-cox transformed GDP data stabilizes the variance, making the model less sensitive to fluctuations. This is not as applicable with a series as stable as GDP.

Chapter 9

Question 11

Choose one of the following seasonal time series: the Australian production of electricity, cement, or gas (from aus_production).

A

Do the data need transforming? If so, find a suitable transformation.

Cement <- aus_production |>
  dplyr::select(Quarter,Cement) |>
  as_tsibble(index=Quarter)

autoplot(Cement)

## Plot variable not specified, automatically selected `.vars = Cement`

gg_subseries(Cement)

## Plot variable not specified, automatically selected `y = Cement`

gg_lag(Cement)

## Plot variable not specified, automatically selected `y = Cement`

# Decompose the time series
decomposed <- Cement |> 
  model(STL(Cement ~ season(window = "periodic"))) # Seasonal-Trend decomposition using LOESS

# View components
components <- components(decomposed)

# Plot the decomposition
autoplot(components)

There appears to be a seasonal pattern in these data, so I will use the seasonally adjusted version.

Cement_adj <- components |>
  mutate(Seasonally_Adjusted = Cement - season_year)

# Plot the adjusted series
autoplot(Cement_adj, Seasonally_Adjusted)

B

Are the data stationary? If not, find an appropriate differencing which yields stationary data.

Cement_adj

## # A dable: 218 x 8 [1Q]
## # Key:     .model [1]
## # :        Cement = trend + season_year + remainder
##    .model               Quarter Cement trend season_year remainder season_adjust
##    <chr>                  <qtr>  <dbl> <dbl>       <dbl>     <dbl>         <dbl>
##  1 "STL(Cement ~ seaso… 1956 Q1    465  545.      -136.     55.8            601.
##  2 "STL(Cement ~ seaso… 1956 Q2    532  544.        23.8   -35.8            508.
##  3 "STL(Cement ~ seaso… 1956 Q3    561  548.        67.9   -54.5            493.
##  4 "STL(Cement ~ seaso… 1956 Q4    570  557.        44.2   -30.9            526.
##  5 "STL(Cement ~ seaso… 1957 Q1    529  577.      -136.     87.5            665.
##  6 "STL(Cement ~ seaso… 1957 Q2    604  581.        23.8    -0.429          580.
##  7 "STL(Cement ~ seaso… 1957 Q3    603  576.        67.9   -40.8            535.
##  8 "STL(Cement ~ seaso… 1957 Q4    582  586.        44.2   -48.4            538.
##  9 "STL(Cement ~ seaso… 1958 Q1    554  600.      -136.     89.7            690.
## 10 "STL(Cement ~ seaso… 1958 Q2    620  609.        23.8   -13.2            596.
## # ℹ 208 more rows
## # ℹ 1 more variable: Seasonally_Adjusted <dbl>

ACF(Cement_adj, var = Seasonally_Adjusted) |> autoplot()

## Warning: The `...` argument of `PACF()` is deprecated as of feasts 0.2.2.
## ℹ ACF variables should be passed to the `y` argument. If multiple variables are
##   to be used, specify them using `vars(...)`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

The data appear to be non-stationary, as the ACF remains elevated across the entire period. I’ll create a (one or more) differenced series to address this.

Cement_adj |> 
  ACF(Seasonally_Adjusted) |> 
  autoplot() +
  labs(title = "ACF of Seasonally Adjusted Series")

Cement_adj |> 
  mutate(diff = difference(Seasonally_Adjusted, lag = 3)) |> 
  ACF(diff) |> 
  autoplot() +
  labs(title = "ACF of Differenced Series")

Transforming into the third difference causes the data to appear more like white noise in the ACF plot.

C

Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?

fit_cement1 <- Cement_adj |> 
  model(arima = ARIMA(Seasonally_Adjusted ~ pdq(1,1,0)))

report(fit_cement1)

## Series: Seasonally_Adjusted 
## Model: ARIMA(1,1,0)(1,0,1)[4] 
## 
## Coefficients:
##           ar1    sar1     sma1
##       -0.2742  0.9533  -0.7970
## s.e.   0.0667  0.0363   0.0625
## 
## sigma^2 estimated as 7137:  log likelihood=-1270.26
## AIC=2548.52   AICc=2548.71   BIC=2562.04

fit_cement2 <- Cement_adj |> 
  model(arima2 = ARIMA(Seasonally_Adjusted ~ pdq(2,1,0)))

report(fit_cement2)

## Series: Seasonally_Adjusted 
## Model: ARIMA(2,1,0)(1,0,1)[4] 
## 
## Coefficients:
##           ar1      ar2    sar1     sma1
##       -0.2828  -0.0327  0.9532  -0.8023
## s.e.   0.0693   0.0719  0.0366   0.0634
## 
## sigma^2 estimated as 7166:  log likelihood=-1270.16
## AIC=2550.32   AICc=2550.6   BIC=2567.22

fit_cement3 <- Cement_adj |> 
  model(arima3 = ARIMA(Seasonally_Adjusted ~ pdq(3,1,0)))

report(fit_cement3)

## Series: Seasonally_Adjusted 
## Model: ARIMA(3,1,0)(1,0,1)[4] 
## 
## Coefficients:
##           ar1      ar2     ar3    sar1     sma1
##       -0.2816  -0.0246  0.0226  0.9526  -0.7968
## s.e.   0.0693   0.0763  0.0730  0.0370   0.0666
## 
## sigma^2 estimated as 7195:  log likelihood=-1270.11
## AIC=2552.22   AICc=2552.62   BIC=2572.5

The model with one autoregressive term yielded the lowest AIC value.

D

Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise?

fit_cement1 |>
  dplyr::select(arima) |>
  gg_tsresiduals()

The residuals are within the boundaries in the ACF plot except at lags 8, 17, 18, and 20. This suggests they are autocorrelated. # E Forecast of two years.

# Generate the forecast
forecasts <- fit_cement1 |> 
  dplyr::select(arima) |> 
  forecast(h = 8)

# Convert forecast and historical data to tibbles
forecast_data <- as_tibble(forecasts$.mean) |> mutate(type = "Forecast")
historical_data <- as_tibble(Cement_adj) |> mutate(type = "Historical")

quarters_seqhist <- seq(as.yearqtr("1956 Q1"), as.yearqtr("2010 Q2"), by = 1/4)
quarters_seqfcst <- seq(as.yearqtr("2010 Q3"), as.yearqtr("2012 Q2"), by = 1/4)

# Ensure the time variable is consistent (adjust column name as necessary)
historical_data <- historical_data |> mutate(Time = quarters_seqhist)
forecast_data <- forecast_data |> mutate(Time = quarters_seqfcst)

# Combine data for plotting
combined_data <- bind_rows(historical_data, forecast_data)

# Plot
ggplot(combined_data, aes(x = Time, y = season_adjust, color = type)) + 
  geom_line(size = 1) +  # Combine historical and forecast data
  geom_line(aes(y = value), linetype = "dashed", size = 1, color = "blue") +  # Add line for "value"
  labs(
    title = "Historical and Forecasted Data",
    x = "Time",
    y = "Seasonally Adjusted",
    color = "Data Type"
  ) + 
  theme_minimal()

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: The `trans` argument of `continuous_scale()` is deprecated as of ggplot2 3.5.0.
## ℹ Please use the `transform` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 218 rows containing missing values or values outside the scale range
## (`geom_line()`).

F

Compare the forecasts obtained using ETS().

forecast1 <- fit_cement1 |> 
  dplyr::select(arima) |> 
  forecast(h = 8)

forecast2 <- fit_cement2 |> 
  dplyr::select(arima2) |> 
  forecast(h = 8)

forecast3 <- fit_cement3 |> 
  dplyr::select(arima3) |> 
  forecast(h = 8)

forecast_data <- data.frame(
  Time = time(forecast1$.mean),  # Use the time points from the forecasts
  AR1 = as.numeric(forecast1$.mean),
  AR2 = as.numeric(forecast2$.mean),
  AR3 = as.numeric(forecast3$.mean)
)
forecast_long <- reshape2::melt(forecast_data, id.vars = "Time")

ggplot(forecast_long, aes(x = Time, y = value, color = variable)) +
  geom_line(size = 1) +
  labs(
    title = "Comparison of Forecasts",
    x = "Time",
    y = "Forecasted Value",
    color = "Model"
  ) +
  theme_minimal() +
  theme(legend.title = element_blank())

The forecasts with different AR terms and the same differencing compare very closely. This makes sense as we are using a stationary, seasonally adjusted series. In the stationary series, the lag of 1 or 2 periods makes little difference when modeling as we have removed the seasonal variability.

Final problem set: A

Daniel McLaughlin

2024-12-15

Chapter 7

Exercise 1

A

B

C

D

Chapter 2

Question 8

Chapter 7

Question 5

A

Chapter 8

Question 6

A

Chapter 9

Question 11

A

B

C

D

F