jan14_vic_elec <- vic_elec |>
filter(yearmonth(Time) == yearmonth("2014 Jan")) |>
index_by(Date = as_date(Time)) |>
summarise(
Demand = sum(Demand),
Temperature = max(Temperature)
)
jan14_vic_elec |>
autoplot()
## Plot variable not specified, automatically selected `.vars = Demand`
model1 <- lm(Demand ~ Temperature, data = jan14_vic_elec)
summary(model1)
##
## Call:
## lm(formula = Demand ~ Temperature, data = jan14_vic_elec)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49978 -10219 -121 18533 35441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59083.9 17424.8 3.391 0.00203 **
## Temperature 6154.3 601.3 10.235 3.89e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24540 on 29 degrees of freedom
## Multiple R-squared: 0.7832, Adjusted R-squared: 0.7757
## F-statistic: 104.7 on 1 and 29 DF, p-value: 3.89e-11
The coefficient on temperature is positive and statistically significant. The coefficient being positive in the days of a month like January makes sense. In a summer month, electricity may be used more intensely on warmer days for the purposes of air conditioning.
plot(model1)
hist(jan14_vic_elec$Demand)
hist(jan14_vic_elec$Temperature)
The model is performing fairly well upon looking at the residual plots. Some observations are having significant influence on the results, though, including numbers 5, 1 and 26.
jan14_vic_elec |>
model(TSLM(Demand ~ Temperature)) |>
forecast(
new_data(jan14_vic_elec, 1) |>
mutate(Temperature = 15)
) |>
autoplot(jan14_vic_elec)
## Warning: Computation failed in `stat_interval()`.
## Caused by error in `Ops.Date()`:
## ! * not defined for "Date" objects
## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf
## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf
## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf
jan14_vic_elec |>
model(TSLM(Demand ~ Temperature)) |>
forecast(
new_data(jan14_vic_elec, 1) |>
mutate(Temperature = 35)
) |>
autoplot(jan14_vic_elec)
## Warning: Computation failed in `stat_interval()`.
## Caused by error in `Ops.Date()`:
## ! * not defined for "Date" objects
## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf
## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf
## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf
mean(jan14_vic_elec$Temperature)
## [1] 28.03548
median(jan14_vic_elec$Temperature)
## [1] 26
min(jan14_vic_elec$Temperature)
## [1] 19.6
These forecasts are reliant on the assumptions of temperature. Therefore the predictions of demand are only as good as the predictions of temperature. Based on the history, a temperature of 35 degrees Celsius (7 degrees above average) is more realistic than 15 degrees (13 below average). There is no precedent in the data for a 15 degree day - the previous minimum is ~20 degrees.
forecast_data <- jan14_vic_elec |>
model(TSLM(Demand ~ Temperature)) |>
forecast(
new_data(jan14_vic_elec, 1) |>
mutate(Temperature = 35),
level = c(80, 95) # Request prediction intervals at 80% and 95%
)
forecast_data
## # A fable: 1 x 5 [1D]
## # Key: .model [1]
## .model Date Demand .mean Temperature
## <chr> <date> <dist> <dbl> <dbl>
## 1 TSLM(Demand ~ Temperature) 2014-02-01 N(274484, 6.4e+08) 274484. 35
# Plot the forecast with intervals
forecast_data |>
autoplot(jan14_vic_elec) +
ggtitle("TSLM Forecast with Prediction Intervals")
## Warning: Computation failed in `stat_interval()`.
## Caused by error in `Ops.Date()`:
## ! * not defined for "Date" objects
## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf
## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf
## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf
all_vic_elec <- vic_elec |>
index_by(Date = as_date(Time)) |>
summarise(
Demand = sum(Demand),
Temperature = max(Temperature)
)
# Calculate the correlation between Demand and Temperature
cor_value <- cor(all_vic_elec$Demand, all_vic_elec$Temperature)
# Create the plot and add the correlation value as text
ggplot(all_vic_elec, aes(x = Temperature, y = Demand)) +
geom_point() + # Scatter plot for Demand vs. Temperature
geom_smooth(method = "lm", color = "blue", se = FALSE) + # Optional: Add a regression line
geom_text(
aes(x = max(Temperature), y = max(Demand),
label = paste("Correlation: ", round(cor_value, 2))),
hjust = 1.1, vjust = 1.1, size = 5, color = "black"
) +
theme_minimal() +
ggtitle("Demand vs. Temperature with Correlation")
## Warning in geom_text(aes(x = max(Temperature), y = max(Demand), label = paste("Correlation: ", : All aesthetics have length 1, but the data has 1096 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
## a single row.
## `geom_smooth()` using formula = 'y ~ x'
cor(jan14_vic_elec$Temperature,jan14_vic_elec$Demand)
## [1] 0.8849706
The scatterplot of the two variables, and their low correlation coefficient (0.04), suggest that the relationship is not very strong when looking at the entire dataset. Interestingly, the correlation is much higher in January 2014. This suggests that the model is not generalizable to all months.
# Ensure the Week variable is in the correct format
us_gas <- us_gasoline |>
filter(Week <= yearweek("2004 W52"))
us_gas$Barrels <- as.numeric(us_gas$Barrels)
#us_gas <- us_gas |>
# mutate(Week = as.yearweek(Week))
us_gas <- us_gas |>
mutate(trend = row_number())
us_gas <- us_gas |>
filter(!is.na(Barrels))
fit_gas1 <- us_gas |>
model(TSLM(Barrels ~ trend + fourier(K = 1)))
fit_gas2 <- us_gas |>
model(TSLM(Barrels ~ trend + fourier(K = 2)))
fit_gas3 <- us_gas |>
model(TSLM(Barrels ~ trend + fourier(K = 3)))
# Extract AIC from the model
aic_value1 <- fit_gas1 |>
glance() |>
dplyr::select(AIC)
print(aic_value1)
## # A tibble: 1 × 1
## AIC
## <dbl>
## 1 -1808.
aic_value2 <- fit_gas2 |>
glance() |>
dplyr::select(AIC)
print(aic_value2)
## # A tibble: 1 × 1
## AIC
## <dbl>
## 1 -1813.
aic_value3 <- fit_gas3 |>
glance() |>
dplyr::select(AIC)
print(aic_value3)
## # A tibble: 1 × 1
## AIC
## <dbl>
## 1 -1852.
Lower AIC in model with 1 fourier term suggests that is the better fit among the three.
fit_gas1 |>
gg_tsresiduals()
augment(fit_gas1) |>
features(.innov, ljung_box, lag = 10)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 TSLM(Barrels ~ trend + fourier(K = 1)) 16.3 0.0924
The p value of 0.09 and the ACF thresholds remaining within the thresholds (mostly) suggests that there is not autocorrelation in the residuals of the model. This further suggests that the model is well fit.
# Generate a new data frame for forecasting (52 weeks ahead)
future_data <- us_gasoline |>
filter(Week > max(us_gas$Week)) |>
head(52) |>
mutate(trend = seq(max(us_gas$trend) + 1, by = 1, length.out = 52))
# If future_data is unavailable in the original dataset, create a new one
if (nrow(future_data) < 52) {
future_data <- tibble(
Week = seq(max(us_gas$Week) + 1, by = 1, length.out = 52) |> as.yearweek(),
trend = seq(max(us_gas$trend) + 1, by = 1, length.out = 52)
)
}
# Forecast
forecast_gas <- fit_gas1 |> forecast(new_data = future_data)
us_gasoline <- us_gasoline |>
filter(Week < yearweek("2005 W52"))
# Print the forecast
print(forecast_gas)
## # A fable: 52 x 5 [1W]
## # Key: .model [1]
## .model Week Barrels .mean trend
## <chr> <week> <dist> <dbl> <dbl>
## 1 TSLM(Barrels ~ trend + fourier(K = 1)) 2004 W53 N(8.9, 0.083) 8.88 726
## 2 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W01 N(8.9, 0.083) 8.87 727
## 3 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W02 N(8.9, 0.083) 8.87 728
## 4 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W03 N(8.9, 0.083) 8.87 729
## 5 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W04 N(8.9, 0.083) 8.87 730
## 6 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W05 N(8.9, 0.083) 8.88 731
## 7 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W06 N(8.9, 0.083) 8.89 732
## 8 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W07 N(8.9, 0.083) 8.91 733
## 9 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W08 N(8.9, 0.083) 8.93 734
## 10 TSLM(Barrels ~ trend + fourier(K = 1)) 2005 W09 N(9, 0.083) 8.95 735
## # ℹ 42 more rows
# Visualize the forecast
forecast_gas |>
autoplot(us_gasoline, level = NULL) +
ggtitle("1-year Forecast of Gasoline Barrels") +
xlab("Week") +
ylab("Barrels") +
geom_line(data = us_gasoline, aes(x = Week, y = Barrels), color = "gray") +
labs(color = "Legend")
One glaring feature of the forecast is how smooth it is compared to the
history. The variance in the history is pronounced and likely affected
by seasonality. The forecast does not consider this given its simple
specification.
China_GDP <- global_economy %>%
filter(Country =="China") %>%
dplyr::select(Year, GDP) %>%
mutate(GDP_bc = (GDP^0.5 - 1) / 0.5)
ETS_fit1 <- China_GDP %>%
model(ses = ETS(GDP ~ error('A') + trend('A')))
ETS_fit1 |> forecast(h = 10) |>
autoplot(China_GDP)
This model includes an additive trend. It assumes that the trend remains constant over time.
ETS_fit2 <- China_GDP %>%
model(damped_trend = ETS(GDP ~ error('A') + trend('Ad')))
ETS_fit2 |> forecast(h = 10) |>
autoplot(China_GDP)
This version includes a damped additive trend. The trend will decrease over time, which might reflect more realistic growth expectations as growth slows.
ETS_fit3 <- China_GDP %>%
model(multiplicative_errors = ETS(GDP ~ error('M') + trend('A')))
ETS_fit3 |> forecast(h = 10) |>
autoplot(China_GDP)
This configuration assumes multiplicative errors, where the size of the error scales with the level of GDP.
ETS_fit4 <- China_GDP %>%
model(boxcox = ETS(GDP_bc ~ error('A') + trend('A')))
ETS_fit4 |> forecast(h = 10) |>
autoplot(China_GDP)
Box-cox transformed GDP data stabilizes the variance, making the model less sensitive to fluctuations. This is not as applicable with a series as stable as GDP.
Choose one of the following seasonal time series: the Australian production of electricity, cement, or gas (from aus_production).
Do the data need transforming? If so, find a suitable transformation.
Cement <- aus_production |>
dplyr::select(Quarter,Cement) |>
as_tsibble(index=Quarter)
autoplot(Cement)
## Plot variable not specified, automatically selected `.vars = Cement`
gg_subseries(Cement)
## Plot variable not specified, automatically selected `y = Cement`
gg_lag(Cement)
## Plot variable not specified, automatically selected `y = Cement`
# Decompose the time series
decomposed <- Cement |>
model(STL(Cement ~ season(window = "periodic"))) # Seasonal-Trend decomposition using LOESS
# View components
components <- components(decomposed)
# Plot the decomposition
autoplot(components)
There appears to be a seasonal pattern in these data, so I will use the
seasonally adjusted version.
Cement_adj <- components |>
mutate(Seasonally_Adjusted = Cement - season_year)
# Plot the adjusted series
autoplot(Cement_adj, Seasonally_Adjusted)
Are the data stationary? If not, find an appropriate differencing which yields stationary data.
Cement_adj
## # A dable: 218 x 8 [1Q]
## # Key: .model [1]
## # : Cement = trend + season_year + remainder
## .model Quarter Cement trend season_year remainder season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "STL(Cement ~ seaso… 1956 Q1 465 545. -136. 55.8 601.
## 2 "STL(Cement ~ seaso… 1956 Q2 532 544. 23.8 -35.8 508.
## 3 "STL(Cement ~ seaso… 1956 Q3 561 548. 67.9 -54.5 493.
## 4 "STL(Cement ~ seaso… 1956 Q4 570 557. 44.2 -30.9 526.
## 5 "STL(Cement ~ seaso… 1957 Q1 529 577. -136. 87.5 665.
## 6 "STL(Cement ~ seaso… 1957 Q2 604 581. 23.8 -0.429 580.
## 7 "STL(Cement ~ seaso… 1957 Q3 603 576. 67.9 -40.8 535.
## 8 "STL(Cement ~ seaso… 1957 Q4 582 586. 44.2 -48.4 538.
## 9 "STL(Cement ~ seaso… 1958 Q1 554 600. -136. 89.7 690.
## 10 "STL(Cement ~ seaso… 1958 Q2 620 609. 23.8 -13.2 596.
## # ℹ 208 more rows
## # ℹ 1 more variable: Seasonally_Adjusted <dbl>
ACF(Cement_adj, var = Seasonally_Adjusted) |> autoplot()
## Warning: The `...` argument of `PACF()` is deprecated as of feasts 0.2.2.
## ℹ ACF variables should be passed to the `y` argument. If multiple variables are
## to be used, specify them using `vars(...)`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
The data appear to be non-stationary, as the ACF remains elevated across
the entire period. I’ll create a (one or more) differenced series to
address this.
Cement_adj |>
ACF(Seasonally_Adjusted) |>
autoplot() +
labs(title = "ACF of Seasonally Adjusted Series")
Cement_adj |>
mutate(diff = difference(Seasonally_Adjusted, lag = 3)) |>
ACF(diff) |>
autoplot() +
labs(title = "ACF of Differenced Series")
Transforming into the third difference causes the data to appear more
like white noise in the ACF plot.
Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?
fit_cement1 <- Cement_adj |>
model(arima = ARIMA(Seasonally_Adjusted ~ pdq(1,1,0)))
report(fit_cement1)
## Series: Seasonally_Adjusted
## Model: ARIMA(1,1,0)(1,0,1)[4]
##
## Coefficients:
## ar1 sar1 sma1
## -0.2742 0.9533 -0.7970
## s.e. 0.0667 0.0363 0.0625
##
## sigma^2 estimated as 7137: log likelihood=-1270.26
## AIC=2548.52 AICc=2548.71 BIC=2562.04
fit_cement2 <- Cement_adj |>
model(arima2 = ARIMA(Seasonally_Adjusted ~ pdq(2,1,0)))
report(fit_cement2)
## Series: Seasonally_Adjusted
## Model: ARIMA(2,1,0)(1,0,1)[4]
##
## Coefficients:
## ar1 ar2 sar1 sma1
## -0.2828 -0.0327 0.9532 -0.8023
## s.e. 0.0693 0.0719 0.0366 0.0634
##
## sigma^2 estimated as 7166: log likelihood=-1270.16
## AIC=2550.32 AICc=2550.6 BIC=2567.22
fit_cement3 <- Cement_adj |>
model(arima3 = ARIMA(Seasonally_Adjusted ~ pdq(3,1,0)))
report(fit_cement3)
## Series: Seasonally_Adjusted
## Model: ARIMA(3,1,0)(1,0,1)[4]
##
## Coefficients:
## ar1 ar2 ar3 sar1 sma1
## -0.2816 -0.0246 0.0226 0.9526 -0.7968
## s.e. 0.0693 0.0763 0.0730 0.0370 0.0666
##
## sigma^2 estimated as 7195: log likelihood=-1270.11
## AIC=2552.22 AICc=2552.62 BIC=2572.5
The model with one autoregressive term yielded the lowest AIC value.
Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise?
fit_cement1 |>
dplyr::select(arima) |>
gg_tsresiduals()
The residuals are within the boundaries in the ACF plot except at lags
8, 17, 18, and 20. This suggests they are autocorrelated. # E Forecast
of two years.
# Generate the forecast
forecasts <- fit_cement1 |>
dplyr::select(arima) |>
forecast(h = 8)
# Convert forecast and historical data to tibbles
forecast_data <- as_tibble(forecasts$.mean) |> mutate(type = "Forecast")
historical_data <- as_tibble(Cement_adj) |> mutate(type = "Historical")
quarters_seqhist <- seq(as.yearqtr("1956 Q1"), as.yearqtr("2010 Q2"), by = 1/4)
quarters_seqfcst <- seq(as.yearqtr("2010 Q3"), as.yearqtr("2012 Q2"), by = 1/4)
# Ensure the time variable is consistent (adjust column name as necessary)
historical_data <- historical_data |> mutate(Time = quarters_seqhist)
forecast_data <- forecast_data |> mutate(Time = quarters_seqfcst)
# Combine data for plotting
combined_data <- bind_rows(historical_data, forecast_data)
# Plot
ggplot(combined_data, aes(x = Time, y = season_adjust, color = type)) +
geom_line(size = 1) + # Combine historical and forecast data
geom_line(aes(y = value), linetype = "dashed", size = 1, color = "blue") + # Add line for "value"
labs(
title = "Historical and Forecasted Data",
x = "Time",
y = "Seasonally Adjusted",
color = "Data Type"
) +
theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `trans` argument of `continuous_scale()` is deprecated as of ggplot2 3.5.0.
## ℹ Please use the `transform` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 218 rows containing missing values or values outside the scale range
## (`geom_line()`).
Compare the forecasts obtained using ETS().
forecast1 <- fit_cement1 |>
dplyr::select(arima) |>
forecast(h = 8)
forecast2 <- fit_cement2 |>
dplyr::select(arima2) |>
forecast(h = 8)
forecast3 <- fit_cement3 |>
dplyr::select(arima3) |>
forecast(h = 8)
forecast_data <- data.frame(
Time = time(forecast1$.mean), # Use the time points from the forecasts
AR1 = as.numeric(forecast1$.mean),
AR2 = as.numeric(forecast2$.mean),
AR3 = as.numeric(forecast3$.mean)
)
forecast_long <- reshape2::melt(forecast_data, id.vars = "Time")
ggplot(forecast_long, aes(x = Time, y = value, color = variable)) +
geom_line(size = 1) +
labs(
title = "Comparison of Forecasts",
x = "Time",
y = "Forecasted Value",
color = "Model"
) +
theme_minimal() +
theme(legend.title = element_blank())
The forecasts with different AR terms and the same differencing compare very closely. This makes sense as we are using a stationary, seasonally adjusted series. In the stationary series, the lag of 1 or 2 periods makes little difference when modeling as we have removed the seasonal variability.