Do exercises 9.1, 9.2, 9.3, 9.5, and 9.6 in Hyndman. Please submit both the RPubs link and your .Rmd file.
The advert data set contains two series: sales and advertising expenditure. If you plot them on the same panel, one variable can dominate the scale and hide variation in the other. With facets = TRUE, each series gets its own panel but shares the same time axis, so you can clearly see the patterns, trends, and seasonality in each series and still compare their timing, without the lines overlapping or being distorted by different scales.
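A minimal sketch of the plot, assuming the fpp2 package (which provides advert and the autoplot() method) is loaded:

library(fpp2)

# One panel per series, sharing the same time axis
autoplot(advert, facets = TRUE)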
\(y_t=a+bx_t+\eta_t\)
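The summary and Breusch–Godfrey output below are consistent with a fit along these lines (a sketch; the original code chunk is not shown):

fit_tslm <- tslm(sales ~ advert, data = advert)
summary(fit_tslm)
checkresiduals(fit_tslm)  # produces the Breusch-Godfrey test shown below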
##
## Call:
## tslm(formula = sales ~ advert, data = advert)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8194 -1.1375 -0.2412 0.9123 2.7519
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 78.73426 0.59735 131.81 < 2e-16 ***
## advert 0.53426 0.04098 13.04 7.96e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.506 on 22 degrees of freedom
## Multiple R-squared: 0.8854, Adjusted R-squared: 0.8802
## F-statistic: 170 on 1 and 22 DF, p-value: 7.955e-12
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: Residuals from Linear regression model
## LM test = 12.498, df = 5, p-value = 0.02856
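The next block of output appears to come from the same regression written in the ARIMA framework with no error dynamics, e.g.:

fit_arima0 <- Arima(advert[, "sales"], xreg = advert[, "advert"], order = c(0, 0, 0))
fit_arima0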
## Series: advert[, "sales"]
## Regression with ARIMA(0,0,0) errors
##
## Coefficients:
## intercept xreg
## 78.7343 0.5343
## s.e. 0.5719 0.0392
##
## sigma^2 = 2.267: log likelihood = -42.83
## AIC=91.66 AICc=92.86 BIC=95.2
Using Arima(advert[,"sales"], xreg = advert[,"advert"], order = c(0,0,0)) fits essentially the same regression as tslm() here, just written in the ARIMA framework. The fitted values are (almost) identical, but Arima() makes it easy to later add AR/MA terms to model autocorrelated errors (e.g. regression with ARIMA(1,0,0) errors via xreg), whereas tslm() is plain regression with no built-in ARMA error structure.
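The ARIMA(0,1,0)-errors output that follows, together with the later forecast() call that uses fit_auto, suggests the refit was done with auto.arima(), roughly:

fit_auto <- auto.arima(advert[, "sales"], xreg = advert[, "advert"])
summary(fit_auto)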
## Series: advert[, "sales"]
## Regression with ARIMA(0,1,0) errors
##
## Coefficients:
## xreg
## 0.5063
## s.e. 0.0210
##
## sigma^2 = 1.201: log likelihood = -34.22
## AIC=72.45 AICc=73.05 BIC=74.72
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01279435 1.049041 0.8745732 -0.00247038 1.032833 0.189587
## ACF1
## Training set -0.09614401
Refitting the model with auto.arima() leaves the regression relationship essentially unchanged but changes the error model: it selects a regression with ARIMA(0,1,0) errors, i.e. it takes a first difference of the series (so there is no intercept) and estimates an advertising coefficient of about 0.51, very close to the 0.53 from the plain regression. The differencing deals with the autocorrelation that the Breusch–Godfrey test flagged in the tslm() residuals, so the residuals of the refitted model are much closer to white noise.
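The Ljung–Box output below presumably comes from the usual residual check:

checkresiduals(fit_auto)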
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,1,0) errors
## Q* = 1.5622, df = 5, p-value = 0.9058
##
## Model df: 0. Total lags used: 5
The residual ACF shows no significant autocorrelation, and the Ljung–Box test is not significant, so the residuals look like white noise. Therefore, the fitted model appears adequate.
new_advert <- matrix(10, nrow = 6, ncol = 1)
fc_sales <- forecast(fit_auto, xreg = new_advert, h = 6)
fc_sales
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 25 85.43173 84.02755 86.83591 83.28422 87.57924
## 26 85.43173 83.44592 87.41754 82.39469 88.46877
## 27 85.43173 82.99962 87.86384 81.71214 89.15133
## 28 85.43173 82.62337 88.24009 81.13671 89.72675
## 29 85.43173 82.29189 88.57157 80.62975 90.23371
## 30 85.43173 81.99220 88.87126 80.17143 90.69203
y <- LakeHuron
t <- as.numeric(time(y))
knot <- 1920
z <- pmax(0, t - knot) # (t - 1920)+
X <- cbind(t, z) # two regressors: overall trend + post-1920 change in slope
fit_pw <- auto.arima(y, xreg = X)
summary(fit_pw)
## Series: y
## Regression with ARIMA(2,0,0) errors
##
## Coefficients:
## ar1 ar2 intercept t z
## 0.9628 -0.3107 688.0840 -0.0572 0.0633
## s.e. 0.0973 0.0984 30.5625 0.0161 0.0265
##
## sigma^2 = 0.4594: log likelihood = -98.86
## AIC=209.73 AICc=210.65 BIC=225.24
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.006564992 0.6602643 0.5171943 0.0009985589 0.08933195 0.8832368
## ACF1
## Training set -0.0007988887
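The Ljung–Box test below likewise appears to come from checking the residuals of the piecewise-trend fit:

checkresiduals(fit_pw)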
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 3.5674, df = 8, p-value = 0.8939
##
## Model df: 2. Total lags used: 10
t_last <- max(t)
t_future <- seq(t_last + 1, t_last + 30)
z_future <- pmax(0, t_future - knot)
X_future <- cbind(t_future, z_future)
fc_30 <- forecast(fit_pw, xreg = X_future, h = 30)
fc_30
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1973 579.5223 578.6537 580.3909 578.1939 580.8507
## 1974 579.0813 577.8755 580.2870 577.2372 580.9253
## 1975 578.7948 577.4756 580.1140 576.7772 580.8124
## 1976 578.6581 577.3144 580.0019 576.6030 580.7132
## 1977 578.6177 577.2716 579.9638 576.5590 580.6764
## 1978 578.6233 577.2772 579.9694 576.5646 580.6820
## 1979 578.6435 577.2971 579.9898 576.5843 580.7026
## 1980 578.6632 577.3166 580.0098 576.6037 580.7227
## 1981 578.6781 577.3314 580.0248 576.6185 580.7377
## 1982 578.6884 577.3417 580.0352 576.6288 580.7481
## 1983 578.6959 577.3491 580.0426 576.6362 580.7556
## 1984 578.7020 577.3552 580.0487 576.6423 580.7616
## 1985 578.7077 577.3609 580.0544 576.6480 580.7673
## 1986 578.7134 577.3666 580.0601 576.6537 580.7730
## 1987 578.7192 577.3725 580.0660 576.6596 580.7789
## 1988 578.7252 577.3785 580.0720 576.6655 580.7849
## 1989 578.7313 577.3846 580.0780 576.6716 580.7910
## 1990 578.7374 577.3907 580.0842 576.6777 580.7971
## 1991 578.7436 577.3968 580.0903 576.6839 580.8032
## 1992 578.7497 577.4029 580.0964 576.6900 580.8094
## 1993 578.7558 577.4091 580.1026 576.6961 580.8155
## 1994 578.7619 577.4152 580.1087 576.7023 580.8216
## 1995 578.7681 577.4213 580.1148 576.7084 580.8277
## 1996 578.7742 577.4274 580.1209 576.7145 580.8338
## 1997 578.7803 577.4335 580.1270 576.7206 580.8400
## 1998 578.7864 577.4397 580.1332 576.7267 580.8461
## 1999 578.7925 577.4458 580.1393 576.7329 580.8522
## 2000 578.7987 577.4519 580.1454 576.7390 580.8583
## 2001 578.8048 577.4580 580.1515 576.7451 580.8644
## 2002 578.8109 577.4641 580.1576 576.7512 580.8706
data(motel)
average_cost <- motel[, "Takings"] / motel[, "Roomnights"]
average_cost_ts <- ts(
average_cost,
start = start(motel),
frequency = frequency(motel)
)
autoplot(average_cost_ts) +
ggtitle("Average cost per night in Victoria")cpi_q <- window(cpimel, start = c(1980, 1), end = c(1995, 2))
cpi_monthly <- ts(
rep(cpi_q, each = 3),
start = c(1980, 1),
frequency = 12
)
autoplot(cpi_monthly) + ggtitle("Estimated monthly CPI from cpimel")
average_cost <- motel[,"Takings"] / motel[,"Roomnights"]
average_cost_ts <- ts(
average_cost,
start = start(motel),
frequency = frequency(motel)
)
cpi_q <- window(cpimel, start = c(1980, 1), end = c(1995, 2))
cpi_monthly <- ts(
rep(cpi_q, each = 3),
start = c(1980, 1),
frequency = 12
)
autoplot(average_cost_ts) + ggtitle("Average cost per night (Victoria)")Both the average cost series and the CPI series show strong upward trends and the size of the fluctuations tends to grow as the level increases, suggesting non-constant variance and roughly multiplicative behavior. Taking logarithms stabilizes the variance and turns multiplicative changes into additive ones, making the series more suitable for ARIMA/linear models and giving more reliable forecasts.
I first took logs of both series so that variability was more stable and the relationship between price and CPI looked approximately linear. Then I regressed log(average cost) on log(CPI) using auto.arima(…, xreg = log(CPI)), letting it choose a low-order ARIMA error structure that minimized AIC. I kept the model selected by auto.arima because its residuals showed no significant autocorrelation and passed the usual diagnostics, so the regression + ARIMA errors adequately captured both the mean relationship and the time-series dependence.
avg_cost <- motel[,"Takings"] / motel[,"Roomnights"]
avg_cost <- ts(avg_cost, start = start(motel), frequency = 12)
cpi_q <- window(cpimel, start = c(1980, 1), end = c(1995, 2))
cpi_month <- ts(rep(cpi_q, each = 3), start = c(1980, 1), frequency = 12)
lcost <- log(avg_cost)
lcpi <- log(cpi_month)
fit_motel <- auto.arima(lcost, xreg = lcpi)
fit_cpi <- auto.arima(lcpi)
fc_cpi <- forecast(fit_cpi, h = 12)
lcpi_future <- fc_cpi$mean
fc_cost_log <- forecast(fit_motel, xreg = lcpi_future, h = 12)
fc_cost <- fc_cost_log
fc_cost$x <- avg_cost
fc_cost$mean <- exp(fc_cost_log$mean)
fc_cost$lower <- exp(fc_cost_log$lower)
fc_cost$upper <- exp(fc_cost_log$upper)
autoplot(fc_cost)
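An alternative worth noting (my suggestion, not part of the original code): fit on the original scale with a log transformation specified via lambda = 0, so that forecast() back-transforms the point forecasts and intervals automatically:

# Same regression with ARIMA errors, with the log handled as a Box-Cox transformation
fit_motel2 <- auto.arima(avg_cost, xreg = lcpi, lambda = 0)
fc_cost2 <- forecast(fit_motel2, xreg = lcpi_future, h = 12)
autoplot(fc_cost2)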
# 9.5

Electricity consumption is often modelled as a function of temperature. Temperature is measured by daily heating degrees and cooling degrees. Heating degrees is 18°C minus the average daily temperature when the daily average is below 18°C; otherwise it is zero. This provides a measure of our need to heat ourselves as temperature falls. Cooling degrees measures our need to cool ourselves as the temperature rises. It is defined as the average daily temperature minus 18°C when the daily average is above 18°C; otherwise it is zero. Let \(y_t\) denote the monthly total of kilowatt-hours of electricity used, let \(x_{1,t}\) denote the monthly total of heating degrees, and let \(x_{2,t}\) denote the monthly total of cooling degrees. An analyst fits the following model to a set of such data:
\(y_t^* = \beta_1 x_{1,t}^* + \beta_2 x_{2,t}^* + \eta_t\), where \((1-B)(1-B^{12})\eta_t = \frac{1-\theta_1 B}{1-\phi_{12}B^{12}-\phi_{24}B^{24}}\,\epsilon_t\)
In this model, the "errors" \(\eta_t\) are the leftover part of the response after subtracting the predictor effects.
These errors are not assumed to be independent. Instead, they follow a seasonal time-series pattern: after taking one regular difference and one seasonal difference with period 12, the differenced errors follow an ARMA process (an MA(1) term plus seasonal AR terms at lags 12 and 24) driven by white-noise shocks \(\epsilon_t\).
This means the residuals can be related to past values, especially at the seasonal lags 12 and 24, rather than being purely random.
The estimates of \(\beta_1\) and \(\beta_2\) show how electricity consumption changes when each explanatory variable changes, holding everything else constant. In simple terms, \(\beta_1\) tells us how much electricity use goes up or down when \(x^*_{1,t}\) (heating degrees) increases by one unit, and \(\beta_2\) tells us the same for \(x^*_{2,t}\) (cooling degrees). Because the model already accounts for seasonality and time-series patterns in the errors, these coefficients reflect the underlying effect of the variables on electricity consumption rather than short-term or seasonal swings.
\(y_t^* = \beta_1 x_{1,t}^* + \beta_2 x_{2,t}^* + \eta_t\)

\(\eta_t = \eta_{t-1} + \eta_{t-12} - \eta_{t-13} + \phi_{12}\eta'_{t-12} + \phi_{24}\eta'_{t-24} + \epsilon_t - \theta_1\epsilon_{t-1}\), where \(\eta'_t = (1-B)(1-B^{12})\eta_t\)
To forecast the next 12 months, first forecast the predictors \(x^*_{1,t+h}\) and \(x^*_{2,t+h}\) (from weather forecasts or seasonal averages). Plug those into the regression to get the base forecast \(\beta_1 x^*_{1,t+h} + \beta_2 x^*_{2,t+h}\). Then use the fitted ARIMA error model to forecast the residuals \(\eta_{t+h}\) from the past errors (including the seasonal lags 12 and 24), setting future shocks \(\epsilon_{t+h} = 0\) for point forecasts. Add the two pieces together for the final forecast, \(\hat{y}^*_{t+h} = \beta_1 x^*_{1,t+h} + \beta_2 x^*_{2,t+h} + \hat{\eta}_{t+h}\), and use the ARIMA innovation variance to form prediction intervals.
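The model-selection loop below uses several objects (y, y_bc, xreg_base, K_max, fits, aic_tbl) whose definitions are not shown. A minimal, assumed setup might look like the following; the example series and the linear-trend regressor are placeholders, not taken from the original:

library(forecast)

y <- AirPassengers                      # placeholder monthly series; substitute the exercise's data
lambda <- BoxCox.lambda(y)
y_bc <- BoxCox(y, lambda)               # variance-stabilized series used in the loop

xreg_base <- cbind(trend = seq_along(y))    # assumed non-seasonal regressor(s)
K_max <- floor(frequency(y) / 2)            # largest admissible number of harmonics
fits <- vector("list", K_max + 1)           # one fit per K = 0, ..., K_max
aic_tbl <- data.frame(K = 0:K_max, AIC = NA_real_)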
for (K in 0:K_max) {
if (K == 0) {
xreg <- xreg_base
} else {
Fk <- fourier(y, K = K) # Fourier terms aligned to y
xreg <- cbind(xreg_base, Fk)
}
fits[[K + 1]] <- auto.arima(
y_bc,
xreg = xreg,
seasonal = FALSE, # seasonality handled by Fourier terms
stepwise = FALSE,
approximation = FALSE
)
aic_tbl$AIC[aic_tbl$K == K] <- AIC(fits[[K + 1]])
}
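A short follow-up sketch (same assumed names as above) for picking the harmonic order with the lowest AIC and checking the winning model:

best_K <- aic_tbl$K[which.min(aic_tbl$AIC)]   # harmonic order with the lowest AIC
best_fit <- fits[[best_K + 1]]
summary(best_fit)
checkresiduals(best_fit)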