Question 1.a
From visual inspection of the time series plot, lag-1 autocorrelation of the series would exhibit positive autocorrelation.
Analyzing the graph for lag-1 by comparing the output at any t with the output at t-1, we find that a large percentage of the trends (slopes) at t and t-1 are both increasing or both decreasing (y[t] and y[t-1] are moving in the same direction [stickiness]) which constitutes a positive autocorrelation.
Question 1.b
The original series of work hours per week was plotted.
The autocorrelations were computed: (1) without specifying a maximum lag, and (2) for lag-1.
The lag-1 autocorrelation of 0.928 was positive as predicted by visual inspection.
#setwd("~/wd678/Unit 5b")
#install.packages("forecast")
library(forecast)
## Warning: package 'forecast' was built under R version 3.3.3
dataFrame <- read.csv("CanadianWorkHours.csv", stringsAsFactors=FALSE, header=TRUE)
workHours.ts <- ts(dataFrame$Hours.per.week, start=c(1966,1), end = c(2000,1), freq=1)
plot(workHours.ts, bty="l", main="Original series - Work hours per week")

Below is the Acf graph without a maximum lag specified in the argument. Lag-1 had the largest autocorrelation.
lags.ac <- Acf(workHours.ts)

lags.ac
##
## Autocorrelations of series 'workHours.ts', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.928 0.839 0.752 0.665 0.571 0.473 0.369 0.265 0.164
## 10 11 12 13 14 15
## 0.047 -0.082 -0.185 -0.261 -0.310 -0.346
Below is the autocorrelation graph for lag-1.
lag1.ac <- Acf(workHours.ts,lag.max=1)

lag1.ac
##
## Autocorrelations of series 'workHours.ts', by lag
##
## 0 1
## 1.000 0.928
Question 2a
Below are the time plots for the original closing price series and then the differenced series.
walmart <- read.csv("WalmartStock.csv", stringsAsFactors=FALSE, header=TRUE)
stockClose.ts <- ts(walmart$Close)
yrange=range(stockClose.ts)
plot(stockClose.ts, bty="l", main="Original series - Closing prices")

stockPriceLag1 <- diff(stockClose.ts,lag=1)
plot(stockPriceLag1, ylab="Lag-1", xlab="Time", bty="l", main="Time plot for Feb '01-02 stock prices with lag-1 differencing")

Question 2.b
The following information is relevant to testing whether the series is a random walk:
1. The AR(1) slope coefficient for the closing price series - If this slope coefficient is equal to one, then the series is a random walk.
2. The autocorrelations of the differenced series - If all the lags of a series are approximately equal to zero, then the series is a random walk. This means that there was not a relationship in the direction (slope) of the outputs of any two periods.
Question 2.c
The AR(1) model output in Table 7-4 was recreated below:
The Walmart stock closing prices was a random walk. The slope coefficient (0.9558) was within 2.4 standard errors of the number one. The statistical significance was tested. The two-tailed p-value from a t-distribution was 0.01896261. For an alpha=0.01, the p-value was larger so the null hypothesis that the series was a random walk was accepted.
stocks.AR1 <- Arima(stockClose.ts, order=c(1,0,0))
stocks.AR1
## Series: stockClose.ts
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.9558 52.9497
## s.e. 0.0187 1.3280
##
## sigma^2 estimated as 0.9815: log likelihood=-349.8
## AIC=705.59 AICc=705.69 BIC=716.13
2*pt(-abs((1-stocks.AR1$coef["ar1"])/0.0187), df=length(stockClose.ts)-1)
## ar1
## 0.01896261
Question 2.d
The following “true or false” statements describe the sentence that follows them:
TRUE - It is impossible to obtain useful forecasts of the series. Forecast models could not predict better than the naive forecast.
FALSE - The series is random. - The changes in output, up or down, from one period to another in the same series, are not related. The presence of random changes in the directions of outputs does not imply that the series outputs are random. Series outputs can be influenced by external information.
TRUE - The changes in the series from one period to the other are random. - The change in output, up or down, occurring in any period of the series cannot be used to predict the direction of change occuring in other periods of the same series.
Question 3.a
Below is the plot of a regression model, with linear trend and monthly predictors, for the original Sales time series. The next plot is for: (1) forecasts fitted for the training period, (2) forecasts for the validation period and (3) the forecast for February 2002. For February 2002, the log scale forecasted value was 9.74 and original scale value was 17063.
data.Frame <- read.csv("SouvenirSales.csv", stringsAsFactors=FALSE, header=TRUE)
souvSales.ts <- ts(data.Frame$Sales, start=c(1995,1), end = c(2001,12), freq=12)
plot(souvSales.ts, bty="l", main="Original series for Souvenir Sales")

nValid3 <- 12
nTrain3 <- length(souvSales.ts) - nValid3
train3.ts <- window(souvSales.ts, start = c(1995,1), end=c(1995, nTrain3))
valid3.ts <- window(souvSales.ts, start = c(1995, nTrain3 + 1), end = c(1995, nTrain3 + nValid3))
The regression model of log(sales) was found for the training period.
lmTrainSouvSales <- tslm(log(train3.ts) ~ trend + season)
summary(lmTrainSouvSales)
##
## Call:
## tslm(formula = log(train3.ts) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9306
## F-statistic: 80.4 on 12 and 59 DF, p-value: < 2.2e-16
The regression model of log(sales) was found for the validation period.
lmValidSouvSalesForecast <- forecast(lmTrainSouvSales, h=nValid3)
lmValidSouvSalesForecast$mean
## Jan Feb Mar Apr May Jun Jul
## 2001 9.188097 9.491232 9.925335 9.625329 9.694286 9.740741 9.898195
## Aug Sep Oct Nov Dec
## 2001 9.882831 9.992619 10.107664 10.600248 11.372615
exp(lmValidSouvSalesForecast$mean)
## Jan Feb Mar Apr May Jun Jul
## 2001 9780.022 13243.095 20441.749 15143.541 16224.628 16996.137 19894.424
## Aug Sep Oct Nov Dec
## 2001 19591.112 21864.492 24530.299 40144.775 86908.868
The residuals of the regression model of log(sales) in the validation period was calculated.
lmValidResiduals <- log(valid3.ts) - lmValidSouvSalesForecast$mean
lmValidResiduals
## Jan Feb Mar Apr May
## 2001 0.04627623 -0.16160882 0.06556110 0.13644081 -0.01407973
## Jun Jul Aug Sep Oct
## 2001 0.09025773 0.27360664 0.37785931 0.33304008 0.22829793
## Nov Dec
## 2001 0.14984574 0.18586333
The forecast for February 2002 was calculated in the log and original scales.
febForecast <- lmTrainSouvSales$coefficients["(Intercept)"] + lmTrainSouvSales$coefficients["trend"]*86 + lmTrainSouvSales$coefficients["season2"]
febForecast
## (Intercept)
## 9.744667
exp(febForecast)
## (Intercept)
## 17062.99
A plot was created for: (1) the log(sales) original series, (2) the regression model for the training period, (3) the forecast for the validation period, and (4) the forecast for February 2002.
#yrange2=range(log(souvSales.ts))
plot(c(1995,2003), c(6.8,12.6), type="n", xlab="Time", ylab="log(Sales [$])", bty="l", xaxt="n", yaxt="n", main="Regression model for log(Sales)")
axis(1,at=seq(1995,2003,1), labels=format(seq(1995,2003,1)))
axis(2,at=seq(6.8,12.6,1.0), labels=format(seq(6.8,12.6,1.0)), las=2)
lines(log(souvSales.ts), lty=2)
lines(lmTrainSouvSales$fitted, col="red")
lines(lmValidSouvSalesForecast$mean, col="green")
points(2003, febForecast, col="blue")
legend(1994.75,12.8, c("Original Series", "Fitted in training period", "Forecasted in validation period", "February 2002"), lty=c(2,1,1,1), col=c("black", "red", "green","blue"), bty="n")

Question 3.b
The ACF for forecast errors up to lag-15 was graphed and an AR(2) model was fit below. The null hypothesis that the AR(2) was a random walk was rejected.
train.Resid.ACF <- Acf(lmTrainSouvSales$residuals, lag.max=15 )

AR2Model.Errors.Train <- Arima(lmTrainSouvSales$residuals, order=c(2,0,0))
AR2Model.Errors.Train
## Series: lmTrainSouvSales$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.3072 0.3687 -0.0025
## s.e. 0.1090 0.1102 0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
residuals <- Acf(lmTrainSouvSales$residuals)

residuals
##
## Autocorrelations of series 'lmTrainSouvSales$residuals', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.459 0.485 0.194 0.088 0.154 0.016 0.030 0.106 0.034
## 10 11 12 13 14 15 16 17 18 19
## 0.152 -0.055 -0.012 -0.047 -0.077 -0.023 -0.139 -0.056 -0.125 -0.135
## 20 21 22 23 24
## -0.088 -0.258 -0.203 -0.257 -0.334
See below: The t-statistics and p-values were statistically significant (t-statistic was outside the (-2) to 2 interval, and p < 0.0.5). The residuals series of the regression model for sales was not a random walk.
Here are the t-statistics:
AR2Model.Errors.Train$coef["ar1"]/0.1090
## ar1
## 2.818482
AR2Model.Errors.Train$coef["ar2"]/0.1102
## ar2
## 3.346186
Here are the p-values:
2*pnorm(-abs(AR2Model.Errors.Train$coef["ar1"]/0.1090))
## ar1
## 0.004825136
2*pnorm(-abs(AR2Model.Errors.Train$coef["ar2"]/0.1102))
## ar2
## 0.0008193151
Acf(AR2Model.Errors.Train$residuals)

Question 3.b.i
The ACF plot for residuals of predicted sales showed significant correlation in lag-1 and lag-2. This indicated that the trend was not adequately captured by the regression model of sales. The uncaptured trend was carried forward to the residual series. One could attempt to better capture trend in the model for sales or capture the trend using a second-layer AR model.
The estimated coefficients of the AR(1) and AR(2) models were not equal to one after considering their standard errors; therefore, the residual series was not a random walk. This indicated that there was still alot of systemic pattern to the data. The trend of sales had not been adequately captured by the regression model. Trend had been passed on to the residual series. Lag-12 was less than threshold; therefore, seasonality had been captured by the regression model.
The ACF graph for the AR(2) model doesn’t have autocorrelation for any lags because all the autocorrelation of the original sales time series was captured.
Question 3.b.ii
Below are the January 2001 & January 2002 sales forecasts (in log and original scales) which were adjusted using an AR(2) model of the forecasted error.
The January 2002 forecast (in log and original scales), before adjustment, was calculated below:
january2002Forecast <- lmTrainSouvSales$coefficients["(Intercept)"] + lmTrainSouvSales$coefficients["trend"]*85
january2002Forecast
## (Intercept)
## 9.441533
exp(january2002Forecast)
## (Intercept)
## 12601.02
The above AR(2) training period model of the regression model errors was copied below:
#errors.AR2Model.Train <- Arima(lmTrainSouvSales$residuals, order=c(2,0,0))
#summary(errors.AR2Model.Train)
AR2Model.Errors.Train <- Arima(lmTrainSouvSales$residuals, order=c(2,0,0))
summary(AR2Model.Errors.Train)
## Series: lmTrainSouvSales$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.3072 0.3687 -0.0025
## s.e. 0.1090 0.1102 0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.004605836 0.1401718 0.1140246 854.4841 1235.335 0.5953986
## ACF1
## Training set 0.02836869
An AR(2) validation period model of the regression model’s errors follows below:
errorsAR2Model.Valid <- forecast(AR2Model.Errors.Train, h=nValid3)
errorsAR2Model.Valid$mean
## Jan Feb Mar Apr May
## 2001 0.107882119 0.098551352 0.069245043 0.056801003 0.042171322
## Jun Jul Aug Sep Oct
## 2001 0.033088136 0.024902959 0.019038932 0.014219137 0.010576068
## Nov Dec
## 2001 0.007679567 0.005446339
errorsAR2Model.Valid$mean[12]
## [1] 0.005446339
errorsAR2Model.Valid$mean[11]
## [1] 0.007679567
exp(errorsAR2Model.Valid$mean)
## Jan Feb Mar Apr May Jun Jul
## 2001 1.113916 1.103571 1.071699 1.058445 1.043073 1.033642 1.025216
## Aug Sep Oct Nov Dec
## 2001 1.019221 1.014321 1.010632 1.007709 1.005461
The adjusted 2001 forecast, computed below, is given in original and log scales.
For January 2001, the sales forecast was $9,781.14
adjustedForecast2001 <- exp(lmValidSouvSalesForecast$mean) + exp(errorsAR2Model.Valid$mean)
adjustedForecast2001
## Jan Feb Mar Apr May Jun Jul
## 2001 9781.136 13244.198 20442.820 15144.599 16225.671 16997.171 19895.449
## Aug Sep Oct Nov Dec
## 2001 19592.131 21865.507 24531.310 40145.783 86909.874
log(adjustedForecast2001)
## Jan Feb Mar Apr May Jun Jul
## 2001 9.188211 9.491315 9.925387 9.625399 9.694350 9.740802 9.898246
## Aug Sep Oct Nov Dec
## 2001 9.882883 9.992666 10.107706 10.600273 11.372627
The actual and adjusted forecasted sales in 2001 for the validation period were plotted below:
plot(valid3.ts,main="Actual & adjusted forecasted sales in validation period")
lines(exp(lmValidSouvSalesForecast$mean), col="black")
lines(adjustedForecast2001, col="blue")
legend(2001,100000, c("Adjusted Forecast","Actuals"), lty=c(1,1),col=c("black","blue"))

For January 2002 sales, the error of regression was calculated using past values as predictors in the equation for an AR(2) model in the validation period. The intercept and beta coefficients were taken from the summary of the AR(2) model above.
errors.January2002ForecastErrors <- -0.0025 + (0.3072*(errorsAR2Model.Valid$mean[12])) + (0.3687*(errorsAR2Model.Valid$mean[11]))
errors.January2002ForecastErrors
## [1] 0.002004572
The AR(2) model for the errors of the regression of sales was plotted in the training and validation period, and for the January 2002 forecast.
plot(c(1995,2003), c(-0.2,0.64), type="n",xlab="Time",ylab="log(error of sales regression)",bty="l",xaxt="n",yaxt="n",main="AR(2) model for the errors in regression of sales")
axis(1,at=seq(1995,2003,1), labels=format(seq(1995,2003,1)))
axis(2,at=seq(-0.2,0.64,0.1), labels=format(seq(-0.2,0.64,0.1)),las=2)
lines(lmTrainSouvSales$residuals, lty=1)
lines(lmValidResiduals, lty=1)
lines(AR2Model.Errors.Train$fitted, lty=2, col="red")
lines(errorsAR2Model.Valid$mean, lty=2, col="green")
points(2003, errors.January2002ForecastErrors, col="blue")
legend(1994.75,0.68, c("regression errors (traing & validation periods)","AR fitted errors in training period", "AR forecasted errors in validation period", "AR forecasted error for January 2002"), lty=c(1,2,2,1), col=c("black","red", "green", "blue"), bty="n")

The forecasted sales for January 2002, obtained using the regression model, was adjusted with the forecasted error, obtained using the AR(2) model. The original and log scales are below:
adjustedJanuary2002Forecast <- exp(january2002Forecast) + exp(errors.January2002ForecastErrors)
adjustedJanuary2002Forecast
## (Intercept)
## 12602.02
log(adjustedJanuary2002Forecast)
## (Intercept)
## 9.441612
Question 4
The original appliance shipments time series is plotted below:
data.Frame <- read.csv("ApplianceShipments.csv", stringsAsFactors=FALSE, header=TRUE)
shipments.ts <- ts(data.Frame$Shipments, start=c(1985,1), end = c(1989,4), freq=4)
plot(shipments.ts, bty="l", main="Original series - Appliance shipments")

Question 4.a
Lag-1 and lag-4 were expected to have the largest coefficients because they represent trend and seasonality which were obviously present in the quarterly time series.
Question 4.b