1a. The time series plot in figure 7.9 describes the average annual number of weekly hours spent by Canadian manufacturing workers. If we computed the autocorrelation of this series, would the lag-1 autocorrelation exhibit negative, positive, or no autocorrelation? How can you see this from the plot?

Positive. The plot shows a strong linear trend, so each value is similar to the values just before it, and there are no steep peaks or valleys that would make consecutive values move in opposite directions.
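As a toy illustration (made-up numbers, not the Canadian series), lag-1 autocorrelation is essentially the correlation between a series and itself shifted by one period, so any smoothly trending series scores close to +1:

#Toy trending series: correlation with its one-period lag is near +1
#(Acf() uses a slightly different estimator, so this is only an approximation)
set.seed(42)
x <- 35 + 0.5*(1:30) + rnorm(30, sd=0.3)
cor(x[-length(x)], x[-1])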

1b. Compute the autocorrelation and produce an ACF plot. Verify your answer to the previous question.

#Load the forecast package (provides Acf, Arima, tslm, and forecast)
library(forecast)

#Imported the data, built an annual time series, made a range for plotting
CanWkHrs <- read.csv("CanadianWorkHours.csv")
CanTS <- ts(CanWkHrs$HoursPerWeek, start = c(1,1), freq=1)
yrange <- range(CanTS)
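yrange is not used by the ACF code below; presumably it was intended for a time plot of the series, a minimal sketch of which would be (the series was indexed from 1 above, so the x-axis shows the year index rather than the calendar year):

#Time plot of the series, along the lines of the book's figure 7.9
plot(CanTS, ylim=yrange, xlab="Year index", ylab="Hours per week", bty="l")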

#Compute the ACF and plot it up to lag 15
CanACF <- Acf(CanTS, lag.max=15)

#Print ACF result, showing positive lag-1 autocorrelation of 0.928
CanACF
## 
## Autocorrelations of series 'CanTS', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.928  0.839  0.752  0.665  0.571  0.473  0.369  0.265  0.164 
##     10     11     12     13     14     15 
##  0.047 -0.082 -0.185 -0.261 -0.310 -0.346
#ACF of the differenced series up to lag 12; the lag-1 autocorrelation is still positive
Acf(diff(CanTS, lag=1), lag.max=12, main="ACF plot for differenced series")

This confirms the positive lag-1 autocorrelation suspected in Question 1a.

2a. Create a time plot of the differenced WalMart series.

#Imported, plotted diff series
WalMart <- read.csv("WalMartStock.csv")
WalMartTS <- ts(WalMart$Close, start = c(1,1), freq=365)
plot(diff(WalMartTS, lag=1), bty="l")

#ACF plot
Acf(diff(WalMartTS, lag=1), lag.max=12, main="ACF plot for differenced series")

2b. Which of the following is/are relevant for testing whether this stock is a random walk?

The autocorrelations of the closing price series: not relevant. The closing prices are strongly autocorrelated whether or not the series is a random walk, so this doesn't discriminate.

The AR(1) slope coefficient for the closing price series: relevant. A random walk has a slope of 1, so we test whether the estimated slope differs significantly from 1 (done in 2c).

The AR(1) constant coefficient for the closing price series: not relevant. A random walk may or may not include a drift (constant) term.

The autocorrelations of the differenced series: relevant. If the series is a random walk, its period-to-period changes are random, so the differenced series should show no significant autocorrelation (see the plot in 2a and the sketch below).
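As an illustrative sketch (simulated data, not the WalMart series; assumes the forecast package loaded in Question 1), both diagnostics behave as expected on a pure random walk:

#Simulate a pure random walk: cumulative sum of white noise
set.seed(1)
rw <- ts(cumsum(rnorm(250)))

#(a) the fitted AR(1) slope is close to 1
Arima(rw, order=c(1,0,0), method="ML")

#(b) the differenced series shows no significant autocorrelation
Acf(diff(rw, lag=1), lag.max=12)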

2c. Recreate the AR(1) model output for the Close price series shown in the left of table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.

#Fit an AR(1) model [ARIMA(1,0,0)] to the closing price series
fit <- Arima(WalMartTS, order=c(1,0,0))
fit
## Series: WalMartTS 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1  intercept
##       0.9558    52.9497
## s.e.  0.0187     1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13
#Two-tailed p-value for H0: slope = 1 (t distribution), using the s.e. printed above
2*pt(-abs((1 - fit$coef["ar1"]) / 0.0187), df=length(WalMartTS)-1)
##        ar1 
## 0.01896261
#P-value, normal distribution
2*pnorm(-abs((1-fit$coef["ar1"])/0.0187))
##        ar1 
## 0.01818593

Either way, the p-value is below the 0.05 significance threshold, so we reject the null hypothesis that the AR(1) slope equals 1. The AR model therefore indicates that this series is not a random walk.
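As a sketch of a slightly more robust variant, the standard error can be pulled from the fitted object instead of being hard-coded (Arima stores the coefficient covariance matrix in $var.coef):

#Extract the ar1 standard error from the coefficient covariance matrix
seAR1 <- sqrt(diag(fit$var.coef))["ar1"]
2*pnorm(-abs((1 - fit$coef["ar1"]) / seAR1))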

2d. What are the implications of finding that a time series is a random walk? Choose the correct statements below.

It is impossible to obtain useful forecasts of the series: incorrect. A random walk can be forecast, but the best available forecasts are naive forecasts (the most recent observed value, plus any drift); see the sketch below.

The changes in the series from one period to the other are random: correct. The book notes this on page 154.
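A minimal sketch of the naive forecast, using forecast::naive(), which forecasts every future period as the last observed value:

#Naive forecast: every horizon gets the last observed closing price
naive(WalMartTS, h=3)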

3a. Run a regression model on the Souvenir series with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Use the model to forecast sales in February 2002.

#Imported, made range, partitioned
Souvenir <- read.csv("SouvenirSales.csv")
SouvTS <- ts(Souvenir$Sales, start = c(1995,1), freq=12)
souvValid <- 12
souvTrain <- length(SouvTS) - souvValid
souvTrainTS <- window(SouvTS, start= c(1995,1), end=c(1995, souvTrain))
souvValidTS <- window(SouvTS, start= c(1995, souvTrain+1), end= c(1995, souvTrain+souvValid))
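Note that window() normalizes month indices past 12, so c(1995, 73) means January 2001. A quick sanity check on the partition (assuming the series runs January 1995 through December 2001):

#Training: 72 months (Jan 1995 - Dec 2000); validation: 12 months (2001)
length(souvTrainTS)
length(souvValidTS)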

#Fitted a regression with linear trend and monthly seasonality to log(Sales) on the training period
logSalesM <- tslm(log(souvTrainTS) ~ trend + season)

#Generated forecast for Feb-2002: trend index 86 (Jan 1995 = 1), plus the
#February seasonal coefficient, back-transformed from the log scale
feb02Forecast <- logSalesM$coef["(Intercept)"] + logSalesM$coef["trend"]*86 +
  logSalesM$coef["season2"]
exp(feb02Forecast)
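Equivalently (a sketch, assuming the training period ends in December 2000), forecast() gives the same number: February 2002 is 14 months past the end of training, so take the last point of an h = 14 forecast:

#Forecast 14 months ahead (Jan 2001 - Feb 2002) and keep the last value
feb02Alt <- forecast(logSalesM, h=14)
exp(tail(feb02Alt$mean, 1))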

3b. Create an ACF plot until lag-15 for the forecast errors. Now fit an AR model with lag-2 [ARIMA(2,0,0)] to the forecast errors.

#ACF plot to lag-15
residualACF <- Acf(logSalesM$residuals, lag.max=15)

#Create ARIMA, print
lag2Model <- Arima(logSalesM$residuals, order =c(2,0,0))  
lag2Model
## Series: logSalesM$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2  intercept
##       0.3072  0.3687    -0.0025
## s.e.  0.1090  0.1102     0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95
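As an extra diagnostic (not required by the question), if the AR(2) model has captured the structure in the forecast errors, its own residuals should show no remaining autocorrelation:

#ACF of the AR(2) model's residuals; ideally no significant spikes remain
Acf(lag2Model$residuals, lag.max=15)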
i. Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?

#Calculate t-statistics for each coefficient: estimate / s.e.
#Rule of thumb from the lesson: |t| > 2 indicates statistical significance
lag2Model$coef["ar1"]/ 0.1090
##      ar1 
## 2.818482
lag2Model$coef["ar2"]/ 0.1102
##      ar2 
## 3.346186

Both t-statistics exceed 2, indicating that the AR(2) coefficients are statistically significant: the forecast errors are autocorrelated. The regression model therefore leaves systematic information in its residuals, and its forecasts can be improved by adding a forecast of the error from the AR(2) model (done in part ii).

#Estimated p-values for both, normal distribution
2*pnorm(-abs(lag2Model$coef["ar1"]/ 0.1090))
##         ar1 
## 0.004825136
2*pnorm(-abs(lag2Model$coef["ar2"]/ 0.1102))
##          ar2 
## 0.0008193151

The p-values support the above finding of significance, since they’re both well under 0.05.

ii. Use the autocorrelation information to compute a forecast for January 2001, combining the regression model and the AR(2) error model above.

#Linear regression forecast
lrForecast <- forecast(logSalesM, h=1)
lrForecast
##          Point Forecast   Lo 80    Hi 80   Lo 95    Hi 95
## Jan 2001       9.188097 8.91722 9.458974 8.76989 9.606304
#Error forecast
errorForecast <- forecast(lag2Model, h=1)
errorForecast
##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Jan 2001      0.1078821 -0.07561892 0.2913832 -0.1727585 0.3885227
#Added the AR(2) error forecast to the regression forecast (improved forecast, log scale)
adjustedForecast <- lrForecast$mean + errorForecast$mean
adjustedForecast
##           Jan
## 2001 9.295979
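Since the regression was fit to log(Sales), the improved forecast on the sales scale is obtained by exponentiating (exp(9.296) is roughly 10,894):

#Back-transform the adjusted forecast to the original sales scale
exp(adjustedForecast)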

4a. If we compute the autocorrelation of the Appliance series, which lag (> 0) is most likely to have the largest coefficient (in absolute value)?

Lag-4, since the data are quarterly. Shipments in the same quarter of successive years should correspond, and they do: for example, 1986-Q1 tracks 1985-Q1 reasonably well because both cover the same part of the year.

4b. Create an ACF plot and compare it with your answer.

#Imported, made time series
Appliance <- read.csv("ApplianceShipments.csv")
ApplianceTS <- ts(Appliance$Shipments, start = c(1985,1), end = c(1989,4), freq=4)
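One quick way to eyeball the same-quarter correspondence claimed in 4a is to reshape the 20 quarterly values into a year-by-quarter table (a sketch, assuming the series runs 1985-Q1 through 1989-Q4 as defined above):

#Reshape into a 5x4 table: rows are years, columns are quarters
matrix(ApplianceTS, ncol=4, byrow=TRUE,
       dimnames=list(as.character(1985:1989), paste0("Q", 1:4)))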

#Created ACF plot
AppACF <- Acf(ApplianceTS)

#Printed results 
AppACF
## 
## Autocorrelations of series 'ApplianceTS', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.261 -0.098  0.164  0.387 -0.030 -0.269  0.081  0.086 -0.168 
##     10     11     12     13 
## -0.325 -0.019  0.047 -0.096

The ACF confirms the answer: apart from lag-0, lag-4 has the largest coefficient in absolute value (0.387).