#Libraries
library(forecast)
library(zoo)
library(timeDate)

Chapter 7: #1, 2, 3, 4

Question 1

Analysis of Canadian Manufacturing Workers Work-hours: The time series plot in Figure 7.9 describes the average annual number of weekly hours spent by Canadian manufacturing workers. The data is available in CanadianWorkhours.xls.

(A)

If we compute the autocorrelation of this series, would the lag-1 autocorrelation exhibit negative, positive, or no autocorrelation? How can you see this from the plot?

Generally speaking, I would expect to see positive autocorrelation. The first two-thirds of the series follow a persistent trend, so consecutive observations sit close together, and the lag-1 autocorrelation should therefore be positive.
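As a quick illustration (a minimal sketch on synthetic data, not the assignment series; the name trending is just for this demo), a trending series shows exactly this behavior:

set.seed(1)
# A series with a steady trend plus noise: neighboring values are similar,
# so the lag-1 autocorrelation is strongly positive
trending <- ts(1:40 + rnorm(40))
Acf(trending, lag.max = 1)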

(B)

Compute the autocorrelation and produce an ACF plot. Verify your answer to the previous question.

See plots below. As I expected, there is strong positive autocorrelation: the lag-1 autocorrelation is 0.928.

Work <- read.csv("CanadianWorkHours1.csv", stringsAsFactors = FALSE)

# Create a time series object out of it
Work.TS <- ts(Work$HoursPerWeek, start=c(1, 1), frequency=1)

#ACF limited to lag 1
Acf(Work.TS, lag.max = 1)

#Full ACF of the series (default number of lags)
WorkACF <- Acf(Work.TS)

WorkACF
## 
## Autocorrelations of series 'Work.TS', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.928  0.839  0.752  0.665  0.571  0.473  0.369  0.265  0.164 
##     10     11     12     13     14     15 
##  0.047 -0.082 -0.185 -0.261 -0.310 -0.346
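To judge whether these coefficients are significant, a common rule of thumb (sketched below) compares them against an approximate 95% band of ±2/√n, where n is the series length; 0.928 is far outside that band.

#Approximate 95% significance bounds for the sample autocorrelations
n <- length(Work.TS)
c(lower = -2/sqrt(n), upper = 2/sqrt(n))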

Question 2

Forecasting Wal-Mart Stock: Figure 7.10 shows a time plot of Wal-Mart daily closing prices between February 2001 and February 2002. The data is available at finance.yahoo.com and in WalMartStock.xls. The ACF plots of these daily closing prices and of the lag-1 differenced series are in Figure 7.11. Table 7.4 shows the output from fitting an AR(1) model to the series of closing prices and to the series of differences. Use all the information to answer the following questions.

(A)

Create a time plot of the differenced series.

See the plots below. In addition to the time plot of the differenced series, the ACF plots from Figure 7.11 are recreated (with lag.max = 10 for the differenced series).

WalMart <- read.csv("WalMartStock.csv")
WalMart.TS <- ts(WalMart$Close, start = c(1,1), freq=365)

#ACF of the closing-price series (Figure 7.11, left)
Acf(WalMart.TS, lag.max=1)

WalMartACF <- Acf(WalMart.TS)

#ACF of the lag-1 differenced series (Figure 7.11, right)
Acf(diff(WalMart.TS, lag=1), lag.max=10, main="ACF Plot for Differenced Series")

#Time plot of the differenced series
plot(diff(WalMart.TS, lag=1), bty="l")
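For reference, the AR(1) fit to the differenced series (the right panel of Table 7.4) can be recreated the same way; a minimal sketch (fitDiff is just an illustrative name):

#AR(1) model for the lag-1 differenced closing prices (Table 7.4, right)
fitDiff <- Arima(diff(WalMart.TS, lag=1), order=c(1,0,0))
fitDiff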

(B)

Which of the following is/are relevant for testing whether this stock is a random walk?

  • The autocorrelations of the closing price series: YES (close to 0)
  • The AR(1) slope coefficient for the closing price series: YES
  • The AR(1) constant coefficient for the closing price series: NO
  • The autocorrelation of the differenced series: YES
  • The AR(1) slope coefficient for the differenced series: NO
  • The AR(1) constant coefficient for the differenced series: YES

(C)

Recreate the AR(1) model output for the Close price series shown on the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.

See below. The random-walk null hypothesis is that the AR(1) slope equals 1, so the test statistic is (1 - 0.9558)/0.0187 ≈ 2.36. The two-tailed p-values, calculated below from both a t-distribution and a normal distribution, are about 0.019. At a significance level of .05 we reject the random-walk hypothesis: the series appears to contain some forecastable structure.

# Calling Arima() with order=c(1,0,0) fits an AR(1) model
fit <- Arima(WalMart.TS, order=c(1,0,0))
fit
## Series: WalMart.TS 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     mean
##       0.9558  52.9497
## s.e.  0.0187   1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13
#Calculate the two-tailed p-value from a t-distribution - use s.e. for denom.
2*pt(-abs((1 - fit$coef["ar1"]) / 0.0187), df=length(WalMart.TS)-1)
##        ar1 
## 0.01896261
#Calculate using normal distribution
2*pnorm(-abs((1-fit$coef["ar1"])/ 0.0187))
##        ar1 
## 0.01818593
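Rather than hard-coding the standard error, it can also be pulled from the fitted model's coefficient covariance matrix; a sketch using the same fit object (se is just an illustrative name):

#Standard errors are the square roots of the diagonal of var.coef
se <- sqrt(diag(fit$var.coef))
#Two-tailed p-value for H0: ar1 = 1 (random walk)
2*pnorm(-abs((1 - fit$coef["ar1"]) / se["ar1"]))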

(D)

What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below:

  • It is impossible to obtain useful forecasts of the series: TRUE (only naive forecasts apply; see the sketch after this list)
  • The series is random: FALSE (there may be patterns - just not forecastable ones)
  • The changes in the series from one period to the next are random: TRUE
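Since the best forecast of a random walk is the most recent observed value, the naive forecast is the appropriate benchmark; a minimal sketch (naiveForecast is just an illustrative name):

#Naive forecast: every point forecast equals the last observed close
naiveForecast <- naive(WalMart.TS, h=3)
naiveForecast$mean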

Question 3

Souvenir Sales: The file SouvenirSales.xls contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, between 1995 and 2001.

Back in 2001, the store wanted to use the data to forecast sales for the next 12 months (year 2002). They hired an analyst to generate forecasts. The analyst first partitioned the data into the training and validation periods, with the validation set containing the last 12 months of data (year 2001). She then fit a regression model to sales, using the training period.

(A)

Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.

Data partitioned and regression model using log(Sales) below. The February 2002 forecast must include the intercept, the trend term for month 86, and the February seasonal coefficient (season2); sales are forecasted to be approximately $17,063.

Souvenir <- read.csv("SouvenirSales.csv")
Souvenir.TS <- ts(Souvenir$Sales, start = c(1995,1), freq=12)

plot(Souvenir.TS)

#Partition data: hold out the last 12 months (2001) for validation
nValid <- 12
nTrain <- length(Souvenir.TS) - nValid
#Note: c(1995, nTrain) indexes the nTrain-th month counted from Jan 1995,
#so the training window ends in Dec 2000
Train.TS <- window(Souvenir.TS, start= c(1995,1), end= c(1995, nTrain))
Valid.TS <- window(Souvenir.TS, start= c(1995, nTrain+1), end= c(1995, nTrain+nValid))

#Regression model: log(Sales), linear trend, seasonality (training pd)
SouvenirLinearSeason <- tslm(log(Train.TS) ~ trend + season)

summary(SouvenirLinearSeason)
## 
## Call:
## tslm(formula = log(Train.TS) ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:  0.9424, Adjusted R-squared:  0.9306 
## F-statistic:  80.4 on 12 and 59 DF,  p-value: < 2.2e-16
#Forecast (Feb 2002 = obs. 86; February is season2, so its coefficient is included)
Feb2002.Forecast <- SouvenirLinearSeason$coefficients["(Intercept)"] +
  SouvenirLinearSeason$coefficients["trend"]*86 +
  SouvenirLinearSeason$coefficients["season2"]
exp(Feb2002.Forecast)
## (Intercept) 
##    17062.99
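As a cross-check (a sketch; Feb2002.Check is just an illustrative name), the same point forecast can be obtained with forecast() by forecasting 14 months past the end of the training period (12 validation months plus January and February 2002) and exponentiating the last value:

#The 14th forecast after Dec 2000 is Feb 2002
Feb2002.Check <- forecast(SouvenirLinearSeason, h=14)
exp(Feb2002.Check$mean[14])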

(B)

Create an ACF plot until lag-15 for the forecast errors. Now fit an AR model with lag 2 [ARIMA(2,0,0)] to the forecast errors.

#Residuals ACF
Residuals.ACF <- Acf(SouvenirLinearSeason$residuals, lag.max=15)

Residuals.ACF
## 
## Autocorrelations of series 'SouvenirLinearSeason$residuals', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.459  0.485  0.194  0.088  0.154  0.016  0.030  0.106  0.034 
##     10     11     12     13     14     15 
##  0.152 -0.055 -0.012 -0.047 -0.077 -0.023
#Fit AR model, lag-2 [ARIMA(2,0,0)]
ARModel <- Arima(SouvenirLinearSeason$residuals, order=c(2,0,0))
ARModel
## Series: SouvenirLinearSeason$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     mean
##       0.3072  0.3687  -0.0025
## s.e.  0.1090  0.1102   0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95
# Calculate the t statistics for each: coefficient / s.e.
# Rough rule is anything > 2 (or < -2) is statistically significant
ARModel$coef["ar1"] / 0.1090
##      ar1 
## 2.818482
ARModel$coef["ar2"] / 0.1102
##      ar2 
## 3.346186
# Now estimate p-value based on the normal distribution
2*pnorm(-abs(ARModel$coef["ar1"] / 0.1090))
##         ar1 
## 0.004825136
2*pnorm(-abs(ARModel$coef["ar2"] / 0.1102))
##          ar2 
## 0.0008193151

i. Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?

The t-statistics for the ar1 and ar2 coefficients are 2.82 and 3.35 respectively, and their p-values are very small at roughly .005 and .001, so both terms are statistically significant. This means the regression's forecast errors are autocorrelated: the regression model leaves some structure uncaptured, and an AR(2) model of the errors can be used to improve the forecasts.
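As a follow-up check (a sketch), if the AR(2) model has captured this structure, its own residuals should look like white noise:

#ACF of the AR(2) residuals; spikes inside the bounds suggest white noise
Acf(ARModel$residuals, lag.max=15)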

ii. Use the autocorrelation information to compute a forecast for January 2001, using the regression model and the AR(2) model above.

Jan2001.Forecast <- forecast(SouvenirLinearSeason, h=1)
Jan2001.Forecast
##          Point Forecast   Lo 80    Hi 80   Lo 95    Hi 95
## Jan 2001       9.188097 8.91722 9.458974 8.76989 9.606304
# Generate a forecast for the error terms using residuals AR(2) model
errorForecast <- forecast(ARModel, h=1)
errorForecast
##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Jan 2001      0.1078821 -0.07561892 0.2913832 -0.1727585 0.3885227
# Create the adjusted forecast by adding the two forecasts together
adjustedForecast <- Jan2001.Forecast$mean + errorForecast$mean

#Invert the log transform with exp()
exp(adjustedForecast)
##           Jan
## 2001 10894.13

Combining the two models yields an adjusted forecast. Once this is transformed back from the log scale, we can see that January 2001 sales are expected to be approximately $10,894.13.

Question 4

Shipments of Household Appliances: The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985 and 1989. The series is plotted in Figure 7.13.

(A)

If we compute the autocorrelation of the series, which lag (>0) is most likely to have the largest coefficient (in absolute value)?

Given that this data is quarterly, I would expect the largest coefficient at lag 4, which compares each quarter with the same quarter one year earlier.

(B)

Create an ACF plot and compare it with your answer.

See the ACF plot below. It confirms my expectation: lag 4 has the largest coefficient, indicating the strongest autocorrelation is between the same quarter in consecutive years.

Appliances <- read.csv("ApplianceShipments1.csv")
Appliance.TS <- ts(Appliances$Shipments, start = c(1985,1), end = c(1989,4), freq=4)

Appliance.ACF <- Acf(Appliance.TS)
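To read the lag-4 coefficient off numerically (a sketch; note the acf object stores lag 0 in position 1, so lag 4 sits in position 5):

#Autocorrelation at lag 4
Appliance.ACF$acf[5]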