Chapter 7: Questions 1, 2, 3 and 4

Question 1

I began, as usual, by loading the required libraries and pulling in the data from the files provided. To start, I recreated the plot in the text to answer question 1(a).

library(forecast)
WorkHours <- read.csv("~/MBA678/CanadianWorkHours.csv", stringsAsFactors = FALSE)

workTS <- ts(WorkHours$Hours, start=c(1966, 1), frequency=1)

yrange <- range(workTS)

plot(c(1966, 2000), yrange, type="n", xlab="Year",  ylab="Hours Per Week", bty="l", xaxt="n", yaxt="n")

lines(workTS)

axis(1, at=seq(1970,2000,5), labels=format(seq(1970,2000,5)))

axis(2, at=seq(34.5,38,0.5), las=2)

1(a). I would expect to see a positive lag-1 autocorrelation because the series shows a trend, generally decreasing over the majority of the years, so consecutive values tend to fall on the same side of the series mean.

1(b). For part b, I computed the autocorrelation and generated an ACF plot:

Acf(workTS, lag.max = 1)

workACF <- Acf(workTS)

workACF
## 
## Autocorrelations of series 'workTS', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.928  0.839  0.752  0.665  0.571  0.473  0.369  0.265  0.164 
##     10     11     12     13     14     15 
##  0.047 -0.082 -0.185 -0.261 -0.310 -0.346

As I guessed in 1(a), there is a positive lag-1 autocorrelation (0.928).
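To make the link between a trend and a positive lag-1 autocorrelation concrete, here is a minimal sketch on simulated data (not the Canadian work hours series). The series `x`, its slope, and its noise level are assumptions chosen only for illustration. The hand calculation uses the standard sample autocorrelation formula and is compared with base R's `acf()`.

```r
# Hypothetical downward-trending series, loosely shaped like the work hours data
set.seed(1)
n <- 35
x <- 38 - 0.08 * (1:n) + rnorm(n, sd = 0.1)

# Lag-1 sample autocorrelation computed by hand
m <- mean(x)
r1_manual <- sum((x[-1] - m) * (x[-n] - m)) / sum((x - m)^2)

# Same quantity from base R's acf(); element 1 is lag 0, element 2 is lag 1
r1_acf <- acf(x, lag.max = 1, plot = FALSE)$acf[2]

print(c(manual = r1_manual, acf = r1_acf))
```

Because neighbouring values in a trending series sit on the same side of the mean, the products in the numerator are mostly positive, which is what pushes the lag-1 coefficient toward 1.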

Question 2

2(a). I again pulled in the Walmart stock data and recreated the first plot in the text for question 2. Then I differenced the series using diff().

WalMart <- read.csv("~/MBA678/WalMartStock.csv", header = TRUE, stringsAsFactors = FALSE)

walmartts <- ts(WalMart$Close, frequency = 1)
yrange <- range(walmartts)

plot(c(0, 248), c(44,60), type="n", xlab="Time",  ylab="WalMart Closing Stock Price", bty="l", xaxt="n", yaxt="n")
axis(2, at=seq(44,60,2), labels=format(seq(44,60,2)), las=0)

lines(walmartts)

walmartdiff <- diff(walmartts, lag = 1)

plot(walmartdiff, bty = "l", ylab="lag-1 diff", main = "Plot of Differenced Series")

2(b). The following are relevant for testing whether the stock is a random walk:

  * The AR(1) slope coefficient for the closing price series.
  * The autocorrelations of the differenced series.
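The second check can be sketched on simulated data. The random walk `rw` below is hypothetical (it is not the Walmart series), and its length of 248 is an assumption taken from the x-axis range of the plot above. For a true random walk, the lag-1 differences are just the random steps, so their autocorrelations should mostly fall inside the approximate 95% bounds for white noise.

```r
# Hypothetical random walk: each value is the previous value plus a random step
set.seed(123)
n <- 248
steps <- rnorm(n)
rw <- cumsum(steps)

# Lag-1 differencing recovers the random steps (all but the first)
d <- diff(rw, lag = 1)

# Autocorrelations of the differenced series at lags 1 to 10
r <- acf(d, lag.max = 10, plot = FALSE)$acf[-1]

# Approximate 95% bounds for the autocorrelations of white noise
bound <- qnorm(0.975) / sqrt(length(d))
print(round(r, 3))
print(bound)
```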

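2(c). I first recreated the AR(1) model output and then calculated the p-value in two ways, using the t distribution and the normal distribution. Both test the null hypothesis that the AR(1) slope coefficient equals 1.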

fitstock <- Arima(walmartts, order = c(1,0,0))
fitstock
## Series: walmartts 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1  intercept
##       0.9558    52.9497
## s.e.  0.0187     1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13
2*pt(-abs((1 - fitstock$coef["ar1"]) / 0.0187), df=length(walmartts)-1)
##        ar1 
## 0.01896261
2*pnorm(-abs((1 - fitstock$coef["ar1"]) / 0.0187))
##        ar1 
## 0.01818593

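Using an alpha of 0.01, both of these calculated p-values are greater than alpha (0.0190 and 0.0182, respectively), so we cannot reject the null hypothesis that the slope coefficient equals 1. At this significance level, the series is consistent with a random walk.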
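The arithmetic behind the test can be laid out step by step. This is a sketch using the rounded estimate and standard error from the printed output, so the results differ slightly in the later decimals from the values above. The series length `n_obs` is an assumption taken from the plot's x-axis range.

```r
# Values copied from the printed AR(1) output above (rounded)
ar1_est <- 0.9558
ar1_se  <- 0.0187
n_obs   <- 248  # assumed series length, taken from the plot's x-axis range

# Test statistic for H0: slope = 1 (random walk) vs. H1: slope != 1
t_stat <- (1 - ar1_est) / ar1_se

# Two-sided p-values under the t and normal reference distributions
p_t    <- 2 * pt(-abs(t_stat), df = n_obs - 1)
p_norm <- 2 * pnorm(-abs(t_stat))

alpha <- 0.01
print(c(t_stat = t_stat, p_t = p_t, p_norm = p_norm))
print(c(reject_t = p_t < alpha, reject_norm = p_norm < alpha))
```

If the fitted object exposes its coefficient covariance matrix as `var.coef` (as objects returned by `stats::arima` do), the standard error could be taken from `sqrt(diag(fitstock$var.coef))` rather than typed in by hand.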

2(d). Based on the results in 2(c), showing that a time series is a random walk, the following implications are true:

  1. It is impossible to obtain useful forecasts of the series.
  2. The series is random.

The changes in the series may seem random, but simply because we’ve found a series to be a random walk we cannot be certain that the changes are random. They may be linked to external factors (in this example, the stock prices are likely not entirely random, but are linked to markets).

Question 3

3(a). Part a of question three is the same as a portion of Assignment 5 - I’ve used almost identical code to create an output for this question.

souvensales <- read.csv("~/MBA678/SouvenirSales.csv",stringsAsFactors = FALSE)

salesTS <- ts(souvensales$Sales,start=c(1995,1),frequency=12)
validLength <- 12
trainLength <- length(salesTS) - validLength
souvensalesTrain <- window(salesTS,end=c(1995,trainLength))
souvensalesValid <- window(salesTS, start=c(1995,trainLength+1))

logsales <- tslm(log(souvensalesTrain) ~ trend + season)
summary(logsales)
## 
## Call:
## tslm(formula = log(souvensalesTrain) ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:  0.9424, Adjusted R-squared:  0.9306 
## F-statistic:  80.4 on 12 and 59 DF,  p-value: < 2.2e-16
febforecast <- logsales$coefficients["(Intercept)"] + logsales$coefficients["trend"]*86 + logsales$coefficients["season2"]

exp(febforecast)
## (Intercept) 
##    17062.99

I calculated a forecasted value of $17062.99 for February 2002 sales (trend index 86, counting months from January 1995).
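As a check on the trend index and the back-transformation, the forecast can be rebuilt from the printed coefficients. This is a sketch: the coefficients are the rounded values from the summary above, so the result matches the reported figure only to within rounding.

```r
# Coefficients copied from the printed regression summary above
intercept <- 7.646363
trend     <- 0.021120
season2   <- 0.282015  # February effect relative to January

# Trend index: months counted from January 1995 (index 1)
target_year  <- 2002
target_month <- 2
trend_index  <- (target_year - 1995) * 12 + target_month

# Forecast on the log scale, then back-transform to dollars
log_forecast <- intercept + trend * trend_index + season2
forecast_dollars <- exp(log_forecast)
print(c(trend_index = trend_index, forecast = forecast_dollars))
```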

3(b). i.

salesACF <- Acf(logsales$residuals, lag.max = 15)

ARModel <- Arima(logsales$residuals, order = c(2,0,0))
ARModel
## Series: logsales$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2  intercept
##       0.3072  0.3687    -0.0025
## s.e.  0.1090  0.1102     0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95

You can see that the autocorrelations at lag-1 and lag-2 are much larger than the other autocorrelations plotted.

ARModel$coef["ar1"] / 0.1090
##      ar1 
## 2.818482
ARModel$coef["ar2"] / 0.1102
##      ar2 
## 3.346186

I then calculated the p-value for the lag-2 coefficient of the AR(2) model:

2 * pnorm(-abs(ARModel$coef["ar2"] / 0.1102))
##          ar2 
## 0.0008193151
Acf(ARModel$residuals)

The p-value is statistically significant, and the second ACF plot, produced after the AR(2) model is applied, shows that the model accounts for the autocorrelation in the residuals: all of the autocorrelations fall within the significance bounds.
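Both AR coefficients can be tested in one step. The sketch below uses the rounded estimates and standard errors from the printed AR(2) output and a normal reference distribution, which is the same approximation used for the lag-2 p-value above.

```r
# Estimates and standard errors copied from the printed AR(2) output above
est <- c(ar1 = 0.3072, ar2 = 0.3687)
se  <- c(ar1 = 0.1090, ar2 = 0.1102)

# z-statistics for H0: coefficient = 0, with two-sided normal p-values
z <- est / se
p <- 2 * pnorm(-abs(z))
print(round(rbind(z = z, p = p), 4))
```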

3(b) ii.

I ran the two models above and combined the results to calculate the forecast for January 2001. Because the regression was fit to log(sales), I then exponentiated the combined forecast to get the dollar amount forecasted for that month.

JanForecast <- forecast(logsales,h=validLength)
ARForecast <- forecast(ARModel, h=validLength)

combined <- JanForecast$mean + ARForecast$mean
combined[1]
## [1] 9.295979
exp(combined[1])
## [1] 10894.13

The forecast for January 2001 using both the regression and the AR(2) model from 3(b) i is $10894.13.
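The two-level approach can also be sketched with base R alone. Everything in this example is an assumption for illustration: the simulated `log_sales` series, its coefficients, and the use of `lm()` and `stats::arima()` in place of `tslm()` and `Arima()`. It shows only the mechanics: forecast the log series with the regression, forecast the regression residuals with the AR(2) model, add the two on the log scale, and exponentiate.

```r
set.seed(42)
n <- 72                                   # six years of monthly data
t_idx <- 1:n
month <- factor(rep(1:12, length.out = n))

# Hypothetical log-sales: trend + monthly effects + AR(2) noise
noise <- as.numeric(arima.sim(list(ar = c(0.3, 0.3)), n = n, sd = 0.1))
log_sales <- 7.6 + 0.02 * t_idx + 0.1 * as.integer(month) + noise

# Level 1: regression with trend and season on the log scale
reg <- lm(log_sales ~ t_idx + month)

# Level 2: AR(2) model fit to the regression residuals
ar2 <- arima(residuals(reg), order = c(2, 0, 0), method = "ML")

# One-step-ahead forecasts from each level (the next month is a January)
new_obs <- data.frame(t_idx = n + 1, month = factor(1, levels = levels(month)))
level1 <- as.numeric(predict(reg, newdata = new_obs))
level2 <- as.numeric(predict(ar2, n.ahead = 1)$pred)

# Combine on the log scale, then exponentiate to return to dollars
combined_log <- level1 + level2
forecast_dollars <- exp(combined_log)
print(c(log = combined_log, dollars = forecast_dollars))
```

Adding the forecasts before exponentiating matters: the AR(2) model describes errors on the log scale, so its forecast is an adjustment to the log forecast, not to the dollar amount.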

Question 4

4(a). Based on the plot in the text, which I’ve recreated below, I would guess that the lag-4 autocorrelation would have the largest coefficient, due to the quarterly seasonality.

4(b).

appliance <- read.csv("~/MBA678/ApplianceShipments.csv", stringsAsFactors = FALSE)
appliancets <- ts(appliance$Shipments, start=c(1985,1), frequency = 4)
plot(appliancets, xlab="Year",  ylab="Appliance Shipments")

Acf(appliancets)

As predicted in 4(a), lag-4 has the greatest coefficient.
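The reason quarterly seasonality shows up at lag 4 can be seen on a toy series. The series below is hypothetical (not the appliance shipments data): a fixed four-quarter pattern repeated for five years with no noise, so the result is fully deterministic.

```r
# Hypothetical quarterly series: the same four-quarter pattern repeated for five years
pattern <- c(10, 20, 15, 5)
toy <- ts(rep(pattern, times = 5), start = c(1985, 1), frequency = 4)

# Autocorrelations at lags 1 to 8 (element 1 of $acf is lag 0, so drop it)
r <- acf(toy, lag.max = 8, plot = FALSE)$acf[-1]
names(r) <- paste0("lag", 1:8)
print(round(r, 3))

# The largest autocorrelation sits at lag 4, one full seasonal cycle
largest_lag <- which.max(r)
print(largest_lag)
```

At lag 4 every value is paired with the same quarter one year earlier, so the deviations from the mean line up exactly; at the other lags they are paired with different quarters and largely cancel.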