Chapter 7 Problem 2

(a) Create a time plot of the differenced series

Bring in the data and look at the head and tail

library(forecast)
## Warning: package 'forecast' was built under R version 3.4.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.3
walmart <- read.csv("WalMartStock.csv", stringsAsFactors = FALSE)
head(walmart)
##        Date Close
## 1  5-Feb-01 53.84
## 2  6-Feb-01 53.20
## 3  7-Feb-01 54.66
## 4  8-Feb-01 52.30
## 5  9-Feb-01 50.40
## 6 12-Feb-01 53.45
tail(walmart)
##          Date Close
## 243 28-Jan-02 58.63
## 244 29-Jan-02 57.91
## 245 30-Jan-02 59.75
## 246 31-Jan-02 59.98
## 247  1-Feb-02 59.26
## 248  4-Feb-02 58.90

Create the time series and plot it

# ts() uses only the first two elements of start, so mark Feb 5, 2001 as day 36 of 2001
walmartTS <- ts(walmart$Close, start=c(2001, 36), frequency = 365)
autoplot(walmartTS)

Plot the differenced series

autoplot(diff(walmartTS, lag = 1))

(b) Which of the following is/are relevant for testing whether the stock is a random walk?

The autocorrelation of the closing price series: No, we use the ACF of the differenced series.

The AR(1) slope coefficient for the closing price series: Yes, it is used to compute the test statistic.

The AR(1) constant coefficient for the closing price series: No, we use the slope coefficient instead.

The autocorrelations of the differenced series: Yes, we use the ACF of the differenced series to determine whether it is a random walk.

The AR(1) slope coefficient for the differenced series: No, we use the slope coefficient from the closing price series.

The AR(1) constant coefficient of the differenced series: No, we do not use this with the differenced series.

(c) Recreate the AR(1) model output for the Close price series shown on the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.

First we fit an AR(1) model and look at the coefficients

# Calling Arima() with order=c(1,0,0) fits an AR(1) model
fit <- Arima(walmartTS, order=c(1,0,0))
fit
## Series: walmartTS 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     mean
##       0.9558  52.9497
## s.e.  0.0187   1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13

To test whether the series is a random walk, we test the null hypothesis that the AR(1) slope coefficient equals 1. The test statistic is (1 - ar1) divided by its standard error, and we compute a two-tailed p-value from the t-distribution.

# Test H0: ar1 = 1 (random walk) against a two-sided alternative
tstat <- (1 - fit$coef["ar1"]) / sqrt(diag(vcov(fit)))["ar1"]
# Two-tailed p-value from a t-distribution
2*pt(-abs(tstat), df=length(walmartTS)-1)
##        ar1 
## 0.01889832

Let alpha be 0.01. At this significance level the p-value (0.019) exceeds alpha, so we fail to reject the null hypothesis that the slope equals 1, and we conclude that the series is consistent with a random walk.

(d) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below:

It is impossible to obtain useful forecasts of the series: No, it is not impossible, but no forecast will do better than the naive forecast (illustrated in the sketch below).

The series is random: No.

The changes in the series from one period to the next are random: Yes.
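
To illustrate the naive-forecast point, a minimal sketch using naive() from the forecast package (assuming the walmartTS series from part (a)):

# For a random walk, every point forecast equals the last observed value (58.90)
naiveFC <- naive(walmartTS, h = 5)
naiveFC$mean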

Chapter 7 Problem 3

(a) Run a regression model with log(Sales) as the output variable, with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.

Bring in the data and create the time series.

ssales <- read.csv("SouvenirSales.csv", stringsAsFactors = FALSE)

ssalesTS <- ts(ssales$Sales, start=c(1995,1), frequency = 12)
autoplot(ssalesTS)

Split into training and validation periods, then run the regression model

# Split into training (Jan 1995 - Dec 2000) and validation (2001) periods
validLength <- 12
trainLength <- length(ssalesTS) - validLength

# window() counts months from Jan 1995, so c(1995, trainLength) is Dec 2000
ssalesTrain <- window(ssalesTS, end=c(1995, trainLength))
ssalesValid <- window(ssalesTS, start=c(1995, trainLength+1))

# Fit the model to the training set
ssalesLinearSeasonal <- tslm(log(ssalesTrain) ~ trend + season)

# See the estimated regression equation
summary(ssalesLinearSeasonal)
## 
## Call:
## tslm(formula = log(ssalesTrain) ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:  0.9424, Adjusted R-squared:  0.9306 
## F-statistic:  80.4 on 12 and 59 DF,  p-value: < 2.2e-16

Forecast sales in February 2002

# Feb 2002 is period 86 counting from Jan 1995 (72 training months + 14)
febForecast <- ssalesLinearSeasonal$coefficients["(Intercept)"] + ssalesLinearSeasonal$coefficients["trend"]*86 + ssalesLinearSeasonal$coefficients["season2"]
# The model was fit to log(Sales), so exponentiate to get dollars
exp(febForecast)
## (Intercept) 
##    17062.99
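
As a cross-check, the same point forecast can be read from forecast(); a sketch, assuming h = 14 steps carry the model through the 12 validation months to February 2002:

# The 14th step ahead of the training period is Feb 2002
exp(forecast(ssalesLinearSeasonal, h = 14)$mean[14])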

(b) Create an ACF plot until lag-15 for the forecast errors. Now fit an AR model with lag-2 [ARIMA(2,0,0)] to the forecast errors.

Create the ACF plot

resACF <- Acf(ssalesLinearSeasonal$residuals, lag.max = 15)

Fit an AR(2) model to the forecast errors

ARModel <- Arima(ssalesLinearSeasonal$residuals, order=c(2,0,0))
ARModel
## Series: ssalesLinearSeasonal$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     mean
##       0.3072  0.3687  -0.0025
## s.e.  0.1090  0.1102   0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95

i. Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?

Compute the t-statistic for the ar2 coefficient by dividing the estimate by its standard error

# t-statistic for H0: ar2 = 0
ARModel$coef["ar2"] / sqrt(diag(vcov(ARModel)))["ar2"]
##      ar2 
## 3.346371

The t-statistic is greater than 2, so the ar2 coefficient is statistically significant and we reject the null hypothesis that it equals zero. Together with the significant autocorrelations in the ACF plot, this tells us the forecast errors are not white noise: the regression forecasts leave behind predictable structure, which the AR(2) model can capture to improve them.
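
For completeness, the corresponding two-tailed p-value, mirroring the calculation from Problem 2(c), comes out well below 0.05:

# Two-tailed p-value for the ar2 t-statistic
tstat <- ARModel$coef["ar2"] / sqrt(diag(vcov(ARModel)))["ar2"]
2*pt(-abs(tstat), df=length(ssalesLinearSeasonal$residuals)-1)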

ii. Use the autocorrelation information to compute a forecast for January 2002, using the regression model and AR(2) model above.

First, forecast January 2002 with the regression model

# h = 13 reaches Jan 2002 (12 validation months + 1)
lrForecast <- forecast(ssalesLinearSeasonal, h=13)
lrForecast
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9.188097  8.917220  9.458974  8.769890  9.606304
## Feb 2001       9.491232  9.220354  9.762109  9.073024  9.909439
## Mar 2001       9.925335  9.654457 10.196212  9.507127 10.343542
## Apr 2001       9.625329  9.354452  9.896207  9.207122 10.043537
## May 2001       9.694286  9.423408  9.965163  9.276078 10.112493
## Jun 2001       9.740741  9.469864 10.011619  9.322534 10.158949
## Jul 2001       9.898195  9.627318 10.169072  9.479988 10.316402
## Aug 2001       9.882831  9.611954 10.153708  9.464624 10.301038
## Sep 2001       9.992619  9.721742 10.263496  9.574412 10.410826
## Oct 2001      10.107664  9.836787 10.378542  9.689457 10.525872
## Nov 2001      10.600248 10.329370 10.871125 10.182040 11.018455
## Dec 2001      11.372615 11.101738 11.643493 10.954408 11.790823
## Jan 2002       9.441533  9.166476  9.716590  9.016873  9.866193

Generate a forecast of the error terms with the AR(2) model

errorForecast <- forecast(ARModel, h=13)
errorForecast
##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Jan 2001    0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001    0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001    0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001    0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001    0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001    0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001    0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001    0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001    0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001    0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001    0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001    0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
## Jan 2002    0.003692175 -0.22217346 0.2295578 -0.3417395 0.3491239

Combine the two to get the January 2002 forecast. Both forecasts are on the log(Sales) scale, so we add them together first and then exponentiate to convert back to real values.

# Add the log-scale forecasts, then convert back to the original scale
exp(9.441533 + 0.003692175)
## [1] 12647.63
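
The same figure can be computed without hard-coding the numbers, using the two forecast objects above:

# Element 13 of each forecast corresponds to Jan 2002
exp(lrForecast$mean[13] + errorForecast$mean[13])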

Chapter 7 Problem 4

(a) If we compute the autocorrelation of the series, which lag (>0) is most likely to have the largest coefficient (in absolute value)?

I would expect lag-4 to have the largest coefficient since the data are quarterly.

(b) Create an ACF plot and compare it with your answer.

Bring in data, create time series, and plot

appship <- read.csv("ApplianceShipments.csv", stringsAsFactors = FALSE)

appshipTS <- ts(appship$Shipments, start=c(1985,1), frequency = 4)
autoplot(appshipTS)

Create ACF plot

Acf(appshipTS)

The ACF plot confirms that the largest coefficient is with lag-4.
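
The coefficients behind the plot can also be inspected directly, a quick sketch:

# Return the autocorrelations without plotting; lag 4 has the largest magnitude
Acf(appshipTS, plot = FALSE)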