knitr::opts_chunk$set(echo = TRUE)
Assignment 7
Problem 2
Forecasting Wal-Mart Stock
Figure 7.10 shows a time plot of Wal-Mart daily closing prices between February 2001 and February 2002. The data is available at finance.yahoo.com and in WalMartStock.xls. The ACF plots of these daily closing prices and their lag-1 differenced series are in Figure 7.11. Table 7.4 shows the output from fitting an AR(1) model to the series of closing prices and to the series of differences. Use all the information to answer the following questions.
(a) Create a time plot of the differenced series.
# Plot differenced series
plot(diff(WMPrice.ts, lag=1), main="\n Wal-Mart Closing Price \n Lag=1 Differenced Series \n ")

(b) Which of the following is/are relevant for testing whether this stock is a random walk?
. The autocorrelations of the closing price series
. The AR(1) slope coefficient for the closing price series
. The AR(1) constant coefficient for the closing price series
. The autocorrelations of the differenced series
. The AR(1) slope coefficient for the differenced series
. The AR(1) constant coefficient for the differenced series
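Two items are relevant for testing whether the closing price is a random walk: the AR(1) slope coefficient for the closing price series and the autocorrelations of the differenced series. A random walk corresponds to an AR(1) slope of 1, and its lag-1 differences should show no autocorrelation.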
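The two checks can be tried on a simulated random walk. This is a minimal sketch on synthetic data, not the Wal-Mart series; the variable names are hypothetical, and an ordinary least-squares regression of each value on the previous one stands in for the `Arima()` fit used in this report.

```r
# Synthetic illustration only: a simulated random walk, not the Wal-Mart data
set.seed(1)
steps <- rnorm(500)            # random period-to-period changes
rw <- cumsum(steps)            # random walk: each value = previous value + step

# Check 1: AR(1) slope for the levels (OLS of y_t on y_{t-1}); expect a value near 1
n <- length(rw)
y_now  <- rw[2:n]
y_prev <- rw[1:(n - 1)]
fit_levels <- lm(y_now ~ y_prev)
slope <- unname(coef(fit_levels)["y_prev"])

# Check 2: autocorrelations of the lag-1 differenced series; expect values near 0
acf_diff <- acf(diff(rw), lag.max = 10, plot = FALSE)$acf[-1]   # drop lag 0
bound <- 1.96 / sqrt(n - 1)    # approximate 95% bound for a single lag

slope
round(acf_diff, 3)
bound
```

For a random walk the slope should sit close to 1 and most of the differenced autocorrelations should fall inside the bound.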
(c) Recreate the AR(1) model output for the Close price series shown in the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.
# Call Arima() (forecast package) to get the estimated coefficients and statistical significance
library(forecast)
fit <- Arima(WMPrice.ts, order=c(1,0,0))
fit
## Series: WMPrice.ts
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.956 53.0
## s.e. 0.019 1.3
##
## sigma^2 estimated as 0.981: log likelihood=-350
## AIC=706 AICc=706 BIC=716
The AR(1) model output shows a slope coefficient of 0.956 with a standard error of 0.019. The slope is close to 1 but lies about 2.3 standard errors below it, since (0.956 - 1)/0.019 is about -2.32, suggesting that this may not be a random walk. Additionally, the ACF plot for the closing price series shows autocorrelation through lag 9.
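The comparison of the slope with 1 can be written out from the reported output. This is a rough sketch. The normal approximation used for the p-value is a simplifying assumption, and formal unit-root tests use different critical values, so the result should be read as indicative only.

```r
# Values copied from the Arima() output above
slope <- 0.956
se    <- 0.019

# Test H0: slope = 1 (random walk), not the default H0: slope = 0
t_stat <- (slope - 1) / se
t_stat

# Approximate two-sided p-value (assumption: standard normal reference distribution)
p_value <- 2 * pnorm(-abs(t_stat))
p_value
```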
(d) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below.
. It is impossible to obtain useful forecasts of the series.
. The series is random.
. The changes in the series from one period to the other are random.
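Only the third statement is correct. If a series is a random walk, the changes from one period to the next are random, but the series itself is not: each value equals the previous value plus a random step. Forecasts can still be produced, although no forecast is expected to improve on the naive forecast, which uses the most recent observed value.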
Problem 3
Souvenir Sales: The file SouvenirSales.xls contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, between 1995 and 2001.
Back in 2001, the store wanted to use the data to forecast sales for the next 12 months (year 2002). They hired an analyst to generate forecasts. The analyst first partitioned the data into training and validation periods, with the validation set containing the last 12 months of data (year 2001). She then fit a regression model to sales, using the training period.
(a) Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.
# Create a time series
s_sales.ts <- ts(s_sales$`Sales`, start=1995, frequency=12)
# Create training and validation periods
validLen <- 12
trainLen <- length(s_sales.ts) - validLen
s_sales_train <- window(s_sales.ts, end=c(1995, trainLen))
s_sales_valid <- window(s_sales.ts, start=c(1995, trainLen+1))
# Fit a regression with linear trend and monthly seasonality to log(Sales) on the training set
s_salesLogtrain <- tslm(log(s_sales_train) ~ trend + season)
summary(s_salesLogtrain)
##
## Call:
## tslm(formula = log(s_sales_train) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.64636 0.08412 90.90 < 2e-16 ***
## trend 0.02112 0.00109 19.45 < 2e-16 ***
## season2 0.28201 0.10903 2.59 0.01218 *
## season3 0.69500 0.10904 6.37 3.1e-08 ***
## season4 0.37387 0.10907 3.43 0.00112 **
## season5 0.42171 0.10911 3.87 0.00028 ***
## season6 0.44705 0.10916 4.10 0.00013 ***
## season7 0.58338 0.10922 5.34 1.6e-06 ***
## season8 0.54690 0.10929 5.00 5.4e-06 ***
## season9 0.63557 0.10937 5.81 2.7e-07 ***
## season10 0.72949 0.10946 6.66 1.0e-08 ***
## season11 1.20095 0.10956 10.96 7.4e-16 ***
## season12 1.95220 0.10968 17.80 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.19 on 59 degrees of freedom
## Multiple R-squared: 0.942, Adjusted R-squared: 0.931
## F-statistic: 80.4 on 12 and 59 DF, p-value: <2e-16
February 2002 forecast
# Forecast February 2002
feb_forecast<- s_salesLogtrain$coefficients["(Intercept)"] + s_salesLogtrain$coefficients["trend"]*86 + s_salesLogtrain$coefficients["season2"]
exp(feb_forecast)
## (Intercept)
## 17063
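The trend index of 86 and the back-transformation can be checked by hand. This sketch uses the rounded coefficients from the regression summary above, so the result may differ slightly from the value computed from the fitted model; the variable names are hypothetical.

```r
# Trend index for February 2002, counting January 1995 as month 1
trend_index <- (2002 - 1995) * 12 + 2

# Rounded coefficients copied from the regression summary above
intercept <- 7.64636
trend     <- 0.02112
season2   <- 0.28201           # February effect relative to January

# Forecast on the log scale, then back-transform to sales
log_forecast   <- intercept + trend * trend_index + season2
forecast_sales <- exp(log_forecast)
round(forecast_sales)
```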
(b) Create an ACF plot until lag-15 for the forecast errors. Now fit an AR model with lag-2 [ARIMA(2,0,0)] to the forecast errors.
# Create an ACF plot of the residuals
resSales<- Acf(exp(s_salesLogtrain$residuals), lag.max=15)

resSales
##
## Autocorrelations of series 'exp(s_salesLogtrain$residuals)', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.470 0.489 0.185 0.081 0.125 -0.004 0.030 0.095 0.038
## 10 11 12 13 14 15
## 0.168 -0.041 0.010 -0.030 -0.076 -0.020
# Call Arima() to get the estimated coefficients and statistical significance
ARModel <- Arima(s_salesLogtrain$residuals, order=c(2,0,0))
ARModel
## Series: s_salesLogtrain$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.31 0.37 -0.003
## s.e. 0.11 0.11 0.049
##
## sigma^2 estimated as 0.0205: log likelihood=39
## AIC=-70 AICc=-69 BIC=-61
i. Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?
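The ACF plot shows statistically significant autocorrelation at lags 1 and 2, and both AR(2) coefficients (0.31 and 0.37, each with standard error 0.11) are more than two standard errors from zero. The regression forecasts therefore leave predictable structure in their errors, and forecast accuracy can be improved by adding an autoregressive model of the errors to the current log-scale regression forecast.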
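The significance claims can be checked from the reported numbers. This is a minimal sketch using the usual approximate 95% bound of 1.96/sqrt(n) for a single autocorrelation and a normal approximation for the coefficient ratios; both are approximations, and the variable names are hypothetical.

```r
# Training period length: 72 months (59 residual df + 13 coefficients)
n <- 72
acf_bound <- 1.96 / sqrt(n)    # approximate 95% bound for a single lag

# Lag-1 and lag-2 autocorrelations copied from the ACF output above
acf_lag1 <- 0.470
acf_lag2 <- 0.489
c(lag1 = acf_lag1 > acf_bound, lag2 = acf_lag2 > acf_bound)

# AR(2) coefficients and standard errors copied from the Arima() output above
coefs <- c(ar1 = 0.31, ar2 = 0.37)
ses   <- c(ar1 = 0.11, ar2 = 0.11)
z <- coefs / ses               # estimate divided by its standard error
z
abs(z) > 1.96                  # TRUE means significant at roughly the 5% level
```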
Problem 4
Shipments of Household Appliances: The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985 and 1989. The series is plotted in Figure 7.13.
(a) If we compute the autocorrelation for the series, which lag (>0) is most likely to have the largest coefficient (in absolute value)?
The ACF plot and values indicate that lag 4 does indeed have the largest absolute value, as expected for quarterly data with a yearly seasonal pattern. However, none of the lags is statistically significant, so autocorrelation offers no further information for improving the forecasts.
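The reason lag 4 is the natural candidate for quarterly data can be seen on a made-up series. This sketch uses a hypothetical four-quarter pattern repeated for five years, not the appliance shipments, and base R `acf()` rather than `Acf()`.

```r
# Hypothetical quarterly pattern repeated for 5 years (20 observations)
x <- rep(c(1, 3, 2, 5), times = 5)

# Autocorrelations at lags 1 to 8 (drop lag 0, which is always 1)
r <- acf(x, lag.max = 8, plot = FALSE)$acf[-1]
round(r, 3)

# Lag with the largest absolute autocorrelation
which.max(abs(r))
```

Because the same quarter recurs every four observations, values four periods apart move together, which pushes the lag-4 autocorrelation above the others.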