knitr::opts_chunk$set(echo = TRUE)

Assignment 7

Problem 2

Forecasting Wal-Mart Stock

Figure 7.10 shows a time plot of Wal-Mart daily closing prices between February 2001 and February 2002. The data is available at finance.yahoo.com and in WalMartStock.xls. The ACF plots of these daily closing prices and its lag-1 differenced series are in Figure 7.11. Table 7.4 shows the output from fitting an AR(1) model to the series of closing prices and to the series of differences. Use all the information to answer the following questions.

(a) Create a time plot of the differenced series.

# Plot differenced series 
plot(diff(WMPrice.ts, lag=1), main="\n Wal-Mart Closing Price \n Lag=1 Differenced Series \n ")
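Here is what `diff(..., lag = 1)` computes. The sketch below runs it on a toy series of made-up closing prices, not the Wal-Mart data. Each element of the differenced series is y_t - y_{t-1}, so the result is one observation shorter than the input.

```r
# Toy closing prices (hypothetical values, not the Wal-Mart data)
prices <- ts(c(50.0, 52.0, 51.0, 55.0))

# Lag-1 differences: y_t - y_{t-1}; the result has one fewer observation
changes <- diff(prices, lag = 1)
print(changes)
```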

(b) Which of the following is/are relevant for testing whether this stock is a random walk?

. The autocorrelations of the closing price series
. The AR(1) slope coefficient for the closing price series
. The AR(1) constant coefficient for the closing price series
. The autocorrelations of the differenced series
. The AR(1) slope coefficient for the differenced series
. The AR(1) constant coefficient for the differenced series

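Two of these are relevant for testing whether the stock closing price is a random walk. The first is the AR(1) slope coefficient for the closing price series, because a random walk implies a slope of 1. The second is the autocorrelations of the differenced series, because a random walk implies that the differences are uncorrelated.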
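A small simulation makes the two criteria concrete. The sketch below uses a simulated random walk, not the Wal-Mart data; the seed and the series length are arbitrary choices. It regresses each value on the previous one with `lm()` to get an AR(1)-style slope, then computes the lag-1 autocorrelation of the differences.

```r
# Simulated random walk: each value is the previous value plus a random step
set.seed(123)
n     <- 500
steps <- rnorm(n)
rw    <- cumsum(steps)

# AR(1)-style regression of y_t on y_{t-1}; for a random walk the slope should be close to 1
ar1_fit <- lm(rw[-1] ~ rw[-n])
slope   <- unname(coef(ar1_fit)[2])

# Lag-1 autocorrelation of the differenced series; for a random walk it should be near 0
diff_acf1 <- acf(diff(rw), lag.max = 1, plot = FALSE)$acf[2]

print(c(slope = slope, diff_acf1 = diff_acf1))
```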

(c) Recreate the AR(1) model output for the Close price series shown in the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.

# Call ARIMA() to get the estimated coefficients and statistical significance
fit<- Arima(WMPrice.ts, order=c(1,0,0))
fit
## Series: WMPrice.ts 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##         ar1  mean
##       0.956  53.0
## s.e.  0.019   1.3
## 
## sigma^2 estimated as 0.981:  log likelihood=-350
## AIC=706   AICc=706   BIC=716

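The AR(1) model output shows a slope coefficient of 0.956 with a standard error of 0.019. A random walk corresponds to a slope of exactly 1, so we test H0: slope = 1. The estimate lies (1 - 0.956)/0.019 ≈ 2.3 standard errors below 1. That is beyond the usual cutoff of about 2, so we reject the null hypothesis at the 5% level, which suggests that this may not be a random walk. The ACF plot of the closing prices also shows strong positive autocorrelation at lags 1 through 9. A random walk produces the same slowly decaying pattern, however, so that plot alone does not settle the question.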
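The test of the slope against 1 can be written out explicitly. The sketch below uses the rounded estimates printed above and a two-sided normal-approximation p-value, which is reasonable for roughly a year of daily trading data. This is the simple t-test approach. A formal unit-root test such as the augmented Dickey-Fuller test uses stricter critical values, so treat the result as suggestive rather than conclusive.

```r
# Estimates reported in the ARIMA(1,0,0) output above (rounded).
# With the fitted object, coef(fit)["ar1"] and sqrt(diag(fit$var.coef))["ar1"] give unrounded values.
ar1 <- 0.956
se  <- 0.019

# t-statistic for H0: slope = 1 (random walk) against H1: slope != 1
t_stat <- (ar1 - 1) / se

# Two-sided p-value using the normal approximation
p_val <- 2 * pnorm(-abs(t_stat))

round(c(t_stat = t_stat, p_value = p_val), 3)
```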

(d) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below.

. It is impossible to obtain useful forecasts of the series.

. The series is random.

. The changes in the series from one period to the other are random.

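Only the third statement is correct. In a random walk, each value is the previous value plus a random change. The changes from one period to the next are therefore random, but the series itself is not, because consecutive values are strongly correlated. Useful forecasts are still possible. The best available forecast is the naive forecast (the most recent value), and for a random walk without drift no model can improve on it.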
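A short simulation shows why the naive forecast is the benchmark for a random walk. The sketch below uses a simulated series with an arbitrary seed. As a contrast, it uses a hypothetical alternative forecaster that predicts the running mean of all past values. It then compares the mean squared one-step-ahead error of the two methods.

```r
# Simulated random walk (illustration only, not the Wal-Mart data)
set.seed(42)
y <- cumsum(rnorm(1000))
n <- length(y)

# Naive one-step-ahead forecast: next value = current value
naive_err <- y[2:n] - y[1:(n - 1)]

# Hypothetical alternative: forecast with the running mean of all past values
running_mean <- cumsum(y)[1:(n - 1)] / seq_len(n - 1)
mean_err     <- y[2:n] - running_mean

# Mean squared one-step-ahead error for each method
mse <- c(naive = mean(naive_err^2), running_mean = mean(mean_err^2))
print(mse)
```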

Problem 3

Souvenir Sales: The file SouvenirSales.xls contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, between 1995 and 2001.

Back in 2001, the store wanted to use the data to forecast sales for the next 12 months (year 2002). They hired an analyst to generate forecasts. The analyst first partitioned the data into training and validation periods, with the validation set containing the last 12 months of data (year 2001). She then fit a regression model to sales, using the training period.

(a) Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.

#  Create a time series
s_sales.ts<-ts(s_sales$`Sales`, start=1995, frequency=12)
#  Create training and validation periods
validLen <- 12
trainLen <- length(s_sales.ts) - validLen
s_sales_train <- window(s_sales.ts, end=c(1995, trainLen))
s_sales_valid <- window(s_sales.ts, start=c(1995, trainLen+1))
#  Fit a linear trend with seasonality model with a logarithm to the training set
s_salesLogtrain<- tslm(log(s_sales_train) ~ trend + season)
summary(s_salesLogtrain)
## 
## Call:
## tslm(formula = log(s_sales_train) ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.64636    0.08412   90.90  < 2e-16 ***
## trend        0.02112    0.00109   19.45  < 2e-16 ***
## season2      0.28201    0.10903    2.59  0.01218 *  
## season3      0.69500    0.10904    6.37  3.1e-08 ***
## season4      0.37387    0.10907    3.43  0.00112 ** 
## season5      0.42171    0.10911    3.87  0.00028 ***
## season6      0.44705    0.10916    4.10  0.00013 ***
## season7      0.58338    0.10922    5.34  1.6e-06 ***
## season8      0.54690    0.10929    5.00  5.4e-06 ***
## season9      0.63557    0.10937    5.81  2.7e-07 ***
## season10     0.72949    0.10946    6.66  1.0e-08 ***
## season11     1.20095    0.10956   10.96  7.4e-16 ***
## season12     1.95220    0.10968   17.80  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.19 on 59 degrees of freedom
## Multiple R-squared:  0.942,  Adjusted R-squared:  0.931 
## F-statistic: 80.4 on 12 and 59 DF,  p-value: <2e-16
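The training/validation split relies on `window()` accepting a period count larger than the frequency. For example, `c(1995, 72)` means the 72nd month counted from January 1995. The sketch below checks this on a dummy monthly series of the same length (84 months, 1995 to 2001). The series values are placeholders, not the souvenir sales.

```r
# Dummy monthly series covering Jan 1995 to Dec 2001 (84 months)
y <- ts(1:84, start = 1995, frequency = 12)

validLen <- 12
trainLen <- length(y) - validLen   # 72 months

# Same window() calls as in the analysis above
train <- window(y, end = c(1995, trainLen))
valid <- window(y, start = c(1995, trainLen + 1))

print(end(train))    # last training month
print(start(valid))  # first validation month
```

The training period therefore ends in December 2000, which matches the 59 residual degrees of freedom above (72 observations minus 13 coefficients).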

February 2002 forecast

#  Forecast February 2002
feb_forecast<- s_salesLogtrain$coefficients["(Intercept)"] + s_salesLogtrain$coefficients["trend"]*86 +  s_salesLogtrain$coefficients["season2"]
exp(feb_forecast)
## (Intercept) 
##       17063
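The value 86 is the trend index for February 2002. The training period covers months 1 to 72 (January 1995 to December 2000), and 2001 adds months 73 to 84. January 2002 is therefore month 85 and February 2002 is month 86. January is the baseline season, so only `season2` is added. The sketch below repeats the calculation with the rounded coefficients from the summary. With the fitted model, the 14th point forecast of `forecast(s_salesLogtrain, h = 14)` should give the same log-scale value.

```r
# Rounded coefficients from the summary(s_salesLogtrain) output above
b0      <- 7.64636   # intercept (January baseline)
b_trend <- 0.02112   # monthly trend on the log scale
b_feb   <- 0.28201   # season2 (February) effect

# Trend index: 72 training months + 12 validation months + 2 months into 2002
t_feb2002 <- 72 + 12 + 2

log_fc   <- b0 + b_trend * t_feb2002 + b_feb
sales_fc <- exp(log_fc)   # back-transform to the original sales scale
round(sales_fc)
```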

(b) Create an ACF plot up to lag 15 for the forecast errors. Now fit an AR model with lag 2 [ARIMA(2,0,0)] to the forecast errors.

#  Create an ACF plot of the residuals
resSales<- Acf(exp(s_salesLogtrain$residuals), lag.max=15)

resSales
## 
## Autocorrelations of series 'exp(s_salesLogtrain$residuals)', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.470  0.489  0.185  0.081  0.125 -0.004  0.030  0.095  0.038 
##     10     11     12     13     14     15 
##  0.168 -0.041  0.010 -0.030 -0.076 -0.020
# Call ARIMA() to get the estimated coefficients and statistical significance
ARModel<- Arima(s_salesLogtrain$residuals, order=c(2,0,0))
ARModel
## Series: s_salesLogtrain$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##        ar1   ar2    mean
##       0.31  0.37  -0.003
## s.e.  0.11  0.11   0.049
## 
## sigma^2 estimated as 0.0205:  log likelihood=39
## AIC=-70   AICc=-69   BIC=-61

i. Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?

ii. Use the autocorrelation information to compute a forecast for January 2002, using the regression model and the AR(2) model above.

# Create validation forecast using the log regression model
lrForecast <- forecast(s_salesLogtrain, h=validLen)
lrForecast
##          Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001            9.2   8.9   9.5   8.8   9.6
## Feb 2001            9.5   9.2   9.8   9.1   9.9
## Mar 2001            9.9   9.7  10.2   9.5  10.3
## Apr 2001            9.6   9.4   9.9   9.2  10.0
## May 2001            9.7   9.4  10.0   9.3  10.1
## Jun 2001            9.7   9.5  10.0   9.3  10.2
## Jul 2001            9.9   9.6  10.2   9.5  10.3
## Aug 2001            9.9   9.6  10.2   9.5  10.3
## Sep 2001           10.0   9.7  10.3   9.6  10.4
## Oct 2001           10.1   9.8  10.4   9.7  10.5
## Nov 2001           10.6  10.3  10.9  10.2  11.0
## Dec 2001           11.4  11.1  11.6  11.0  11.8
# Forecast the errors using the AR(2) model of the residuals
erForecast <- forecast(ARModel, h=validLen)
erForecast
##          Point Forecast  Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001         0.1079 -0.076  0.29 -0.17  0.39
## Feb 2001         0.0986 -0.093  0.29 -0.20  0.39
## Mar 2001         0.0692 -0.141  0.28 -0.25  0.39
## Apr 2001         0.0568 -0.158  0.27 -0.27  0.39
## May 2001         0.0422 -0.178  0.26 -0.29  0.38
## Jun 2001         0.0331 -0.189  0.26 -0.31  0.37
## Jul 2001         0.0249 -0.199  0.25 -0.32  0.37
## Aug 2001         0.0190 -0.206  0.24 -0.32  0.36
## Sep 2001         0.0142 -0.211  0.24 -0.33  0.36
## Oct 2001         0.0106 -0.215  0.24 -0.33  0.36
## Nov 2001         0.0077 -0.218  0.23 -0.34  0.35
## Dec 2001         0.0054 -0.220  0.23 -0.34  0.35
# Create adjusted forecast using exp of the log forecast
adjForecast <- exp(lrForecast$mean) + exp(erForecast$mean)
adjForecast
##        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov
## 2001  9781 13244 20443 15145 16226 16997 19895 19592 21866 24531 40146
##        Dec
## 2001 86910
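The regression was fit to log(Sales), so its residuals and the AR(2) forecasts of those residuals are on the log scale. The two forecasts should therefore be added before back-transforming, as `exp(lrForecast$mean + erForecast$mean)`, rather than by adding `exp()` of each part. Because `exp()` of an error forecast near zero is about 1, the sum above leaves the regression forecast essentially unchanged.

The sketch below applies the log-scale combination to January 2001. It then extends the AR(2) recursion one more step to January 2002 (month 85, which is 13 months past the training period), as part ii asks. It uses the rounded coefficients printed above, so the results are approximate. With the fitted objects, the January 2002 value would come from `h = 13` in both `forecast()` calls.

```r
# Rounded estimates printed above
b0      <- 7.64636          # regression intercept (January baseline, no season term)
b_trend <- 0.02112          # regression trend coefficient (log scale)
phi     <- c(0.31, 0.37)    # AR(2) coefficients of the residuals
mu      <- -0.003           # AR(2) mean

# Log-scale regression forecast for a January with trend index t
log_reg_jan <- function(t) b0 + b_trend * t

# January 2001 (t = 73): add the h = 1 error forecast on the log scale, then exponentiate
jan2001 <- exp(log_reg_jan(73) + 0.1079)

# January 2002 (t = 85): one more AR(2) step from the Nov and Dec 2001 error forecasts
e_nov   <- 0.0077
e_dec   <- 0.0054
e_jan   <- mu + phi[1] * (e_dec - mu) + phi[2] * (e_nov - mu)
jan2002 <- exp(log_reg_jan(85) + e_jan)

round(c(jan2001 = jan2001, jan2002 = jan2002))
```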

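The ACF plot shows statistically significant autocorrelation in the forecast errors at lags 1 and 2 (0.47 and 0.49). Both AR(2) coefficients are also statistically significant (0.31 and 0.37, each with a standard error of 0.11). The regression forecast errors are therefore not independent, which means the regression model has not captured all of the structure in the series. Forecast accuracy can be improved by adding an AR(2) model of the errors to the log-scale regression forecast.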
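The significance claims can be checked against the printed numbers. The sketch below uses the approximate 95% bound ±1.96/√n that the ACF plot draws, with n = 72 training residuals. It also computes the ratio of each AR(2) coefficient to its standard error. The values are copied from the output above.

```r
# Printed autocorrelations of the residual series, lags 1 to 15
acf_vals <- c(0.470, 0.489, 0.185, 0.081, 0.125, -0.004, 0.030, 0.095,
              0.038, 0.168, -0.041, 0.010, -0.030, -0.076, -0.020)

n        <- 72                        # residuals from the 72 training months
bound    <- qnorm(0.975) / sqrt(n)    # approximate 95% bound drawn on the ACF plot
sig_lags <- which(abs(acf_vals) > bound)

# t-ratios of the AR(2) coefficients (estimate / standard error)
t_ratios <- c(ar1 = 0.31, ar2 = 0.37) / c(0.11, 0.11)

list(bound = round(bound, 3), significant_lags = sig_lags, t_ratios = round(t_ratios, 2))
```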

Problem 4

Shipments of Household Appliances: The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985 and 1989. The series is plotted in Figure 7.13.

(a) If we compute the autocorrelation for the series, which lag (>0) is most likely to have the largest coefficient (in absolute value)?

I would expect lag 4 to have the largest absolute autocorrelation, because the series appears to have a seasonal pattern that repeats every four quarters (once a year). Shipments in a given quarter should therefore be most closely correlated with shipments in the same quarter of the previous year, and lags that are multiples of 4 should show the highest values.
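The reasoning can be illustrated on a hypothetical quarterly series rather than the appliance data. The sketch below uses an assumed seasonal pattern with one strong quarter each year and small noise, over the same length of five years (20 quarters). It then finds the lag with the largest absolute sample autocorrelation.

```r
# Hypothetical quarterly series: one strong quarter each year plus small noise
set.seed(7)
pattern <- c(9, -3, -3, -3)                                 # seasonal effect per quarter (sums to 0)
y <- 100 + rep(pattern, times = 5) + rnorm(20, sd = 0.5)    # 5 years = 20 quarters

# Sample autocorrelations for lags 1 to 8 (drop lag 0)
r <- acf(y, lag.max = 8, plot = FALSE)$acf[-1]
print(round(r, 2))

best_lag <- which.max(abs(r))   # lag with the largest absolute autocorrelation
best_lag
```

Lag 8, the next multiple of 4, is also large but smaller than lag 4, because fewer pairs of observations are 8 quarters apart.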

(b) Create an ACF plot and compare it with your answer.

#  Compute and plot the ACF of the shipments series
test<-Acf(hhappl.ts)

test
## 
## Autocorrelations of series 'hhappl.ts', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.261 -0.098  0.164  0.387 -0.030 -0.269  0.081  0.086 -0.168 
##     10     11     12     13 
## -0.325 -0.019  0.047 -0.096

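The ACF plot and values confirm that lag 4 has the largest absolute autocorrelation (0.387). However, none of the autocorrelations is statistically significant. With 20 quarterly observations, the approximate 95% bounds are about ±1.96/√20 ≈ ±0.44, and no lag exceeds them. The autocorrelations therefore offer little additional information for forecasting this series, although with so few observations the test has limited power to detect a real seasonal correlation.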
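Both parts of this comparison can be checked from the printed values. The sketch below assumes n = 20 quarterly observations (1985 Q1 to 1989 Q4), which is consistent with the 13 lags that `Acf()` reported by default. It finds the lag with the largest absolute autocorrelation and tests each value against the approximate 95% bound.

```r
# Printed autocorrelations of hhappl.ts, lags 1 to 13
acf_vals <- c(0.261, -0.098, 0.164, 0.387, -0.030, -0.269, 0.081,
              0.086, -0.168, -0.325, -0.019, 0.047, -0.096)

n     <- 20                        # assumed: quarterly data, 1985 Q1 to 1989 Q4
bound <- qnorm(0.975) / sqrt(n)    # approximate 95% significance bound

largest_lag <- which.max(abs(acf_vals))     # lag with the largest |autocorrelation|
any_sig     <- any(abs(acf_vals) > bound)   # does any lag exceed the bound?

print(list(bound = round(bound, 3), largest_lag = largest_lag, any_significant = any_sig))
```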