The data is available at finance.yahoo.com and in WalMartStock.csv. The ACF plots of the daily closing prices and of their lag-1 differenced series are in Figure 7.11. Table 7.4 shows the output from fitting an AR(1) model to the series of closing prices and to the series of differences. Use all of this information to answer the following questions.
a.) Create a time plot of the differenced series.
library(forecast)
wmstock <- read.csv("WalMartStock.csv", header = TRUE, stringsAsFactors = FALSE)
# Daily closing prices; frequency = 248 trading days in the year of data
wmstockTS <- ts(wmstock$Close, start = c(2001), frequency = 248)
# Plot the closing-price series with month labels on the x-axis
plot(wmstockTS, xlab = "Months", ylab = "Stock Closing Prices", bty = "l", xaxt = "n", yaxt = "n", main = "Walmart Stock Prices")
axis(1, at = seq(2001, 2001 + 11/12, 1/12), labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2, at = seq(45, 60, 5), labels = format(seq(45, 60, 5)), las = 2)
box(which = "plot")
# ACF of the closing prices and of the lag-1 differenced series (Figure 7.11)
Acf(wmstockTS, lag.max = 10)
Acf(diff(wmstockTS, lag = 1), lag.max = 10)
# Part a: time plot of the differenced series
plot(diff(wmstockTS, lag = 1), main = "Time Plot of Walmart Differenced Series")
b.) Which of the following is/are relevant for testing whether this stock is a random walk?
* The AR(1) slope coefficient for the closing price series: a random walk has a slope coefficient equal to 1.
* The autocorrelations of the differenced series: for a random walk, the autocorrelations of the differenced series are close to 0 at lags 1, 2, 3, etc.
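Concretely, the slope criterion is checked with a t-statistic that compares the estimated AR(1) coefficient to 1 (this is what the code in part c computes):
\[ t = \frac{1 - \hat{\beta}_1}{s.e.(\hat{\beta}_1)} \]
If \(|t|\) is small (equivalently, the two-sided p-value exceeds \(\alpha\)), we cannot reject the null hypothesis that the slope equals 1, i.e., that the series is a random walk.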
c.) Recreate the AR(1) model output for the Close price series shown in the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.
# Fit an AR(1) model to the closing-price series
fit <- Arima(wmstockTS, order=c(1,0,0))
fit
## Series: wmstockTS
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.9558 52.9497
## s.e. 0.0187 1.3280
##
## sigma^2 estimated as 0.9815: log likelihood=-349.8
## AIC=705.59 AICc=705.69 BIC=716.13
# Two-sided p-value for H0: slope = 1 (t-distribution); 0.0187 is the reported s.e. of ar1
2*pt(-abs((1 - fit$coef["ar1"]) / 0.0187), df=length(wmstockTS)-1)
## ar1
## 0.01896261
# The same p-value, using the normal distribution
2*pnorm(-abs((1-fit$coef["ar1"])/0.0187))
## ar1
## 0.01818593
If we let \(\alpha\) = 0.01, this series would be considered a random walk. The slope coefficient (0.9558) is close to 1, with a standard error of 0.0187, giving \(t = (1 - 0.9558)/0.0187 \approx 2.36\) and a two-sided p-value of about 0.019. Since 0.019 > 0.01, we cannot reject the null hypothesis that the slope equals 1, so the series is consistent with a random walk. This is not surprising; if I were able to accurately predict stock prices, I would most likely not be working on this report right now :)
d.) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below.
* It is impossible to obtain useful forecasts of the series.
* The changes in the series from one period to the next are random.
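As a practical illustration (my addition, not part of the original exercise): if the series is a random walk, the best point forecast at any horizon is simply the most recent observed value, i.e., a naive forecast. A minimal sketch using forecast::naive:
# Naive forecast: every point forecast equals the last observed closing price
naiveFit <- naive(wmstockTS, h = 5)
naiveFit$mean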
The next exercise uses monthly sales data for a souvenir shop (SouvenirSales.csv). Back in 2001, the store wanted to use the data to forecast sales for the next 12 months (year 2002). They hired an analyst to generate forecasts. The analyst first partitioned the data into training and validation periods, with the validation set containing the last 12 months of data (year 2001). She then fit a regression model to sales, using the training period.
a.) Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.
souvsales <- read.csv("SouvenirSales.csv")
souvsalesTS <- ts(souvsales$Sales,start=c(1995,1),frequency=12)
yrange = range(souvsalesTS)
plot(c(1995,2001),yrange,type="n",xlab="Year",ylab="Sales (thousands of Australian dollars)",bty="l",xaxt="n",yaxt="n")
lines(souvsalesTS,bty="l")
axis(1,at=seq(1995,2002,1),labels=format(seq(1995,2002,1)))
axis(2,at=seq(0,110000,10000),labels=format(seq(0,110,10)),las=2)
box(which = "plot")
# Training period: first 72 months (Jan 1995 - Dec 2000); validation: the 12 months of 2001
validLength <- 12
trainLength <- length(souvsalesTS) - validLength
souvsalesTrain <- window(souvsalesTS, end = c(1995, trainLength))
souvsalesValid <- window(souvsalesTS, start = c(1995, trainLength + 1))
souvsalesLogLinearSeason <- tslm(log(souvsalesTrain) ~ trend + season)
summary(souvsalesLogLinearSeason)
##
## Call:
## tslm(formula = log(souvsalesTrain) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9306
## F-statistic: 80.4 on 12 and 59 DF, p-value: < 2.2e-16
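One way to read the seasonal coefficients (my own illustration, not part of the original write-up): because the response is log(Sales), each coefficient is a multiplicative effect relative to January, the baseline month. For example:
# December sales are roughly exp(1.952202) = 7.04 times January sales, net of trend
exp(coef(souvsalesLogLinearSeason)["season12"])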
souvsalesLogLinearSeasonForecast <- forecast(souvsalesLogLinearSeason, h = validLength)
# Feb 2002 is period t = 86 (72 training months + 12 validation months + 2)
feb2002Forecast <- souvsalesLogLinearSeason$coefficients["(Intercept)"] + souvsalesLogLinearSeason$coefficients["trend"]*86 + souvsalesLogLinearSeason$coefficients["season2"]
exp(feb2002Forecast)
## (Intercept)
## 17062.99
Exponentiating the log-scale prediction, we see that the forecast for February 2002 is $17,062.99.
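As a sanity check (an addition to the original script), the same number can be obtained by letting forecast() extend the fitted model 14 periods past the end of the training set, since the 14th period is February 2002:
# Period 14 beyond Dec 2000 is Feb 2002
feb2002Check <- forecast(souvsalesLogLinearSeason, h = 14)$mean[14]
exp(feb2002Check)  # should match the hand-computed value above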
b.) Create an ACF plot up to lag-15 for the forecast errors. Now fit an AR model with lag-2 [ARIMA(2,0,0)] to the forecast errors.
salesACF <- Acf(souvsalesLogLinearSeason$residuals,lag.max=15)
salesACF
##
## Autocorrelations of series 'souvsalesLogLinearSeason$residuals', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.459 0.485 0.194 0.088 0.154 0.016 0.030 0.106 0.034
## 10 11 12 13 14 15
## 0.152 -0.055 -0.012 -0.047 -0.077 -0.023
# Fit an AR(2) model, ARIMA(2,0,0), to the regression residuals
ARModel <- Arima(souvsalesLogLinearSeason$residuals,order=c(2,0,0))
ARModel
## Series: souvsalesLogLinearSeason$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.3072 0.3687 -0.0025
## s.e. 0.1090 0.1102 0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
# Calculate the t statistics for each: coefficient / s.e.
# Rough rule is anything > 2 (or < -2) is statistically significant
ARModel$coef["ar1"] / sqrt(diag(vcov(ARModel)))["ar1"]
## ar1
## 2.819441
ARModel$coef["ar2"] / sqrt(diag(vcov(ARModel)))["ar2"]
## ar2
## 3.346371
# Now estimate p-value based on the normal distribution
2*pnorm(-abs(ARModel$coef["ar1"] / sqrt(diag(vcov(ARModel)))["ar1"]))
## ar1
## 0.004810743
2*pnorm(-abs(ARModel$coef["ar2"] / sqrt(diag(vcov(ARModel)))["ar1"]))
## ar2
## 0.0007139238
i.) Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?
Both t-statistics exceed 2 and the corresponding p-values are well below 0.01, so the AR(1) and AR(2) coefficients are statistically significant. Together with the large autocorrelations at lags 1 and 2 in the ACF plot, this tells us the regression's forecast errors are not random noise: the regression leaves systematic information in its residuals, which the AR(2) model can capture to improve the forecasts.
ii.) Use the autocorrelation information to compute a forecast for January 2002, using the regression model and the AR(2) model above.
lrForecast <- forecast(souvsalesLogLinearSeason,h=validLength)
lrForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9.188097 8.917220 9.458974 8.769890 9.606304
## Feb 2001 9.491232 9.220354 9.762109 9.073024 9.909439
## Mar 2001 9.925335 9.654457 10.196212 9.507127 10.343542
## Apr 2001 9.625329 9.354452 9.896207 9.207122 10.043537
## May 2001 9.694286 9.423408 9.965163 9.276078 10.112493
## Jun 2001 9.740741 9.469864 10.011619 9.322534 10.158949
## Jul 2001 9.898195 9.627318 10.169072 9.479988 10.316402
## Aug 2001 9.882831 9.611954 10.153708 9.464624 10.301038
## Sep 2001 9.992619 9.721742 10.263496 9.574412 10.410826
## Oct 2001 10.107664 9.836787 10.378542 9.689457 10.525872
## Nov 2001 10.600248 10.329370 10.871125 10.182040 11.018455
## Dec 2001 11.372615 11.101738 11.643493 10.954408 11.790823
# Forecast the error terms using the AR(2) model fit to the residuals
errorForecast <- forecast(ARModel,h=validLength)
errorForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001 0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001 0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001 0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001 0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001 0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001 0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001 0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001 0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001 0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001 0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001 0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
# Create the adjusted forecast by adding the two forecasts together (both on the log scale)
adjustedForecast <- lrForecast$mean + errorForecast$mean
adjustedForecast
## Jan Feb Mar Apr May Jun Jul
## 2001 9.295979 9.589783 9.994580 9.682130 9.736457 9.773830 9.923098
## Aug Sep Oct Nov Dec
## 2001 9.901870 10.006838 10.118240 10.607927 11.378062
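To express the improved first-month forecast in dollars (my arithmetic, not output reproduced from the original script), exponentiate the adjusted log-scale value:
# Improved January forecast: exp(9.188097 + 0.107882) = exp(9.295979), about $10,894
exp(adjustedForecast[1])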
plot(souvsalesValid,xlab="2001",ylab="Sales (thousands of Australian dollars)",bty="l",xaxt="n",yaxt="n")
axis(1, at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2,at=seq(0,110000,10000),labels=format(seq(0,110,10)),las=2)
lines(exp(adjustedForecast),col="blue")
legend(2001,110000, c("Actuals", "Adjusted Forecast"), lty=c(1,1), col=c("black", "blue"), bty="n")
box(which = "plot")
The final exercise uses quarterly appliance shipment data (ApplianceShipments.csv).
a.) If we compute the autocorrelation of the series, which lag (> 0) is most likely to have the largest coefficient (in absolute value)?
* Lag-4 would likely show the largest coefficient because the data are quarterly: the same quarter repeats every four observations.
b.) Create an ACF plot and compare it with your answer.
# Import the data and create a quarterly time series (1985 Q1 - 1989 Q4)
Appliance <- read.csv("ApplianceShipments.csv")
ApplianceTS <- ts(Appliance$Shipments, start = c(1985, 1), end = c(1989, 4), freq = 4)
# Create and print the ACF plot
AppACF <- Acf(ApplianceTS)
AppACF
##
## Autocorrelations of series 'ApplianceTS', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.261 -0.098 0.164 0.387 -0.030 -0.269 0.081 0.086 -0.168
## 10 11 12 13
## -0.325 -0.019 0.047 -0.096
As anticipated, lag-4 has the largest autocorrelation in absolute value (0.387), confirming the quarterly seasonality of the shipments series.
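A quick programmatic check (my addition; note that AppACF$acf stores the lag-0 value first, so it is dropped before searching):
# Lag (> 0) with the largest absolute autocorrelation; returns 4
which.max(abs(AppACF$acf[-1]))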