The data is available at finance.yahoo.com and in WalMartStock.csv. The ACF plots of the daily closing prices and of their lag-1 differenced series are in Figure 7.11. Table 7.4 shows the output from fitting an AR(1) model to the series of closing prices and to the series of differences. Use all of this information to answer the following questions.
a.) Create a time plot of the differenced series.
library(forecast)
wmstock <- read.csv("WalMartStock.csv", header = TRUE, stringsAsFactors = FALSE)
# Daily closing prices; frequency = 248 trading days in the year of data
wmstockTS <- ts(wmstock$Close, start = c(2001), frequency = 248)
# Plot the closing-price series with month labels on the x-axis
plot(wmstockTS, xlab = "Months", ylab = "Stock Closing Prices", bty = "l", xaxt = "n", yaxt = "n", main = "Walmart Stock Prices")
axis(1, at = seq(2001, 2001 + 11/12, 1/12), labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2, at = seq(45, 60, 5), labels = format(seq(45, 60, 5)), las = 2)
box(which = "plot")
# ACF of the closing prices and of the lag-1 differenced series (Figure 7.11)
Acf(wmstockTS, lag.max = 10)
Acf(diff(wmstockTS, lag = 1), lag.max = 10)
# Part a: time plot of the differenced series
plot(diff(wmstockTS, lag = 1), main = "Time Plot of Walmart Differenced Series")
b.) Which of the following is/are relevant for testing whether this stock is a random walk?
* The AR(1) slope coefficient for the closing price series: a random walk has a slope coefficient equal to 1.
* The autocorrelations of the differenced series: for a random walk, the autocorrelations of the differenced series are close to 0 at lags 1, 2, 3, etc.
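Concretely, the slope criterion is checked with a t-statistic that compares the estimated AR(1) coefficient to 1 (this is what the code in part c computes):
\[ t = \frac{1 - \hat{\beta}_1}{s.e.(\hat{\beta}_1)} \]
If \(|t|\) is small (equivalently, the two-sided p-value exceeds \(\alpha\)), we cannot reject the null hypothesis that the slope equals 1, i.e., that the series is a random walk.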
c.) Recreate the AR(1) model output for the Close price series shown in the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.
# Fit an AR(1) model to the closing-price series
fit <- Arima(wmstockTS, order=c(1,0,0))
fit
## Series: wmstockTS
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.9558 52.9497
## s.e. 0.0187 1.3280
##
## sigma^2 estimated as 0.9815: log likelihood=-349.8
## AIC=705.59 AICc=705.69 BIC=716.13
# Two-sided p-value for H0: slope = 1 (t-distribution); 0.0187 is the reported s.e. of ar1
2*pt(-abs((1 - fit$coef["ar1"]) / 0.0187), df=length(wmstockTS)-1)
## ar1
## 0.01896261
# The same p-value, using the normal distribution
2*pnorm(-abs((1-fit$coef["ar1"])/0.0187))
## ar1
## 0.01818593
If we let \(\alpha\) = 0.01, this series would be considered a random walk. The slope coefficient (0.9558) is close to 1, with a standard error of 0.0187, giving \(t = (1 - 0.9558)/0.0187 \approx 2.36\) and a two-sided p-value of about 0.019. Since 0.019 > 0.01, we cannot reject the null hypothesis that the slope equals 1, so the series is consistent with a random walk. This is not surprising; if I were able to accurately predict stock prices, I would most likely not be working on this report right now :)
d.) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below.
* It is impossible to obtain useful forecasts of the series.
* The changes in the series from one period to the next are random.
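As a practical illustration (my addition, not part of the original exercise): if the series is a random walk, the best point forecast at any horizon is simply the most recent observed value, i.e., a naive forecast. A minimal sketch using forecast::naive:
# Naive forecast: every point forecast equals the last observed closing price
naiveFit <- naive(wmstockTS, h = 5)
naiveFit$mean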
The next exercise uses monthly sales data for a souvenir shop (SouvenirSales.csv). Back in 2001, the store wanted to use the data to forecast sales for the next 12 months (year 2002). They hired an analyst to generate forecasts. The analyst first partitioned the data into training and validation periods, with the validation set containing the last 12 months of data (year 2001). She then fit a regression model to sales, using the training period.
a.) Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.
souvsales <- read.csv("SouvenirSales.csv")
souvsalesTS <- ts(souvsales$Sales,start=c(1995,1),frequency=12)
yrange = range(souvsalesTS)
plot(c(1995,2001),yrange,type="n",xlab="Year",ylab="Sales (thousands of Australian dollars)",bty="l",xaxt="n",yaxt="n")
lines(souvsalesTS,bty="l")
axis(1,at=seq(1995,2002,1),labels=format(seq(1995,2002,1)))
axis(2,at=seq(0,110000,10000),labels=format(seq(0,110,10)),las=2)
box(which = "plot")
# Training period: first 72 months (Jan 1995 - Dec 2000); validation: the 12 months of 2001
validLength <- 12
trainLength <- length(souvsalesTS) - validLength
souvsalesTrain <- window(souvsalesTS, end = c(1995, trainLength))
souvsalesValid <- window(souvsalesTS, start = c(1995, trainLength + 1))
souvsalesLogLinearSeason <- tslm(log(souvsalesTrain) ~ trend + season)
summary(souvsalesLogLinearSeason)
##
## Call:
## tslm(formula = log(souvsalesTrain) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9306
## F-statistic: 80.4 on 12 and 59 DF, p-value: < 2.2e-16
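One way to read the seasonal coefficients (my own illustration, not part of the original write-up): because the response is log(Sales), each coefficient is a multiplicative effect relative to January, the baseline month. For example:
# December sales are roughly exp(1.952202) = 7.04 times January sales, net of trend
exp(coef(souvsalesLogLinearSeason)["season12"])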
souvsalesLogLinearSeasonForecast <- forecast(souvsalesLogLinearSeason, h = validLength)
# Feb 2002 is period t = 86 (72 training months + 12 validation months + 2)
feb2002Forecast <- souvsalesLogLinearSeason$coefficients["(Intercept)"] + souvsalesLogLinearSeason$coefficients["trend"]*86 + souvsalesLogLinearSeason$coefficients["season2"]
exp(feb2002Forecast)
## (Intercept)
## 17062.99
Exponentiating the log-scale prediction, we see that the forecast for February 2002 is $17,062.99.
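As a sanity check (an addition to the original script), the same number can be obtained by letting forecast() extend the fitted model 14 periods past the end of the training set, since the 14th period is February 2002:
# Period 14 beyond Dec 2000 is Feb 2002
feb2002Check <- forecast(souvsalesLogLinearSeason, h = 14)$mean[14]
exp(feb2002Check)  # should match the hand-computed value above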
b.) Create an ACF plot up to lag-15 for the forecast errors. Now fit an AR model with lag-2 [ARIMA(2,0,0)] to the forecast errors.
salesACF <- Acf(souvsalesLogLinearSeason$residuals,lag.max=15)
salesACF
##
## Autocorrelations of series 'souvsalesLogLinearSeason$residuals', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.459 0.485 0.194 0.088 0.154 0.016 0.030 0.106 0.034
## 10 11 12 13 14 15
## 0.152 -0.055 -0.012 -0.047 -0.077 -0.023
# Fit an AR(2) model, ARIMA(2,0,0), to the regression residuals
ARModel <- Arima(souvsalesLogLinearSeason$residuals,order=c(2,0,0))
ARModel
## Series: souvsalesLogLinearSeason$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.3072 0.3687 -0.0025
## s.e. 0.1090 0.1102 0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
# Calculate the t statistics for each: coefficient / s.e.
# Rough rule is anything > 2 (or < -2) is statistically significant
ARModel$coef["ar1"] / sqrt(diag(vcov(ARModel)))["ar1"]
## ar1
## 2.819441
ARModel$coef["ar2"] / sqrt(diag(vcov(ARModel)))["ar2"]
## ar2
## 3.346371
# Now estimate p-value based on the normal distribution
2*pnorm(-abs(ARModel$coef["ar1"] / sqrt(diag(vcov(ARModel)))["ar1"]))
## ar1
## 0.004810743
2*pnorm(-abs(ARModel$coef["ar2"] / sqrt(diag(vcov(ARModel)))["ar1"]))
## ar2
## 0.0007139238
i.) Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?
Both t-statistics exceed 2 and the corresponding p-values are well below 0.01, so the AR(1) and AR(2) coefficients are statistically significant. Together with the large autocorrelations at lags 1 and 2 in the ACF plot, this tells us the regression's forecast errors are not random noise: the regression leaves systematic information in its residuals, which the AR(2) model can capture to improve the forecasts.
ii.) Use the autocorrelation information to compute a forecast for January 2002, using the regression model and the AR(2) model above.
lrForecast <- forecast(souvsalesLogLinearSeason,h=validLength)
lrForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9.188097 8.917220 9.458974 8.769890 9.606304
## Feb 2001 9.491232 9.220354 9.762109 9.073024 9.909439
## Mar 2001 9.925335 9.654457 10.196212 9.507127 10.343542
## Apr 2001 9.625329 9.354452 9.896207 9.207122 10.043537
## May 2001 9.694286 9.423408 9.965163 9.276078 10.112493
## Jun 2001 9.740741 9.469864 10.011619 9.322534 10.158949
## Jul 2001 9.898195 9.627318 10.169072 9.479988 10.316402
## Aug 2001 9.882831 9.611954 10.153708 9.464624 10.301038
## Sep 2001 9.992619 9.721742 10.263496 9.574412 10.410826
## Oct 2001 10.107664 9.836787 10.378542 9.689457 10.525872
## Nov 2001 10.600248 10.329370 10.871125 10.182040 11.018455
## Dec 2001 11.372615 11.101738 11.643493 10.954408 11.790823
# Forecast the error terms using the AR(2) model fit to the residuals
errorForecast <- forecast(ARModel,h=validLength)
errorForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001 0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001 0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001 0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001 0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001 0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001 0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001 0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001 0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001 0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001 0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001 0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
# Create the adjusted forecast by adding the two forecasts together (both on the log scale)
adjustedForecast <- lrForecast$mean + errorForecast$mean
adjustedForecast
## Jan Feb Mar Apr May Jun Jul
## 2001 9.295979 9.589783 9.994580 9.682130 9.736457 9.773830 9.923098
## Aug Sep Oct Nov Dec
## 2001 9.901870 10.006838 10.118240 10.607927 11.378062
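To express the improved first-month forecast in dollars (my arithmetic, not output reproduced from the original script), exponentiate the adjusted log-scale value:
# Improved January forecast: exp(9.188097 + 0.107882) = exp(9.295979), about $10,894
exp(adjustedForecast[1])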
plot(souvsalesValid,xlab="2001",ylab="Sales (thousands of Australian dollars)",bty="l",xaxt="n",yaxt="n")
axis(1, at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2,at=seq(0,110000,10000),labels=format(seq(0,110,10)),las=2)
lines(exp(adjustedForecast),col="blue")
legend(2001,110000, c("Actuals", "Adjusted Forecast"), lty=c(1,1), col=c("black", "blue"), bty="n")
box(which = "plot")
The final exercise uses quarterly appliance shipment data (ApplianceShipments.csv).
a.) If we compute the autocorrelation of the series, which lag (> 0) is most likely to have the largest coefficient (in absolute value)?
* Lag-4 would likely show the largest coefficient because the data are quarterly: the same quarter repeats every four observations.
b.) Create an ACF plot and compare it with your answer.
# Import the data and create a quarterly time series (1985 Q1 - 1989 Q4)
Appliance <- read.csv("ApplianceShipments.csv")
ApplianceTS <- ts(Appliance$Shipments, start = c(1985, 1), end = c(1989, 4), freq = 4)
# Create and print the ACF plot
AppACF <- Acf(ApplianceTS)
AppACF
##
## Autocorrelations of series 'ApplianceTS', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.261 -0.098 0.164 0.387 -0.030 -0.269 0.081 0.086 -0.168
## 10 11 12 13
## -0.325 -0.019 0.047 -0.096
As anticipated, lag-4 has the largest autocorrelation in absolute value (0.387), confirming the quarterly seasonality of the shipments series.
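A quick programmatic check (my addition; note that AppACF$acf stores the lag-0 value first, so it is dropped before searching):
# Lag (> 0) with the largest absolute autocorrelation; returns 4
which.max(abs(AppACF$acf[-1]))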