I would expect the lag-1 autocorrelation to be positive, because consecutive entries in this series tend to be moving in the same direction; stickiness seems to be relatively high.
setwd("~/Documents/MBA 678/Unit 5")
getwd()
## [1] "/Users/Josh/Documents/MBA 678/Unit 5"
library(forecast)
library(knitr)
Canadian <- read.csv("Canadian_Wrk_Hrs.csv", stringsAsFactors = FALSE)
CanadianTS <- ts(Canadian$HoursPerWeek, start=c(1966), frequency=1)
yrange <- range(CanadianTS)
plot(c(1966, 2000), c(34.5,38), type="n", xlab="Year", ylab="Canadian Manufacturing Worker Work-Hours", bty="l", xaxt="n", yaxt="n")
axis(1, at=seq(1966,2000,2), labels=format(seq(1966,2000,2)))
# Add the y-axis
axis(2, at=seq(34.5, 38, .5), labels=format(seq(34.5, 38, .5)), las=0)
lines(CanadianTS)
Acf(CanadianTS, lag.max=1)
CanadianACF <-Acf(CanadianTS)
CanadianACF
##
## Autocorrelations of series 'CanadianTS', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.928 0.839 0.752 0.665 0.571 0.473 0.369 0.265 0.164
## 10 11 12 13 14 15
## 0.047 -0.082 -0.185 -0.261 -0.310 -0.346
As predicted, we see a significantly positive lag-1 autocorrelation coming from the time series, with a positive .928 coefficient.
WalMart <- read.csv('WalMart_Stock.csv', header = TRUE, stringsAsFactors = FALSE)
Walmart.ts <- ts(WalMart$Close, frequency = 1)
yrange <- range(Walmart.ts)
plot(c(0, 248), c(44,60), type="n", xlab="Closing Period", ylab="WalMart Closing Stock Price", bty="l", xaxt="n", yaxt="n")
axis(1, at=seq(0,248,10), labels=format(seq(0,248,10)))
# Add the y-axis
axis(2, at=seq(44,60,10), labels=format(seq(44,60,10)), las=0)
#Print the base time series
lines(Walmart.ts)
title(main = "WalMart Closing Stock Price")
diffWM.ts <- diff(Walmart.ts, lag = 1)
yarnage <- range(diffWM.ts)
plot(c(0, 248), c(-4,4), type="n", xlab="Closing Period", ylab="Lag-1 Diff", bty="l", xaxt="n", yaxt="n")
axis(1, at=seq(0,248,10), labels=format(seq(0,248,10)))
# Add the y-axis
axis(2, at=seq(-4,4,1), labels=format(seq(-4,4,1)), las=0)
#Print the lag-1 time series
lines(diffWM.ts)
title(main = "WalMart Lag-1 Differenced Series")
The following tests / steps are relevant for determining if the series is a random walk.
1.The AR(1) slope coefficient for the closing price series.
2.We’ll also need the standard error from the AR(1) model so we can compute the p-value for the model. A p-value less than our select alpha suggests the series is not a random walk, while a p-value exceeding alpha suggests the series is a random walk.
For this example, let’s apply an alpha of 0.01, which means that p-values lower than the alpha do not suggest a random walk, while p-values higher than alpha suggest a random walk.
# Call Arima() with order=c(1,0,0) is the same as an AR(1) model
Walmart.fit <- Arima(Walmart.ts, order=c(1,0,0))
Walmart.fit
## Series: Walmart.ts
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.9558 52.9497
## s.e. 0.0187 1.3280
##
## sigma^2 estimated as 0.9815: log likelihood=-349.8
## AIC=705.59 AICc=705.69 BIC=716.13
# Calculate the two-tailed p-value from a t-distribution
2*pt(-abs((1 - Walmart.fit$coef["ar1"]) / 0.0187), df=length(Walmart.ts)-1)
## ar1
## 0.01896261
# Now calculate it using the normal distribution
2*pnorm(-abs((1 - Walmart.fit$coef["ar1"]) / 0.0187))
## ar1
## 0.01818593
In this example, both calculated p-values are greater than alpha, suggesting the series is a random walk.
Our findings in 2c suggest:
It is impossible to obtain useful forecasts for the series that would be more predictive than applying the naive forecast.
Changes from one value to the next in this series are random.
We cannot say that “the series [itself] is random,” because it reflects actual market activity. While its behavior cannot be predicted from itself, it’s plausible that another related series could have a predictable impact on our series of interest.
#Read the csv
sales <- read.csv("Souvenir_Sls.csv", header = TRUE, stringsAsFactors = FALSE)
#Create Time Series Object
sales.ts <- ts(sales$Sales, start = c(1995,1), frequency = 12)
# Set the length of the validation period to 12 months (one year)
validLength <- 12
# Set the length of the training period to everything else
trainLength <- length(sales.ts) - validLength
# Partition the data into training and validation periods
SalesTrain <- window(sales.ts, start=c(1995,1), end=c(1995, trainLength))
SalesValid <- window(sales.ts, start=c(1995,trainLength+1), end=c(1995,trainLength+validLength))
yrange <- range(sales.ts)
sales.Expo <- tslm(SalesTrain ~ trend + season, lambda=0)
summary(sales.Expo)
##
## Call:
## tslm(formula = SalesTrain ~ trend + season, lambda = 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2e+10 on 12 and 59 DF, p-value: < 2.2e-16
plot(c(1995,2002), c(0,120000), type="n", xlab="Year", ylab="Souvenir Sales", bty="l", xaxt="n", yaxt="n")
#Add the time series base
lines(sales.ts, bty="l", lwd = 2)
# Add the x-axis
axis(1, at=seq(1995,2002,2), labels=format(seq(1995,2002,2)))
# Add the y-axis
axis(2, at=seq(0,120000, 20000), labels=format(seq(0,120000,20000)), las=0)
abline(v=2001)
arrows(1995, 110000, 2001, 110000, code=3, length=0.1)
text(1998, 115000, "Training")
abline(v=2002)
arrows(2001, 110000, 2002, 110000, code=3, length=0.1)
text(2001.5, 115000, "Validation")
lines(sales.ts)
lines(sales.Expo$fitted.values, col='blue')
sales.fcst <-forecast(sales.Expo, h=validLength)
lines(sales.fcst$mean, col='red')
sales.fcst
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9780.022 7459.322 12822.72 6437.463 14858.16
## Feb 2001 13243.095 10100.643 17363.21 8716.947 20119.38
## Mar 2001 20441.749 15591.130 26801.46 13455.287 31055.83
## Apr 2001 15143.541 11550.133 19854.91 9967.870 23006.60
## May 2001 16224.628 12374.689 21272.34 10679.469 24649.03
## Jun 2001 16996.137 12963.127 22283.87 11187.297 25821.13
## Jul 2001 19894.424 15173.679 26083.86 13095.024 30224.31
## Aug 2001 19591.112 14942.340 25686.18 12895.376 29763.51
## Sep 2001 21864.492 16676.271 28666.84 14391.774 33217.31
## Oct 2001 24530.299 18709.509 32162.02 16146.477 37267.30
## Nov 2001 40144.775 30618.828 52634.38 26424.329 60989.36
## Dec 2001 86908.868 66286.278 113947.44 57205.664 132035.03
feb2002 <-forecast(sales.Expo, h=14)
feb2002
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9780.022 7459.322 12822.72 6437.463 14858.16
## Feb 2001 13243.095 10100.643 17363.21 8716.947 20119.38
## Mar 2001 20441.749 15591.130 26801.46 13455.287 31055.83
## Apr 2001 15143.541 11550.133 19854.91 9967.870 23006.60
## May 2001 16224.628 12374.689 21272.34 10679.469 24649.03
## Jun 2001 16996.137 12963.127 22283.87 11187.297 25821.13
## Jul 2001 19894.424 15173.679 26083.86 13095.024 30224.31
## Aug 2001 19591.112 14942.340 25686.18 12895.376 29763.51
## Sep 2001 21864.492 16676.271 28666.84 14391.774 33217.31
## Oct 2001 24530.299 18709.509 32162.02 16146.477 37267.30
## Nov 2001 40144.775 30618.828 52634.38 26424.329 60989.36
## Dec 2001 86908.868 66286.278 113947.44 57205.664 132035.03
## Jan 2002 12601.017 9570.837 16590.57 8240.964 19267.84
## Feb 2002 17062.994 12959.838 22465.23 11159.062 26090.52
Though we probably did more than we strictly needed to, we have arrived at a point forecast for February 2002 of $17,062. We accomplished this by training the model on the training period only, then asking the model to forecast to h=14, i.e., the validation period of 2001 + 2 months, to arrive at a point forecast for Feb 2002.
#Create ACF plot
sales.resACF <- Acf(sales.Expo$residuals, lag.max = 15)
#Create AR(2) model
sales.ARModel <- Arima(sales.Expo$residuals, order=c(2,0,0))
sales.ARModel
## Series: sales.Expo$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.3072 0.3687 -0.0025
## s.e. 0.1090 0.1102 0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
See console output above for AR (2) model and ACF plot.
# Calculate the t statistics for each: coefficient / s.e.
# Rough rule is anything > 2 (or < -2) is statistically significant
sales.ARModel$coef["ar1"] / 0.1090
## ar1
## 2.818482
sales.ARModel$coef["ar2"] / 0.1102
## ar2
## 3.346186
# Now estimate p-value based on the normal distribution
2*pnorm(-abs(sales.ARModel$coef["ar1"] / 0.1090))
## ar1
## 0.004825136
2*pnorm(-abs(sales.ARModel$coef["ar2"] / 0.1102))
## ar2
## 0.0008193151
Acf(sales.ARModel$residuals)
From the first ACF plot, we note significant positive AR(1) and AR(2) correlation, suggesting we could add value to the forecast by accounting for autocorrelation in the residuals.
From the calculated p-values, we find that both the AR(1) and AR(2) terms are statistically significant. We could improve our previous sales forecast by accounting for autocorrelation in the model’s residuals in a second-layer model, as follows in 3bii, below.
From the second ACF plot, we can validate that the AR(2) model accounts for the series’ autoregression.
sales.errorForecast <- forecast(sales.ARModel, h=validLength)
sales.errorForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001 0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001 0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001 0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001 0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001 0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001 0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001 0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001 0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001 0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001 0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001 0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
#Just for a reasonablity check, it looks to me like we may need to log-transform these point forecasts back to the original scale.
# Create the adjusted forecast by adding the two forecasts together
sales.converrfcst <-exp(sales.errorForecast$mean)
sales.converrfcst
## Jan Feb Mar Apr May Jun Jul
## 2001 1.113916 1.103571 1.071699 1.058445 1.043073 1.033642 1.025216
## Aug Sep Oct Nov Dec
## 2001 1.019221 1.014321 1.010632 1.007709 1.005461
sales.adjustedForecast <- sales.fcst$mean + sales.converrfcst
sales.fcst
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9780.022 7459.322 12822.72 6437.463 14858.16
## Feb 2001 13243.095 10100.643 17363.21 8716.947 20119.38
## Mar 2001 20441.749 15591.130 26801.46 13455.287 31055.83
## Apr 2001 15143.541 11550.133 19854.91 9967.870 23006.60
## May 2001 16224.628 12374.689 21272.34 10679.469 24649.03
## Jun 2001 16996.137 12963.127 22283.87 11187.297 25821.13
## Jul 2001 19894.424 15173.679 26083.86 13095.024 30224.31
## Aug 2001 19591.112 14942.340 25686.18 12895.376 29763.51
## Sep 2001 21864.492 16676.271 28666.84 14391.774 33217.31
## Oct 2001 24530.299 18709.509 32162.02 16146.477 37267.30
## Nov 2001 40144.775 30618.828 52634.38 26424.329 60989.36
## Dec 2001 86908.868 66286.278 113947.44 57205.664 132035.03
plot(SalesValid)
lines(sales.adjustedForecast, col='red')
lines(sales.fcst$mean, col='blue')
sales.adjustedForecast
## Jan Feb Mar Apr May Jun Jul
## 2001 9781.136 13244.198 20442.820 15144.599 16225.671 16997.171 19895.449
## Aug Sep Oct Nov Dec
## 2001 19592.131 21865.507 24531.310 40145.783 86909.874
Based on the two-layer forecast we created above, we have a point forecast for January 2001 of $9,781.
The appliance shipments time series is quarterly. Though imperfect, we can see some evidence of seasonality in the plot printed in the book. Other patterns are not as strong based on the visual evidence - for example “stickiness” is not apparent in this series. For these reasons I expect lag-4 to have the highest autocorrelation.
appliance <- read.csv('appliance.csv', header=TRUE, stringsAsFactors = FALSE)
appliance.ts <- ts(appliance$Shipments, start=c(1985,1), freq = 4)
plot(appliance.ts)
Acf(appliance.ts)
As predicted, the lag-4 autocorrelation has the largest coefficient. Note, that none of the lagged correlations are significant, indicating that the base series is not strongly autocorrelated.