MBA 678 Assignment 6

Question 1

1a

I would expect the lag-1 autocorrelation to be positive, because consecutive entries in this series tend to be moving in the same direction; stickiness seems to be relatively high.

1b

setwd("~/Documents/MBA 678/Unit 5")
getwd()

## [1] "/Users/Josh/Documents/MBA 678/Unit 5"

library(forecast)
library(knitr)
Canadian <- read.csv("Canadian_Wrk_Hrs.csv", stringsAsFactors = FALSE)
CanadianTS <- ts(Canadian$HoursPerWeek, start=c(1966), frequency=1)
yrange <- range(CanadianTS)
plot(c(1966, 2000), c(34.5,38), type="n", xlab="Year",  ylab="Canadian Manufacturing Worker Work-Hours", bty="l", xaxt="n", yaxt="n")
axis(1, at=seq(1966,2000,2), labels=format(seq(1966,2000,2)))
# Add the y-axis
axis(2, at=seq(34.5, 38, .5), labels=format(seq(34.5, 38, .5)), las=0)
lines(CanadianTS)

Acf(CanadianTS, lag.max=1)

CanadianACF <-Acf(CanadianTS)

CanadianACF

## 
## Autocorrelations of series 'CanadianTS', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.928  0.839  0.752  0.665  0.571  0.473  0.369  0.265  0.164 
##     10     11     12     13     14     15 
##  0.047 -0.082 -0.185 -0.261 -0.310 -0.346

As predicted, we see a significantly positive lag-1 autocorrelation coming from the time series, with a positive .928 coefficient.

Question 2

2a

WalMart <- read.csv('WalMart_Stock.csv', header = TRUE, stringsAsFactors = FALSE)
Walmart.ts <- ts(WalMart$Close, frequency = 1)
yrange <- range(Walmart.ts)
plot(c(0, 248), c(44,60), type="n", xlab="Closing Period",  ylab="WalMart Closing Stock Price", bty="l", xaxt="n", yaxt="n")
axis(1, at=seq(0,248,10), labels=format(seq(0,248,10)))
# Add the y-axis
axis(2, at=seq(44,60,10), labels=format(seq(44,60,10)), las=0)
#Print the base time series
lines(Walmart.ts)
title(main = "WalMart Closing Stock Price")

diffWM.ts <- diff(Walmart.ts, lag = 1)
yarnage <- range(diffWM.ts)
plot(c(0, 248), c(-4,4), type="n", xlab="Closing Period",  ylab="Lag-1 Diff", bty="l", xaxt="n", yaxt="n")
axis(1, at=seq(0,248,10), labels=format(seq(0,248,10)))
# Add the y-axis
axis(2, at=seq(-4,4,1), labels=format(seq(-4,4,1)), las=0)
#Print the lag-1 time series
lines(diffWM.ts)
title(main = "WalMart Lag-1 Differenced Series")

2b

The following tests / steps are relevant for determining if the series is a random walk.

1.The AR(1) slope coefficient for the closing price series.

2.We’ll also need the standard error from the AR(1) model so we can compute the p-value for the model. A p-value less than our select alpha suggests the series is not a random walk, while a p-value exceeding alpha suggests the series is a random walk.

2c

For this example, let’s apply an alpha of 0.01, which means that p-values lower than the alpha do not suggest a random walk, while p-values higher than alpha suggest a random walk.

# Call Arima() with order=c(1,0,0) is the same as an AR(1) model
Walmart.fit <- Arima(Walmart.ts, order=c(1,0,0))
Walmart.fit

## Series: Walmart.ts 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     mean
##       0.9558  52.9497
## s.e.  0.0187   1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13

# Calculate the two-tailed p-value from a t-distribution
2*pt(-abs((1 - Walmart.fit$coef["ar1"]) / 0.0187), df=length(Walmart.ts)-1)

##        ar1 
## 0.01896261

# Now calculate it using the normal distribution
2*pnorm(-abs((1 - Walmart.fit$coef["ar1"]) / 0.0187))

##        ar1 
## 0.01818593

In this example, both calculated p-values are greater than alpha, suggesting the series is a random walk.

2d

Our findings in 2c suggest:

It is impossible to obtain useful forecasts for the series that would be more predictive than applying the naive forecast.
Changes from one value to the next in this series are random.
We cannot say that “the series [itself] is random,” because it reflects actual market activity. While its behavior cannot be predicted from itself, it’s plausible that another related series could have a predictable impact on our series of interest.

Question 3

3a

#Read the csv
sales <- read.csv("Souvenir_Sls.csv", header = TRUE, stringsAsFactors = FALSE)
#Create Time Series Object
sales.ts <- ts(sales$Sales, start = c(1995,1), frequency = 12)
# Set the length of the validation period to 12 months (one year)
validLength <- 12
# Set the length of the training period to everything else
trainLength <- length(sales.ts) - validLength
# Partition the data into training and validation periods
SalesTrain <- window(sales.ts, start=c(1995,1), end=c(1995, trainLength))
SalesValid <- window(sales.ts, start=c(1995,trainLength+1), end=c(1995,trainLength+validLength))
yrange <- range(sales.ts)
sales.Expo <- tslm(SalesTrain ~ trend + season, lambda=0)
summary(sales.Expo)

## 
## Call:
## tslm(formula = SalesTrain ~ trend + season, lambda = 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2e+10 on 12 and 59 DF,  p-value: < 2.2e-16

plot(c(1995,2002), c(0,120000), type="n", xlab="Year",  ylab="Souvenir Sales", bty="l", xaxt="n", yaxt="n")
#Add the time series base
lines(sales.ts, bty="l", lwd = 2)
# Add the x-axis
axis(1, at=seq(1995,2002,2), labels=format(seq(1995,2002,2)))
# Add the y-axis
axis(2, at=seq(0,120000, 20000), labels=format(seq(0,120000,20000)), las=0)
abline(v=2001)
arrows(1995, 110000, 2001, 110000, code=3, length=0.1)
text(1998, 115000, "Training")
abline(v=2002)
arrows(2001, 110000, 2002, 110000, code=3, length=0.1)
text(2001.5, 115000, "Validation")
lines(sales.ts)
lines(sales.Expo$fitted.values, col='blue')
sales.fcst <-forecast(sales.Expo, h=validLength)
lines(sales.fcst$mean, col='red')

sales.fcst

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9780.022  7459.322  12822.72  6437.463  14858.16
## Feb 2001      13243.095 10100.643  17363.21  8716.947  20119.38
## Mar 2001      20441.749 15591.130  26801.46 13455.287  31055.83
## Apr 2001      15143.541 11550.133  19854.91  9967.870  23006.60
## May 2001      16224.628 12374.689  21272.34 10679.469  24649.03
## Jun 2001      16996.137 12963.127  22283.87 11187.297  25821.13
## Jul 2001      19894.424 15173.679  26083.86 13095.024  30224.31
## Aug 2001      19591.112 14942.340  25686.18 12895.376  29763.51
## Sep 2001      21864.492 16676.271  28666.84 14391.774  33217.31
## Oct 2001      24530.299 18709.509  32162.02 16146.477  37267.30
## Nov 2001      40144.775 30618.828  52634.38 26424.329  60989.36
## Dec 2001      86908.868 66286.278 113947.44 57205.664 132035.03

feb2002 <-forecast(sales.Expo, h=14)
feb2002

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9780.022  7459.322  12822.72  6437.463  14858.16
## Feb 2001      13243.095 10100.643  17363.21  8716.947  20119.38
## Mar 2001      20441.749 15591.130  26801.46 13455.287  31055.83
## Apr 2001      15143.541 11550.133  19854.91  9967.870  23006.60
## May 2001      16224.628 12374.689  21272.34 10679.469  24649.03
## Jun 2001      16996.137 12963.127  22283.87 11187.297  25821.13
## Jul 2001      19894.424 15173.679  26083.86 13095.024  30224.31
## Aug 2001      19591.112 14942.340  25686.18 12895.376  29763.51
## Sep 2001      21864.492 16676.271  28666.84 14391.774  33217.31
## Oct 2001      24530.299 18709.509  32162.02 16146.477  37267.30
## Nov 2001      40144.775 30618.828  52634.38 26424.329  60989.36
## Dec 2001      86908.868 66286.278 113947.44 57205.664 132035.03
## Jan 2002      12601.017  9570.837  16590.57  8240.964  19267.84
## Feb 2002      17062.994 12959.838  22465.23 11159.062  26090.52

Though we probably did more than we strictly needed to, we have arrived at a point forecast for February 2002 of $17,062. We accomplished this by training the model on the training period only, then asking the model to forecast to h=14, i.e., the validation period of 2001 + 2 months, to arrive at a point forecast for Feb 2002.

3b

#Create ACF plot
sales.resACF <- Acf(sales.Expo$residuals, lag.max = 15)

#Create AR(2) model
sales.ARModel <- Arima(sales.Expo$residuals, order=c(2,0,0))
sales.ARModel

## Series: sales.Expo$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     mean
##       0.3072  0.3687  -0.0025
## s.e.  0.1090  0.1102   0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95

See console output above for AR (2) model and ACF plot.

3bi

# Calculate the t statistics for each: coefficient / s.e.
# Rough rule is anything > 2 (or < -2) is statistically significant
sales.ARModel$coef["ar1"] / 0.1090

##      ar1 
## 2.818482

sales.ARModel$coef["ar2"] / 0.1102

##      ar2 
## 3.346186

# Now estimate p-value based on the normal distribution
2*pnorm(-abs(sales.ARModel$coef["ar1"] / 0.1090))

##         ar1 
## 0.004825136

2*pnorm(-abs(sales.ARModel$coef["ar2"] / 0.1102))

##          ar2 
## 0.0008193151

Acf(sales.ARModel$residuals)

From the first ACF plot, we note significant positive AR(1) and AR(2) correlation, suggesting we could add value to the forecast by accounting for autocorrelation in the residuals.

From the calculated p-values, we find that both the AR(1) and AR(2) terms are statistically significant. We could improve our previous sales forecast by accounting for autocorrelation in the model’s residuals in a second-layer model, as follows in 3bii, below.

From the second ACF plot, we can validate that the AR(2) model accounts for the series’ autoregression.

3bii

sales.errorForecast <- forecast(sales.ARModel, h=validLength)
sales.errorForecast

##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Jan 2001    0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001    0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001    0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001    0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001    0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001    0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001    0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001    0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001    0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001    0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001    0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001    0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641

#Just for a reasonablity check, it looks to me like we may need to log-transform these point forecasts back to the original scale.  
# Create the adjusted forecast by adding the two forecasts together
sales.converrfcst <-exp(sales.errorForecast$mean)
sales.converrfcst

##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 2001 1.113916 1.103571 1.071699 1.058445 1.043073 1.033642 1.025216
##           Aug      Sep      Oct      Nov      Dec
## 2001 1.019221 1.014321 1.010632 1.007709 1.005461

sales.adjustedForecast <- sales.fcst$mean + sales.converrfcst
sales.fcst

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9780.022  7459.322  12822.72  6437.463  14858.16
## Feb 2001      13243.095 10100.643  17363.21  8716.947  20119.38
## Mar 2001      20441.749 15591.130  26801.46 13455.287  31055.83
## Apr 2001      15143.541 11550.133  19854.91  9967.870  23006.60
## May 2001      16224.628 12374.689  21272.34 10679.469  24649.03
## Jun 2001      16996.137 12963.127  22283.87 11187.297  25821.13
## Jul 2001      19894.424 15173.679  26083.86 13095.024  30224.31
## Aug 2001      19591.112 14942.340  25686.18 12895.376  29763.51
## Sep 2001      21864.492 16676.271  28666.84 14391.774  33217.31
## Oct 2001      24530.299 18709.509  32162.02 16146.477  37267.30
## Nov 2001      40144.775 30618.828  52634.38 26424.329  60989.36
## Dec 2001      86908.868 66286.278 113947.44 57205.664 132035.03

plot(SalesValid)
lines(sales.adjustedForecast, col='red')
lines(sales.fcst$mean, col='blue')

sales.adjustedForecast

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2001  9781.136 13244.198 20442.820 15144.599 16225.671 16997.171 19895.449
##            Aug       Sep       Oct       Nov       Dec
## 2001 19592.131 21865.507 24531.310 40145.783 86909.874

Based on the two-layer forecast we created above, we have a point forecast for January 2001 of $9,781.

Question 4

4a

The appliance shipments time series is quarterly. Though imperfect, we can see some evidence of seasonality in the plot printed in the book. Other patterns are not as strong based on the visual evidence - for example “stickiness” is not apparent in this series. For these reasons I expect lag-4 to have the highest autocorrelation.

4b

appliance <- read.csv('appliance.csv', header=TRUE, stringsAsFactors = FALSE)
appliance.ts <- ts(appliance$Shipments, start=c(1985,1), freq = 4)
plot(appliance.ts)

Acf(appliance.ts)

4b

As predicted, the lag-4 autocorrelation has the largest coefficient. Note, that none of the lagged correlations are significant, indicating that the base series is not strongly autocorrelated.