Question 1.a

From visual inspection of the time series plot, lag-1 autocorrelation of the series would exhibit positive autocorrelation.

Analyzing the graph for lag-1 by comparing the output at any t with the output at t-1, we find that a large percentage of the trends (slopes) at t and t-1 are both increasing or both decreasing (y[t] and y[t-1] are moving in the same direction [stickiness]) which constitutes a positive autocorrelation.

Question 1.b

The original series of work hours per week was plotted.

The autocorrelations were computed: (1) without specifying a maximum lag, and (2) for lag-1.

The lag-1 autocorrelation of 0.928 was positive as predicted by visual inspection.

#setwd("~/wd678/Unit 5b")
#install.packages("forecast")
library(forecast)

## Warning: package 'forecast' was built under R version 3.3.3

dataFrame <- read.csv("CanadianWorkHours.csv", stringsAsFactors=FALSE, header=TRUE)
workHours.ts <- ts(dataFrame$Hours.per.week, start=c(1966,1), end = c(2000,1), freq=1)
plot(workHours.ts, bty="l", main="Original series - Work hours per week")

Below is the Acf graph without a maximum lag specified in the argument. Lag-1 had the largest autocorrelation.

lags.ac <- Acf(workHours.ts)

lags.ac

## 
## Autocorrelations of series 'workHours.ts', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.928  0.839  0.752  0.665  0.571  0.473  0.369  0.265  0.164 
##     10     11     12     13     14     15 
##  0.047 -0.082 -0.185 -0.261 -0.310 -0.346

Below is the autocorrelation graph for lag-1.

lag1.ac <- Acf(workHours.ts,lag.max=1)

lag1.ac

## 
## Autocorrelations of series 'workHours.ts', by lag
## 
##     0     1 
## 1.000 0.928

Question 2a

Below are the time plots for the original closing price series and then the differenced series.

walmart <- read.csv("WalmartStock.csv", stringsAsFactors=FALSE, header=TRUE)
stockClose.ts <- ts(walmart$Close)
yrange=range(stockClose.ts)
plot(stockClose.ts, bty="l", main="Original series - Closing prices")

stockPriceLag1 <- diff(stockClose.ts,lag=1)
plot(stockPriceLag1, ylab="Lag-1", xlab="Time", bty="l", main="Time plot for Feb '01-02 stock prices with lag-1 differencing")

Question 2.b

The following information is relevant to testing whether the series is a random walk:

1. The AR(1) slope coefficient for the closing price series - If this slope coefficient is equal to one, then the series is a random walk.

2. The autocorrelations of the differenced series - If all the lags of a series are approximately equal to zero, then the series is a random walk. This means that there was not a relationship in the direction (slope) of the outputs of any two periods.

Question 2.c

The AR(1) model output in Table 7-4 was recreated below:

The Walmart stock closing prices was a random walk. The slope coefficient (0.9558) was within 2.4 standard errors of the number one. The statistical significance was tested. The two-tailed p-value from a t-distribution was 0.01896261. For an alpha=0.01, the p-value was larger so the null hypothesis that the series was a random walk was accepted.

stocks.AR1 <- Arima(stockClose.ts, order=c(1,0,0))
stocks.AR1

## Series: stockClose.ts 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     mean
##       0.9558  52.9497
## s.e.  0.0187   1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13

2*pt(-abs((1-stocks.AR1$coef["ar1"])/0.0187), df=length(stockClose.ts)-1)

##        ar1 
## 0.01896261

Question 2.d

The following “true or false” statements describe the sentence that follows them:

TRUE - It is impossible to obtain useful forecasts of the series. Forecast models could not predict better than the naive forecast.

TRUE - The changes in the series from one period to the other are random. - The change in output, up or down, occurring in any period of the series cannot be used to predict the direction of change occuring in other periods of the same series.

Question 3.a

Below is the plot of a regression model, with linear trend and monthly predictors, for the original Sales time series. The next plot is for: (1) forecasts fitted for the training period, (2) forecasts for the validation period and (3) the forecast for February 2002. For February 2002, the log scale forecasted value was 9.74 and original scale value was 17063.

data.Frame <- read.csv("SouvenirSales.csv", stringsAsFactors=FALSE, header=TRUE)
souvSales.ts <- ts(data.Frame$Sales, start=c(1995,1), end = c(2001,12), freq=12)
plot(souvSales.ts, bty="l", main="Original series for Souvenir Sales")

nValid3 <- 12
nTrain3 <- length(souvSales.ts) - nValid3
train3.ts <- window(souvSales.ts, start = c(1995,1), end=c(1995, nTrain3))
valid3.ts <- window(souvSales.ts, start = c(1995, nTrain3 + 1), end = c(1995, nTrain3 + nValid3))

The regression model of log(sales) was found for the training period.

lmTrainSouvSales <- tslm(log(train3.ts) ~ trend + season)
summary(lmTrainSouvSales)

## 
## Call:
## tslm(formula = log(train3.ts) ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:  0.9424, Adjusted R-squared:  0.9306 
## F-statistic:  80.4 on 12 and 59 DF,  p-value: < 2.2e-16

The regression model of log(sales) was found for the validation period.

lmValidSouvSalesForecast <- forecast(lmTrainSouvSales, h=nValid3)
lmValidSouvSalesForecast$mean

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2001  9.188097  9.491232  9.925335  9.625329  9.694286  9.740741  9.898195
##            Aug       Sep       Oct       Nov       Dec
## 2001  9.882831  9.992619 10.107664 10.600248 11.372615

exp(lmValidSouvSalesForecast$mean)

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2001  9780.022 13243.095 20441.749 15143.541 16224.628 16996.137 19894.424
##            Aug       Sep       Oct       Nov       Dec
## 2001 19591.112 21864.492 24530.299 40144.775 86908.868

The residuals of the regression model of log(sales) in the validation period was calculated.

lmValidResiduals <- log(valid3.ts) - lmValidSouvSalesForecast$mean
lmValidResiduals

##              Jan         Feb         Mar         Apr         May
## 2001  0.04627623 -0.16160882  0.06556110  0.13644081 -0.01407973
##              Jun         Jul         Aug         Sep         Oct
## 2001  0.09025773  0.27360664  0.37785931  0.33304008  0.22829793
##              Nov         Dec
## 2001  0.14984574  0.18586333

The forecast for February 2002 was calculated in the log and original scales.

febForecast <- lmTrainSouvSales$coefficients["(Intercept)"] + lmTrainSouvSales$coefficients["trend"]*86 + lmTrainSouvSales$coefficients["season2"]
febForecast

## (Intercept) 
##    9.744667

exp(febForecast)

## (Intercept) 
##    17062.99

A plot was created for: (1) the log(sales) original series, (2) the regression model for the training period, (3) the forecast for the validation period, and (4) the forecast for February 2002.

#yrange2=range(log(souvSales.ts))
plot(c(1995,2003), c(6.8,12.6), type="n", xlab="Time", ylab="log(Sales [$])", bty="l", xaxt="n", yaxt="n", main="Regression model for log(Sales)")
axis(1,at=seq(1995,2003,1), labels=format(seq(1995,2003,1)))
axis(2,at=seq(6.8,12.6,1.0), labels=format(seq(6.8,12.6,1.0)), las=2)
lines(log(souvSales.ts), lty=2)
lines(lmTrainSouvSales$fitted, col="red")
lines(lmValidSouvSalesForecast$mean, col="green")
points(2003, febForecast, col="blue")

legend(1994.75,12.8, c("Original Series", "Fitted in training period", "Forecasted in validation period", "February 2002"), lty=c(2,1,1,1), col=c("black", "red", "green","blue"), bty="n")

Question 3.b

The ACF for forecast errors up to lag-15 was graphed and an AR(2) model was fit below. The null hypothesis that the AR(2) was a random walk was rejected.

train.Resid.ACF <- Acf(lmTrainSouvSales$residuals, lag.max=15 )

AR2Model.Errors.Train <- Arima(lmTrainSouvSales$residuals, order=c(2,0,0))
AR2Model.Errors.Train

## Series: lmTrainSouvSales$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     mean
##       0.3072  0.3687  -0.0025
## s.e.  0.1090  0.1102   0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95

residuals <- Acf(lmTrainSouvSales$residuals)

residuals

## 
## Autocorrelations of series 'lmTrainSouvSales$residuals', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.459  0.485  0.194  0.088  0.154  0.016  0.030  0.106  0.034 
##     10     11     12     13     14     15     16     17     18     19 
##  0.152 -0.055 -0.012 -0.047 -0.077 -0.023 -0.139 -0.056 -0.125 -0.135 
##     20     21     22     23     24 
## -0.088 -0.258 -0.203 -0.257 -0.334

See below: The t-statistics and p-values were statistically significant (t-statistic was outside the (-2) to 2 interval, and p < 0.0.5). The residuals series of the regression model for sales was not a random walk.

Here are the t-statistics:

AR2Model.Errors.Train$coef["ar1"]/0.1090

##      ar1 
## 2.818482

AR2Model.Errors.Train$coef["ar2"]/0.1102

##      ar2 
## 3.346186

Here are the p-values:

2*pnorm(-abs(AR2Model.Errors.Train$coef["ar1"]/0.1090))

##         ar1 
## 0.004825136

2*pnorm(-abs(AR2Model.Errors.Train$coef["ar2"]/0.1102))

##          ar2 
## 0.0008193151

Acf(AR2Model.Errors.Train$residuals)

Question 3.b.i

The ACF plot for residuals of predicted sales showed significant correlation in lag-1 and lag-2. This indicated that the trend was not adequately captured by the regression model of sales. The uncaptured trend was carried forward to the residual series. One could attempt to better capture trend in the model for sales or capture the trend using a second-layer AR model.

The estimated coefficients of the AR(1) and AR(2) models were not equal to one after considering their standard errors; therefore, the residual series was not a random walk. This indicated that there was still alot of systemic pattern to the data. The trend of sales had not been adequately captured by the regression model. Trend had been passed on to the residual series. Lag-12 was less than threshold; therefore, seasonality had been captured by the regression model.

The ACF graph for the AR(2) model doesn’t have autocorrelation for any lags because all the autocorrelation of the original sales time series was captured.

Question 3.b.ii

Below are the January 2001 & January 2002 sales forecasts (in log and original scales) which were adjusted using an AR(2) model of the forecasted error.

The January 2002 forecast (in log and original scales), before adjustment, was calculated below:

january2002Forecast <- lmTrainSouvSales$coefficients["(Intercept)"] + lmTrainSouvSales$coefficients["trend"]*85
january2002Forecast

## (Intercept) 
##    9.441533

exp(january2002Forecast)

## (Intercept) 
##    12601.02

The above AR(2) training period model of the regression model errors was copied below:

#errors.AR2Model.Train <- Arima(lmTrainSouvSales$residuals, order=c(2,0,0))
#summary(errors.AR2Model.Train)
AR2Model.Errors.Train <- Arima(lmTrainSouvSales$residuals, order=c(2,0,0))
summary(AR2Model.Errors.Train)

## Series: lmTrainSouvSales$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     mean
##       0.3072  0.3687  -0.0025
## s.e.  0.1090  0.1102   0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95
## 
## Training set error measures:
##                       ME      RMSE       MAE      MPE     MAPE      MASE
## Training set 0.004605836 0.1401718 0.1140246 854.4841 1235.335 0.5953986
##                    ACF1
## Training set 0.02836869

An AR(2) validation period model of the regression model’s errors follows below:

errorsAR2Model.Valid <- forecast(AR2Model.Errors.Train, h=nValid3)
errorsAR2Model.Valid$mean

##              Jan         Feb         Mar         Apr         May
## 2001 0.107882119 0.098551352 0.069245043 0.056801003 0.042171322
##              Jun         Jul         Aug         Sep         Oct
## 2001 0.033088136 0.024902959 0.019038932 0.014219137 0.010576068
##              Nov         Dec
## 2001 0.007679567 0.005446339

errorsAR2Model.Valid$mean[12]

## [1] 0.005446339

errorsAR2Model.Valid$mean[11]

## [1] 0.007679567

exp(errorsAR2Model.Valid$mean)

##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 2001 1.113916 1.103571 1.071699 1.058445 1.043073 1.033642 1.025216
##           Aug      Sep      Oct      Nov      Dec
## 2001 1.019221 1.014321 1.010632 1.007709 1.005461

The adjusted 2001 forecast, computed below, is given in original and log scales.

For January 2001, the sales forecast was $9,781.14

adjustedForecast2001 <- exp(lmValidSouvSalesForecast$mean) + exp(errorsAR2Model.Valid$mean)
adjustedForecast2001

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2001  9781.136 13244.198 20442.820 15144.599 16225.671 16997.171 19895.449
##            Aug       Sep       Oct       Nov       Dec
## 2001 19592.131 21865.507 24531.310 40145.783 86909.874

log(adjustedForecast2001)

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2001  9.188211  9.491315  9.925387  9.625399  9.694350  9.740802  9.898246
##            Aug       Sep       Oct       Nov       Dec
## 2001  9.882883  9.992666 10.107706 10.600273 11.372627

The actual and adjusted forecasted sales in 2001 for the validation period were plotted below:

plot(valid3.ts,main="Actual & adjusted forecasted sales in validation period")
lines(exp(lmValidSouvSalesForecast$mean), col="black")
lines(adjustedForecast2001, col="blue")
legend(2001,100000, c("Adjusted Forecast","Actuals"), lty=c(1,1),col=c("black","blue"))

For January 2002 sales, the error of regression was calculated using past values as predictors in the equation for an AR(2) model in the validation period. The intercept and beta coefficients were taken from the summary of the AR(2) model above.

errors.January2002ForecastErrors <- -0.0025 + (0.3072*(errorsAR2Model.Valid$mean[12])) + (0.3687*(errorsAR2Model.Valid$mean[11]))
errors.January2002ForecastErrors

## [1] 0.002004572

The AR(2) model for the errors of the regression of sales was plotted in the training and validation period, and for the January 2002 forecast.

plot(c(1995,2003), c(-0.2,0.64), type="n",xlab="Time",ylab="log(error of sales regression)",bty="l",xaxt="n",yaxt="n",main="AR(2) model for the errors in regression of sales")
axis(1,at=seq(1995,2003,1), labels=format(seq(1995,2003,1)))
axis(2,at=seq(-0.2,0.64,0.1), labels=format(seq(-0.2,0.64,0.1)),las=2)
lines(lmTrainSouvSales$residuals, lty=1)
lines(lmValidResiduals, lty=1)
lines(AR2Model.Errors.Train$fitted, lty=2, col="red")
lines(errorsAR2Model.Valid$mean, lty=2, col="green")
points(2003, errors.January2002ForecastErrors, col="blue")
legend(1994.75,0.68, c("regression errors (traing & validation periods)","AR fitted errors in training period", "AR forecasted errors in validation period", "AR forecasted error for January 2002"), lty=c(1,2,2,1), col=c("black","red", "green", "blue"), bty="n")

The forecasted sales for January 2002, obtained using the regression model, was adjusted with the forecasted error, obtained using the AR(2) model. The original and log scales are below:

adjustedJanuary2002Forecast <- exp(january2002Forecast) + exp(errors.January2002ForecastErrors)
adjustedJanuary2002Forecast

## (Intercept) 
##    12602.02

log(adjustedJanuary2002Forecast)

## (Intercept) 
##    9.441612

Question 4

The original appliance shipments time series is plotted below:

data.Frame <- read.csv("ApplianceShipments.csv", stringsAsFactors=FALSE, header=TRUE)
shipments.ts <- ts(data.Frame$Shipments, start=c(1985,1), end = c(1989,4), freq=4)
plot(shipments.ts, bty="l", main="Original series - Appliance shipments")

Question 4.a

Lag-1 and lag-4 were expected to have the largest coefficients because they represent trend and seasonality which were obviously present in the quarterly time series.

Question 4.b

No lagged autocorrelations were larger than the upper or lower thresholds indicating that original appliance shipment series was not significantly autocorrelated. Lag-4 had the largest coefficient of 0.387

shipmentsAcf <- Acf(shipments.ts)

shipmentsAcf

## 
## Autocorrelations of series 'shipments.ts', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.261 -0.098  0.164  0.387 -0.030 -0.269  0.081  0.086 -0.168 
##     10     11     12     13 
## -0.325 -0.019  0.047 -0.096

Assignment 6

Cesar Garcia

April 10, 2017

Question 1.a

From visual inspection of the time series plot, lag-1 autocorrelation of the series would exhibit positive autocorrelation.

Question 1.b

The original series of work hours per week was plotted.

The autocorrelations were computed: (1) without specifying a maximum lag, and (2) for lag-1.

The lag-1 autocorrelation of 0.928 was positive as predicted by visual inspection.

Below is the Acf graph without a maximum lag specified in the argument. Lag-1 had the largest autocorrelation.

Below is the autocorrelation graph for lag-1.

Question 2a

Below are the time plots for the original closing price series and then the differenced series.

Question 2.b

The following information is relevant to testing whether the series is a random walk:

1. The AR(1) slope coefficient for the closing price series - If this slope coefficient is equal to one, then the series is a random walk.

2. The autocorrelations of the differenced series - If all the lags of a series are approximately equal to zero, then the series is a random walk. This means that there was not a relationship in the direction (slope) of the outputs of any two periods.

Question 2.c

The AR(1) model output in Table 7-4 was recreated below:

Question 2.d

The following “true or false” statements describe the sentence that follows them:

TRUE - It is impossible to obtain useful forecasts of the series. Forecast models could not predict better than the naive forecast.

FALSE - The series is random. - The changes in output, up or down, from one period to another in the same series, are not related. The presence of random changes in the directions of outputs does not imply that the series outputs are random. Series outputs can be influenced by external information.

TRUE - The changes in the series from one period to the other are random. - The change in output, up or down, occurring in any period of the series cannot be used to predict the direction of change occuring in other periods of the same series.

Question 3.a

The regression model of log(sales) was found for the training period.

The regression model of log(sales) was found for the validation period.

The residuals of the regression model of log(sales) in the validation period was calculated.

The forecast for February 2002 was calculated in the log and original scales.

A plot was created for: (1) the log(sales) original series, (2) the regression model for the training period, (3) the forecast for the validation period, and (4) the forecast for February 2002.

Question 3.b

The ACF for forecast errors up to lag-15 was graphed and an AR(2) model was fit below. The null hypothesis that the AR(2) was a random walk was rejected.

See below: The t-statistics and p-values were statistically significant (t-statistic was outside the (-2) to 2 interval, and p < 0.0.5). The residuals series of the regression model for sales was not a random walk.

Here are the t-statistics:

Here are the p-values:

Question 3.b.i

The ACF graph for the AR(2) model doesn’t have autocorrelation for any lags because all the autocorrelation of the original sales time series was captured.

Question 3.b.ii

Below are the January 2001 & January 2002 sales forecasts (in log and original scales) which were adjusted using an AR(2) model of the forecasted error.

The January 2002 forecast (in log and original scales), before adjustment, was calculated below:

The above AR(2) training period model of the regression model errors was copied below:

An AR(2) validation period model of the regression model’s errors follows below:

The adjusted 2001 forecast, computed below, is given in original and log scales.

For January 2001, the sales forecast was $9,781.14

The actual and adjusted forecasted sales in 2001 for the validation period were plotted below:

For January 2002 sales, the error of regression was calculated using past values as predictors in the equation for an AR(2) model in the validation period. The intercept and beta coefficients were taken from the summary of the AR(2) model above.

The AR(2) model for the errors of the regression of sales was plotted in the training and validation period, and for the January 2002 forecast.

The forecasted sales for January 2002, obtained using the regression model, was adjusted with the forecasted error, obtained using the AR(2) model. The original and log scales are below:

Question 4

The original appliance shipments time series is plotted below:

Question 4.a

Lag-1 and lag-4 were expected to have the largest coefficients because they represent trend and seasonality which were obviously present in the quarterly time series.

Question 4.b

No lagged autocorrelations were larger than the upper or lower thresholds indicating that original appliance shipment series was not significantly autocorrelated. Lag-4 had the largest coefficient of 0.387