Ch.7: Q.1-4

Q.1 \(\textit{Analysis of Canadian Manufacturing Workers Work-Hours}\)

library(forecast)
leafhrs <- read.csv("CanadianWorkHours.csv")
#str(leafhrs)
#head(leafhrs)
#tail(leafhrs)
leafhrsTS <- ts(leafhrs$HoursPerWeek,start=c(1966,1))
yrange = range(leafhrsTS)
plot(c(1966,2000),yrange,type="n",xlab="Year",ylab="Hours Per Week",bty="l",xaxt="n",yaxt="n")
axis(1,at=seq(1965,2000,5),labels=format(seq(1965,2000,5)))
axis(2,at=seq(34.5,38.0,0.5),labels=format(seq(34.5,38.0,0.5)),las=2)
lines(leafhrsTS,bty="l")

  1. Lag-1 negative autocorrelation would mean the plot exhibits high values that are immediately followed by low values, and vice-versa. Positive autocorrelation, on the other hand, is where values tend to “stick” together. With the exception of a couple values (at 1987 and 1998), immediate values stay relatively close to each other hence the lag-1 autocorrelation would probably be positive.

  2. We compute the autocorrelation and produce the ACF plots below:

Acf(leafhrsTS,lag.max=1,main="ACF Plot for Canadian Work-Hours (lag-1)")

leafhrsACF <- Acf(leafhrsTS,main="ACF Plot for Canadian Work-Hours")

leafhrsACF
## 
## Autocorrelations of series 'leafhrsTS', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.928  0.839  0.752  0.665  0.571  0.473  0.369  0.265  0.164 
##     10     11     12     13     14     15 
##  0.047 -0.082 -0.185 -0.261 -0.310 -0.346

We observe from the output a very strong positive lag-1 autocorrelation of 0.928. It is also observed that the autocorrelation decreases linearly as the lag increases and the autocorrelation becomes negative starting at lag-11.

Q.2 \(\textit{Forecasting Wal-Mart Stock}\)

First plot the raw series:

walmartstock <- read.csv("WalMartStock.csv")
#str(walmartstock)
#head(walmartstock)
#tail(walmartstock)
walmartstockTS <- ts(walmartstock$Close)
yrange = range(walmartstockTS)
plot(walmartstockTS,xlab="Time",ylab="Closing Price ($)",bty="l",xaxt="n",yaxt="n")
axis(1,at=c(18,60,103,147,185,226),labels=c("Mar-01","May-01","Jul-01","Sep-01","Nov-01","Jan-02"))
axis(2,at=seq(40,60,5),labels=format(seq(40,60,5)),las=2)

  1. A time plot of the differenced series (lag-1) is given below:
plot(diff(walmartstockTS,lag=1),xlab="Time",ylab="Lag-1",bty="l",xaxt="n",yaxt="n")
axis(1,at=c(18,60,103,147,185,226),labels=c("Mar-01","May-01","Jul-01","Sep-01","Nov-01","Jan-02"))
axis(2,at=seq(-3,3,1),labels=format(seq(-3,3,1)),las=2)

  1. Which of the following is/are relevant for testing whether this stock is a random walk?

\(\bullet\) The autocorrelations of the closing price series

\(\bullet\) The AR(1) slope coefficient for the closing price series

A random walk has slope coefficient equal to 1, so testing for a random walk involves testing the slope coefficient equal to 1.

\(\bullet\) The AR(1) constant coefficient for the closing price series

\(\bullet\) The autocorrelations of the differenced series

If we have a random walk, the ACF plot of the differenced series will have autocorrelations roughly equal to 0 at lags 1,2,3,…

\(\bullet\) The AR(1) slope cofficient for the differenced series

\(\bullet\) The AR(1) constant coefficient for the differenced series

  1. The fitted AR(1) model is computed below:
fit <- Arima(walmartstockTS,order=c(1,0,0))
fit
## Series: walmartstockTS 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1  intercept
##       0.9558    52.9497
## s.e.  0.0187     1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13

We observe the slope is estimated to be 0.9558. To see whether or not we have a random walk, we have the following hypothesis test: \[H_0: \beta_1 = 1 \text{ vs. } H_a: \beta_1 \not = 1\]

2*pt(-abs((1 - fit$coef["ar1"]) / 0.0187), df=length(walmartstockTS)-1)
##        ar1 
## 0.01896261

The two-tailed p-value from a t-distribution is roughly 0.01896, which indicates significance at the \(\alpha=0.05\) critical value. So there is strong evidence to reject the null hypothesis that the slope coefficient is equal to 1 and suggest that the AR model does not indicate that this is a random walk.

  1. What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below:

\(\bullet\) It is impossible to obtain useful forecasts of the series.

We can still use a naive forecast.

\(\bullet\) The series is random.

To say the series itself is random is to say we are unable to predict future values from past values yet a naive forecast may yield useful predictions. There may even be other “predictable” series that are correlated with a random walk. The \(\textit{change}\) in the series, however, is random.

\(\bullet\) The changes in the series from one period to the other are random.

True. This is the definition of a random walk.

Q3. \(\textit{Souvenir Sales}\)

First plot the raw data:

souvsales <- read.csv("SouvenirSales.csv")
#str(souvsales)
#head(souvsales)
#tail(souvsales)
souvsalesTS <- ts(souvsales$Sales,start=c(1995,1),frequency=12)
yrange = range(souvsalesTS)
plot(c(1995,2001),yrange,type="n",xlab="Year",ylab="Sales (thousands of Australian dollars)",bty="l",xaxt="n",yaxt="n")
lines(souvsalesTS,bty="l")
axis(1,at=seq(1995,2002,1),labels=format(seq(1995,2002,1)))
axis(2,at=seq(0,110000,10000),labels=format(seq(0,110,10)),las=2)

validLength <- 12
trainLength <- length(souvsalesTS) - validLength
souvsalesTrain <- window(souvsalesTS,end=c(1995,trainLength))
souvsalesValid <- window(souvsalesTS,start=c(1995,trainLength+1))
  1. We run a regression model with log(Sales) as the output variable with a linear trend and monthly predictors and is summarized below:
souvsalesLogLinearSeason <- tslm(log(souvsalesTrain) ~ trend + season)
summary(souvsalesLogLinearSeason)
## 
## Call:
## tslm(formula = log(souvsalesTrain) ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:  0.9424, Adjusted R-squared:  0.9306 
## F-statistic:  80.4 on 12 and 59 DF,  p-value: < 2.2e-16

To obtain a forecast for February 2002, we input a time of 86 and season of season2 then exponentiate back.

souvsalesLogLinearSeasonForecast <- forecast(souvsalesLogLinearSeason,h=validLength)
feb2002Forecast <- souvsalesLogLinearSeason$coefficients["(Intercept)"] + souvsalesLogLinearSeason$coefficients["trend"]*86 + souvsalesLogLinearSeason$coefficients["season2"]
exp(feb2002Forecast)
## (Intercept) 
##    17062.99

So we see an approximate sales forecast for February 2002 to be about 17,063 Australian dollars.

  1. The ACF plot until lag-15 for the forecast errors is
resACF <- Acf(souvsalesLogLinearSeason$residuals,lag.max=15)

resACF
## 
## Autocorrelations of series 'souvsalesLogLinearSeason$residuals', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.459  0.485  0.194  0.088  0.154  0.016  0.030  0.106  0.034 
##     10     11     12     13     14     15 
##  0.152 -0.055 -0.012 -0.047 -0.077 -0.023
ARModel2 <- Arima(souvsalesLogLinearSeason$residuals,order=c(2,0,0))
ARModel2
## Series: souvsalesLogLinearSeason$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2  intercept
##       0.3072  0.3687    -0.0025
## s.e.  0.1090  0.1102     0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95

The p-value associated with the AR(1) coefficient is

2*pnorm(-abs(ARModel2$coef["ar1"]/0.1089629))
##         ar1 
## 0.004810732

and the p-value associated with the AR(2) coefficient is

2*pnorm(-abs(ARModel2$coef["ar2"]/0.1101939))
##          ar2 
## 0.0008187679

Also look at the ACF plot of the residuals of the AR(2) model…

Acf(ARModel2$residuals)

We do not examine any significant autocorrelation at any lag.

  1. The ACF plot exhibits strong autocorrelation at lag-1 and lag-2. In fact, lag-2 autocorrelation is 0.485 which is larger than the lag-1 of 0.459. We examined significant p-values for the coefficients for both the AR(1) and AR(2) models. Due to the stronger lag-2 autocorrelation and a very significant p-value associated with the AR(2) coefficient, an AR model with lag-2 is seems to be an appropriate choice which implies that the regression forecasts that are 2 time periods apart are correlated.

  2. Combining the regression model and the AR(2) model, we forecast for January 2001 as follows:

lrForecast <- forecast(souvsalesLogLinearSeason,h=validLength)
lrForecast
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9.188097  8.917220  9.458974  8.769890  9.606304
## Feb 2001       9.491232  9.220354  9.762109  9.073024  9.909439
## Mar 2001       9.925335  9.654457 10.196212  9.507127 10.343542
## Apr 2001       9.625329  9.354452  9.896207  9.207122 10.043537
## May 2001       9.694286  9.423408  9.965163  9.276078 10.112493
## Jun 2001       9.740741  9.469864 10.011619  9.322534 10.158949
## Jul 2001       9.898195  9.627318 10.169072  9.479988 10.316402
## Aug 2001       9.882831  9.611954 10.153708  9.464624 10.301038
## Sep 2001       9.992619  9.721742 10.263496  9.574412 10.410826
## Oct 2001      10.107664  9.836787 10.378542  9.689457 10.525872
## Nov 2001      10.600248 10.329370 10.871125 10.182040 11.018455
## Dec 2001      11.372615 11.101738 11.643493 10.954408 11.790823
errorForecast <- forecast(ARModel2,h=validLength)
errorForecast
##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Jan 2001    0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001    0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001    0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001    0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001    0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001    0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001    0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001    0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001    0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001    0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001    0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001    0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
adjustedForecast <- lrForecast$mean + errorForecast$mean
adjustedForecast
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2001  9.295979  9.589783  9.994580  9.682130  9.736457  9.773830  9.923098
##            Aug       Sep       Oct       Nov       Dec
## 2001  9.901870 10.006838 10.118240 10.607927 11.378062
#exponentiate back
exp(adjustedForecast)
##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 2001 10894.13 14614.70 21907.40 16028.61 16923.47 17567.92 20396.07
##           Aug      Sep      Oct      Nov      Dec
## 2001 19967.68 22177.61 24791.11 40454.26 87383.49
plot(souvsalesValid,xlab="Year of 2001",ylab="Sales (thousands of Australian dollars)",bty="l",xaxt="n",yaxt="n")
axis(1, at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2,at=seq(0,110000,10000),labels=format(seq(0,110,10)),las=2)
lines(exp(adjustedForecast),col="blue")
legend(2001,110000, c("Actuals", "Adjusted Forecast"), lty=c(1,1), col=c("black", "blue"), bty="n")

The forecast for January 2001 is 10,894.13 Australian dollars.

Q4. \(\textit{Shipments of Household Appliances}\)

Plot of raw series:

shipments <- read.csv("ApplianceShipments.csv")
#str(shipments)
#head(shipments)
#tail(shipments)
shipmentsTS <- ts(shipments$Shipments,start=c(1985,1),frequency=4)
plot(shipmentsTS,bty="l",ylab="Shipments (millions $)")

  1. If we assume there is quarterly seasonality, then there would be a relationship between values that are 4 time periods apart. So lag-4 would most likely have the largest coefficient (in absolute value).

  2. The ACF plot is

shipmentsACF <- Acf(shipmentsTS)

shipmentsACF
## 
## Autocorrelations of series 'shipmentsTS', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.261 -0.098  0.164  0.387 -0.030 -0.269  0.081  0.086 -0.168 
##     10     11     12     13 
## -0.325 -0.019  0.047 -0.096

It is observed that lag-4 does indeed have the largest autocorrelation (in absolute value) of 0.387. However, none of the autocorrelations are statistically significant.