Ch.7: Q.1-4
Q.1 \(\textit{Analysis of Canadian Manufacturing Workers Work-Hours}\)
library(forecast)
leafhrs <- read.csv("CanadianWorkHours.csv")
#str(leafhrs)
#head(leafhrs)
#tail(leafhrs)
leafhrsTS <- ts(leafhrs$HoursPerWeek,start=c(1966,1))
yrange = range(leafhrsTS)
plot(c(1966,2000),yrange,type="n",xlab="Year",ylab="Hours Per Week",bty="l",xaxt="n",yaxt="n")
axis(1,at=seq(1965,2000,5),labels=format(seq(1965,2000,5)))
axis(2,at=seq(34.5,38.0,0.5),labels=format(seq(34.5,38.0,0.5)),las=2)
lines(leafhrsTS,bty="l")
Lag-1 negative autocorrelation would mean the plot exhibits high values that are immediately followed by low values, and vice-versa. Positive autocorrelation, on the other hand, is where values tend to “stick” together. With the exception of a couple values (at 1987 and 1998), immediate values stay relatively close to each other hence the lag-1 autocorrelation would probably be positive.
We compute the autocorrelation and produce the ACF plots below:
Acf(leafhrsTS,lag.max=1,main="ACF Plot for Canadian Work-Hours (lag-1)")
leafhrsACF <- Acf(leafhrsTS,main="ACF Plot for Canadian Work-Hours")
leafhrsACF
##
## Autocorrelations of series 'leafhrsTS', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.928 0.839 0.752 0.665 0.571 0.473 0.369 0.265 0.164
## 10 11 12 13 14 15
## 0.047 -0.082 -0.185 -0.261 -0.310 -0.346
We observe from the output a very strong positive lag-1 autocorrelation of 0.928. It is also observed that the autocorrelation decreases linearly as the lag increases and the autocorrelation becomes negative starting at lag-11.
Q.2 \(\textit{Forecasting Wal-Mart Stock}\)
First plot the raw series:
walmartstock <- read.csv("WalMartStock.csv")
#str(walmartstock)
#head(walmartstock)
#tail(walmartstock)
walmartstockTS <- ts(walmartstock$Close)
yrange = range(walmartstockTS)
plot(walmartstockTS,xlab="Time",ylab="Closing Price ($)",bty="l",xaxt="n",yaxt="n")
axis(1,at=c(18,60,103,147,185,226),labels=c("Mar-01","May-01","Jul-01","Sep-01","Nov-01","Jan-02"))
axis(2,at=seq(40,60,5),labels=format(seq(40,60,5)),las=2)
plot(diff(walmartstockTS,lag=1),xlab="Time",ylab="Lag-1",bty="l",xaxt="n",yaxt="n")
axis(1,at=c(18,60,103,147,185,226),labels=c("Mar-01","May-01","Jul-01","Sep-01","Nov-01","Jan-02"))
axis(2,at=seq(-3,3,1),labels=format(seq(-3,3,1)),las=2)
\(\bullet\) The autocorrelations of the closing price series
\(\bullet\) The AR(1) slope coefficient for the closing price series
A random walk has slope coefficient equal to 1, so testing for a random walk involves testing the slope coefficient equal to 1.
\(\bullet\) The AR(1) constant coefficient for the closing price series
\(\bullet\) The autocorrelations of the differenced series
If we have a random walk, the ACF plot of the differenced series will have autocorrelations roughly equal to 0 at lags 1,2,3,…
\(\bullet\) The AR(1) slope cofficient for the differenced series
\(\bullet\) The AR(1) constant coefficient for the differenced series
fit <- Arima(walmartstockTS,order=c(1,0,0))
fit
## Series: walmartstockTS
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 intercept
## 0.9558 52.9497
## s.e. 0.0187 1.3280
##
## sigma^2 estimated as 0.9815: log likelihood=-349.8
## AIC=705.59 AICc=705.69 BIC=716.13
We observe the slope is estimated to be 0.9558. To see whether or not we have a random walk, we have the following hypothesis test: \[H_0: \beta_1 = 1 \text{ vs. } H_a: \beta_1 \not = 1\]
2*pt(-abs((1 - fit$coef["ar1"]) / 0.0187), df=length(walmartstockTS)-1)
## ar1
## 0.01896261
The two-tailed p-value from a t-distribution is roughly 0.01896, which indicates significance at the \(\alpha=0.05\) critical value. So there is strong evidence to reject the null hypothesis that the slope coefficient is equal to 1 and suggest that the AR model does not indicate that this is a random walk.
\(\bullet\) It is impossible to obtain useful forecasts of the series.
We can still use a naive forecast.
\(\bullet\) The series is random.
To say the series itself is random is to say we are unable to predict future values from past values yet a naive forecast may yield useful predictions. There may even be other “predictable” series that are correlated with a random walk. The \(\textit{change}\) in the series, however, is random.
\(\bullet\) The changes in the series from one period to the other are random.
True. This is the definition of a random walk.
Q3. \(\textit{Souvenir Sales}\)
First plot the raw data:
souvsales <- read.csv("SouvenirSales.csv")
#str(souvsales)
#head(souvsales)
#tail(souvsales)
souvsalesTS <- ts(souvsales$Sales,start=c(1995,1),frequency=12)
yrange = range(souvsalesTS)
plot(c(1995,2001),yrange,type="n",xlab="Year",ylab="Sales (thousands of Australian dollars)",bty="l",xaxt="n",yaxt="n")
lines(souvsalesTS,bty="l")
axis(1,at=seq(1995,2002,1),labels=format(seq(1995,2002,1)))
axis(2,at=seq(0,110000,10000),labels=format(seq(0,110,10)),las=2)
validLength <- 12
trainLength <- length(souvsalesTS) - validLength
souvsalesTrain <- window(souvsalesTS,end=c(1995,trainLength))
souvsalesValid <- window(souvsalesTS,start=c(1995,trainLength+1))
souvsalesLogLinearSeason <- tslm(log(souvsalesTrain) ~ trend + season)
summary(souvsalesLogLinearSeason)
##
## Call:
## tslm(formula = log(souvsalesTrain) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9306
## F-statistic: 80.4 on 12 and 59 DF, p-value: < 2.2e-16
To obtain a forecast for February 2002, we input a time of 86 and season of season2 then exponentiate back.
souvsalesLogLinearSeasonForecast <- forecast(souvsalesLogLinearSeason,h=validLength)
feb2002Forecast <- souvsalesLogLinearSeason$coefficients["(Intercept)"] + souvsalesLogLinearSeason$coefficients["trend"]*86 + souvsalesLogLinearSeason$coefficients["season2"]
exp(feb2002Forecast)
## (Intercept)
## 17062.99
So we see an approximate sales forecast for February 2002 to be about 17,063 Australian dollars.
resACF <- Acf(souvsalesLogLinearSeason$residuals,lag.max=15)
resACF
##
## Autocorrelations of series 'souvsalesLogLinearSeason$residuals', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.459 0.485 0.194 0.088 0.154 0.016 0.030 0.106 0.034
## 10 11 12 13 14 15
## 0.152 -0.055 -0.012 -0.047 -0.077 -0.023
ARModel2 <- Arima(souvsalesLogLinearSeason$residuals,order=c(2,0,0))
ARModel2
## Series: souvsalesLogLinearSeason$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 intercept
## 0.3072 0.3687 -0.0025
## s.e. 0.1090 0.1102 0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
The p-value associated with the AR(1) coefficient is
2*pnorm(-abs(ARModel2$coef["ar1"]/0.1089629))
## ar1
## 0.004810732
and the p-value associated with the AR(2) coefficient is
2*pnorm(-abs(ARModel2$coef["ar2"]/0.1101939))
## ar2
## 0.0008187679
Also look at the ACF plot of the residuals of the AR(2) model…
Acf(ARModel2$residuals)
We do not examine any significant autocorrelation at any lag.
The ACF plot exhibits strong autocorrelation at lag-1 and lag-2. In fact, lag-2 autocorrelation is 0.485 which is larger than the lag-1 of 0.459. We examined significant p-values for the coefficients for both the AR(1) and AR(2) models. Due to the stronger lag-2 autocorrelation and a very significant p-value associated with the AR(2) coefficient, an AR model with lag-2 is seems to be an appropriate choice which implies that the regression forecasts that are 2 time periods apart are correlated.
Combining the regression model and the AR(2) model, we forecast for January 2001 as follows:
lrForecast <- forecast(souvsalesLogLinearSeason,h=validLength)
lrForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9.188097 8.917220 9.458974 8.769890 9.606304
## Feb 2001 9.491232 9.220354 9.762109 9.073024 9.909439
## Mar 2001 9.925335 9.654457 10.196212 9.507127 10.343542
## Apr 2001 9.625329 9.354452 9.896207 9.207122 10.043537
## May 2001 9.694286 9.423408 9.965163 9.276078 10.112493
## Jun 2001 9.740741 9.469864 10.011619 9.322534 10.158949
## Jul 2001 9.898195 9.627318 10.169072 9.479988 10.316402
## Aug 2001 9.882831 9.611954 10.153708 9.464624 10.301038
## Sep 2001 9.992619 9.721742 10.263496 9.574412 10.410826
## Oct 2001 10.107664 9.836787 10.378542 9.689457 10.525872
## Nov 2001 10.600248 10.329370 10.871125 10.182040 11.018455
## Dec 2001 11.372615 11.101738 11.643493 10.954408 11.790823
errorForecast <- forecast(ARModel2,h=validLength)
errorForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001 0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001 0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001 0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001 0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001 0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001 0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001 0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001 0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001 0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001 0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001 0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
adjustedForecast <- lrForecast$mean + errorForecast$mean
adjustedForecast
## Jan Feb Mar Apr May Jun Jul
## 2001 9.295979 9.589783 9.994580 9.682130 9.736457 9.773830 9.923098
## Aug Sep Oct Nov Dec
## 2001 9.901870 10.006838 10.118240 10.607927 11.378062
#exponentiate back
exp(adjustedForecast)
## Jan Feb Mar Apr May Jun Jul
## 2001 10894.13 14614.70 21907.40 16028.61 16923.47 17567.92 20396.07
## Aug Sep Oct Nov Dec
## 2001 19967.68 22177.61 24791.11 40454.26 87383.49
plot(souvsalesValid,xlab="Year of 2001",ylab="Sales (thousands of Australian dollars)",bty="l",xaxt="n",yaxt="n")
axis(1, at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2,at=seq(0,110000,10000),labels=format(seq(0,110,10)),las=2)
lines(exp(adjustedForecast),col="blue")
legend(2001,110000, c("Actuals", "Adjusted Forecast"), lty=c(1,1), col=c("black", "blue"), bty="n")
The forecast for January 2001 is 10,894.13 Australian dollars.
Q4. \(\textit{Shipments of Household Appliances}\)
Plot of raw series:
shipments <- read.csv("ApplianceShipments.csv")
#str(shipments)
#head(shipments)
#tail(shipments)
shipmentsTS <- ts(shipments$Shipments,start=c(1985,1),frequency=4)
plot(shipmentsTS,bty="l",ylab="Shipments (millions $)")
If we assume there is quarterly seasonality, then there would be a relationship between values that are 4 time periods apart. So lag-4 would most likely have the largest coefficient (in absolute value).
The ACF plot is
shipmentsACF <- Acf(shipmentsTS)
shipmentsACF
##
## Autocorrelations of series 'shipmentsTS', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.261 -0.098 0.164 0.387 -0.030 -0.269 0.081 0.086 -0.168
## 10 11 12 13
## -0.325 -0.019 0.047 -0.096
It is observed that lag-4 does indeed have the largest autocorrelation (in absolute value) of 0.387. However, none of the autocorrelations are statistically significant.