Question 1 - Analysis of Canadian Manufacturing Workers Work-Hours: The time series plot in Figure 7.9 describes the average annual number of weekly hours spent by Canadian manufacturing workers. The data is available in CanadianWorkHours.xls.

(a) If we computed the autocorrelation of this series, would the lag-1 autocorrelation exhibit negative, positive, or no autocorrelation? How can you see this from the plot?

The plot is fairly smooth, with consecutive values generally moving in the same direction. There is a clear trend and no sharp reversals, so I would expect a positive lag-1 autocorrelation.
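As a quick intuition check, the lag-1 autocorrelation is essentially the correlation between the series and a copy of itself shifted by one year. A minimal sketch using the first few values from the data (y is just a scratch vector for illustration):
# Lag-1 autocorrelation ~ correlation between the series and its one-year shift
y <- c(37.2, 37.0, 37.4, 37.5, 37.7, 37.7)  # first six values of Hours.per.week
cor(y[-length(y)], y[-1])  # positive for a smooth series drifting in one direction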

(b) Compute the autocorrelation and produce an ACF plot. Verify your answer to the previous question.
CanadaHours <- read.csv("/Users/wendyhayes/Desktop/MBA 678-Predictive Analytics/CanadianWorkHours.csv",stringsAsFactors=FALSE)
library(forecast)
str(CanadaHours)
## 'data.frame':    35 obs. of  2 variables:
##  $ Year          : int  1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 ...
##  $ Hours.per.week: num  37.2 37 37.4 37.5 37.7 37.7 37.4 37.2 37.3 37.2 ...
head(CanadaHours)
##   Year Hours.per.week
## 1 1966           37.2
## 2 1967           37.0
## 3 1968           37.4
## 4 1969           37.5
## 5 1970           37.7
## 6 1971           37.7
tail(CanadaHours)
##    Year Hours.per.week
## 30 1995           35.7
## 31 1996           35.7
## 32 1997           35.5
## 33 1998           35.6
## 34 1999           36.3
## 35 2000           36.5
CanadaHoursTS <- ts(CanadaHours$Hours.per.week, start=c(1966, 1), frequency=1)

yrange = range(CanadaHoursTS)
plot(c(1966, 2000), yrange, type="n", xlab="Year",  ylab="Canadian Manufacturing Workers' Hours per Week", bty="l", xaxt="n", yaxt="n")

lines(CanadaHoursTS, bty="l")

axis(1, at=seq(1970,2000,5), labels=format(seq(1970,2000,5)))
axis(2, at=seq(35,39,.5), labels=format(seq(35,39,.5)), las=2)

Acf(CanadaHoursTS, lag.max=1)

The ACF plot confirms the prediction of a positive lag-1 autocorrelation.
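The numeric coefficient can also be pulled without drawing the plot; a small sketch:
# plot=FALSE makes Acf return the autocorrelation values instead of plotting them
Acf(CanadaHoursTS, lag.max=1, plot=FALSE)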

_______________________________________________________________________

Question 2 - Forecasting Wal-Mart Stock: Figure 7.10 shows a time plot of Wal-Mart daily closing prices between February 2001 and February 2002. The data is available at finance.yahoo.com and in WalMartStock.xls. The ACF plots of these daily closing prices and its lag-1 differenced series are in Figure 7.11. Table 7.4 shows the output from fitting an AR(1) model to the series of closing prices and to the series of differences. Use all the information to answer the following questions.

(a) Create a time plot of the differenced series.
WalMartStock <- read.csv("/Users/wendyhayes/Desktop/MBA 678-Predictive Analytics/WalMartStock.csv",stringsAsFactors=FALSE)
str(WalMartStock)
## 'data.frame':    248 obs. of  2 variables:
##  $ Date : chr  "5-Feb-01" "6-Feb-01" "7-Feb-01" "8-Feb-01" ...
##  $ Close: num  53.8 53.2 54.7 52.3 50.4 ...
head(WalMartStock)
##        Date Close
## 1  5-Feb-01 53.84
## 2  6-Feb-01 53.20
## 3  7-Feb-01 54.66
## 4  8-Feb-01 52.30
## 5  9-Feb-01 50.40
## 6 12-Feb-01 53.45
tail(WalMartStock)
##          Date Close
## 243 28-Jan-02 58.63
## 244 29-Jan-02 57.91
## 245 30-Jan-02 59.75
## 246 31-Jan-02 59.98
## 247  1-Feb-02 59.26
## 248  4-Feb-02 58.90
WalMartStockTS <- ts(WalMartStock$Close, start=c(2001, 1), frequency=225)

par(mfrow = c(2,1))  # two stacked panels: the price series and its lag-1 differences
plot(WalMartStockTS, ylab="Stock Prices($)", xlab="Year", bty="l", main="WalMart Stock Prices ($)")

#Plot differenced series
plot(diff(WalMartStockTS, lag=1), ylab="Lag-1", xlab="Year", bty="l", main="Lag-1 Difference")

(b) Which of the following is/are relevant for testing whether this stock is a random walk?

• The autocorrelations of the closing price series - Only loosely relevant: slow ACF decay is consistent with a random walk, but any trending series shows the same thing, so it is not a test by itself.

• The AR(1) slope coefficient for the closing price series - Relevant: a random walk implies a slope of 1, which is exactly what part (c) tests.

• The AR(1) constant coefficient for the closing price series - Not relevant.

• The autocorrelations of the differenced series - Relevant: for a random walk these should all be statistically indistinguishable from zero (see the sketch after this list).

• The AR(1) slope coefficient for the differenced series - Relevant: a random walk implies a slope of 0 for the differenced series.

• The AR(1) constant coefficient for the differenced series - Not relevant.
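A minimal sketch of the two main checks, using the WalMartStockTS object from part (a):
# Check 1: fit AR(1) to the closing prices; a random walk implies a slope of 1
Arima(WalMartStockTS, order=c(1,0,0))
# Check 2: the lag-1 differences of a random walk should show no significant autocorrelation
Acf(diff(WalMartStockTS, lag=1))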

(c) Recreate the AR(1) model output for the Close price series shown in the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.
fit <- Arima(WalMartStockTS, order=c(1,0,0))
fit
## Series: WalMartStockTS 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1  intercept
##       0.9558    52.9497
## s.e.  0.0187     1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13
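# Two-sided p-value for H0: ar1 = 1 (random walk), using the reported s.e. of 0.0187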
2*pt(-abs((1-fit$coef["ar1"])/0.0187), df=length(WalMartStockTS)-1)
##        ar1 
## 0.01896261

This p-value, 0.019, is larger than 0.01, so at the 0.01 significance level we cannot reject the null hypothesis that the slope equals 1. The AR(1) output is therefore consistent with this series being a random walk (at the 0.05 level, however, we would reject it).
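The same test can be written without hard-coding the standard error by pulling it from the fitted model's covariance matrix; a sketch (se.ar1 is a scratch name; var.coef is the coefficient covariance matrix stored on the fit):
# Test H0: ar1 = 1 using the model's own standard error
se.ar1 <- sqrt(diag(fit$var.coef))["ar1"]
2*pnorm(-abs((1 - fit$coef["ar1"])/se.ar1))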

(d) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below.

• It is impossible to obtain useful forecasts of the series. Not quite: a naive forecast (carrying forward the most recent value) is still the best available forecast, though of limited usefulness; see the sketch after this list.

• The series is random. Not true: the series itself is not random, because each value equals the previous value plus a random step, so consecutive values are strongly related.

• The changes in the series from one period to the other are random. True: it is the period-to-period changes that are random, which is why no pattern in them can be exploited for forecasting.
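A sketch of the naive forecast mentioned in the first bullet, using naive() from the forecast package:
# For a random walk, the best point forecast at every horizon is the last observed value
naive(WalMartStockTS, h=5)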

_______________________________________________________________________

Question 3 - Souvenir Sales: The file SouvenirSales.xls contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, between 1995 and 2001.

Back in 2001, the store wanted to use the data to forecast sales for the next 12 months (year 2002). They hired an analyst to generate forecasts. The analyst first partitioned the data into training and validation periods, with the validation set containing the last 12 months of data (year 2001). She then fit a regression model to sales, using the training period.

(a) Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.
SouvSales <- read.csv("/Users/wendyhayes/Desktop/MBA 678-Predictive Analytics/SouvenirSales.csv",stringsAsFactors=FALSE)
str(SouvSales)
## 'data.frame':    84 obs. of  2 variables:
##  $ Date : chr  "Jan-95" "Feb-95" "Mar-95" "Apr-95" ...
##  $ Sales: num  1665 2398 2841 3547 3753 ...
head(SouvSales)
##     Date   Sales
## 1 Jan-95 1664.81
## 2 Feb-95 2397.53
## 3 Mar-95 2840.71
## 4 Apr-95 3547.29
## 5 May-95 3752.96
## 6 Jun-95 3714.74
tail(SouvSales)
##     Date     Sales
## 79 1-Jul  26155.15
## 80 1-Aug  28586.52
## 81 1-Sep  30505.41
## 82 1-Oct  30821.33
## 83 1-Nov  46634.38
## 84 1-Dec 104660.67
SouvSalesTS <- ts(SouvSales$Sales, start=c(1995, 1), frequency=12)

validLength <- 12
trainLength <- length(SouvSalesTS ) - validLength

SouvTrain <- window(SouvSalesTS , end=c(1995, trainLength))
SouvValid <- window(SouvSalesTS , start=c(1995,trainLength+1))
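# Note: with frequency=12, end=c(1995, trainLength) resolves to December 2000 and
# start=c(1995, trainLength+1) to January 2001, so the validation window holds
# exactly the last 12 months (year 2001).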

SalesExpo <- tslm(SouvTrain ~ trend + season, lambda=0)
summary(SalesExpo)
## 
## Call:
## tslm(formula = SouvTrain ~ trend + season, lambda = 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2e+10 on 12 and 59 DF,  p-value: < 2.2e-16
validPeriodForecasts <- forecast(SalesExpo, h=validLength)
validPeriodForecasts
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9780.022  7459.322  12822.72  6437.463  14858.16
## Feb 2001      13243.095 10100.643  17363.21  8716.947  20119.38
## Mar 2001      20441.749 15591.130  26801.46 13455.287  31055.83
## Apr 2001      15143.541 11550.133  19854.91  9967.870  23006.60
## May 2001      16224.628 12374.689  21272.34 10679.469  24649.03
## Jun 2001      16996.137 12963.127  22283.87 11187.297  25821.13
## Jul 2001      19894.424 15173.679  26083.86 13095.024  30224.31
## Aug 2001      19591.112 14942.340  25686.18 12895.376  29763.51
## Sep 2001      21864.492 16676.271  28666.84 14391.774  33217.31
## Oct 2001      24530.299 18709.509  32162.02 16146.477  37267.30
## Nov 2001      40144.775 30618.828  52634.38 26424.329  60989.36
## Dec 2001      86908.868 66286.278 113947.44 57205.664 132035.03
Feb <- forecast(SalesExpo, h=14)
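# February 2002 is the 14th month after the training period (which ends December 2000),
# hence h=14 above; its point forecast is the 14th element:
Feb$mean[14]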
yrange=range(SouvSalesTS)
plot(c(1995, 2003), yrange, type="n", xlab="Year", ylab="Souvenir Sales (hundreds of Australian Dollars)", bty="l", xaxt="n", yaxt="n")
lines(SouvSalesTS, bty="l")
axis(1, at=seq(1995,2003,1),labels=format(seq(1995,2003,1)))
axis(2, at=seq(1600,110000,10000), labels=format(seq(16,1100,100)), las=2)
lines(SalesExpo$fitted, col="red")
lines(Feb$mean,col="blue",lty=2)
legend(1995, 110000, c("Actual Sales", "Exponential Trend+Season", "Forecasts"), lty=c(1,1,2), col=c("black","red","blue"), bty="n")

(b) Create an ACF plot until lag-15 for the forecast errors. Now fit an AR model with lag-2 [ARIMA(2, 0, 0)] to the forecast errors.
str(SalesExpo)
## List of 14
##  $ coefficients : Named num [1:13] 7.6464 0.0211 0.282 0.695 0.3739 ...
##   ..- attr(*, "names")= chr [1:13] "(Intercept)" "trend" "season2" "season3" ...
##  $ residuals    : Time-Series [1:72] from 1995 to 2001: -0.25 -0.1884 -0.4529 0.0692 0.0566 ...
##   ..- attr(*, "names")= chr [1:72] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:72] -76.9862 4.2655 -0.9226 0.0412 -0.7915 ...
##   ..- attr(*, "names")= chr [1:72] "(Intercept)" "trend" "season2" "season3" ...
##  $ rank         : int 13
##  $ fitted.values: Time-Series [1:72] from 1995 to 2001: 2138 2895 4468 3310 3546 ...
##   ..- attr(*, "names")= chr [1:72] "1" "2" "3" "4" ...
##  $ assign       : int [1:13] 0 1 2 2 2 2 2 2 2 2 ...
##  $ qr           :List of 5
##   ..$ qr   : num [1:72, 1:13] -8.485 0.118 0.118 0.118 0.118 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:72] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:13] "(Intercept)" "trend" "season2" "season3" ...
##   .. ..- attr(*, "assign")= int [1:13] 0 1 2 2 2 2 2 2 2 2 ...
##   .. ..- attr(*, "contrasts")=List of 1
##   .. .. ..$ season: chr "contr.treatment"
##   ..$ qraux: num [1:13] 1.12 1.17 1.1 1.08 1.07 ...
##   ..$ pivot: int [1:13] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ tol  : num 1e-07
##   ..$ rank : int 13
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 59
##  $ contrasts    :List of 1
##   ..$ season: chr "contr.treatment"
##  $ xlevels      :List of 1
##   ..$ season: chr [1:12] "1" "2" "3" "4" ...
##  $ call         : language tslm(formula = SouvTrain ~ trend + season, lambda = 0)
##  $ terms        :Classes 'terms', 'formula'  language SouvTrain ~ trend + season
##   .. ..- attr(*, "variables")= language list(SouvTrain, trend, season)
##   .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:3] "SouvTrain" "trend" "season"
##   .. .. .. ..$ : chr [1:2] "trend" "season"
##   .. ..- attr(*, "term.labels")= chr [1:2] "trend" "season"
##   .. ..- attr(*, "order")= int [1:2] 1 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(SouvTrain, trend, season)
##   .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "factor"
##   .. .. ..- attr(*, "names")= chr [1:3] "SouvTrain" "trend" "season"
##  $ model        :'data.frame':   72 obs. of  3 variables:
##   ..$ SouvTrain: num [1:72] 7.42 7.78 7.95 8.17 8.23 ...
##   ..$ trend    : int [1:72] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ season   : Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language SouvTrain ~ trend + season
##   .. .. ..- attr(*, "variables")= language list(SouvTrain, trend, season)
##   .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:3] "SouvTrain" "trend" "season"
##   .. .. .. .. ..$ : chr [1:2] "trend" "season"
##   .. .. ..- attr(*, "term.labels")= chr [1:2] "trend" "season"
##   .. .. ..- attr(*, "order")= int [1:2] 1 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(SouvTrain, trend, season)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "factor"
##   .. .. .. ..- attr(*, "names")= chr [1:3] "SouvTrain" "trend" "season"
##  $ lambda       : num 0
##  - attr(*, "class")= chr "lm"
Acf(SalesExpo$residuals, lag.max=15)

fit <- Arima(SalesExpo$residuals, order=c(2,0,0))
fit
## Series: SalesExpo$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2  intercept
##       0.3072  0.3687    -0.0025
## s.e.  0.1090  0.1102     0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95
i. Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?
# t-stat for each coefficient: anything > 2 or < -2 is statistically significant
fit$coef["ar1"]/0.1090
##      ar1 
## 2.818482
fit$coef["ar2"]/ 0.1102
##      ar2 
## 3.346186
#p-value based on normal distribution
2*pnorm(-abs(fit$coef["ar1"]/ 0.1090))
##         ar1 
## 0.004825136
2*pnorm(-abs(fit$coef["ar2"]/ 0.1102))
##          ar2 
## 0.0008193151

The t-statistic for the ar1 coefficient is 2.82 and for ar2 it is 3.35; both exceed 2, and both p-values are below 1%, so both AR coefficients are statistically significant. This tells us the regression's forecast errors are not random noise: they still contain autocorrelation, which the AR(2) model captures and which can be exploited to improve the regression forecasts.

ii. Use the autocorrelation information to compute a forecast for January 2002 (per the assignment note, forecast January 2001 instead of 2002), using the regression model and the AR(2) model above.
autoForecast <- forecast(SalesExpo, h=validLength)
autoForecast
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9780.022  7459.322  12822.72  6437.463  14858.16
## Feb 2001      13243.095 10100.643  17363.21  8716.947  20119.38
## Mar 2001      20441.749 15591.130  26801.46 13455.287  31055.83
## Apr 2001      15143.541 11550.133  19854.91  9967.870  23006.60
## May 2001      16224.628 12374.689  21272.34 10679.469  24649.03
## Jun 2001      16996.137 12963.127  22283.87 11187.297  25821.13
## Jul 2001      19894.424 15173.679  26083.86 13095.024  30224.31
## Aug 2001      19591.112 14942.340  25686.18 12895.376  29763.51
## Sep 2001      21864.492 16676.271  28666.84 14391.774  33217.31
## Oct 2001      24530.299 18709.509  32162.02 16146.477  37267.30
## Nov 2001      40144.775 30618.828  52634.38 26424.329  60989.36
## Dec 2001      86908.868 66286.278 113947.44 57205.664 132035.03
errorForecast <- forecast(fit, h=validLength)
errorForecast
##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Jan 2001    0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001    0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001    0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001    0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001    0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001    0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001    0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001    0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001    0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001    0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001    0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001    0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
adjustedForecast <- autoForecast$mean + errorForecast$mean
adjustedForecast
##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 2001  9780.13 13243.19 20441.82 15143.60 16224.67 16996.17 19894.45
##           Aug      Sep      Oct      Nov      Dec
## 2001 19591.13 21864.51 24530.31 40144.78 86908.87
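One caveat worth flagging: because the model was fit with lambda=0, the residuals (and therefore errorForecast$mean) are on the log scale, while autoForecast$mean has already been back-transformed to dollars, so the additive adjustment above barely moves the forecasts. A multiplicative correction is arguably more consistent; a sketch (adjustedForecastMult is a scratch name):
# Log-scale errors act multiplicatively once back-transformed to the original scale
adjustedForecastMult <- autoForecast$mean * exp(errorForecast$mean)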
plot(SouvValid, bty="l", xaxt="n", xlab="Year 2001", yaxt="n", ylab="Souvenir Sales (hundreds of Australian Dollars)", ylim=c(1600,110000))
axis(1, at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2, at=seq(1600,110000,10000), labels=format(seq(16,1100,100)), las=2)
lines(adjustedForecast, col="blue")
legend(2001,110000, c("Actuals", "Adjusted Forecast"), lty=c(1,1), col=c("black", "blue"), bty="n")

_______________________________________________________________________

Question 4. - Shipments of Household Appliances: The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985 and 1989. The series is plotted in Figure 7.13.

(a) If we compute the autocorrelation of the series, which lag (> 0) is most likely to have the largest coefficient (in absolute value)?

I would guess lag 4, since the data is quarterly and the plot appears to show a seasonal pattern that repeats every four quarters (i.e., annually).

(b) Create an ACF plot and compare it with your answer.
AppShip <- read.csv("/Users/wendyhayes/Desktop/MBA 678-Predictive Analytics/ApplianceShipments.csv",stringsAsFactors=FALSE)
str(AppShip)
## 'data.frame':    20 obs. of  2 variables:
##  $ Quarter  : chr  "Q1-1985" "Q1-1986" "Q1-1987" "Q1-1988" ...
##  $ Shipments: int  4009 4123 4493 4595 4245 4321 4522 4806 4799 4900 ...
head(AppShip)
##   Quarter Shipments
## 1 Q1-1985      4009
## 2 Q1-1986      4123
## 3 Q1-1987      4493
## 4 Q1-1988      4595
## 5 Q1-1989      4245
## 6 Q2-1985      4321
tail(AppShip)
##    Quarter Shipments
## 15 Q3-1989      4585
## 16 Q4-1985      3944
## 17 Q4-1986      4030
## 18 Q4-1987      4485
## 19 Q4-1988      4258
## 20 Q4-1989      4533
AppShipTS <- ts(AppShip$Shipments, start=c(1985, 1), frequency=4)
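One caution before reading any ACF here: the head() and tail() output above shows the CSV rows are grouped by quarter (all Q1 values first), not ordered chronologically, so the ts() call above strings the shipments together out of time order. A minimal re-sort sketch, assuming every Quarter string follows the Qn-YYYY pattern shown above (AppShipSortedTS is a scratch name):
# Sort rows into chronological order before building the series
yr  <- as.numeric(substr(AppShip$Quarter, 4, 7))   # "Q1-1985" -> 1985
qtr <- as.numeric(substr(AppShip$Quarter, 2, 2))   # "Q1-1985" -> 1
AppShipSorted   <- AppShip[order(yr, qtr), ]
AppShipSortedTS <- ts(AppShipSorted$Shipments, start=c(1985, 1), frequency=4)
Acf(AppShipSortedTS, lag.max=8)   # the quarterly seasonality should now appear at lag 4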

yrange = range(AppShipTS)
plot(c(1985, 1989), yrange, type="n", xlab="Year",  ylab="Appliance Shipments", bty="l", xaxt="n", yaxt="n")
lines(AppShipTS, bty="l")
axis(1, at=seq(1985,1989,1), labels=format(seq(1985,1989,1)))
axis(2, at=seq(3500,4600,100), labels=format(seq(3500,4600,100)), las=2)

Acf(AppShipTS, lag.max=4)

appACF <- Acf(AppShipTS)

appACF
## 
## Autocorrelations of series 'AppShipTS', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.279  0.044  0.021 -0.070  0.150 -0.088 -0.471 -0.367 -0.274 
##     10     11     12     13 
##  0.011  0.002 -0.022 -0.090

In this ACF, lag 1 is only moderately positive and the strongest coefficient is the negative one at lag 7, rather than the expected spike at lag 4. The likely culprit is the row ordering flagged above: because the CSV groups observations by quarter rather than chronologically, the ts object does not follow the true time sequence, and its autocorrelations are not meaningful as seasonal evidence. Re-sorting the data as sketched in the code above and recomputing the ACF should reveal the quarterly seasonality at lag 4.