Problems 1, 2, 3, & 4, Pgs. 170-174

1. Analysis of Canadian Manufacturing Workers' Work-Hours: The time series plot in Figure 7.9 describes the average annual number of weekly hours spent by Canadian manufacturing workers. The data is available in CanadianWorkHours.xls.

setwd("C:/Users/larms.LA-INSP5559/Documents/R/win-library/3.3/17_0405_assignment6")
cwhours<- read.csv("CanadianWorkHours.csv", stringsAsFactors = FALSE)
str(cwhours)
## 'data.frame':    35 obs. of  2 variables:
##  $ Year          : int  1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 ...
##  $ Hours_per_week: num  37.2 37 37.4 37.5 37.7 37.7 37.4 37.2 37.3 37.2 ...
head(cwhours)
##   Year Hours_per_week
## 1 1966           37.2
## 2 1967           37.0
## 3 1968           37.4
## 4 1969           37.5
## 5 1970           37.7
## 6 1971           37.7
library(forecast)
cworkhoursTS<- ts(cwhours$Hours_per_week, start = c(1966), frequency = 1)
yrange <- range(cworkhoursTS)  #range() of the series, not the series itself
plot(cworkhoursTS, xlab = "Years", ylab = "Average Hours per Week", ylim = c(34.5, 38), bty = "l")

1.(a) If we computed the autocorrelation of this series, would the lag-1 autocorrelation exhibit negative, positive, or no autocorrelation? How can you see this from the plot?

1.(b) Compute the autocorrelation and produce an ACF plot. Verify your answer to the previous question.

Acf(cworkhoursTS, lag.max = 1, main = "Autocorrelation of Canadian Work Hours Time Series")

It would exhibit positive autocorrelation. You can see this in the time plot: the series moves smoothly along a long-term trend, so consecutive values are close to one another, which is the signature of positive lag-1 autocorrelation. The ACF plot confirms it: the bar labeled "Lag 1" shows a positive autocorrelation of nearly 1.0.
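To read the lag-1 value numerically instead of off the plot, here is a small sketch using the plot = FALSE option of Acf (the returned $acf vector stores lag 0 in position 1, so lag 1 sits in position 2):

#compute the autocorrelations without plotting; $acf[2] is the lag-1 coefficient
acf1<- Acf(cworkhoursTS, lag.max = 1, plot = FALSE)
acf1$acf[2]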

2. Forecasting Wal-Mart Stock: Figure 7.10 shows a time plot of Wal-Mart daily closing prices between February 2001 and February 2002. The data is available at finance.yahoo.com and in WalMartStock.xls. The ACF plots of these daily closing prices and its lag-1 differenced series are in Figure 7.11. Table 7.4 shows the output from fitting an AR(1) model to the series of closing prices and to the series of differences. Use all the information to answer the following questions.

WMstock<- read.csv("WalMartStock.csv", header = TRUE, stringsAsFactors = FALSE)

str(WMstock)
## 'data.frame':    248 obs. of  2 variables:
##  $ Date : chr  "5-Feb-01" "6-Feb-01" "7-Feb-01" "8-Feb-01" ...
##  $ Close: num  53.8 53.2 54.7 52.3 50.4 ...
head(WMstock)
##        Date Close
## 1  5-Feb-01 53.84
## 2  6-Feb-01 53.20
## 3  7-Feb-01 54.66
## 4  8-Feb-01 52.30
## 5  9-Feb-01 50.40
## 6 12-Feb-01 53.45
WMstockts<- ts(WMstock$Close, start = c(2001), frequency = 248)

yrange <- range(WMstockts)  #range() of the series, not the series itself
plot(WMstockts, xlab = "Months", ylab = "Stock Closing Prices", bty = "l", xaxt = "n", yaxt = "n", main = "Walmart Daily Stock Closing Prices")
#the series runs February 2001 through February 2002, so the month labels start at Feb
axis(1,at=seq(2001,2001+11/12,1/12), labels=c("Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"))
axis(2,at=seq(45,60,5),labels = format(seq(45.0,60.0,5)),las = 2)

2.(a) Create a time plot of the differenced series.

Acf(WMstockts, lag.max = 10)  #ACF of the closing prices (compare with Figure 7.11)

Acf(diff(WMstockts, lag=1), lag.max = 10)  #ACF of the lag-1 differenced series

plot(diff(WMstockts, lag = 1), main = "Time Plot of Walmart Differenced Series")

2.(b) Which of the following is/are relevant for testing whether this stock is a random walk?

. The autocorrelations of the closing price series

. The AR(1) slope coefficient for the closing price series

. The autocorrelations of the differenced series

. The AR(1) slope coefficient for the differenced series

2.(c) Recreate the AR(1) model output for the Close price series shown in the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.

#fit an AR(1) model to the closing-price series
WMfit<- Arima(WMstockts, order = c(1,0,0))
WMfit
## Series: WMstockts 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1  intercept
##       0.9558    52.9497
## s.e.  0.0187     1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13
#test H0: ar1 = 1 (random walk); two-sided p-value from the t-distribution
2*pt(-abs((1-WMfit$coef["ar1"])/0.0187),df=length(WMstockts)-1)
##        ar1 
## 0.01896261
#calculate p-value from the normal distribution
2*pnorm(-abs((1-WMfit$coef["ar1"])/0.0187))
##        ar1 
## 0.01818593

Per the text, a series is a random walk when an AR(1) model fit to it has a slope coefficient equal to 1; the changes from one period to the next (the lag-1 differences) are then completely random. With stock prices, I would make a blanket assumption that the series is random, since so many variables can impact daily prices. The test above is of H0: ar1 = 1. At alpha = 0.01, this Wal-Mart daily closing-price series would be considered a random walk: the p-value (about 0.019) is greater than alpha, so we cannot reject ar1 = 1. At alpha = 0.05, the p-value is less than alpha, we reject ar1 = 1, and the series would not be a random walk. Based on the trend in the time plot, I am going to stick with alpha = 0.05 and state that it is not a random walk.
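Rather than hard-coding the standard error 0.0187 from the printout, the same test can pull it straight from the fitted model; this sketch assumes only that the Arima object exposes its coefficient covariance matrix as var.coef (it does in the forecast package):

#standard error of ar1 from the model, then the two-sided test of H0: ar1 = 1
se_ar1<- sqrt(diag(WMfit$var.coef))["ar1"]
2*pnorm(-abs((1-WMfit$coef["ar1"])/se_ar1))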

2.(d) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below.

. It is impossible to obtain useful forecasts of the series.
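A consequence worth noting: if the series is a random walk, the best point forecast at any horizon is simply the most recent observed value, so nothing more elaborate than a naive forecast will help. The naive() function in the forecast package produces exactly that forecast, for example:

#naive forecast: every future point equals the last observed closing price
naive(WMstockts, h = 5)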

3. Souvenir Sales: The file SouvenirSales.xls contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, between 1995 and 2001.

Back in 2001, the store wanted to use the data to forecast sales for the next 12 months (year 2002). They hired an analyst to generate forecasts. The analyst first partitioned the data into training and validation periods, with the validation set containing the last 12 months of data (year 2001). She then fit a regression model to sales, using the training period.

souvenir<- read.csv("souvenirsales.csv", stringsAsFactors=FALSE)
Souvenir_SalesTS<- ts(souvenir$Sales, start=c(1995,1), frequency=12)
validsouvLength <- 12  #validation period = last 12 months (year 2001)
trainsouvLength <- length(Souvenir_SalesTS) - validsouvLength
#window() reads c(1995, k) as the k-th month counted from January 1995, so month 72 is December 2000
souvenirWTrain <- window(Souvenir_SalesTS, start=c(1995,1), end=c(1995, trainsouvLength))
souvenirWValid <- window(Souvenir_SalesTS, start=c(1995,trainsouvLength+1), end=c(1995,trainsouvLength+validsouvLength))
#plot the time series
plot(Souvenir_SalesTS, type = "n", xaxt = "n", yaxt = "n", xlab = "Year", ylab = "Sales (Thousands)", main = "Souvenir Sales")
lines(Souvenir_SalesTS, lwd = 2)
axis(1,at=seq(1995, 2002, 1), labels = format(seq(1995, 2002, 1)))
axis(2, at = seq(0,105000,5000),  labels = format(seq(0, 105, 5)), las = 2)

3.(a) Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.

modela<- tslm(souvenirWTrain ~ trend + season, lambda = 0)  #lambda = 0 is the Box-Cox log transform, i.e., a model for log(Sales)
modela
## 
## Call:
## tslm(formula = souvenirWTrain ~ trend + season, lambda = 0)
## 
## Coefficients:
## (Intercept)        trend      season2      season3      season4  
##     7.64636      0.02112      0.28201      0.69500      0.37387  
##     season5      season6      season7      season8      season9  
##     0.42171      0.44705      0.58338      0.54690      0.63557  
##    season10     season11     season12  
##     0.72949      1.20095      1.95220
forecastA<- forecast(modela, h=12)
forecastA
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9780.022  7459.322  12822.72  6437.463  14858.16
## Feb 2001      13243.095 10100.643  17363.21  8716.947  20119.38
## Mar 2001      20441.749 15591.130  26801.46 13455.287  31055.83
## Apr 2001      15143.541 11550.133  19854.91  9967.870  23006.60
## May 2001      16224.628 12374.689  21272.34 10679.469  24649.03
## Jun 2001      16996.137 12963.127  22283.87 11187.297  25821.13
## Jul 2001      19894.424 15173.679  26083.86 13095.024  30224.31
## Aug 2001      19591.112 14942.340  25686.18 12895.376  29763.51
## Sep 2001      21864.492 16676.271  28666.84 14391.774  33217.31
## Oct 2001      24530.299 18709.509  32162.02 16146.477  37267.30
## Nov 2001      40144.775 30618.828  52634.38 26424.329  60989.36
## Dec 2001      86908.868 66286.278 113947.44 57205.664 132035.03
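The table above covers the 12 validation months of 2001, but the question asks for February 2002, which is 26 months beyond the end of the training period (January 2001 through February 2002). Extending the forecast horizon gives that value; a sketch:

#February 2002 is the 26th month after the training period ends in December 2000
forecastFeb2002<- forecast(modela, h = 26)
tail(forecastFeb2002$mean, 1)

Equivalently, from the coefficients above, February 2002 has trend index 86 (counting from January 1995), so the point forecast is exp(7.64636 + 0.02112*86 + 0.28201), roughly 17,063 using the rounded coefficients shown.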
plot(log(Souvenir_SalesTS), type = "n", xaxt = "n", yaxt = "n", xlab = "Year", ylab = "log(Sales)", main = "Log Souvenir Sales")
lines(log(Souvenir_SalesTS), lwd = 2)
axis(1,at=seq(1995, 2002, 1), labels = format(seq(1995, 2002, 1)))
axis(2, at = seq(6,12,1),  labels = format(seq(6, 12, 1)), las = 2)

Acf(modela$residuals)

3.(b) Create an ACF plot until lag-15 for the forecast errors. Now fit an AR model with lag-2 [ARIMA(2, 0, 0)] to the forecast errors.

Acf(modela$residuals, lag.max = 15)

#fit an AR(2) model to the training-period forecast errors (the regression residuals)
ARmodelsouvenirsales<- Arima(forecastA$residuals, order = c(2,0,0))
ARmodelsouvenirsales
## Series: forecastA$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2  intercept
##       0.3072  0.3687    -0.0025
## s.e.  0.1090  0.1102     0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95

i. Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?

#calculate the t-statistic for each coefficient
ARmodelsouvenirsales$coef["ar1"]/0.109
##      ar1 
## 2.818482
ARmodelsouvenirsales$coef["ar2"]/0.1102
##      ar2 
## 3.346186
#estimate p-values based on the normal distribution
2*pnorm(-abs(ARmodelsouvenirsales$coef["ar1"]/0.109))
##         ar1 
## 0.004825136
2*pnorm(-abs(ARmodelsouvenirsales$coef["ar2"]/0.1102))
##          ar2 
## 0.0008193151

Both AR coefficients are statistically significant (p-values well below alpha = 0.05), and the ACF shows positive autocorrelation in the forecast errors at the low lags. The regression model therefore leaves systematic information in its residuals: an error in one month carries over into the next. The regression forecasts can be improved by forecasting the errors with the AR(2) model and adding that correction in, as done in part ii below.

ii. Use the autocorrelation information to compute a forecast for January 2002, using the regression model and the AR(2) model above.

Acf(ARmodelsouvenirsales$residuals)  #residuals of the AR(2) model should now resemble white noise

#regression forecasts for the validation year (identical to forecastA above)
ARpred<- forecast(modela, h=validsouvLength)
ARpred
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9780.022  7459.322  12822.72  6437.463  14858.16
## Feb 2001      13243.095 10100.643  17363.21  8716.947  20119.38
## Mar 2001      20441.749 15591.130  26801.46 13455.287  31055.83
## Apr 2001      15143.541 11550.133  19854.91  9967.870  23006.60
## May 2001      16224.628 12374.689  21272.34 10679.469  24649.03
## Jun 2001      16996.137 12963.127  22283.87 11187.297  25821.13
## Jul 2001      19894.424 15173.679  26083.86 13095.024  30224.31
## Aug 2001      19591.112 14942.340  25686.18 12895.376  29763.51
## Sep 2001      21864.492 16676.271  28666.84 14391.774  33217.31
## Oct 2001      24530.299 18709.509  32162.02 16146.477  37267.30
## Nov 2001      40144.775 30618.828  52634.38 26424.329  60989.36
## Dec 2001      86908.868 66286.278 113947.44 57205.664 132035.03
#forecast for error terms
errorpred<- forecast(ARmodelsouvenirsales, h=validsouvLength)
errorpred
##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Jan 2001    0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001    0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001    0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001    0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001    0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001    0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001    0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001    0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001    0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001    0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001    0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001    0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
#add the two forecasts together
adjforecast<- ARpred$mean + errorpred$mean
adjforecast
##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 2001  9780.13 13243.19 20441.82 15143.60 16224.67 16996.17 19894.45
##           Aug      Sep      Oct      Nov      Dec
## 2001 19591.13 21864.51 24530.31 40144.78 86908.87
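One caveat about the addition above: because the regression was fit with lambda = 0, the residuals the AR(2) model forecasts are on the log scale, while ARpred$mean is on the original dollar scale; that mismatch is why the adjustment barely moves the numbers (e.g., 9780.02 to 9780.13 for January). A scale-consistent combination would apply the error correction on the log scale before back-transforming, which amounts to a multiplicative adjustment; a sketch, not the book's prescribed recipe:

#apply the log-scale error forecast multiplicatively to the dollar-scale regression forecast
adjforecastlog<- ARpred$mean * exp(errorpred$mean)
adjforecastlog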
plot(souvenirWValid, type = "n", xaxt = "n", yaxt = "n", xlab = "Year", ylab = "Sales (Thousands)", main = "Souvenir Sales")
axis(1,at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2, at = seq(0,105000,5000),  labels = format(seq(0, 105, 5)), las = 2)
lines(souvenirWValid, lwd = 2)
lines(adjforecast, lwd = 2, col = "blue")

4. Shipments of Household Appliances: The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985 and 1989. The series is plotted in Figure 7.13.

appliance<- read.csv("ApplianceShipments.csv", stringsAsFactors=FALSE)
str(appliance)
## 'data.frame':    20 obs. of  2 variables:
##  $ Quarter  : chr  "Q1-1985" "Q1-1986" "Q1-1987" "Q1-1988" ...
##  $ Shipments: int  4009 4123 4493 4595 4245 4321 4522 4806 4799 4900 ...
head(appliance)
##   Quarter Shipments
## 1 Q1-1985      4009
## 2 Q1-1986      4123
## 3 Q1-1987      4493
## 4 Q1-1988      4595
## 5 Q1-1989      4245
## 6 Q2-1985      4321
#the CSV rows are grouped by quarter (Q1-1985, Q1-1986, ..., Q2-1985, ...), not in time order, so sort chronologically first
appliance <- appliance[order(substr(appliance$Quarter, 4, 7), substr(appliance$Quarter, 2, 2)), ]
appliancets<- ts(appliance$Shipments, start = c(1985, 1), frequency = 4)

4.(a) If we compute the autocorrelation of the series, which lag (> 0) is most likely to have the largest coefficient (in absolute value)?

4.(b) Create an ACF plot and compare it with your answer.

acfappliance<- Acf(appliancets)

With quarterly shipments that follow a strong annual seasonal pattern, lag 4 (the seasonal lag, observations one full year apart) is most likely to have the largest coefficient in absolute value, and the ACF of the chronologically ordered series should show its dominant spike there. (If the rows are left in the spreadsheet's quarter-grouped order, the ACF instead peaks at lag 7 at about 0.4 in absolute value, an artifact of the mis-ordered series.)
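To pull the dominant lag out programmatically rather than reading it off the plot, a short sketch (position 1 of the returned $acf vector is lag 0, so it is dropped before taking the maximum):

#find the lag (> 0) whose autocorrelation is largest in absolute value
acfvals<- Acf(appliancets, plot = FALSE)
which.max(abs(acfvals$acf[-1]))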