Problems 1, 2, 3, & 4, Pgs. 170-174
1. Analysis of Canadian Manufacturing Workers Work-Hours: The time series plot in Figure 7.9 describes the average annual number of weekly hours spent by Canadian manufacturing workers. The data is available in CanadianWorkHours.xls.
setwd("C:/Users/larms.LA-INSP5559/Documents/R/win-library/3.3/17_0405_assignment6")
cwhours<- read.csv("CanadianWorkHours.csv", stringsAsFactors = FALSE)
str(cwhours)
## 'data.frame': 35 obs. of 2 variables:
## $ Year : int 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 ...
## $ Hours_per_week: num 37.2 37 37.4 37.5 37.7 37.7 37.4 37.2 37.3 37.2 ...
head(cwhours)
## Year Hours_per_week
## 1 1966 37.2
## 2 1967 37.0
## 3 1968 37.4
## 4 1969 37.5
## 5 1970 37.7
## 6 1971 37.7
library(forecast)
cworkhoursTS<- ts(cwhours$Hours_per_week, start = c(1966), frequency = 1)
yrange = (cworkhoursTS)
plot(cworkhoursTS, xlab = "Years", ylab = "Average Hours per Week", ylim = c(34.5, 38), bty = "l")
lines(cworkhoursTS, bty = "l")
1.(a) If we computed the autocorrelation of this series, would the lag-1 autocorrelation exhibit negative, positive, or no autocorrelation? How can you see this from the plot?
1.(b) Compute the autocorrelation and produce an ACF plot. Verify your answer to the previous question.
Acf(cworkhoursTS, lag.max = 1, main = "Autocorrelation of Canadian Work Hours Time Series")
It would exhibit positive autocorrelation. You can see this in the plot on the line labeled as “Lag 1”. It is showing a positive correlation of almost .10.
2. Forecasting Wal-Mart Stock: Figure 7.10 shows a time plot of Wal-Mart daily closing prices between February 2001 and February 2002. The data is available at finance.yahoo.com and in WalMartStock.xls. The ACF plots of these daily closing prices and its lag-1 differenced series are in Figure 7.11. Table 7.4 shows the output from fitting an AR(1) model to the series of closing prices and to the series of differences. Use all the information to answer the following questions.
library(lubridate)
WMstock<- read.csv("WalMartStock.csv", header = TRUE, stringsAsFactors = FALSE)
str(WMstock)
## 'data.frame': 248 obs. of 2 variables:
## $ Date : chr "5-Feb-01" "6-Feb-01" "7-Feb-01" "8-Feb-01" ...
## $ Close: num 53.8 53.2 54.7 52.3 50.4 ...
head(WMstock)
## Date Close
## 1 5-Feb-01 53.84
## 2 6-Feb-01 53.20
## 3 7-Feb-01 54.66
## 4 8-Feb-01 52.30
## 5 9-Feb-01 50.40
## 6 12-Feb-01 53.45
WMstockts<- ts(WMstock$Close, start = c(2001), frequency = 248)
yrange = (WMstockts)
plot(WMstockts, xlab = "Months", ylab = "Stock Closing Prices", bty = "l", xaxt = "n", yaxt = "n", main = "Walmart Daily Stock Closing Prices")
axis(1,at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2,at=seq(45,60,5),labels = format(seq(45.0,60.0,5)),las = 2)
lines(WMstockts, bty = "l")
2.(a) Create a time plot of the differenced series.
Acf(WMstockts, lag.max = 10)
Acf(diff(WMstockts, lag=1), lag.max = 10)
plot(diff(WMstockts, lag = 1), main = "Time Plot of Walmart Differenced Series")
2.(b) Which of the following is/are relevant for testing whether this stock is a random walk?
. The autocorrelations of the closing price series
. The AR(1) slope coefficient for the closing price series
. The autocorrelations of the differenced series
. The AR(1) slope coefficient for the differenced series
2.(c) Recreate the AR(1) model output for the Close price series shown in the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.
WMLT<- tslm(WMstockts ~ trend + I(trend^2))
WMfit<- Arima(WMstockts, order = c(1,0,0))
WMfit
## Series: WMstockts
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 intercept
## 0.9558 52.9497
## s.e. 0.0187 1.3280
##
## sigma^2 estimated as 0.9815: log likelihood=-349.8
## AIC=705.59 AICc=705.69 BIC=716.13
#calculate p-value from t-distribution
2*pt(-abs((1-WMfit$coef["ar1"])/0.0187),df=length(WMstockts)-1)
## ar1
## 0.01896261
#calculate p-vlaue from normal distribution
2*pnorm(-abs((1-WMfit$coef["ar1"])/0.0187))
## ar1
## 0.01818593
Per the text, a random walk is when an AR(1) model has a slope coefficient equal to 1. The difference between the values at different periods (t-1) are completely random. With stock prices, I would make a blank assumption that they would be random, since there are so many variables that can impact daily prices. However, when using an alpha = 0.01, this Wal-Mart daily stock price series would be considered a random walk, since my p-value is greater than the alpha. If I used an alpha = 0.05, the data would not be a random walk because my p-value would be less than the alpha. However, based on the trend in the time plot, I am going to stick to an alpha of .05 and state that it is not a random walk.
2.(d) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below.
. It is impossible to obtain useful forecasts of the series.
3. Souvenir Sales: The file SouvenirSales.xls contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, between 1995 and 2001.
Back in 2001, the store wanted to use the data to forecast sales for the next 12 months (year 2002). They hired an analyst to generate forecasts. The analyst first partitioned the data into training and validation periods, with the validation set containing the last 12 months of data (year 2001). She then fit a regression model to sales, using the training period.
souvenir<- read.csv("souvenirsales.csv", stringsAsFactors=FALSE)
Souvenir_SalesTS<- ts(souvenir$Sales, start=c(1995,1), frequency=12)
validsouvLength <- 12
trainsouvLength <- length(Souvenir_SalesTS) - validsouvLength
souvenirWTrain <- window(Souvenir_SalesTS, start=c(1995,1), end=c(1995, trainsouvLength))
souvenirWValid <- window(Souvenir_SalesTS, start=c(1995,trainsouvLength+1), end=c(1995,trainsouvLength+validsouvLength))
#plot the time series
plot(Souvenir_SalesTS, type = "n", xaxt = "n", yaxt = "n", xlab = "Year", ylab = "Sales (Thousands)", main = "Souvenir Sales")
lines(Souvenir_SalesTS, lwd = 2)
axis(1,at=seq(1995, 2002, 1), labels = format(seq(1995, 2002, 1)))
axis(2, at = seq(0,105000,5000), labels = format(seq(0, 105, 5)), las = 2)
3.(a) Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.
modela<- tslm(souvenirWTrain ~ trend + season, lambda = 0)
modela
##
## Call:
## tslm(formula = souvenirWTrain ~ trend + season, lambda = 0)
##
## Coefficients:
## (Intercept) trend season2 season3 season4
## 7.64636 0.02112 0.28201 0.69500 0.37387
## season5 season6 season7 season8 season9
## 0.42171 0.44705 0.58338 0.54690 0.63557
## season10 season11 season12
## 0.72949 1.20095 1.95220
forecastA<- forecast(modela, h=12)
forecastA
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9780.022 7459.322 12822.72 6437.463 14858.16
## Feb 2001 13243.095 10100.643 17363.21 8716.947 20119.38
## Mar 2001 20441.749 15591.130 26801.46 13455.287 31055.83
## Apr 2001 15143.541 11550.133 19854.91 9967.870 23006.60
## May 2001 16224.628 12374.689 21272.34 10679.469 24649.03
## Jun 2001 16996.137 12963.127 22283.87 11187.297 25821.13
## Jul 2001 19894.424 15173.679 26083.86 13095.024 30224.31
## Aug 2001 19591.112 14942.340 25686.18 12895.376 29763.51
## Sep 2001 21864.492 16676.271 28666.84 14391.774 33217.31
## Oct 2001 24530.299 18709.509 32162.02 16146.477 37267.30
## Nov 2001 40144.775 30618.828 52634.38 26424.329 60989.36
## Dec 2001 86908.868 66286.278 113947.44 57205.664 132035.03
plot(log(Souvenir_SalesTS), type = "n", xaxt = "n", yaxt = "n", xlab = "Year", ylab = "Sales (Thousands)", main = "Log Souvenir Sales")
lines(log(Souvenir_SalesTS), lwd = 2)
axis(1,at=seq(1995, 2002, 1), labels = format(seq(1995, 2002, 1)))
axis(2, at = seq(6,12,1), labels = format(seq(6, 12, 1)), las = 2)
Acf(modela$residuals)
3.(b) Create an ACF plot until lag-15 for the forecast errors. Now fit an AR model with lag-2 [ARIMA(2, 0, 0)] to the forecast errors.
Acf(modela$residuals, lag.max = 15)
ARmodelsouvenirsales<- Arima(forecastA$residuals, order = c(2,0,0))
ARmodelsouvenirsales
## Series: forecastA$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 intercept
## 0.3072 0.3687 -0.0025
## s.e. 0.1090 0.1102 0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
i. Examining the ACF plot and the estimated coefficients of the AR(2) model (and their statistical significance), what can we learn about the regression forecasts?
#calculate the t-statistic for each coefficient
ARmodelsouvenirsales$coef["ar1"]/0.109
## ar1
## 2.818482
ARmodelsouvenirsales$coef["ar2"]/0.1102
## ar2
## 3.346186
#estimate p-values based on the normal distribution
2*pnorm(-abs(ARmodelsouvenirsales$coef["ar1"]/0.109))
## ar1
## 0.004825136
2*pnorm(-abs(ARmodelsouvenirsales$coef["ar2"]/0.1102))
## ar2
## 0.0008193151
We can learn that they are statistically significant because the p-values are less than alpha = 0.05.
ii. Use the autocorrelation information to compute a forecast for January 2002, using the regression model and the AR(2) model above.
Acf(ARmodelsouvenirsales$residuals)
ARpred<- forecast(modela, h=validsouvLength)
ARpred
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9780.022 7459.322 12822.72 6437.463 14858.16
## Feb 2001 13243.095 10100.643 17363.21 8716.947 20119.38
## Mar 2001 20441.749 15591.130 26801.46 13455.287 31055.83
## Apr 2001 15143.541 11550.133 19854.91 9967.870 23006.60
## May 2001 16224.628 12374.689 21272.34 10679.469 24649.03
## Jun 2001 16996.137 12963.127 22283.87 11187.297 25821.13
## Jul 2001 19894.424 15173.679 26083.86 13095.024 30224.31
## Aug 2001 19591.112 14942.340 25686.18 12895.376 29763.51
## Sep 2001 21864.492 16676.271 28666.84 14391.774 33217.31
## Oct 2001 24530.299 18709.509 32162.02 16146.477 37267.30
## Nov 2001 40144.775 30618.828 52634.38 26424.329 60989.36
## Dec 2001 86908.868 66286.278 113947.44 57205.664 132035.03
#forecast for error terms
errorpred<- forecast(ARmodelsouvenirsales, h=validsouvLength)
errorpred
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001 0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001 0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001 0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001 0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001 0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001 0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001 0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001 0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001 0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001 0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001 0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
#add the two forecasts together
adjforecast<- ARpred$mean + errorpred$mean
adjforecast
## Jan Feb Mar Apr May Jun Jul
## 2001 9780.13 13243.19 20441.82 15143.60 16224.67 16996.17 19894.45
## Aug Sep Oct Nov Dec
## 2001 19591.13 21864.51 24530.31 40144.78 86908.87
plot(souvenirWValid, type = "n", xaxt = "n", yaxt = "n", xlab = "Year", ylab = "Sales (Thousands)", main = "Souvenir Sales")
axis(1,at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "June", "July", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2, at = seq(0,105000,5000), labels = format(seq(0, 105, 5)), las = 2)
lines(souvenirWValid, lwd = 2)
lines(adjforecast, lwd = 2, col = "blue")
4. Shipments of Household Appliances: The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985 and 1989. The series is plotted in Figure 7.13.
appliance<- read.csv("ApplianceShipments.csv", stringsAsFactors=FALSE)
str(appliance)
## 'data.frame': 20 obs. of 2 variables:
## $ Quarter : chr "Q1-1985" "Q1-1986" "Q1-1987" "Q1-1988" ...
## $ Shipments: int 4009 4123 4493 4595 4245 4321 4522 4806 4799 4900 ...
head(appliance)
## Quarter Shipments
## 1 Q1-1985 4009
## 2 Q1-1986 4123
## 3 Q1-1987 4493
## 4 Q1-1988 4595
## 5 Q1-1989 4245
## 6 Q2-1985 4321
appliancets<- ts(appliance$Shipments, start = c(1985), frequency = 4)
4.(a) If we compute the autocorrelation of the series, which lag (> 0) is most likely to have the largest coefficient (in absolute value)? 4.(b) Create an ACF plot and compare it with your answer.
acfappliance<- Acf(appliancets)
Lag 7 would most likely be the largest coefficient of the series with the absolute value of .4.