library(forecast)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.3
library(readxl)
Canada_Hours <- read_excel("~/rProjects/Assignment_6/Canada_Hours.xlsx")
Question 1
# Create a time series object out of it
hoursTS <- ts(Canada_Hours$HoursPerWeek, start=c(1966), frequency=1)
yrange = range(hoursTS)
# Set up the plot
plot(c(1966, 2000), yrange, type="n", xlab="Year", ylab="Hours Per Week", bty="l", xaxt="n", yaxt="n")
# Add the time series line
lines(hoursTS, bty="l")
# Add the x-axis
axis(1, at=seq(1965,2000,5), labels=format(seq(1965,2000,5)))
# Add the y-axis
axis(2, at=seq(34.5,40, 0.5), labels=format(seq(34.5,40, 0.5)), las=2)

A)
This is a difficult question to answer without looking at the ACF plot - my first question would be: is it possible to have both positive and negative autocorrelation in one plot? I know that negative autocorrelation means that neighboring values differ greatly, while positive autocorrelation means that neighboring values tend to be similar. If I had to guess, I would say that this series exhibits positive autocorrelation because the trend appears gradual, which implies that even though the values differ from one time period to the next, they remain similar. Let’s check…
Acf(hoursTS, lag.max=1)

hoursACF <- Acf(hoursTS)

hoursACF
##
## Autocorrelations of series 'hoursTS', by lag
##
##      0      1      2      3      4      5      6      7      8      9
##  1.000  0.928  0.839  0.752  0.665  0.571  0.473  0.369  0.265  0.164
##     10     11     12     13     14     15
##  0.047 -0.082 -0.185 -0.261 -0.310 -0.346
B)
The first plot above shows that there is a positive lag-1 autocorrelation, above .8; this is confirmed in the printed autocorrelation values, where lag-1 = .928. We can also see that the correlations decay throughout the series, with the strongest correlations at the first 6 lags. If we look at the second plot, the negative values at the longer lags imply that this series has a non-linear trend.
Question 2
# Read in the Walmart Stock data
WalMartStock <- read_excel("~/rProjects/Assignment_6/WalMartStock.xlsx")
# Create a time series
stockTS <- ts(WalMartStock$Close, start=c(2001,2), frequency=225)
plot(stockTS, xlab="Time", ylab="Closing Price ($)", main = "Daily Closing Time Series", bty="l", xaxt='n')
# Add the x-axis
axis(1, at=c(2001.0, 2001.2, 2001.4, 2001.6, 2001.8, 2002.0), labels=c("Mar-01", "May-01", "Jul-01","Sep-01", "Nov-01", "Jan-02"))

A)
To create a time plot of the differenced series, I plotted the lag-1 differences of the series; this removed the trend from the stock closing prices.
plot(diff(stockTS, lag=1), ylab="Lag-1", xlab="Time", bty="l", main="Lag-1 Difference", xaxt = 'n')
axis(1, at=c(2001.0, 2001.2, 2001.4, 2001.6, 2001.8, 2002.0), labels=c("Mar-01", "May-01", "Jul-01","Sep-01", "Nov-01", "Jan-02"))

B)
The two items that are relevant for testing the stock series for a random walk are:
1. The AR(1) slope coefficient for the closing price series. "A random walk is a special case of an AR(1) model, where the slope coefficient is equal to 1."
2. The autocorrelations of the difference series (a quick sketch follows this list). "Another approach for evaluating predictability is to examine the series of differences between each pair of consecutive values (lag-1 difference series). If the ACF plot [of the differenced series] indicates that the autocorrelations at lags 1, 2, 3, etc., are all approximately 0, then a random walk is inferred."
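Item 1 is fit with Arima() just below; item 2 can be checked directly with a minimal sketch of the differenced-series ACF (the plot title is my own):
# If the autocorrelations of the lag-1 differences are all roughly zero,
# then a random walk is inferred
Acf(diff(stockTS, lag=1), main="ACF of Lag-1 Differences")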
stockAR1 <- Arima(stockTS, order=c(1,0,0))
stockAR1
## Series: stockTS
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
##          ar1  intercept
##       0.9558    52.9497
## s.e.  0.0187     1.3280
##
## sigma^2 estimated as 0.9815: log likelihood=-349.8
## AIC=705.59 AICc=705.69 BIC=716.13
C)
Using the AR table alone, I would say that this series is a random walk. I came to my conclusion by looking at the slope coefficient, which is .9558. A series is said to be a random walk when its AR(1) slope coefficient is equal to 1, and .9558 is very close to 1. We can evaluate the series further by testing the null hypothesis that the slope coefficient equals 1: we calculate the model’s p-value using a t-distribution and a normal distribution, then compare the p-values to alpha. If the p-values are larger than alpha, we fail to reject the null hypothesis and a random walk is implied.
# Calculate the two-tailed p-value from a t-distribution
2*pt(-abs((1 - stockAR1$coef["ar1"]) / 0.0187), df=length(stockTS)-1)
## ar1
## 0.01896261
# Now calculate it using the normal distribution
2*pnorm(-abs((1 - stockAR1$coef["ar1"]) / 0.0187))
## ar1
## 0.01818593
Assuming alpha = .01, we can see that the p-values are larger than alpha, so a random walk is implied. (Note that at alpha = .05 the p-values of roughly .019 would lead us to reject the null and conclude the series is not a random walk.)
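As an aside, the standard error can be pulled from the fitted model instead of being hardcoded as 0.0187; a minimal sketch, relying on the var.coef covariance matrix that Arima() stores on the model object:
# Standard errors are the square roots of the diagonal of the
# coefficient covariance matrix
se <- sqrt(diag(stockAR1$var.coef))
# The same two-tailed test of H0: slope = 1, using the normal distribution
2*pnorm(-abs((1 - stockAR1$coef["ar1"]) / se["ar1"]))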
D)
Each of the listed statements is true about random walks:
1. A random walk series is unpredictable, therefore forecasting it with any accuracy is next to impossible. Naive forecasts are the best-guess forecasts for random walks (a minimal sketch follows this list).
2. By definition, a random walk implies that the series is random.
3. Again, random walks are defined as having period-to-period changes that are random.
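A minimal sketch of point 1, using naive() from the forecast package (the 10-day horizon is my own choice for illustration):
# naive() carries the last observed closing price forward, which is
# the appropriate point forecast for a random walk
naiveForecast <- naive(stockTS, h=10)
naiveForecast$mean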
Question 3
# Import the data
SouvenirSales <- read_excel("~/rProjects/Assignment_6/SouvenirSales.xlsx")
A)
Same as questions C and Ciii from Assignment 5:
The forecast for February 2002 is $17,062.99.
# Create the time series
sales.ts <- ts(SouvenirSales$Sales, start=c(1995,1), frequency=12)
# Set up training and validation period lengths
validLength <- 12
trainLength <- length(sales.ts) - validLength
# Create training period and validation period
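# Note: window() accepts an out-of-range cycle index, so end=c(1995, 72)
# resolves to the 72nd month after Jan 1995, i.e. Dec 2000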
salesTrain <- window(sales.ts, end=c(1995, trainLength))
salesValid <- window(sales.ts, start=c(1995,trainLength+1))
# Regression Model with log(sales)
salesRegMod <- tslm(log(salesTrain) ~ trend + season)
summary(salesRegMod)
##
## Call:
## tslm(formula = log(salesTrain) ~ trend + season)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.4529  -0.1163   0.0001   0.1005   0.3438
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 **
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9306
## F-statistic: 80.4 on 12 and 59 DF, p-value: < 2.2e-16
# Use the named coefficients
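# Feb 2002 is trend period 86: 72 training months + 12 validation months + 2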
febForecast <- salesRegMod$coefficients["(Intercept)"] + salesRegMod$coefficients["trend"]*86 + salesRegMod$coefficients["season2"]
# Convert back from the log scale and print the result
exp(febForecast)
## (Intercept)
## 17062.99
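As a hypothetical cross-check, forecast() on the tslm object should reproduce the same number, since February 2002 is 14 periods past the end of the training data:
# Forecast 14 periods past Dec 2000 (Jan 2001 through Feb 2002);
# the point forecasts are on the log scale, so convert the last one back
crossCheck <- forecast(salesRegMod, h=14)
exp(crossCheck$mean[14])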
Bi)
Looking at the ACF plot, it appears as though we could have used sine and cosine functions to better capture the wave-like pattern visible in the lag terms (a sketch of that idea follows the ACF output below). Also, the low coefficients of .3072 and .3687 imply that this series is not a random walk, and the statistical significance of the coefficients shows that this model is a good fit.
# Compute the ACF on the first 35 residuals; with 35 points, Acf defaults
# to showing roughly the first 15 lag-correlations
resACF <- Acf(salesRegMod$residuals[1:35])

# Output lag-correlations
resACF
##
## Autocorrelations of series 'salesRegMod$residuals[1:35]', by lag
##
##      0      1      2      3      4      5      6      7      8      9
##  1.000  0.322  0.287 -0.069 -0.108  0.033 -0.136 -0.057  0.086  0.116
##     10     11     12     13     14     15
##  0.257 -0.042 -0.096 -0.090 -0.207 -0.005
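A minimal sketch of the sine/cosine idea mentioned above, assuming a 12-month cycle; tslm() exposes trend as a numeric variable, so trigonometric terms can be added with I() (this is an illustration, not the model fit in this assignment):
# Hypothetical alternative model with sine/cosine terms for a 12-month cycle
trigMod <- tslm(log(salesTrain) ~ trend + I(sin(2*pi*trend/12)) + I(cos(2*pi*trend/12)))
summary(trigMod)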
# Run an AR(2) on the residuals from the fitted model
ARModel <- Arima(salesRegMod$residuals, order=c(2,0,0))
ARModel
## Series: salesRegMod$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
##          ar1     ar2  intercept
##       0.3072  0.3687    -0.0025
## s.e.  0.1090  0.1102     0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
The AR1 t statistic = 2.818482, which is greater than 2 and therefore implies significance.
# Calculate the t statistics for each: coefficient / s.e.
# Rough rule is anything > 2 (or < -2) is statistically significant
ARModel$coef["ar1"] / 0.1090
## ar1
## 2.818482
The AR2 t statistic = 3.346186, which is greater than 2 and therefore implies significance.
ARModel$coef["ar2"] / 0.1102
## ar2
## 3.346186
The AR1 p-value = 0.004825136, which is less than .01 and therefore implies significance.
# Now estimate p-value based on the normal distribution
2*pnorm(-abs(ARModel$coef["ar1"] / 0.1090))
## ar1
## 0.004825136
The AR2 p-value = 0.0008193151, which is less than .01 and therefore implies significance.
2*pnorm(-abs(ARModel$coef["ar2"] / 0.1102))
## ar2
## 0.0008193151
Bii)
# Create an ACF plot of the residuals from the AR(2) model
Acf(ARModel$residuals)

# Generate forecasts for the validation period
lrForecast <- forecast(salesRegMod, h=validLength)
lrForecast
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9.188097  8.917220  9.458974  8.769890  9.606304
## Feb 2001       9.491232  9.220354  9.762109  9.073024  9.909439
## Mar 2001       9.925335  9.654457 10.196212  9.507127 10.343542
## Apr 2001       9.625329  9.354452  9.896207  9.207122 10.043537
## May 2001       9.694286  9.423408  9.965163  9.276078 10.112493
## Jun 2001       9.740741  9.469864 10.011619  9.322534 10.158949
## Jul 2001       9.898195  9.627318 10.169072  9.479988 10.316402
## Aug 2001       9.882831  9.611954 10.153708  9.464624 10.301038
## Sep 2001       9.992619  9.721742 10.263496  9.574412 10.410826
## Oct 2001      10.107664  9.836787 10.378542  9.689457 10.525872
## Nov 2001      10.600248 10.329370 10.871125 10.182040 11.018455
## Dec 2001      11.372615 11.101738 11.643493 10.954408 11.790823
# Generate a forecast for the error terms using the AR(2) model of the residuals
errorForecast <- forecast(ARModel, h=validLength)
errorForecast
##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Jan 2001    0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001    0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001    0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001    0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001    0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001    0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001    0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001    0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001    0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001    0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001    0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001    0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
# Create the adjusted forecast by adding the two forecasts together
adjustedForecast <- lrForecast$mean + errorForecast$mean
# Single out January 2001 forecast
janForecast <- adjustedForecast[1]
# Convert January's forecast back and print the result
exp(janForecast)
## [1] 10894.13
The adjusted forecast for January 2001 is $10,894.13.
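To gauge the adjusted forecasts over the whole validation period, they can be scored against salesValid; a minimal sketch, assuming accuracy() is given the back-transformed numeric forecasts (it accepts a numeric vector as its first argument):
# Convert the full adjusted forecast back from the log scale and
# compare it with the 2001 validation observations
accuracy(exp(adjustedForecast), salesValid)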