library(fpp)
library(fma)
library(forecast)
library(ggplot2)
Walmart <- read.csv("WalMartStock.csv", header = TRUE, stringsAsFactors = FALSE)
WalmartTS <- ts(Walmart$Close, start = 2001, frequency = 248)  # daily closing prices; ~248 trading days per year
yrange <- range(WalmartTS)
plot(WalmartTS, xlab = "Months", ylab = "Stock Price", bty = "l", xaxt = "n", yaxt = "n", main = "Walmart Daily Stock Closing Prices")
axis(1,at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2,at=seq(40,70,5),labels = format(seq(40.0,70.0,5)),las = 2)
Acf(WalmartTS, lag.max = 10)
WalmartACF <- Acf(WalmartTS)
Acf(diff(WalmartTS, lag=1),lag.max=10, main="ACF Plot for Differenced Series")
plot(diff(WalmartTS, lag=1), bty="l")
Relevant
Relevant: if the AR(1) slope coefficient equals 1, the series is a random walk.
Not relevant
Relevant: if all the autocorrelations of the differenced series are statistically indistinguishable from zero, the series is a random walk.
Not relevant
Relevant
A random walk is a series in which changes from one time period to the next are random. To test whether a series is a random walk, we fit an AR(1) model and test the null hypothesis that the slope coefficient equals 1. If that hypothesis is rejected (reflected by a small p-value), the series is not a random walk, and we can attempt to forecast it.
Walmartfit <- Arima(WalmartTS, order=c(1,0,0))
Walmartfit
## Series: WalmartTS
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.9558 52.9497
## s.e. 0.0187 1.3280
##
## sigma^2 estimated as 0.9815: log likelihood=-349.8
## AIC=705.59 AICc=705.69 BIC=716.13
The p-value using the t distribution
2*pt(-abs((1 - Walmartfit$coef["ar1"]) / 0.0187), df=length(WalmartTS)-1)
## ar1
## 0.01896261
The p-value using a normal distribution
2*pnorm(-abs((1-Walmartfit$coef["ar1"])/ 0.0187))
## ar1
## 0.01818593
The slope coefficient (0.9558) is about 2.4 standard errors away from 1 ((1 - 0.9558)/0.0187 ≈ 2.36), suggesting that this may not be a random walk. At a significance level of 0.01, both calculated p-values (≈0.019 and ≈0.018) are greater than alpha, so we cannot reject the random walk hypothesis. At alpha = 0.05, however, we reject the hypothesis that the slope equals 1 and conclude that there is an underlying pattern, i.e., the series is not a random walk. Based on the trend visible in the time plot and the reasoning above, I conclude that the time series is not a random walk.
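The standard error (0.0187) above was copied by hand from the printed model summary. As a minimal sketch, assuming the fitted object exposes its coefficient covariance matrix as var.coef (as stats::arima / forecast::Arima objects do), the same test can be computed without hard-coding the standard error:
# Extract the AR(1) coefficient and its standard error from the fitted model
ar1 <- Walmartfit$coef["ar1"]
ar1.se <- sqrt(diag(Walmartfit$var.coef))["ar1"]
# Test H0: slope = 1 (random walk) against a two-sided alternative
z <- (1 - ar1) / ar1.se
2 * pnorm(-abs(z))  # two-sided p-value (normal approximation), matching the value above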
True. It is impossible to obtain useful forecasts of the series that would be more accurate than the naive forecast (sketched below).
False. The series can still be influenced by external information, and a pattern may exist; it simply cannot be forecast from the series' own past.
True. By definition, a random walk is a series in which changes from one time period to the next are random. Economists refer to such time series as “unpredictable”.
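Since the best point forecast of a random walk is simply the most recent observation, the benchmark to beat is the naive forecast. A minimal sketch of generating it with forecast::naive (the 10-day horizon is an arbitrary choice for illustration):
# Naive (random walk) benchmark: every forecast equals the last observed closing price
NaiveWalmart <- naive(WalmartTS, h = 10)
NaiveWalmart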
SouvenirSales <- read.csv("SouvenirSales.csv")
SouvenirSalesTS <- ts(SouvenirSales$Sales, start = c(1995,1), freq=12)
plot(SouvenirSalesTS, type = "n", xaxt = "n", yaxt = "n", xlab = "Year", ylab = "Sales (in Thousands)", main = "Souvenir Sales")
lines(SouvenirSalesTS)
axis(1,at=seq(1995, 2002, 1), labels = format(seq(1995, 2002, 1)))
axis(2, at = seq(0,120000,20000), labels = format(seq(0, 120, 20)), las = 2)
nValid <- 12
nTrain <- length(SouvenirSalesTS) - nValid
TrainTS <- window(SouvenirSalesTS, start = c(1995, 1), end = c(1995, nTrain))  # Jan 1995 - Dec 2000 (72 months)
ValidTS <- window(SouvenirSalesTS, start = c(1995, nTrain + 1), end = c(1995, nTrain + nValid))  # Jan - Dec 2001
SouvenirLinearSeason <- tslm(TrainTS ~ trend + season, lambda = 0)  # lambda = 0 applies a log transform (exponential trend, multiplicative seasonality)
summary(SouvenirLinearSeason)
##
## Call:
## tslm(formula = TrainTS ~ trend + season, lambda = 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2e+10 on 12 and 59 DF, p-value: < 2.2e-16
# t = 86 is February 2002 (counting from January 1995 as t = 1); season2 is the February dummy
ForecastFeb2002 <- SouvenirLinearSeason$coefficients["(Intercept)"] + SouvenirLinearSeason$coefficients["trend"]*86 + SouvenirLinearSeason$coefficients["season2"]
ForecastFeb2002
## (Intercept)
## 9.744667
exp(ForecastFeb2002)  # back-transform from the log scale (lambda = 0) to dollars
## (Intercept)
## 17062.99
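The same point forecast can also be obtained from forecast() instead of assembling it from the coefficients by hand. A sketch, assuming a 14-month horizon (January 2001 through February 2002) so that the 14th element is the February 2002 value:
# Forecast 14 months beyond the training period; forecast() back-transforms automatically
Forecast14 <- forecast(SouvenirLinearSeason, h = 14)
Forecast14$mean[14]  # point forecast for February 2002, in dollars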
ResidualsACF <- Acf(SouvenirLinearSeason$residuals, lag.max=15)
ResidualsACF
##
## Autocorrelations of series 'SouvenirLinearSeason$residuals', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.459 0.485 0.194 0.088 0.154 0.016 0.030 0.106 0.034
## 10 11 12 13 14 15
## 0.152 -0.055 -0.012 -0.047 -0.077 -0.023
# Fit a second-layer AR(2) model to the regression residuals
ARModel <- Arima(SouvenirLinearSeason$residuals, order=c(2,0,0))
ARModel
## Series: SouvenirLinearSeason$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.3072 0.3687 -0.0025
## s.e. 0.1090 0.1102 0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
ARModel$coef["ar1"] / 0.1090
## ar1
## 2.818482
ARModel$coef["ar2"] / 0.1102
## ar2
## 3.346186
2*pnorm(-abs(ARModel$coef["ar1"] / 0.1090))
## ar1
## 0.004825136
2*pnorm(-abs(ARModel$coef["ar2"] / 0.1102))
## ar2
## 0.0008193151
Acf(ARModel$residuals)
Examining the first ACF plot, the significant lag-1 and lag-2 autocorrelations show that some trend information was missed by the regression forecast. We can attempt to capture it in the sales model itself or with a second-layer model fit to the residuals. The lag-12 autocorrelation does not cross the dashed significance lines, indicating that the seasonality has been captured.
The second plot shows that the AR(2) model captured the remaining autocorrelation in the residual series, as no lags go beyond the dashed lines.
Both t-statistics, 2.82 for the AR(1) coefficient and 3.35 for the AR(2) coefficient, are statistically significant. The corresponding p-values of 0.005 and 0.001 are well below both 0.05 and 0.01, confirming that the residual series is not a random walk.
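As with the Walmart model, the standard errors (0.1090 and 0.1102) were typed in from the printed summary. A small sketch of the same significance check computed directly from the fitted object (again assuming the coefficient covariance matrix is available as var.coef):
# z-statistics and two-sided p-values for all estimated coefficients (ar1, ar2, and the mean)
se <- sqrt(diag(ARModel$var.coef))
z <- ARModel$coef / se
data.frame(coef = ARModel$coef, se = se, z = z, p.value = 2 * pnorm(-abs(z)))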
ErrorForecast <- forecast(ARModel, h=nValid)  # forecast the regression residuals 12 months ahead
ErrorForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001 0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001 0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001 0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001 0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001 0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001 0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001 0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001 0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001 0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001 0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001 0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
ForecastJan2001 <- forecast(SouvenirLinearSeason, h=nValid)  # regression forecasts for the 12 months of 2001, back-transformed to dollars
ForecastJan2001
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9780.022 7459.322 12822.72 6437.463 14858.16
## Feb 2001 13243.095 10100.643 17363.21 8716.947 20119.38
## Mar 2001 20441.749 15591.130 26801.46 13455.287 31055.83
## Apr 2001 15143.541 11550.133 19854.91 9967.870 23006.60
## May 2001 16224.628 12374.689 21272.34 10679.469 24649.03
## Jun 2001 16996.137 12963.127 22283.87 11187.297 25821.13
## Jul 2001 19894.424 15173.679 26083.86 13095.024 30224.31
## Aug 2001 19591.112 14942.340 25686.18 12895.376 29763.51
## Sep 2001 21864.492 16676.271 28666.84 14391.774 33217.31
## Oct 2001 24530.299 18709.509 32162.02 16146.477 37267.30
## Nov 2001 40144.775 30618.828 52634.38 26424.329 60989.36
## Dec 2001 86908.868 66286.278 113947.44 57205.664 132035.03
AdjustedForecast <- ForecastJan2001$mean + ErrorForecast$mean
AdjustedForecast
## Jan Feb Mar Apr May Jun Jul
## 2001 9780.13 13243.19 20441.82 15143.60 16224.67 16996.17 19894.45
## Aug Sep Oct Nov Dec
## 2001 19591.13 21864.51 24530.31 40144.78 86908.87
Combining the two models, by adding the forecasted residuals to the regression forecasts, yields an adjusted forecast. The regression forecasts are already back-transformed to dollars, so the adjusted forecast for January 2001 sales is approximately $9,780.13.
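As an illustrative check (one possible way to judge the adjustment, not prescribed by the exercise), the adjusted and unadjusted forecasts can be compared on the validation year with forecast::accuracy():
# Compare forecast accuracy against the 2001 validation data
accuracy(as.numeric(ForecastJan2001$mean), as.numeric(ValidTS))  # regression forecasts only
accuracy(as.numeric(AdjustedForecast), as.numeric(ValidTS))      # regression + residual adjustment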
plot(ValidTS, type = "n", xaxt = "n", yaxt = "n", xlab = "Year", ylab = "Sales (in Thousands)", main = "Souvenir Sales")
axis(1,at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "June", "July", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2, at = seq(0,120000,20000), labels = format(seq(0, 120, 20)), las = 2)
lines(ValidTS)
lines(AdjustedForecast, col = "blue")
Given that this data is quarterly and appears to have a seasonal component, I would expect the largest coefficient to be the one for the same quarter a year earlier, i.e., the lag-4 autocorrelation.
Appliances <- read.csv("ApplianceShipments.csv")
AppliancesTS <- ts(Appliances$Shipments, start = c(1985,1), end = c(1989,4), freq=4)
plot(AppliancesTS)
ApplianceACF <- Acf(AppliancesTS)
As anticipated in my response to the previous question, the lag-4 autocorrelation has the largest coefficient.
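A quick programmatic confirmation (the object returned by Acf stores the coefficients in $acf, with the first element being lag 0):
# Drop lag 0 and find the lag with the largest absolute autocorrelation
acfs <- drop(ApplianceACF$acf)[-1]
which.max(abs(acfs))  # expected to be 4 for this quarterly series, per the conclusion above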