a) Create a time plot of the differenced series
library(forecast)
library(ggplot2)
Walmart= read.csv("Walmart.csv", stringsAsFactors = FALSE)
Walmart.ts <- ts(Walmart$Close, start=c(2001, 1), frequency=225)
yrange= range(Walmart.ts)
plot(Walmart.ts, ylab="Stock Prices($)", xlab="Year", bty="l", main="WalMart Stock Prices ($)")
#Plot differenced series
plot(diff(Walmart.ts, lag=1), ylab="Lag-1", xlab="Year", bty="l", main="Lag-1 Difference")
b) Which of the following is/are relevant for testing whether this stock is a random walk?
The autocorrelations of the closing price series
The autocorrelations of the differenced series
The AR(1) constant coefficient for the differenced series
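To see why the autocorrelations of the differenced series are informative, here is a minimal sketch on a simulated random walk (hypothetical data, not the Walmart series; the seed and series length are arbitrary assumptions). For a random walk the level series is strongly autocorrelated, while the lag-1 differences should look like uncorrelated noise.

```r
# Simulated random walk: each value is the previous value plus random noise
set.seed(123)                      # arbitrary seed, for reproducibility only
rw <- cumsum(rnorm(500))

# Autocorrelations at lags 1..10 (drop lag 0, which is always 1)
acf_level <- as.numeric(acf(rw, lag.max = 10, plot = FALSE)$acf)[-1]
acf_diff  <- as.numeric(acf(diff(rw), lag.max = 10, plot = FALSE)$acf)[-1]

# Approximate 95% bound for the autocorrelations of white noise
bound <- 2 / sqrt(length(rw) - 1)

round(rbind(level = acf_level, differenced = acf_diff), 2)
bound
```

Autocorrelations of the differenced series that stay inside `bound` are consistent with a random walk; large values at some lag would suggest the changes are predictable.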
c) Recreate the AR(1) model output for the Close price series shown on the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.
fit = Arima(Walmart.ts, order= c(1,0,0))
fit
## Series: Walmart.ts
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.9558 52.9497
## s.e. 0.0187 1.3280
##
## sigma^2 estimated as 0.9815: log likelihood=-349.8
## AIC=705.59 AICc=705.69 BIC=716.13
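One way to reach a conclusion from this output is to test whether the AR(1) coefficient equals 1, since a slope of 1 corresponds to a random walk. The sketch below uses the printed estimate and standard error; treating the statistic as approximately normal is an assumption made here for a rough check, and a formal unit-root test such as Dickey-Fuller uses different critical values.

```r
# Values copied from the ARIMA(1,0,0) output above
ar1    <- 0.9558
se_ar1 <- 0.0187

# Test H0: ar1 = 1 (random walk) against H1: ar1 != 1
t_stat  <- (ar1 - 1) / se_ar1          # about -2.36
p_value <- 2 * pnorm(-abs(t_stat))     # two-sided, normal approximation (assumption)

c(t_stat = t_stat, p_value = p_value)
```

The coefficient lies a little more than two standard errors below 1, so under this approximation the evidence against a random walk is moderate rather than overwhelming.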
d) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below:
The series is random
The changes in the series from one period to the other are random
souvenir = read.csv("SouvenirSales.csv", stringsAsFactors = FALSE)
a) Run a regression model with log(Sales) as the output variable with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.
souvenir.ts <- ts(souvenir$Sales, start=c(1995, 1), frequency=12)
valid_length <- 12
train_length <- length(souvenir.ts) - valid_length
souvenir_train <- window(souvenir.ts, end=c(1995, train_length))
souvenir_valid <- window(souvenir.ts, start=c(1995,train_length+1))
# Fit the model and see what it looks like
souvenir_linear_season <- tslm(log(souvenir_train) ~ trend + season)
summary(souvenir_linear_season)
##
## Call:
## tslm(formula = log(souvenir_train) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9306
## F-statistic: 80.4 on 12 and 59 DF, p-value: < 2.2e-16
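# February 2002 is period 86 (period 1 is January 1995). A February forecast needs
# the February seasonal coefficient (season2) in addition to the intercept and trend.
feb_forecast <- souvenir_linear_season$coefficients["(Intercept)"] + souvenir_linear_season$coefficients["trend"]*86 + souvenir_linear_season$coefficients["season2"]
exp(feb_forecast)
## (Intercept)
## roughly 17060 (hand-computed from the rounded coefficients above: exp(7.646363 + 0.021120*86 + 0.282015))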
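In a model with a linear trend and monthly dummies, the forecast for a given month is the intercept, plus the trend coefficient times the period index, plus that month's seasonal coefficient (January is the baseline and has none). The sketch below checks this against `predict()` on simulated data; the data, the variable names `t_idx` and `month`, and the use of base `lm()` instead of `tslm()` are all assumptions for illustration.

```r
# Simulated monthly log-sales with a trend and a February effect (hypothetical data)
set.seed(1)
n <- 72
t_idx <- seq_len(n)
month <- factor((t_idx - 1) %% 12 + 1)           # levels "1".."12"; "1" is the baseline
log_sales <- 7.6 + 0.02 * t_idx + 0.3 * (month == "2") + rnorm(n, sd = 0.1)

fit <- lm(log_sales ~ t_idx + month)
b <- coef(fit)

# Manual forecast for period 86, which falls in February
manual <- unname(b["(Intercept)"] + b["t_idx"] * 86 + b["month2"])

# The same forecast through predict()
new_obs <- data.frame(t_idx = 86, month = factor(2, levels = 1:12))
auto <- unname(predict(fit, newdata = new_obs))

all.equal(manual, auto)   # the two routes agree
exp(manual)               # back-transform from the log scale
```

Dropping the seasonal term would give the baseline (January-level) value at period 86 rather than a February forecast.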
b) Create an ACF plot up to lag 15 for the forecast errors. Now fit an AR model with lag 2 to the forecast errors.
residuals <- Acf(souvenir_linear_season$residuals, lag.max=15)
residuals
##
## Autocorrelations of series 'souvenir_linear_season$residuals', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.459 0.485 0.194 0.088 0.154 0.016 0.030 0.106 0.034
## 10 11 12 13 14 15
## 0.152 -0.055 -0.012 -0.047 -0.077 -0.023
ar_model = Arima(souvenir_linear_season$residuals, order = c(2, 0, 0))
ar_model
## Series: souvenir_linear_season$residuals
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.3072 0.3687 -0.0025
## s.e. 0.1090 0.1102 0.0489
##
## sigma^2 estimated as 0.0205: log likelihood=39.03
## AIC=-70.05 AICc=-69.46 BIC=-60.95
ar_model$coef["ar2"] / sqrt(diag(vcov(ar_model)))["ar2"]
## ar2
## 3.346371
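# Two-sided p-value for ar2, using the ar2 standard error (not the ar1 one)
2*pnorm(-abs(ar_model$coef["ar2"] / sqrt(diag(vcov(ar_model)))["ar2"]))
## ar2
## roughly 0.0008 (hand-computed as 2*pnorm(-3.346371))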
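The same ratio can be computed for every coefficient at once, which keeps each estimate paired with its own standard error. This is a minimal sketch on a simulated AR(2) series (hypothetical data; the seed, sample size, and true coefficients are assumptions), using base `arima()` rather than `Arima()` from the forecast package.

```r
# Simulated stationary AR(2) series (hypothetical data)
set.seed(42)
x <- arima.sim(model = list(ar = c(0.3, 0.4)), n = 500)

fit <- arima(x, order = c(2, 0, 0))

# Standard errors are the square roots of the diagonal of the covariance matrix
se <- sqrt(diag(vcov(fit)))

# z-statistic and two-sided p-value for each coefficient, matched by position
z <- coef(fit) / se
p <- 2 * pnorm(-abs(z))

round(cbind(estimate = coef(fit), se = se, z = z, p = p), 4)
```

Because `coef(fit)` and `se` share the same names and order, there is no need to index individual coefficients by hand.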
I. Examining the ACF plot and the estimated coefficients of the AR(2) model, what can we learn about the regression forecasts?
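The ACF of the residuals shows substantial positive autocorrelation at lags 1 and 2 (0.459 and 0.485), and both AR(2) coefficients are large relative to their standard errors (for ar2, the t-statistic is about 3.35 and the p-value is below 0.001). The regression model therefore leaves short-term autocorrelation in its forecast errors, so its forecasts can be improved by adding an AR(2) forecast of those errors.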
II. Use the autocorrelation information to compute a forecast for January 2002, using the regression model and the AR(2) model above.
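Regression forecasts on the log scale (the output covers the validation period, January to December 2001; a January 2002 forecast from this fit would require h = valid_length + 1):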
jan_forecast = forecast(souvenir_linear_season, h= valid_length)
jan_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 9.188097 8.917220 9.458974 8.769890 9.606304
## Feb 2001 9.491232 9.220354 9.762109 9.073024 9.909439
## Mar 2001 9.925335 9.654457 10.196212 9.507127 10.343542
## Apr 2001 9.625329 9.354452 9.896207 9.207122 10.043537
## May 2001 9.694286 9.423408 9.965163 9.276078 10.112493
## Jun 2001 9.740741 9.469864 10.011619 9.322534 10.158949
## Jul 2001 9.898195 9.627318 10.169072 9.479988 10.316402
## Aug 2001 9.882831 9.611954 10.153708 9.464624 10.301038
## Sep 2001 9.992619 9.721742 10.263496 9.574412 10.410826
## Oct 2001 10.107664 9.836787 10.378542 9.689457 10.525872
## Nov 2001 10.600248 10.329370 10.871125 10.182040 11.018455
## Dec 2001 11.372615 11.101738 11.643493 10.954408 11.790823
AR(2) forecasts of the regression errors:
error = forecast(ar_model, h= valid_length)
error
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001 0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001 0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001 0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001 0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001 0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001 0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001 0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001 0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001 0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001 0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001 0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641
Combining the two forecasts and converting back to the original scale:
combined_forecast = jan_forecast$mean + error$mean
exp(combined_forecast)
## Jan Feb Mar Apr May Jun Jul
## 2001 10894.13 14614.70 21907.40 16028.61 16923.47 17567.92 20396.07
## Aug Sep Oct Nov Dec
## 2001 19967.68 22177.61 24791.11 40454.26 87383.49
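The combination is additive on the log scale and therefore multiplicative on the original scale. As a small worked check, the sketch below reproduces the first (January) entry from the two printed forecast tables; the variable names are illustrative only.

```r
# First-row point forecasts copied from the two outputs above (log scale)
reg_jan <- 9.188097       # regression forecast of log(Sales)
err_jan <- 0.107882119    # AR(2) forecast of the regression error

# Add on the log scale, then exponentiate to return to sales units
combined_log <- reg_jan + err_jan
exp(combined_log)         # agrees with the January value 10894.13 above

# Equivalent view: the AR(2) term scales the regression-only forecast
exp(reg_jan) * exp(err_jan)
```

A positive error forecast raises the regression forecast, and the adjustment shrinks toward zero for later months as the AR(2) forecasts decay.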
a) If we compute the autocorrelation of the series, which lag (>0) is most likely to have the largest coefficient (in absolute value)?
A lag of 4, as the data has quarterly seasonality.
b) Create an ACF plot and compare it with your answer.
Lag 4 clearly has the largest coefficient.
appliance = read.csv("Appliance Shipment.csv", stringsAsFactors = FALSE)
appliance.ts = ts(appliance$Shipments, start = c(1985,1), end= c(1989,4), frequency = 4)
appliance_acf = Acf(appliance.ts)
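To illustrate why quarterly seasonality shows up at lag 4, here is a minimal sketch on a made-up quarterly series with a fourth-quarter peak (hypothetical values, not the appliance shipments data). It uses base `acf()` and picks out the lag with the largest absolute autocorrelation.

```r
# Made-up quarterly series: the same pattern every year, with a peak in Q4
x <- ts(rep(c(10, 10, 10, 30), times = 5), start = c(1985, 1), frequency = 4)

# Autocorrelations at lags 1, 2, ... (drop lag 0, which is always 1)
r <- as.numeric(acf(x, plot = FALSE)$acf)[-1]

# Lag (in quarters) with the largest absolute autocorrelation
largest_lag <- which.max(abs(r))
largest_lag
round(r[1:8], 2)
```

With real data the peak is less clean, but a seasonal pattern that repeats every four quarters tends to produce the same spike at lag 4.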