Question 2

a) Create a time plot of the differenced series

library(forecast)
library(ggplot2)

Walmart= read.csv("Walmart.csv", stringsAsFactors = FALSE)
Walmart.ts <- ts(Walmart$Close, start=c(2001, 1), frequency=225)

yrange= range(Walmart.ts)

plot(Walmart.ts, ylab="Stock Prices($)", xlab="Year", bty="l", main="WalMart Stock Prices ($)")

#Plot differenced series
plot(diff(Walmart.ts, lag=1), ylab="Lag-1", xlab="Year", bty="l", main="Lag-1 Difference")

b) Which of the following is/are relevant for testing whether this stock is a random walk?

The autocorrelations of the closing price series
The autocorrelations of the differenced series
The AR(1) constant coefficient for the differenced series
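
To see why the differenced series is the informative one, here is a minimal sketch on simulated data (not the Walmart series; the seed and series length are arbitrary assumptions). For a random walk, the autocorrelations of the levels typically stay close to 1, while the autocorrelations of the lag-1 differences stay close to 0.

```r
set.seed(1)                     # arbitrary seed, for reproducibility only
n     <- 1000                   # arbitrary series length
steps <- rnorm(n)               # random period-to-period changes
walk  <- cumsum(steps)          # simulated random walk (not the Walmart data)

# Autocorrelations at lags 1..5; [-1] drops lag 0, which is always 1
acf_levels <- acf(walk, lag.max = 5, plot = FALSE)$acf[-1]
acf_diffs  <- acf(diff(walk), lag.max = 5, plot = FALSE)$acf[-1]

round(acf_levels, 2)            # typically close to 1 at every lag
round(acf_diffs, 2)             # typically close to 0 at every lag
```

High autocorrelation in the levels therefore does not separate a random walk from other slowly changing series, whereas near-zero autocorrelation in the differences is what a random walk implies.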

c) Recreate the AR(1) model output for the Close price series shown on the left of Table 7.4. Does the AR model indicate that this is a random walk? Explain how you reached your conclusion.

fit = Arima(Walmart.ts, order= c(1,0,0))
fit
## Series: Walmart.ts 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     mean
##       0.9558  52.9497
## s.e.  0.0187   1.3280
## 
## sigma^2 estimated as 0.9815:  log likelihood=-349.8
## AIC=705.59   AICc=705.69   BIC=716.13
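
One way to read this output is to check how far the AR(1) slope is from 1, since a random walk corresponds to a slope of exactly 1. The sketch below hard-codes the estimate and standard error printed above. The normal approximation for the p-value is an assumption made here for illustration.

```r
ar1    <- 0.9558   # AR(1) slope copied from the output above
se_ar1 <- 0.0187   # its standard error, copied from the output above

# Number of standard errors by which the slope differs from 1
t_stat <- (ar1 - 1) / se_ar1
t_stat

# Two-sided p-value under a normal approximation (an assumption)
p_value <- 2 * pnorm(-abs(t_stat))
p_value
```

The slope is about 2.4 standard errors below 1, which gives a p-value a little under 0.02 under the normal approximation. That is borderline: the hypothesis of a slope of 1 is rejected at the 5% level but not at the 1% level. Strictly, when the null hypothesis is a slope of 1, this statistic does not follow a normal distribution, so a formal unit-root test such as Dickey-Fuller would be the more careful check.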

d) What are the implications of finding that a time series is a random walk? Choose the correct statement(s) below:

The series is random
The changes in the series from one period to the other are random

Question 3

souvenir = read.csv("SouvenirSales.csv", stringsAsFactors = FALSE)

a) Run a regression model with log(Sales) as the output variable with a linear trend and monthly predictors. Remember to fit only the training period. Use this model to forecast the sales in February 2002.

souvenir.ts <- ts(souvenir$Sales, start=c(1995, 1), frequency=12)

valid_length <- 12
train_length <- length(souvenir.ts) - valid_length

souvenir_train <- window(souvenir.ts, end=c(1995, train_length))
souvenir_valid <- window(souvenir.ts, start=c(1995,train_length+1))

# Fit the model and see what it looks like
souvenir_linear_season <- tslm(log(souvenir_train) ~ trend + season)
summary(souvenir_linear_season)
## 
## Call:
## tslm(formula = log(souvenir_train) ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:  0.9424, Adjusted R-squared:  0.9306 
## F-statistic:  80.4 on 12 and 59 DF,  p-value: < 2.2e-16
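# February 2002 is period 86 (January 1995 = 1) and February is season2,
# so the forecast needs the season2 coefficient as well as the intercept and trend
coefs <- souvenir_linear_season$coefficients
feb_forecast <- coefs["(Intercept)"] + coefs["trend"]*86 + coefs["season2"]
exp(feb_forecast)  # back-transform from log(Sales) to Sales
# approximately 17,063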

b) Create an ACF plot up to lag 15 for the forecast errors. Now fit an AR model with lag 2 to the forecast errors.

# Store the ACF under its own name so it does not mask the base residuals() function
residuals_acf <- Acf(souvenir_linear_season$residuals, lag.max=15)

residuals_acf
## 
## Autocorrelations of series 'souvenir_linear_season$residuals', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.459  0.485  0.194  0.088  0.154  0.016  0.030  0.106  0.034 
##     10     11     12     13     14     15 
##  0.152 -0.055 -0.012 -0.047 -0.077 -0.023
ar_model = Arima(souvenir_linear_season$residuals, order = c(2, 0, 0))
ar_model
## Series: souvenir_linear_season$residuals 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     mean
##       0.3072  0.3687  -0.0025
## s.e.  0.1090  0.1102   0.0489
## 
## sigma^2 estimated as 0.0205:  log likelihood=39.03
## AIC=-70.05   AICc=-69.46   BIC=-60.95
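# z-statistic for the ar2 coefficient: estimate divided by its standard error
ar_model$coef["ar2"] / sqrt(diag(vcov(ar_model)))["ar2"]
##      ar2 
## 3.346371
# Two-sided p-value for ar2, using the ar2 standard error
2*pnorm(-abs(ar_model$coef["ar2"] / sqrt(diag(vcov(ar_model)))["ar2"]))
# approximately 0.0008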
I. Examining the ACF plot and the estimated coefficients of the AR(2) model, what can we learn about the regression forecasts?

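The ACF plot shows positive autocorrelation in the forecast errors at lags 1 and 2 (0.459 and 0.485), and both AR(2) coefficients are large relative to their standard errors (for ar2, z ≈ 3.35 with a p-value below 0.001). The regression model therefore leaves autocorrelation in its errors, and its forecasts can be improved by adding an AR(2) forecast of the error to each regression forecast.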

II. Use the autocorrelation information to compute a forecast for January 2002, using the regression model and the AR(2) model above. 

Regression forecasts (on the log scale) for the 12 months after the training period:

jan_forecast = forecast(souvenir_linear_season, h= valid_length)
jan_forecast
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9.188097  8.917220  9.458974  8.769890  9.606304
## Feb 2001       9.491232  9.220354  9.762109  9.073024  9.909439
## Mar 2001       9.925335  9.654457 10.196212  9.507127 10.343542
## Apr 2001       9.625329  9.354452  9.896207  9.207122 10.043537
## May 2001       9.694286  9.423408  9.965163  9.276078 10.112493
## Jun 2001       9.740741  9.469864 10.011619  9.322534 10.158949
## Jul 2001       9.898195  9.627318 10.169072  9.479988 10.316402
## Aug 2001       9.882831  9.611954 10.153708  9.464624 10.301038
## Sep 2001       9.992619  9.721742 10.263496  9.574412 10.410826
## Oct 2001      10.107664  9.836787 10.378542  9.689457 10.525872
## Nov 2001      10.600248 10.329370 10.871125 10.182040 11.018455
## Dec 2001      11.372615 11.101738 11.643493 10.954408 11.790823

AR(2) forecasts of the regression errors for the same 12 months:

error = forecast(ar_model, h= valid_length)
error
##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Jan 2001    0.107882119 -0.07561892 0.2913832 -0.1727585 0.3885227
## Feb 2001    0.098551352 -0.09341395 0.2905167 -0.1950342 0.3921370
## Mar 2001    0.069245043 -0.14069093 0.2791810 -0.2518243 0.3903144
## Apr 2001    0.056801003 -0.15830920 0.2719112 -0.2721817 0.3857837
## May 2001    0.042171322 -0.17774923 0.2620919 -0.2941681 0.3785108
## Jun 2001    0.033088136 -0.18905521 0.2552315 -0.3066508 0.3728271
## Jul 2001    0.024902959 -0.19881529 0.2486212 -0.3172446 0.3670505
## Aug 2001    0.019038932 -0.20554500 0.2436229 -0.3244325 0.3625104
## Sep 2001    0.014219137 -0.21092154 0.2393598 -0.3301038 0.3585421
## Oct 2001    0.010576068 -0.21489090 0.2360430 -0.3342459 0.3553980
## Nov 2001    0.007679567 -0.21798999 0.2333491 -0.3374522 0.3528114
## Dec 2001    0.005446339 -0.22034478 0.2312375 -0.3398714 0.3507641

Combining the two forecasts and converting back from the log scale:

combined_forecast = jan_forecast$mean + error$mean

exp(combined_forecast)
##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 2001 10894.13 14614.70 21907.40 16028.61 16923.47 17567.92 20396.07
##           Aug      Sep      Oct      Nov      Dec
## 2001 19967.68 22177.61 24791.11 40454.26 87383.49
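
The tables above stop at December 2001, because the training period ends in December 2000 and h is 12. If the target is January 2002, that is one step further (13 months ahead). The sketch below extends both forecasts by one month using only the rounded numbers printed above, so the result is approximate. It assumes the AR(2) model is parameterised around its mean, which is how the printed "mean" term is normally read. The variable names are illustrative.

```r
# Values copied from the printed output above (rounded, so results are approximate)
b0      <- 7.646363    # regression intercept (January is the baseline season)
b_trend <- 0.021120    # regression trend coefficient
phi1    <- 0.3072      # AR(2) coefficient ar1
phi2    <- 0.3687      # AR(2) coefficient ar2
mu      <- -0.0025     # AR(2) mean

# Last two AR(2) error forecasts printed above
e_nov01 <- 0.007679567
e_dec01 <- 0.005446339

# One more step of the AR(2) recursion gives the January 2002 error forecast
e_jan02 <- mu + phi1 * (e_dec01 - mu) + phi2 * (e_nov01 - mu)

# January 2002 is period 85 when January 1995 is period 1
t_jan02 <- (2002 - 1995) * 12 + 1

# Regression forecast plus error forecast, both on the log scale
log_forecast <- b0 + b_trend * t_jan02 + e_jan02
exp(log_forecast)      # back-transform to sales
```

Because the error forecasts shrink toward zero as the horizon grows, the AR(2) adjustment 13 months ahead is very small and the combined forecast is close to the plain regression forecast.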

Question 4

a) If we compute the autocorrelation of the series, which lag (>0) is most likely to have the largest coefficient (in absolute value)?

A lag of 4, as the data has quarterly seasonality.
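
The reasoning can be checked on a small simulated quarterly series. This is only a sketch: the seasonal pattern, noise level and seed are made-up assumptions, not the appliance shipment data.

```r
set.seed(42)                            # arbitrary seed
quarter_effect <- c(0, 0, 0, 10)        # hypothetical pattern: a peak every fourth quarter

# 20 quarters (five years) of a repeating seasonal pattern plus noise
x <- ts(100 + rep(quarter_effect, times = 5) + rnorm(20),
        start = c(1985, 1), frequency = 4)

# Autocorrelations at lags 1..8 (in quarters); [-1] drops lag 0
r <- acf(x, lag.max = 8, plot = FALSE)$acf[-1]
round(r, 2)
which.max(abs(r))                       # lag with the largest absolute autocorrelation
```

With a pattern that repeats every four quarters, each observation lines up with the same quarter one year earlier, so the lag-4 autocorrelation dominates.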

b) Create an ACF plot and compare it with your answer

Lag 4 clearly has the largest coefficient.

appliance = read.csv("Appliance Shipment.csv", stringsAsFactors = FALSE)

appliance.ts = ts(appliance$Shipments, start = c(1985,1), end= c(1989,4), frequency = 4)

appliance_acf = Acf(appliance.ts)