Question 2

library(readr)

WalMart_Data <- read_csv('WalMartStock.csv')

WalMart_TS <- ts(WalMart_Data$Close)

library(forecast)
library(ggplot2)

autoplot(WalMart_TS, 
         main = 'Wal-Mart Stock Closing Price',
         xlab = 'Days Since February 1, 2001',
         ylab = 'Closing Price')

A)

autoplot(diff(WalMart_TS, lag = 1), 
         main = 'Wal-Mart Closing Price: Lag-1 Differenced Series',
         xlab = 'Days Since February 1, 2001',
         ylab = 'Lag-1 Difference in Closing Prices')


B)

The following are relevant for testing whether the Wal-Mart closing-price series is a random walk (a quick code check of the second point follows the list):

  • The AR(1) slope coefficient for the closing price series
    • Should be 1 for a random walk
  • The autocorrelations of the differenced series
    • Should be 0 for a random walk
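As a minimal sketch of the second check (using the WalMart_TS object defined above; the slope check is carried out in part C), the ACF of the lag-1 differenced series should show no significant autocorrelations if the series is a random walk:

# ACF of the lag-1 differenced series; for a random walk, all bars
# should fall within the significance bounds
Acf(diff(WalMart_TS, lag = 1), lag.max = 15,
    main = 'ACF of Lag-1 Differenced Closing Prices')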


C)

WalMart_Arima1 <- arima(WalMart_TS,
                        order = c(1,0,0))

summary(WalMart_Arima1)
## 
## Call:
## arima(x = WalMart_TS, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.9558    52.9497
## s.e.  0.0187     1.3280
## 
## sigma^2 estimated as 0.9735:  log likelihood = -349.8,  aic = 705.59
## 
## Training set error measures:
##                        ME      RMSE       MAE         MPE     MAPE
## Training set -0.005900455 0.9866824 0.7687247 -0.04870259 1.483133
##                   MASE        ACF1
## Training set 0.9799494 -0.02979752

To test whether the series is a random walk, I will test whether the slope coefficient differs significantly (at a level of 0.05) from 1. To do this I will use a two-sided t-test:

# Calculate degrees of freedom for the t-test (n - 2, since the
# arima model has two parameters: intercept and ar1)

degrees_f <- length(WalMart_TS) - 2

# t-statistic: |1 - estimated ar1| / s.e.(ar1), with the ar1 standard
# error (0.0187) taken from the summary output above

t_score <- abs(1 - as.numeric(WalMart_Arima1$coef[1])) / 0.0187

p_value <- 2*pt(t_score, degrees_f, lower.tail = FALSE)

paste("The p-value of the two-tailed test is:", p_value)

## [1] "The p-value of the two-tailed test is: 0.0189657850157724"

The p-value of ~0.019 indicates that, at a significance level of 0.05, we should reject the null hypothesis that the slope of the AR(1) model is equal to 1, and conclude that it is significantly different from 1. This suggests that the series is not a random walk.
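Equivalently (a quick sketch using the objects defined above), one can check whether 1 falls inside the approximate 95% confidence interval for the ar1 coefficient; since the p-value above is below 0.05, 1 falls outside it:

# Approximate 95% confidence interval for the ar1 coefficient
est <- as.numeric(WalMart_Arima1$coef[1])
se <- sqrt(WalMart_Arima1$var.coef[1, 1])
est + c(-1, 1) * qt(0.975, degrees_f) * se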


D)

A series being a random walk implies that the changes from one period to the next are random and therefore unpredictable: past values carry no usable information, so the best point forecast for any future period is simply the most recent observed value (a naive forecast).
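As a minimal illustration, had the test in part C not rejected the random-walk hypothesis, the appropriate point forecast would have been the naive forecast:

# Naive forecast: every future period is forecast as the last
# observed closing price
naive(WalMart_TS, h = 5)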



Question 3

SouvSales <- read_csv('SouvenirSales.csv')
SouvSales_TS <- ts(SouvSales$Sales, start = c(1995, 1), frequency = 12)

autoplot(SouvSales_TS,
         main = 'Monthly Souvenir Sales',
         ylab = 'Monthly Sales',
         xlab = 'Time')


A)

nValid <- 12
nTrain <- length(SouvSales_TS) - nValid

trainSS_TS <- window(SouvSales_TS, end = c(1995, nTrain))
validSS_TS <- window(SouvSales_TS, start = c(1995, nTrain + 1))

LogSales_RM <- tslm(trainSS_TS ~ trend + season, lambda = 0)

LS_RM_pred <- forecast(LogSales_RM, h = nValid)

paste('The forecasted souvenir sales for Feb 2002 is:', LS_RM_pred$mean[2])
## [1] "The forecasted souvenir sales for Feb 2002 is: 17724.4502482407"


B)

Errors <- trainSS_TS - LogSales_RM$fitted.values

Acf(Errors, lag.max = 15, main = 'Autocorrelation Function of Forecast Errors')

Res_arima2 <- arima(Errors, order = c(2,0,0))


i)

summary(Res_arima2)

# Test for significance of coefficients, alpha level 0.05

# Calculate degrees of freedom for the t-test (n - 3, since the
# arima model has three parameters: ar1, ar2, and intercept)

degrees_f <- length(trainSS_TS) - 3

# t-statistics use the coefficient standard errors (0.1122 for ar1,
# 0.1129 for ar2) from the summary output below

t_score <- abs(as.numeric(Res_arima2$coef[1])) / 0.1122

p_value <- 2*pt(t_score, degrees_f, lower.tail = FALSE)

paste("The p-value of the lag-1 autocorrelation coefficient is: ", p_value)

t_score <- abs(as.numeric(Res_arima2$coef[2])) / 0.1129

p_value <- 2*pt(t_score, degrees_f, lower.tail = FALSE)

paste("The p-value of the lag-2 autocorrelation coefficient is: ", p_value)
## 
## Call:
## arima(x = Errors, order = c(2, 0, 0))
## 
## Coefficients:
##          ar1     ar2  intercept
##       0.4135  0.2662   587.4277
## s.e.  0.1122  0.1129   854.0388
## 
## sigma^2 estimated as 6580219:  log likelihood = -778.84,  aic = 1565.68
## 
## Training set error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 15.03414 2565.194 1705.687 125.8561 281.5432 0.8876781
##                     ACF1
## Training set -0.04558883
## [1] "The p-value of the lag-1 autocorrelation coefficient is:  0.000411688281892685"
## [1] "The p-value of the lag-2 autocorrelation coefficient is:  0.0207903727678758"

The ACF plot of the regression residuals shows some correlation between close periods, and both AR coefficients from the AR(2) model of the residuals are significantly non-zero at the 5% level. However, the ACF plot of the regression residuals also shows some lag-12 autocorrelation, indicating that the seasonality was not fully captured. Despite this, the ACF plot of the residuals of the AR(2) model (below) shows no significant correlation, not even at lag 12, indicating that we have extracted as much usable information from the residuals as possible.

Acf(Res_arima2$residuals, lag.max = 15,
    main = 'ACF of AR(2) Model Residuals')


ii)

Res_pred <- forecast(Res_arima2, h = nValid)

Jan <- LS_RM_pred$mean[1] + Res_pred$mean[1]

paste('The forecasted souvenir sales for January 2002 is:', Jan)

## [1] "The forecasted souvenir sales for January 2002 is: 19298.426279956"
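The same combination extends across the whole validation horizon (a quick sketch; its output is not shown here):

# Two-level forecast: regression forecast plus the AR(2) forecast
# of its residuals, for all 12 validation months
LS_RM_pred$mean + Res_pred$mean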



Question 4

AppShip <- read_csv('ApplianceShipments.csv')
AppShip_TS <- ts(AppShip$Shipments, start = c(1985, 1), frequency = 4)

autoplot(AppShip_TS,
         main = 'Quarterly Appliance Shipments',
         ylab = 'Shipments',
         xlab = 'Time')


A)

I think the lag-4 autocorrelation would be the highest because the series appears to have strong quarterly seasonality.


B)

Acf(AppShip_TS)

I expected the autocorrelations to be larger: the ACF plot shows that none of them are statistically significant, although, as expected, the lag-4 autocorrelation is the largest.