library(fpp2)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────── fpp2 2.4 ──
## ✓ ggplot2 3.3.2 ✓ fma 2.4
## ✓ forecast 8.13 ✓ expsmooth 2.3
##
library(gridExtra)
library(ggplot2)
Answer: All the spikes are within the bands indicating that there is no auto-correlation at any lags.
Answer: The higher the sample size the lower the critical values. The auto-correlations are different as they refer to different simulations of white noise. Sometimes by chance there will be more auto-correlation at different lags, not all white noises draws are the same. What matters though is that there are no significant auto-correlations.
ggtsdisplay(ibmclose)
Answer: The price plot shows some upward and downward trend hence it is not stationary.
The ACF shows significant auto-correlations at all lags , hence data comes from a non stationary process. The PACF of almost 1 on lag 1 implies a presence of auto-regressive term AR(1). Presence of AR terms means the process is not stationary.
ggtsdisplay(diff(ibmclose))
Answer: The differenced series is stationary as the autocorrelations (for most part) fall within the critical regions.
Box.test(usnetelec, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usnetelec
## X-squared = 52.549, df = 1, p-value = 4.196e-13
lambda <- BoxCox.lambda(usnetelec)
usnetelec_withBoxCox <- BoxCox(usnetelec,lambda)
ggtsdisplay(usnetelec_withBoxCox)
print(c("lamba", lambda))
## [1] "lamba" "0.516771443964645"
ndiffs(usnetelec_withBoxCox)
## [1] 2
usnetelec_withBoxCox %>% diff() %>% diff() -> usnetelec_withBoxCox_2diff
ggtsdisplay(usnetelec_withBoxCox_2diff)
Box.test(usnetelec_withBoxCox_2diff, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usnetelec_withBoxCox_2diff
## X-squared = 8.3324, df = 1, p-value = 0.003895
Answer: Despite ndiffs suggesting differencing the BoxCox transformed data twice to render it stationary, the twice difference data is not stationary (marginally failing stationarty).
Box.test(usgdp, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usgdp
## X-squared = 233.07, df = 1, p-value < 2.2e-16
lambda <- BoxCox.lambda(usgdp)
usgdp_withBoxCox <- BoxCox(usgdp,lambda)
ggtsdisplay(usgdp_withBoxCox)
print(c("lamba", lambda))
## [1] "lamba" "0.366352049520934"
ndiffs(usgdp_withBoxCox)
## [1] 1
nsdiffs(usgdp_withBoxCox)
## [1] 0
usgdp_withBoxCox %>% diff() -> usgdp_withBoxCox_1diff
ggtsdisplay(usgdp_withBoxCox_1diff)
Box.test(usgdp_withBoxCox_1diff, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usgdp_withBoxCox_1diff
## X-squared = 23.874, df = 1, p-value = 1.029e-06
library(urca)
usgdp_withBoxCox_1diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.2013
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Answer: The ndiff function tells us that we need to difference data once , which we have done. Despite this the Ljung-Box statistic p-value is still very low (H_0 : data comes from a time series which is auto-correlated) which implies that the auto-correlations remain even after differencing. However the KPSS t-statistic is lower than the critical value at 1% indicating the once differenced time series is indeed stationary. Interesting that presence of auto-correlation does not us from having a stationary process.
Box.test(mcopper, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: mcopper
## X-squared = 539.54, df = 1, p-value < 2.2e-16
lambda <- BoxCox.lambda(mcopper)
mcopper_withBoxCox <- BoxCox(mcopper,lambda)
ggtsdisplay(mcopper_withBoxCox)
print(c("lamba", lambda))
## [1] "lamba" "0.191904709003829"
ndiffs(mcopper_withBoxCox)
## [1] 1
nsdiffs(mcopper_withBoxCox)
## [1] 0
mcopper_withBoxCox %>% diff() -> mcopper_withBoxCox_1diff
ggtsdisplay(mcopper_withBoxCox_1diff)
Box.test(mcopper_withBoxCox_1diff, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: mcopper_withBoxCox_1diff
## X-squared = 57.517, df = 1, p-value = 3.353e-14
library(urca)
mcopper_withBoxCox_1diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0573
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Answer: The ndiff function tells us that we need to difference data once , which we have done. Despite this the Ljung-Box statistic p-value is still very low (H_0 : data comes from a time series which is auto-correlated) which implies that the auto-correlations remain even after differencing. However the KPSS t-statistic is lower than the critical value at 1% indicating the once differenced time series is indeed stationary.
Box.test(enplanements, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: enplanements
## X-squared = 249.22, df = 1, p-value < 2.2e-16
lambda <- BoxCox.lambda(enplanements)
enplanements_withBoxCox <- BoxCox(enplanements,lambda)
ggtsdisplay(enplanements_withBoxCox)
print(c("lamba", lambda))
## [1] "lamba" "-0.226946111237065"
ndiffs(enplanements_withBoxCox)
## [1] 1
nsdiffs(enplanements_withBoxCox)
## [1] 1
enplanements_withBoxCox_seasondiff <- diff(enplanements_withBoxCox,lag=3)
enplanements_withBoxCox_seasondiff %>% diff() -> enplanements_withBoxCox_seasondiff_1diff
ggtsdisplay(enplanements_withBoxCox_seasondiff_1diff)
Box.test(enplanements_withBoxCox_seasondiff_1diff, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: enplanements_withBoxCox_seasondiff_1diff
## X-squared = 31.047, df = 1, p-value = 2.519e-08
library(urca)
enplanements_withBoxCox_seasondiff_1diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0115
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Answer: the ndiffs and nsdiffs function tells us we need a seasonal and non-seasonal differencing. We first seasonally difference and then non seasonally difference. The series post these two transformations is stationary (KPSS test statistic is lower than the critical value , hence we cannot reject the null that the data is stationary. H_0 for KPSS is that data are stationary)
Box.test(visitors, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: visitors
## X-squared = 195.64, df = 1, p-value < 2.2e-16
lambda <- BoxCox.lambda(visitors)
visitors_withBoxCox <- BoxCox(visitors,lambda)
ggtsdisplay(visitors_withBoxCox)
print(c("lamba", lambda))
## [1] "lamba" "0.277524910835111"
ndiffs(visitors_withBoxCox)
## [1] 1
nsdiffs(visitors_withBoxCox)
## [1] 1
visitors_withBoxCox_seasondiff <- diff(visitors_withBoxCox,lag=3)
visitors_withBoxCox_seasondiff %>% diff(lag=1) -> visitors_withBoxCox_seasondiff_1diff
ggtsdisplay(visitors_withBoxCox_seasondiff_1diff)
Box.test(visitors_withBoxCox_seasondiff_1diff, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: visitors_withBoxCox_seasondiff_1diff
## X-squared = 25.08, df = 1, p-value = 5.499e-07
library(urca)
visitors_withBoxCox_seasondiff_1diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0092
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Answer: the ndiffs and nsdiffs function tells us we need a seasonal and non-seasonal differencing. We first seasonally difference (at lag=3) and then non seasonally difference. The series post these two transformations is stationary (KPSS test statistic is lower than the critical value , hence we cannot reject the null that the data is stationary. H_0 for KPSS is that data are stationary)
setwd("/Users/stepniak/624")
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349335T"],
frequency=12, start=c(1982,4))
series <- myts
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
ggtsdisplay(y)
Answer: The higher the value of \(\phi_1\) the more the process resembles white noise, the higher the value of \(\phi_1\) the more the process resembles a random walk.
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
ggtsdisplay(y)
Answer: As the value of $_1 = 0 $ goes down the time series resembles white noise. Increasing the value of $_1 = 1 $ yields a process with auto-correlation at lag = 1. For this case the value of the process at any time is just the error in this time period plus an error in the previous time period hence high auto-correlation observed (at lag = 1).
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1]+0.6*e[i-1] + e[i]
ggtsdisplay(y)
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
y[i] <- -0.8*y[i-1]+0.3*y[i-2]+ e[i]
ggtsdisplay(y)
y %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0728
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ndiffs(y)
## [1] 0
Answer: The ARMA(1,1) process looks stationary and hence does not require differencing to make it into one. The AR(2) process, on the other hand slowly but firmly becomes unstable and is unbound. It is interning to note that despite the process being non-stationary (as per exercise note) when ur.kpss() is run the we cannot reject the null that the data is stationary with H_0 for KPSS is that data are stationary. Hence we conclude that the data is stationary. This inconsistency is possibly due to the fact that the mu in the KPSS test is at lag = 4, possibly the test does not extend to the part of time series where the non-stationarity kicks in.
ggtsdisplay(wmurders)
ndiffs(wmurders)
## [1] 2
wmurders %>% diff() %>% diff() -> wmurders_2diff
ggtsdisplay(wmurders_2diff)
Answer: The twice differenced series (as per ndiff() function) indicates some auto-correlation at lag 1 & 2, with PACF cut-off at lag 1. Therefore a possible candidate for ARIMA would be (1,2,1) or (2,2,1).
While the data appears to have a cycle, including a constant would impose a drift. The drift is not visible in the undifferenced data , hence it is not needed.
Answer: \((1-\phi_1 B) (1-B)^2 y_t=(1+ \theta_1 B) \epsilon_t\)
auto.arima(wmurders, approximation = FALSE)
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
wmurdsers_fit <- auto.arima(wmurders, approximation = FALSE)
plot(wmurdsers_fit)
checkresiduals(wmurdsers_fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
##
## Model df: 2. Total lags used: 10
Answer: Residuals are stationary (as per per the plot of fit function, unit roots of the characteristic equation fall inside the unit circle) and there is no auto-correlation present as per the ACF plot and Ljung-Box test (where we fail to reject the null that the data is stationary). Hence given no serial correlation and stationarity of the errors we are comfortable that the model selected in adequate.
Answer:
wmurders_forecast <- forecast(wmurdsers_fit, h = 3)
wmurders_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.470660 2.194836 2.746484 2.048824 2.892496
## 2006 2.363106 1.986351 2.739862 1.786908 2.939304
## 2007 2.252833 1.765391 2.740276 1.507354 2.998313
autoplot(wmurders_forecast)
Answer:Both auto-arima and manual selection gave the same result.