Yes, they all indicate the data are white noise. Because the auto correlation are inside the threshold levels.
I think it is because of the number of samples. As the number of sample increases, the auto correlation gets better and correct. Because each lag would get more samples/observation and gets the acf better.
For e.g lets experiment with the retail data we had before.
retaildata <- readxl::read_excel('C:\\Users\\charls.joseph\\Documents\\Cuny\\Data624\\week1\\retail.xlsx', skip=1)
myts <- ts(retaildata[,"A3349397X"],
frequency=12, start=c(1982,4))
autoplot(myts) +
xlab('Year') +
ylab("Turn Over")
acf(myts[1:10])
acf(myts[1:100])
acf(myts[1:200])
acf(myts)
As you see, when the samples increases, the ACF gets better and could determine the trend and seasonailty pattern in the data.
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ggtsdisplay(ibmclose)
ACF is significant in the beginning and it is gradually decreasing. This means it has a trend. Since it doesnt have scallowed shape, It doesnt have seasonaility. This indicate this is not stationary. Non-stationary timeseries shouldn’t have seasoned or trend component.
ACF shows the correlation between lagged data as well as it indicate whether the time series is stationary or not. If the ACF is with in the threshold level, then it is stationary. This need a differencing to make it stationary to apply AR or MA models.
In the other side, PACF shows the correlation between current value and the previous values. When you model the timeseries data as as linear relation ship, it can be written as
y(t) = b0 + b1 * y(t-1) + b2 * y(t-2)
It shows the correlation between y(t) and y(t-1) & y(t) and y(t-2) independently.
From PACF, i dont see a big correlation between y(t) and y(t-1), but there is a correlation. y(t) and y(t-2), there is no correlation. There are intermittent correlation as the lags goes on.
PACF is used to determine the order for AR model. ACF is used to determine the order of MA model.
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
ggtsdisplay(usnetelec)
This time series has a linear upward trend, hence it is not stationary. In order to make it stationary, we can apply log or box-cox transformation to reduce the variance and apply differencing as well
lambda <- BoxCox.lambda(usnetelec)
usnetelec_bc <- BoxCox(usnetelec, lambda)
ggtsdisplay(diff(usnetelec_bc) )
You can see the time box cox and differening make the time series stationary.
ggtsdisplay(usgdp)
lambda <- BoxCox.lambda(usgdp)
usgdp_bc <- BoxCox(usgdp, lambda)
ggtsdisplay(diff(usgdp_bc) )
ACF and Pacf shows that there is a still slight correlation for the first lag. We will do a null hyposis test using kpss.
library(urca)
usgdp_bc %>% diff() %>% ur.kpss %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.2013
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
test stat is > .05 and we will do one more differencing.
ggtsdisplay(diff(diff(usgdp_bc) ))
usgdp_bc %>% diff() %>% diff() %>% ur.kpss %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.012
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
We see that p-value is less than 0.05 and the data becomes stationary with 2 types differencing.
ggtsdisplay(mcopper)
lambda <- BoxCox.lambda(mcopper)
mcopper_bc <- BoxCox(mcopper, lambda)
ggtsdisplay(diff(mcopper_bc) )
Looking at the orginal plot, this data has no variance in the beginning and constant varience in the middle and a sudden change in the variance towards the end.
There is an upward trend and it is a stationary. After applying transformation and differencing, the data looks to be a stationary. But we will run the hypothesis test and confirm.
mcopper_bc %>% diff() %>% ur.kpss %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0573
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The p-value is 0.05 and we can consider this as a stationary now.
ggtsdisplay(enplanements)
lambda <- BoxCox.lambda(enplanements)
enplanements_bc <- BoxCox(enplanements, lambda)
ggtsdisplay(diff(enplanements_bc) )
enplanements_bc %>% diff() %>% ur.kpss %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0151
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The p-value shows that the data became stationary after the tranformation and differecing.
ggtsdisplay(visitors)
lambda <- BoxCox.lambda(visitors)
visitors_bc <- BoxCox(visitors, lambda)
ggtsdisplay(diff(visitors_bc) )
visitors_bc %>% diff() %>% ur.kpss %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0519
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The p-value shows the data became stationary after the transformation and differenceing.
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
The plot shows a changing variance and thus indicate a need of transformation. We will apply a tranformation and differencing
ggtsdisplay(myts)
lambda <- BoxCox.lambda(myts)
myts_bc <- BoxCox(myts, lambda)
ggtsdisplay(diff(myts_bc) )
myts_bc %>% diff() %>% ur.kpss %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0248
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The p-value shows the differencing made the data stationary.
Use R to simulate and plot some data from simple ARIMA models.
y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 2:100){
y[i] <- 0.6*y[i-1] + e[i]
}
fn1 <- function(ph1) {
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 2:100){
y[i] <- ph1*y[i-1] + e[i]
}
return(y)
}
autoplot(fn1(0.6)) +
geom_line(colour = "blue") +
autolayer(fn1(0.9)) +
autolayer(fn1(0.4))
When ϕ1 increases the variance increases.
ma_1 <- function(ph1) {
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 2:100){
y[i] <- ph1*e[i-1] + e[i]
}
return(y)
}
autoplot(ma_1(0.6)) +
geom_line(colour = "blue") +
autolayer(ma_1(0.9)) +
autolayer(ma_1(0.4))
arma_1 <- function(ph1, theta1) {
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 2:100){
y[i] <- ph1*y[i-1] + theta1*e[i-1] + e[i]
}
return(y)
}
data_arma_1 <- arma_1(0.6, 0.6)
autoplot(data_arma_1)
ar_2 <- function(ph1, ph2) {
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 3:100){
y[i] <- ph1*y[i-1] + ph2*y[i-2] + e[i]
}
return(y)
}
data_ar_2 <- ar_2(-0.8, 0.3)
autoplot(data_ar_2)
The AR(2) generated data are obviously non-stationary and it has seasonality as well. The ARMA(1,1) generated data does not have this apparent seasonality and it appears to be random and much more stationary than the AR(2) data.
Below is the ACF and PACF for ARMA(1,1) and AR(2)
ggtsdisplay(data_arma_1)
ggtsdisplay(data_ar_2)
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
By studying appropriate graphs of the series in R, find an appropriate ARIMA( p,d,q) model for these data.
ggtsdisplay(wmurders, main='women murdered each year in USA')
Looking at the plot, we see that there is trend, but there is no seasonality. Seeing a difference in the variance, so i will perform a box cox transformation and difference to make it stationary.
ggtsdisplay(wmurders)
lambda <- BoxCox.lambda(wmurders)
wmurders_bc <- BoxCox(wmurders, lambda)
ggtsdisplay(diff(wmurders_bc) )
wmurders_bc %>% diff() %>% ur.kpss %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.5466
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ggtsdisplay(wmurders)
lambda <- BoxCox.lambda(wmurders)
wmurders_bc <- BoxCox(wmurders, lambda)
ggtsdisplay(diff(wmurders_bc) )
wmurders_bc %>% diff() %>% diff() %>% ur.kpss %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.0532
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
wmurders_bc <- BoxCox(wmurders, lambda)
wmurders_final <- diff(diff(wmurders_bc))
ggtsdisplay(wmurders_final)
There is two significant spikes in the ACF and and one significant spike in the PACF plots. The appropriate ARIMA model can be ARIMA(1,2,2).
Since it is non seasonal and d = 2. So constand is not required.
(1−ϕ1B)(1−B)2yt=c+(1-θ1B -θ2B^2 )εt, where c will be 0 if no constant included.
(fit <- Arima(wmurders, order=c(1,2,2)))
## Series: wmurders
## ARIMA(1,2,2)
##
## Coefficients:
## ar1 ma1 ma2
## -0.7677 -0.2812 -0.4977
## s.e. 0.2350 0.2879 0.2762
##
## sigma^2 estimated as 0.04552: log likelihood=7.38
## AIC=-6.75 AICc=-5.92 BIC=1.13
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,2)
## Q* = 9.6215, df = 7, p-value = 0.2111
##
## Model df: 3. Total lags used: 10
The ACF that the residuals are within critical limits. And the residuals histogram shows that the residuals are normally distributed and model is good to go.
forecast(fit, h=3)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.534015 2.260584 2.807446 2.115838 2.952192
## 2006 2.404157 2.026793 2.781521 1.827029 2.981286
## 2007 2.331482 1.829669 2.833296 1.564025 3.098940
autoplot(forecast(fit, h=3))