What are time series and ARIMA?

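A time series is a sequence of observations recorded over time. ARIMA (AutoRegressive Integrated Moving Average) models capture its patterns through three components: autoregression (AR), which relates the current value to past values; integration (I), which means differencing the series d times; and moving average (MA), which relates the current value to past shocks (errors). If the data are not stationary, we apply differencing to stabilize the mean before modeling.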
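For reference, an ARIMA(p, d, q) model for a series y_t can be written with the backshift operator B (where B y_t = y_{t-1}). The plus sign on the MA terms follows the convention of R's arima() and arima.sim(), so the ma = -0.5 used below enters as a minus sign in the second line, which is the ARIMA(1,1,1) simulated in this section.

```latex
\[
\left(1 - \phi_1 B - \cdots - \phi_p B^p\right)(1 - B)^d\, y_t
  = \left(1 + \theta_1 B + \cdots + \theta_q B^q\right)\varepsilon_t,
\qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2)
\]
\[
\text{ARIMA}(1,1,1),\ \phi_1 = 0.7,\ \theta_1 = -0.5:\quad
(1 - 0.7B)(1 - B)\, y_t = (1 - 0.5B)\,\varepsilon_t
\]
```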

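set.seed(123)  # fixed seed for reproducibility

n  <- 200      # number of simulated innovations
ar <- 0.7      # AR(1) coefficient (phi)
ma <- -0.5     # MA(1) coefficient (theta)

# Simulate ARIMA(1,1,1); with d = 1, arima.sim() returns n + 1 values starting at 0
ts_arima <- arima.sim(
  model = list(order = c(1,1,1), ar = ar, ma = ma),
  n = n
)

ts.plot(ts_arima, main = "Simulated ARIMA(1,1,1)")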

We simulate a time series from an ARIMA(1,1,1) process with predefined AR (0.7) and MA (-0.5) coefficients, using a fixed seed for reproducibility. The plot shows a series that wanders with trend-like movements instead of returning to a fixed level, so the data are likely non-stationary. This suggests that the series must be differenced (d = 1) before the AR and MA parts can be estimated.
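To make the three parts of ARIMA concrete, here is a hand-rolled sketch of an ARIMA(1,1,1) path in base R: an MA(1) filter, then an AR(1) recursion, then a cumulative sum for the "I" part. Assumptions: Gaussian innovations, the AR recursion starts at zero, and there is no burn-in period. arima.sim() discards a burn-in, so these values will not match ts_arima; the names eps, u, w and y are illustrative only.

```r
# Hand-rolled ARIMA(1,1,1) path, for illustration only (no burn-in, AR starts at 0)
set.seed(123)
n     <- 200
phi   <- 0.7    # AR coefficient
theta <- -0.5   # MA coefficient (R convention: e_t + theta * e_{t-1})

eps <- rnorm(n + 1)   # Gaussian innovations

# MA(1) part: u_t = e_t + theta * e_{t-1}
u <- eps[-1] + theta * eps[-(n + 1)]

# AR(1) part: w_t = phi * w_{t-1} + u_t, computed as a recursive filter
w <- as.numeric(stats::filter(u, filter = phi, method = "recursive"))

# "I" part: integrate the ARMA(1,1) series by cumulative summation, y_0 = 0
y <- c(0, cumsum(w))

# Differencing y recovers the stationary ARMA(1,1) series (up to rounding)
all.equal(diff(y), w)   # TRUE
```

Read in reverse, this is exactly why differencing works: one difference undoes the cumulative sum and leaves the stationary ARMA(1,1) part.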

Using the tseries package

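The Augmented Dickey-Fuller (ADF) test checks whether a time series is stationary. Its null hypothesis (H0) is that the series has a unit root, i.e. is non-stationary; the alternative hypothesis is that it is stationary.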

library(tseries)
adf.test(ts_arima)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_arima
## Dickey-Fuller = -2.449, Lag order = 5, p-value = 0.388
## alternative hypothesis: stationary

The p-value is 0.388, well above the usual 0.05 level, so we fail to reject H0: there is no evidence that the series is stationary. We therefore treat it as non-stationary, which is why we apply differencing.
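For reference, the tseries documentation describes adf.test() as using a regression that includes a constant and a linear trend, with k lagged differences. The reported Dickey-Fuller value is the t-statistic of the estimate of γ, and the default lag order is k = trunc((n - 1)^(1/3)), which gives the Lag order = 5 shown above for a series of this length.

```latex
\[
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + e_t,
\qquad
H_0:\ \gamma = 0\ \text{(unit root)}, \quad H_1:\ \gamma < 0\ \text{(stationary)}
\]
```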

Differencing

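First-order differencing replaces each value by its change from the previous one, y'_t = y_t - y_{t-1}. This removes a stochastic (or linear) trend and stabilizes the mean of the time series.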
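In R, diff() computes first differences (each value minus the previous one) and diffinv() reverses them given the starting value. A small sketch on a made-up toy series:

```r
x  <- c(3, 5, 9, 10, 14)    # toy series, for illustration only
d1 <- diff(x)               # 2 4 1 4

# diff() is the same as subtracting the series lagged by one step
all.equal(d1, x[-1] - x[-length(x)])

# diffinv() undoes differencing, given the first original value
all.equal(diffinv(d1, xi = x[1]), x)

# Differencing a linear trend leaves a constant (the slope)
diff(2 + 0.5 * (1:6))       # 0.5 0.5 0.5 0.5 0.5
```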

diff1 <- diff(ts_arima)  # first differences of the simulated series

ts.plot(diff1, main = "After Differencing")

After differencing, the series fluctuates around a roughly constant mean, as expected of a stationary series.
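The visual impression can be checked formally. A minimal sketch, assuming the same simulation settings as above: rerun the ADF test on diff1, and ask forecast::ndiffs() how many differences an ADF-based procedure suggests for the original series. The object names adf_diff and nd_adf are illustrative, and the resulting values are not reproduced here.

```r
library(tseries)    # adf.test()
library(forecast)   # ndiffs()

set.seed(123)
ts_arima <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.7, ma = -0.5), n = 200)
diff1 <- diff(ts_arima)

# H0: unit root. A small p-value here supports stationarity of diff1.
adf_diff <- adf.test(diff1)
adf_diff$p.value

# Estimated number of differences needed for the original series (ADF version)
nd_adf <- ndiffs(ts_arima, test = "adf")
nd_adf
```

adf.test() may warn that the true p-value is smaller than the printed one; that only means the statistic lies beyond its lookup table.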

ACF & PACF

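ACF (autocorrelation function): the correlation between the series and its own values k steps earlier, for each lag k.
PACF (partial autocorrelation function): the correlation at lag k after removing the effect of the intermediate lags 1, ..., k-1, i.e. the direct effect of lag k.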
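These definitions can be checked numerically. This sketch uses simulated white noise (an arbitrary choice for illustration) and recomputes the lag-1 autocorrelation by hand with the same 1/n estimator that acf() uses. It also shows that PACF and ACF coincide at lag 1, where there are no intermediate lags.

```r
set.seed(1)
x <- rnorm(100)   # any numeric series works; white noise used for illustration
n <- length(x)
m <- mean(x)

# Lag-1 sample autocorrelation with the 1/n estimator (the 1/n factors cancel)
r1_manual <- sum((x[-1] - m) * (x[-n] - m)) / sum((x - m)^2)
r1_acf    <- acf(x, lag.max = 1, plot = FALSE)$acf[2]   # element 1 is lag 0
all.equal(r1_manual, r1_acf)

# At lag 1 there are no intermediate lags, so PACF equals ACF
all.equal(pacf(x, lag.max = 1, plot = FALSE)$acf[1], r1_acf)

# Approximate 95% bounds drawn as dashed lines on acf()/pacf() plots
1.96 / sqrt(n)
```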

acf(diff1)

pacf(diff1)

The ACF plot shows a gradual decay with only a few significant lags, while the PACF plot has a strong spike at lag 1 and then drops off quickly. This pattern suggests that the differenced series can be modeled with low-order AR and/or MA terms, for example ARMA(1,1) on diff1, which is ARIMA(1,1,1) on the original series.
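For comparison, the theoretical ACF and PACF of the ARMA(1,1) process that diff1 follows by construction (φ = 0.7, θ = -0.5) can be computed with base R's ARMAacf(). For ARMA(1,1), both functions tail off rather than cutting off sharply, and because φ + θ is only 0.2 here, the lag-1 autocorrelation is about 0.24, so sample plots from 200 points may show only weak structure.

```r
phi   <- 0.7
theta <- -0.5

rho    <- ARMAacf(ar = phi, ma = theta, lag.max = 6)               # lags 0..6
phi_kk <- ARMAacf(ar = phi, ma = theta, lag.max = 6, pacf = TRUE)  # lags 1..6
round(rho, 3)
round(phi_kk, 3)

# Closed form for the lag-1 autocorrelation of ARMA(1,1)
(1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta^2)   # 0.13 / 0.55

# Beyond lag 1 the ACF decays geometrically with ratio phi
rho[3:7] / rho[2:6]
```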

Modeling

library(forecast)
auto.arima(diff1)
## Series: diff1 
## ARIMA(2,0,2) with zero mean 
## 
## Coefficients:
##           ar1     ar2     ma1      ma2
##       -0.1116  0.6336  0.3108  -0.6250
## s.e.   0.2175  0.1701  0.2294   0.2122
## 
## sigma^2 = 0.8631:  log likelihood = -267.28
## AIC=544.57   AICc=544.88   BIC=561.06
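auto.arima(diff1) only sees the already-differenced series, so its d = 0 refers to diff1. A sketch of the alternative, assuming the same simulation settings: pass the original series so that auto.arima() chooses d itself (by default via a KPSS unit-root test) together with p and q, then inspect the residuals with checkresiduals(). The selected orders are not shown here and may differ from the ARIMA(2,0,2) found for diff1; fit_auto is an illustrative name.

```r
library(forecast)   # auto.arima(), arimaorder(), checkresiduals()

set.seed(123)
ts_arima <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.7, ma = -0.5), n = 200)

# Let auto.arima() select d as well as p and q on the original scale
fit_auto <- auto.arima(ts_arima)
fit_auto
arimaorder(fit_auto)   # selected (p, d, q)

# Residual plots and a Ljung-Box test; residuals should resemble white noise
checkresiduals(fit_auto)
```

Fitting on the original series also keeps any later forecasts on the original scale, with no manual undifferencing.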

Comparison

arima(diff1, order = c(1,1,1))
## 
## Call:
## arima(x = diff1, order = c(1, 1, 1))
## 
## Coefficients:
##          ar1      ma1
##       0.1489  -1.0000
## s.e.  0.0706   0.0164
## 
## sigma^2 estimated as 0.8926:  log likelihood = -273.56,  aic = 553.13
arima(diff1, order = c(1,1,3))
## 
## Call:
## arima(x = diff1, order = c(1, 1, 3))
## 
## Coefficients:
##           ar1     ma1      ma2      ma3
##       -0.8558  0.0334  -0.9642  -0.0692
## s.e.   0.0800  0.1017   0.0442   0.0772
## 
## sigma^2 estimated as 0.8611:  log likelihood = -270.25,  aic = 550.49
arima(diff1, order = c(0,1,1))
## 
## Call:
## arima(x = diff1, order = c(0, 1, 1))
## 
## Coefficients:
##           ma1
##       -0.9294
## s.e.   0.1078
## 
## sigma^2 estimated as 0.93:  log likelihood = -276.14,  aic = 556.28
arima(diff1, order = c(2,0,2))
## 
## Call:
## arima(x = diff1, order = c(2, 0, 2))
## 
## Coefficients:
##           ar1     ar2     ma1      ma2  intercept
##       -0.1100  0.6348  0.3091  -0.6265    -0.0214
## s.e.   0.2167  0.1695  0.2286   0.2114     0.0931
## 
## sigma^2 estimated as 0.8456:  log likelihood = -267.26,  aic = 546.51

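Among the models fitted with arima(), ARIMA(2,0,2) on diff1 has the lowest AIC (546.51), so by this criterion it offers the best trade-off between fit and complexity. This agrees with auto.arima(), which selected the same orders without an intercept (AIC = 544.57); the estimated intercept here (-0.0214, s.e. 0.0931) is indeed indistinguishable from zero. Two caveats apply. First, the models with d = 1 difference diff1 a second time, so they amount to differencing the original series twice; the MA(1) coefficient of -1.0000 in the ARIMA(1,1,1) fit, a non-invertible boundary value, is a typical sign of this over-differencing. Second, AIC values are only comparable between models with the same order of differencing, because differencing changes the data on which the likelihood is computed; strictly, ARIMA(2,0,2) should be compared with other d = 0 models for diff1, such as ARIMA(1,0,1), which corresponds to the true ARIMA(1,1,1) for the original series. Even though the data were generated from ARIMA(1,1,1), the model selected from a single sample can differ because of randomness and estimation variability.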
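A sketch of a like-for-like comparison, assuming the same seed and simulation settings as above: the true specification fitted to the original series, plus AIC compared only among d = 0 models for diff1. The names fit_true, fit_101 and fit_202 are illustrative, and the resulting numbers are not shown here.

```r
set.seed(123)
ts_arima <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.7, ma = -0.5), n = 200)
diff1 <- diff(ts_arima)

# True specification on the original series: one difference in total
fit_true <- arima(ts_arima, order = c(1, 1, 1))

# Candidate ARMA models fitted directly to the differenced series (d = 0)
fit_101 <- arima(diff1, order = c(1, 0, 1))
fit_202 <- arima(diff1, order = c(2, 0, 2))

coef(fit_true)            # compare with ar = 0.7, ma = -0.5
AIC(fit_101, fit_202)     # comparable: same data, same d
```

AIC() accepts several fitted models and returns a small table of their degrees of freedom and AIC values, which keeps the comparison restricted to models fitted to the same series.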