A time series is data observed over time, and ARIMA is a class of models that captures temporal patterns through autoregression (AR), differencing (I), and moving-average (MA) terms. If the data are not stationary, we apply differencing to stabilize the mean before modeling.
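For reference, and using R's sign convention for the MA term, an ARIMA(1,1,1) model for a series $y_t$ can be written as

$$(1 - \phi_1 B)(1 - B)\, y_t = (1 + \theta_1 B)\, \varepsilon_t,$$

where $B$ is the backshift operator, $\phi_1$ is the AR coefficient, $\theta_1$ is the MA coefficient, and $\varepsilon_t$ is white noise.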
set.seed(123)  # fix the random seed for reproducibility
n  <- 200      # number of observations to simulate
ar <- 0.7      # AR(1) coefficient
ma <- -0.5     # MA(1) coefficient

# Simulate an ARIMA(1,1,1) process with the parameters above
ts_arima <- arima.sim(
  model = list(order = c(1, 1, 1), ar = ar, ma = ma),
  n = n
)
ts.plot(ts_arima, main = "Simulated ARIMA(1,1,1)")
We simulate an ARIMA(1,1,1) process with predefined AR and MA parameters, fixing the seed for reproducibility. The plot shows a series that drifts with local trends, indicating that the data are likely non-stationary and that differencing is required before fitting an ARIMA model.
The Augmented Dickey-Fuller (ADF) test checks whether the time series is stationary; its null hypothesis (H0) is that the series contains a unit root, i.e., is non-stationary.
library(tseries)
adf.test(ts_arima)
##
## Augmented Dickey-Fuller Test
##
## data: ts_arima
## Dickey-Fuller = -2.449, Lag order = 5, p-value = 0.388
## alternative hypothesis: stationary
The p-value is 0.388, well above 0.05, so we fail to reject H0: the data is non-stationary, which is why we need differencing.
Differencing removes trends and stabilizes the mean of the time series.
# Take the first difference to remove the trend
diff1 <- diff(ts_arima)
ts.plot(diff1, main = "After Differencing")
After differencing, the data fluctuate around a constant mean, suggesting the series is now stationary.
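As a quick check (not part of the original output, and the exact statistic depends on the simulated sample), we could re-run the ADF test on the differenced series and expect a small p-value:
# Re-test stationarity after differencing; a small p-value would
# reject the unit-root null hypothesis
adf.test(diff1)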
ACF: shows the correlation of the series with its past values.
PACF: shows the direct effect of each lag, with intermediate lags removed.
acf(diff1)   # autocorrelation function of the differenced series
pacf(diff1)  # partial autocorrelation function of the differenced series
The ACF plot shows a gradual decay with only a few significant lags, while the PACF has a strong spike at lag 1 and then quickly drops off. This pattern suggests that the differenced series can be modeled with low-order AR and/or MA components, consistent with a model such as ARIMA(1,1,1) for the original series.
library(forecast)
# Automatic order selection on the already-differenced series
auto.arima(diff1)
## Series: diff1
## ARIMA(2,0,2) with zero mean
##
## Coefficients:
## ar1 ar2 ma1 ma2
## -0.1116 0.6336 0.3108 -0.6250
## s.e. 0.2175 0.1701 0.2294 0.2122
##
## sigma^2 = 0.8631: log likelihood = -267.28
## AIC=544.57 AICc=544.88 BIC=561.06
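Note that auto.arima was applied to the already-differenced series, so the order it reports is relative to diff1. An alternative (output not shown here; the selected order may vary with the sample) is to let auto.arima choose the differencing order d directly on the original series:
# Let auto.arima pick p, d, and q on the undifferenced data
auto.arima(ts_arima)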
# Caution: diff1 is already differenced once, so d = 1 here
# applies a second difference to the original series
arima(diff1, order = c(1,1,1))
##
## Call:
## arima(x = diff1, order = c(1, 1, 1))
##
## Coefficients:
## ar1 ma1
## 0.1489 -1.0000
## s.e. 0.0706 0.0164
##
## sigma^2 estimated as 0.8926: log likelihood = -273.56, aic = 553.13
arima(diff1, order = c(1,1,3))
##
## Call:
## arima(x = diff1, order = c(1, 1, 3))
##
## Coefficients:
## ar1 ma1 ma2 ma3
## -0.8558 0.0334 -0.9642 -0.0692
## s.e. 0.0800 0.1017 0.0442 0.0772
##
## sigma^2 estimated as 0.8611: log likelihood = -270.25, aic = 550.49
arima(diff1, order = c(0,1,1))
##
## Call:
## arima(x = diff1, order = c(0, 1, 1))
##
## Coefficients:
## ma1
## -0.9294
## s.e. 0.1078
##
## sigma^2 estimated as 0.93: log likelihood = -276.14, aic = 556.28
arima(diff1, order = c(2,0,2))
##
## Call:
## arima(x = diff1, order = c(2, 0, 2))
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept
## -0.1100 0.6348 0.3091 -0.6265 -0.0214
## s.e. 0.2167 0.1695 0.2286 0.2114 0.0931
##
## sigma^2 estimated as 0.8456: log likelihood = -267.26, aic = 546.51
Among the tested models, ARIMA(2,0,2) fitted to diff1 has the lowest AIC (546.51); because diff1 is the first difference of the original series, this is equivalent to an ARIMA(2,1,2) on ts_arima. Two caveats apply. First, the fits with d = 1 on diff1 difference the data a second time, and the ma1 estimate of -1.0000 in the ARIMA(1,1,1) fit is a classic symptom of over-differencing. Second, AIC values are only directly comparable between models fitted to the same response series, so the d = 0 and d = 1 fits above should be compared with care. Even so, the selected model differs from the ARIMA(1,1,1) process that generated the data, which is expected: with a single simulated sample of this length, randomness and estimation variability can favor a different low-order model.
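As a sketch (output not shown; the estimates should closely match the ARIMA(2,0,2) fit on diff1 up to the handling of the mean, since arima drops the intercept when d > 0), one could confirm this equivalence by fitting the corresponding model on the original series:
# ARIMA(2,1,2) on the original series is the counterpart of
# ARIMA(2,0,2) on its first difference
arima(ts_arima, order = c(2, 1, 2))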