ARIMA (Autoregressive Integrated Moving Average) is a model widely used to predict the behaviour of a variable from its own historical data. The model has two main components: 1) an autoregressive (AR) part and 2) a moving average (MA) part.
1) The autoregressive component captures the relationship between the variable and its own lagged values.
2) The moving average component captures the relationship between the current value and previous forecast errors (deviations).
ARIMA requires the variable to be stationary, i.e. to revert to a constant mean without trending, and in most cases differencing is required to make the series stationary. After the model is fitted, further goodness-of-fit checks can be carried out: residual autocorrelation, normality of the residuals, constant variance, and prediction error measured by MAPE or RMSE.
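The post-fit checks listed above can be sketched as follows. This is only an illustration under stated assumptions: the series is simulated (it is not the bitcoin data), the train/test split and the ARIMA(1,1,0) order are arbitrary choices, and the names y, train, test, lb, sw, rmse and mape are introduced here for the example.

```r
# Illustrative only: simulated series, not the bitcoin data used in this study.
set.seed(1)
y <- 100 + cumsum(as.numeric(arima.sim(model = list(ar = 0.5), n = 120)))
train <- y[1:100]     # used to fit the model
test  <- y[101:120]   # held out to measure prediction error

fit <- arima(train, order = c(1, 1, 0))

# Residual autocorrelation: Ljung-Box test (null: no autocorrelation).
# fitdf = 1 accounts for the single estimated AR coefficient.
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)

# Normality of residuals: Shapiro-Wilk test (null: residuals are normal).
sw <- shapiro.test(residuals(fit))

# Out-of-sample prediction error on the held-out observations.
pred <- as.numeric(predict(fit, n.ahead = length(test))$pred)
err  <- test - pred
rmse <- sqrt(mean(err^2))
mape <- mean(abs(err / test)) * 100   # undefined if any test value is 0

print(c(ljung_box_p = lb$p.value, shapiro_p = sw$p.value, RMSE = rmse, MAPE = mape))
```

Large p-values in the two tests are what one hopes to see, since they mean there is no evidence against uncorrelated, normally distributed residuals.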
In ARIMA(p,d,q), p is the number of lags in the autoregressive part, d is the number of differences needed to make the series stationary, and q is the number of lags in the moving average component.
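For reference, one standard way to write the ARIMA(p,d,q) model uses the backshift operator B, where B y_t = y_{t-1}. The symbols below (phi for the AR coefficients, theta for the MA coefficients, c for a constant and epsilon for the white-noise error) are notation introduced here, not taken from the original text; the second line is the special case d = 1, p = 1, q = 2 with the constant dropped.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t

% Special case ARIMA(1,1,2), with \Delta y_t = y_t - y_{t-1} and c = 0:
\Delta y_t = \phi_1 \, \Delta y_{t-1}
  + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
```

The plus signs on the MA terms follow the convention used by R's arima(), which reports its ma1 and ma2 coefficients in this form.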
In this study, we predict bitcoin prices for two consecutive days using historical data from 28th June to 29th August 2020.
We first load necessary packages and the data.
library(kableExtra)
library(tidyverse)
library(tseries)
library(forecast)
library(ggplot2)
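setwd("D:/Arima")  # folder containing bitcoin2.csv; adjust to your machine
data <- read.csv("bitcoin2.csv", header = TRUE, sep = ",")
# Daily series with a weekly period (frequency = 7)
price.ts <- ts(data$price, frequency = 7)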
A plot of the original series suggests that it is not stationary. To confirm this, we use the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series is a non-stationary random-walk process. With a p-value of 0.9796 in the test below, we fail to reject the null hypothesis, so we treat the series as non-stationary and take the first difference to make it stationary.
adf.test(price.ts)
##
## Augmented Dickey-Fuller Test
##
## data: price.ts
## Dickey-Fuller = -0.41933, Lag order = 3, p-value = 0.9796
## alternative hypothesis: stationary
After taking the first difference, the plot of the differenced series looks stationary. The ADF test supports this only weakly: with a p-value of 0.07855, the null hypothesis of non-stationarity is rejected at the 10% level but not at the 5% level. We proceed with one difference, so our model will be of the form ARIMA(p,1,q). We identify p and q using the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.
price<-diff(price.ts)
plot(price)
adf.test(price)
##
## Augmented Dickey-Fuller Test
##
## data: price
## Dickey-Fuller = -3.3728, Lag order = 3, p-value = 0.07855
## alternative hypothesis: stationary
The PACF plot below shows a significant spike at lag one, which suggests an AR(1) process. The ACF plot shows a significant spike at lag two, which suggests a moving average process of order two, MA(2). Following the principle of parsimony, we select the best model from the following candidates: ARIMA(1,1,0), ARIMA(1,1,1) and ARIMA(1,1,2).
acf(price)
pacf(price)
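The visual reading of the ACF and PACF plots can also be done numerically. The sketch below is an assumption-laden illustration: it uses a simulated AR(1) series x rather than the differenced bitcoin series, and the approximate 95% bound of 1.96/sqrt(n) is the usual rule of thumb drawn as dashed lines on the plots.

```r
# Illustrative only: simulated stationary AR(1) series, not the bitcoin data.
set.seed(42)
x <- arima.sim(model = list(ar = 0.6), n = 200)

# Compute the correlations without plotting.
acf_vals  <- acf(x,  lag.max = 10, plot = FALSE)$acf[-1]  # drop lag 0 (always 1)
pacf_vals <- pacf(x, lag.max = 10, plot = FALSE)$acf      # starts at lag 1

# Approximate 95% significance bound for a white-noise series.
bound <- qnorm(0.975) / sqrt(length(x))

# Lags whose correlation falls outside the bound.
sig_acf  <- which(abs(acf_vals)  > bound)
sig_pacf <- which(abs(pacf_vals) > bound)

print(list(bound = bound,
           significant_acf_lags = sig_acf,
           significant_pacf_lags = sig_pacf))
```

When reading the plots for the bitcoin series, note that price.ts was created with frequency = 7, so the lag axis is in units of weeks and a lag of one observation appears at 1/7.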
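We select the best model using the Akaike Information Criterion (AIC), choosing the model with the smallest AIC value. According to the results, model3, i.e. ARIMA(1,1,2), is the best model: its AIC is 210.38, against 211.99 for ARIMA(1,1,1) and 226.99 for ARIMA(1,1,0). Note that the three models below are fitted to price, which is already the first difference of price.ts, so order=c(1,1,q) differences the data a second time here; the forecast further down refits ARIMA(1,1,2) to price.ts itself.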
model1<-arima(price,order=c(1,1,0))
model2<-arima(price,order=c(1,1,1))
model3<-arima(price,order=c(1,1,2))
print(model1);print(model2);print(model3)
##
## Call:
## arima(x = price, order = c(1, 1, 0))
##
## Coefficients:
## ar1
## -0.6030
## s.e. 0.1356
##
## sigma^2 estimated as 61.34: log likelihood = -111.49, aic = 226.99
##
## Call:
## arima(x = price, order = c(1, 1, 1))
##
## Coefficients:
## ar1 ma1
## -0.3303 -0.9585
## s.e. 0.1684 0.1511
##
## sigma^2 estimated as 33.18: log likelihood = -102.99, aic = 211.99
##
## Call:
## arima(x = price, order = c(1, 1, 2))
##
## Coefficients:
## ar1 ma1 ma2
## 0.2289 -1.7563 1.0000
## s.e. 0.1863 0.1640 0.1813
##
## sigma^2 estimated as 26.78: log likelihood = -101.19, aic = 210.38
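The same comparison can be written as a small loop that collects the AIC values into one table. This is a sketch on simulated data, not a rerun of the results above; the names orders, fits and aic_table are introduced for the example, and method = "ML" is a choice made here for the illustration.

```r
# Illustrative only: simulated series, not the bitcoin data.
set.seed(7)
y <- 100 + cumsum(as.numeric(arima.sim(model = list(ar = 0.5), n = 150)))

# Candidate (p, d, q) orders, all with the same d so the AICs are comparable.
orders <- list(c(1, 1, 0), c(1, 1, 1), c(1, 1, 2))

# Fit every candidate to the same undifferenced series.
fits <- lapply(orders, function(o) arima(y, order = o, method = "ML"))

aic_table <- data.frame(
  model = sapply(orders, function(o) sprintf("ARIMA(%d,%d,%d)", o[1], o[2], o[3])),
  AIC   = sapply(fits, AIC)
)

# Smallest AIC first.
aic_table <- aic_table[order(aic_table$AIC), ]
print(aic_table)
```

AIC values are only comparable between models fitted to the same data with the same amount of differencing, which is why d is held fixed across the candidates.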
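# Fit ARIMA(1,1,2) to the original series and plot the forecast
plot(forecast(Arima(y = price.ts, order = c(1, 1, 2))))
# Predict five steps ahead and keep the point predictions
fit <- predict(arima(price.ts, order = c(1, 1, 2)), n.ahead = 5)
y <- fit$pred
fit1 <- data.frame(y)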
We will now predict the price of Bitcoin in USD for two consecutive days, i.e. 30th and 31st August 2020.
h <- 11524.24  # starting price level in USD
# Add each predicted value to the previous day's level
dy1 <- fit1$y[1] + h
dy2 <- fit1$y[2] + dy1
Day_1 <- dy1
Day_2 <- dy2
Predict <- rbind(Day_1, Day_2)
colnames(Predict) <- c("USD")
head(Predict, 3)
## USD
## Day_1 11538.93
## Day_2 11554.48
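The step of adding predicted changes onto a starting level generalises to any number of days with a cumulative sum. In the sketch below, start_level is the value 11524.24 used in the code above (presumably the last observed price), and pred_changes holds the two daily changes implied by the printed output (11538.93 - 11524.24 and 11554.48 - 11538.93); both names are introduced here for illustration.

```r
start_level  <- 11524.24          # starting level used in the code above
pred_changes <- c(14.69, 15.55)   # daily changes implied by the printed output

# Cumulative sum of the changes, shifted by the starting level.
levels_cumsum <- start_level + cumsum(pred_changes)

# Base R's diffinv() inverts differencing; drop the first element (the start).
levels_diffinv <- diffinv(pred_changes, xi = start_level)[-1]

forecast_tbl <- data.frame(day = c("Day_1", "Day_2"), USD = levels_cumsum)
print(forecast_tbl)
```

Both approaches give the same two values as the table above, and either one extends to all five predicted steps without writing a separate line per day.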