Next-Day Bitcoin Price Forecast

Author

Nihad & Vikram

Published

June 12, 2024

Requirement

Abstract: This study analyzes forecasts of the Bitcoin price using autoregressive integrated moving average (ARIMA) and neural network autoregression (NNAR) models. Employing the static forecast approach, we forecast the next-day Bitcoin price both with and without re-estimation of the forecast model at each step. To cross-validate the forecast results, we consider two different training and test samples. In the first training sample, NNAR performs better than ARIMA, while ARIMA outperforms NNAR in the second training sample. Additionally, ARIMA with model re-estimation at each step outperforms NNAR in the two test-sample forecast periods. The Diebold-Mariano test confirms the superiority of the ARIMA forecasts over NNAR in the test-sample periods. The forecast performance of the ARIMA models with and without re-estimation is identical for the estimated test-sample periods. Despite the sophistication of NNAR, this paper demonstrates ARIMA's enduring power in predicting the volatile Bitcoin price.

Keywords: ARIMA; artificial neural network; Bitcoin; cryptocurrency; static forecast

Code setup

Link to code repo Github

Required libraries

Code

library(xts)
library(quantmod)
library(ggthemes)
library(dygraphs)
library(tidyverse)
library(urca)
library(tseries)
library(forecast)
library(dplyr)

Required sub source files

Code

source("../code/config.R")
source("../code/utils.R")
source("../code/model_executor.R")

Read the daily Bitcoin price CSV

Code

quotes_bitcoin <- read_csv("../data/Bitcoindata.csv", 
                           col_select = c(Date,Close))

We can examine the structure of the resulting object:

Code

head(quotes_bitcoin)

Date	Close
04/10/2018	6547.56
03/10/2018	6456.77
02/10/2018	6500.00
01/10/2018	6571.20
30/09/2018	6597.81
29/09/2018	6579.38

Code

tail(quotes_bitcoin)

Date	Close
06/01/2012	6.00
05/01/2012	6.65
04/01/2012	5.57
03/01/2012	5.29
02/01/2012	5.00
01/01/2012	5.00

Code

glimpse(quotes_bitcoin)

Rows: 2,466
Columns: 2
$ Date  <chr> "04/10/2018", "03/10/2018", "02/10/2018", "01/10/2018", "30/09/2…
$ Close <dbl> 6547.56, 6456.77, 6500.00, 6571.20, 6597.81, 6579.38, 6610.76, 6…

Let’s also check the class of the Close column:

Code

class(quotes_bitcoin$Close)

[1] "numeric"

Let’s check the structure of the whole dataset:

Code

str(quotes_bitcoin)

tibble [2,466 × 2] (S3: tbl_df/tbl/data.frame)
 $ Date : chr [1:2466] "04/10/2018" "03/10/2018" "02/10/2018" "01/10/2018" ...
 $ Close: num [1:2466] 6548 6457 6500 6571 6598 ...
 - attr(*, "spec")=
  .. cols(
  ..   ...1 = col_skip(),
  ..   Date = col_character(),
  ..   Open = col_skip(),
  ..   High = col_skip(),
  ..   Low = col_skip(),
  ..   Close = col_double(),
  ..   `Volume (BTC)` = col_skip(),
  ..   `Volume (Currency)` = col_skip(),
  ..   `Weighted Price` = col_skip()
  .. )

Let’s transform column ‘Date’ into type date:

Code

quotes_bitcoin$Date <- as.Date(quotes_bitcoin$Date, format = "%d/%m/%Y")

We have to give the format in which the date is originally stored:

* %y means 2-digit year
* %Y means 4-digit year
* %m means a month
* %d means a day
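As a small illustration of these format codes, the sketch below parses the same calendar day from three differently formatted strings. Only the first layout matches the CSV used here; the other two strings are made up for comparison.

```r
# Same calendar day written three ways; only the first layout matches the CSV.
d1 <- as.Date("04/10/2018", format = "%d/%m/%Y")  # day/month/4-digit year
d2 <- as.Date("04/10/18",   format = "%d/%m/%y")  # day/month/2-digit year
d3 <- as.Date("10/04/2018", format = "%m/%d/%Y")  # month/day/4-digit year

# A format that does not match the string yields NA rather than an error.
bad <- as.Date("2018-10-04", format = "%d/%m/%Y")

print(c(d1, d2, d3))
print(is.na(bad))
```

Because a mismatched format silently produces NA, it is worth checking for missing values right after the conversion.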

Code

class(quotes_bitcoin$Date)

[1] "Date"

Code

head(quotes_bitcoin)

Date	Close
2018-10-04	6547.56
2018-10-03	6456.77
2018-10-02	6500.00
2018-10-01	6571.20
2018-09-30	6597.81
2018-09-29	6579.38

Code

glimpse(quotes_bitcoin)

Rows: 2,466
Columns: 2
$ Date  <date> 2018-10-04, 2018-10-03, 2018-10-02, 2018-10-01, 2018-09-30, 201…
$ Close <dbl> 6547.56, 6456.77, 6500.00, 6571.20, 6597.81, 6579.38, 6610.76, 6…

Now R understands this column as dates.
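As an alternative sketch, the date could be parsed while the file is read, which would make the separate conversion step unnecessary. The example assumes readr 2.0 or later (for the `I()` literal-data input) and uses two rows copied from the `head()` output above instead of the real file.

```r
library(readr)

# Two rows in the same layout as the source file (illustration only).
csv_text <- "Date,Close\n04/10/2018,6547.56\n03/10/2018,6456.77\n"

toy <- read_csv(
  I(csv_text),                       # I() marks the string as literal data
  col_types = cols(
    Date  = col_date(format = "%d/%m/%Y"),  # parse day/month/year on import
    Close = col_double()
  )
)

print(class(toy$Date))
```

With the real file, the same `col_types` argument could be passed alongside `col_select`.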

Creating xts objects

Code

quotes_bitcoin <- 
  xts(quotes_bitcoin[, -1], # data columns (without the first column with date)
      quotes_bitcoin$Date)  # date/time index

Let’s see the result:

Code

head(quotes_bitcoin)

           Close
2012-01-01  5.00
2012-01-02  5.00
2012-01-03  5.29
2012-01-04  5.57
2012-01-05  6.65
2012-01-06  6.00

Code

str(quotes_bitcoin)

An xts object on 2012-01-01 / 2018-10-04 containing: 
  Data:    double [2466, 1]
  Columns: Close
  Index:   Date [2466] (TZ: "UTC")
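Note that the CSV lists the newest observation first, whereas the xts object starts in 2012: `xts()` sorts the rows by the time index. A minimal sketch with three rows taken from the output above (the object names are illustrative):

```r
library(xts)

# Three observations in the CSV's original (newest-first) order.
dates <- as.Date(c("2018-10-04", "2018-10-03", "2018-10-02"))
close <- c(6547.56, 6456.77, 6500.00)

# xts() reorders the data so that the index is increasing in time.
toy_xts <- xts(close, order.by = dates)
colnames(toy_xts) <- "Close"

print(toy_xts)
first_date <- index(toy_xts)[1]
```

This ordering matters for everything that follows, because differencing and forecasting assume that observations run from oldest to newest.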

Basic graphs

Finally, let’s use the ggplot2 package to produce some visualizations. ggplot2 expects its input as a data frame (ideally in long rather than wide format), so we first wrap the xts object in a tibble and use its date index for the x-axis:

Plotting Actual Bitcoin Price

Code

tibble(df = quotes_bitcoin) %>%
  ggplot(aes(zoo::index(quotes_bitcoin), df)) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Actual Bitcoin Price",
    subtitle = paste0("Number of observations: ", length(quotes_bitcoin)),
    caption = "source: RR 2024",
    x="",
    y=""
  ) +
  theme(panel.background = element_rect(fill = "transparent",color = "black",linewidth = 2))

Plotting Log Transformed Bitcoin Price

Code

tibble(df = quotes_bitcoin) %>%
  ggplot(aes(zoo::index(quotes_bitcoin), log(quotes_bitcoin))) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Log Transformed Bitcoin Price",
    subtitle = paste0("Number of observations: ", length(quotes_bitcoin)),
    caption = "source: RR 2024",
    x="",
    y=""
  ) + 
  theme(panel.background = element_rect(fill = "transparent",color = "black",linewidth = 2))

Plotting 1st Difference Log Operator

Code

tibble(df = quotes_bitcoin) %>%
  ggplot(aes(zoo::index(quotes_bitcoin), periodReturn(quotes_bitcoin, period="daily", type="log"))) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "1st Difference Log Operator",
    subtitle = paste0("Number of observations: ", length(quotes_bitcoin)),
    caption = "source: RR 2024",
    x="",
    y=""
  ) +
  theme(panel.background = element_rect(fill = "transparent",color = "black",linewidth = 2))
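The "1st difference log operator" is the daily log return, i.e. log(P_t) - log(P_(t-1)). The base-R sketch below computes it for the first six closing prices shown earlier and checks that it matches the log of the price ratio; it is an illustration and does not reproduce the exact output of `periodReturn()`.

```r
# First six daily closes of the series (January 2012), oldest first.
close <- c(5.00, 5.00, 5.29, 5.57, 6.65, 6.00)

# First difference of the log price = daily log return.
log_ret <- diff(log(close))

# Equivalent formulation: log of the ratio of consecutive prices.
alt <- log(close[-1] / close[-length(close)])

print(round(log_ret, 4))
print(all.equal(log_ret, alt))
```

Differencing shortens the series by one observation, so five returns are obtained from six prices.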

Table 1. Stationary test of data.

First in-sample window (500 days)

Data	Training_Sample	ADF_Test	PP_Test
Original data	01/01/2012~14/05/2013	-1.849 ( 0.642 )	-12.235 ( 0.427 )
Log transformed data	01/01/2012~14/05/2013	-1.521 ( 0.781 )	-3.828 ( 0.896 )
1st difference log operator	01/01/2012~14/05/2013	-9.743 ( 0.010 )	-497.980 ( 0.010 )

Second in-sample window (2000 days)

Data	Training_Sample	ADF_Test	PP_Test
Original data	01/01/2012~25/06/2017	0.617 ( 0.990 )	5.162 ( 0.990 )
Log transformed data	01/01/2012~25/06/2017	-1.378 ( 0.842 )	-3.367 ( 0.918 )
1st difference log operator	01/01/2012~25/06/2017	-11.478 ( 0.010 )	-2103.646 ( 0.010 )

ADF: Augmented Dickey-Fuller test; PP: Phillips-Perron test. p-values are in parentheses; a p-value below 0.05 indicates stationarity.
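A minimal sketch of how such tests can be run with the tseries package, which is loaded above. It uses a simulated random walk as a stand-in for the log price, so the numbers will not match Table 1, and the helper code in `utils.R` may do this differently.

```r
library(tseries)

set.seed(1)
# Simulated random walk standing in for the log price (assumption).
log_price <- cumsum(rnorm(500, sd = 0.05))

# Tests on the level and on the first difference.
# suppressWarnings() hides the note that p-values are clipped to [0.01, 0.99].
adf_level <- suppressWarnings(adf.test(log_price))
adf_diff  <- suppressWarnings(adf.test(diff(log_price)))
pp_diff   <- suppressWarnings(pp.test(diff(log_price)))

print(c(adf_level = adf_level$p.value,
        adf_diff  = adf_diff$p.value,
        pp_diff   = pp_diff$p.value))
```

Both functions report p-values only within the range 0.01 to 0.99, which is consistent with the values 0.010 and 0.990 appearing in the table.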

Table 2. Training-sample forecast performance.

First training-sample window (500 days)

Forecast_Model	Training_Sample	RMSE	MAPE	MAE
ARIMA (4,1,0)	01/01/2012~14/05/2013	0.063	1.317	0.033
NNAR (2,1)	01/01/2012~14/05/2013	0.058	1.265	0.032

Second training-sample window (2000 days)

Forecast_Model	Training_Sample	RMSE	MAPE	MAE
ARIMA (4,1,1)	01/01/2012~25/06/2017	0.048	0.645	0.027
NNAR (1,2)	01/01/2012~25/06/2017	0.048	0.640	0.027
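The sketch below shows how in-sample metrics of this kind can be obtained with the forecast package. It fits ARIMA(4,1,0) and NNAR(2,1) to a simulated series, so the values differ from Table 2, and the actual fitting code lives in `model_executor.R`, which is not shown here.

```r
library(forecast)

set.seed(42)
# Simulated positive random walk standing in for the log price (assumption).
y <- ts(3 + cumsum(rnorm(300, sd = 0.05)))

fit_arima <- Arima(y, order = c(4, 1, 0))   # ARIMA(4,1,0)
fit_nnar  <- nnetar(y, p = 2, size = 1)     # NNAR(2,1): 2 lags, 1 hidden node

# accuracy() returns training-set error measures for each fitted model.
acc <- rbind(
  ARIMA = accuracy(fit_arima)[1, c("RMSE", "MAPE", "MAE")],
  NNAR  = accuracy(fit_nnar)[1, c("RMSE", "MAPE", "MAE")]
)
print(round(acc, 3))
```

Since `nnetar()` starts from random weights, fixing the seed is needed for reproducible NNAR results.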

(a) Actual and forecasted Bitcoin price (training sample: 500 days, test sample: 1966 days)

(b) Concentrated view of the forecast period (test sample: 1966 days)

(c) Actual and forecasted Bitcoin price (training sample: 2000 days, test sample: 466 days)

(d) Concentrated view of the forecast period (test sample: 466 days)

Table 3. Test-sample static forecast performance.

First test sample

First test-sample window (1966 days)

Forecast without re-estimation at each step

Forecast_Model	Test_Sample	RMSE	MAPE	MAE
ARIMA (4,1,0)	15/05/2013~04/10/2018	0.373	2.924	0.230
NNAR (2,1)	15/05/2013~04/10/2018	0.042	0.357	0.024

Forecast with re-estimation at each step

Forecast_Model	Test_Sample	RMSE	MAPE	MAE
ARIMA	15/05/2013~04/10/2018	0.312	2.668	0.205
NNAR	15/05/2013~04/10/2018	0.050	0.425	0.029

Second test sample

Second test-sample window (466 days)

Forecast without re-estimation at each step

Forecast_Model	Test_Sample	RMSE	MAPE	MAE
ARIMA (4,1,1)	26/06/2017~04/10/2018	0.026	0.098	0.009
NNAR (1,2)	26/06/2017~04/10/2018	0.022	0.078	0.007

Forecast with re-estimation at each step

Forecast_Model	Test_Sample	RMSE	MAPE	MAE
ARIMA (4,1,1)	26/06/2017~04/10/2018	0.026	0.097	0.009
NNAR (1,2)	26/06/2017~04/10/2018	0.031	0.106	0.009
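To make the two static forecast schemes concrete, here is a hedged sketch for an ARIMA model on simulated data. The series, the sample sizes, and the order (1,1,0) are illustrative assumptions; the report's own implementation is in `model_executor.R`. In both schemes each forecast is one step ahead and uses the actual observations up to the previous day.

```r
library(forecast)

set.seed(7)
y <- 3 + cumsum(rnorm(120, sd = 0.05))   # simulated log price (assumption)
n_train <- 100
n_test  <- length(y) - n_train
ord <- c(1, 1, 0)                        # illustrative order

fit <- Arima(y[1:n_train], order = ord)

# (a) Without re-estimation: apply the training-sample coefficients to the
# full series; the fitted values are one-step-ahead forecasts.
fixed    <- Arima(y, model = fit)
fc_fixed <- as.numeric(fitted(fixed))[(n_train + 1):length(y)]

# (b) With re-estimation: refit on all data up to t-1, then forecast day t.
fc_reest <- numeric(n_test)
for (i in seq_len(n_test)) {
  refit <- Arima(y[1:(n_train + i - 1)], order = ord)
  fc_reest[i] <- as.numeric(forecast(refit, h = 1)$mean[1])
}

actual <- y[(n_train + 1):length(y)]
rmse <- function(a, f) sqrt(mean((a - f)^2))
print(c(fixed = rmse(actual, fc_fixed), reestimated = rmse(actual, fc_reest)))
```

Scheme (b) refits the model once per test day, so its cost grows with the length of the test sample.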

Table 4. DM test of forecast results.

First test-sample window (1966 days)

Models_Compared	DM_Statistics	p_Value
ARIMA vs. NNAR (re-estimation)	-37.724	3.062208e-246
ARIMA vs. NNAR (without re-estimation)	-34.225	2.223566e-210
ARIMA (re-estimation) vs. ARIMA (without re-estimation)	18.317	2.281731e-70
NNAR (re-estimation) vs. NNAR (without re-estimation)	-18.115	5.935986e-69

Second test-sample window (466 days)

Models_Compared	DM_Statistics	p_Value
ARIMA vs. NNAR (re-estimation)	1.036	3.004223e-01
ARIMA vs. NNAR (without re-estimation)	-19.023	2.136618e-75
ARIMA (re-estimation) vs. ARIMA (without re-estimation)	6.177	7.611747e-10
NNAR (re-estimation) vs. NNAR (without re-estimation)	-13.003	1.943571e-37

p < 0.05 indicates that the forecast results of the first method are better than those of the second method.
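For reference, a Diebold-Mariano comparison can be sketched with `dm.test()` from the forecast package. The error vectors below are simulated, and whether the report used a one-sided or two-sided alternative is not stated, so both are shown.

```r
library(forecast)

set.seed(123)
e1 <- rnorm(200, sd = 0.02)   # forecast errors of method 1 (simulated, small)
e2 <- rnorm(200, sd = 0.20)   # forecast errors of method 2 (simulated, large)

# Two-sided: do the two methods differ in squared-error loss?
dm_two <- dm.test(e1, e2, alternative = "two.sided", h = 1, power = 2)

# One-sided: the alternative is that method 2 is less accurate than method 1.
dm_less <- dm.test(e1, e2, alternative = "less", h = 1, power = 2)

print(c(statistic = unname(dm_two$statistic),
        p_two     = dm_two$p.value,
        p_less    = dm_less$p.value))
```

In this function the loss differential is the loss of method 1 minus the loss of method 2, so a negative statistic points to smaller losses for the first method; a two-sided p-value on its own only says that the two methods differ.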

Ljung-Box testing for the ARIMA models used


    Box-Pierce test

data:  et410
X-squared = 27.863, df = 4, p-value = 0.00001329


    Box-Pierce test

data:  et411
X-squared = 27.005, df = 3, p-value = 0.000005873
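The printed outputs are labelled "Box-Pierce test", which is what `stats::Box.test()` reports under its default `type`; the Ljung-Box variant has to be requested explicitly. The sketch below shows both on simulated residuals. The lag of 8 and `fitdf = 4` are assumptions chosen so that the degrees of freedom equal 4, as in the first output; the settings actually used are not shown in the report.

```r
set.seed(99)
res <- rnorm(500)   # stand-in for model residuals (assumption)

# Degrees of freedom = lag - fitdf, where fitdf is the number of fitted
# ARMA coefficients (4 for an ARIMA(4,1,0)).
bp <- Box.test(res, lag = 8, type = "Box-Pierce", fitdf = 4)
lb <- Box.test(res, lag = 8, type = "Ljung-Box",  fitdf = 4)

print(bp$method)
print(lb$method)
print(c(df = unname(bp$parameter), p_bp = bp$p.value, p_lb = lb$p.value))
```

A small p-value rejects the hypothesis of uncorrelated residuals, which suggests that the fitted model leaves some serial dependence unexplained.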

Conclusion

Proposed improved solution for the 500-day training sample: ARIMA (6,1,1)


    Box-Pierce test

data:  et611
X-squared = 5.5026, df = 3, p-value = 0.1385

                      ME       RMSE        MAE       MPE     MAPE     MASE
Training set 0.006948153 0.06167435 0.03395994 0.2132751 1.364582 1.020184
                    ACF1
Training set -0.02906356

Proposed improved solution for the 2000-day training sample: ARIMA (5,1,1)


    Box-Pierce test

data:  et510
X-squared = 3.942, df = 2, p-value = 0.1393

                      ME       RMSE        MAE        MPE      MAPE      MASE
Training set 0.002875649 0.04777167 0.02727576 0.06537294 0.6468885 0.9993979
                     ACF1
Training set -0.007808208