Introduction

Bitcoin is a type of cryptocurrency, i.e. it is a digital currency which uses encryption techniques to generate units of the currency and verify the transfer of funds. Bitcoin is a decentralised currency, which operates independently of a central bank (Wikipedia 2018). An estimated 2.9 to 5.8 million unique users have a cryptocurrency wallet, of which most use bitcoin. The price of bitcoin has gone through various cycles of appreciation and depreciation, known as bubbles and bursts, with price fluctuations up to a magnitude of a few thousand USD in the space of a day, so that the currency has become renown for its volatility.

$\label{fig:bitcoin_graphics}Bitcoin Illustration$

Bitcoin Illustration

The bitcoin historical price data gathered from the CoinMarketCap. This time series will be modelled using regression, ARIMA and GARCH methods. The report details:

Description of the time series
Model specification
Model fitting and selection
Diagnostic checking
Predict the value of bitcoin for the following 10 days

Initial Diagnosis

# Import Libraries
library(TSA)
library(fUnitRoots)
library(forecast)
library(CombMSC)
library(lmtest)
library(fGarch)
library(rugarch)
library(zoo)
library(ggplot2)
require(readr)
require(FitAR)
require(knitr)
library(tseries)

Bitcoin <- read.csv("../data/Bitcoin_Historical_Price.csv", header=TRUE)
Bitcoin$Date = as.Date(Bitcoin$Date,'%Y-%m-%d')
Bitcoin.zoo <- zoo(Bitcoin$Close, Bitcoin$Date)
Bitcoin.raw = Bitcoin.zoo

Data is converted to time series object using zoo library. Figure shows the daily closing price of bitcoin from the 27th Apr 2013 to the 3rd Mar 2018, given in USD. Main characteristics of the time series:

Change in trend, at ~ 2017
Change in variance, large spikes in price at the end of the time series
Auto regressive behavior

A flat trend is observed from the start of the time series to early 2017. Next we'll filter series to remove the initial flat trend.

autoplot.zoo(Bitcoin.zoo) +
  ylab('Closing Price (USD)') + 
  xlab('Time (days)') +
  ggtitle("Time Series Plot for Daily Bitcoin Prices")

$\label{fig:bitcoin_orig}Time Series of Daily Bitcoin Prices$

Time Series of Daily Bitcoin Prices

Subset Series

Figure shows The flat part of the time series is removed, to better model the later part of the time series which shows a change in bitcoin price. The plot shows the daily closing price of bitcoin from the 01-Apr-2017 to the 03-Mar-2018, given in USD. Features are similar to original series

Bitcoin.2017 = Bitcoin[Bitcoin$Date > as.Date("2017-04-01"),]
Bitcoin.2017.zoo = zoo(Bitcoin.2017$Close, Bitcoin.2017$Date)
autoplot(Bitcoin.2017.zoo) +
  geom_point(size=.5) +
  ylab('Closing Price (USD)') + 
  xlab('Time (days)') +
  ggtitle("Time Series Plot for Daily Bitcoin Prices (2017-2018)")

$\label{fig:bitcoin_subset}Subset Time Series of Daily Bitcoin Prices$

Subset Time Series of Daily Bitcoin Prices

Scatter Plot and correlation

Figure shows the scatter plot of the relationship between consecutive points, to determine the presence of auto regressive behaviour. Correlation value of 0.9935 was calculated.

This is indicative of a strong positive correlation between neighboring Bitcoin values, and in turn the presence of auto correlation.

ggplot(Bitcoin.2017,aes(zlag(Close), Close)) + 
  geom_point() +
  ylab('Current Closing Price (USD)') + 
  xlab('Previous Day Closing Price (USD)') +
  ggtitle("Scatter Plot of Bitcoin Daily Closing Prices")

$\label{fig:bitcoin_scatter}Scatter Plot of Bitcoin Daily Closing Prices$

Scatter Plot of Bitcoin Daily Closing Prices

y = as.vector(Bitcoin.2017.zoo)
x = zlag(Bitcoin.2017.zoo)     
index = 2:length(x)      
cor(y[index],x[index])

## [1] 0.9935557

Regression Models

Here we'll try fitting linear and quadratic model in bitcoin closing prices. Data has no seasonality so harmonic model is not applicable.

Linear Model

model.ln = lm(Bitcoin.2017.zoo~time(Bitcoin.2017.zoo))
summary(model.ln)

## 
## Call:
## lm(formula = Bitcoin.2017.zoo ~ time(Bitcoin.2017.zoo))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4954.5 -1579.6  -668.9   881.2  9660.6 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -7.021e+05  2.461e+04  -28.53   <2e-16 ***
## time(Bitcoin.2017.zoo)  4.065e+01  1.412e+00   28.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2511 on 334 degrees of freedom
## Multiple R-squared:  0.7127, Adjusted R-squared:  0.7119 
## F-statistic: 828.6 on 1 and 334 DF,  p-value: < 2.2e-16

Figure shows the plots give the regression models for the linear are statistically significant, with the same p-value < 2.2e-16 and R-squared values 0.7119.

ggplot(Bitcoin.2017,aes(Date,Close))+ 
  geom_line()  +
  ylab('Closing Price (USD)') + 
  xlab('') +
  ggtitle('Linear Fitted Model - Bitcoin Prices') +
  geom_line(aes(y=fitted(model.ln)),color='#fc5e13')

$\label{fig:bitcoin_linear}Bitcoin - Linear Fitted Model$

Bitcoin - Linear Fitted Model

Residual Analysis - Linear Model

Below are the findings of residuals from linear model:

Figure shows the time series plot, ACF, histogram of the standardized residuals for the linear model of the Bitcoin time series
The standardized residuals show large deviations from the mean 0 in the (time series) plot, and an asymmetric distribution in the histogram, suggesting large errors. The QQ-plot in figure also shows points diverging away from the straight line at either tail, indicating the residuals are not normally distributed. The ACF shows a slow decaying pattern with many significant lags, suggestive of auto correlation.
The Shapiro-Wilk test of normality returned a p-value of 1.204e-15, so the NULL hypothesis is rejected, again suggesting that the residuals are not derived from a normally distributed population.

The linear model does not pass the diagnostic checks, thus the linear model does not capture all the information in the time series and is not suitable for forecasting.

residual_analysis_qq <- function(myresiduals, title = 'QQ Plot of Residuals') {
data=as.data.frame(qqnorm( myresiduals , plot=F))
ggplot(data,aes(x,y)) + 
  geom_point() + 
  geom_smooth(method="lm", se=FALSE, color='#e36209', size=.4)+
  xlab('Theoretical') +
  ylab('Sample') +
  ggtitle(title)
}
checkresiduals(model.ln)

$\label{fig:res_lin}Residual Analysis Linear fitted Model$

Residual Analysis Linear fitted Model

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 321.71, df = 10, p-value < 2.2e-16

residual_analysis_qq(residuals(model.ln))

$\label{fig:lin_qq}QQ plot Linear Model$

QQ plot Linear Model

shapiro.test(as.vector(residuals(model.ln)))

## 
##  Shapiro-Wilk normality test
## 
## data:  as.vector(residuals(model.ln))
## W = 0.87841, p-value = 1.204e-15

Quadratic Model

t = as.vector(time(Bitcoin.2017.zoo))
t2 = t^2
model.qa = lm(Bitcoin.2017.zoo~ t + t2) # label the quadratic trend model as model.qa
summary(model.qa)

## 
## Call:
## lm(formula = Bitcoin.2017.zoo ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5490.1 -1286.7  -408.4   497.0  9733.1 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  1.504e+07  4.874e+06   3.085  0.00221 **
## t           -1.766e+03  5.594e+02  -3.156  0.00174 **
## t2           5.183e-02  1.605e-02   3.229  0.00137 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2476 on 333 degrees of freedom
## Multiple R-squared:  0.7214, Adjusted R-squared:  0.7198 
## F-statistic: 431.2 on 2 and 333 DF,  p-value: < 2.2e-16

Figure shows the plots give the regression models for the quadratic are statistically significant, with the same p-value of 2.2e-16 and R-squared values; 0.7198.

ggplot(Bitcoin.2017,aes(Date,Close))+ 
  geom_line()  +
  ylab('Closing Price (USD)') + 
  xlab('') +
  ggtitle('Quadratic fitted Model Curve - Bitcoin Daily Prices') +
  geom_line(aes(y=fitted(model.qa)),color='#fc5e13')

$\label{fig:bitcoin_subset6}Quadratic fitted Model Curve$

Quadratic fitted Model Curve

Residual Analysis - Quadratic Model

Below are the findings of residuals from Quadratic model:

Figure shows the time series plot, ACF, and histogram of the standardized residuals for the quadratic Model
The standardized residuals show large deviations from the mean 0 in the (time series) plot, and an asymmetric distribution in the histogram, suggesting large errors. The QQ-plot in figure shows points diverging away from the straight line at either tail, indicating the residuals are not normally distributed. The ACF shows a slow decaying pattern with many significant lags, suggestive of auto correlation.
The Shapiro-Wilk test of normality (not shown) returned a p-value of 2.2e-16, so the NULL hypothesis is rejected, again suggesting that the residuals are not derived from a normally distributed population.

The quadratic model does not pass the diagnostic checks, thus the quadratic model does not capture all the information in the time series and is not suitable for forecasting.

checkresiduals(model.qa)

$\label{fig:res_quad}Residual Analysis Quadratic fitted Model$

Residual Analysis Quadratic fitted Model

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 321.7, df = 10, p-value < 2.2e-16

residual_analysis_qq(residuals(model.qa))

$\label{fig:quad_qq}QQ plot quadratic Model$

QQ plot quadratic Model

shapiro.test(as.vector(residuals(model.qa)))

## 
##  Shapiro-Wilk normality test
## 
## data:  as.vector(residuals(model.qa))
## W = 0.86085, p-value < 2.2e-16

ARIMA Models

ARIMA Initial Diagnosis

We'll analyze ACF and PACF to identify AR and MA orders. Below are the results:

Figure shows points in the time series, with a trend and change in variance.
The ACF plot shows a slowly decaying pattern with many significant lags. There is no indication of a wave/sine pattern, so that a seasonal component is not determined. Trend suggests non stationarity.
PACF plot show a large 1st significant lag, with 4 smaller significant/near significant lags.

The time series needs to be stabilized and de-trended prior to specifying a set of possible ARIMA models.

ggtsdisplay(Bitcoin.2017.zoo,
            main = 'ACF and PACF of Bitcoin Prices', 
            ylab='Closing Price (USD)')

$\label{fig:acf_pacf}ACF and PACF of Bitcoin Prices$

ACF and PACF of Bitcoin Prices

Transformation

Figure shows calculation of the log-likelihood for each lambda value, using the yule-walker method gave a lambda value 0.1-0.2. log transformation is not the best transformation according to boxcox plot.

Bitcoin.transform = BoxCox.ar(Bitcoin.2017.zoo, method = 'yule-walker')

$\label{fig:boxcox}BoxCox Transformation$

BoxCox Transformation

BoxCox transformation was applied to stabilize the time series. First differencing was applied to de-trend the time series. Figure transformation and differencing.

lambda = sum(Bitcoin.transform$ci)/length(Bitcoin.transform$ci)
Bitcoin.boxcox = (Bitcoin.2017.zoo^lambda - 1) / lambda 
Bitcoin.diff = base::diff(Bitcoin.boxcox, differences = 1)
autoplot(Bitcoin.diff) +
  ylab('Closing Price (USD)') + 
  ggtitle('Boxcox Transformed & First Differenced Series')

$\label{fig:diff_series}Boxcox Transformed & First Differenced Series$

Boxcox Transformed & First Differenced Series

Diagnosis - Transformed Series

Figure shows the results of transformed and differenced series.

The ACF plot shows 2 significant lags, i.e. q=2.
PACF plot shows 5 significant lags, however the 5th is ~lag 80, so not included, i.e. p=4, but the adjacent value p=3 may also be considered.
Dickey-Fuller Unit-Root test gave a p-value of 0.01, thus the NULL hypothesis is rejected, and in turn the time series is determined to be stationary.

ARIMA(4,1,2), ARMIA(3,1,2)} are included in the set of possible models

ggtsdisplay(Bitcoin.diff, lag.max = 96,ci.type='ma',
            main = 'Boxcox Transformed & First Differenced ACF and PACF plots', 
            ylab='')

$\label{fig:diagnosis_diff}Boxcox Transformed & First Differenced ACF and PACF plots$

Boxcox Transformed & First Differenced ACF and PACF plots

adf.test(Bitcoin.diff)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Bitcoin.diff
## Dickey-Fuller = -6.968, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary

Extended Autocorrelation Function

The eacf matrix shown below, identified possible smaller ARIMA models with a p=1, 2 and q=1, 2.

eacf(Bitcoin.diff)

## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o o o o o o o o o o  o  o  o 
## 1 x o o o o o o o o o o  o  o  o 
## 2 x o o o o o o o o o o  o  o  o 
## 3 x o x o o o o o o o o  o  o  o 
## 4 x x x o o o o o o o o  o  o  o 
## 5 o x o x o o o o o o o  o  o  o 
## 6 o x o o o o o o o o o  o  o  o 
## 7 x x x o x x o o o o o  o  o  o

ARMA Subsets

Figure shows the BIC plot, identified possible larger ARIMA models with a p=4, 5 and q=4. A p=10 was disregarded with smaller values identified.

Thus {ARIMA(1,1,1), ARMIA(1,1,2), ARIMA(2,1,1), ARIMA(2,1,2), ARIMA(4,1,4), ARIMA(5,1,4)} are added to the set of possible models.

res1 = armasubsets(y=Bitcoin.diff,nar=14,nma=14,y.name='test',ar.method='mle')
plot(res1)

$\label{fig:arma_subset}BIC Table$

BIC Table

Model Fitting & Significance Tests

From the decided set of ARIMA models, we'll perform coefficient significance tests using maximum likelihood and conditional sum-of-squares methods.

# ARIMA(1,1,1)
model_111_css = arima(Bitcoin.boxcox, order=c(1,1,1),method='CSS')
coeftest(model_111_css)

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)
## ar1 0.021830         NA      NA       NA
## ma1 0.022332         NA      NA       NA

model_111_ml = arima(Bitcoin.boxcox, order=c(1,1,1),method='ML')
coeftest(model_111_ml)

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)
## ar1 0.020106         NA      NA       NA
## ma1 0.024617         NA      NA       NA

# ARIMA(1,1,2)
model_112_css = arima(Bitcoin.boxcox,order=c(1,1,2),method='CSS')
coeftest(model_112_css)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)   
## ar1 -0.698636   0.258350 -2.7042 0.006846 **
## ma1  0.749424   0.260011  2.8823 0.003948 **
## ma2 -0.012098   0.059763 -0.2024 0.839579   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_112_ml = arima(Bitcoin.boxcox,order=c(1,1,2),method='ML')
coeftest(model_112_ml)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)   
## ar1 -0.713575   0.268293 -2.6597 0.007821 **
## ma1  0.764360   0.269563  2.8356 0.004575 **
## ma2 -0.010088   0.061085 -0.1651 0.868831   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# ARIMA(2,1,1)
model_211_css = arima(Bitcoin.boxcox,order=c(2,1,1),method='CSS')
coeftest(model_211_css)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.729981   0.236531 -3.0862 0.0020274 ** 
## ar2 -0.010729   0.062640 -0.1713 0.8640056    
## ma1  0.779953   0.230417  3.3850 0.0007119 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_211_ml = arima(Bitcoin.boxcox,order=c(2,1,1),method='ML')
coeftest(model_211_ml)

## 
## z test of coefficients:
## 
##       Estimate Std. Error z value Pr(>|z|)
## ar1  0.0091928         NA      NA       NA
## ar2 -0.0035792  0.0541984  -0.066   0.9473
## ma1  0.0335949         NA      NA       NA

# ARIMA(2,1,2)
model_212_css = arima(Bitcoin.boxcox,order=c(2,1,2),method='CSS')
coeftest(model_212_css)

## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value Pr(>|z|)    
## ar1 -0.028545   0.079418  -0.3594   0.7193    
## ar2  0.906965   0.075874  11.9535   <2e-16 ***
## ma1  0.085822   0.084577   1.0147   0.3102    
## ma2 -0.913644   0.083827 -10.8991   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_212_ml = arima(Bitcoin.boxcox,order=c(2,1,2),method='ML')
coeftest(model_212_ml)

## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value Pr(>|z|)    
## ar1  0.0051321  0.0690326   0.0743   0.9407    
## ar2  0.9315260  0.0654587  14.2307   <2e-16 ***
## ma1  0.0502674  0.0857493   0.5862   0.5577    
## ma2 -0.9495800  0.0856749 -11.0835   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# ARIMA(3,1,2)
model_312_css = arima(Bitcoin.boxcox,order=c(3,1,2),method='CSS')
coeftest(model_312_css)

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)   
## ar1 -0.96048    0.32229 -2.9802 0.002881 **
## ar2 -0.39084    0.21351 -1.8305 0.067170 . 
## ar3  0.10504    0.06089  1.7250 0.084524 . 
## ma1  1.01485    0.32371  3.1351 0.001718 **
## ma2  0.45021    0.21280  2.1157 0.034370 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_312_ml = arima(Bitcoin.boxcox,order=c(3,1,2),method='ML')
coeftest(model_312_ml)

## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1 -1.612866   0.084119 -19.1737 < 2.2e-16 ***
## ar2 -0.854203   0.108453  -7.8762 3.374e-15 ***
## ar3  0.048060   0.058338   0.8238      0.41    
## ma1  1.679184   0.064867  25.8866 < 2.2e-16 ***
## ma2  0.925916   0.065388  14.1603 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# ARIMA(4,1,2)
model_412_css = arima(Bitcoin.boxcox,order=c(4,1,2),method='CSS')
coeftest(model_412_css)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)  
## ar1 -0.737242   0.415004 -1.7765  0.07566 .
## ar2 -0.499844   0.577987 -0.8648  0.38715  
## ar3  0.056956   0.071481  0.7968  0.42556  
## ar4 -0.080260   0.070306 -1.1416  0.25363  
## ma1  0.793473   0.414413  1.9147  0.05553 .
## ma2  0.540068   0.603325  0.8952  0.37070  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_412_ml = arima(Bitcoin.boxcox,order=c(4,1,2),method='ML')
coeftest(model_412_ml)

## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value Pr(>|z|)    
## ar1 -1.109588   0.055297 -20.0661   <2e-16 ***
## ar2 -0.902688   0.082158 -10.9872   <2e-16 ***
## ar3  0.059431   0.082539   0.7200   0.4715    
## ar4 -0.029706   0.055097  -0.5392   0.5898    
## ma1  1.185968   0.015665  75.7078   <2e-16 ***
## ma2  0.997414   0.023943  41.6570   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# ARIMA(4,1,4)
model_414_css = arima(Bitcoin.boxcox,order=c(4,1,4),method='CSS')
coeftest(model_414_css)

## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1 -0.523664         NA       NA        NA    
## ar2  0.162762   0.041831   3.8910 9.985e-05 ***
## ar3  0.917653   0.074325  12.3465 < 2.2e-16 ***
## ar4  0.415562         NA       NA        NA    
## ma1  0.552550         NA       NA        NA    
## ma2 -0.181660   0.040086  -4.5317 5.850e-06 ***
## ma3 -0.928353   0.071005 -13.0744 < 2.2e-16 ***
## ma4 -0.532390         NA       NA        NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_414_ml = arima(Bitcoin.boxcox,order=c(4,1,4),method='ML')
coeftest(model_414_ml)

## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1 -0.857298   0.256941  -3.3366 0.0008482 ***
## ar2  0.075458   0.057469   1.3130 0.1891720    
## ar3  1.120648   0.060162  18.6273 < 2.2e-16 ***
## ar4  0.637578   0.254442   2.5058 0.0122178 *  
## ma1  0.922555   0.230314   4.0056 6.185e-05 ***
## ma2 -0.031454   0.060127  -0.5231 0.6008852    
## ma3 -1.116176   0.056809 -19.6479 < 2.2e-16 ***
## ma4 -0.717939   0.228129  -3.1471 0.0016492 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# ARIMA(5,1,4)
model_514_css = arima(Bitcoin.boxcox,order=c(5,1,4),method='CSS')
coeftest(model_514_css)

## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1  0.0994981  0.0424432   2.3443 0.0190647 *  
## ar2  0.1381344  0.0384323   3.5942 0.0003254 ***
## ar3  0.5069664  0.0030855 164.3061 < 2.2e-16 ***
## ar4  0.1650015  0.0261166   6.3179 2.652e-10 ***
## ar5  0.0902766  0.0577983   1.5619 0.1183062    
## ma1 -0.0794650         NA       NA        NA    
## ma2 -0.1856691  0.0430209  -4.3158 1.590e-05 ***
## ma3 -0.5132720  0.0420131 -12.2169 < 2.2e-16 ***
## ma4 -0.3071109         NA       NA        NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_514_ml = arima(Bitcoin.boxcox,order=c(5,1,4),method='ML')
coeftest(model_514_ml)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)
## ar1  0.216615   0.582131  0.3721   0.7098
## ar2  0.125234   0.255391  0.4904   0.6239
## ar3  0.472675   0.520008  0.9090   0.3634
## ar4 -0.054061   0.425997 -0.1269   0.8990
## ar5  0.109269   0.078409  1.3936   0.1634
## ma1 -0.166104   0.585688 -0.2836   0.7767
## ma2 -0.144673   0.251820 -0.5745   0.5656
## ma3 -0.438177   0.523688 -0.8367   0.4028
## ma4 -0.056627   0.413964 -0.1368   0.8912

AIC and BIC Values

From the AIC table , BIC table and coefficient significance tests, we selected ARIMA (3,1,2) model. Next we'll perform diagnostic check of selected model

source('sort.score.r')
aic.scores <- sort.score(stats::AIC(model_111_ml,model_112_ml,model_211_ml,model_212_ml,
                                   model_312_ml,model_412_ml,model_414_ml,model_514_ml), 
                        score = "aic")
kable(x=aic.scores, caption="\\label{tab:aic}AIC sorted models") %>%
  kable_styling(latex_options = "striped")

AIC sorted models
	df	AIC
model_412_ml	7	-103.15254
model_312_ml	6	-102.82566
model_212_ml	5	-102.63500
model_414_ml	9	-101.76633
model_111_ml	3	-100.97940
model_112_ml	4	-100.65054
model_211_ml	4	-98.98447
model_514_ml	10	-95.11157

bic.scores = sort.score(stats::BIC(model_111_ml,model_112_ml,model_211_ml,model_212_ml,
                                   model_312_ml,model_412_ml,model_414_ml,model_514_ml), 
                        score = "bic" )
kable(bic.scores, caption="\\label{tab:bic}BIC sorted models") %>%
  kable_styling(latex_options = "striped")

BIC sorted models
	df	BIC
model_111_ml	3	-89.53701
model_112_ml	4	-85.39402
model_211_ml	4	-83.72795
model_212_ml	5	-83.56435
model_312_ml	6	-79.94088
model_412_ml	7	-76.45362
model_414_ml	9	-67.43915
model_514_ml	10	-56.97026

Residual Analysis - ARIMA Model

Below are the findings of residuals from ARIMA(3,1,2) model:

From the time series plot Figure of standard residuals, we can see that residuals are randomly distributed with few spiles, around zero mean level.
Histogram of studentised residuals within [-2,2] range are not distributed so well. Still it can be considered.
ACF plot has 2 significant lags at 19 and 20.
qq plot in figure shows deviation between sample and theoretical values.
Shapiro-Wilk normality test fails to reject null hypothesis. Studentised residuals are coming from normally distributed population
Ljung box test (Figure ) of auto correlation for all lags fails to reject null hypothesis. Data point correlations result from randomness in the sampling process i.e. data are independently distributed

From the overall results, we can make forecasting using ARIMA(3,1,2) model.

fit <- Arima(Bitcoin.2017.zoo, order=c(3,1,2), lambda = lambda)
summary(fit)

## Series: Bitcoin.2017.zoo 
## ARIMA(3,1,2) 
## Box Cox transformation: lambda= 0.15 
## 
## Coefficients:
##           ar1      ar2     ar3     ma1     ma2
##       -1.6127  -0.8540  0.0479  1.6791  0.9257
## s.e.   0.0843   0.1085  0.0583  0.0651  0.0654
## 
## sigma^2 estimated as 0.04215:  log likelihood=57.41
## AIC=-102.83   AICc=-102.57   BIC=-79.94
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 28.56452 517.2228 289.9566 0.5060551 3.975302 0.9891755 0.06609031

checkresiduals(fit)

$\label{fig:arima_res}Residual Analysis Quadratic fitted Model$

Residual Analysis Quadratic fitted Model

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,1,2)
## Q* = 4.8476, df = 5, p-value = 0.4348
## 
## Model df: 5.   Total lags used: 10

residual_analysis_qq(residuals(fit))

$\label{fig:arima_qq}Residual Analysis Linear fitted Model$

Residual Analysis Linear fitted Model

shapiro.test(as.vector(residuals(fit)))

## 
##  Shapiro-Wilk normality test
## 
## data:  as.vector(residuals(fit))
## W = 0.96352, p-value = 1.919e-07

x = residuals(fit)
k=0
LBQPlot(x, lag.max = length(x)-1 , StartLag = k + 1, k = 0, SquaredQ = FALSE)
grid()

$\label{fig:boxtest_arima}Ljung-Box Test$

Ljung-Box Test

ARIMA Forecasting

Figure shows the original time series for the historical price of bitcoin. A prediction of the following 10 days is given with the ARIMA(3,1,2) model.

The forecast shows a steady bitcoin price, with a possible increase or decrease within the range of the confidence interval (80 and 95%) bounds.

autoplot(forecast(fit,h=10))

$\label{fig:arima_forecast}Forecast from ARIMA(3,1,2)$

Forecast from ARIMA(3,1,2)

MASE Error

The Model has mean absolute squared error of 0.9891814, while the forecast has error of 3.232775. Error is 3 times the original model. Here we notice that ARIMA model is not suitable for bitcoin prices forecasting. It has not able to capture variation in series. We will try fitting GARCH model for changing variance.

source('MASE.r')

Bitcoin.forecast <- read_csv("../data/Bitcoin_Prices_Forecasts.csv")
Bitcoin.forecast$Date = as.Date(Bitcoin.forecast$Date,'%d/%m/%y')

MASE(Bitcoin.forecast$`Closing price`,
     as.vector(tail(fitted(forecast(fit,h=10)),10)))

## $MASE
##       MASE
## 1 3.232567

Heteroscedasticity Models

Figure shows McLeod-Li test is significnat at 5% level of significance for all lags. This gives a strong idea about existence of volatiliy clustering. Figure of QQ plot with fat tails is in accordance with volatiliy clustering

McLeod.Li.test(y=Bitcoin.2017.zoo,main="McLeod-Li Test Statistics for Bitcoin")

$\label{fig:mcleod}McLeod Li Test Statistics for Bitcoin$

McLeod Li Test Statistics for Bitcoin

residual_analysis_qq(Bitcoin.2017.zoo, 'QQ Plot')

$\label{fig:qq_hetro}QQ plot for Bitcoin$

QQ plot for Bitcoin

We will see ACF and PACF of Absolute and squared series to see ARCH effect.

GARCH tests

Figure shows an absolute ACF and FACF plots. We observe many significant lags in both ACF and PACF. EACF do not suggest an ARMA(0,0)

We can identify ARMA(1,0) model for absolute value series. Proposed GARCH models are GARCH(1,0). Here we are estimating with GARCH(p,q) orders, where p is for AR and q for MA.

abs.bitcoin = abs(Bitcoin.2017.zoo)
sq.bitcoin = Bitcoin.2017.zoo^2

par(mfrow=c(1,2))
acf(abs.bitcoin, ci.type="ma",main="ACF plot absolute series")
pacf(abs.bitcoin, main="PACF plot absolute series")

$\label{fig:abs_garch}ACF & PACF plot for absolute bitcoin series$

ACF & PACF plot for absolute bitcoin series

eacf(abs.bitcoin)

## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x x x x x x x x x x  x  x  x 
## 1 o o o o x o o o o x o  o  x  o 
## 2 x o o o o x o o o x o  o  o  o 
## 3 o x o o o o o o o x o  o  o  o 
## 4 o x x o o o o o o x o  o  o  o 
## 5 x x x o o o o o o x o  o  o  o 
## 6 x x x o o o o o o x o  o  o  o 
## 7 x x x o o o o o o o o  o  o  o

Figure shows an squared ACF and FACF plots. We observe many significant lags in both plots. EACF Matric do not suggest an ARMA(0,0)

We can identify ARMA(1,1) model for absolute value series. Proposed GARCH models are GARCH(1,1).

par(mfrow=c(1,2))
acf(sq.bitcoin, ci.type="ma",main="ACF plot squared series")
pacf(sq.bitcoin, main="PACF plot squared series")

$\label{fig:sq_garch}ACF & PACF plot for Squared bitcoin series$

ACF & PACF plot for Squared bitcoin series

eacf(sq.bitcoin)

## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x x x x x x x x x x  x  x  x 
## 1 x o o o x x o o x x o  x  x  x 
## 2 x x o o o x x o o x o  o  o  x 
## 3 x x o o o x o o o x o  o  o  o 
## 4 x o x o o x o x o x o  o  o  o 
## 5 x x o x o o o o o x o  o  o  o 
## 6 x x o x o o o x o x o  o  o  o 
## 7 o x x x x x x o o o o  o  o  o

GARCH Models

From the predicted tentative models, we'll fit GARCH and check the parameters significance.

m.10 = garchFit(formula = ~garch(1,0), data =Bitcoin.diff, trace = FALSE, cond.dist = "QMLE" )
summary(m.10)

## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(1, 0), data = Bitcoin.diff, cond.dist = "QMLE", 
##     trace = FALSE) 
## 
## Mean and Variance Equation:
##  data ~ garch(1, 0)
## <environment: 0x000000002817ed98>
##  [data = Bitcoin.diff]
## 
## Conditional Distribution:
##  QMLE 
## 
## Coefficient(s):
##       mu     omega    alpha1  
## 0.028347  0.034290  0.168884  
## 
## Std. Errors:
##  robust 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu      0.028347    0.010885    2.604  0.00921 ** 
## omega   0.034290    0.004811    7.127 1.02e-12 ***
## alpha1  0.168884    0.079702    2.119  0.03410 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  64.48749    normalized:  0.1925 
## 
## Description:
##  Fri Sep 18 23:23:24 2020 by user: Napat 
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  77.42719  0           
##  Shapiro-Wilk Test  R    W      0.9651639 3.502547e-07
##  Ljung-Box Test     R    Q(10)  10.64782  0.3856055   
##  Ljung-Box Test     R    Q(15)  15.20767  0.4365648   
##  Ljung-Box Test     R    Q(20)  35.80253  0.01622367  
##  Ljung-Box Test     R^2  Q(10)  31.42701  0.0004986272
##  Ljung-Box Test     R^2  Q(15)  39.29263  0.0005793926
##  Ljung-Box Test     R^2  Q(20)  68.45095  3.253958e-07
##  LM Arch Test       R    TR^2   21.76202  0.0402752   
## 
## Information Criterion Statistics:
##        AIC        BIC        SIC       HQIC 
## -0.3670895 -0.3329331 -0.3672480 -0.3534724

m.11 = garchFit(formula = ~garch(1,1), data =Bitcoin.diff, trace = FALSE, cond.dist = "QMLE" )
summary(m.11)

## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~garch(1, 1), data = Bitcoin.diff, cond.dist = "QMLE", 
##     trace = FALSE) 
## 
## Mean and Variance Equation:
##  data ~ garch(1, 1)
## <environment: 0x0000000026fe1440>
##  [data = Bitcoin.diff]
## 
## Conditional Distribution:
##  QMLE 
## 
## Coefficient(s):
##        mu      omega     alpha1      beta1  
## 0.0268080  0.0014228  0.1716853  0.8138073  
## 
## Std. Errors:
##  robust 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu      0.026808    0.008741    3.067  0.00216 ** 
## omega   0.001423    0.001008    1.411  0.15812    
## alpha1  0.171685    0.068729    2.498  0.01249 *  
## beta1   0.813807    0.063644   12.787  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  89.07748    normalized:  0.2659029 
## 
## Description:
##  Fri Sep 18 23:23:24 2020 by user: Napat 
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  134.2896  0           
##  Shapiro-Wilk Test  R    W      0.9633369 1.868987e-07
##  Ljung-Box Test     R    Q(10)  11.30396  0.3343319   
##  Ljung-Box Test     R    Q(15)  13.02518  0.600353    
##  Ljung-Box Test     R    Q(20)  27.27717  0.1276445   
##  Ljung-Box Test     R^2  Q(10)  6.660232  0.7570844   
##  Ljung-Box Test     R^2  Q(15)  7.45071   0.9439161   
##  Ljung-Box Test     R^2  Q(20)  11.97325  0.9169935   
##  LM Arch Test       R    TR^2   6.237726  0.9036271   
## 
## Information Criterion Statistics:
##        AIC        BIC        SIC       HQIC 
## -0.5079253 -0.4623834 -0.5082059 -0.4897691

GARCH(1,0) model has all significant parameters. Next we'll perform residual analysis on selected model.

Residual Analysis - GARCH

Results of residual from figure :

From the time series plot of standard residuals, we can see that residuals are randomly distributed with high variation in end
Histogram of studentised residuals within [-4,4] range are not distributed well but also positive skewed
ACF plot has 2 significant lags at 19 and 20. PACF has significant lag at 19
qq plot shows deviation between sample and theoretical values.
Ljung box test of auto correlation for lags up to 19 fails to reject null hypothesis. Data point correlations result from randomness in the sampling process i.e. data are independently distributed. After lag 20, there is significant correlation for few lags

From the overall results, we will go with GARCH(1,0) model.

source('residual.analysis.R')
fit = garch(Bitcoin.diff,order=c(1,0),trace = FALSE)
residual.analysis(fit,class="GARCH",start=2)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.95829, p-value = 3.746e-08

$\label{fig:garch_res}Residual Analysis GARCH(1,0)$

Residual Analysis GARCH(1,0)

Forecasting with mean and variance

Here we'll forecast bitcoin values with ARMA and standard GARCH model for next 10 values. Figure shows the next 10 forecast.

model<-ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 0)), 
                   mean.model = list(armaOrder = c(3, 2), include.mean = FALSE), 
                   distribution.model = "norm")
model.fit<-ugarchfit(spec=model,data=Bitcoin.2017.zoo)

forecast.model.fit = ugarchforecast(model.fit, n.ahead = 10)
plot(forecast.model.fit, which=1)

$\label{fig:garch_forecast}Forecast with ARMA and GARCH$

Forecast with ARMA and GARCH

Forecasting with GARCH Bootstrap

Here we'll predict 10 values with bootstrap method. Below are the summary statistics for forecast.

bootp = ugarchboot(model.fit, method = "Partial", n.ahead = 10, n.bootpred = 500)
show(bootp)

## 
## *-----------------------------------*
## *     GARCH Bootstrap Forecast      *
## *-----------------------------------*
## Model : sGARCH
## n.ahead : 10
## Bootstrap method:  partial
## Date (T[0]): 2018-03-03
## 
## Series (summary):
##             min  q.25  mean  q.75   max forecast[analytic]
## t+1     8486.19 10893 11202 11543 14619              11235
## t+2     2439.15 11051 11382 11856 18713              11428
## t+3     -729.23 11126 11562 12050 27938              11587
## t+4   -48852.17 10977 11365 12090 30949              11534
## t+5  -108759.58 11084 11306 12377 30424              11716
## t+6  -155556.59 10984 11194 12553 41350              11772
## t+7  -155879.82 11048 11272 12588 47743              11834
## t+8  -145772.74 11055 11466 12874 42494              11958
## t+9  -151093.44 11034 11505 13043 43510              12017
## t+10 -151697.01 11006 11676 13124 47258              12111
## .....................
## 
## Sigma (summary):
##         min  q0.25   mean  q0.75      max forecast[analytic]
## t+1  473.94 473.94 473.94 473.94   473.94             473.94
## t+2  147.94 222.19 505.01 582.67  3385.99             496.27
## t+3  147.94 183.02 563.80 548.01  5775.89             517.61
## t+4  147.94 177.03 546.40 504.75 13415.57             538.09
## t+5  147.94 168.39 617.22 480.86 44291.57             557.80
## t+6  147.94 166.69 696.73 478.41 52212.26             576.81
## t+7  147.94 167.72 619.00 461.75 39888.89             595.20
## t+8  147.94 165.42 503.56 466.23 12487.40             613.02
## t+9  147.97 164.86 488.18 439.39  9892.78             630.32
## t+10 147.95 172.38 510.20 437.77 16023.79             647.15
## .....................

Limitations

The series is highly volatile, regression and ARIMA models are not suitable for prediction
Other time series models can be applied like exponential state space models, distributed lag models and neural network (nnetar)

Conclusions

Regression models are not suitable for daily bitcoin closing prices.ARIMA model is good, but the mean absolute scaled error of 3.2, (three times higher than original) is not practical for prediction. ARIMA models are not suitable as they are unable to capture high volatility in series. Combination of ARMA (mean) and GARCH (variance) prediction model are better for daily bitcoin closing prices.

References

Wikipedia. 2018. “Bitcoin.” https://en.wikipedia.org/wiki/Bitcoin.

Bitcoin Closing Prices Forecasting

MATH 1318 Time Series Analysis - Final Project

Team Members: 1.Rahul k. gupta (s3635232), 2.Terrie christensen (s3664899), 3.Napapach Dechawatthanaphokin (s3613572)

27 May 2018