Bitcoin is a type of cryptocurrency, i.e. it is a digital currency which uses encryption techniques to generate units of the currency and verify the transfer of funds. Bitcoin is a decentralised currency, which operates independently of a central bank (Wikipedia 2018). An estimated 2.9 to 5.8 million unique users have a cryptocurrency wallet, of which most use bitcoin. The price of bitcoin has gone through various cycles of appreciation and depreciation, known as bubbles and bursts, with price fluctuations up to a magnitude of a few thousand USD in the space of a day, so that the currency has become renown for its volatility.
Bitcoin Illustration
The bitcoin historical price data gathered from the CoinMarketCap. This time series will be modelled using regression, ARIMA and GARCH methods. The report details:
Description of the time series
Model specification
Model fitting and selection
Diagnostic checking
Predict the value of bitcoin for the following 10 days
# Import Libraries
library(TSA)
library(fUnitRoots)
library(forecast)
library(CombMSC)
library(lmtest)
library(fGarch)
library(rugarch)
library(zoo)
library(ggplot2)
require(readr)
require(FitAR)
require(knitr)
library(tseries)
Bitcoin <- read.csv("../data/Bitcoin_Historical_Price.csv", header=TRUE)
Bitcoin$Date = as.Date(Bitcoin$Date,'%Y-%m-%d')
Bitcoin.zoo <- zoo(Bitcoin$Close, Bitcoin$Date)
Bitcoin.raw = Bitcoin.zoo
Data is converted to time series object using zoo library. Figure shows the daily closing price of bitcoin from the 27th Apr 2013 to the 3rd Mar 2018, given in USD. Main characteristics of the time series:
Change in trend, at ~ 2017
Change in variance, large spikes in price at the end of the time series
Auto regressive behavior
A flat trend is observed from the start of the time series to early 2017. Next we'll filter series to remove the initial flat trend.
autoplot.zoo(Bitcoin.zoo) +
ylab('Closing Price (USD)') +
xlab('Time (days)') +
ggtitle("Time Series Plot for Daily Bitcoin Prices")
Time Series of Daily Bitcoin Prices
Figure shows The flat part of the time series is removed, to better model the later part of the time series which shows a change in bitcoin price. The plot shows the daily closing price of bitcoin from the 01-Apr-2017 to the 03-Mar-2018, given in USD. Features are similar to original series
Bitcoin.2017 = Bitcoin[Bitcoin$Date > as.Date("2017-04-01"),]
Bitcoin.2017.zoo = zoo(Bitcoin.2017$Close, Bitcoin.2017$Date)
autoplot(Bitcoin.2017.zoo) +
geom_point(size=.5) +
ylab('Closing Price (USD)') +
xlab('Time (days)') +
ggtitle("Time Series Plot for Daily Bitcoin Prices (2017-2018)")
Subset Time Series of Daily Bitcoin Prices
Figure shows the scatter plot of the relationship between consecutive points, to determine the presence of auto regressive behaviour. Correlation value of 0.9935 was calculated.
This is indicative of a strong positive correlation between neighboring Bitcoin values, and in turn the presence of auto correlation.
ggplot(Bitcoin.2017,aes(zlag(Close), Close)) +
geom_point() +
ylab('Current Closing Price (USD)') +
xlab('Previous Day Closing Price (USD)') +
ggtitle("Scatter Plot of Bitcoin Daily Closing Prices")
Scatter Plot of Bitcoin Daily Closing Prices
y = as.vector(Bitcoin.2017.zoo)
x = zlag(Bitcoin.2017.zoo)
index = 2:length(x)
cor(y[index],x[index])
## [1] 0.9935557
Here we'll try fitting linear and quadratic model in bitcoin closing prices. Data has no seasonality so harmonic model is not applicable.
model.ln = lm(Bitcoin.2017.zoo~time(Bitcoin.2017.zoo))
summary(model.ln)
##
## Call:
## lm(formula = Bitcoin.2017.zoo ~ time(Bitcoin.2017.zoo))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4954.5 -1579.6 -668.9 881.2 9660.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.021e+05 2.461e+04 -28.53 <2e-16 ***
## time(Bitcoin.2017.zoo) 4.065e+01 1.412e+00 28.79 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2511 on 334 degrees of freedom
## Multiple R-squared: 0.7127, Adjusted R-squared: 0.7119
## F-statistic: 828.6 on 1 and 334 DF, p-value: < 2.2e-16
Figure shows the plots give the regression models for the linear are statistically significant, with the same p-value < 2.2e-16 and R-squared values 0.7119.
ggplot(Bitcoin.2017,aes(Date,Close))+
geom_line() +
ylab('Closing Price (USD)') +
xlab('') +
ggtitle('Linear Fitted Model - Bitcoin Prices') +
geom_line(aes(y=fitted(model.ln)),color='#fc5e13')
Bitcoin - Linear Fitted Model
Below are the findings of residuals from linear model:
Figure shows the time series plot, ACF, histogram of the standardized residuals for the linear model of the Bitcoin time series
The standardized residuals show large deviations from the mean 0 in the (time series) plot, and an asymmetric distribution in the histogram, suggesting large errors. The QQ-plot in figure also shows points diverging away from the straight line at either tail, indicating the residuals are not normally distributed. The ACF shows a slow decaying pattern with many significant lags, suggestive of auto correlation.
The Shapiro-Wilk test of normality returned a p-value of 1.204e-15, so the NULL hypothesis is rejected, again suggesting that the residuals are not derived from a normally distributed population.
The linear model does not pass the diagnostic checks, thus the linear model does not capture all the information in the time series and is not suitable for forecasting.
residual_analysis_qq <- function(myresiduals, title = 'QQ Plot of Residuals') {
data=as.data.frame(qqnorm( myresiduals , plot=F))
ggplot(data,aes(x,y)) +
geom_point() +
geom_smooth(method="lm", se=FALSE, color='#e36209', size=.4)+
xlab('Theoretical') +
ylab('Sample') +
ggtitle(title)
}
checkresiduals(model.ln)
Residual Analysis Linear fitted Model
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 321.71, df = 10, p-value < 2.2e-16
residual_analysis_qq(residuals(model.ln))
QQ plot Linear Model
shapiro.test(as.vector(residuals(model.ln)))
##
## Shapiro-Wilk normality test
##
## data: as.vector(residuals(model.ln))
## W = 0.87841, p-value = 1.204e-15
t = as.vector(time(Bitcoin.2017.zoo))
t2 = t^2
model.qa = lm(Bitcoin.2017.zoo~ t + t2) # label the quadratic trend model as model.qa
summary(model.qa)
##
## Call:
## lm(formula = Bitcoin.2017.zoo ~ t + t2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5490.1 -1286.7 -408.4 497.0 9733.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.504e+07 4.874e+06 3.085 0.00221 **
## t -1.766e+03 5.594e+02 -3.156 0.00174 **
## t2 5.183e-02 1.605e-02 3.229 0.00137 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2476 on 333 degrees of freedom
## Multiple R-squared: 0.7214, Adjusted R-squared: 0.7198
## F-statistic: 431.2 on 2 and 333 DF, p-value: < 2.2e-16
Figure shows the plots give the regression models for the quadratic are statistically significant, with the same p-value of 2.2e-16 and R-squared values; 0.7198.
ggplot(Bitcoin.2017,aes(Date,Close))+
geom_line() +
ylab('Closing Price (USD)') +
xlab('') +
ggtitle('Quadratic fitted Model Curve - Bitcoin Daily Prices') +
geom_line(aes(y=fitted(model.qa)),color='#fc5e13')
Quadratic fitted Model Curve
Below are the findings of residuals from Quadratic model:
Figure shows the time series plot, ACF, and histogram of the standardized residuals for the quadratic Model
The standardized residuals show large deviations from the mean 0 in the (time series) plot, and an asymmetric distribution in the histogram, suggesting large errors. The QQ-plot in figure shows points diverging away from the straight line at either tail, indicating the residuals are not normally distributed. The ACF shows a slow decaying pattern with many significant lags, suggestive of auto correlation.
The Shapiro-Wilk test of normality (not shown) returned a p-value of 2.2e-16, so the NULL hypothesis is rejected, again suggesting that the residuals are not derived from a normally distributed population.
The quadratic model does not pass the diagnostic checks, thus the quadratic model does not capture all the information in the time series and is not suitable for forecasting.
checkresiduals(model.qa)
Residual Analysis Quadratic fitted Model
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 321.7, df = 10, p-value < 2.2e-16
residual_analysis_qq(residuals(model.qa))
QQ plot quadratic Model
shapiro.test(as.vector(residuals(model.qa)))
##
## Shapiro-Wilk normality test
##
## data: as.vector(residuals(model.qa))
## W = 0.86085, p-value < 2.2e-16
We'll analyze ACF and PACF to identify AR and MA orders. Below are the results:
Figure shows points in the time series, with a trend and change in variance.
The ACF plot shows a slowly decaying pattern with many significant lags. There is no indication of a wave/sine pattern, so that a seasonal component is not determined. Trend suggests non stationarity.
PACF plot show a large 1st significant lag, with 4 smaller significant/near significant lags.
The time series needs to be stabilized and de-trended prior to specifying a set of possible ARIMA models.
ggtsdisplay(Bitcoin.2017.zoo,
main = 'ACF and PACF of Bitcoin Prices',
ylab='Closing Price (USD)')
ACF and PACF of Bitcoin Prices
Figure shows calculation of the log-likelihood for each lambda value, using the yule-walker method gave a lambda value 0.1-0.2. log transformation is not the best transformation according to boxcox plot.
Bitcoin.transform = BoxCox.ar(Bitcoin.2017.zoo, method = 'yule-walker')
BoxCox Transformation
BoxCox transformation was applied to stabilize the time series. First differencing was applied to de-trend the time series. Figure transformation and differencing.
lambda = sum(Bitcoin.transform$ci)/length(Bitcoin.transform$ci)
Bitcoin.boxcox = (Bitcoin.2017.zoo^lambda - 1) / lambda
Bitcoin.diff = base::diff(Bitcoin.boxcox, differences = 1)
autoplot(Bitcoin.diff) +
ylab('Closing Price (USD)') +
ggtitle('Boxcox Transformed & First Differenced Series')
Boxcox Transformed & First Differenced Series
Figure shows the results of transformed and differenced series.
The ACF plot shows 2 significant lags, i.e. q=2.
PACF plot shows 5 significant lags, however the 5th is ~lag 80, so not included, i.e. p=4, but the adjacent value p=3 may also be considered.
Dickey-Fuller Unit-Root test gave a p-value of 0.01, thus the NULL hypothesis is rejected, and in turn the time series is determined to be stationary.
ARIMA(4,1,2), ARMIA(3,1,2)} are included in the set of possible models
ggtsdisplay(Bitcoin.diff, lag.max = 96,ci.type='ma',
main = 'Boxcox Transformed & First Differenced ACF and PACF plots',
ylab='')
Boxcox Transformed & First Differenced ACF and PACF plots
adf.test(Bitcoin.diff)
##
## Augmented Dickey-Fuller Test
##
## data: Bitcoin.diff
## Dickey-Fuller = -6.968, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
The eacf matrix shown below, identified possible smaller ARIMA models with a p=1, 2 and q=1, 2.
eacf(Bitcoin.diff)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o o o o o o o o o o o o o
## 1 x o o o o o o o o o o o o o
## 2 x o o o o o o o o o o o o o
## 3 x o x o o o o o o o o o o o
## 4 x x x o o o o o o o o o o o
## 5 o x o x o o o o o o o o o o
## 6 o x o o o o o o o o o o o o
## 7 x x x o x x o o o o o o o o
Figure shows the BIC plot, identified possible larger ARIMA models with a p=4, 5 and q=4. A p=10 was disregarded with smaller values identified.
Thus {ARIMA(1,1,1), ARMIA(1,1,2), ARIMA(2,1,1), ARIMA(2,1,2), ARIMA(4,1,4), ARIMA(5,1,4)} are added to the set of possible models.
res1 = armasubsets(y=Bitcoin.diff,nar=14,nma=14,y.name='test',ar.method='mle')
plot(res1)
BIC Table
From the decided set of ARIMA models, we'll perform coefficient significance tests using maximum likelihood and conditional sum-of-squares methods.
# ARIMA(1,1,1)
model_111_css = arima(Bitcoin.boxcox, order=c(1,1,1),method='CSS')
coeftest(model_111_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.021830 NA NA NA
## ma1 0.022332 NA NA NA
model_111_ml = arima(Bitcoin.boxcox, order=c(1,1,1),method='ML')
coeftest(model_111_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.020106 NA NA NA
## ma1 0.024617 NA NA NA
# ARIMA(1,1,2)
model_112_css = arima(Bitcoin.boxcox,order=c(1,1,2),method='CSS')
coeftest(model_112_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.698636 0.258350 -2.7042 0.006846 **
## ma1 0.749424 0.260011 2.8823 0.003948 **
## ma2 -0.012098 0.059763 -0.2024 0.839579
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_112_ml = arima(Bitcoin.boxcox,order=c(1,1,2),method='ML')
coeftest(model_112_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.713575 0.268293 -2.6597 0.007821 **
## ma1 0.764360 0.269563 2.8356 0.004575 **
## ma2 -0.010088 0.061085 -0.1651 0.868831
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ARIMA(2,1,1)
model_211_css = arima(Bitcoin.boxcox,order=c(2,1,1),method='CSS')
coeftest(model_211_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.729981 0.236531 -3.0862 0.0020274 **
## ar2 -0.010729 0.062640 -0.1713 0.8640056
## ma1 0.779953 0.230417 3.3850 0.0007119 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_211_ml = arima(Bitcoin.boxcox,order=c(2,1,1),method='ML')
coeftest(model_211_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.0091928 NA NA NA
## ar2 -0.0035792 0.0541984 -0.066 0.9473
## ma1 0.0335949 NA NA NA
# ARIMA(2,1,2)
model_212_css = arima(Bitcoin.boxcox,order=c(2,1,2),method='CSS')
coeftest(model_212_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.028545 0.079418 -0.3594 0.7193
## ar2 0.906965 0.075874 11.9535 <2e-16 ***
## ma1 0.085822 0.084577 1.0147 0.3102
## ma2 -0.913644 0.083827 -10.8991 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_212_ml = arima(Bitcoin.boxcox,order=c(2,1,2),method='ML')
coeftest(model_212_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.0051321 0.0690326 0.0743 0.9407
## ar2 0.9315260 0.0654587 14.2307 <2e-16 ***
## ma1 0.0502674 0.0857493 0.5862 0.5577
## ma2 -0.9495800 0.0856749 -11.0835 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ARIMA(3,1,2)
model_312_css = arima(Bitcoin.boxcox,order=c(3,1,2),method='CSS')
coeftest(model_312_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.96048 0.32229 -2.9802 0.002881 **
## ar2 -0.39084 0.21351 -1.8305 0.067170 .
## ar3 0.10504 0.06089 1.7250 0.084524 .
## ma1 1.01485 0.32371 3.1351 0.001718 **
## ma2 0.45021 0.21280 2.1157 0.034370 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_312_ml = arima(Bitcoin.boxcox,order=c(3,1,2),method='ML')
coeftest(model_312_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -1.612866 0.084119 -19.1737 < 2.2e-16 ***
## ar2 -0.854203 0.108453 -7.8762 3.374e-15 ***
## ar3 0.048060 0.058338 0.8238 0.41
## ma1 1.679184 0.064867 25.8866 < 2.2e-16 ***
## ma2 0.925916 0.065388 14.1603 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ARIMA(4,1,2)
model_412_css = arima(Bitcoin.boxcox,order=c(4,1,2),method='CSS')
coeftest(model_412_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.737242 0.415004 -1.7765 0.07566 .
## ar2 -0.499844 0.577987 -0.8648 0.38715
## ar3 0.056956 0.071481 0.7968 0.42556
## ar4 -0.080260 0.070306 -1.1416 0.25363
## ma1 0.793473 0.414413 1.9147 0.05553 .
## ma2 0.540068 0.603325 0.8952 0.37070
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_412_ml = arima(Bitcoin.boxcox,order=c(4,1,2),method='ML')
coeftest(model_412_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -1.109588 0.055297 -20.0661 <2e-16 ***
## ar2 -0.902688 0.082158 -10.9872 <2e-16 ***
## ar3 0.059431 0.082539 0.7200 0.4715
## ar4 -0.029706 0.055097 -0.5392 0.5898
## ma1 1.185968 0.015665 75.7078 <2e-16 ***
## ma2 0.997414 0.023943 41.6570 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ARIMA(4,1,4)
model_414_css = arima(Bitcoin.boxcox,order=c(4,1,4),method='CSS')
coeftest(model_414_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.523664 NA NA NA
## ar2 0.162762 0.041831 3.8910 9.985e-05 ***
## ar3 0.917653 0.074325 12.3465 < 2.2e-16 ***
## ar4 0.415562 NA NA NA
## ma1 0.552550 NA NA NA
## ma2 -0.181660 0.040086 -4.5317 5.850e-06 ***
## ma3 -0.928353 0.071005 -13.0744 < 2.2e-16 ***
## ma4 -0.532390 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_414_ml = arima(Bitcoin.boxcox,order=c(4,1,4),method='ML')
coeftest(model_414_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.857298 0.256941 -3.3366 0.0008482 ***
## ar2 0.075458 0.057469 1.3130 0.1891720
## ar3 1.120648 0.060162 18.6273 < 2.2e-16 ***
## ar4 0.637578 0.254442 2.5058 0.0122178 *
## ma1 0.922555 0.230314 4.0056 6.185e-05 ***
## ma2 -0.031454 0.060127 -0.5231 0.6008852
## ma3 -1.116176 0.056809 -19.6479 < 2.2e-16 ***
## ma4 -0.717939 0.228129 -3.1471 0.0016492 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ARIMA(5,1,4)
model_514_css = arima(Bitcoin.boxcox,order=c(5,1,4),method='CSS')
coeftest(model_514_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.0994981 0.0424432 2.3443 0.0190647 *
## ar2 0.1381344 0.0384323 3.5942 0.0003254 ***
## ar3 0.5069664 0.0030855 164.3061 < 2.2e-16 ***
## ar4 0.1650015 0.0261166 6.3179 2.652e-10 ***
## ar5 0.0902766 0.0577983 1.5619 0.1183062
## ma1 -0.0794650 NA NA NA
## ma2 -0.1856691 0.0430209 -4.3158 1.590e-05 ***
## ma3 -0.5132720 0.0420131 -12.2169 < 2.2e-16 ***
## ma4 -0.3071109 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_514_ml = arima(Bitcoin.boxcox,order=c(5,1,4),method='ML')
coeftest(model_514_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.216615 0.582131 0.3721 0.7098
## ar2 0.125234 0.255391 0.4904 0.6239
## ar3 0.472675 0.520008 0.9090 0.3634
## ar4 -0.054061 0.425997 -0.1269 0.8990
## ar5 0.109269 0.078409 1.3936 0.1634
## ma1 -0.166104 0.585688 -0.2836 0.7767
## ma2 -0.144673 0.251820 -0.5745 0.5656
## ma3 -0.438177 0.523688 -0.8367 0.4028
## ma4 -0.056627 0.413964 -0.1368 0.8912
From the AIC table , BIC table and coefficient significance tests, we selected ARIMA (3,1,2) model. Next we'll perform diagnostic check of selected model
source('sort.score.r')
aic.scores <- sort.score(stats::AIC(model_111_ml,model_112_ml,model_211_ml,model_212_ml,
model_312_ml,model_412_ml,model_414_ml,model_514_ml),
score = "aic")
kable(x=aic.scores, caption="\\label{tab:aic}AIC sorted models") %>%
kable_styling(latex_options = "striped")
| df | AIC | |
|---|---|---|
| model_412_ml | 7 | -103.15254 |
| model_312_ml | 6 | -102.82566 |
| model_212_ml | 5 | -102.63500 |
| model_414_ml | 9 | -101.76633 |
| model_111_ml | 3 | -100.97940 |
| model_112_ml | 4 | -100.65054 |
| model_211_ml | 4 | -98.98447 |
| model_514_ml | 10 | -95.11157 |
bic.scores = sort.score(stats::BIC(model_111_ml,model_112_ml,model_211_ml,model_212_ml,
model_312_ml,model_412_ml,model_414_ml,model_514_ml),
score = "bic" )
kable(bic.scores, caption="\\label{tab:bic}BIC sorted models") %>%
kable_styling(latex_options = "striped")
| df | BIC | |
|---|---|---|
| model_111_ml | 3 | -89.53701 |
| model_112_ml | 4 | -85.39402 |
| model_211_ml | 4 | -83.72795 |
| model_212_ml | 5 | -83.56435 |
| model_312_ml | 6 | -79.94088 |
| model_412_ml | 7 | -76.45362 |
| model_414_ml | 9 | -67.43915 |
| model_514_ml | 10 | -56.97026 |
Below are the findings of residuals from ARIMA(3,1,2) model:
From the time series plot Figure of standard residuals, we can see that residuals are randomly distributed with few spiles, around zero mean level.
Histogram of studentised residuals within [-2,2] range are not distributed so well. Still it can be considered.
ACF plot has 2 significant lags at 19 and 20.
qq plot in figure shows deviation between sample and theoretical values.
Shapiro-Wilk normality test fails to reject null hypothesis. Studentised residuals are coming from normally distributed population
Ljung box test (Figure ) of auto correlation for all lags fails to reject null hypothesis. Data point correlations result from randomness in the sampling process i.e. data are independently distributed
From the overall results, we can make forecasting using ARIMA(3,1,2) model.
fit <- Arima(Bitcoin.2017.zoo, order=c(3,1,2), lambda = lambda)
summary(fit)
## Series: Bitcoin.2017.zoo
## ARIMA(3,1,2)
## Box Cox transformation: lambda= 0.15
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2
## -1.6127 -0.8540 0.0479 1.6791 0.9257
## s.e. 0.0843 0.1085 0.0583 0.0651 0.0654
##
## sigma^2 estimated as 0.04215: log likelihood=57.41
## AIC=-102.83 AICc=-102.57 BIC=-79.94
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 28.56452 517.2228 289.9566 0.5060551 3.975302 0.9891755 0.06609031
checkresiduals(fit)
Residual Analysis Quadratic fitted Model
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,2)
## Q* = 4.8476, df = 5, p-value = 0.4348
##
## Model df: 5. Total lags used: 10
residual_analysis_qq(residuals(fit))
Residual Analysis Linear fitted Model
shapiro.test(as.vector(residuals(fit)))
##
## Shapiro-Wilk normality test
##
## data: as.vector(residuals(fit))
## W = 0.96352, p-value = 1.919e-07
x = residuals(fit)
k=0
LBQPlot(x, lag.max = length(x)-1 , StartLag = k + 1, k = 0, SquaredQ = FALSE)
grid()
Ljung-Box Test
Figure shows the original time series for the historical price of bitcoin. A prediction of the following 10 days is given with the ARIMA(3,1,2) model.
The forecast shows a steady bitcoin price, with a possible increase or decrease within the range of the confidence interval (80 and 95%) bounds.
autoplot(forecast(fit,h=10))
Forecast from ARIMA(3,1,2)
The Model has mean absolute squared error of 0.9891814, while the forecast has error of 3.232775. Error is 3 times the original model. Here we notice that ARIMA model is not suitable for bitcoin prices forecasting. It has not able to capture variation in series. We will try fitting GARCH model for changing variance.
source('MASE.r')
Bitcoin.forecast <- read_csv("../data/Bitcoin_Prices_Forecasts.csv")
Bitcoin.forecast$Date = as.Date(Bitcoin.forecast$Date,'%d/%m/%y')
MASE(Bitcoin.forecast$`Closing price`,
as.vector(tail(fitted(forecast(fit,h=10)),10)))
## $MASE
## MASE
## 1 3.232567
Figure shows McLeod-Li test is significnat at 5% level of significance for all lags. This gives a strong idea about existence of volatiliy clustering. Figure of QQ plot with fat tails is in accordance with volatiliy clustering
McLeod.Li.test(y=Bitcoin.2017.zoo,main="McLeod-Li Test Statistics for Bitcoin")
McLeod Li Test Statistics for Bitcoin
residual_analysis_qq(Bitcoin.2017.zoo, 'QQ Plot')
QQ plot for Bitcoin
We will see ACF and PACF of Absolute and squared series to see ARCH effect.
Figure shows an absolute ACF and FACF plots. We observe many significant lags in both ACF and PACF. EACF do not suggest an ARMA(0,0)
We can identify ARMA(1,0) model for absolute value series. Proposed GARCH models are GARCH(1,0). Here we are estimating with GARCH(p,q) orders, where p is for AR and q for MA.
abs.bitcoin = abs(Bitcoin.2017.zoo)
sq.bitcoin = Bitcoin.2017.zoo^2
par(mfrow=c(1,2))
acf(abs.bitcoin, ci.type="ma",main="ACF plot absolute series")
pacf(abs.bitcoin, main="PACF plot absolute series")
ACF & PACF plot for absolute bitcoin series
eacf(abs.bitcoin)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x x x x x x x x x x x x x
## 1 o o o o x o o o o x o o x o
## 2 x o o o o x o o o x o o o o
## 3 o x o o o o o o o x o o o o
## 4 o x x o o o o o o x o o o o
## 5 x x x o o o o o o x o o o o
## 6 x x x o o o o o o x o o o o
## 7 x x x o o o o o o o o o o o
Figure shows an squared ACF and FACF plots. We observe many significant lags in both plots. EACF Matric do not suggest an ARMA(0,0)
We can identify ARMA(1,1) model for absolute value series. Proposed GARCH models are GARCH(1,1).
par(mfrow=c(1,2))
acf(sq.bitcoin, ci.type="ma",main="ACF plot squared series")
pacf(sq.bitcoin, main="PACF plot squared series")
ACF & PACF plot for Squared bitcoin series
eacf(sq.bitcoin)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x x x x x x x x x x x x x
## 1 x o o o x x o o x x o x x x
## 2 x x o o o x x o o x o o o x
## 3 x x o o o x o o o x o o o o
## 4 x o x o o x o x o x o o o o
## 5 x x o x o o o o o x o o o o
## 6 x x o x o o o x o x o o o o
## 7 o x x x x x x o o o o o o o
From the predicted tentative models, we'll fit GARCH and check the parameters significance.
m.10 = garchFit(formula = ~garch(1,0), data =Bitcoin.diff, trace = FALSE, cond.dist = "QMLE" )
summary(m.10)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 0), data = Bitcoin.diff, cond.dist = "QMLE",
## trace = FALSE)
##
## Mean and Variance Equation:
## data ~ garch(1, 0)
## <environment: 0x000000002817ed98>
## [data = Bitcoin.diff]
##
## Conditional Distribution:
## QMLE
##
## Coefficient(s):
## mu omega alpha1
## 0.028347 0.034290 0.168884
##
## Std. Errors:
## robust
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.028347 0.010885 2.604 0.00921 **
## omega 0.034290 0.004811 7.127 1.02e-12 ***
## alpha1 0.168884 0.079702 2.119 0.03410 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 64.48749 normalized: 0.1925
##
## Description:
## Fri Sep 18 23:23:24 2020 by user: Napat
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 77.42719 0
## Shapiro-Wilk Test R W 0.9651639 3.502547e-07
## Ljung-Box Test R Q(10) 10.64782 0.3856055
## Ljung-Box Test R Q(15) 15.20767 0.4365648
## Ljung-Box Test R Q(20) 35.80253 0.01622367
## Ljung-Box Test R^2 Q(10) 31.42701 0.0004986272
## Ljung-Box Test R^2 Q(15) 39.29263 0.0005793926
## Ljung-Box Test R^2 Q(20) 68.45095 3.253958e-07
## LM Arch Test R TR^2 21.76202 0.0402752
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -0.3670895 -0.3329331 -0.3672480 -0.3534724
m.11 = garchFit(formula = ~garch(1,1), data =Bitcoin.diff, trace = FALSE, cond.dist = "QMLE" )
summary(m.11)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 1), data = Bitcoin.diff, cond.dist = "QMLE",
## trace = FALSE)
##
## Mean and Variance Equation:
## data ~ garch(1, 1)
## <environment: 0x0000000026fe1440>
## [data = Bitcoin.diff]
##
## Conditional Distribution:
## QMLE
##
## Coefficient(s):
## mu omega alpha1 beta1
## 0.0268080 0.0014228 0.1716853 0.8138073
##
## Std. Errors:
## robust
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.026808 0.008741 3.067 0.00216 **
## omega 0.001423 0.001008 1.411 0.15812
## alpha1 0.171685 0.068729 2.498 0.01249 *
## beta1 0.813807 0.063644 12.787 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 89.07748 normalized: 0.2659029
##
## Description:
## Fri Sep 18 23:23:24 2020 by user: Napat
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 134.2896 0
## Shapiro-Wilk Test R W 0.9633369 1.868987e-07
## Ljung-Box Test R Q(10) 11.30396 0.3343319
## Ljung-Box Test R Q(15) 13.02518 0.600353
## Ljung-Box Test R Q(20) 27.27717 0.1276445
## Ljung-Box Test R^2 Q(10) 6.660232 0.7570844
## Ljung-Box Test R^2 Q(15) 7.45071 0.9439161
## Ljung-Box Test R^2 Q(20) 11.97325 0.9169935
## LM Arch Test R TR^2 6.237726 0.9036271
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -0.5079253 -0.4623834 -0.5082059 -0.4897691
GARCH(1,0) model has all significant parameters. Next we'll perform residual analysis on selected model.
Results of residual from figure :
From the time series plot of standard residuals, we can see that residuals are randomly distributed with high variation in end
Histogram of studentised residuals within [-4,4] range are not distributed well but also positive skewed
ACF plot has 2 significant lags at 19 and 20. PACF has significant lag at 19
qq plot shows deviation between sample and theoretical values.
Ljung box test of auto correlation for lags up to 19 fails to reject null hypothesis. Data point correlations result from randomness in the sampling process i.e. data are independently distributed. After lag 20, there is significant correlation for few lags
From the overall results, we will go with GARCH(1,0) model.
source('residual.analysis.R')
fit = garch(Bitcoin.diff,order=c(1,0),trace = FALSE)
residual.analysis(fit,class="GARCH",start=2)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.95829, p-value = 3.746e-08
Residual Analysis GARCH(1,0)
Here we'll forecast bitcoin values with ARMA and standard GARCH model for next 10 values. Figure shows the next 10 forecast.
model<-ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 0)),
mean.model = list(armaOrder = c(3, 2), include.mean = FALSE),
distribution.model = "norm")
model.fit<-ugarchfit(spec=model,data=Bitcoin.2017.zoo)
forecast.model.fit = ugarchforecast(model.fit, n.ahead = 10)
plot(forecast.model.fit, which=1)
Forecast with ARMA and GARCH
Here we'll predict 10 values with bootstrap method. Below are the summary statistics for forecast.
bootp = ugarchboot(model.fit, method = "Partial", n.ahead = 10, n.bootpred = 500)
show(bootp)
##
## *-----------------------------------*
## * GARCH Bootstrap Forecast *
## *-----------------------------------*
## Model : sGARCH
## n.ahead : 10
## Bootstrap method: partial
## Date (T[0]): 2018-03-03
##
## Series (summary):
## min q.25 mean q.75 max forecast[analytic]
## t+1 8486.19 10893 11202 11543 14619 11235
## t+2 2439.15 11051 11382 11856 18713 11428
## t+3 -729.23 11126 11562 12050 27938 11587
## t+4 -48852.17 10977 11365 12090 30949 11534
## t+5 -108759.58 11084 11306 12377 30424 11716
## t+6 -155556.59 10984 11194 12553 41350 11772
## t+7 -155879.82 11048 11272 12588 47743 11834
## t+8 -145772.74 11055 11466 12874 42494 11958
## t+9 -151093.44 11034 11505 13043 43510 12017
## t+10 -151697.01 11006 11676 13124 47258 12111
## .....................
##
## Sigma (summary):
## min q0.25 mean q0.75 max forecast[analytic]
## t+1 473.94 473.94 473.94 473.94 473.94 473.94
## t+2 147.94 222.19 505.01 582.67 3385.99 496.27
## t+3 147.94 183.02 563.80 548.01 5775.89 517.61
## t+4 147.94 177.03 546.40 504.75 13415.57 538.09
## t+5 147.94 168.39 617.22 480.86 44291.57 557.80
## t+6 147.94 166.69 696.73 478.41 52212.26 576.81
## t+7 147.94 167.72 619.00 461.75 39888.89 595.20
## t+8 147.94 165.42 503.56 466.23 12487.40 613.02
## t+9 147.97 164.86 488.18 439.39 9892.78 630.32
## t+10 147.95 172.38 510.20 437.77 16023.79 647.15
## .....................
The series is highly volatile, regression and ARIMA models are not suitable for prediction
Other time series models can be applied like exponential state space models, distributed lag models and neural network (nnetar)
Regression models are not suitable for daily bitcoin closing prices.ARIMA model is good, but the mean absolute scaled error of 3.2, (three times higher than original) is not practical for prediction. ARIMA models are not suitable as they are unable to capture high volatility in series. Combination of ARMA (mean) and GARCH (variance) prediction model are better for daily bitcoin closing prices.
Wikipedia. 2018. “Bitcoin.” https://en.wikipedia.org/wiki/Bitcoin.