R Notebook

Time series Analysis on bitcoin prices data part2

Loading packages

libraries <- c('forecast','astsa','tseries','ggplot2','xts','data.table','gridExtra','tidyquant','knitr','AnomalyDetection','cpm','ggpubr')
sapply(libraries, require, character = T)

##         forecast            astsa          tseries          ggplot2 
##             TRUE             TRUE             TRUE             TRUE 
##              xts       data.table        gridExtra        tidyquant 
##             TRUE             TRUE             TRUE             TRUE 
##            knitr AnomalyDetection              cpm           ggpubr 
##             TRUE             TRUE             TRUE             TRUE

setwd("/home/rahul/Documents/Rahul/BTP/Data")
data1 = read.table('bitcoin_price_daily.csv', sep=',', header = T)
head(data1)

data1$Open <- as.numeric(data1$Open)
data1$High <- as.numeric(data1$High)
data1$Low <- as.numeric(data1$Low)
data1$Close <- as.numeric(data1$Close)

data1$Volume <- as.numeric(data1$Volume)
data1$Date <- as.character(data1$Date)

library(stringr)

x = str_replace_all(data1$Date, fixed(","),"")
x = str_replace_all(x, fixed(" "),"")
data1$Date = as.Date(x,"%B%d%Y")

ggplot(data1, aes(x=Date, y=Close)) + geom_line(col="blue") + theme_tq()

** changepoint detection using non-parametric sequential method **

As seen from the above graph, the prices have variable fluctuations, so we take the log of price to reduce it. For fitting a sequential changepoint detection, the data should be uncorrelated. In figure, we can clearly see that the prices are correlated. To remove correlations and make series stationary, we take the first order difference of time series.

diff.data = data.frame(x=1:1654,diff1.price = diff(log(data1$Close)))
ggplot(data=diff.data, aes(x=x,y=diff1.price)) + geom_line(col='steelblue',alpha=0.8)

** Hypothesis test to check the stationariy **

adf.test(diff.data$diff1.price)

## Warning in adf.test(diff.data$diff1.price): p-value smaller than printed p-
## value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff.data$diff1.price
## Dickey-Fuller = -10.108, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary

The p-value of test confirms that the series has become stationary. Further we analyze the distribution of differenced series based on which we can apply parametric or non-parametric method.

ggplot(data=diff.data, aes(x=diff1.price, y=..density..)) + geom_histogram(fill='steelblue',color='blue') + geom_density(col='firebrick',lwd=0.7)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggqqplot(diff.data$diff1.price)

There is a strong deviation from normality. Let’s confirm it with Shapiro wiki normality test.

shapiro.test(diff.data$diff1.price)

## 
##  Shapiro-Wilk normality test
## 
## data:  diff.data$diff1.price
## W = 0.85909, p-value < 2.2e-16

This normality test confirm our visual interpretation that the data is not normally distributed, So we will use non-parametric methods for sequential changepoint detection.

we created S4 cpm object to monitor the test statistics and other details closely.

detectiontimes <- numeric()
changepoints <- numeric()
seq.data = diff.data$diff1.price
n <- length(diff.data$diff1.price)
cpm <- makeChangePointModel(cpmType = 'Mood',ARL0 = 500, startup = 20)
i <- 0
test_stats = list()

while (i < n) {
  i <- i+1
  cpm <- processObservation(cpm, seq.data[i])
  if (changeDetected(cpm)) {
    cat(sprintf("Change detected at observation %d\n",i))
    detectiontimes <- c(detectiontimes, i)
    Ds <- getStatistics(cpm)
    test_stats[[i]] = Ds
    tau <- which.max(Ds)
    if (length(changepoints) > 0)
      tau <- tau + changepoints[length(changepoints)]
    changepoints <- c(changepoints, tau)
    cpm <- cpmReset(cpm)
    i <- tau
  }
}

## Change detected at observation 56
## Change detected at observation 202
## Change detected at observation 229
## Change detected at observation 266
## Change detected at observation 308
## Change detected at observation 329
## Change detected at observation 410
## Change detected at observation 372
## Change detected at observation 405
## Change detected at observation 463
## Change detected at observation 494
## Change detected at observation 541
## Change detected at observation 643
## Change detected at observation 729
## Change detected at observation 755
## Change detected at observation 811
## Change detected at observation 999
## Change detected at observation 1030
## Change detected at observation 1051
## Change detected at observation 1065
## Change detected at observation 1160
## Change detected at observation 1178
## Change detected at observation 1195
## Change detected at observation 1246
## Change detected at observation 1308
## Change detected at observation 1345
## Change detected at observation 1421
## Change detected at observation 1485
## Change detected at observation 1540
## Change detected at observation 1651

paste0("The estimated changepoints are ", paste(changepoints, sep = ' ', collapse = ' '))

## [1] "The estimated changepoints are 53 184 215 246 305 308 322 337 382 458 486 515 610 726 740 805 996 1027 1030 1040 1149 1175 1183 1241 1298 1322 1418 1434 1477 1648"

we superimposed this changepoints location on the original series by reversing the transformation.

superimposed.changepoints <- changepoints + 1
superimposed.changepoints

##  [1]   54  185  216  247  306  309  323  338  383  459  487  516  611  727
## [15]  741  806  997 1028 1031 1041 1150 1176 1184 1242 1299 1323 1419 1435
## [29] 1478 1649

** Plotting these changepoints on the original data **

ggplot(data1, aes(x=Date, y=Close)) + geom_line(col="blue",lwd=0.8,col='blue') +  geom_vline(xintercept = rev(data1$Date)[superimposed.changepoints],col='firebrick',alpha=0.7) +
  theme_bw() +  theme(axis.line = element_line(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank())

## Warning: The plyr::rename operation has created duplicates for the
## following name(s): (`colour`)

paste0("The time for the estimated changepoints are ", paste(rev(data1$Date)[superimposed.changepoints], collapse='  '))

## [1] "The time for the estimated changepoints are 2013-06-20  2013-10-29  2013-11-29  2013-12-30  2014-02-27  2014-03-02  2014-03-16  2014-03-31  2014-05-15  2014-07-30  2014-08-27  2014-09-25  2014-12-29  2015-04-24  2015-05-08  2015-07-12  2016-01-19  2016-02-19  2016-02-22  2016-03-03  2016-06-20  2016-07-16  2016-07-24  2016-09-20  2016-11-16  2016-12-10  2017-03-16  2017-04-01  2017-05-14  2017-11-01"