Loading packages
libraries <- c('forecast','astsa','tseries','ggplot2','xts','data.table','gridExtra','tidyquant','knitr','AnomalyDetection','cpm','ggpubr')
sapply(libraries, require, character = T)
## forecast astsa tseries ggplot2
## TRUE TRUE TRUE TRUE
## xts data.table gridExtra tidyquant
## TRUE TRUE TRUE TRUE
## knitr AnomalyDetection cpm ggpubr
## TRUE TRUE TRUE TRUE
setwd("/home/rahul/Documents/Rahul/BTP/Data")
data1 = read.table('bitcoin_price_daily.csv', sep=',', header = T)
head(data1)
data1$Open <- as.numeric(data1$Open)
data1$High <- as.numeric(data1$High)
data1$Low <- as.numeric(data1$Low)
data1$Close <- as.numeric(data1$Close)
data1$Volume <- as.numeric(data1$Volume)
data1$Date <- as.character(data1$Date)
library(stringr)
x = str_replace_all(data1$Date, fixed(","),"")
x = str_replace_all(x, fixed(" "),"")
data1$Date = as.Date(x,"%B%d%Y")
ggplot(data1, aes(x=Date, y=Close)) + geom_line(col="blue") + theme_tq()
** changepoint detection using non-parametric sequential method **
As seen from the above graph, the prices have variable fluctuations, so we take the log of price to reduce it. For fitting a sequential changepoint detection, the data should be uncorrelated. In figure, we can clearly see that the prices are correlated. To remove correlations and make series stationary, we take the first order difference of time series.
diff.data = data.frame(x=1:1654,diff1.price = diff(log(data1$Close)))
ggplot(data=diff.data, aes(x=x,y=diff1.price)) + geom_line(col='steelblue',alpha=0.8)
** Hypothesis test to check the stationariy **
adf.test(diff.data$diff1.price)
## Warning in adf.test(diff.data$diff1.price): p-value smaller than printed p-
## value
##
## Augmented Dickey-Fuller Test
##
## data: diff.data$diff1.price
## Dickey-Fuller = -10.108, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary
The p-value of test confirms that the series has become stationary. Further we analyze the distribution of differenced series based on which we can apply parametric or non-parametric method.
ggplot(data=diff.data, aes(x=diff1.price, y=..density..)) + geom_histogram(fill='steelblue',color='blue') + geom_density(col='firebrick',lwd=0.7)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggqqplot(diff.data$diff1.price)
There is a strong deviation from normality. Let’s confirm it with Shapiro wiki normality test.
shapiro.test(diff.data$diff1.price)
##
## Shapiro-Wilk normality test
##
## data: diff.data$diff1.price
## W = 0.85909, p-value < 2.2e-16
This normality test confirm our visual interpretation that the data is not normally distributed, So we will use non-parametric methods for sequential changepoint detection.
we created S4 cpm object to monitor the test statistics and other details closely.
detectiontimes <- numeric()
changepoints <- numeric()
seq.data = diff.data$diff1.price
n <- length(diff.data$diff1.price)
cpm <- makeChangePointModel(cpmType = 'Mood',ARL0 = 500, startup = 20)
i <- 0
test_stats = list()
while (i < n) {
i <- i+1
cpm <- processObservation(cpm, seq.data[i])
if (changeDetected(cpm)) {
cat(sprintf("Change detected at observation %d\n",i))
detectiontimes <- c(detectiontimes, i)
Ds <- getStatistics(cpm)
test_stats[[i]] = Ds
tau <- which.max(Ds)
if (length(changepoints) > 0)
tau <- tau + changepoints[length(changepoints)]
changepoints <- c(changepoints, tau)
cpm <- cpmReset(cpm)
i <- tau
}
}
## Change detected at observation 56
## Change detected at observation 202
## Change detected at observation 229
## Change detected at observation 266
## Change detected at observation 308
## Change detected at observation 329
## Change detected at observation 410
## Change detected at observation 372
## Change detected at observation 405
## Change detected at observation 463
## Change detected at observation 494
## Change detected at observation 541
## Change detected at observation 643
## Change detected at observation 729
## Change detected at observation 755
## Change detected at observation 811
## Change detected at observation 999
## Change detected at observation 1030
## Change detected at observation 1051
## Change detected at observation 1065
## Change detected at observation 1160
## Change detected at observation 1178
## Change detected at observation 1195
## Change detected at observation 1246
## Change detected at observation 1308
## Change detected at observation 1345
## Change detected at observation 1421
## Change detected at observation 1485
## Change detected at observation 1540
## Change detected at observation 1651
paste0("The estimated changepoints are ", paste(changepoints, sep = ' ', collapse = ' '))
## [1] "The estimated changepoints are 53 184 215 246 305 308 322 337 382 458 486 515 610 726 740 805 996 1027 1030 1040 1149 1175 1183 1241 1298 1322 1418 1434 1477 1648"
we superimposed this changepoints location on the original series by reversing the transformation.
superimposed.changepoints <- changepoints + 1
superimposed.changepoints
## [1] 54 185 216 247 306 309 323 338 383 459 487 516 611 727
## [15] 741 806 997 1028 1031 1041 1150 1176 1184 1242 1299 1323 1419 1435
## [29] 1478 1649
** Plotting these changepoints on the original data **
ggplot(data1, aes(x=Date, y=Close)) + geom_line(col="blue",lwd=0.8,col='blue') + geom_vline(xintercept = rev(data1$Date)[superimposed.changepoints],col='firebrick',alpha=0.7) +
theme_bw() + theme(axis.line = element_line(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
## Warning: The plyr::rename operation has created duplicates for the
## following name(s): (`colour`)
paste0("The time for the estimated changepoints are ", paste(rev(data1$Date)[superimposed.changepoints], collapse=' '))
## [1] "The time for the estimated changepoints are 2013-06-20 2013-10-29 2013-11-29 2013-12-30 2014-02-27 2014-03-02 2014-03-16 2014-03-31 2014-05-15 2014-07-30 2014-08-27 2014-09-25 2014-12-29 2015-04-24 2015-05-08 2015-07-12 2016-01-19 2016-02-19 2016-02-22 2016-03-03 2016-06-20 2016-07-16 2016-07-24 2016-09-20 2016-11-16 2016-12-10 2017-03-16 2017-04-01 2017-05-14 2017-11-01"