We will explore which factors best predict the likelihood of Bitcoin price closing at a price higher vs the previous week.
Firstly, we will load all the libraries we need for this exercise and start to collect our price data from Yahoo Finance API. The choice of explanatory variables are based on numerous literatures on Bitcoin, such as Bitcoin’s trading volume, as well as macro factors such as US broad indices, oil prices, gold prices, US treasury yields, inflation, dollar index, volatility and others are seen as important contributors to Bitcoin’s price movement. On top of this I will also include a categorical explanatory variable related to Bitcoin halving which many experts believe have strong impact on Bitcoin’s price performance.
library(yahoofinancer)
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(dplyr)
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:xts':
##
## first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(scales)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(corrplot)
## corrplot 0.92 loaded
library(readr)
##
## Attaching package: 'readr'
## The following object is masked from 'package:scales':
##
## col_factor
library(class)
library(ggcorrplot)
library(caret)
## Loading required package: lattice
#Set up the start and end date
start_date <- as.Date("2014-09-15")
end_date <- as.Date("2023-10-31")
#load the data from API
symbols <- c("BTC-USD", "^TNX", "^VIX", "DX-Y.NYB", "^IXIC", "^DJI", "^GSPC", "GC=F")
data_frames_list <- list()
for (symbol in symbols) {
data <- as.data.frame(getSymbols(symbol, src = "yahoo", from = start_date, to = end_date, auto.assign = FALSE, periodicity = "weekly"))
data_frames_list[[symbol]] <- data
}
#Rename list elements with the synbols
names(data_frames_list) <- symbols
#Load each symbol into separate dataframe
btc<- data_frames_list$`BTC-USD` %>%
select(`BTC-USD.Close`)
treasury<- data_frames_list$`^TNX` %>%
select(TNX.Close)
vix<- data_frames_list$`^VIX` %>%
select(VIX.Close)
dxy <- data_frames_list$`DX-Y.NYB` %>%
select(`DX-Y.NYB.Close`)
nasdaq<- data_frames_list$`^IXIC` %>%
select(IXIC.Close)
dow<- data_frames_list$`^DJI` %>%
select(DJI.Close)
sp500<- data_frames_list$`^GSPC` %>%
select(GSPC.Close)
gold <- data_frames_list$`GC=F` %>%
select(`GC=F.Close`)
#Get BTC vol data
btc_vol <- as.data.frame(getSymbols("BTC-USD", src = "yahoo", from = start_date, to = end_date, auto.assign = FALSE, periodicity="weekly")) %>%
select(`BTC-USD.Volume`)
Create Bitcoin halving dataframe
Next, we will need to create the Bitcoin halving data on weekly basis. Due to the data availability of certain predictors, we are unable to go back 10 years, so we will start at 15 September 2014. These are the dates for Bitcoin halving between 1 January 2015 and current date:
Since many believe that halving–both anticipation and post–are usually the strongest predictor of Bitcoin price, the following will be applied:
Value of 1 * 3 months before actual halving date * 6 months post actual halving date
Value of 0 * Any time periods outside of the one specified above
#Set up the halving dates and date ranges
halving_dates <- as.Date(c("2016-07-09", "2020-05-11"))
date_sequence <- seq(start_date, to = end_date, by = "weeks")
#Initializes the value column with 0 first, before we change some to 1 based on the interval
values <- rep(0, length(date_sequence))
# Define the number of weeks before and after each halving date
weeks_before <- 3*4
weeks_after <- 6*4
# Calculate the date ranges for each halving
halving_ranges <- lapply(halving_dates, function(halving_date) {
before_start <- halving_date - weeks(weeks_before)
after_end <- halving_date + weeks(weeks_after)
interval(before_start, after_end)
})
# Set the value to 1 for dates within the halving range, and 0 otherwise:
for (halving_range in halving_ranges) {
values <- ifelse(date_sequence %within% halving_range, 1, 0)
}
# Create a dataframe with Date and Value columns
df_halving <- data.frame(Date = date_sequence, Value = values)
#Calculate the proportion of halving days vs non halving days
table(df_halving$Value)
##
## 0 1
## 440 37
prop.table(table(df_halving$Value))
##
## 0 1
## 0.92243187 0.07756813
#Make sure Value is a factor
df_halving = df_halving %>%
mutate(Value=as.factor(Value))
Based on the above, ~92% of our categorical variable dataset are the dates that technically have no influence from Bitcoin halving, while ~8% are within the halving date range.
Next, we move on to ensure the following:
data[i] / data[i-1] - 1.#Bitcoin
btc$Date <- rownames(btc)
rownames(btc) <- NULL
btc<- btc %>%
select(Date, everything())
#Treasury
treasury$Date <- rownames(treasury)
rownames(treasury) <- NULL
treasury<- treasury %>%
select(Date, everything())
#VIX
vix$Date <- rownames(vix)
rownames(vix) <- NULL
vix<- vix %>%
select(Date, everything())
#DXY
dxy$Date <- rownames(dxy)
rownames(dxy) <- NULL
dxy<- dxy %>%
select(Date, everything())
#NASDAQ
nasdaq$Date <- rownames(nasdaq)
rownames(nasdaq) <- NULL
nasdaq<- nasdaq %>%
select(Date, everything())
#DOW
dow$Date <- rownames(dow)
rownames(dow) <- NULL
dow<- dow %>%
select(Date, everything())
#SP500
sp500$Date <- rownames(sp500)
rownames(sp500) <- NULL
sp500<- sp500 %>%
select(Date, everything())
#GOLD
gold$Date <- rownames(gold)
rownames(gold) <- NULL
gold<- gold %>%
select(Date, everything())
#BTC vol
btc_vol$Date <- rownames(btc_vol)
rownames(btc_vol)<- NULL
btc_vol<- btc_vol %>%
select(Date, everything())
#Coerce all datasets to a sequence of common dates
common_dates <- seq(as.Date("2014-09-15"), as.Date("2023-10-31"), by = "weeks")
bitcoin_df <- merge(data.frame(Date = common_dates), btc, by = "Date", all.x = TRUE)
halving_df <- merge(data.frame(Date = common_dates), df_halving, by = "Date", all.x = TRUE)
treasury_df <- merge(data.frame(Date = common_dates), treasury, by = "Date", all.x = TRUE)
vix_df <- merge(data.frame(Date = common_dates), vix, by = "Date", all.x = TRUE)
dxy_df <- merge(data.frame(Date = common_dates), dxy, by = "Date", all.x = TRUE)
nasdaq_df<- merge(data.frame(Date = common_dates), nasdaq, by = "Date", all.x = TRUE)
dow_df<- merge(data.frame(Date = common_dates), dow, by = "Date", all.x = TRUE)
sp500_df <- merge(data.frame(Date = common_dates), sp500, by = "Date", all.x = TRUE)
gold_df <- merge(data.frame(Date = common_dates), gold, by = "Date", all.x = TRUE)
btc_vol_df <- merge(data.frame(Date = common_dates), btc_vol, by="Date", all.x=TRUE)
#Fill the missing values
bitcoin_df$`BTC-USD.Close` <- na.locf(bitcoin_df$`BTC-USD.Close`)
halving_df$Value <- na.locf(halving_df$Value)
treasury_df$TNX.Close <- na.locf(treasury$TNX.Close)
vix_df$VIX.Close <- na.locf(vix_df$VIX.Close)
dxy_df$`DX-Y.NYB.Close`<- na.locf(dxy_df$`DX-Y.NYB.Close`)
nasdaq_df$IXIC.Close<-na.locf(nasdaq_df$IXIC.Close)
dow_df$DJI.Close<- na.locf(dow_df$DJI.Close)
sp500_df$GSPC.Close <- na.locf(sp500_df$GSPC.Close)
gold_df$`GC=F.Close`<- na.locf(gold_df$`GC=F.Close`)
btc_vol_df$`BTC-USD.Volume` <- na.locf(btc_vol_df$`BTC-USD.Volume`)
#Final check to ensure no missing values
any(is.na(bitcoin_df))
## [1] FALSE
any(is.na(treasury_df))
## [1] FALSE
any(is.na(vix_df))
## [1] FALSE
any(is.na(dxy_df))
## [1] FALSE
any(is.na(nasdaq_df))
## [1] FALSE
any(is.na(dow_df))
## [1] FALSE
any(is.na(sp500_df))
## [1] FALSE
any(is.na(gold_df))
## [1] FALSE
any(is.na(df_halving))
## [1] FALSE
any(is.na(btc_vol_df))
## [1] FALSE
#Combine all the data into a single dataframe called final_df
data_frames_list <- list(bitcoin_df$`BTC-USD.Close`, df_halving$Value, treasury_df$TNX.Close, vix_df$VIX.Close, dxy_df$`DX-Y.NYB.Close`, nasdaq_df$IXIC.Close, dow_df$DJI.Close, sp500_df$GSPC.Close, gold_df$`GC=F.Close`, btc_vol_df$`BTC-USD.Volume`)
bitcoin_date <- bitcoin_df$Date
final_df <- data.frame(Date = bitcoin_date, do.call(cbind, data_frames_list))
#Rename the new columns for readability
new_column_names <- c("Date", "Bitcoin", "Halving", "Treasury", "VIX", "DXY", "Nasdaq", "Dow", "SP500", "Gold", "BTC_Vol")
colnames(final_df) <- new_column_names
#set Halving as factor (1 = No Bitcoin Halving Effects, 2 = Bitcoin Halving Effects)
final_df$Halving = as.factor(final_df$Halving)
#Include one lagged term for BTC_vol--as this is not a variable that is known ahead of time, but we would like to check if its lagged value has any predictive power
final_df<- final_df %>%
mutate(BTC_Vol_Lagged = lag(BTC_Vol, 1))
#final check
head(final_df)
## Date Bitcoin Halving Treasury VIX DXY Nasdaq Dow SP500
## 1 2014-09-15 398.821 1 2.587 12.11 84.80 4579.79 17279.74 2010.40
## 2 2014-09-22 377.181 1 2.535 14.85 85.64 4512.19 17113.15 1982.85
## 3 2014-09-29 320.510 1 2.447 14.55 86.69 4475.62 17009.69 1967.90
## 4 2014-10-06 378.549 1 2.307 21.24 85.91 4276.24 16544.10 1906.13
## 5 2014-10-13 389.546 1 2.199 21.99 85.19 4258.44 16380.41 1886.76
## 6 2014-10-20 354.704 1 2.273 16.11 85.73 4483.72 16805.41 1964.58
## Gold BTC_Vol BTC_Vol_Lagged
## 1 1215.3 156903400 NA
## 2 1214.1 186772600 156903400
## 3 1192.2 276657896 186772600
## 4 1221.0 341152804 276657896
## 5 1238.3 156902070 341152804
## 6 1231.2 113691800 156902070
Correlation Matrix
Next, we calculate correlation for each of the variables in the dataset. Consistent with past literatures, US broad indices look like they are able to explain Bitcoin price movements better than the other variables such as treasury rates, which importance may or may not have been overinflated by the media.
#Calculate correlation
cor_matrix <- cor(final_df[, sapply(final_df, is.numeric)], use = "pairwise.complete.obs")
corrplot(cor_matrix, method = "color")
Convert dataset from absolute to relative terms
value(t) / value(t-1) - 1.#Extract the Date column to bind it back later
date_column <- final_df$Date
Halving <- final_df$Halving
numeric_columns<- final_df %>%
select_if(is.numeric)
#return function
calculate_returns <- function(column) {
return((column / lag(column, default = first(column))) - 1)
}
#apply function to all numeric columns
numeric_columns_relative<- lapply(numeric_columns, calculate_returns)
#Bind the Date column back to the resulting dataframe
weekly_returns_df <- cbind(Date = date_column, Halving = Halving, as.data.frame(numeric_columns_relative))
weekly_returns_df$Halving <- ifelse(weekly_returns_df$Halving == "1", 0, 1)
#Remove the entire first row of the resulting dataframe
weekly_returns_df <- weekly_returns_df[-c(1,2), ]
#Final check
head(weekly_returns_df)
## Date Halving Bitcoin Treasury VIX DXY
## 3 2014-09-29 0 -0.15024879 -0.034714026 -0.02020203 0.012260662
## 4 2014-10-06 0 0.18108327 -0.057212956 0.45979378 -0.008997563
## 5 2014-10-13 0 0.02905034 -0.046814063 0.03531073 -0.008380878
## 6 2014-10-20 0 -0.08944253 0.033651716 -0.26739423 0.006338783
## 7 2014-10-27 0 -0.08122832 0.027276743 -0.12911240 0.013414136
## 8 2014-11-03 0 0.11467606 -0.009850108 -0.06486100 0.008747723
## Nasdaq Dow SP500 Gold BTC_Vol
## 3 -0.0081046731 -0.006045698 -0.007539628 -0.018038073 0.48125526
## 4 -0.0445479906 -0.027372037 -0.031388800 0.024157063 0.23312152
## 5 -0.0041626036 -0.009894129 -0.010161949 0.014168754 -0.54008272
## 6 0.0529020666 0.025945626 0.041245281 -0.005733746 -0.27539643
## 7 0.0327897399 0.034816727 0.027217061 -0.048814147 -0.05819329
## 8 0.0003864503 0.010546560 0.006872969 -0.001280847 0.09075607
## BTC_Vol_Lagged
## 3 0.19036681
## 4 0.48125526
## 5 0.23312152
## 6 -0.54008272
## 7 -0.27539643
## 8 -0.05819329
#Change Bitcoin into factor with two levels i.e. 0 and 1
weekly_returns_df$Bitcoin <- ifelse(weekly_returns_df$Bitcoin>0, 1,0)
weekly_returns_df$Bitcoin <- as.factor(weekly_returns_df$Bitcoin)
#Check balance of our target variable
prop.table(table(weekly_returns_df$Bitcoin))
##
## 0 1
## 0.4463158 0.5536842
Note that as the proportion of the target variable is somewhat balanced, there is no need to apply over/undersampling procedure.At this stage, the data wrangling process is completed.
Introduction to Variables in the dataset
Cross Validation
80% of the datasets will be used for training while the remaining 20% will be used for testing. After splitting, we would also be running a check to see if target variable class proportion remains somewhat balanced.
RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(100)
index <- sample(x = nrow(weekly_returns_df), size = nrow(weekly_returns_df)*0.8)
#Splitting
weekly_returns_train <- weekly_returns_df[index,]
weekly_returns_test <- weekly_returns_df[-index,]
#Check prop table for target variable in the train dataset
prop.table(table(weekly_returns_train$Bitcoin))
##
## 0 1
## 0.4368421 0.5631579
No predictors
We will start with looking at the likelihood of Bitcoin closing weekly positive when there are no predictors involved. Bearing in mind that the dataset is from September 2016 to current date, and the train dataset contains (randomly) 80% of the entire dataset.
#The likelihood of Bitcoin going up in absence of any predictor variables
model_null<- glm(Bitcoin ~ 1, data = weekly_returns_train, family = "binomial")
summary(model_null)
##
## Call:
## glm(formula = Bitcoin ~ 1, family = "binomial", data = weekly_returns_train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2540 0.1034 2.456 0.0141 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 520.71 on 379 degrees of freedom
## Residual deviance: 520.71 on 379 degrees of freedom
## AIC: 522.71
##
## Number of Fisher Scoring iterations: 3
#Likelihood of Bitcoin closing positive on weekly basis is approximately 1.3x, or 30% more likely than closing negative
exp(model_null$coefficients)
## (Intercept)
## 1.289157
#The probability of Bitcoin closing weekly positive is 78% given no changes in explanatory variables
1/(1+exp(-(exp(model_null$coefficients))))
## (Intercept)
## 0.7840044
Selected predictors
We include all the numerical and categorical predictor variables to predict Bitcoin price movement.
model_all <- glm(Bitcoin~Treasury+VIX+DXY+Nasdaq+Dow+SP500+BTC_Vol_Lagged, weekly_returns_train, family="binomial")
summary(model_all)
##
## Call:
## glm(formula = Bitcoin ~ Treasury + VIX + DXY + Nasdaq + Dow +
## SP500 + BTC_Vol_Lagged, family = "binomial", data = weekly_returns_train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.23199 0.10990 2.111 0.0348 *
## Treasury 0.09682 1.77455 0.055 0.9565
## VIX -0.43605 0.98073 -0.445 0.6566
## DXY -6.32673 12.02063 -0.526 0.5987
## Nasdaq 6.87330 15.88250 0.433 0.6652
## Dow 3.70218 21.63181 0.171 0.8641
## SP500 -3.68028 35.96630 -0.102 0.9185
## BTC_Vol_Lagged 0.28096 0.35611 0.789 0.4301
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 520.71 on 379 degrees of freedom
## Residual deviance: 513.97 on 372 degrees of freedom
## AIC: 529.97
##
## Number of Fisher Scoring iterations: 4
There are a few things to note from this result:
Check Multicollinearity
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
vif(model_all)
## Treasury VIX DXY Nasdaq Dow
## 1.125943 2.452814 1.263733 16.722715 23.189644
## SP500 BTC_Vol_Lagged
## 62.438391 1.005388
We find that perhaps not surprisingly that the US broad market indices (Nasdaq, Dow and SP500) are highly correlated to one another, so including more than one broad indices in the model does not add any informational value to our target variable. There appears to be no issues with the other predictor variables (all values less than 10).
Inspect AIC values and apply stepwise method to select predictors
Comparing the AIC values between model_null and model_all shows us that the model including all the predictors do not add any significant value to the model. Next, we will utilize stepwise method to help us choose the most suitable predictor variable.
The new model, model_step, has the lowest AIC level and is therefore a better model compared to model_null and model_all
#Compare AIC score between model_null and model_all
model_null$aic
## [1] 522.7125
model_all$aic
## [1] 529.9694
#Apply stepwise method
model_step <- step(object=model_all, direction = "backward", trace=F)
#Compare AIC score between model_null, model_all and model_step
model_step$aic
## [1] 519.1228
Evidently, the best AIC score is obtained by model_step, which will be our final model for logistic regression.
What are the most significant predictors of Bitcoin’s price movement according to the stepwise model?
summary(model_step)
##
## Call:
## glm(formula = Bitcoin ~ Nasdaq, family = "binomial", data = weekly_returns_train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2366 0.1044 2.267 0.0234 *
## Nasdaq 8.9826 3.8593 2.327 0.0199 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 520.71 on 379 degrees of freedom
## Residual deviance: 515.12 on 378 degrees of freedom
## AIC: 519.12
##
## Number of Fisher Scoring iterations: 4
Model insights
Turns out, Nasdaq is the most significant predictor variable for Bitcoin’s weekly prices according to the stepwise method.
Several observations regarding model_step:
model_step, we cannot rule out the fact that they may be
statistically significant over the short term, bearing in mind that this
logistic regression spans over 9 years.Interpretation of Coefficients
Interpretation of the coefficients of model_step:
#Coefficient interpretations
exp(model_step$coefficients[1])
## (Intercept)
## 1.266977
exp(model_step$coefficients[2])
## Nasdaq
## 7963.025
#Coefficient interpretations - Probability
1/(1+exp(-(exp(model_step$coefficients[1]))))
## (Intercept)
## 0.7802249
1/(1+exp(-(exp(model_step$coefficients[2]))))
## Nasdaq
## 1
Model Prediction
We will utilize the best fitting model, which is model_step to make the prediction for evaluation.
weekly_returns_test$Pred<- predict(object=model_step, newdata = weekly_returns_test, type="response")
weekly_returns_test$Pred_label<- ifelse(weekly_returns_test$Pred > 0.5, yes="1", no="0")
head(weekly_returns_test)
## Date Halving Bitcoin Treasury VIX DXY
## 3 2014-09-29 0 0 -0.034714026 -0.02020203 0.012260662
## 4 2014-10-06 0 1 -0.057212956 0.45979378 -0.008997563
## 10 2014-11-17 0 0 -0.002155119 -0.03080397 0.008911217
## 13 2014-12-08 0 0 -0.088426529 0.77291836 -0.010858628
## 19 2015-01-19 0 1 0.001101914 -0.20477331 0.026805052
## 21 2015-02-02 0 0 0.157014937 -0.17548872 -0.001054917
## Nasdaq Dow SP500 Gold BTC_Vol BTC_Vol_Lagged
## 3 -0.008104673 -0.006045698 -0.007539628 -0.01803807 0.4812553 0.1903668
## 4 -0.044547991 -0.027372037 -0.031388800 0.02415706 0.2331215 0.4812553
## 10 0.005210615 0.009941758 0.011608894 0.01054852 -0.1443458 0.7829721
## 13 -0.026598213 -0.037750818 -0.035193800 0.02680449 0.7570012 -0.3852450
## 19 0.026648657 0.009195594 0.016044222 0.01229536 -0.4790440 1.2108433
## 21 0.023549948 0.038411989 0.030315932 -0.03488461 -0.4013463 0.6220743
## Pred Pred_label
## 3 0.5408670 1
## 4 0.4592106 0
## 10 0.5703891 1
## 13 0.4994285 0
## 19 0.6168046 1
## 21 0.6102048 1
Model Evaluation
The accuracy of the model_step using Nasdaq as the only predictor variable to Bitcoin’s movement is 60%, which is better than chance but not good as well. Recall / sensitivity is very high at >90% which means the model is able to maximize capturing the positive class. Recall that the target variable proportion was quite balanced when we build the model.
From business perspective, people who care about whether or not Bitcoin closes higher or lower and use this model are most likely speculative traders who use leverage to trade Bitcoin. Speculative leverage traders do not care whether Bitcoin price will go up or down as they can make money either way as long as they correctly position themselves. As such, there is no specific need to minimize either False Positive and False Negative. From performance perspective what we will need to maximize is accuracy i.e. the ability to correctly identify both positive and negative class.
confusionMatrix(data=as.factor(weekly_returns_test$Pred_label), reference=weekly_returns_test$Bitcoin, positive= '1')
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 11 3
## 1 35 46
##
## Accuracy : 0.6
## 95% CI : (0.4944, 0.6992)
## No Information Rate : 0.5158
## P-Value [Acc > NIR] : 0.0614
##
## Kappa : 0.1818
##
## Mcnemar's Test P-Value : 4.934e-07
##
## Sensitivity : 0.9388
## Specificity : 0.2391
## Pos Pred Value : 0.5679
## Neg Pred Value : 0.7857
## Prevalence : 0.5158
## Detection Rate : 0.4842
## Detection Prevalence : 0.8526
## Balanced Accuracy : 0.5890
##
## 'Positive' Class : 1
##
Predictor variables data scaling
Next we will try to see if the KNN model can perform better than Logistic Regression for the same issue.
#exp var
weekly_returns_train_x <- weekly_returns_train %>% select_if(is.numeric)
weekly_returns_test_x <- weekly_returns_test %>%
select(-Pred) %>%
select_if(is.numeric)
# target
weekly_train_y <- weekly_returns_train[,"Bitcoin"]
weekly_test_y <- weekly_returns_test[,"Bitcoin"]
# Scaling
weekly_returns_train_xs <- scale(weekly_returns_train_x)
weekly_returns_test_xs <- scale(weekly_returns_test_x, center = attr(weekly_returns_train_xs, "scaled:center"),
scale = attr(weekly_returns_train_xs, "scaled:scale"))
Predict
# find optimum k
k <- round(sqrt(nrow(weekly_returns_train_x)),0)
k
## [1] 19
#Predict
library(class)
BTC_knn<- knn(train=weekly_returns_train_xs,
test=weekly_returns_test_xs,
cl=weekly_train_y,
k=k)
weekly_returns_test$knn_pred <- BTC_knn
head(weekly_returns_test)
## Date Halving Bitcoin Treasury VIX DXY
## 3 2014-09-29 0 0 -0.034714026 -0.02020203 0.012260662
## 4 2014-10-06 0 1 -0.057212956 0.45979378 -0.008997563
## 10 2014-11-17 0 0 -0.002155119 -0.03080397 0.008911217
## 13 2014-12-08 0 0 -0.088426529 0.77291836 -0.010858628
## 19 2015-01-19 0 1 0.001101914 -0.20477331 0.026805052
## 21 2015-02-02 0 0 0.157014937 -0.17548872 -0.001054917
## Nasdaq Dow SP500 Gold BTC_Vol BTC_Vol_Lagged
## 3 -0.008104673 -0.006045698 -0.007539628 -0.01803807 0.4812553 0.1903668
## 4 -0.044547991 -0.027372037 -0.031388800 0.02415706 0.2331215 0.4812553
## 10 0.005210615 0.009941758 0.011608894 0.01054852 -0.1443458 0.7829721
## 13 -0.026598213 -0.037750818 -0.035193800 0.02680449 0.7570012 -0.3852450
## 19 0.026648657 0.009195594 0.016044222 0.01229536 -0.4790440 1.2108433
## 21 0.023549948 0.038411989 0.030315932 -0.03488461 -0.4013463 0.6220743
## Pred Pred_label knn_pred
## 3 0.5408670 1 1
## 4 0.4592106 0 0
## 10 0.5703891 1 1
## 13 0.4994285 0 1
## 19 0.6168046 1 1
## 21 0.6102048 1 1
confusionMatrix(data=weekly_returns_test$knn_pred,
reference=weekly_returns_test$Bitcoin,
positive="1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 17 10
## 1 29 39
##
## Accuracy : 0.5895
## 95% CI : (0.4838, 0.6894)
## No Information Rate : 0.5158
## P-Value [Acc > NIR] : 0.090742
##
## Kappa : 0.1676
##
## Mcnemar's Test P-Value : 0.003948
##
## Sensitivity : 0.7959
## Specificity : 0.3696
## Pos Pred Value : 0.5735
## Neg Pred Value : 0.6296
## Prevalence : 0.5158
## Detection Rate : 0.4105
## Detection Prevalence : 0.7158
## Balanced Accuracy : 0.5827
##
## 'Positive' Class : 1
##
For this exercise, while the results are not satisfactory, logistic regression model performs slightly better at accuracy of 60% in comparison to KNN model. In conclusion, for Bitcoin leverage trading, monitoring Nasdaq will provide traders with slight advantage against the other uninformed traders.