Bitcoin was created in 2009 with the aim of producing a currency that is independent of any central authority and can be transferred electronically, more or less instantly, with very low transaction fees. Since then, Bitcoin has attracted interest from governments, banks and major tech companies around the world. Some even say that Bitcoin is the money of the future. With more merchants accepting Bitcoin every day and many governments legalising it, Bitcoin has ushered in the adoption of blockchain technology, and the number of organizations working with blockchain has surged. Today blockchain technology is applied in domains ranging from payments, cyber security, digital contracts, insurance and healthcare to green technology. The growth of cryptocurrencies is evident from the fact that their number has increased from 1 in 2009 to more than 1,000 today, with a total market cap exceeding 200 billion dollars.
To study how cryptocurrencies have grown over the years, I have taken a dataset with the prices of over 1,100 cryptocurrencies. I plan to study the change in the number and market cap of cryptocurrencies over the years, both of which appear to have increased exponentially. However, not all cryptocurrencies have been successful, so I will also analyse how many of them have eventually collapsed. As Bitcoin and Ethereum are the most famous cryptocurrencies, I will analyze their growth in depth and try to build a model that predicts whether their price will increase or decrease in the next 24 hours.
Studying cryptocurrencies helps us get an idea of how big the cryptocurrency market is and appreciate how its market cap has grown from 0 to more than 200 billion dollars in a span of 9 years. For those who want to invest in cryptocurrencies, this analysis shows how much investors have earned in the past and gives a clearer picture than mere speculation.
library(readr) #For reading csv file
library(tidyverse) #For data cleaning
library(dplyr) #For Data transformation
library(ggplot2) #For plotting graphs
library(lubridate) #For date time conversions
library(wordcloud) #For generating Word Cloud
library(DT) #For displaying data in nice format
library(ggthemes) #For adding themes to plots and graphs
library(viridis) #For adding Color Maps
library(treemap) #For plotting treemap
library(timetk) #For running time series regression
library(tidyquant) #For time data manipulation in Time series modeling
library(caret) #for making Confusion Matrix
I have used the following datasets for this project:
Importing Datasets in R from csv file
crypto <- read_csv("C:/Users/AMOUL SINGHI/Downloads/BANA/Study/Fall 2017/Data Wrangling in R/R Project/all-crypto-currencies/crytpo.csv")
Bitcoin <- read_csv("C:/Users/AMOUL SINGHI/Downloads/BANA/Study/Fall 2017/Data Wrangling in R/R Project/Coinmetrics/Bitcoin.csv")
eth <- read_csv("C:/Users/AMOUL SINGHI/Downloads/BANA/Study/Fall 2017/Data Wrangling in R/R Project/Coinmetrics/eth.csv")
I looked at the basic structure of all the datasets and noticed some missing values in the crypto dataset, so I excluded those observations. I also converted the coin and symbol variables to the factor data type, extracted three more variables (day, month and year) from the date, and changed the data type of date from character to Date. Some variables are not required in my analysis, so I dropped them: for example, variance from the crypto dataset, and fees and generated coins from the Bitcoin and Ethereum datasets.
Cleaning crypto dataset and creating new variables
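# Inspect the structure of the raw data
names(crypto)
str(crypto)
summary(crypto)
# Store the identifiers as factors
crypto$symbol <- as.factor(crypto$symbol)
crypto$coin <- as.factor(crypto$coin)
# Split the day-month-year string into parts and rebuild a Date column from them
crypto <- crypto %>%
  separate(date, c("day", "month", "year"), sep = "-")
crypto$date <- make_date(crypto$year, crypto$month, crypto$day)
# Drop the unused variable and the observations with missing values
crypto <- crypto %>%
  select(-c(variance))
crypto <- na.omit(crypto)
colSums(is.na(crypto)) # confirm that no missing values remain
crypto$day <- as.integer(crypto$day)
crypto$month <- as.integer(crypto$month)
Cleaning Bitcoin Dataset and creating new variables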
summary(Bitcoin)
str(Bitcoin)
# Split the year-month-day string into parts and rebuild a Date column from them
Bitcoin <- Bitcoin %>%
  separate(date, c("year", "month", "day"), sep = "-")
Bitcoin$date <- make_date(Bitcoin$year, Bitcoin$month, Bitcoin$day)
# Drop the variables that are not needed
Bitcoin <- Bitcoin %>%
  select(-c(fees, generatedCoins))
colSums(is.na(Bitcoin))
# Rename the remaining columns
colnames(Bitcoin)
colnames(Bitcoin) <- c("year", "month", "day", "txvolume", "txcount", "marketcap", "Price", "Volume", "date")
Bitcoin$day <- as.integer(Bitcoin$day)
Bitcoin$month <- as.integer(Bitcoin$month)
Cleaning Ethereum dataset and creating new variables
summary(eth)
names(eth)
str(eth)
# Split the year-month-day string into parts and rebuild a Date column from them
eth <- eth %>%
  separate(date, c("year", "month", "day"), sep = "-")
eth$date <- make_date(eth$year, eth$month, eth$day)
# Drop the variables that are not needed
eth <- eth %>%
  select(-c(fees, generatedCoins))
colSums(is.na(eth))
# Rename the remaining columns
colnames(eth)
colnames(eth) <- c("year", "month", "day", "txvolume", "txcount", "marketcap", "Price", "Volume", "date")
eth$day <- as.integer(eth$day)
eth$month <- as.integer(eth$month)
I will be using these additional variables in my logistic regression model.
# Take Bitcoin's daily volatility from the crypto dataset and attach it by date
Bitcoin <- crypto %>%
  filter(coin == "Bitcoin") %>%
  select(date, volatility) %>%
  inner_join(Bitcoin, by = "date")
# Do the same for Ethereum
eth <- crypto %>%
  filter(coin == "Ethereum") %>%
  select(date, volatility) %>%
  inner_join(eth, by = "date")
The crypto dataset has four price variables for each coin: the open, high, low and close prices. It also has the volume of coins traded and the total market cap for that day. Three more variables are extracted from the date to give the day, month and year.
| Information | Crypto Dataset |
|---|---|
| Dimensions | 570,276 × 13 |
| Date Range | 20-Apr-13 to 22-Sep-17 |
| Number of numerical variables | 10 |
| Number of factor variables | 2 |
| Number of cryptocurrencies | 1,170 |
| Oldest Crypto Currency | Bitcoin |
We joined the Bitcoin dataset with the crypto dataset on date to add one more variable, volatility, for use in the logistic regression that predicts whether the price of Bitcoin will go up or down in the next 24 hours.
| Information | Bitcoin Dataset |
|---|---|
| Dimensions | 1,637 × 10 |
| Date Range | 28-Apr-13 to 20-Sep-17 |
| Number of Numerical variables | 8 |
| Max Price of Bitcoin | 6011 USD |
We joined the Ethereum dataset with the crypto dataset on date to add one more variable, volatility, for use in the logistic regression that predicts whether the price of Ethereum will go up or down in the next 24 hours.
| Information | Ethereum Dataset |
|---|---|
| Dimensions | 806 × 10 |
| Date Range | 7-Aug-15 to 20-Sep-17 |
| Number of Numerical variables | 7 |
| Max Price of Ethereum | 401.5 USD |
The above visualization shows that the whole cryptocurrency market is dominated by two currencies, Bitcoin and Ethereum, and that even second-ranked Ethereum is far behind Bitcoin, which is driving the market. It is also fascinating (and shocking at the same time) that Bitcoin and Ethereum together form a 130 billion dollar (USD) market.
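The market shares behind a chart like this can be computed by dividing each coin's market cap by the total for a given day. The following is a minimal sketch on made-up numbers; the data frame `caps` and its columns `coin` and `market_cap` are hypothetical and only illustrate the calculation, they are not the project's data.

```r
# Hypothetical snapshot of market caps (in billion USD) on a single day.
# The figures are illustrative only, not taken from the dataset.
caps <- data.frame(
  coin = c("Bitcoin", "Ethereum", "Ripple", "Other"),
  market_cap = c(100, 30, 10, 60),
  stringsAsFactors = FALSE
)

# Share of each coin in the total market, in percent
caps$share_pct <- 100 * caps$market_cap / sum(caps$market_cap)

# Largest coins first
caps <- caps[order(-caps$share_pct), ]
print(caps)
```

Sorting by share makes it easy to see how concentrated the market is in the top one or two coins.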
In the last year cryptocurrencies have been all over the news, primarily because of the rate at which Bitcoin has grown. The above visualization shows how the market caps of Bitcoin, Ethereum and the total crypto market have grown over the years. The blue area shows the combined market cap of all cryptocurrencies.
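One way to obtain the combined market cap for such a chart is to sum the market cap of all coins for each date. Below is a small sketch with base R's `aggregate()`; the data frame `daily` and its columns `date`, `coin` and `market_cap` are assumptions for illustration, and the real column names in the crypto dataset may differ.

```r
# Hypothetical daily market caps (in billion USD) for two coins
daily <- data.frame(
  date = as.Date(c("2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02")),
  coin = c("Bitcoin", "Ethereum", "Bitcoin", "Ethereum"),
  market_cap = c(16, 1, 17, 2),
  stringsAsFactors = FALSE
)

# Sum the market cap over all coins for each date
total_cap <- aggregate(market_cap ~ date, data = daily, FUN = sum)
print(total_cap)
```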
The above visualization shows how the prices of Bitcoin and Ethereum have increased over the years. At first glance, Bitcoin appears to be growing at a higher rate than Ethereum. We will check this further in our analysis.
The total market cap of cryptocurrencies stands at more than 200 billion dollars, but that doesn't mean all cryptocurrencies have done well. From the above visualization we can see that, of the 348 currencies present at the start of 2016, around 30% have failed (i.e. lost more than 50% of their value), 10% are struggling and the remaining 60% are doing well so far.
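A grouping like this can be sketched by comparing each coin's latest price with its price at the start of the period. In the example below, only the "failed" rule (lost more than 50% of its value) comes from the text; the cut-offs for "struggling" and "doing well" are assumptions, and the data frame `coins` holds made-up values.

```r
# Hypothetical start and latest prices for five coins (illustrative values)
coins <- data.frame(
  coin = c("A", "B", "C", "D", "E"),
  price_start = c(10, 2, 5, 1, 8),
  price_now = c(3, 1.8, 20, 0.2, 8.5),
  stringsAsFactors = FALSE
)

# Fraction of the starting value that remains
coins$ratio <- coins$price_now / coins$price_start

# ASSUMPTION: "struggling" = lost up to 50%, "doing well" = did not lose value.
# Only the "failed" threshold (ratio below 0.5) is taken from the text.
coins$status <- cut(
  coins$ratio,
  breaks = c(-Inf, 0.5, 1, Inf),
  labels = c("failed", "struggling", "doing well"),
  right = FALSE
)

# Percentage of coins in each group
print(round(100 * prop.table(table(coins$status))))
```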
Bitcoin and other cryptocurrencies are emerging as a favourite investment option, especially among techies. Many early adopters of Bitcoin have made millions from it, and some cryptocurrency analysts even say that the prices of Bitcoin and other currencies are going to increase further. Here we look at how much your investment would have grown if you had invested just 100 dollars every month in Bitcoin and Ethereum since they became famous.
If you had invested 100 dollars every month in Bitcoin since the start of 2014, your investment would have grown to 43,566 dollars as of 1 October 2017, compared to a total investment of 4,600 dollars.
If you had invested 100 dollars every month in Ethereum since the start of 2016, your investment would have grown to 83,946 dollars as of 1 October 2017, compared to a total investment of 2,200 dollars. From this we can see that Ethereum would have given you better returns than Bitcoin.
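The calculation behind figures like these is a fixed monthly purchase: each month 100 dollars buys 100 / price coins, and the final value is the total number of coins multiplied by the latest price. Below is a minimal sketch of that arithmetic; the helper `monthly_investment_value()` is hypothetical and the prices are made up rather than taken from the project's data.

```r
# Value of investing a fixed amount at each monthly price.
# `prices` are the purchase prices, `final_price` is the price on the valuation date.
monthly_investment_value <- function(prices, final_price, amount = 100) {
  coins_bought <- amount / prices   # coins purchased each month
  sum(coins_bought) * final_price   # value of all coins at the end
}

# Made-up monthly prices and a made-up final price
prices <- c(10, 20, 25, 50)
final_value <- monthly_investment_value(prices, final_price = 100)
total_invested <- 100 * length(prices)

print(c(invested = total_invested, value = final_value))
```

Because the same amount buys more coins when the price is low, the early purchases account for most of the final value in a rising market.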
From the above visualization we can see that since 2017 NEM has given the highest returns followed by Ethereum, Ripple and Dash.
As cryptocurrencies are emerging as a favourite investment option, here we try to build a logistic regression model to predict whether the price of Bitcoin will rise or fall in the next 24 hours, and a separate forecast of Bitcoin prices for the next year.
For the logistic regression we create 8 more variables: 5 give the ratio of a day's price to the previous day's price for each of the past 5 days, and 3 give the ratio of a day's transaction volume to the previous day's volume for each of the past 3 days.
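To make the ratio definitions concrete, here is a small sketch on a made-up price series showing how the day-on-previous-day ratio and its lagged copies line up. The helper `shift()` is hypothetical and the example is kept separate from the project's own loop-based code.

```r
# Shift a vector down by k positions, padding the start with NA
shift <- function(x, k) c(rep(NA, k), head(x, -k))

# Made-up daily prices
price <- c(100, 110, 99, 99, 120, 126)

# Ratio of each day's price to the previous day's price
ratio <- price / shift(price, 1)

features <- data.frame(
  price = price,
  lag1 = ratio,            # today vs. yesterday
  lag2 = shift(ratio, 1),  # yesterday vs. the day before
  lag3 = shift(ratio, 2)   # two days ago vs. three days ago
)

# Direction label: 0 if the price fell versus the previous day, 1 otherwise
features$dir <- ifelse(features$lag1 < 1, 0, 1)
print(features)
```

Rows without enough history keep `NA` in this sketch, which makes it obvious which observations cannot be used for modelling.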
Bitcoin_log <- Bitcoin %>%
  arrange(date) %>%
  select("date", "Price", "year", "txvolume")
# Placeholder columns for the five price ratios and three volume ratios
Bitcoin_log$lag1 <- 0
Bitcoin_log$lag2 <- 0
Bitcoin_log$lag3 <- 0
Bitcoin_log$lag4 <- 0
Bitcoin_log$lag5 <- 0
Bitcoin_log$vollag1 <- 0
Bitcoin_log$vollag2 <- 0
Bitcoin_log$vollag3 <- 0
# Rows 1 to 5 lack a full five-day history, so the ratios are filled from row 6 onwards
for (i in 6:nrow(Bitcoin_log))
{
  Bitcoin_log$lag1[i] <- (Bitcoin_log$Price[i] / Bitcoin_log$Price[i-1])
  Bitcoin_log$lag2[i] <- (Bitcoin_log$Price[i-1] / Bitcoin_log$Price[i-2])
  Bitcoin_log$lag3[i] <- (Bitcoin_log$Price[i-2] / Bitcoin_log$Price[i-3])
  Bitcoin_log$lag4[i] <- (Bitcoin_log$Price[i-3] / Bitcoin_log$Price[i-4])
  Bitcoin_log$lag5[i] <- (Bitcoin_log$Price[i-4] / Bitcoin_log$Price[i-5])
  # Each volume ratio compares a day with the day before it
  Bitcoin_log$vollag1[i] <- (Bitcoin_log$txvolume[i] / Bitcoin_log$txvolume[i-1])
  Bitcoin_log$vollag2[i] <- (Bitcoin_log$txvolume[i-1] / Bitcoin_log$txvolume[i-2])
  Bitcoin_log$vollag3[i] <- (Bitcoin_log$txvolume[i-2] / Bitcoin_log$txvolume[i-3])
}
# dir is 0 when the price fell versus the previous day and 1 otherwise
Bitcoin_log$dir <- ifelse(Bitcoin_log$lag1 < 1, 0, 1)
#Logistic Regression
set.seed(1111)
trainingRowIndex <- sample(6:nrow(Bitcoin_log), 0.8*nrow(Bitcoin_log)) #Dividing Test and TrainingDataset
trainingData <- Bitcoin_log[trainingRowIndex, ] # model training data
testData <- Bitcoin_log[-trainingRowIndex, ] # model test data
glm.fit = glm(dir ~ vollag3 + vollag2 + lag4 + lag5, data = trainingData, family = binomial)
#Predict on remaining data
glm.probs <- predict(glm.fit, newdata = testData,type = "response")
glm.pred = rep(0, nrow(testData))
glm.pred[glm.probs > 0.5] = 1
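#Let's see how accurate our prediction is
# Recent versions of caret expect factors with the same levels
confusionMatrix(factor(glm.pred, levels = c(0, 1)), factor(testData$dir, levels = c(0, 1)))
## Confusion Matrix and Statistics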
##
## Reference
## Prediction 0 1
## 0 8 2
## 1 129 189
##
## Accuracy : 0.6006
## 95% CI : (0.5454, 0.654)
## No Information Rate : 0.5823
## P-Value [Acc > NIR] : 0.2697
##
## Kappa : 0.0551
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.05839
## Specificity : 0.98953
## Pos Pred Value : 0.80000
## Neg Pred Value : 0.59434
## Prevalence : 0.41768
## Detection Rate : 0.02439
## Detection Prevalence : 0.03049
## Balanced Accuracy : 0.52396
##
## 'Positive' Class : 0
##
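From the above confusion matrix we can see that the model correctly predicts whether the price of Bitcoin will increase or not about 60% of the time. This is only slightly above the no-information rate of 58.2% (the accuracy of always predicting the majority class), and the p-value of 0.27 indicates that the difference is not statistically significant. The model predicts an increase for 318 of the 328 test observations, which is why its sensitivity for price falls is below 6%.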
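The headline statistics can be reproduced by hand from the four counts in the confusion matrix. Below is a short base R sketch; the counts are copied from the output above, while the variable names are only illustrative.

```r
# Counts from the confusion matrix above: rows are predictions, columns are the reference
cm <- matrix(
  c(8, 129, 2, 189),
  nrow = 2,
  dimnames = list(prediction = c("0", "1"), reference = c("0", "1"))
)

# Share of correct predictions
accuracy <- sum(diag(cm)) / sum(cm)

# No-information rate: accuracy of always predicting the most frequent class
nir <- max(colSums(cm)) / sum(cm)

# With class 0 as the positive class:
sensitivity <- cm["0", "0"] / sum(cm[, "0"])  # share of actual falls that were detected
specificity <- cm["1", "1"] / sum(cm[, "1"])  # share of actual rises that were detected

print(round(c(accuracy = accuracy, nir = nir,
              sensitivity = sensitivity, specificity = specificity), 4))
```

Comparing `accuracy` with `nir` is a quick check of whether a classifier adds anything over always guessing the majority class.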
# Keep one observation per month (the first day of each month)
Bitcoin_time_mod <- Bitcoin %>%
  filter(day == 1) %>%
  arrange(date) %>%
  select("date", "Price")
# Expand the date into calendar features (the time series signature)
Bitcoin_time_mod_aug <- Bitcoin_time_mod %>%
  tk_augment_timeseries_signature()
# Regress the price on all signature features
fit_lm <- lm(Price ~ ., data = select(Bitcoin_time_mod_aug, -c(date, diff)))
# Build the dates of the next 12 months and their signature
Bitcoin_time_mod_idx <- Bitcoin_time_mod %>%
  tk_index()
future_idx <- Bitcoin_time_mod_idx %>%
  tk_make_future_timeseries(n_future = 12)
new_data_tbl <- future_idx %>%
  tk_get_timeseries_signature()
# Predict the price for the future dates
pred <- predict(fit_lm, newdata = select(new_data_tbl, -c(index, diff)))
predictions_tbl <- tibble(date = future_idx, value = pred)
Bitcoin_time_mod %>%
  ggplot(aes(x = date, y = Price)) +
  # Training data
  geom_line(color = "orange") +
  geom_point(color = "orange") +
  # Predictions
  geom_line(aes(y = value), color = "Blue", data = predictions_tbl) +
  geom_point(aes(y = value), color = "Blue", data = predictions_tbl) +
  # Aesthetics
  theme_tq() +
  labs(title = "Bitcoin Price Forecast: Time Series Machine Learning",
       subtitle = "Using basic multivariate linear regression can yield accurate results")
The above graph shows the forecast of the Bitcoin price for the next year.
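The idea behind this forecast is to expand each date into calendar features and regress the price on them. A minimal base R sketch of that idea follows, using only a linear time trend and a month factor as features. The synthetic price series and the helper `make_features()` are hypothetical, and the feature set is far simpler than the signature that timetk generates.

```r
# Turn a vector of dates into simple calendar features
make_features <- function(dates) {
  data.frame(
    t = as.numeric(dates),  # days since 1970-01-01, acts as a linear trend
    month = factor(format(dates, "%m"), levels = sprintf("%02d", 1:12))
  )
}

# Synthetic monthly series: three years of first-of-month observations
dates <- seq(as.Date("2014-01-01"), by = "month", length.out = 36)
train <- make_features(dates)
# Made-up price: upward trend plus a fixed bump every December
train$price <- 100 + 0.5 * (train$t - train$t[1]) + ifelse(train$month == "12", 20, 0)

# Regress the price on the trend and the month
fit <- lm(price ~ t + month, data = train)

# Forecast the next 12 months from the same kind of features
future_dates <- seq(as.Date("2017-01-01"), by = "month", length.out = 12)
forecast <- predict(fit, newdata = make_features(future_dates))
print(round(forecast, 1))
```

On this synthetic series the model recovers the trend exactly because the data were generated from it; real prices are far noisier, so a fit of this kind should be checked against held-out months before it is trusted.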
We started the analysis with the aim of understanding how the overall cryptocurrency market has grown over the years and what returns investing in Bitcoin and Ethereum would have given.
We see that Bitcoin has dominated the cryptocurrency market with a market share of more than 50%, followed by Ethereum at around 20%. Together these two have a market cap of around 130 billion dollars out of a total market cap of around 200 billion dollars. We also learned that not all cryptocurrencies have performed well: around 30% of the currencies present at the start of 2016 have since lost more than half of their value.
Though risky, cryptocurrency is the fastest-growing investment opportunity in the market today. Bitcoin is the most famous cryptocurrency, but we see that Ethereum, NEM and many others have given higher returns. For example, a monthly investment of 100 dollars in Bitcoin since 2014 would have grown to 43,566 dollars, while a monthly investment of 100 dollars in Ethereum since 2016 would have grown to 83,948 dollars.
To make cryptocurrency trading more profitable, we tried to predict whether the price of Bitcoin will increase or decrease in the next 24 hours using logistic regression. We are able to predict correctly about 60% of the time whether the price will go up or not. For investment purposes we tried to predict the price of Bitcoin for the next year using a time series method. Unfortunately, we were not able to predict the price of Bitcoin effectively using the given data, as the price of Bitcoin is highly volatile.