Introduction

Cryptocurrencies and their exponential growth

Bitcoin was created in 2009 with the aim of producing a currency that is independent of any central authority and can be transferred electronically, more or less instantly, with very low transaction fees. Since then, Bitcoin has attracted interest from almost all governments, banks and major tech companies around the world, and some even say that Bitcoin is the money of the future. With more merchants accepting Bitcoin every day and many governments legalising it, Bitcoin has ushered in the adoption of blockchain technology, and the number of organizations working with blockchain has surged. Today blockchain technology is applied in almost every domain, from payments, cyber security, digital contracts, insurance and healthcare to green technology. The unprecedented growth of cryptocurrencies is evident from the fact that their number has increased from 1 in 2009 to more than 1,000 today, with a total market cap of around 200 billion dollars.

To study how cryptocurrencies have grown over the years, I have taken a dataset with the prices of over 1,100 cryptocurrencies. I plan to study the change in the number and market cap of cryptocurrencies over the years, which appears to have been exponential. However, not all cryptocurrencies have been successful, so I will also analyse how many of them have eventually collapsed. As Bitcoin and Ethereum are the most famous cryptocurrencies, I will analyse their growth in depth and try to build a model that can predict whether their price will increase or decrease in the next 24 hours.

Studying cryptocurrencies gives us an idea of how big the cryptocurrency market is and how its market cap has grown from 0 to around 200 billion dollars in about 9 years. For those who want to invest in cryptocurrencies, this analysis shows what returns investors have earned in the past, giving a clearer picture than mere speculation.

Packages Required

library(readr)      #For reading csv file
library(tidyverse)  #For data cleaning
library(dplyr)      #For Data transformation
library(ggplot2)    #For plotting graphs
library(lubridate)  #For date time conversions
library(wordcloud)  #For generating Word Cloud
library(DT)         #For displaying data in a nice format
library(ggthemes)   #For adding themes to plots and graphs
library(viridis)    #For adding Color Maps 
library(treemap)    #For plotting treemap
library(timetk)     #For running time series regression
library(tidyquant)  #For time data manipulation in time series modeling
library(caret)      #For making the confusion matrix

Data Preparation

Datasets Used

I have used the following datasets for this project:

  1. Cryptocurrency Data (crypto)
  • This is a Kaggle dataset that contains market prices for 1,170 cryptocurrencies up to 22 Oct 2017, with the open, high, low and close prices, volume of coins traded, market cap, and variance and volatility in prices for each day.
  2. Bitcoin Data (Bitcoin)
  • This dataset is taken from Coinmetrics. It contains transaction volume, transaction count, market cap, price and coins generated for Bitcoin from December 2013 to October 2017.
  3. Ethereum Data (eth)
  • This dataset is also taken from Coinmetrics. It contains transaction volume, transaction count, market cap, price and coins generated for Ethereum from August 2015 to October 2017.

Data Importing

Importing Datasets in R from csv file

# Folder containing the downloaded CSV files; change this to your own location
project_dir <- "C:/Users/AMOUL SINGHI/Downloads/BANA/Study/Fall 2017/Data Wrangling in R/R Project"

crypto  <- read_csv(file.path(project_dir, "all-crypto-currencies", "crytpo.csv"))
Bitcoin <- read_csv(file.path(project_dir, "Coinmetrics", "Bitcoin.csv"))
eth     <- read_csv(file.path(project_dir, "Coinmetrics", "eth.csv"))

Data Cleaning

I looked at the basic structure of all the datasets and noticed that the crypto dataset has some missing values, so I excluded those observations. I also converted the coin and symbol variables to factors, extracted three new variables (day, month and year) from the date, and changed the data type of the date from character to Date. Finally, I dropped the variables not required for my analysis: variance from the crypto dataset, and fees and generated coins from the Bitcoin and Ethereum datasets.

Cleaning crypto dataset and creating new variables

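# Inspect the structure of the raw data
names(crypto)
str(crypto)
summary(crypto)

# Coin name and ticker symbol are categorical
crypto$symbol <- as.factor(crypto$symbol)
crypto$coin <- as.factor(crypto$coin)

# Split the date string (day-month-year order) into parts, then rebuild a proper Date
crypto <- crypto %>%
  separate(date, c("day", "month", "year"), sep = "-")

crypto$date <- make_date(crypto$year, crypto$month, crypto$day)

# Variance is not used in the analysis
crypto <- crypto %>%
  select(-c(variance))

# Drop rows with missing values and confirm that none remain
crypto <- na.omit(crypto)
colSums(is.na(crypto))

crypto$day <- as.integer(crypto$day)
crypto$month <- as.integer(crypto$month)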

Cleaning Bitcoin Dataset and creating new variables

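# Inspect the structure of the raw data
summary(Bitcoin)
str(Bitcoin)

# Split the date string (year-month-day order) into parts, then rebuild a proper Date
Bitcoin <- Bitcoin %>%
  separate(date, c("year", "month", "day"), sep = "-")

Bitcoin$date <- make_date(Bitcoin$year, Bitcoin$month, Bitcoin$day)

# Fees and generated coins are not used in the analysis
Bitcoin <- Bitcoin %>%
  select(-c(fees, generatedCoins))

# Check for missing values, then give the columns short names
# (the renaming is positional, so it relies on the column order produced above)
colSums(is.na(Bitcoin))
colnames(Bitcoin)
colnames(Bitcoin) <- c("year", "month", "day", "txvolume", "txcount", "marketcap", "Price", "Volume", "date")

Bitcoin$day <- as.integer(Bitcoin$day)
Bitcoin$month <- as.integer(Bitcoin$month)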

Cleaning Ethereum dataset and creating new variables

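# Inspect the structure of the raw data
summary(eth)
names(eth)
str(eth)

# Split the date string (year-month-day order) into parts, then rebuild a proper Date
eth <- eth %>%
  separate(date, c("year", "month", "day"), sep = "-")

eth$date <- make_date(eth$year, eth$month, eth$day)

# Fees and generated coins are not used in the analysis
eth <- eth %>%
  select(-c(fees, generatedCoins))

# Check for missing values, then give the columns short names
# (the renaming is positional, so it relies on the column order produced above)
colSums(is.na(eth))
colnames(eth)
colnames(eth) <- c("year", "month", "day", "txvolume", "txcount", "marketcap", "Price", "Volume", "date")

eth$day <- as.integer(eth$day)
eth$month <- as.integer(eth$month)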
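The cleaning code above splits each date string with `separate()`, rebuilds it with `make_date()` and then converts day and month to integers (year stays a character column). A more compact alternative, sketched here in base R, parses the string once with `as.Date()` and reads the parts off the resulting `Date`, so all three parts come out as integers. The helper name `add_date_parts()` and its `fmt` argument are hypothetical, and the default `"%Y-%m-%d"` layout is an assumption that matches the year-month-day split used for the Coinmetrics files.

```r
# Parse a date string once and derive integer day, month and year columns.
# `fmt` describes the layout of the strings. "%Y-%m-%d" suits year-month-day;
# the crypto file, split as day-month-year, would need something like "%d-%m-%Y"
# (an assumption about that file's layout).
add_date_parts <- function(df, fmt = "%Y-%m-%d") {
  df$date  <- as.Date(df$date, format = fmt)
  df$day   <- as.integer(format(df$date, "%d"))
  df$month <- as.integer(format(df$date, "%m"))
  df$year  <- as.integer(format(df$date, "%Y"))
  df
}

# Toy data frame with illustrative values only
example <- data.frame(date = c("2017-09-20", "2013-04-28"), value = c(1, 2))
example <- add_date_parts(example)
print(example)
```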

Merging datasets

  • Merging Bitcoin with crypto to add volatility to the Bitcoin dataset.
  • Merging eth with crypto to add volatility to the eth dataset.

I will use this additional variable in building my logistic regression model.

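# Add the daily volatility from the crypto dataset, matching rows on date
Bitcoin <- crypto %>%
  filter(coin == "Bitcoin") %>%
  select(date, volatility) %>%
  inner_join(Bitcoin, by = "date")

eth <- crypto %>%
  filter(coin == "Ethereum") %>%
  select(date, volatility) %>%
  inner_join(eth, by = "date")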

Data after cleaning

Data Summary

Crypto Dataset

The crypto dataset has 4 variables related to the price of each cryptocurrency: the open, high, low and close prices for the coin. It also has the volume of coins traded and the total market cap for that day. Three more variables, day, month and year, are extracted from the date.

Information                      Crypto Dataset
Dimensions (rows × columns)      570,276 × 13
Date range                       20-Apr-13 to 22-Sep-17
Number of numerical variables    10
Number of factor variables       2
Number of cryptocurrencies       1,170
Oldest cryptocurrency            Bitcoin

Bitcoin Dataset

We joined the Bitcoin dataset with the crypto dataset on date to add one more variable, volatility, which will help the logistic regression predict whether the price of Bitcoin will go up or down in the next 24 hours.

Information                      Bitcoin Dataset
Dimensions (rows × columns)      1,637 × 10
Date range                       28-Apr-13 to 20-Sep-17
Number of numerical variables    8
Max price of Bitcoin             6,011 USD

Ethereum Dataset

We joined the Ethereum dataset with the crypto dataset on date to add one more variable, volatility, which will help the logistic regression predict whether the price of Ethereum will go up or down in the next 24 hours.

Information                      Ethereum Dataset
Dimensions (rows × columns)      806 × 10
Date range                       7-Aug-15 to 20-Sep-17
Number of numerical variables    7
Max price of Ethereum            401.5 USD

Analysis

Visualising Growth of Cryptocurrencies

Current Market Cap of Cryptocurrencies

The visualization above shows that the whole cryptocurrency market is dominated by two currencies, Bitcoin and Ethereum, and that even the second-ranked Ethereum is far behind Bitcoin, which drives the cryptocurrency market. It is also fascinating (and shocking at the same time) that Bitcoin and Ethereum together make up a 130 billion dollar (USD) market.
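The dominance described here is a market-share calculation: each coin's market cap divided by the total market cap of all coins on the same day. A minimal base-R sketch, assuming a data frame with `coin`, `date` and `market` columns; treating `market` as the name of the market-cap column, the helper name `market_share()` and the toy values are all assumptions for illustration.

```r
# Share of total market cap per coin on the most recent date in the data.
# Assumes columns `coin`, `date` (Date) and `market` (market cap in USD).
market_share <- function(df) {
  latest <- df[df$date == max(df$date), ]
  latest$share <- latest$market / sum(latest$market)
  latest[order(-latest$share), c("coin", "market", "share")]
}

# Toy data: three coins on two days (illustrative values only)
toy <- data.frame(
  coin   = c("A", "B", "C", "A", "B", "C"),
  date   = as.Date(c(rep("2017-09-19", 3), rep("2017-09-20", 3))),
  market = c(50, 30, 20, 60, 30, 10)
)
shares <- market_share(toy)
print(shares)
```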

How the Total Market Cap of Cryptocurrencies Has Increased

Over the last year, cryptocurrencies have been all over the news, primarily because of the rate at which Bitcoin has grown. The visualization above shows how the market cap of Bitcoin, Ethereum and the total crypto market has grown over the years. The blue area shows the combined market cap of all cryptocurrencies.
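The combined market cap in the blue area is the sum of every coin's market cap on each date. A minimal base-R sketch using `aggregate()`; the `market` column name, the helper name `total_market_cap()` and the toy values are assumptions for illustration.

```r
# Sum the market cap of every coin on each date to get the total market size over time.
# Assumes columns `date` (Date) and `market` (market cap in USD).
total_market_cap <- function(df) {
  out <- aggregate(market ~ date, data = df, FUN = sum)
  out[order(out$date), ]
}

# Toy data: two coins on two dates (illustrative values only)
toy <- data.frame(
  coin   = c("A", "B", "A", "B"),
  date   = as.Date(c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01")),
  market = c(10, 5, 20, 8)
)
totals <- total_market_cap(toy)
print(totals)
```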

How Bitcoin and Ethereum prices have increased over the years

The visualization above shows how the prices of Bitcoin and Ethereum have increased over the years. At first glance, Bitcoin appears to be growing at a higher rate than Ethereum. We will check this further in our analysis.

Have all cryptocurrencies performed well?

Although the total market cap of cryptocurrencies is more than 200 billion dollars, that doesn't mean all cryptocurrencies have done well. The visualization above shows that of the 348 currencies present at the start of 2016, around 30% have failed, i.e. lost more than 50% of their value, 10% are struggling and the remaining 60% are doing well so far.
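The classification can be expressed as a rule on each coin's price ratio, latest price divided by its price at the start of 2016. A base-R sketch: "failed" (lost more than 50% of value, ratio below 0.5) follows the text, while the boundary between "struggling" and "doing well" is not stated, so treating any remaining loss (ratio between 0.5 and 1) as struggling is an assumption. The function name `classify_coins()` and the prices are hypothetical.

```r
# Classify coins by the ratio of latest price to price at the start of 2016.
# Intervals are [0, 0.5) failed, [0.5, 1) struggling, [1, Inf) doing well;
# the 1.0 cut-off for "doing well" is an assumption, not taken from the analysis.
classify_coins <- function(start_price, latest_price) {
  ratio <- latest_price / start_price
  cut(ratio,
      breaks = c(-Inf, 0.5, 1, Inf),
      labels = c("failed", "struggling", "doing well"),
      right = FALSE)
}

# Four hypothetical coins that all started at 10
status <- classify_coins(start_price  = c(10, 10, 10, 10),
                         latest_price = c(2, 7, 10, 25))
print(table(status))
print(round(100 * prop.table(table(status))))  # percentage in each group
```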

Investment Analytics

Bitcoin and other cryptocurrencies are emerging as a favourite investment option, especially among techies. Many early adopters of Bitcoin have made millions from it, and some cryptocurrency analysts even say that the prices of Bitcoin and other currencies will increase further. Here we look at how much your investment would have grown if you had invested just 100 dollars every month in Bitcoin and Ethereum since they became famous.

Monthly Investment in Bitcoin

If you had invested 100 dollars every month in Bitcoin since the start of 2014, your investment would have grown to 43,566 dollars as of 1st October 2017, compared with a total investment of 4,600 dollars.
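A fixed monthly investment (dollar-cost averaging) can be valued by buying `amount / price` coins each month and pricing the accumulated coins at the final price. From January 2014 to October 2017 there are 46 monthly purchases, so the amount invested is 46 × 100 = 4,600 dollars. A base-R sketch, assuming one purchase per month at that month's price (for example the first-of-month prices, as `filter(day == 1)` selects in the time-series section) and no fees; the function name `dca_value()` and the prices are hypothetical.

```r
# Value of a fixed monthly investment (dollar-cost averaging).
# Each month `amount` buys amount / price coins; holdings are valued at `final_price`.
# Assumes one purchase per month and no transaction fees.
dca_value <- function(monthly_prices, final_price, amount = 100) {
  coins <- sum(amount / monthly_prices)
  c(invested = amount * length(monthly_prices),
    value    = coins * final_price)
}

# Illustrative prices only (not real Bitcoin data)
res <- dca_value(monthly_prices = c(100, 200, 400), final_price = 400)
print(res)
```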

Monthly Investment in Ethereum

If you had invested 100 dollars every month in Ethereum since the start of 2016, your investment would have grown to 83,946 dollars as of 1st October 2017, compared with a total investment of 2,200 dollars. Ethereum would therefore have given better returns than Bitcoin: roughly 38 times the amount invested, versus roughly 9.5 times for Bitcoin.

What if you had invested an equal amount in the top 10 cryptocurrencies at the start of 2017

The visualization above shows that since the start of 2017, NEM has given the highest returns, followed by Ethereum, Ripple and Dash.
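With an equal amount in each coin, every coin gets `total / n` dollars, and its final value is that stake multiplied by its price ratio (end price / start price). A base-R sketch, assuming no fees and no rebalancing; the function name `equal_weight_returns()`, the coin names and the prices are hypothetical.

```r
# Equal-amount investment across several coins at a common start date.
# Each coin gets total / n dollars; its value grows by end_price / start_price.
# Assumes no fees and no rebalancing.
equal_weight_returns <- function(start_price, end_price, total = 300) {
  stake <- total / length(start_price)
  value <- stake * end_price / start_price
  sort(value, decreasing = TRUE)   # best-performing coin first
}

# Illustrative prices for three hypothetical coins
value <- equal_weight_returns(
  start_price = c(coinA = 1, coinB = 2, coinC = 4),
  end_price   = c(coinA = 3, coinB = 2, coinC = 2)
)
print(value)        # final value of each 100-dollar stake
print(sum(value))   # final value of the whole portfolio
```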

Regression Analysis

As cryptocurrencies are emerging as a favourite investment option, we try to build a logistic model to predict whether the price of Bitcoin will rise or fall in the next 24 hours, and a time series forecast of Bitcoin prices for the next year.

Logistic Regression

Logistic regression to predict whether the price will go up or down in the next 24 hours

For the logistic regression we create 8 more variables: 5 give the ratio of each day's price to the previous day's price over the past 5 days, and 3 give the same day-on-previous-day ratio for transaction volume over the past 3 days.

Bitcoin_log <- Bitcoin %>%
  arrange(date) %>%
  select("date", "Price", "year", "txvolume")

Bitcoin_log$lag1 <- 0
Bitcoin_log$lag2 <- 0
Bitcoin_log$lag3 <- 0
Bitcoin_log$lag4 <- 0
Bitcoin_log$lag5 <- 0
Bitcoin_log$vollag1 <- 0
Bitcoin_log$vollag2 <- 0
Bitcoin_log$vollag3 <- 0

# Day-on-previous-day ratios, shifted back 0-4 days for price and 0-2 days for
# transaction volume. Rows 1-5 have no full history and keep the initial value 0.
for (i in 6:nrow(Bitcoin_log))
{
  Bitcoin_log$lag1[i] <- (Bitcoin_log$Price[i] / Bitcoin_log$Price[i-1])
  Bitcoin_log$lag2[i] <- (Bitcoin_log$Price[i-1] / Bitcoin_log$Price[i-2])
  Bitcoin_log$lag3[i] <- (Bitcoin_log$Price[i-2] / Bitcoin_log$Price[i-3])
  Bitcoin_log$lag4[i] <- (Bitcoin_log$Price[i-3] / Bitcoin_log$Price[i-4])
  Bitcoin_log$lag5[i] <- (Bitcoin_log$Price[i-4] / Bitcoin_log$Price[i-5])
  Bitcoin_log$vollag1[i] <- (Bitcoin_log$txvolume[i] / Bitcoin_log$txvolume[i-1])
  Bitcoin_log$vollag2[i] <- (Bitcoin_log$txvolume[i-1] / Bitcoin_log$txvolume[i-2])
  Bitcoin_log$vollag3[i] <- (Bitcoin_log$txvolume[i-2] / Bitcoin_log$txvolume[i-3])
}

# Target: 1 if today's price did not fall versus yesterday, 0 if it fell
Bitcoin_log$dir <- ifelse(Bitcoin_log$lag1 < 1, 0, 1)
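The loops above build the lag ratios one element at a time. The same features can be computed with whole-vector operations; below is a base-R sketch with a hypothetical helper `lag_ratio()`, where `k` is how many days the ratio is shifted back (`k = 0` corresponds to `lag1`/`vollag1`, `k = 1` to `lag2`/`vollag2`, and so on). Unlike the loop, it marks the first `k + 1` days, which have no history, as `NA` instead of 0, so `glm()` (with its default `na.action`) would drop them rather than treat them as real observations. The prices are illustrative.

```r
# Day-on-previous-day ratio shifted back by k days:
# k = 0 gives x[i] / x[i-1], k = 1 gives x[i-1] / x[i-2], and so on.
# The first k + 1 entries have no history and are NA.
lag_ratio <- function(x, k = 0) {
  n <- length(x)
  ratio <- c(NA, x[-1] / x[-n])        # ratio[i] = x[i] / x[i-1]
  c(rep(NA, k), head(ratio, n - k))    # shift the series back by k days
}

price <- c(100, 110, 99, 99, 198)      # illustrative values only
lag1 <- lag_ratio(price, 0)
lag2 <- lag_ratio(price, 1)
dir  <- ifelse(lag1 < 1, 0, 1)         # 1 if the price did not fall versus the previous day
print(data.frame(price, lag1, lag2, dir))
```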

#Logistic Regression
set.seed(1111)
# Randomly pick 80% of the rows (from row 6 on) for training; the rest are for testing
trainingRowIndex <- sample(6:nrow(Bitcoin_log), 0.8*nrow(Bitcoin_log))
trainingData <- Bitcoin_log[trainingRowIndex, ]  # model training data
testData  <- Bitcoin_log[-trainingRowIndex, ]    # model test data

# Predictors: volume ratios vollag2, vollag3 and price ratios lag4, lag5
glm.fit <- glm(dir ~ vollag3 + vollag2 + lag4 + lag5, data = trainingData, family = binomial)

# Predict on remaining data: probability that the price goes up
glm.probs <- predict(glm.fit, newdata = testData, type = "response")

# Classify as "up" (1) when the predicted probability exceeds 0.5
glm.pred <- rep(0, nrow(testData))
glm.pred[glm.probs > 0.5] <- 1

#Lets see how accurate is our prediction
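# caret expects predictions and reference as factors with the same levels
confusionMatrix(factor(glm.pred, levels = c(0, 1)), factor(testData$dir, levels = c(0, 1)))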
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0   8   2
##          1 129 189
##                                          
##                Accuracy : 0.6006         
##                  95% CI : (0.5454, 0.654)
##     No Information Rate : 0.5823         
##     P-Value [Acc > NIR] : 0.2697         
##                                          
##                   Kappa : 0.0551         
##  Mcnemar's Test P-Value : <2e-16         
##                                          
##             Sensitivity : 0.05839        
##             Specificity : 0.98953        
##          Pos Pred Value : 0.80000        
##          Neg Pred Value : 0.59434        
##              Prevalence : 0.41768        
##          Detection Rate : 0.02439        
##    Detection Prevalence : 0.03049        
##       Balanced Accuracy : 0.52396        
##                                          
##        'Positive' Class : 0              
## 

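From the confusion matrix above, the model correctly predicts whether the price of Bitcoin will go up or not about 60% of the time (accuracy 0.6006). This is only slightly better than the no-information rate of 0.5823, which is what always predicting a rise would achieve, and the difference is not significant (p = 0.2697). The model also catches only 8 of the 137 actual price falls.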
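These statistics can be recomputed directly from the 2 × 2 table, which makes the comparison with the no-information rate explicit. A base-R sketch using the counts printed above (rows are predictions, columns the actual direction, with 0 as the positive class as in the output); the one-sided binomial test at the end should reproduce the `P-Value [Acc > NIR]` line.

```r
# Confusion matrix from the output above: rows = prediction, columns = actual direction
# (0 = price fell, 1 = price rose). matrix() fills column by column.
cm <- matrix(c(8, 129, 2, 189), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

n           <- sum(cm)                          # 328 test days
accuracy    <- sum(diag(cm)) / n                # (8 + 189) / 328
nir         <- max(colSums(cm)) / n             # always predict "rise": 191 / 328
sensitivity <- cm["0", "0"] / sum(cm[, "0"])    # falls caught: 8 / 137
specificity <- cm["1", "1"] / sum(cm[, "1"])    # rises caught: 189 / 191

# One-sided test of whether the accuracy beats the no-information rate
p_value <- binom.test(sum(diag(cm)), n, p = nir, alternative = "greater")$p.value

print(round(c(accuracy = accuracy, nir = nir, sensitivity = sensitivity,
              specificity = specificity, p_value = p_value), 4))
```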

Time Series Prediction

Prediction of Bitcoin Value using Time Series
# Keep one observation per month (the 1st) to get a monthly price series
Bitcoin_time_mod <- Bitcoin %>%
  filter(day == 1) %>%
  arrange(date) %>%
  select(date, Price)

# Add time-series signature features (year, month, week, weekday, ...) derived from the date
Bitcoin_time_mod_aug <- Bitcoin_time_mod %>%
    tk_augment_timeseries_signature()

# Regress price on the signature features, excluding the raw date and the diff column
fit_lm <- lm(Price ~ ., data = select(Bitcoin_time_mod_aug, -c(date, diff)))

# Extract the date index and extend it 12 periods (months) into the future
Bitcoin_time_mod_idx <- Bitcoin_time_mod %>%
    tk_index()

future_idx <- Bitcoin_time_mod_idx %>%
    tk_make_future_timeseries(n_future = 12)

# Build the same signature features for the future dates
new_data_tbl <- future_idx %>%
    tk_get_timeseries_signature()

# Forecast the price for the next 12 months
pred <- predict(fit_lm, newdata = select(new_data_tbl, -c(index, diff)))

predictions_tbl <- tibble(date = future_idx, value = pred)

# Plot the monthly history (orange) and the forecast (blue)
Bitcoin_time_mod %>%
    ggplot(aes(x = date, y = Price)) +
    # Training data
    geom_line(color = "orange") +
    geom_point(color = "orange") +
    # Predictions
    geom_line(aes(y = value), color = "blue", data = predictions_tbl) +
    geom_point(aes(y = value), color = "blue", data = predictions_tbl) +
    # Aesthetics
    theme_tq() +
    labs(title = "Bitcoin Price Forecast: Time Series Machine Learning",
         subtitle = "Multivariate linear regression on time-series signature features")

The graph above shows the linear model's forecast of the Bitcoin price for the next 12 months.

Summary

Insights

We started the analysis with the aim of understanding how the overall cryptocurrency market has grown over the years and what returns investing in Bitcoin and Ethereum would have given.

We see that Bitcoin dominates the cryptocurrency market with a market share of more than 50%, followed by Ethereum at around 20%; together these two have a market cap of around 130 billion dollars out of a total market cap of around 200 billion dollars. We also learned that not all cryptocurrencies have performed well: around 30% of the currencies present at the start of 2016 have since lost more than half their value.

Though risky, cryptocurrency is one of the fastest-growing investment opportunities in the market today. Bitcoin is the most famous cryptocurrency, but Ethereum, NEM and many others have given higher returns. For example, a monthly investment of 100 dollars in Bitcoin since 2014 would have grown to 43,566 dollars, while a monthly investment of 100 dollars in Ethereum since 2016 would have grown to 83,948 dollars.

To make cryptocurrency trading more profitable, we tried to predict whether the price of Bitcoin will increase or decrease in the next 24 hours using logistic regression. The model is right about 60% of the time, only marginally better than the 58% achieved by always predicting a rise. For investment purposes, we also tried to forecast Bitcoin prices for the next year using a time series method. Unfortunately, we are not able to predict Bitcoin prices effectively with the given data, as the price of Bitcoin is highly volatile.

Limitations and Further Scope

  1. There are many more cryptocurrencies that could be analysed; we have limited our analysis mainly to Bitcoin and Ethereum.
  2. We could predict the price of Bitcoin using more complex and better models, such as a random walk or ARIMA.
  3. There were some missing values in our dataset; if we could get the data for those observations as well, our analysis would give more accurate results.
  4. We could also compare the returns of various cryptocurrencies, taking each one's launch date as the starting point, to see which cryptocurrency has given the highest returns.
  5. Bitcoin was split into Bitcoin and Bitcoin Cash; we could include Bitcoin Cash in our investment analytics as well.
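Item 2 mentions random-walk models. A random walk with drift is also a useful baseline that any more complex model should beat: each future value is the last observed value plus the average historical one-step change. A minimal base-R sketch; the function name `rw_drift_forecast()` and the prices are hypothetical, and the default 12-step horizon mirrors the 12-month forecast in the time-series section. The forecast package's `rwf()` with `drift = TRUE` fits the same model ready-made.

```r
# Random-walk forecast with drift: last observed value plus the average
# one-step change, extended h steps ahead. Illustrative sketch, not a fitted model.
rw_drift_forecast <- function(y, h = 12) {
  n <- length(y)
  drift <- (y[n] - y[1]) / (n - 1)   # equals mean(diff(y))
  y[n] + drift * seq_len(h)
}

monthly_price <- c(100, 120, 130, 160)   # illustrative values only
fc <- rw_drift_forecast(monthly_price, h = 3)
print(fc)
```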