Analyzing Crypto Currency Data with R

---
title: "Manipulating Cryptocurrency Data with R"
output: html_notebook
---
Preface:

This short document walks through basic data manipulation for the main
cryptocurrencies on the market, using the R language to clean the data so that
it can later be turned into meaningful findings.
It is written as a step-by-step guide: you are free to follow the examples with
the code given, or to modify the code to suit your own preferences.


Stage 1 - Accessing and Importing the Data

To access the available cryptocurrency data, we can use APIs, which make the
job considerably easier.
There are several sources to choose from. For example, you can query the
Coinbase API with the following scripts.

First, let's load the necessary libraries. If one is missing, you can install
it with the install.packages("name") function.


```r
rm(list = ls())  # clear all objects from the global environment
# import the libraries we need
library(jsonlite)
library(glue)
library(tidyverse)
library(dplyr)
library(tidyquant)
library(xts)
library(ggplot2)
library(hrbrthemes)
```

Note: any package that is missing on your machine can be installed with install.packages("name") before you load it.

After importing the libraries, we retrieve the daily price data for a trading pair from the Coinbase JSON endpoint, assign a name to each column, and write the result to a csv file.

```r
# create a function to retrieve daily data
retreive_daily_data <- function(pair, filename) {
  # granularity = 86400 seconds requests one candle per day
  url <- glue("https://api.pro.coinbase.com/products/{pair}/candles?granularity=86400")
  columnNames <- c('unix', 'low', 'high', 'open', 'close', glue('{pair} volume'))
  mydata <- fromJSON(url)      # retrieve the data from the JSON url
  df <- as.data.frame(mydata)  # convert the parsed response to a data frame
  colnames(df) <- columnNames  # rename the columns

  write.csv(df, file = filename)  # save to disk (row names become the first column)
}
```
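To see what the function does to the API response without making a network call, the sketch below parses a small hand-written JSON string in the same array-of-arrays shape. The two sample rows reuse the figures shown in the Stage 2 output later in this document, with Unix timestamps computed for those dates; the names sample_json and candles are only for illustration.

```r
library(jsonlite)

# Two daily candles in the order unix, low, high, open, close, volume.
sample_json <- '[[1622419200, 1.5260, 1.6946, 1.5769, 1.6747, 104610138],
                 [1622332800, 1.3468, 1.7000, 1.4043, 1.5769, 165291552]]'

candles <- fromJSON(sample_json)  # equal-length arrays are simplified to a matrix
df <- as.data.frame(candles)      # default column names are V1 ... V6
colnames(df) <- c("unix", "low", "high", "open", "close", "ADA-USD volume")

str(df)
```

Each inner array becomes one row, which is why the column names have to be supplied in the same order as the values in the response.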

Next, we download the data and save it locally as a csv file. For our example, we have chosen ADA, the native token of Cardano, an open-source, decentralized public blockchain platform that has attracted high expectations in the crypto market.

```r
newPair <- "ADA-USD"
fileName <- glue("dailyData{newPair}.csv")  # build the output file name from the pair
runFunc <- retreive_daily_data(newPair, filename = fileName)
runFunc  # NULL: the function is called for its side effect of writing the file
```

```
## NULL
```

By now the file should be in your working directory in csv format, under the name dailyDataADA-USD.csv. This is what it should look like:

Image of Excel file

You can use the following code to read the csv file and assign it to a data frame.

(Note: replace the path inside the parentheses with your own directory.)

```r
ADA <- read.csv("C:/Users/xhoni/Documents/Crypto Currency/dailyDataADA-USD.csv")
```

If you explore the data, it is easy to spot unnecessary columns and values that need to be adjusted. This brings us to the next stage.

Stage 2 - Cleaning the Data

As you may notice, the imported date is given in Unix epoch format (seconds since 1970-01-01 UTC). We need to convert it to a regular date to perform time-based analysis.

If it is not already available, install the anytime package first, then load it:

```r
library(anytime)
```

Now you can convert the unix column to a regular date with the anydate function, as below:

```r
Date <- anydate(ADA$unix)
```

Next, replace the Unix numerals in the column with the newly created dates:

```r
ADA$unix <- Date
```

And finally, rename the unix column to Date:

```r
ADA <- rename(ADA, 'Date'='unix')
```
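For comparison, the same conversion can be sketched in base R without the anytime package. This is an alternative, not the method used above; the unix_seconds vector is a stand-in for ADA$unix, and the time zone is fixed to UTC on the assumption that the daily candles start at midnight UTC.

```r
# Stand-in for ADA$unix: midnight UTC on 2021-05-31 and 2021-05-30.
unix_seconds <- c(1622419200, 1622332800)

# Seconds since 1970-01-01 UTC -> date-time -> calendar date.
date_time <- as.POSIXct(unix_seconds, origin = "1970-01-01", tz = "UTC")
dates <- as.Date(date_time, tz = "UTC")

print(dates)
# [1] "2021-05-31" "2021-05-30"
```

Fixing the time zone matters because a timestamp at midnight UTC, converted in a time zone behind UTC, falls on the previous calendar day.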

Now that we have converted the Unix timestamps to a more readable format, we can take a look at the data and see what else needs fixing.

```r
head(ADA, 5)
```

```
##   X       Date    low   high   open  close ADA.USD.volume
## 1 1 2021-05-31 1.5260 1.6946 1.5769 1.6747      104610138
## 2 2 2021-05-30 1.3468 1.7000 1.4043 1.5769      165291552
## 3 3 2021-05-29 1.3358 1.5615 1.5133 1.4051      156390351
## 4 4 2021-05-28 1.4501 1.6800 1.6565 1.5141      196815551
## 5 5 2021-05-27 1.6100 1.7900 1.7822 1.6567      128226845
```

Let's remove the X column, since it is not needed for our calculations:

```r
ADA[, c('X')] <- list(NULL)
```
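The X column appears because write.csv stores row names as an unnamed first column by default, which read.csv then labels X. As a small sketch, the round trip below uses temporary files and a two-row data frame reusing values from the output above (demo, with_names and without_names are illustrative names) to show that passing row.names = FALSE when writing avoids the extra column in the first place.

```r
demo <- data.frame(unix = c(1622419200, 1622332800),
                   close = c(1.6747, 1.5769))

path_with <- tempfile(fileext = ".csv")
path_without <- tempfile(fileext = ".csv")

write.csv(demo, file = path_with)                        # default: row names written
write.csv(demo, file = path_without, row.names = FALSE)  # row names skipped

with_names <- read.csv(path_with)
without_names <- read.csv(path_without)

names(with_names)     # "X" "unix" "close"
names(without_names)  # "unix" "close"
```

Either approach works; dropping the column after reading, as done above, has the advantage of leaving the download function unchanged.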

To remove several columns at once, extend the previous statement by listing every column you want to drop inside the brackets and assigning NULL to them. For our example, let's remove all the columns except the date and the closing price.

```r
ADA[, c('low','high','open','ADA.USD.volume')] <- list(NULL)
```
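When only a few columns are needed, it can be simpler to name the columns to keep rather than those to drop. The sketch below shows this with base R indexing on a small stand-in data frame (ada_demo and ada_small are hypothetical names; the values are taken from the first two rows printed above).

```r
ada_demo <- data.frame(
  Date = as.Date(c("2021-05-31", "2021-05-30")),
  low = c(1.5260, 1.3468),
  high = c(1.6946, 1.7000),
  open = c(1.5769, 1.4043),
  close = c(1.6747, 1.5769),
  ADA.USD.volume = c(104610138, 165291552)
)

# Keep only the listed columns; everything else is dropped.
ada_small <- ada_demo[, c("Date", "close")]

names(ada_small)  # "Date" "close"
```

Listing the columns to keep also protects the script if the source file later gains extra columns.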

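Finally, let's create a copy of the data with the close column renamed to price for easier reference. The copy is stored in a new data frame called price, so ADA itself keeps its close column, which the charts in Stage 3 rely on:

```r
price <- rename(ADA, 'price'='close')
```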

Now that the data looks clean (cleaning is usually not this easy), it’s time to create some basic visualizations.

Stage 3 - Basic Visualizations

We can start with a simple line chart, which shows the trend of the ADA closing price. To draw it, we can use the following code:

```r
ggplot(ADA, aes(x = Date, y = close)) +
  geom_line()
```

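To be more specific, we can plot the trend for the last 10 days and mark each day with a point. Because the rows are ordered from newest to oldest, the 10 most recent days are the first 10 rows. Let's see what it looks like:

```r
ADA %>%
  head(10) %>%  # first 10 rows = 10 most recent days (data is sorted newest first)
  ggplot(aes(x = Date, y = close)) +
  geom_line() +
  geom_point()
```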

If you’d like to refine the look of the visualizations, you can layer additional ggplot2 functions, such as labels, scales, and themes, on top of the code above.
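As one possible illustration of such refinements, the sketch below adds a title, axis labels, a date format on the x axis, and a built-in theme. It uses a five-row stand-in data frame (ada_demo, built from the closing prices printed in Stage 2) so that it runs on its own; the title text and styling choices are arbitrary.

```r
library(ggplot2)

ada_demo <- data.frame(
  Date = as.Date(c("2021-05-31", "2021-05-30", "2021-05-29",
                   "2021-05-28", "2021-05-27")),
  close = c(1.6747, 1.5769, 1.4051, 1.5141, 1.6567)
)

p <- ggplot(ada_demo, aes(x = Date, y = close)) +
  geom_line(color = "steelblue") +
  geom_point(color = "steelblue") +
  scale_x_date(date_labels = "%d %b") +  # day of month and abbreviated month name
  labs(title = "ADA-USD daily closing price",
       x = "Date", y = "Closing price (USD)") +
  theme_minimal()

print(p)
```

To apply the same styling to the real data, replace ada_demo with ADA.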

Xhoni Shollaj

6/2/2021