Project Question - Is Ethereum Stable?

Worldwide speculation is rampant in light of cryptocurrency reaching unprecedented heights in 2017.
The following report explores the stability of Ethereum using regression and time series analysis.



PART I: Data Summary


Starting Dataset

The starting dataset was inspired by, and initially pulled from, Kaggle datasets. I updated the set and imputed data from ethereum.io to cover a two-year time span.
See file “00_dataimputatation.xls” for details. I also added Bitcoin (BTC) and the Swiss franc (CHF) as points of comparison.


Stable Defined

Cryptocurrency is not considered stable in the traditional sense. However, within the scope of this project, we will use two interpretations to decide whether Ethereum is stable.

  1. If relative price changes converge
  2. If a model can converge to minimize prediction error (accuracy)


Variables Defined

t: time in days (periods 1-732)
timestamp: epoch time
num.address: unique address growth rate
blocksize: average daily blocksize
price.usd: daily closing price in USD
hashrate: measure of miner calculations occurring in network
tot.eth.growth: rank of cryptocurrency = price * circulating supply
market.cap.value: rank of cryptocurrency = price * circulating supply (same as tot.eth.growth)
transactions: # transactions that day
date: date of measurement

BTC: Bitcoin in USD
CHF: Swiss Franc in USD

us.cpi: Consumer Price Index (normalized to 100 for the 1982-84 average; flattened monthly to daily)
cpi.delta: Change from the preceding month (flattened monthly to daily)
cpi.rel.delta: Relative change cpi.delta / current month cpi (flattened monthly to daily)

Additional variables derived

where: Change in value = f(t) - f(t-1)
Relative change = change in value / price.usd
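
A minimal Python sketch of these two definitions follows (the original analysis was done in R). The function names and sample prices are made up for illustration, and it assumes each currency's relative change is divided by that currency's own current-day value, which is how I read "change in value / price.usd".

```python
def price_delta(prices):
    """Change in value: f(t) - f(t-1). The first day has no predecessor, so it is skipped."""
    return [curr - prev for prev, curr in zip(prices, prices[1:])]


def relative_delta(prices):
    """Relative change: (f(t) - f(t-1)) / f(t), dividing by the current day's price."""
    return [(curr - prev) / curr for prev, curr in zip(prices, prices[1:])]


sample = [10.0, 12.0, 9.0, 9.0]  # hypothetical daily closing prices
print(price_delta(sample))
print(relative_delta(sample))
```

Because the denominator is the current day's price, a price of exactly 0 would raise a division error, and a relative change computed this way can never exceed 1.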

eth.price.delta: change in ether value
eth.rel.delta: relative change in ether value
btc.price.delta: change in bitcoin value
btc.rel.delta: relative change in bitcoin value
chf.price.delta: change in Swiss franc value
chf.rel.delta: relative change in Swiss franc value

Additional variables derived, cont’d

log.ether: log of the Ether price, to study compression of outliers
week: time series split in 7-day increments
month: month
month.yr: month and year
day: day of week


*Note: variables t, timestamp, and date all refer to the same information. Due to collinearity, these variables will not be used simultaneously in a regression or classification model. The varying derived formats allow for additional pattern exploration.



Data Cleaning

Changed all price values of Ethereum from 0 to 0.0000000001 to avoid dividing by 0.
Imputed February 29th, 2016 as the average of February 28th and March 1st.
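
The two cleaning steps can be sketched in Python as below. This is an illustration under assumptions rather than the code actually used: the function names are hypothetical, and only the replacement value and the averaging rule come from the report.

```python
EPSILON = 0.0000000001  # replacement value stated in the report


def replace_zero_prices(prices, epsilon=EPSILON):
    """Swap exact zeros for a tiny positive value so later divisions are defined."""
    return [epsilon if p == 0 else p for p in prices]


def impute_midpoint(day_before, day_after):
    """Fill a missing day with the average of its two neighbouring days."""
    return (day_before + day_after) / 2


print(replace_zero_prices([0, 0.75, 1.2]))
print(impute_midpoint(10.0, 14.0))
```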

Data type conversions

Converted Swiss franc to numeric
Converted date to a date type
Flattened CPI from month to daily
Converted variables month, month.yr, and day to factors

As seen above, the price of Ether has risen steadily over time, with significant highs and lows.



Overview of currency price fluctuations

For comparison, Ether is plotted against the most famous cryptocurrency, Bitcoin (BTC); one of the best-known stable currencies, the Swiss Franc (CHF); and the US Consumer Price Index.




PART II: Methodology and Findings

*Because the methodology consists of several exploratory techniques and approaches, findings are reported within each section, organized by topic.


Method I: Regression and Classification

Data is cleaned and transformed into numeric and factor values for greater ease in regression and classification.

With regression and classification, we can study Ethereum fluctuations to see whether it behaves similarly to the other variables. Linear regression is not expected to work as a valid predictor because of potential trends and fluctuations.

Data can be broken into three main categories.

  1. Currency factors. These create a comparative baseline against both the stable Swiss Franc and the famously unstable Bitcoin, which is speculated to be here to stay (though perhaps in a diminished capacity once government regulation sets in). They include sub-categories such as relative delta and price change.

  2. Ethereum attributes. Attributes such as blocksize, market cap, and hashrate are suspected to scale with price, which would make them more informative than predictive.

  3. Time. Various time groupings to search for cyclical patterns in regression and time series.



Data Correlation Exploration

To better understand the baseline, below is a correlation grid comparing Ether, BTC, the Swiss Franc, and CPI.


Bitcoin and Ethereum have a correlation of 0.92, which is very high. Bitcoin also has a relatively high correlation with the consumer price index at 0.72. Conversely, Bitcoin and Ethereum each have a relatively low correlation with the Swiss Franc.

This graph helps create a solid starting point.
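
For reference, each cell in a grid like this is a pairwise Pearson correlation coefficient. A minimal Python sketch of that calculation is given here; the function name and the toy series are hypothetical and are not the report's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (each must vary at least once)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rising = [1.0, 2.0, 3.0, 4.0]
also_rising = [2.0, 4.0, 6.0, 8.0]
falling = [4.0, 3.0, 2.0, 1.0]
print(pearson(rising, also_rising))  # perfectly aligned series
print(pearson(rising, falling))      # perfectly opposed series
```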




Exploring relative price data

I had hoped that taking the ratio would bring the currency relationships to a similar scale. However, when I attempted to plot it, the difference f(t) - f(t-1) amplified the noise in outliers and became difficult to use. I tried eliminating outliers and applying logarithmic compression, but this seemed redundant since the currency variables contained similar data. Had the project allowed more time, I might have considered heavy smoothing techniques.

A sample of one such noisy plot can be observed below.




Exploring Regression

As the plot curve is not linear, it will likely require a non-linear fit. However, linear regression is still useful to explore for insight into relationships, especially since some data characteristics are correlated.

Paring down further…


After several exploratory attempts, including additional correlation tests and graphical plots not included in full in this report, the following starting data is selected to reduce collinearity:

Output variable: price.usd (Ether)

date: measured in days

us.cpi: CPI, in USD
BTC: Bitcoin
CHF: Swiss Franc

hashrate: measure of miner calculations occurring in network
market.cap.val: rank of cryptocurrency = price * circulating supply

month: month (as factor)
day: day of week (as factor)

Items of particular note:

Addresses and hashrate are almost identically correlated, and blocksize is very close as well.
Ethereum addresses and BTC are correlated at 0.95, which is perhaps unsurprising since BTC and Ether are highly correlated.
*CPI and Market Cap Value are correlated at 0.97, which might merit a deeper dive in a future study beyond the scope of this report.

Next is to build the regression model.


                    Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         8125.926    3790.120    2.144     0.032
date                  -0.317       0.235   -1.348     0.178
BTC                    0.137       0.004   38.954     0.000
CHF                   -0.031       0.012   -2.676     0.008
us.cpi               -15.870       2.413   -6.577     0.000
market.cap.val         0.000       0.000    1.424     0.155
month02              -10.285       3.374   -3.048     0.002
month03               -6.779       3.793   -1.787     0.074
month04               -0.992       3.843   -0.258     0.796
month05              -11.903       3.841   -3.099     0.002
month06               13.812       4.233    3.263     0.001
month07              -17.670       4.220   -4.187     0.000
month08                2.935       4.565    0.643     0.520
month09                4.458       4.585    0.972     0.331
month10                6.679       4.154    1.608     0.108
month11                3.632       3.438    1.056     0.291
month12               -8.138       3.454   -2.356     0.019
dayMonday              0.962       2.460    0.391     0.696
daySaturday            0.120       2.452    0.049     0.961
daySunday              1.109       2.458    0.451     0.652
dayThursday           -0.807       2.452   -0.329     0.742
dayTuesday             0.631       2.459    0.257     0.798
dayWednesday           0.371       2.459    0.151     0.880

The fit looks poor: the intercept is 8125.926 with a standard error of 3790.120, even though the price of Ether is generally less than $500.

BTC, the Swiss Franc, CPI, and specific months come out as statistically significant. The months are February, May, June, July, and, more weakly, December.

Let’s explore further: how does the model look without Bitcoin? This won’t solve any linearity issues, but it might make room for other factors to become more important.
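
As a reading aid for the coefficient table, the "t value" column is the estimate divided by its standard error. The short Python sketch below (the function name is hypothetical) checks that relationship for the intercept row.

```python
def t_value(estimate, std_error):
    """t statistic of a regression coefficient: estimate / standard error."""
    return estimate / std_error


# Intercept row of the first model: Estimate 8125.926, Std. Error 3790.120
print(round(t_value(8125.926, 3790.120), 3))  # 2.144
```

Rows whose printed values are heavily rounded (for example BTC, with a standard error shown as 0.004) will not reproduce exactly from the table alone.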

                    Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)       119852.352    4389.562   27.304     0.000
date                  -7.256       0.273  -26.613     0.000
CHF                   -0.222       0.019  -11.837     0.000
us.cpi               -78.752       3.177  -24.785     0.000
market.cap.val         0.000       0.000   27.961     0.000
month02              -29.715       5.912   -5.026     0.000
month03              -60.456       6.261   -9.657     0.000
month04              -53.014       6.385   -8.303     0.000
month05              -44.705       6.639   -6.733     0.000
month06               15.109       7.499    2.015     0.044
month07              -37.151       7.424   -5.004     0.000
month08              -83.201       7.075  -11.761     0.000
month09              -87.030       6.976  -12.476     0.000
month10              -67.947       6.530  -10.406     0.000
month11              -31.741       5.876   -5.402     0.000
month12              -43.663       5.902   -7.398     0.000
dayMonday              1.997       4.358    0.458     0.647
daySaturday            1.017       4.345    0.234     0.815
daySunday              0.940       4.355    0.216     0.829
dayThursday           -0.379       4.345   -0.087     0.931
dayTuesday             1.884       4.356    0.432     0.666
dayWednesday           0.621       4.356    0.142     0.887

The intercept problem only gets worse: it is now 119852.352. While several other variables are now rated as significant, this isn’t a practical solution.



Further diagnostics using plot


Residuals vs. Fitted: Data points cluster to the left and indicate some correlation.

Normal Q-Q plot: Indicates a non-normal distribution based on significant deviance.

Scale-Location plot: Indicates trends… not good.

Residuals vs. Leverage: Some outliers seem to carry significant weight.


In a final attempt to resolve issues, the data is culled to just a few variables. While the data becomes more normally distributed, all the other issues remain.



As Bender from Futurama might say… my biggest regret was not giving up sooner!

Moving on…


Method II: Time Series Forecasting

Second is a time series analysis which leverages the forecast library and time series datasets.

First, a plot of Ether in R as a time series object with a line to emphasize fit challenges.



Bear in mind Ethereum is heavily weighted by the explosive activity in 2017.

The first part of the time series (when Ether is at or near 0) is distorted when taking the log.
This can be cleaned up by removing that first part of the series.

By compressing the data, a smoother pattern can be observed.
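
One way to picture the trim-then-log step is the Python sketch below. The report does not say where the series was cut, so the threshold here is purely a placeholder, as are the function name and sample values.

```python
import math


def trim_and_log(prices, threshold):
    """Drop the leading stretch at or below `threshold`, then take natural logs."""
    start = next(i for i, p in enumerate(prices) if p > threshold)
    return [math.log(p) for p in prices[start:]]


raw = [0.0000000001, 0.0000000001, 0.9, 1.2, 8.0, 300.0]  # hypothetical prices
print(trim_and_log(raw, threshold=0.5))
```

Without the trim, the log of a near-zero price is a large negative number (about -23 for 0.0000000001), which dominates the plot.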

Time Series Analysis

The literature supporting the following R time series techniques and analysis can be found at: https://www.datascience.com/blog/introduction-to-forecasting-with-arima-in-r-learn-data-science-tutorials

One of the big concerns for time series is assessing stationarity, defined as follows:

Data is stationary when mean, variance and autocovariance are time invariant.
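
An informal way to see what "time invariant" means is to compare the mean and variance of the first and second halves of a series. The Python sketch below does that on two toy series; it is only an eyeball check with hypothetical names, not a substitute for a formal test.

```python
import statistics


def half_stats(series):
    """Mean and population variance of the first and second halves of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return {
        "mean": (statistics.fmean(first), statistics.fmean(second)),
        "variance": (statistics.pvariance(first), statistics.pvariance(second)),
    }


trending = [float(t) for t in range(1, 21)]  # steadily rising: the mean shifts
flat = [1.0, -1.0] * 10                      # oscillates around 0: the mean stays put
print(half_stats(trending))
print(half_stats(flat))
```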

While at a casual glance Ether looks non-stationary, to gain supporting evidence the data is run through the Augmented Dickey-Fuller (ADF) test. Null hypothesis: the series is non-stationary.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  log.ether.ts2
## Dickey-Fuller = -1.5134, Lag order = 8, p-value = 0.7843
## alternative hypothesis: stationary

With a p-value of 0.7843, the null hypothesis cannot be rejected. Ether (as expected) cannot be shown to be stationary, so it will be treated as non-stationary.
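
To give a feel for what the test statistic measures, here is a deliberately simplified Python sketch. It is the plain (non-augmented) Dickey-Fuller regression with an intercept only, so it omits the lagged differences and trend term that the R test output above includes, and it produces no p-value. The function name and simulated series are hypothetical.

```python
import math
import random


def dickey_fuller_stat(series):
    """t statistic of b in the regression  dy_t = a + b * y_(t-1) + e_t."""
    lagged = series[:-1]
    diffs = [curr - prev for prev, curr in zip(series, series[1:])]
    n = len(diffs)
    mean_lag = sum(lagged) / n
    mean_diff = sum(diffs) / n
    sxx = sum((x - mean_lag) ** 2 for x in lagged)
    sxy = sum((x - mean_lag) * (d - mean_diff) for x, d in zip(lagged, diffs))
    slope = sxy / sxx
    intercept = mean_diff - slope * mean_lag
    rss = sum((d - intercept - slope * x) ** 2 for x, d in zip(lagged, diffs))
    std_error = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_error


rng = random.Random(42)  # fixed seed so the demo is repeatable
noise = [rng.gauss(0, 1) for _ in range(500)]  # stationary by construction
walk = []
level = 0.0
for shock in noise:  # random walk: running sum of the same shocks
    level += shock
    walk.append(level)

print(dickey_fuller_stat(noise))  # strongly negative
print(dickey_fuller_stat(walk))   # far less negative than the noise series
```

The more negative the statistic, the stronger the evidence against a unit root. The statistic is judged against Dickey-Fuller critical values rather than the usual t table.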



Try #2: Decomposition

Three components (seasonality, trend, and cycle) need to be assessed to see whether they exist in the data and, if so, separated out in order to build a model.

Seasonal - patterns found to repeat over a period of time defined as a season (often occurring in a month, week, or even year)

Trend - whether the overall pattern is increasing or decreasing over time. In this case, one can see visually that Ethereum trends upward.

Cycle - sometimes grouped with trend, cycle refers to patterns that are not seasonal, often associated with moving averages.

Residual - leftover data not attributed to seasonal, cycle, or trend components.

Smoothing

By creating a moving average, the fluctuations can be smoothed into a more aggregated pattern that reduces the noise. This also helps with discerning predictability.
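
A centered moving average can be sketched in a few lines of Python. This is an illustration with hypothetical names, restricted to odd window sizes for simplicity; it is not the routine from the forecast library.

```python
def centered_moving_average(series, window):
    """Average each point with its neighbours; None where the window does not fit."""
    if window < 1 or window % 2 == 0:
        raise ValueError("this sketch only handles odd window sizes")
    half = window // 2
    smoothed = [None] * len(series)
    for i in range(half, len(series) - half):
        smoothed[i] = sum(series[i - half:i + half + 1]) / window
    return smoothed


prices = [1, 2, 3, 4, 5]  # hypothetical values
print(centered_moving_average(prices, 3))  # [None, 2.0, 3.0, 4.0, None]
```

Note the missing values at both ends: the wider the window, the more of the series is lost, which matters for a series only two years long.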



Unfortunately there is a tradeoff. Even in the annualized model the trend still wavers, it cuts off and becomes unusable for significant parts of the series, and the error is large. A better fit would require a model that is less smooth (and still non-stationary).

A final demonstration of the limitations: detrending at the middle ground (monthly).



As can be seen above, this only smooths small fluctuations and becomes ineffectual as price fluctuations increase.



Let’s try a slightly different view of related methods and look at decomposition.

Decomposition provides four results.

Observed: the native data
Trend: a trend line across the data set, increasing or decreasing
Seasonal: repeatable pattern across time
Random: residuals, the leftover that presents as error
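
The four results above can be reproduced in miniature with a classical additive decomposition. The Python sketch below is a simplified illustration with hypothetical names (odd periods only, toy data) and is not the R routine used for the report.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal, and random parts (additive model)."""
    n = len(series)
    if period % 2 == 0 or n < 2 * period:
        raise ValueError("sketch needs an odd period and at least two full periods")
    half = period // 2
    # Trend: centered moving average over one full period.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period
    # Seasonal: average detrended value at each position within the period.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    raw = [sum(b) / len(b) for b in buckets]
    center = sum(raw) / period  # re-center so the seasonal effects sum to zero
    pattern = [r - center for r in raw]
    seasonal = [pattern[i % period] for i in range(n)]
    # Random: whatever trend and seasonality do not explain.
    random_part = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return {"observed": list(series), "trend": trend,
            "seasonal": seasonal, "random": random_part}


cycle = [1, -1, 0]  # toy 3-step seasonal pattern
toy = [t + cycle[t % 3] for t in range(12)]  # rising trend plus the pattern
parts = decompose_additive(toy, period=3)
print(parts["trend"])
print(parts["seasonal"])
print(parts["random"])
```

On this toy series the random component is zero wherever it is defined, because the data contain nothing but trend and season. On real prices, large swings in the random component are what signal a poor fit.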



Of greatest interest is the random component, which is the remainder (error). Even in decomposition, the error fluctuates wildly, indicating instability. This is particularly bad, since in decomposition anomalies are smoothed (averaged) into the trend.

Possible problem: Moving averages do not perform well with outliers. Cryptocurrency has wide changes both upward and downward in value.




Differencing

Differencing defined: take the difference between period t and period t-1, so that \(Y_{d_t} = Y_t - Y_{t-1}\). Differencing can be applied more than once; the second-order difference is: \[Y_{d2_t} = Y_{d_t} - Y_{d_{t-1}} = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2})\]

The goal is to transform trends out of the data. This worked poorly in the regression section, where it amplified changes. However, is there a common relative change that operates within a corridor?
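
A short Python sketch of first- and second-order differencing follows. The function name and sample values are hypothetical.

```python
def difference(series, order=1):
    """Apply Y_t - Y_(t-1) repeatedly; each pass shortens the series by one."""
    out = list(series)
    for _ in range(order):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


squares = [1, 4, 9, 16, 25]  # a curved (quadratic) trend
print(difference(squares))           # [3, 5, 7, 9]   still trending
print(difference(squares, order=2))  # [2, 2, 2]      trend removed
```

Each second-order value equals Y_t - 2*Y_(t-1) + Y_(t-2), which is the expanded form of the difference of differences.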

Below is a density plot and a few statistics:

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -0.348817 -0.030816  0.000000  0.003528  0.035826  0.314961


Visualized as a spread, the relative change could be viewed as bounded but unstable across the entire two years, with a few “quiet” periods but mostly heavy fluctuation.


How does this fluctuation compare with BTC, Swiss Franc, and CPI?

Bitcoin and CPI fall within the same realm, as might be expected from their high correlation. The Swiss Franc has some outliers that would need culling in a further analysis outside of this report.




Summary of fluctuations

For the purposes of visualization, the next plot wraps the relative price plots for Ether, BTC, and CPI into one figure.




Next is a table showing basic statistics.

                   max           min        mean     median      st.dev
ether        0.3149606    -0.3488168   0.0035277  0.0000000   0.0771778
btc          0.2026107    -0.1977869   0.0025161  0.0024773   0.0334879
swiss.franc  0.9967105  -288.0000000  -0.4310985  0.0000000  10.6561239
cpi          0.0054760    -0.0028882   0.0010691  0.0013667   0.0018278

Looking at the Swiss franc, we can more precisely identify that the outlier(s) hover around -288. The mean of -0.43 is likely skewed by them, since the median is 0.
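
The way a single extreme value drags the mean while leaving the median alone can be seen in a small Python sketch. The function name and the five sample values are made up for illustration and are not taken from the dataset.

```python
import statistics


def summarize(values):
    """Basic statistics; st.dev is the sample standard deviation (n - 1)."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "st.dev": statistics.stdev(values),
    }


calm = [0.0, 0.01, -0.01, 0.0, 0.02]
with_outlier = [0.0, 0.01, -0.01, 0.0, -288.0]  # one extreme day
print(summarize(calm))
print(summarize(with_outlier))
```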

Ether has larger fluctuations, with over twice the standard deviation of Bitcoin. Note that Bitcoin, which has been on the market nearly three times as long, has lower relative fluctuations. As I am not an expert in cryptocurrency, I can only speculate that this is because Bitcoin is more “mature” and, having a much greater value, has less room to scale proportionately.




PART III: Highlights and Conclusions




Regression, time series moving averages, and decomposition did not yield effective models. I am fairly confident in this finding of poor performance, since I tried to fit models in many different ways, and the results are unsurprising given that many analysts are also baffled. This is not to say there isn’t a pattern, but finding it might require more subtle measurements than my current dataset offers.

When studying the relative changes in currency, daily fluctuations proceed within a limited boundary of no more than ~ 1/3 the present price. Of course this is no small amount if it compounds daily.
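
To make the compounding point concrete, the Python sketch below applies a run of daily relative changes to a starting price. It inverts the report's definition of relative change, r = (f(t) - f(t-1)) / f(t), which gives f(t) = f(t-1) / (1 - r). The function name and the inputs are hypothetical.

```python
def compound(start_price, relative_changes):
    """Apply daily relative changes r, where r = (f(t) - f(t-1)) / f(t)."""
    price = start_price
    for r in relative_changes:
        price = price / (1 - r)  # rearranged definition of relative change
    return price


print(compound(100.0, [0.2] * 3))    # three straight days up
print(compound(100.0, [-0.25] * 3))  # three straight days down
```

Three consecutive days at r = 0.2 take a price of 100 to roughly 195, and three days at r = -0.25 take it to roughly 51, even though no single day moves by more than the bound.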


Also, as noted earlier, Ethereum is highly correlated to the consumer price index across all combinations.

Perhaps what makes cryptocurrency most notable is not so much the fluctuations, which follow a corridor of performance, but rather its ability to compound day after day in either direction, creating unusual volatility that confounds speculative analysts.


In conclusion, neither criterion provided convincing evidence that Ethereum is even somewhat stable (as defined at the beginning). However, investigating this data showed that relative price change happens within a limited bound from day to day, and that price has some significant correlation with the consumer price index, as it pertains to inflation.