Worldwide speculation is rampant in light of cryptocurrency reaching unprecedented heights in 2017.
The following report explores the stability of Ethereum using regression and time series analysis.
Starting Dataset
The dataset was inspired by, and initially pulled from, Kaggle datasets. I updated the set and imputed data from ethereum.io to cover a 2-year time span.
See file “00_dataimputatation.xls” for details. I also added Bitcoin (BTC) and the Swiss franc (CHF) for comparative analysis.
Stable Defined
Cryptocurrency is not considered stable in the traditional sense. However, we will use two interpretations to decide, within the scope of this project, whether Ethereum is stable.
Variables Defined
t: time in days (periods 1-732)
timestamp: epoch time
num.address: unique address growth rate
blocksize: average daily blocksize
price.usd: daily closing price in USD
hashrate: measure of miner calculations occurring in network
tot.eth.growth: rank of cryptocurrency = price * circulating supply
market.cap.value: rank of cryptocurrency = price * circulating supply (same as tot.eth.growth)
transactions: # transactions that day
date: date of measurement
BTC: Bitcoin in USD
CHF: Swiss Franc in USD
us.cpi: Consumer Price Index (normalized to 100 for the 1982-84 average; flattened monthly to daily)
cpi.delta: Change from the preceding month (flattened monthly to daily)
cpi.rel.delta: Relative change cpi.delta / current month cpi (flattened monthly to daily)
Additional variables derived
where: Change in value = f(t) - f(t-1)
Relative change = change in value / price.usd
eth.price.delta: change in ether value
eth.rel.delta: relative change in ether value
btc.price.delta: change in bitcoin value
btc.rel.delta: relative change in bitcoin value
chf.price.delta: change in Swiss franc value
chf.rel.delta: relative change in Swiss franc value
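The delta definitions above can be sketched in a few lines. This is a minimal Python illustration rather than the code used for this report; the function name `deltas` and the sample prices are made up, and dividing by the current day's value is my reading of "change in value / price.usd".

```python
def deltas(values):
    """Return (change, relative_change) lists for a daily series.

    change[t]   = values[t] - values[t-1]
    relative[t] = change[t] / values[t]
    Dividing by the current value is an assumption based on the
    definitions in the text. The first day has no predecessor,
    so both lists start with None.
    """
    change = [None]
    relative = [None]
    for prev, cur in zip(values, values[1:]):
        d = cur - prev
        change.append(d)
        relative.append(d / cur)  # fails on a zero price
    return change, relative


# Made-up prices for illustration, not taken from the dataset.
prices = [10.0, 12.0, 9.0]
change, relative = deltas(prices)
print(change)
print(relative)
```

A price of exactly zero would make the division fail, so zero values have to be replaced or dropped before the relative change is computed.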
Additional variables derived, cont’d
log.ether: log to study compression of outliers
week: time series split in 7 day increments
month: month
month.yr: month and year
day: day of week
*Note: variables t, timestamp, and date all refer to the same information. Due to collinearity, the variables will not be used simultaneously in a regression or classification model. The varying derived formats allow for additional pattern exploration.
Changed all price values of Ethereum from 0 to 0.0000000001 to avoid dividing by 0.
Imputed February 29th, 2017 as an average between February 28 and March 1st
Data type conversions
Converted Swiss franc to numeric
Date converted to date
Flattened CPI from month to daily
Converted variables month, month.yr, and day to factors
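Flattening a monthly series to daily simply repeats each month's value for every day of that month. Below is a minimal Python sketch of the idea, not the procedure actually used for this report; the function name `flatten_monthly_to_daily` and the index values are placeholders.

```python
import calendar
from datetime import date


def flatten_monthly_to_daily(monthly):
    """Repeat each monthly value for every calendar day in that month.

    `monthly` maps (year, month) -> value; returns a dict of date -> value.
    """
    daily = {}
    for (year, month), value in monthly.items():
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            daily[date(year, month, day)] = value
    return daily


# Placeholder index values for illustration only, not real CPI figures.
cpi_monthly = {(2016, 1): 100.0, (2016, 2): 101.0}
cpi_daily = flatten_monthly_to_daily(cpi_monthly)
print(len(cpi_daily))
```

One consequence of this approach is that any month-over-month delta derived from the flattened series is constant within a month and only changes at month boundaries.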
As seen above, the price of Ether has steadily risen over time, with significant highs and lows.
For comparative purposes, Ether is plotted against the most famous cryptocurrency, Bitcoin (BTC); one of the best-known stable currencies, the Swiss Franc (CHF); and the US Consumer Price Index.
Data is cleaned and transformed into numeric and factor values for greater ease in regression and classification.
With regression and classification, we can study Ethereum fluctuations to see if it operates similarly to the other dimensions. It is expected that linear regression will not work as a valid predictor due to potential trends and fluctuations.
Data can be broken into three main categories.
Currency factors. These create a comparative baseline against both the stable Swiss Franc and the infamously unstable Bitcoin, which is speculated to be here to stay (though perhaps in a diminished capacity in the future as government regulations set in). These include sub-categories such as relative delta and price change.
Ethereum attributes. Attributes such as blocksize, market cap, and hashrate are suspected to scale with the price, making them more informative than predictive.
Time. Various time groupings to search for cyclical patterns in regression and time series.
Data Correlation Exploration
To better understand the baseline, below is a correlation grid comparing Ether, BTC, Swiss Franc, and CPI.
Bitcoin and Ethereum are very highly correlated at 0.92. Bitcoin also has a relatively high correlation with the Consumer Price Index at 0.72. Conversely, Bitcoin and Ethereum have a relatively low correlation with the Swiss Franc.
This graph helps create a solid starting point.
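For readers who want to see what a single cell of such a correlation grid represents, here is a minimal Python sketch of the Pearson correlation coefficient. I am assuming the grid uses Pearson correlation (the usual default); the function name `pearson` and the sample values are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and the root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up values for illustration, not taken from the dataset.
ether = [8.0, 11.0, 13.0, 20.0]
btc = [400.0, 450.0, 600.0, 900.0]
print(pearson(ether, btc))
```

A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means there is little linear relationship.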
Exploring relative price data
I had hoped that taking the ratio would bring the currency relationships to a similar scale. However, when attempting to plot, the differencing (f(t) - f(t-1)) amplified the noise in outliers and the result became difficult to use. I tried eliminating outliers and using logarithmic compression, but this seemed redundant since the currency variables contained similar data. Had the project scope allowed more time, I might have considered heavy smoothing techniques.
A sample wild plot can be observed below.
Exploring Regression
As the plot curve is not linear, it will likely require a non-linear fit. However, linear regression is still useful to explore for insight into relationships, especially since some data characteristics are correlated.
Paring down further…
After several exploratory attempts, including additional correlation tests and graphical plots not included in full in this report, the following starting data is selected to reduce collinearity:
Output variable: price.usd (Ether)
date: measured in days
us.cpi: CPI, in USD
BTC: Bitcoin
CHF: Swiss Franc
hashrate: measure of miner calculations occurring in network
market.cap.val: rank of cryptocurrency = price * circulating supply
month: month (as factor)
day: day of week (as factor)
Items of particular note:
Addresses and hashrate are almost identically correlated, and blocksize is very close as well.
Ethereum addresses and BTC are correlated at 0.95, which might be unsurprising since BTC and ether are highly correlated.
CPI and Market Cap Value are correlated at 0.97, which might merit a deeper dive in a future study beyond the scope of this report.
Next is to build the regression model.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 8125.926 | 3790.120 | 2.144 | 0.032 |
| date | -0.317 | 0.235 | -1.348 | 0.178 |
| BTC | 0.137 | 0.004 | 38.954 | 0.000 |
| CHF | -0.031 | 0.012 | -2.676 | 0.008 |
| us.cpi | -15.870 | 2.413 | -6.577 | 0.000 |
| market.cap.val | 0.000 | 0.000 | 1.424 | 0.155 |
| month02 | -10.285 | 3.374 | -3.048 | 0.002 |
| month03 | -6.779 | 3.793 | -1.787 | 0.074 |
| month04 | -0.992 | 3.843 | -0.258 | 0.796 |
| month05 | -11.903 | 3.841 | -3.099 | 0.002 |
| month06 | 13.812 | 4.233 | 3.263 | 0.001 |
| month07 | -17.670 | 4.220 | -4.187 | 0.000 |
| month08 | 2.935 | 4.565 | 0.643 | 0.520 |
| month09 | 4.458 | 4.585 | 0.972 | 0.331 |
| month10 | 6.679 | 4.154 | 1.608 | 0.108 |
| month11 | 3.632 | 3.438 | 1.056 | 0.291 |
| month12 | -8.138 | 3.454 | -2.356 | 0.019 |
| dayMonday | 0.962 | 2.460 | 0.391 | 0.696 |
| daySaturday | 0.120 | 2.452 | 0.049 | 0.961 |
| daySunday | 1.109 | 2.458 | 0.451 | 0.652 |
| dayThursday | -0.807 | 2.452 | -0.329 | 0.742 |
| dayTuesday | 0.631 | 2.459 | 0.257 | 0.798 |
| dayWednesday | 0.371 | 2.459 | 0.151 | 0.880 |
The fit looks pretty terrible: the intercept is 8125 with a standard error of 3790, even though the price of Ether is generally less than $500.
BTC, Swiss Franc, CPI, and specific months are statistically significant (p < 0.05), specifically February, May, June, July, and, more weakly, December.
Let’s explore further: how does the model look without Bitcoin? This won’t solve any linearity issues, but it might make room for other factors to become more important.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 119852.352 | 4389.562 | 27.304 | 0.000 |
| date | -7.256 | 0.273 | -26.613 | 0.000 |
| CHF | -0.222 | 0.019 | -11.837 | 0.000 |
| us.cpi | -78.752 | 3.177 | -24.785 | 0.000 |
| market.cap.val | 0.000 | 0.000 | 27.961 | 0.000 |
| month02 | -29.715 | 5.912 | -5.026 | 0.000 |
| month03 | -60.456 | 6.261 | -9.657 | 0.000 |
| month04 | -53.014 | 6.385 | -8.303 | 0.000 |
| month05 | -44.705 | 6.639 | -6.733 | 0.000 |
| month06 | 15.109 | 7.499 | 2.015 | 0.044 |
| month07 | -37.151 | 7.424 | -5.004 | 0.000 |
| month08 | -83.201 | 7.075 | -11.761 | 0.000 |
| month09 | -87.030 | 6.976 | -12.476 | 0.000 |
| month10 | -67.947 | 6.530 | -10.406 | 0.000 |
| month11 | -31.741 | 5.876 | -5.402 | 0.000 |
| month12 | -43.663 | 5.902 | -7.398 | 0.000 |
| dayMonday | 1.997 | 4.358 | 0.458 | 0.647 |
| daySaturday | 1.017 | 4.345 | 0.234 | 0.815 |
| daySunday | 0.940 | 4.355 | 0.216 | 0.829 |
| dayThursday | -0.379 | 4.345 | -0.087 | 0.931 |
| dayTuesday | 1.884 | 4.356 | 0.432 | 0.666 |
| dayWednesday | 0.621 | 4.356 | 0.142 | 0.887 |
The intercept problem only gets worse. While some other variables are now rated as significant, this isn’t a practical solution.
Further diagnostics using Plot
Residuals vs. Fitted: Data points cluster to the left and indicate some correlation.
Normal QQ plot: Indicates a non-normal distribution based on significant deviation.
Scale-location plot: Plot indicates trends… not good.
Residuals vs. Leverage: Some outliers seem to carry significant weight.
In a final attempt to resolve issues, the data is culled to just a few variables. While the data becomes more normally distributed, all the other issues remain.
As Bender from Futurama might say… my biggest regret was not giving up sooner!
Moving on…
Second is a time series analysis which leverages the forecast library and time series datasets.
First, a plot of Ether in R as a time series object with a line to emphasize fit challenges.
Bear in mind Ethereum is heavily weighted by the explosive activity in 2017.
The first part of the time series (when Ether is near or at 0) is distorted when taking the log.
This can be cleaned by removing the first part of the time series.
By compressing the data, a smoother pattern can be observed.
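The trim-then-log step described above could look roughly like the following Python sketch. The function name `trimmed_log` and the `min_price` cutoff are hypothetical; the report does not state where the series was actually cut.

```python
from math import log


def trimmed_log(prices, min_price=1.0):
    """Drop the leading run of prices below `min_price`, then take natural logs.

    `min_price` is a hypothetical cutoff chosen for illustration.
    Only the leading run is removed; later dips below the cutoff are kept.
    """
    start = 0
    while start < len(prices) and prices[start] < min_price:
        start += 1
    return [log(p) for p in prices[start:]]


# Made-up prices for illustration, not taken from the dataset.
series = [1e-10, 0.5, 1.0, 12.0, 300.0]
print(trimmed_log(series))
```

Near-zero prices map to very large negative logs (the log of 0.0000000001 is about -23), which is why the early part of the series dominates the plot until it is removed.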
The literature supporting the following R time series techniques and analysis can be found at: https://www.datascience.com/blog/introduction-to-forecasting-with-arima-in-r-learn-data-science-tutorials
One of the big concerns for time series is assessing stationarity, defined as follows:
Data is stationary when mean, variance and autocovariance are time invariant.
While at a casual glance Ether looks non-stationary, the data is run through the Augmented Dickey-Fuller (ADF) test to gain supporting evidence. Null hypothesis: non-stationary
##
## Augmented Dickey-Fuller Test
##
## data: log.ether.ts2
## Dickey-Fuller = -1.5134, Lag order = 8, p-value = 0.7843
## alternative hypothesis: stationary
With a p-value of 0.7843, the null hypothesis cannot be rejected; Ether (as expected) cannot be determined to be stationary, so it will be treated as non-stationary.
Three components (seasonality, trend, and cycle) need to be assessed for whether they exist in the data and, if so, deconstructed in order to build a model.
Seasonal - patterns found to repeat over a period of time defined as a season (often occurring in a month, week, or even year)
Trend - whether the overall pattern as a whole is increasing or decreasing over time. In this case, one can see visually that Ethereum trends upward.
Cycle - sometimes grouped with trend, cycle refers to patterns that are not seasonal, often associated with moving averages.
Residual - leftover data not attributed to seasonal, cycle, or trend components.
Smoothing
By creating a moving average, the fluctuations can be smoothed to trace a more aggregated pattern and reduce the noise of fluctuation. This also helps with discerning predictability.
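As a rough illustration of the smoothing idea, here is a minimal centered moving average in Python. It is a sketch, not the routine from the R forecast library used in this report; the function name `centered_moving_average` is my own, and it only handles odd window sizes.

```python
def centered_moving_average(values, window):
    """Centered simple moving average for an odd `window`.

    Positions that lack a full window on either side are returned as None,
    so the smoothed series is shorter than the original at both ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("this sketch only handles odd window sizes")
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        smoothed[i] = sum(values[i - half:i + half + 1]) / window
    return smoothed


# Made-up prices for illustration, not taken from the dataset.
prices = [10.0, 14.0, 9.0, 30.0, 12.0, 15.0, 11.0]
print(centered_moving_average(prices, 3))
```

The wider the window, the smoother the line, but also the more points are lost at each end of the series and the more a single large spike is smeared across its neighbours.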
Unfortunately, there is some tradeoff. Even in the annualized model, there is still a wavering trend that cuts off and becomes unusable for significant parts of the series, and the error is large. However, a better-fitting solution would require a model that is less smooth (and non-stationary).
A final demonstration of the limitations: detrending at the middle ground (monthly).
As can be seen above, this only smooths small fluctuations and becomes ineffectual as price fluctuations increase.
Let’s try a slightly different view of related methods and look at decomposition.
Decomposition provides four results.
Observed: the native data
Trend: a trend line across the data set increasing or decreasing
Seasonal: repeatable pattern across time
Random: residuals, the leftover that presents as error
Of greatest interest will be the random component, which is the remainder (error). Even in decomposition, the error has a wild fluctuation, indicating instability. This is particularly bad, since in decomposition, anomalies are smoothed (averaged) into the trend.
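To make the four components concrete, here is a minimal Python sketch of a classical additive decomposition. It is an illustration under simplifying assumptions (odd period, additive model) and may differ from what the R routine used for this report does; the function name `decompose_additive` is my own.

```python
def decompose_additive(values, period):
    """Classical additive decomposition for an odd `period`.

    Returns (trend, seasonal, random_part). Trend and random entries are
    None where the centered moving average is undefined (series ends).
    """
    n = len(values)
    if period < 3 or period % 2 == 0:
        raise ValueError("this sketch only handles odd periods >= 3")
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2

    # Trend: centered moving average over one full period.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(values[i - half:i + half + 1]) / period

    # Seasonal: average detrended value at each position in the period,
    # re-centered so the seasonal effects sum to zero.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(values[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    seasonal = [means[i % period] - centre for i in range(n)]

    # Random: whatever is left after removing trend and seasonal parts.
    random_part = [
        None if trend[i] is None else values[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, random_part


# Synthetic series: linear trend plus a repeating 3-step pattern.
pattern = [3.0, -1.0, -2.0]
series = [0.5 * i + pattern[i % 3] for i in range(12)]
trend, seasonal, random_part = decompose_additive(series, 3)
print(random_part)
```

On a clean synthetic series like this one the random component is essentially zero, which is the contrast to keep in mind: in the Ether decomposition the random component is where the large swings end up.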
Possible problem: Moving averages do not perform well with outliers. Cryptocurrency has wide changes both upward and downward in value.
Differencing
Differencing defined: take the difference between period t and period t-1. Differencing can be applied more than once; for the second difference, consider the generalized formula, where \(Y_{d_t} = Y_t - Y_{t-1}\) is the first difference: \[Y_{d2_t} = Y_{d_t} - Y_{d_{t-1}} = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2})\]
The goal is to transform trends out of the data. This worked poorly in the regression exploration, where differencing amplified changes. However, is there a common relative change that operates within a corridor?
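The differencing formula above translates directly into code. The following is a minimal Python sketch with a made-up function name (`difference`) and toy numbers, not the code used for this report.

```python
def difference(values, order=1):
    """Apply first differencing `order` times.

    order=1 gives Y_t - Y_{t-1}; order=2 gives
    (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}), and so on.
    Each pass shortens the series by one element.
    """
    out = list(values)
    for _ in range(order):
        out = [cur - prev for prev, cur in zip(out, out[1:])]
    return out


# Toy series with a quadratic trend: 1, 4, 9, 16, 25.
series = [1, 4, 9, 16, 25]
print(difference(series))     # first difference still trends upward
print(difference(series, 2))  # second difference is flat
```

In the toy series the first difference removes a linear trend and the second removes a quadratic one, which is the sense in which differencing "transforms trends out" of the data.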
Below is a density plot and a few statistics:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.348817 -0.030816 0.000000 0.003528 0.035826 0.314961
Visualized as a spread, the relative change could also be viewed as bounded but unstable: the fluctuation occurs across the entire two years, with a few “quiet” periods but mostly heavy fluctuation.
How does this fluctuation compare with BTC, Swiss Franc, and CPI?
Bitcoin and CPI follow within the same realm, as might be expected from high correlation. The Swiss Franc has some outliers that would need culling in a further analysis outside of this report.
Summary of fluctuations
For the purposes of visualization, the next plot wraps the relative price plots for Ether, BTC, and CPI.
Next is a chart showing basic statistics.
| | max | min | mean | median | st.dev |
|---|---|---|---|---|---|
| ether | 0.3149606 | -0.3488168 | 0.0035277 | 0.0000000 | 0.0771778 |
| btc | 0.2026107 | -0.1977869 | 0.0025161 | 0.0024773 | 0.0334879 |
| swiss.franc | 0.9967105 | -288.0000000 | -0.4310985 | 0.0000000 | 10.6561239 |
| cpi | 0.0054760 | -0.0028882 | 0.0010691 | 0.0013667 | 0.0018278 |
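Each row of the table above can be reproduced from a column of relative changes with a few standard-library calls. This is a minimal Python sketch with a made-up function name (`summarize`) and made-up inputs; I am assuming the st.dev column is the sample standard deviation (n - 1 denominator).

```python
import statistics


def summarize(values):
    """Return the five statistics shown in the table for one series."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (n - 1 denominator); an assumption.
        "st.dev": statistics.stdev(values),
    }


# Made-up relative changes for illustration, not taken from the dataset.
rel_changes = [0.02, -0.05, 0.0, 0.11, -0.03]
for name, value in summarize(rel_changes).items():
    print(name, round(value, 7))
```

Comparing the mean against the median is a quick check for outliers: when a handful of extreme values are present, the mean is pulled toward them while the median barely moves.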
Looking at the Swiss franc, we can more precisely identify that the outlier(s) hover around -288; the mean of -0.43 is likely skewed by them.
Ether has larger fluctuations, with over twice the standard deviation of Bitcoin. Note that Bitcoin, which has been on the market nearly three times as long, has lower relative fluctuations. As I am not an expert in cryptocurrency, I can only speculate that this is because Bitcoin is more “mature” and, having a much greater value, has less room to scale proportionately.
Regression, time series moving averages, and decomposition did not yield effective models. I am fairly confident in this poor performance since I tried to fit models in many different ways and the results are unsurprising as many analysts are also baffled. This is not to say there isn’t a pattern, but it might require more subtle measurements than my current dataset offers.
When studying the relative changes in currency, daily fluctuations proceed within a limited boundary of no more than ~ 1/3 the present price. Of course this is no small amount if it compounds daily.
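To illustrate how quickly bounded daily changes add up, here is a minimal Python sketch of compounding. The function name `compound` and the example rates are made up, and for simplicity each change is treated as a fraction of the previous day's price, which is not exactly how the relative change is defined in this report.

```python
def compound(daily_changes):
    """Return the overall growth factor after applying each daily change.

    Each change is a fraction of the previous day's price
    (an assumption made for this illustration), e.g. 0.10 for +10%.
    """
    factor = 1.0
    for change in daily_changes:
        factor *= 1.0 + change
    return factor


# Hypothetical scenarios, not taken from the dataset.
print(compound([0.10] * 7))    # a week of +10% days
print(compound([-0.10] * 7))   # a week of -10% days
print(compound([0.30, -0.30])) # one large gain followed by one large loss
```

Note that gains and losses of the same size do not cancel out: a rise followed by an equal percentage fall leaves the price below where it started.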
Also, as noted earlier, Ethereum is highly correlated to the consumer price index across all combinations.
Perhaps what makes cryptocurrency most notable is not so much the fluctuations, which follow a corridor of performance, but rather the ability for it to compound day after day in either direction, creating unusual volatility that confounds speculative analysts.
In conclusion, neither parameter provided telling evidence that Ethereum is somewhat stable (as defined in the beginning). However, through investigating this data, it was discovered that relative price change happens within a limited bound from day to day, and that it has some significant correlation to the consumer price index, as pertains to inflation.