ABSTRACT
This study illustrates time series modelling for analysing the price of twelve (12) major cryptocurrencies across two major market cycles, the bull and the bear markets. These cryptocurrencies are selected as the focus of this study owing to their high liquidity and large market capitalisation. The software used for the modelling and pre-processing steps is Orange 3.29.1 and Python 3.9.6. The study first clusters the price volatility of each cryptocurrency by training and testing the following three (3) clustering algorithms: the Louvain algorithm, density-based spatial clustering of applications with noise (DBSCAN) and dynamic time warping paired with hierarchical clustering (DTW-H). Subsequently, the performance of the clustering algorithms is evaluated using the silhouette index and statistical metrics. The findings show that the Louvain algorithm outperforms the rest, with higher silhouette values. Next, the study applies association rule mining, namely the Apriori algorithm, to the daily price direction of the cryptocurrencies. The results indicate that there are many strong association rules in the co-occurrence of different cryptocurrencies during the bull market, and less co-movement during the bear market. The findings of this study would help investors to improve the asset allocation in their portfolio for a better risk-return trade-off, as well as to understand the price co-movement of various cryptocurrencies.

1. INTRODUCTION
Cryptocurrencies are electronic coins designed to work as a medium of payment without the intervention of central authorities and financial institutions. The complete record of transactions is instead maintained through a decentralised and distributed consensus on the blockchain (Hudson & Urquhart, 2019). In the context of cryptocurrency, the blockchain is a system that requires several cryptocurrency miners to verify a single transaction before it is uploaded into its ledger (Patel et al., 2020). These miners are randomly assigned to complete the verification phase with specialised mining software and receive rewards in the form of digital coins. A cryptocurrency miner can be any individual across the world, eliminating the need for traditional banks (Li et al., 2021). Hence, the mechanism of blockchain and cryptocurrency is seen to enable the digitalisation of trust, because a function or decision performed by a centralised party such as a traditional bank can be subject to bias in human judgement (Philips & Gorse, 2017). Since the introduction of the first cryptocurrency, Bitcoin, in 2008 by Satoshi Nakamoto, the disruptive potential of the Bitcoin framework has contributed to the explosive interest in and development of other cryptocurrencies over the past decade. These cryptocurrencies have also served as investment tools in the global financial market. The cumulative market capitalisation of the crypto space reached $1.8 trillion in 2020, a two-hundredfold increase from its value in 2013 (Cavalli & Amoretti, 2021).

Although various statistical and machine learning approaches have been attempted to predict cryptocurrency price actions, Tavares et al. (2020) reveal that the majority have failed to factor in the dynamics of multiple investment horizons and cyclical effects on the asset price. In addition, typical investors generally fail to diversify their portfolio due to a lack of understanding of the different volatilities of the numerous cryptocurrencies as well as their price co-movement (Shi et al., 2020). These shortcomings have resulted in the disparate annual return rates observed within the community of cryptocurrency investors (Burggraf, 2021). This study is intended to bridge the aforementioned gaps.

2. DATA

2.1 Target Data: Cryptocurrency Price
The focus of this study is the daily price of twelve (12) cryptocurrencies, obtained from the CoinMarketCap site and covering the period from January 2017 until June 2021. This period witnessed several major cycles of the cryptocurrency market, including the bull market (from 1 January 2017 until 18 December 2017) and the bear market (from 19 December 2017 until 15 December 2018) (Wheatley et al., 2018). The complete list of the 12 cryptocurrencies, along with their ticker codes, is presented below. These 12 cryptocurrencies are selected as the target data because each ranked among the top coins by market capitalisation as at 31 May 2021. A top market capitalisation reflects the high liquidity of each cryptocurrency. This study restricts itself to the 12 top-ranked cryptocurrencies by market capitalisation, out of over 4,000 cryptocurrencies in existence as at 31 May 2021, as these suffice to build a well-diversified trading portfolio (Joel et al., 2020).

2.2 Features
This study derives several features from the target data (the daily closing price of the 12 cryptocurrencies) to be incorporated in the subsequent clustering phases. The variables are as follows:

• Daily log-returns: For financial assets, returns are widely used instead of prices because their statistical properties are easier to compare (Shi et al., 2020). The daily log-return, r, on day t is computed from the closing prices P_t and P_{t-1} of consecutive days as $r_t = \ln(P_t / P_{t-1})$.

• 7-day volatility: The volatility of a cryptocurrency is a measure of its price fluctuation over a specified period. This parameter reflects the risk associated with a cryptocurrency: the higher the volatility of the asset, the higher its risk, due to the wider spread of its price within a time range. Since the price of a highly volatile cryptocurrency changes drastically in a short period, traders treat volatility as a crucial consideration before taking any position. In this study, the volatility, σ, is derived from the log-returns over the past 7 days and is computed as follows:

• Daily price direction: The change in daily closing price for each cryptocurrency is further categorised into three directions, whereby a positive change in price is denoted as ‘Rise’, a negative change as ‘Down’ and no change as ‘Stable’.
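
The three features above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's own code: it assumes the log-return is r_t = ln(P_t / P_{t-1}), it assumes the 7-day volatility is the sample standard deviation of the last seven log-returns, and the function names and closing prices are hypothetical.

```python
import math
import statistics

def log_returns(prices):
    """Daily log-returns r_t = ln(P_t / P_{t-1}); one element fewer than `prices`."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def rolling_volatility(returns, window=7):
    """Sample standard deviation of each run of `window` consecutive returns
    (an assumed definition of the 7-day volatility)."""
    return [statistics.stdev(returns[i - window:i])
            for i in range(window, len(returns) + 1)]

def price_direction(prices):
    """Label each day-on-day price change as 'Rise', 'Down' or 'Stable'."""
    labels = []
    for prev, curr in zip(prices, prices[1:]):
        if curr > prev:
            labels.append("Rise")
        elif curr < prev:
            labels.append("Down")
        else:
            labels.append("Stable")
    return labels

# Hypothetical closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 101.0, 105.0, 103.0, 104.0, 108.0, 107.0]
r = log_returns(closes)
sigma = rolling_volatility(r, window=7)
direction = price_direction(closes)
```

Nine closing prices give eight log-returns and therefore two 7-day volatility values, which shows how the rolling window shortens the usable series.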

3. METHODOLOGY

3.1 Train-test split and normalization
The data is divided into two sets using an 80%/20% split for training and testing respectively. Next, the data variables are normalized to a common scale by centering each variable at a mean of 0 and scaling it to a standard deviation of 1.
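
A minimal sketch of this step is given below. Two choices in it are assumptions not stated above: the split is chronological (the first 80% of observations form the training set), and the mean and standard deviation are estimated on the training set only and then reused for the test set. The function names and values are hypothetical.

```python
import statistics

def chronological_split(series, train_fraction=0.8):
    """Split a series into a leading training part and a trailing test part."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def standardise(train, test):
    """Z-score both sets using the mean and standard deviation of the training set."""
    mu = statistics.mean(train)
    sd = statistics.pstdev(train)

    def scale(values):
        return [(x - mu) / sd for x in values]

    return scale(train), scale(test)

# Hypothetical 7-day volatility values, for illustration only.
values = [0.02, 0.05, 0.03, 0.08, 0.04, 0.06, 0.07, 0.01, 0.09, 0.05]
train, test = chronological_split(values)
train_z, test_z = standardise(train, test)
```

After scaling, the training set has a mean of 0 and a standard deviation of 1, while the test set generally does not, because it is scaled with the training statistics.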

3.2 Principal Component Analysis (PCA)
PCA is performed for dimensionality reduction while preserving as much of the data’s variation as possible. This phase is essential to reduce noise in the data set that might distort the pattern recognition or data mining processes.
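
To make the idea concrete, the sketch below extracts the first principal component of a small two-variable data set by power iteration on the sample covariance matrix, and reports the share of total variance it preserves. It is an illustration only: the data are hypothetical, the function names are not from the study, and power iteration is swapped in here for the full eigendecomposition that a PCA routine would normally use.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
            for a in range(d)]

def first_principal_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of `cov`, found by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, eigenvalue

# Hypothetical observations of two correlated variables.
rows = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
cov = covariance_matrix(rows)
component, variance = first_principal_component(cov)
explained = variance / sum(cov[j][j] for j in range(len(cov)))
```

Here `explained` is the fraction of total variance retained when the two variables are replaced by their projection onto `component`.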

3.3 Clustering algorithms
This study focuses on clustering the cryptocurrencies based on the 7-day volatility of log-returns for two market cycles, the bull and the bear markets. For the purpose of clustering time series data, the three clustering algorithms considered in this study are the Louvain algorithm, density-based spatial clustering of applications with noise (DBSCAN) and dynamic time warping paired with hierarchical clustering. The performance of these algorithms is then evaluated using the silhouette index and statistics.

- Louvain algorithm
The Louvain algorithm is a greedy optimisation method that extracts communities from large networks, and appears to run in O(n log n) time, where n is the number of nodes in the network. Hence, this method is well suited to the time-series data considered in this study. The Louvain method is generally characterised by rapid convergence and hierarchical partitioning.

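The Louvain algorithm optimises the modularity, Q, which takes values in the range [-0.5, 1] and measures the density of links within communities relative to the links between communities. For a weighted network, Q is formulated as follows:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where A_{ij} is the weight of the edge between nodes i and j, k_i is the sum of the weights of the edges attached to node i, m is the sum of all edge weights in the network, c_i is the community to which node i is assigned, and δ is the Kronecker delta.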
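
As a worked illustration, the sketch below evaluates the modularity of a given partition using the equivalent per-community form Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the total edge weight inside community c, d_c is the total weighted degree of its nodes and m is the total edge weight. The graph, the partition and the function name are hypothetical; the sketch only scores a partition and does not perform the Louvain optimisation itself.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition of an undirected weighted graph.

    edges: list of (u, v, weight) tuples without self-loops.
    community: dict mapping each node to its community label.
    """
    m = sum(w for _, _, w in edges)
    inside = defaultdict(float)   # edge weight within each community
    degree = defaultdict(float)   # total weighted degree of each community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single edge (hypothetical network).
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
         ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
         ("c", "d", 1.0)]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
```

Splitting the network into its two triangles gives Q = 5/14, whereas placing every node in one community gives Q = 0, so the split is the better partition under this measure.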

- Density-based spatial clustering of applications with noise (DBSCAN)
The DBSCAN algorithm is a density-based, non-parametric clustering method. Given a set of points in a space, it groups the points that are close to each other and marks the points that lie alone in low-density regions as outliers. Two parameters are specified prior to execution: the number of core point neighbours and the neighbourhood distance. A suitable neighbourhood distance can be chosen using the k-distance graph. If the value is too small, a large portion of the data set will not be clustered, whilst too high a value will cause clusters to merge excessively. The choice of the neighbourhood distance is closely tied to the choice of the number of core point neighbours. Unlike the k-means clustering method, the DBSCAN algorithm does not require the number of clusters to be specified as a parameter. Further, it is robust to outliers and largely insensitive to the ordering of points in the space.
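
The role of the two parameters can be seen in a compact pure-Python sketch of DBSCAN. It is illustrative only: `eps` stands for the neighbourhood distance and `min_pts` for the number of core point neighbours, both names are hypothetical, `min_pts` is assumed to count the point itself, and the data points are made up.

```python
import math

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within `eps` of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels

# Two dense groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these settings the two dense groups form clusters 0 and 1 and the isolated point is labelled as noise, without the number of clusters being supplied in advance.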

- Dynamic Time Warping and Hierarchical Clustering
The dynamic time warping algorithm measures the distance between two temporal sequences and is subsequently paired with the hierarchical clustering method to group the data points. The sequences are warped non-linearly in the time dimension to measure the similarity between them, hence this algorithm is often used in time series clustering problems. The distance measurement from the dynamic time warping is then coupled with single-linkage hierarchical clustering to identify the resulting clusters in the data set. A centroid time series is identified in each cluster before the distance from other potential members to the centroid is computed.
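
The pairing of the two steps can be sketched as follows. The sketch assumes the absolute difference as the local cost inside the dynamic time warping recursion and stops the single-linkage merging once a requested number of clusters remains; the function names and the four short series are hypothetical, and the centroid step is omitted.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def single_linkage(series, n_clusters):
    """Agglomerative clustering on DTW distances using single linkage."""
    size = len(series)
    dist = {(i, j): dtw_distance(series[i], series[j])
            for i in range(size) for j in range(i + 1, size)}

    def linkage(c1, c2):
        return min(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    clusters = [[i] for i in range(size)]
    while len(clusters) > n_clusters:
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Two pairs of series with the same shape shifted in time (hypothetical data).
series = [[0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0],
          [5, 5, 6, 7, 6, 5], [5, 6, 7, 6, 5, 5]]
clusters = single_linkage(series, n_clusters=2)
```

The first two series differ only by a one-step shift, so their warped distance is zero even though a point-by-point comparison would not be, and the clustering groups series 0 with 1 and series 2 with 3.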

3.4 Clustering Evaluation
The clusters identified by the clustering algorithms are mainly evaluated using the silhouette index. In addition, the statistics and distribution shapes of the elements within each cluster are compared against those of the elements from other clusters.

- Silhouette index
The silhouette index is used to interpret and validate the consistency within clusters of data. It measures how similar an element is to its own cluster (cohesion) compared with other clusters (separation). Its value lies in the range [-1, 1], where a high value reflects a good match of the element to its own cluster and a poor match to other clusters. If there are points with a low or negative value, the clustering configuration may contain outliers. In this study, the Euclidean distance metric is used for the silhouette index. The Euclidean distance, d(p,q), between two data points p and q is computed as follows:

$$d(p,q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$

where n is the number of dimensions, and p_i and q_i are the i-th components of p and q. The lower d(p,q) is, the higher the similarity between data points p and q. The silhouette of data point i is defined as follows:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

where a(i) is the mean distance from i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster.
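
A minimal sketch of the computation is given below, assuming the usual definition s(i) = (b(i) − a(i)) / max{a(i), b(i)}, with a(i) the mean Euclidean distance from point i to the other members of its cluster and b(i) the smallest mean distance from i to the members of any other cluster. The function name, the points and the labels are hypothetical, and a point in a singleton cluster is given a silhouette of 0 by convention.

```python
import math
from statistics import mean

def silhouette_values(points, labels):
    """Silhouette s(i) of every point, using the Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster
            continue
        a = mean(own)
        b = min(
            mean([math.dist(p, q) for j, q in enumerate(points) if labels[j] == other])
            for other in set(labels) if other != labels[i]
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return scores

# Two well-separated clusters (hypothetical data).
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = [0, 0, 1, 1]
scores = silhouette_values(points, labels)
average = mean(scores)
```

Because the two clusters are compact and far apart, every point scores above 0.8; swapping the labels of two points from different clusters would drive their silhouettes negative.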



- Statistics
The central tendency (mean) and distribution shape of each element are examined and compared against those of the other elements within the same cluster as well as those from different clusters. Similarity in the statistical properties of the members within the same cluster indicates the effectiveness of the clustering algorithm in identifying the patterns.

3.5 Association Rule Mining

Association rule mining, namely the Apriori algorithm, is adopted to discover the frequent patterns or correlations between the elements of study. The Apriori algorithm is generally composed of two phases: the mining of frequent itemsets and the generation of association rules.
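
The two phases can be sketched in pure Python as below. The sketch is illustrative rather than the implementation used in the study: each transaction is one trading day, each item is a hypothetical label such as "BTC=Rise", support is the fraction of days containing an itemset, and confidence is the support of the whole itemset divided by the support of the antecedent. The function names and the five sample days are made up.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Phase 1: map every frequent itemset to its support (fraction of days)."""
    n = len(transactions)
    support = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) / n for c in candidates}
        frequent = {c: s for c, s in counts.items() if s >= min_support}
        support.update(frequent)
        k += 1
        # Join frequent (k-1)-itemsets, then prune any candidate that has an
        # infrequent (k-1)-subset (the Apriori property).
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return support

def association_rules(support, min_confidence):
    """Phase 2: rules (antecedent, consequent, support, confidence)."""
    rules = []
    for itemset, s in support.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = s / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, s, confidence))
    return rules

# Five hypothetical trading days with three coins.
days = [
    {"BTC=Rise", "ETH=Rise", "LTC=Rise"},
    {"BTC=Rise", "ETH=Rise", "LTC=Down"},
    {"BTC=Rise", "ETH=Rise", "LTC=Rise"},
    {"BTC=Down", "ETH=Down", "LTC=Down"},
    {"BTC=Rise", "ETH=Rise", "LTC=Rise"},
]
support = frequent_itemsets(days, min_support=0.5)
rules = association_rules(support, min_confidence=0.8)
```

In this toy data the pair {BTC=Rise, ETH=Rise} appears on four of the five days (support 0.8), and the rule LTC=Rise → BTC=Rise holds on every day on which LTC rises (confidence 1.0), whereas BTC=Rise → LTC=Rise reaches only 0.75 and is dropped at an 80% confidence threshold.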


4. RESULTS & DISCUSSION

4.1 Clustering Algorithms
The Louvain algorithm identifies a total of four (4) clusters for the bull market and five (5) clusters for the bear market. Meanwhile, the density-based spatial clustering of applications with noise (DBSCAN) identifies two (2) clusters for each of the bull and bear market cycles. Dynamic time warping paired with hierarchical clustering (DTW-H) forms three (3) clusters for the bull cycle and four (4) clusters for the bear cycle. The summary of the results for the number of clusters is presented below. The different clusters represent the different volatility characteristics of the various cryptocurrencies, hence investors who wish to diversify their portfolio for reduced risk exposure can select coins from different clusters for allocation.

The number of clusters formed by the Louvain algorithm is the highest for both market cycles. Hence, the illustration focuses on the clustering produced by this algorithm.






Silhouette index

The silhouette index of the resulting clusters from the three algorithms, categorised into the bull and bear market cycles, is presented below. Based on the results, the Louvain algorithm outperforms the other algorithms, as it generally displays a higher silhouette index across both the bull and bear market data points, indicating that the members within a specific cluster are well matched to that cluster and far from those of other clusters. In this study, the Louvain algorithm is hence proposed as the best-performing model.







Statistical metrics

The figure below presents the time-series plot of the central tendency (mean) of the resulting clusters from the Louvain algorithm. Generally, each cluster has statistical properties that are distinct from those of the others, indicating the effectiveness of the Louvain algorithm in clustering the data points. The different volatility levels of the cryptocurrencies are especially apparent during the bull market.




4.2 Association Rule Mining (Apriori Algorithm)

The minimum thresholds for support and confidence are set at 50% and 80% respectively. The Apriori algorithm is run on separate data sets for the bull and bear market cycles. For the bull cycle, there are 351 transactions (days) with 12 attributes (the number of cryptocurrencies). For the bear market, there are 362 transactions (days), similarly with 12 attributes (the number of cryptocurrencies).
The size k of the frequent itemsets (the number of cryptocurrencies that appear together) is denoted in the first column. For example, k=3 means that three price movements occur simultaneously, such as a rise in the price of BTC, a rise in the price of LTC and a fall in the price of ETH. For k=2, for instance, there are 21 pairs of cryptocurrencies that appear together in the bull market, which is a rather high number of associated cryptocurrencies. The insights from the association rules would help investors to evaluate their portfolio decisions.

5. CONCLUSION
The first perspective proposes adopting the Louvain clustering algorithm to group the cryptocurrencies based on volatility. Investors could make use of the findings by combining cryptocurrencies from different clusters (different volatilities) to diversify the assets in their portfolio. For example, the resulting clusters from the Louvain algorithm for the bull market indicate that investors should hold more BTC (from Cluster 1) and less BNB (from Cluster 2) to reduce the risk exposure for a similar expected return. Secondly, the frequent itemsets and association rules derived from the Apriori algorithm would assist investors in determining the price co-movement relationships between various cryptocurrencies.




REFERENCES
Patel, M. M., Tanwar, S., Gupta, R., & Kumar, N. (2020). A Deep Learning-based Cryptocurrency Price Prediction Scheme for Financial Institutions. Journal of Information Security and Applications, 55, 102583. https://doi.org/10.1016/j.jisa.2020.102583
Sifat, I. M., Mohamad, A., & Mohamed Shariff, M. S. Bin. (2019). Lead-Lag relationship between Bitcoin and Ethereum: Evidence from hourly and daily data. Research in International Business and Finance, 50, 306–321. https://doi.org/10.1016/j.ribaf.2019.06.012
Cavalli, S., & Amoretti, M. (2021). CNN-based multivariate data analysis for bitcoin trend prediction. Applied Soft Computing, 101, 107065. https://doi.org/10.1016/j.asoc.2020.107065
Shah, A., Chauhan, Y., & Chaudhury, B. (2021). Principal component analysis based construction and evaluation of cryptocurrency index. Expert Systems with Applications, 163, 113796. https://doi.org/10.1016/j.eswa.2020.113796
Demiralay, S., & Golitsis, P. (2021). On the dynamic equicorrelations in cryptocurrency market. Quarterly Review of Economics and Finance, 80, 524–533. https://doi.org/10.1016/j.qref.2021.04.002