Introduction


Classifying stock time series data is a challenging task in financial analysis, primarily due to the random and switching nature of market trends. The unpredictability of stock price movements makes it difficult to discern meaningful patterns, leading to less accurate predictions and unreliable trading signals. To address this problem, this research paper explores the application of the K-means clustering algorithm in classifying stock time series data using three widely used technical indicators: Relative Strength Index (RSI), Commodity Channel Index (CCI), and Average Directional Index (ADX).

The random and switching behavior of stock time series data introduces uncertainty, requiring more sophisticated approaches for accurate classification. By leveraging the K-means clustering algorithm’s ability to identify distinct groups and patterns within data, along with the informative insights provided by RSI, CCI, and ADX indicators, we aim to develop a robust framework for capturing dynamic shifts in market trends. The outcomes of this research have the potential to significantly contribute to financial analysis and trading strategies, enabling investors, traders, and financial professionals to make more informed decisions based on reliable classification of stock time series data.


Data Selection


RSI (Relative Strength Index), ADX (Average Directional Index), and CCI (Commodity Channel Index) are technical indicators commonly used in trading and investing to analyze time series data. Each focuses on a different aspect of price movement and market trend, so together they give a diversified representation of the data: RSI measures price momentum, ADX measures trend strength, and CCI measures how far price deviates from its recent average.

By utilizing a combination of RSI, ADX, and CCI, traders can benefit from a diversified representation of time series data. These indicators offer different insights into price momentum, trend strength, and price deviations, allowing traders to evaluate multiple dimensions of market behavior. By considering these diverse perspectives, traders can make more informed trading decisions and better adapt their strategies to various market conditions.
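As a rough illustration of the three inputs, the sketch below computes RSI, CCI, and ADX from lists of high, low, and close prices using only the Python standard library. It follows the textbook definitions (Wilder's smoothing for RSI and ADX, the usual 0.015 constant for CCI); the function names and the shared `length` parameter are assumptions made for this example, and the indicator's own platform studies may differ in detail.

```python
from typing import List, Optional, Sequence

Series = List[Optional[float]]


def wilder_smooth(values: Sequence[float], length: int) -> Series:
    """Wilder's smoothing: seed with a simple average, then update recursively."""
    out: Series = [None] * len(values)
    if len(values) < length:
        return out
    avg = sum(values[:length]) / length
    out[length - 1] = avg
    for i in range(length, len(values)):
        avg = (avg * (length - 1) + values[i]) / length
        out[i] = avg
    return out


def rsi(close: Sequence[float], length: int = 20) -> Series:
    """Relative Strength Index: smoothed gains relative to smoothed losses."""
    gains = [max(close[i] - close[i - 1], 0.0) for i in range(1, len(close))]
    losses = [max(close[i - 1] - close[i], 0.0) for i in range(1, len(close))]
    out: Series = [None] if close else []  # the first bar has no prior close
    for gain, loss in zip(wilder_smooth(gains, length), wilder_smooth(losses, length)):
        if gain is None or loss is None:
            out.append(None)
        elif loss == 0:
            out.append(100.0)  # no losses in the lookback window
        else:
            out.append(100.0 - 100.0 / (1.0 + gain / loss))
    return out


def cci(high: Sequence[float], low: Sequence[float], close: Sequence[float],
        length: int = 20) -> Series:
    """Commodity Channel Index: deviation of typical price from its average."""
    tp = [(h + l + c) / 3.0 for h, l, c in zip(high, low, close)]
    out: Series = [None] * len(tp)
    for i in range(length - 1, len(tp)):
        window = tp[i - length + 1:i + 1]
        sma = sum(window) / length
        mean_dev = sum(abs(x - sma) for x in window) / length
        out[i] = 0.0 if mean_dev == 0 else (tp[i] - sma) / (0.015 * mean_dev)
    return out


def adx(high: Sequence[float], low: Sequence[float], close: Sequence[float],
        length: int = 20) -> Series:
    """Average Directional Index: smoothed strength of the directional move."""
    tr, plus_dm, minus_dm = [], [], []
    for i in range(1, len(close)):
        up = high[i] - high[i - 1]
        down = low[i - 1] - low[i]
        plus_dm.append(up if up > down and up > 0 else 0.0)
        minus_dm.append(down if down > up and down > 0 else 0.0)
        tr.append(max(high[i] - low[i],
                      abs(high[i] - close[i - 1]),
                      abs(low[i] - close[i - 1])))
    dx = []
    for a, p, m in zip(wilder_smooth(tr, length),
                       wilder_smooth(plus_dm, length),
                       wilder_smooth(minus_dm, length)):
        if a is None or p is None or m is None:
            continue  # still inside the warm-up period
        plus_di = 100.0 * p / a if a else 0.0
        minus_di = 100.0 * m / a if a else 0.0
        total = plus_di + minus_di
        dx.append(100.0 * abs(plus_di - minus_di) / total if total else 0.0)
    smoothed = wilder_smooth(dx, length)
    return [None] * (len(close) - len(smoothed)) + smoothed
```

Each function returns one value per bar, with `None` during the warm-up period, so the three outputs can be zipped into one (RSI, CCI, ADX) feature vector per bar once all three are available.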

There was no specific reason for choosing three inputs other than to diversify the inputs and create a more robust indicator. This method captures momentum and trend well based on price but lacks a representation of volume and volatility. Future improvements to this code will allow the user to pick from a list the types of indicators they would like the algorithm to classify on.

Methodology


K-means clustering, an unsupervised learning algorithm, can be employed to cluster financial data using the RSI, CCI, and ADX indicators. Unsupervised learning aims to discover patterns or structure in data without the need for predefined labels or target values. By leveraging the intrinsic relationships between the input features, k-means clustering groups similar data points into clusters, allowing for further analysis and evaluation of price behavior within each cluster. In this application, each bar is described by its RSI, CCI, and ADX values, the bars are grouped into clusters, and price is then evaluated within each cluster.

It’s important to note that the number of clusters (k) in k-means clustering needs to be predefined. Choosing an appropriate value for k can impact the quality and interpretability of the clusters. Additionally, other techniques such as dimensionality reduction or feature engineering may be used to enhance the clustering process and improve the overall analysis.
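The clustering step can be sketched in a few lines of plain Python. This is a minimal version of the standard k-means (Lloyd's) iteration with Euclidean distance, not the indicator's actual implementation; the function names, the random initialization, and the sample (RSI, CCI, ADX) values are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmeans(points: Sequence[Point], k: int = 2, iterations: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Return (centroids, labels) after running Lloyd's algorithm."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical (RSI, CCI, ADX) readings for six bars.
features = [
    (70.0, 120.0, 35.0), (72.0, 110.0, 38.0), (68.0, 130.0, 33.0),
    (30.0, -120.0, 30.0), (28.0, -110.0, 36.0), (32.0, -130.0, 31.0),
]
centroids, labels = kmeans(features, k=2)
```

Note that the three indicators live on different scales (RSI and ADX run from 0 to 100, while CCI is unbounded), so in a sketch like this CCI tends to dominate the Euclidean distance unless the features are rescaled first.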


Choosing Cluster Amount

There are more complex methods for finding the optimal number of clusters to represent the data; however, only two clusters were chosen, for simplicity of coding given the limitations of thinkScript. These two clusters represent increasing and decreasing price, on the assumption that these are the two states the underlying price will be in. Future updates will include more clusters, with more analysis of how price behaves in each cluster.
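K-means itself only returns anonymous cluster numbers, so some rule is needed to decide which of the two clusters stands for increasing price. The text does not say how the indicator makes this mapping; the sketch below assumes one plausible rule, naming the cluster with the higher average bar-to-bar price change "increasing". The function name and sample data are hypothetical.

```python
from typing import Dict, Sequence


def name_clusters(labels: Sequence[int], close: Sequence[float]) -> Dict[int, str]:
    """Map cluster ids 0 and 1 to 'increasing' / 'decreasing' (assumed rule)."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for i in range(1, len(close)):
        # Attribute each bar's price change to the cluster that bar belongs to.
        sums[labels[i]] += close[i] - close[i - 1]
        counts[labels[i]] += 1
    means = {c: (sums[c] / counts[c] if counts[c] else 0.0) for c in (0, 1)}
    up = max(means, key=means.get)
    return {up: "increasing", 1 - up: "decreasing"}


# Hypothetical labels and closing prices for six bars.
names = name_clusters([0, 0, 0, 1, 1, 1], [10.0, 11.0, 12.0, 11.0, 10.0, 9.0])
```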


K-means Distance Metrics

  \[ \text{Euclidean distance} = \sqrt{{(x_2 - x_1)^2 + (y_2 - y_1)^2 + \ldots + (z_2 - z_1)^2}} \]
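The indicator settings also list Lorentzian distance and cosine similarity as alternatives to the Euclidean distance above. The source does not give formulas for those two, so the sketch below uses commonly cited definitions as an assumption: Lorentzian distance as the sum of ln(1 + |a_i - b_i|) over the features, and cosine similarity converted to a distance as one minus the similarity. Function names are illustrative.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the summed squared differences, as in the formula above."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lorentzian_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed definition: sum of ln(1 + |difference|); dampens large gaps."""
    return sum(math.log(1.0 + abs(x - y)) for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity, so that smaller means more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # similarity is undefined for a zero vector; treat as unrelated
    return 1.0 - dot / (norm_a * norm_b)
```

Because the logarithm grows slowly, the Lorentzian form is less sensitive than Euclidean distance to a single feature with a very large difference, while cosine distance ignores magnitude and compares only the direction of the two feature vectors.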

Example on SPY

The image below shows all three indicators on one chart along with the MPC indicator. Reading three indicators on one chart can make it hard for the trader to decide what action to take. This algorithm uses machine learning to combine the indicators into one clean visual representation of the underlying price action. Depending on what the user selects for the training and testing bars, the shaded region can serve as a backtest: this region is not used to estimate the clusters, so it shows how the indicator would perform out of sample.



Application


Applying this indicator is simple. It works through bar color: when the indicator detects increasing price, bars are colored green, and when it detects a downward move in price, bars are colored red. It pairs well with support and resistance indicators, where the trader looks for phase changes near those levels. Below you can see price respecting these support and resistance zones; a trader who entered on the phase shift as price reversed, with the upper cluster as the target, would have made a winning trade.
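The bar-coloring rule and the idea of a phase change can be expressed as a short sketch. It assumes each bar already carries a cluster label and that one label is known to represent increasing price; the function names and sample labels are hypothetical.

```python
from typing import List, Sequence


def bar_colors(labels: Sequence[int], up_label: int) -> List[str]:
    """Green for bars in the increasing cluster, red for all others."""
    return ["green" if label == up_label else "red" for label in labels]


def phase_changes(labels: Sequence[int]) -> List[int]:
    """Indices of bars whose cluster differs from the previous bar's cluster."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Hypothetical cluster labels for five bars, with cluster 1 as "increasing".
labels = [1, 1, 0, 0, 1]
colors = bar_colors(labels, up_label=1)
changes = phase_changes(labels)
```

Here `changes` marks the bars where the color flips, which are the points a trader would compare against nearby support and resistance levels.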

Note: This trade is in the shaded region, which means it is out of sample, suggesting that both algorithms have a relationship in detecting phase changes.

Indicator Settings


plot out sample: This algorithm is split up into training and testing. Setting this input to yes will shade the region on the right of the chart that is out of sample data, which is data that the algorithm has not been trained on.

train day: The number of days the algorithm will use to train on.

test day: The number of days over which to show the performance of the indicator. The more days you set here, the more you can see of how the model performs out of sample.

length: The phase change indicator uses RSI, CCI, and ADX as inputs. The length input changes the lookback parameter for each of these.

dist type: Each data point in the time series is assigned to a cluster based on a distance metric. This input offers three metrics for measuring the distance from a data point to a cluster: Euclidean Distance, Lorentzian Distance, and Cosine Similarity.
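The train day and test day settings above amount to splitting the most recent bars into an in-sample block and an out-of-sample block. The sketch below shows one way to do that split and then label the test bars using centroids estimated on the training bars only. The names, the `bars_per_day` parameter (1 for a daily chart), and the example centroids are assumptions for illustration, and squared Euclidean distance stands in for whichever dist type is selected.

```python
from typing import List, Sequence, Tuple


def split_bars(n_bars: int, train_days: int, test_days: int,
               bars_per_day: int = 1) -> Tuple[range, range]:
    """Return (train, test) index ranges; the test block is the newest data."""
    train_len = train_days * bars_per_day
    test_len = test_days * bars_per_day
    if train_len + test_len > n_bars:
        raise ValueError("not enough bars for the requested train/test lengths")
    test_start = n_bars - test_len
    return range(test_start - train_len, test_start), range(test_start, n_bars)


def assign_out_of_sample(points: Sequence[Sequence[float]],
                         centroids: Sequence[Sequence[float]]) -> List[int]:
    """Label each point by its nearest centroid without refitting anything."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


# Daily example: 2520 bars on the chart, 600 training days, 30 test days.
train_idx, test_idx = split_bars(2520, train_days=600, test_days=30)

# Hypothetical centroids (RSI, CCI, ADX) that were fit on the training bars.
example_centroids = [(69.0, 125.0, 34.0), (30.0, -120.0, 32.0)]
test_labels = assign_out_of_sample(
    [(70.0, 100.0, 30.0), (30.0, -100.0, 30.0)], example_centroids
)
```

Keeping the centroids fixed while labeling the test block is what makes the shaded region a fair picture of out-of-sample behavior.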






Chart Settings



Chart Settings: Set the right extension to a minimum of 10 bars.

Time Frame: The algorithm needs a minimum of 2000 bars to work. For recommended settings, refer to the table in Troubleshooting.





Troubleshooting



If you are having trouble with the indicator, you can go to the Discord in the title; however, some basic problem solving can be done by the user. Swapping between daily and intraday time frames will cause problems if you don’t also change the training and testing settings. The following are the suggested settings for daily and intraday charts:


User Recommended Settings

Setting    Daily        Intraday
Chart      10Y 1D       30D 5m
Train      600          25
Test       30           3
Length     20           20
Distance   Lorentzian   Lorentzian
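For anyone scripting around the indicator, the recommended settings can be kept as a small lookup, combined with the 2000-bar minimum from the chart settings. This is only a convenience sketch; the dictionary keys and the `settings_for` helper are hypothetical names, while the values are copied from the table above.

```python
from typing import Dict, Union

Setting = Union[str, int]

# Values taken from the recommended-settings table.
RECOMMENDED: Dict[str, Dict[str, Setting]] = {
    "daily": {"chart": "10Y 1D", "train": 600, "test": 30,
              "length": 20, "distance": "Lorentzian"},
    "intraday": {"chart": "30D 5m", "train": 25, "test": 3,
                 "length": 20, "distance": "Lorentzian"},
}

MIN_BARS = 2000  # minimum number of bars the algorithm needs


def settings_for(timeframe: str, n_bars: int) -> Dict[str, Setting]:
    """Return a copy of the recommended settings after checking the bar count."""
    if n_bars < MIN_BARS:
        raise ValueError(f"need at least {MIN_BARS} bars, got {n_bars}")
    return dict(RECOMMENDED[timeframe])
```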

