Classifying stock time series data is a challenging task in financial analysis, primarily due to the random and switching nature of market trends. The unpredictability of stock price movements makes it difficult to discern meaningful patterns, leading to less accurate predictions and unreliable trading signals. To address this problem, this research paper explores the application of the K-means clustering algorithm in classifying stock time series data using three widely used technical indicators: Relative Strength Index (RSI), Commodity Channel Index (CCI), and Average Directional Index (ADX).
The random and switching behavior of stock time series data introduces uncertainty, requiring more sophisticated approaches for accurate classification. By leveraging the K-means clustering algorithm’s ability to identify distinct groups and patterns within data, along with the informative insights provided by RSI, CCI, and ADX indicators, we aim to develop a robust framework for capturing dynamic shifts in market trends. The outcomes of this research have the potential to significantly contribute to financial analysis and trading strategies, enabling investors, traders, and financial professionals to make more informed decisions based on reliable classification of stock time series data.
RSI (Relative Strength Index), ADX (Average Directional Index), and CCI
(Commodity Channel Index) are technical indicators commonly used in
trading and investing to analyze time series data. These indicators
provide diversified representations of the data by focusing on different
aspects of price movements and market trends. Let’s explore how each
indicator contributes to a diversified understanding of time series
data:
Relative Strength Index (RSI): RSI is a momentum oscillator that measures the speed and change of price movements. It quantifies the strength of price changes over a specified period, typically 14 days, and indicates whether an asset is overbought or oversold. RSI is a valuable tool for assessing the market’s internal strength and identifying potential reversal points. By examining RSI values, traders can gain insights into the underlying strength or weakness of a trend, allowing for a diversified perspective on price dynamics.
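For reference, the commonly used RSI definition over a look-back of n periods (14 by default) is:
\[
RS = \frac{\text{average gain over } n \text{ periods}}{\text{average loss over } n \text{ periods}}, \qquad RSI = 100 - \frac{100}{1 + RS}
\]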
Average Directional Index (ADX): ADX is a trend strength indicator that helps traders gauge the overall strength and sustainability of a trend. It considers both positive and negative price movements and assigns a value between 0 and 100. A high ADX reading suggests a strong trend, while a low reading indicates a weak or ranging market. By using ADX, traders can identify trending periods, measure the strength of the trend, and differentiate between trending and non-trending market conditions. This diversification of information assists traders in adapting their strategies based on the prevailing market environment.
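For reference, ADX is commonly derived from the positive and negative directional indicators (DI+ and DI−), which compare smoothed upward and downward moves against the average true range:
\[
DX = 100 \times \frac{\lvert DI^{+} - DI^{-} \rvert}{DI^{+} + DI^{-}}, \qquad ADX = \text{smoothed average of } DX \text{ over the look-back period}
\]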
Commodity Channel Index (CCI): CCI is an oscillator that measures the deviation of an asset’s price from its statistical mean. It provides an indication of whether an asset is overbought or oversold relative to its average price. CCI helps identify potential price reversals, overextended price movements, and potential entry or exit points. By incorporating CCI into their analysis, traders gain a diversified perspective on price dynamics, enabling them to capture both overbought and oversold conditions.
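For reference, the standard CCI definition uses the typical price over a look-back of n periods:
\[
TP = \frac{High + Low + Close}{3}, \qquad CCI = \frac{TP - SMA_n(TP)}{0.015 \times \text{mean deviation of } TP}
\]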
By utilizing a combination of RSI, ADX, and CCI, traders can benefit from a diversified representation of time series data. These indicators offer different insights into price momentum, trend strength, and price deviations, allowing traders to evaluate multiple dimensions of market behavior. By considering these diverse perspectives, traders can make more informed trading decisions and better adapt their strategies to various market conditions.
There was no specific reason for choosing these three inputs other than to diversify the inputs and create a more robust indicator. This method captures momentum and trend well based on price, but it lacks a representation of volume and volatility. Future improvements of this code will allow the user to pick, from a list, which types of indicators the algorithm should classify on.
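As a rough illustration of how these three features could be assembled, here is a minimal pandas sketch. The DataFrame `df`, its column names, and the use of plain rolling means (rather than Wilder smoothing) are assumptions for brevity, not the indicator's actual ThinkScript implementation:

```python
import pandas as pd

def rsi(close: pd.Series, length: int = 14) -> pd.Series:
    # Relative Strength Index: average gain vs. average loss over the look-back
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(length).mean()
    loss = (-delta.clip(upper=0)).rolling(length).mean()
    return 100 - 100 / (1 + gain / loss)

def cci(high: pd.Series, low: pd.Series, close: pd.Series, length: int = 20) -> pd.Series:
    # Commodity Channel Index: deviation of the typical price from its moving average
    tp = (high + low + close) / 3
    sma = tp.rolling(length).mean()
    mad = tp.rolling(length).apply(lambda w: (w - w.mean()).abs().mean(), raw=False)
    return (tp - sma) / (0.015 * mad)

def adx(high: pd.Series, low: pd.Series, close: pd.Series, length: int = 14) -> pd.Series:
    # Average Directional Index from DI+ / DI- (plain rolling means are used
    # here instead of Wilder smoothing, purely to keep the sketch short)
    up, down = high.diff(), -low.diff()
    plus_dm = up.where((up > down) & (up > 0), 0.0)
    minus_dm = down.where((down > up) & (down > 0), 0.0)
    tr = pd.concat([high - low, (high - close.shift()).abs(),
                    (low - close.shift()).abs()], axis=1).max(axis=1)
    atr = tr.rolling(length).mean()
    plus_di = 100 * plus_dm.rolling(length).mean() / atr
    minus_di = 100 * minus_dm.rolling(length).mean() / atr
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di)
    return dx.rolling(length).mean()

# Build the three-feature matrix from an OHLC DataFrame `df` (column names assumed):
# features = pd.DataFrame({"rsi": rsi(df["close"]),
#                          "cci": cci(df["high"], df["low"], df["close"]),
#                          "adx": adx(df["high"], df["low"], df["close"])}).dropna()
```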
K-means clustering, an unsupervised learning algorithm, can be employed
to cluster financial data using the RSI, CCI, and ADX indicators.
Unsupervised learning aims to discover patterns or structure in data
without the need for predefined labels or target values. By leveraging
the intrinsic relationships between the input features, k-means
clustering groups similar data points into clusters, allowing for
further analysis and evaluation of price behavior within each cluster.
Here’s an explanation of how k-means clustering can be applied to these
indicators, followed by evaluating price within each cluster:
1. Input Features Selection: the RSI, CCI, and ADX values computed on each bar form a three-dimensional feature vector describing that point in the time series.
2. Applying k-means Clustering: each feature vector is assigned to the nearest of k cluster centroids, the centroids are recomputed from their members, and the process repeats until the assignments stabilize.
3. Cluster Evaluation with Price: once the bars are grouped, price behavior within each cluster (for example, the typical subsequent move) is examined to attach meaning to the clusters.
It’s important to note that the number of clusters (k) in k-means clustering needs to be predefined. Choosing an appropriate value for k can impact the quality and interpretability of the clusters. Additionally, other techniques such as dimensionality reduction or feature engineering may be used to enhance the clustering process and improve the overall analysis.
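As a concrete sketch of the three steps above, assuming the `features` matrix assembled in the earlier sketch and a matching `close` price Series (both names are illustrative, and the indicator itself is written in ThinkScript rather than Python), scikit-learn's KMeans can group the bars and price behavior can then be summarized per cluster:

```python
from sklearn.cluster import KMeans

# 1. Input features: z-score the RSI/CCI/ADX columns so no single scale dominates
X = (features - features.mean()) / features.std()

# 2. Apply k-means with a predefined number of clusters (k = 2 here)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# 3. Evaluate price within each cluster, e.g. the average next-bar return
fwd_ret = close.pct_change().shift(-1).loc[X.index]
for label in range(km.n_clusters):
    mask = km.labels_ == label
    print(f"cluster {label}: bars={mask.sum()}, "
          f"mean next-bar return={fwd_ret[mask].mean():.4%}")
```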
Choosing the Number of Clusters
There are more sophisticated methods for finding the optimal number of clusters to represent the data; however, only two clusters were chosen here for simplicity of coding, due to limitations in ThinkScript. These two clusters represent increasing and decreasing price, and the model assumes these are the only two states the underlying price can be in. Future updates will include more clusters, along with more analysis of how price behaves in each cluster.
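One example of such a method is the elbow heuristic: compute the within-cluster sum of squares (inertia) for a range of k values and pick the point where the curve stops dropping sharply. A minimal sketch, reusing the scaled matrix `X` from the previous example:

```python
from sklearn.cluster import KMeans

# Inertia for k = 1..8; the "elbow" where it flattens suggests a reasonable k
for k in range(1, 9):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(inertia, 1))
```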
K-means Distance Metrics
1. Euclidean Distance
Simplicity and Intuitiveness: Euclidean distance is straightforward and easy to understand. It measures the straight-line distance between two points in a multidimensional space, which aligns with our intuitive understanding of distance in Euclidean geometry.
Geometric Interpretation: Euclidean distance reflects the geometric relationship between points in Euclidean space. It captures the notion of proximity or similarity based on the distance between the coordinates of the points. Points that are closer together in Euclidean space tend to have smaller Euclidean distances.
Computational Efficiency: Calculating Euclidean distance is computationally efficient, especially in low-dimensional spaces. The formula involves simple arithmetic operations, such as square root and summation, which can be efficiently computed. This efficiency makes Euclidean distance a popular choice in many applications and algorithms.
\[
\text{Euclidean distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + \ldots + (z_2 - z_1)^2}
\]
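A one-function NumPy sketch of this metric (the function name and array arguments are illustrative):

```python
import numpy as np

def euclidean_distance(x: np.ndarray, y: np.ndarray) -> float:
    # Straight-line distance between two feature vectors (e.g. [RSI, CCI, ADX])
    return float(np.sqrt(np.sum((x - y) ** 2)))
```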
2. Lorentzian Distance
Suitable for Non-Euclidean Spaces: The Lorentzian distance is particularly useful in non-Euclidean spaces, such as hyperbolic geometry or spacetime in physics. Unlike the Euclidean distance, which assumes a flat geometry, the Lorentzian distance accounts for the curvature or hyperbolic nature of the space. It allows for accurate distance calculations in these specialized spaces.
Considers Negative Curvature: In spaces with negative curvature, such as hyperbolic space, the Lorentzian distance can capture the unique properties of such spaces. Negative curvature implies that the shortest path between two points is not a straight line but a hyperbolic curve. The Lorentzian distance takes this curvature into account and provides a meaningful measure of distance in these contexts.
Preserves Metric Properties: The Lorentzian distance retains essential metric properties like non-negativity, identity of indiscernibles, symmetry, and the triangle inequality, similar to the Euclidean distance. This property ensures that the Lorentzian distance can be used as a valid metric in mathematical and statistical contexts, allowing for the application of established algorithms and methodologies.
It’s worth noting that the Lorentzian distance is not as widely used
as the Euclidean distance, as it is mainly applicable to specialized
domains such as hyperbolic geometry, general relativity, and some
specific data analysis tasks. Its benefits are significant in these
contexts where non-Euclidean spaces or negative curvature play a crucial
role.
\[
\text{Lorentzian distance} = \log\left(1 + |x_2 - x_1| + |y_2 - y_1| + \ldots + |z_2 - z_1|\right)
\]
3. Cosine Similarity
Insensitivity to Vector Magnitude: Cosine similarity is unaffected by the magnitude or length of vectors, making it suitable for comparing vectors with varying magnitudes.
Angle Measure of Similarity: Cosine similarity quantifies similarity based on the angle between vectors, capturing their directionality and providing a measure of similarity between them.
Effective for Textual Data and Document Comparison: Cosine similarity is extensively used in text mining and natural language processing tasks, enabling efficient document comparison and capturing semantic similarities between documents.
These benefits make cosine similarity a valuable metric in various
applications, including recommendation systems, document clustering,
information retrieval, and more.
\[
\text{Cosine similarity} = \frac{\mathbf{X} \cdot \mathbf{Y}}{\|\mathbf{X}\| \, \|\mathbf{Y}\|}
\]
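A NumPy sketch (names illustrative); for clustering, one minus the similarity is typically used as the distance so that more-similar vectors are treated as closer:

```python
import numpy as np

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    # Cosine of the angle between two feature vectors
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```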
Example on SPY
The image below shows all three indicators on one chart along with the MPC indicator. Looking at three indicators on one chart can be confusing when a trader is trying to decide what action to take. This algorithm is good at combining indicators into one neat visual representation of the underlying price action using machine learning. Depending on what the user selects for the training and testing bars, the shaded region can serve as a rough backtest: this region is not used to estimate the clusters, so it shows how the indicator would perform out of sample.
Application of this indicator is simple. The indicator works on bar color: when price is increasing, the bars are colored green, and when the indicator detects a downward move in price, they are colored red. It pairs well with support and resistance indicators, looking for phase changes near those levels. Below you can see price respecting these support and resistance zones; entering on a phase shift to trade the reversal in the opposite direction, with a target at the upper cluster, would have produced a winning trade.
Note: This trade occurs in the shaded region, which means it is out of sample, suggesting that both algorithms have a relationship in detecting phase changes.
plot out sample: The algorithm is split into training and testing periods. Setting this input to yes shades the region on the right of the chart that is out-of-sample data, i.e., data the algorithm has not been trained on.
train day: The number of days the algorithm uses for training.
test day: The number of days used to show the performance of the indicator. The more days you use here, the more you can see how the model performs out of sample.
length: The phase change indicator uses RSI, CCI, and ADX as inputs. This input changes the look-back parameter for each of them.
dist type: Each data point in the time series is assigned to a cluster based on a distance metric. This input offers three distance metrics for assigning a point to a cluster: Euclidean Distance, Lorentzian Distance, and Cosine Similarity.
Chart Settings: set the right extension to a minimum of 10 bars.
Time Frame: The algorithm needs a minimum of 2,000 bars to work. For recommended settings, refer to the troubleshooting table below.
If you are having trouble with the indicator, you can ask in the Discord linked in the title; however, some basic troubleshooting can be done by the user. Swapping between daily and intraday time frames will cause problems if you do not also switch the training and testing settings. The following are suggested settings for daily and intraday charts:
User Recommended Settings

| Setting | Daily | Intraday |
|---|---|---|
| Chart | 10Y 1D | 30D 5m |
| Train | 600 | 25 |
| Test | 30 | 3 |
| Length | 20 | 20 |
| Distance | Lorentzian | Lorentzian |
References

Maarten Grootendorst, 2021, https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa
Saad Ahmed, 2019, https://towardsdatascience.com/machine-learning-for-stock-clustering-using-k-means-algorithm-126bc1ace4e1
Justin Dehorty, 2023, https://www.tradingview.com/script/WhBzgfDu-Machine-Learning-Lorentzian-Classification/