Clustering Returns

Clustering the Returns

I would like to cluster the Year 1 to Year 3 returns and determine the number of clusters, for exploratory purposes.

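# load the fund data (the path is specific to the author's machine)
setwd("D:/DATACAMP/interim/FUNDS_YEAR_2020")
CART_data <- read.csv("data.csv")

# drop the columns that are not needed for clustering
myvars <- names(CART_data) %in% c("Price", "Date", "X5.YR", "X10.YR", "X")
newdata <- CART_data[!myvars]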

# remove rows that contain missing values (NA)
newdata <- newdata[complete.cases(newdata),]

# remove the fund name
myvars <- names(newdata) %in% c("Fund.Name")
newdata <- newdata[!myvars]

# keep only the 1-, 2- and 3-year return columns for clustering
myvars <- names(newdata) %in% c("X1.YR", "X2.YR", "X3.YR")
newdata <- newdata[myvars]

# create a dataset of returns
returnsdata <- newdata
head(returnsdata)

##   X2.YR X1.YR X3.YR
## 1 12.49  7.96  3.54
## 3  9.41  0.13  9.04
## 4  2.77 -3.33  3.89
## 5 18.83  8.47 13.81
## 6  7.04  4.65  3.43
## 7  8.81  3.98  6.01
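
The same preparation can be written more compactly. The sketch below is only an illustration: the `funds` data frame and its values are made up, and only the column names mirror the data above. It selects the three return columns by name and then drops incomplete rows. Note one difference from the steps above: because the NA filter runs after the column selection, a row with a missing value only in some other column would be kept here, while the original order of operations would drop it.

```r
# Hypothetical toy data; only the column names mirror the original file.
funds <- data.frame(
  Fund.Name = c("A", "B", "C", "D"),
  X1.YR = c(7.96, NA, 0.13, -3.33),
  X2.YR = c(12.49, 5.10, 9.41, 2.77),
  X3.YR = c(3.54, 2.20, 9.04, 3.89),
  X5.YR = c(NA, 4.00, 6.50, 1.20)
)

# Select the return columns by name, then drop rows with any missing value.
return_cols <- c("X1.YR", "X2.YR", "X3.YR")
returns_toy <- funds[, return_cols]
returns_toy <- returns_toy[complete.cases(returns_toy), ]

print(returns_toy)
```

Selecting by name also fixes the column order to the one given in `return_cols`, whereas logical indexing with `%in%` keeps the order found in the file.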

Determine the number of clusters using the NbClust library

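Let’s use the NbClust library to quickly determine the number of clusters. The call below considers 2 to 10 clusters, using Euclidean distance and complete-linkage hierarchical clustering, and reports the number of clusters favoured by each index.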

library("NbClust")
set.seed(123)
res.nbclust <- NbClust(returnsdata, distance = "euclidean",
                       min.nc = 2, max.nc = 10,
                       method = "complete", index = "all")

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 6 proposed 2 as the best number of clusters 
## * 8 proposed 3 as the best number of clusters 
## * 3 proposed 5 as the best number of clusters 
## * 6 proposed 6 as the best number of clusters 
## * 1 proposed 9 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  3 
##  
##  
## *******************************************************************
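
To make the majority rule concrete, here is a minimal sketch that recomputes the conclusion from the vote counts printed above. The `votes` vector is typed in by hand from that summary, not extracted from the fitted object.

```r
# Vote counts copied from the NbClust summary above:
# names are candidate numbers of clusters, values are how many indices chose them.
votes <- c("2" = 6, "3" = 8, "5" = 3, "6" = 6, "9" = 1)

# The majority rule picks the candidate with the most votes.
best_k <- as.integer(names(votes)[which.max(votes)])
print(best_k)

# Share of the voting indices behind the winner (8 of 24).
print(round(max(votes) / sum(votes), 2))
```

The winner has 8 of the 24 votes, with 2 and 6 clusters close behind at 6 votes each, so for exploratory work it may be worth inspecting those alternatives as well. With a fitted object, the per-index choices should be available in `res.nbclust$Best.nc`, as far as I know, so the table could be built from that instead of by hand.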

library("factoextra")

# visualize how many indices voted for each number of clusters
factoextra::fviz_nbclust(res.nbclust) + theme_minimal()

## Among all indices: 
## ===================
## * 2 proposed  0 as the best number of clusters
## * 6 proposed  2 as the best number of clusters
## * 8 proposed  3 as the best number of clusters
## * 3 proposed  5 as the best number of clusters
## * 6 proposed  6 as the best number of clusters
## * 1 proposed  9 as the best number of clusters
## 
## Conclusion
## =========================
## * According to the majority rule, the best number of clusters is  3 .
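
Once a number of clusters has been chosen, the funds can be assigned to clusters. The sketch below is a minimal base-R illustration, assuming the same Euclidean distance and complete linkage that were passed to NbClust. It runs on simulated returns (`sim_returns` and its values are made up) so that it does not depend on the original data file.

```r
set.seed(123)

# Simulated returns standing in for returnsdata (values are made up):
# three groups of 20 funds centred on different return levels.
centres <- c(0, 8, 16)
sim_returns <- data.frame(
  X1.YR = rnorm(60, mean = rep(centres, each = 20), sd = 1),
  X2.YR = rnorm(60, mean = rep(centres, each = 20), sd = 1),
  X3.YR = rnorm(60, mean = rep(centres, each = 20), sd = 1)
)

# Same distance and linkage as passed to NbClust above.
hc <- hclust(dist(sim_returns, method = "euclidean"), method = "complete")

# Cut the tree into the suggested 3 clusters.
clusters <- cutree(hc, k = 3)

# Cluster sizes and mean returns per cluster.
print(table(clusters))
print(aggregate(sim_returns, by = list(cluster = clusters), FUN = mean))
```

On the real data, `sim_returns` would be replaced by `returnsdata`. NbClust also returns a partition for the selected number of clusters (I believe as `res.nbclust$Best.partition`), which could be used instead of calling `cutree` directly.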

James Lim

3/9/2020

Determine the number of cluster PAM