Normalization is an important data transformation process in many ML algorithms and statistical analyses. Does normalization matter in clustering? Let’s find out with k-means clustering on a very simple dataset of 9 observations (I made it up) with 3 features: the annual income, age and hometown of 9 residents.
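The post doesn't say which normalization it applied. One common choice is z-score standardization, which rescales each numeric feature to mean 0 and standard deviation 1. Min-max scaling to [0, 1] is another. Here is a minimal sketch that assumes z-scores computed with the sample (n - 1) standard deviation, which is what R's `scale()` uses. `standardize` is a hypothetical helper name, and the values are the incomes and ages from the six printed rows of the dataset.

```python
import statistics

def standardize(values):
    """Z-score standardization: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample (n - 1) standard deviation, as in R's scale()
    return [(v - mean) / sd for v in values]

# Income and age of the six residents printed in the dataset
incomes = [50000, 52000, 55000, 90000, 91000, 99000]
ages = [61, 55, 49, 38, 42, 40]

print([round(z, 2) for z in standardize(incomes)])
print([round(z, 2) for z in standardize(ages)])
```

After this transformation both features sit on comparable scales, even though the raw incomes are roughly a thousand times larger than the ages.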

This is what the first six of the 9 observations look like:

##   income age  city
## 1  50000  61 South
## 2  52000  55 South
## 3  55000  49 South
## 4  90000  38 North
## 5  91000  42 North
## 6  99000  40 North
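k-means assigns points using Euclidean distance, so a feature measured in larger units carries more weight. The sketch below uses the six rows above and hypothetical helper names. It measures how much of the squared distance between rows 1 and 2 comes from income, first on the raw values and then after z-score standardization (an assumed choice of normalization).

```python
import statistics

# The six rows printed above, as (income, age) pairs
rows = [(50000, 61), (52000, 55), (55000, 49),
        (90000, 38), (91000, 42), (99000, 40)]

def income_share(a, b):
    """Fraction of the squared Euclidean distance between a and b that comes from income."""
    d_income = (a[0] - b[0]) ** 2
    d_age = (a[1] - b[1]) ** 2
    return d_income / (d_income + d_age)

def standardize_columns(data):
    """Z-score each column using the sample standard deviation."""
    scaled_cols = []
    for col in zip(*data):
        m, s = statistics.mean(col), statistics.stdev(col)
        scaled_cols.append([(v - m) / s for v in col])
    return [tuple(r) for r in zip(*scaled_cols)]

raw_share = income_share(rows[0], rows[1])
scaled = standardize_columns(rows)
scaled_share = income_share(scaled[0], scaled[1])
print(f"rows 1 vs 2, income share of distance (raw):    {raw_share:.4f}")
print(f"rows 1 vs 2, income share of distance (scaled): {scaled_share:.4f}")
```

On the raw values income accounts for essentially all of the distance, because a $2,000 gap squared dwarfs a 6-year gap squared. After standardization, the age difference between these two residents contributes most of it.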

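This is how the observations were clustered without normalization. Rows are hometowns and columns are the k-means cluster labels. With unscaled features, income (in the tens of thousands) dominates the Euclidean distances that k-means works with, and age (in the tens) barely contributes: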

##        
##         1 2 3
##   North 1 0 2
##   South 0 3 0
##   West  0 1 2

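And this is what the output looks like with normalization. Every South resident now falls in cluster 1 and every North resident in cluster 2, so the clusters line up much more closely with hometown: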

##        
##         1 2 3
##   North 0 3 0
##   South 3 0 0
##   West  0 1 2
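k-means itself is simple. Below is a minimal sketch of Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. It runs on income and age both with and without z-score standardization. The helper names, k = 2 and the initial centroids (rows 1 and 4) are assumptions for illustration. The tables above may come from a different k-means implementation, and they use k = 3 on all 9 rows, not all of which are printed here.

```python
import math
import statistics

# The six printed rows, as (income, age) pairs
rows = [(50000, 61), (52000, 55), (55000, 49),
        (90000, 38), (91000, 42), (99000, 40)]

def standardize_columns(data):
    """Z-score each column using the sample standard deviation."""
    scaled_cols = []
    for col in zip(*data):
        m, s = statistics.mean(col), statistics.stdev(col)
        scaled_cols.append([(v - m) / s for v in col])
    return [tuple(r) for r in zip(*scaled_cols)]

def kmeans(points, init_idx, max_iter=100):
    """Lloyd's algorithm; init_idx picks which points serve as starting centroids."""
    centroids = [points[i] for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid
        new_labels = [min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(statistics.mean(c) for c in zip(*members))
    return labels

raw_labels = kmeans(rows, init_idx=(0, 3))
scaled_labels = kmeans(standardize_columns(rows), init_idx=(0, 3))
print("raw:   ", raw_labels)
print("scaled:", scaled_labels)
```

On these six rows both versions separate the three South residents from the three North residents, so the sketch does not reproduce the tables above. The differences there come from the full 9-row dataset clustered with k = 3.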


About
https://twitter.com/DataEnthus