Normalization is an important data transformation step in many ML algorithms and statistical analyses. Does normalization matter in clustering? Let’s find out by running k-means on a very simple, made-up dataset of 9 residents with 3 features: annual income, age, and hometown.
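This is what the dataset looks like. Only the first 6 of the 9 observations are shown; the remaining 3 are the West residents that appear in the cluster tables further down.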
##   income age  city
## 1  50000  61 South
## 2  52000  55 South
## 3  55000  49 South
## 4  90000  38 North
## 5  91000  42 North
## 6  99000  40 North
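To see why scale might matter for a distance-based method such as k-means, the sketch below compares how much income contributes to the Euclidean distance between two of the residents listed above, first on the raw values and then after z-score standardization. It is written in Python with only the standard library and is not the code behind this post. Z-scoring is an assumption, since the post does not say which normalization was used, and the city column is left out because its encoding is not described.

```python
import math
import statistics

# The six rows shown above: (income, age). City is omitted because the
# post does not say how it was encoded for clustering.
rows = [
    (50000, 61),
    (52000, 55),
    (55000, 49),
    (90000, 38),
    (91000, 42),
    (99000, 40),
]

def zscore_columns(data):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*data))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in data
    ]

def income_share(a, b):
    """Fraction of the squared Euclidean distance contributed by income."""
    d_income = (a[0] - b[0]) ** 2
    d_age = (a[1] - b[1]) ** 2
    return d_income / (d_income + d_age)

scaled = zscore_columns(rows)

# Compare resident 1 with resident 2: incomes differ by 2000, ages by 6.
raw_share = income_share(rows[0], rows[1])
scaled_share = income_share(scaled[0], scaled[1])

print(f"raw distance between residents 1 and 2: {math.dist(rows[0], rows[1]):.2f}")
print(f"income share of squared distance (raw):    {raw_share:.6f}")
print(f"income share of squared distance (scaled): {scaled_share:.6f}")
```

On the raw values, income accounts for nearly all of the squared distance for this pair, simply because it is measured in much larger units. After standardization, the age difference contributes most of it.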
This is how the data points were clustered without normalization (rows are hometowns, columns are cluster labels):
##
##         1 2 3
##   North 1 0 2
##   South 0 3 0
##   West  0 1 2
And this is what the output looks like with normalization. The North and South residents now each fall into a single cluster:
##
##         1 2 3
##   North 0 3 0
##   South 3 0 0
##   West  0 1 2
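One way to put a number on the difference between the two tables is cluster purity: for each cluster, take the count of its most common city, sum those counts, and divide by the total number of observations. Purity is not used in the original analysis; the Python sketch below is only an illustration, with the counts copied from the two tables above.

```python
# Rows are cities (North, South, West); columns are clusters 1, 2, 3.
# Counts are copied from the two cross-tabulations above.
without_normalization = [
    [1, 0, 2],  # North
    [0, 3, 0],  # South
    [0, 1, 2],  # West
]
with_normalization = [
    [0, 3, 0],  # North
    [3, 0, 0],  # South
    [0, 1, 2],  # West
]

def purity(table):
    """Sum of each cluster's largest city count, divided by the total count."""
    total = sum(sum(row) for row in table)
    clusters = zip(*table)  # transpose: one tuple of city counts per cluster
    return sum(max(cluster) for cluster in clusters) / total

purity_raw = purity(without_normalization)
purity_normalized = purity(with_normalization)

print(f"purity without normalization: {purity_raw:.3f}")
print(f"purity with normalization:    {purity_normalized:.3f}")
```

With these counts, purity comes out to 6/9 without normalization and 8/9 with it. Cluster numbers themselves are arbitrary labels, so only the grouping matters, not which cluster is called 1, 2, or 3.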