February 13, 2017
About the Iris Data Set
##              vars   n mean   sd median trimmed  mad min max range  skew
## Sepal.Length    1 120 5.81 0.80   5.70    5.79 0.89 4.3 7.9   3.6  0.26
## Sepal.Width     2 120 3.04 0.43   3.00    3.03 0.37 2.0 4.2   2.2  0.22
## Petal.Length    3 120 3.73 1.74   4.35    3.75 1.70 1.0 6.9   5.9 -0.32
## Petal.Width     4 120 1.20 0.78   1.30    1.19 1.04 0.1 2.5   2.4 -0.09
## Species*        5 120 2.00 0.82   2.00    2.00 1.48 1.0 3.0   2.0  0.00
##              kurtosis   se
## Sepal.Length    -0.66 0.07
## Sepal.Width     -0.03 0.04
## Petal.Length    -1.45 0.16
## Petal.Width     -1.38 0.07
## Species*        -1.52 0.07
1. Clustering is an unsupervised learning technique
2. It is widely used for exploratory data analysis
3. K-means clustering: "K" is the number of clusters the data is separated into; "means" refers to using the mean of the points in each cluster as that cluster's centroid, with each point assigned to its nearest centroid by Euclidean distance
Step 1 Randomly choose k points as the initial centroids
Step 2 Calculate the Euclidean distance between each data point and each centroid, and assign the point to the cluster of the nearest centroid
Step 3 Calculate the mean of the data points in each cluster and make it the new centroid
Step 4 Repeat Steps 2 & 3 until the centroids no longer change or some other stopping condition is met
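The four steps can be sketched in a few lines of code. What follows is a minimal illustration in plain Python (standard library only), not the code behind these slides. The function names (euclidean, mean_point, kmeans), the max_iter and seed parameters, the rule for empty clusters, and the toy data are all assumptions made for the example.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) for a list of points (tuples of floats)."""
    rng = random.Random(seed)
    # Step 1: pick k of the data points at random as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                  for p in points]
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumed convention: an empty cluster keeps its old centroid.
            new_centroids.append(mean_point(members) if members else centroids[j])
        # Step 4: stop once the centroids no longer change
        # (max_iter acts as the "other stopping condition").
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Toy data made up for illustration; these are not iris measurements.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

Because Step 1 is random, different seeds can lead to different final clusters, and the procedure may settle on a poor local solution. A common remedy is to run it several times from different starting points and keep the result with the smallest total within-cluster distance.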