The data was explored and cleaned as follows:
One-hot encoded the categorical columns “trany”, “VClass”, “fuelType1”, “drive” and “guzzler”
Converted columns to numeric and scaled them
Removed highly correlated columns
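The three cleaning steps can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used for the analysis: the rows are toy values, “displ” and “cyl” are hypothetical numeric columns, and the 0.9 correlation cutoff is an assumption, since the threshold actually used is not stated.

```python
from statistics import mean, pstdev

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per level."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1.0 if row[column] == level else 0.0
        encoded.append(new_row)
    return encoded

def zscore(values):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

# Toy rows: "drive" is categorical; "displ" and "cyl" are hypothetical numeric columns.
rows = [
    {"displ": 1.6, "cyl": 4.0, "drive": "FWD"},
    {"displ": 3.0, "cyl": 6.0, "drive": "RWD"},
    {"displ": 2.0, "cyl": 4.0, "drive": "FWD"},
    {"displ": 5.0, "cyl": 8.0, "drive": "4WD"},
    {"displ": 3.6, "cyl": 6.0, "drive": "RWD"},
]

encoded = one_hot(rows, "drive")
columns = {name: zscore([r[name] for r in encoded]) for name in encoded[0]}
reduced = drop_correlated(columns, threshold=0.9)
print(list(reduced))
```

The filter is order-dependent: of two highly correlated columns, the one that appears first is kept. In the toy data “cyl” tracks “displ” closely, so it is the one dropped.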
PCA is now run on the scaled numeric data. We want the cumulative explained variance to be close to 80%, so we keep 7 dimensions for our clustering:
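One way to read "close to 80%" is to keep the smallest number of components whose cumulative share of variance reaches the target. The sketch below assumes that reading; the component variances are made-up numbers for illustration and are not the values from this analysis.

```python
from itertools import accumulate

def components_for_variance(variances, target=0.80):
    """Return (k, cumulative shares), where k is the smallest number of leading
    components whose cumulative explained-variance share is at least `target`."""
    total = sum(variances)
    cumulative = [c / total for c in accumulate(variances)]
    for k, share in enumerate(cumulative, start=1):
        if share >= target:
            return k, cumulative
    return len(variances), cumulative

# Hypothetical per-component variances (PCA eigenvalues), largest first.
variances = [4.0, 2.5, 1.2, 1.0, 0.8, 0.5]
k, cumulative = components_for_variance(variances, target=0.80)
print(k, [round(share, 2) for share in cumulative])
```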
Next we plot the total within-cluster sum of squares for each candidate number of clusters and look for the “elbow”, which falls at 4 or 5 clusters:
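The quantity behind the elbow plot can be computed as in the sketch below: run k-means for each candidate k and record the total within-cluster sum of squares. This is a plain Lloyd's-algorithm illustration on synthetic 2-D points with a single seeded start, not the routine used for the analysis; real runs usually use several random restarts.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm with one random start; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the old center if a cluster empties
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

def total_within_ss(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))

# Synthetic data: three Gaussian blobs (illustrative only).
rng = random.Random(1)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(30)]

wss = {k: total_within_ss(points, *kmeans(points, k)) for k in range(1, 7)}
for k, value in wss.items():
    print(k, round(value, 1))
```

The elbow is the k after which adding another cluster reduces the total only marginally.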
With 4 clusters chosen, we run K-Means and explore the clustering through each cluster's size and average silhouette width:
cluster  size  ave.sil.width
      1  5249           0.42
      2  4993           0.23
      3  2952           0.04
      4  4225           0.21
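For reference, the per-cluster size and average silhouette width reported above can be computed as in this sketch. It uses Euclidean distance and toy points, assumes at least two clusters, and is an illustration of the standard silhouette definition rather than the code that produced the table.

```python
from math import dist
from statistics import mean

def silhouette_widths(points, labels):
    """Silhouette width s = (b - a) / max(a, b) for every point, where a is the
    mean distance to the other points in its own cluster and b is the smallest
    mean distance to the points of any other cluster."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    widths = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            widths.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(mean(dist(p, q) for q in other)
                for m, other in clusters.items() if m != l)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths

def cluster_summary(points, labels):
    """Map each cluster label to (size, average silhouette width)."""
    widths = silhouette_widths(points, labels)
    summary = {}
    for l in sorted(set(labels)):
        ws = [w for w, m in zip(widths, labels) if m == l]
        summary[l] = (len(ws), round(mean(ws), 2))
    return summary

# Toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [1, 1, 1, 2, 2, 2]
summary = cluster_summary(points, labels)
print(summary)
```

A width near 1 means a point sits well inside its cluster, a width near 0 means it lies on the border between two clusters, and a negative width means it is on average closer to another cluster. By that reading, cluster 3's average of 0.04 suggests it is only weakly separated from its neighbours.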
Top 15 Rules: