Vehicles Association Rules

Trey Sholes

Variables Chosen

The data was explored and cleaned as follows:

One-hot encoded “trany”, “VClass”, “fuelType1”, “drive”, “guzzler”
- Then removed the original columns “trany”, “VClass”, “fuelType1”, “drive”, “guzzler”, along with the indicator “guzzler_N”
Converted columns to numeric and scaled
Removed highly correlated columns
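
As a rough illustration of the encoding step, the sketch below one-hot encodes two of the listed columns and then drops the “guzzler_N” indicator. It is a minimal standard-library sketch, not the code used for this analysis: the row values and the helper name `one_hot` are hypothetical, and only the column names come from the list above.

```python
# Hypothetical rows: the column names come from the text, the values are made up.
rows = [
    {"drive": "FWD", "guzzler": "N"},
    {"drive": "RWD", "guzzler": "G"},
    {"drive": "FWD", "guzzler": "N"},
]


def one_hot(rows, column):
    """Return copies of rows with `column` replaced by 0/1 indicator columns."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the original categorical column.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


encoded = rows
for column in ("drive", "guzzler"):
    encoded = one_hot(encoded, column)

# Drop the redundant indicator, mirroring the removal of "guzzler_N".
encoded = [{k: v for k, v in row.items() if k != "guzzler_N"} for row in encoded]
print(encoded[0])
```

With a two-level column such as “guzzler”, one indicator fully determines the other, which is one plausible reason for dropping “guzzler_N”.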

PCA is now run on the scaled numeric data. We want to retain close to 80% of the explained variance, so we keep 7 dimensions for clustering:
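
The choice of dimensions can be sketched as a cumulative-variance cutoff. The function below returns the smallest number of leading components whose cumulative share of variance reaches a target. The variances are hypothetical, not the values from this dataset, and the "at least 80%" rule is an assumption; the analysis above only aims for "close to 80%".

```python
from itertools import accumulate


def components_for_variance(variances, target=0.80):
    """Smallest number of leading components whose cumulative
    share of total variance reaches `target`."""
    total = sum(variances)
    for k, cumulative in enumerate(accumulate(variances), start=1):
        if cumulative / total >= target:
            return k
    return len(variances)


# Hypothetical per-component variances (eigenvalues), in decreasing order.
variances = [4.0, 2.0, 1.5, 1.0, 0.8, 0.4, 0.3]
k = components_for_variance(variances)
print(k)
```

For these made-up variances the first three components explain 75% and the first four explain 85%, so the function returns 4.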

Next, we plot the total within-cluster sum of squares for each candidate number of clusters and look for the “elbow”, which falls at 4 or 5:
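
The quantity on the vertical axis of an elbow plot is the sum of squared distances from each point to its cluster centroid. The sketch below computes it for a toy dataset. The points and cluster labels are hypothetical and assigned by hand rather than produced by K-Means, and the function names are illustrative only.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def total_within_ss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        center = centroid(members)
        total += sum(squared_distance(p, center) for p in members)
    return total


# Hypothetical 2-D points forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
wss_k1 = total_within_ss(points, [0, 0, 0, 0])  # everything in one cluster
wss_k2 = total_within_ss(points, [0, 0, 1, 1])  # one cluster per pair
print(wss_k1, wss_k2)
```

Here the total drops from 104 with one cluster to 4 with two. The elbow is the point after which adding more clusters no longer produces a drop of that kind.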

With 4 clusters chosen, we can run K-Means and explore the clustering:

  cluster size ave.sil.width
1       1 5249          0.42
2       2 4993          0.23
3       3 2952          0.04
4       4 4225          0.21
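
The ave.sil.width column is the mean silhouette width of the points in each cluster. For a point, the width is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below computes this on hypothetical points and labels. It assumes Euclidean distance and at least two clusters, and it is not the code that produced the table.

```python
from math import dist


def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for each point."""
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        per_cluster = {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if i != j:
                per_cluster.setdefault(label, []).append(dist(p, q))
        if own not in per_cluster:
            widths.append(0.0)  # singleton cluster: width is taken as 0
            continue
        a = sum(per_cluster[own]) / len(per_cluster[own])
        b = min(sum(d) / len(d) for label, d in per_cluster.items() if label != own)
        widths.append((b - a) / max(a, b))
    return widths


def average_width_by_cluster(points, labels):
    """Mean silhouette width of the points in each cluster."""
    grouped = {}
    for width, label in zip(silhouette_widths(points, labels), labels):
        grouped.setdefault(label, []).append(width)
    return {label: sum(ws) / len(ws) for label, ws in grouped.items()}


# Hypothetical points forming two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [1, 1, 2, 2]
avg = average_width_by_cluster(points, labels)
print(avg)
```

Widths near 1 indicate points that sit well inside their cluster. Widths near 0, such as the 0.04 average reported for cluster 3, indicate points lying close to the boundary with a neighbouring cluster.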

Top 15 Rules:

Top 15 Rules:

Top 15 Rules:

Top 15 Rules: