Vehicles Association Rules

Trey Sholes

Variables Chosen

Cleaning

The data was explored and cleaned as follows:

  • Replaced NA values
  • Filtered the data to vehicles with both city and highway mpg greater than 0
  • Standardized the guzzler indicator to “G” and filled NA values with “N”
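
The cleaning steps above can be sketched in a few lines of Python. This is only an illustration on made-up rows: the column names "city_mpg" and "highway_mpg" are hypothetical, and the sketch assumes that any non-missing guzzler code is mapped to “G” and that a missing mpg counts as 0.

```python
# Toy rows; "city_mpg" and "highway_mpg" are assumed column names, not the dataset's.
rows = [
    {"city_mpg": 22, "highway_mpg": 30, "guzzler": None},
    {"city_mpg": 0, "highway_mpg": 0, "guzzler": None},   # dropped: no mpg recorded
    {"city_mpg": 12, "highway_mpg": 18, "guzzler": "T"},  # non-"G" code, assumed to mean guzzler
    {"city_mpg": 14, "highway_mpg": 0, "guzzler": "G"},   # dropped: highway mpg is 0
    {"city_mpg": 11, "highway_mpg": 17, "guzzler": "G"},
]


def clean(rows):
    """Keep rows with both mpg values above 0 and normalise the guzzler flag."""
    cleaned = []
    for row in rows:
        city = row.get("city_mpg") or 0        # assumption: missing mpg treated as 0
        highway = row.get("highway_mpg") or 0
        if city > 0 and highway > 0:
            flag = "G" if row.get("guzzler") else "N"  # NA becomes "N", anything else "G"
            cleaned.append({**row, "guzzler": flag})
    return cleaned


cleaned = clean(rows)
```

After this step every remaining row carries either “G” or “N”, so the guzzler column can be one-hot encoded without a separate level for missing values.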

Cluster Preparation

  • One-hot encoded “trany”, “VClass”, “fuelType1”, “drive”, “guzzler”
    • Then removed the original columns “trany”, “VClass”, “fuelType1”, “drive”, “guzzler”, plus the redundant indicator “guzzler_N”
  • Converted columns to numeric and scaled them
  • Removed highly correlated columns
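
A minimal sketch of these preparation steps, assuming a 0.9 absolute-correlation cutoff (the slides do not state the threshold) and hypothetical numeric columns "city_mpg" and "highway_mpg". It is not the original pipeline, just one way the steps could fit together.

```python
from math import sqrt


def one_hot(rows, column):
    """Replace a categorical column with 0/1 indicator columns named column_level."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1.0 if row[column] == level else 0.0
        encoded.append(new_row)
    return encoded


def scale(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def correlation(x, y):
    """Pearson correlation of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(correlation(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


# Made-up rows for illustration only.
rows = [
    {"city_mpg": 22.0, "highway_mpg": 30.0, "guzzler": "N"},
    {"city_mpg": 12.0, "highway_mpg": 19.0, "guzzler": "G"},
    {"city_mpg": 30.0, "highway_mpg": 38.0, "guzzler": "N"},
    {"city_mpg": 11.0, "highway_mpg": 17.0, "guzzler": "G"},
    {"city_mpg": 16.0, "highway_mpg": 24.0, "guzzler": "N"},
]
encoded = one_hot(rows, "guzzler")
for row in encoded:
    del row["guzzler_N"]  # redundant: guzzler_G already carries the same information
columns = {name: scale([row[name] for row in encoded]) for name in encoded[0]}
reduced = drop_correlated(columns)
```

In this toy data "highway_mpg" is almost perfectly correlated with "city_mpg", so it is the column that gets dropped. Which member of a correlated pair survives depends on column order, which is a design choice worth stating when reporting results.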

PCA

PCA is now run on the scaled numeric data. We want to retain close to 80% of the explained variance, so we keep 7 dimensions for clustering:
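
The selection rule can be illustrated as follows. The eigenvalues here are invented for the example and are not the ones from this analysis; the sketch also picks the first component count that reaches the target, whereas "close to 80%" may in practice mean choosing the nearest value by eye from a scree plot.

```python
def cumulative_explained_variance(eigenvalues):
    """Running share of total variance, with components ordered largest first."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value / total
        shares.append(running)
    return shares


def components_for(eigenvalues, target=0.80):
    """Smallest number of components whose cumulative share reaches the target."""
    shares = cumulative_explained_variance(eigenvalues)
    for count, share in enumerate(shares, start=1):
        if share >= target - 1e-12:  # small tolerance for floating-point rounding
            return count
    return len(eigenvalues)


# Hypothetical PCA eigenvalues (variances of the principal components).
eigenvalues = [4.0, 2.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5]
shares = cumulative_explained_variance(eigenvalues)
n_components = components_for(eigenvalues)
```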

Elbow Plot

Next we plot the total within-cluster sum of squares for each candidate number of clusters and look for the “elbow”, which falls at 4 or 5:
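
To make the quantity on the elbow plot concrete, here is a small pure-Python sketch that computes the total within-cluster sum of squares for several values of k on toy 2-D points. The deterministic farthest-point initialisation is an assumption made so the example is reproducible; typical K-Means implementations use random starts.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with a deterministic farthest-point start (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def total_within_ss(points, k):
    """Sum of squared distances from each point to its cluster center."""
    labels, centers = kmeans(points, k)
    return sum(sq_dist(p, centers[label]) for p, label in zip(points, labels))


# Three well-separated toy groups, so the elbow should appear at k = 3.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
wss = [total_within_ss(points, k) for k in range(1, 6)]
```

On this toy data the curve drops steeply up to k = 3 and flattens afterwards. On real data the bend is usually less sharp, which is why more than one value of k can look plausible.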

Clustering

With 4 clusters chosen, we run K-Means and examine the resulting clusters:

cluster  size  ave.sil.width
      1  5249           0.42
      2  4993           0.23
      3  2952           0.04
      4  4225           0.21
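
For readers unfamiliar with the last column, the sketch below shows how an average silhouette width per cluster can be computed. It uses made-up 2-D points and labels, and it assigns a width of 0 to singleton clusters, which is a common convention rather than something stated in the slides.

```python
from math import dist
from statistics import mean


def silhouette_widths(points, labels):
    """Silhouette width (b - a) / max(a, b) for every point."""
    clusters = sorted(set(labels))
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, label) in enumerate(zip(points, labels))
                if label == own and j != i]
        if not same:
            widths.append(0.0)  # convention for a cluster with a single point
            continue
        a = mean(same)  # mean distance to the point's own cluster
        b = min(        # mean distance to the nearest other cluster
            mean([dist(p, q) for q, label in zip(points, labels) if label == other])
            for other in clusters if other != own
        )
        widths.append((b - a) / max(a, b))
    return widths


def cluster_summary(points, labels):
    """Map each cluster to (size, average silhouette width)."""
    widths = silhouette_widths(points, labels)
    summary = {}
    for cluster in sorted(set(labels)):
        own = [w for w, label in zip(widths, labels) if label == cluster]
        summary[cluster] = (len(own), mean(own))
    return summary


# Toy data: the last point sits between the two groups and drags cluster 2 down.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (2.5, 3)]
labels = [1, 1, 2, 2, 2]
summary = cluster_summary(points, labels)
```

Widths near 1 mean a point sits well inside its cluster, and widths near 0 mean it lies about as close to a neighbouring cluster as to its own. Read this way, the 0.04 for cluster 3 in the table suggests that cluster overlaps substantially with its neighbours.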

Association Rules - Cluster 1 (Cars/Sedans)

Top 15 Rules:
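
As a reminder of what the rule metrics mean, here is a minimal sketch that computes support, confidence, and lift for single-item rules. The item labels and transactions are hypothetical, the 0.3 minimum support is an arbitrary choice, and ranking by lift is an assumption, since the slides do not say how the top 15 were ordered.

```python
from itertools import permutations


def rule_metrics(transactions, min_support=0.3):
    """Return (lhs, rhs, support, confidence, lift) for one-item rules, best lift first."""
    n = len(transactions)

    def support(items):
        # Share of transactions that contain every item in the set.
        return sum(1 for t in transactions if items <= t) / n

    items = sorted(set().union(*transactions))
    rules = []
    for lhs, rhs in permutations(items, 2):
        both = support({lhs, rhs})
        if both >= min_support:
            confidence = both / support({lhs})  # P(rhs | lhs)
            lift = confidence / support({rhs})  # > 1 means lhs makes rhs more likely
            rules.append((lhs, rhs, both, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Each vehicle becomes a "transaction" of attribute=value items (made-up values).
transactions = [
    {"drive=FWD", "class=Compact", "trans=Auto"},
    {"drive=FWD", "class=Compact", "trans=Manual"},
    {"drive=FWD", "class=Midsize", "trans=Auto"},
    {"drive=RWD", "class=Midsize", "trans=Auto"},
    {"drive=FWD", "class=Compact", "trans=Auto"},
]
rules = rule_metrics(transactions)
top_rules = rules[:15]
```

Running the same function on the vehicles of a single cluster gives rules specific to that cluster, which is the idea behind the per-cluster sections that follow.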

Association Rules - Cluster 2 (SUVs & EVs)

Top 15 Rules:

Association Rules - Cluster 3 (Pickup Trucks)

Top 15 Rules:

Association Rules - Cluster 4 (Guzzlers & Sports)

Top 15 Rules: