The sample size is 9,470.
Clusters
| Age | Race/Ethnicity | Education | Gender | Yes/No variable (unlabeled) | Income | Marital status | Cluster size |
|---|---|---|---|---|---|---|---|
| 25-34 | White/Other | High School | Male | No | 100-125k | Married | 1393 |
| 55-64 | White/Other | College Grad | Male | No | 75-100k | Married | 1211 |
| 25-34 | Hispanic | High School | Male | No | 25-35k | Married | 874 |
| 65+ | White/Other | Graduate Degree | Male | No | 50-75k | Single | 943 |
| 25-34 | White/Other | High School | Female | Yes | 125-150k | Single | 1775 |
| 35-44 | African American | College Grad | Male | No | 50-75k | Single | 628 |
| 25-34 | Hispanic | College Grad | Female | Yes | 25-35k | Single | 557 |
| 35-44 | Hispanic | College Grad | Female | NO | 125-150k | Married | 754 |
| 65+ | Hispanic | High School | Male | Yes | 75-100k | Single | 627 |
| 25-34 | White/Other | High School | Female | NO | 50-75k | Single | 708 |
Plots
This plot shows only 2 dimensions of the data. The clustering itself uses all 8 dimensions.

Explanation
With 8 variables to consider, each taking between 2 and 11 distinct values, the total number of possible combinations is 16,896, which is more than the 9,470 records in the sample.
A common rule of thumb for picking the number of clusters is sqrt(n/2), which for n = 9,470 is about 69, but the scree plot points to 10 clusters instead.
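
The arithmetic behind these two figures can be sketched in a few lines of Python. This is only an illustration: the helper names are made up for this example, and because the per-variable level counts are not listed here, the combination count is shown on toy numbers rather than reproducing 16,896.

```python
import math


def rule_of_thumb_k(n_samples: int) -> float:
    """Rule-of-thumb number of clusters: sqrt(n / 2)."""
    return math.sqrt(n_samples / 2)


def count_combinations(levels_per_variable) -> int:
    """Number of possible value combinations: the product of the
    number of distinct levels of each categorical variable."""
    return math.prod(levels_per_variable)


# Sample size taken from the text above.
k_guess = rule_of_thumb_k(9470)
print(round(k_guess))  # 69

# Toy level counts (hypothetical, NOT the real variables in this data set).
toy_levels = [2, 3, 4]
print(count_combinations(toy_levels))  # 24
```

The rule of thumb only gives a starting point. The scree plot is based on how the within-cluster dissimilarity actually falls as clusters are added, which is why it is preferred here.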

Since we’re dealing with nominal (categorical) data, we cannot use kmeans, which relies on Euclidean distance and is not meaningful for nominal data. Instead, I am using the kmodes clustering algorithm, which represents each cluster by its mode (instead of its mean) and uses a frequency-based dissimilarity measure. As a result, records that share the most frequently occurring answers tend to end up in the same cluster.
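
To make the idea concrete, here is a minimal pure-Python sketch of k-modes. It is not the implementation used for the results above (the text does not say which library was used). It assumes simple matching dissimilarity (the number of attributes on which two records disagree) and random initial modes, and all function names and the toy records are made up for illustration.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent value in each column of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def kmodes(records, k, max_iter=20, seed=0):
    """Cluster tuples of categorical values into k groups.

    Assumes k is no larger than the number of distinct records.
    Returns (labels, modes).
    """
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(records))  # de-duplicate, keep order
    modes = rng.sample(distinct, k)          # random initial modes
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its closest mode.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


# Toy records (hypothetical): age group, education, marital status.
toy = [
    ("25-34", "High School", "Married"),
    ("25-34", "High School", "Married"),
    ("25-34", "High School", "Single"),
    ("65+", "Graduate Degree", "Single"),
    ("65+", "Graduate Degree", "Single"),
    ("65+", "College Grad", "Single"),
]
labels, modes = kmodes(toy, k=2)
print(labels)
print(modes)
```

Each resulting mode is a tuple of the most common answer per variable within its cluster, which is the same kind of profile shown per row in the Clusters table.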