Group Members' Names:

    Zhonghao Zhao, Yuman Wu, Haidong Wang

Title of Presentation:

    Deep Dive into OkCupid Profiles Data

Dataset:

    OkCupid Profiles

Summary of the Analytic Questions:

(a) Relationship between body type, age, and single status.

We plan to investigate the correlation between body type and single status, as well as between age and single status. We will include graphs that we learned in the first half of the class, such as box plots and percentage bar plots. Then, we will train a simple KNN model with body type and age as the input features and evaluate the model’s accuracy.
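
As a rough illustration of the KNN step, here is a minimal sketch in plain Python. The rows are invented toy data, not records from the dataset, and the encoding choices (one-hot body type, min-max scaled age, k = 3) are assumptions made only for this example. In practice a library implementation would probably be used instead.

```python
from collections import Counter
import math

# Toy rows (made up for illustration): (body_type, age, is_single)
TRAIN = [
    ("fit", 24, True),
    ("fit", 27, True),
    ("average", 25, True),
    ("average", 41, False),
    ("curvy", 38, False),
    ("thin", 45, False),
]
# Hypothetical held-out rows used only to compute accuracy.
TEST = [("fit", 26, True), ("thin", 44, False)]

BODY_TYPES = sorted({row[0] for row in TRAIN})
AGE_MIN = min(row[1] for row in TRAIN)
AGE_MAX = max(row[1] for row in TRAIN)


def encode(body_type, age):
    """One-hot encode body type and min-max scale age using the training range."""
    one_hot = [1.0 if body_type == b else 0.0 for b in BODY_TYPES]
    scaled_age = (age - AGE_MIN) / (AGE_MAX - AGE_MIN)
    return one_hot + [scaled_age]


def knn_predict(body_type, age, k=3):
    """Predict single status by majority vote among the k nearest training rows."""
    query = encode(body_type, age)
    by_distance = sorted(
        TRAIN, key=lambda row: math.dist(query, encode(row[0], row[1]))
    )
    votes = Counter(row[2] for row in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(rows, k=3):
    """Fraction of rows whose predicted status matches the recorded status."""
    correct = sum(knn_predict(b, a, k) == label for b, a, label in rows)
    return correct / len(rows)


print(accuracy(TEST))
```

Scaling age keeps it from dominating the distance over the one-hot body type columns, which matters for any distance-based model.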

(b) Can we cluster body type and age into a few ethnicity groups?

After examining the relationships among body type, age, and single status, we will run a further hierarchical cluster analysis on body type and age. We will try to cluster them into several ethnicity groups and visualize the clustering result.
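
The clustering step could be sketched as follows. This is a small average-linkage agglomerative clustering written in plain Python on invented toy rows. The ethnicity labels "A" and "B" are placeholders, and comparing cluster membership against them with a cross-tabulation is only one assumed way to check how well the clusters line up with ethnicity groups.

```python
import math
from collections import Counter

# Toy rows (invented for illustration): (body_type, age, ethnicity placeholder)
USERS = [
    ("fit", 22, "A"),
    ("fit", 25, "A"),
    ("fit", 24, "B"),
    ("average", 44, "B"),
    ("average", 47, "B"),
    ("average", 45, "A"),
]

BODY_TYPES = sorted({u[0] for u in USERS})
AGES = [u[1] for u in USERS]


def encode(body_type, age):
    """One-hot encode body type and min-max scale age to [0, 1]."""
    one_hot = [1.0 if body_type == b else 0.0 for b in BODY_TYPES]
    return one_hot + [(age - min(AGES)) / (max(AGES) - min(AGES))]


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        pairs = [
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        ]
        x, y = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


POINTS = [encode(body, age) for body, age, _ in USERS]
CLUSTERS = agglomerative(POINTS, n_clusters=2)

# Cross-tabulate cluster membership against the ethnicity placeholder.
CROSSTAB = [Counter(USERS[i][2] for i in cluster) for cluster in CLUSTERS]
print(CLUSTERS, CROSSTAB)
```

Recording the merge order and distances instead of stopping at a fixed number of clusters would give the data needed to draw a dendrogram for the visualization.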

(c) Text analysis on essay question 2 and single status.

We will do a text analysis on essay question 2, “What are you good at?”, using users’ responses to this question to predict their single status. We will first clean the text and data, and then train a logistic regression model.
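
A minimal sketch of this pipeline, in plain Python, might look like the following. The essays and labels are invented toy examples. The cleaning rules (lowercasing, stripping HTML remnants, keeping alphabetic tokens), the bag-of-words features, and the learning rate and epoch count are all assumptions for illustration.

```python
import math
import re

# Toy essays (invented for illustration): (essay text, 1 if single else 0)
ESSAYS = [
    ("Cooking, hiking<br />and making people laugh", 1),
    ("making new friends and cooking", 1),
    ("listening and planning trips with my partner", 0),
    ("planning date nights with my partner", 0),
]


def tokenize(text):
    """Lowercase, replace HTML tags with spaces, and keep alphabetic tokens."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return re.findall(r"[a-z']+", text)


VOCAB = sorted({tok for text, _ in ESSAYS for tok in tokenize(text)})
INDEX = {tok: i for i, tok in enumerate(VOCAB)}


def vectorize(text):
    """Bag-of-words counts over the training vocabulary; unknown words are ignored."""
    vec = [0.0] * len(VOCAB)
    for tok in tokenize(text):
        if tok in INDEX:
            vec[INDEX[tok]] += 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, lr=0.5, epochs=200):
    """Fit logistic regression with batch gradient descent on the log loss."""
    weights = [0.0] * len(VOCAB)
    bias = 0.0
    data = [(vectorize(text), label) for text, label in rows]
    for _ in range(epochs):
        grad_w = [0.0] * len(VOCAB)
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_b += err
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


WEIGHTS, BIAS = train(ESSAYS)


def predict_single(text):
    """Return True when the predicted probability of being single is at least 0.5."""
    score = sum(w * xi for w, xi in zip(WEIGHTS, vectorize(text))) + BIAS
    return sigmoid(score) >= 0.5
```

On real essays the vocabulary would be far larger, so stop-word removal, a minimum word frequency, and regularization would likely be needed, and accuracy should be measured on held-out responses.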