Zhonghao Zhao, Yuman Wu, Haidong Wang
Deep Dive in OkCupid Profiles Data
OkCupid Profiles
We plan to investigate how single status relates to body type and to age. We will include graphs that we learned in the first half of the class, such as box plots and percentage bar plots. Then we will train a simple KNN model with body type and age as input features and check its accuracy.
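As a rough illustration of the KNN step, here is a minimal from-scratch sketch using only the Python standard library. The toy rows, the label strings, the one-hot encoding of body type, and the min-max scaling of age are all assumptions made for illustration; none of them come from the actual data, and the real analysis may use a library implementation instead.

```python
from collections import Counter
import math

# Hypothetical toy rows (body_type, age, status); not taken from the real data.
TRAIN = [
    ("athletic", 24, "single"),
    ("fit", 27, "single"),
    ("average", 31, "single"),
    ("average", 45, "not single"),
    ("curvy", 52, "not single"),
    ("fit", 48, "not single"),
]

BODY_TYPES = sorted({row[0] for row in TRAIN})
AGE_MIN = min(row[1] for row in TRAIN)
AGE_MAX = max(row[1] for row in TRAIN)


def encode(body_type, age):
    """One-hot encode body type and min-max scale age using the training range."""
    one_hot = [1.0 if body_type == b else 0.0 for b in BODY_TYPES]
    scaled_age = (age - AGE_MIN) / (AGE_MAX - AGE_MIN)
    return one_hot + [scaled_age]


def knn_predict(body_type, age, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    query = encode(body_type, age)
    ranked = sorted(TRAIN, key=lambda row: math.dist(query, encode(row[0], row[1])))
    votes = Counter(row[2] for row in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(rows, k=3):
    """Fraction of rows whose predicted status matches the recorded status."""
    hits = sum(knn_predict(b, a, k) == s for b, a, s in rows)
    return hits / len(rows)


# Hypothetical held-out rows used only to demonstrate the accuracy check.
HELD_OUT = [("athletic", 25, "single"), ("curvy", 50, "not single")]
print(accuracy(HELD_OUT))
```

Scaling age matters here because an unscaled age difference of a few years would otherwise dominate the 0/1 body type indicators in the distance.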
After examining the relationships among body type, age, and single status, we will run a further hierarchical cluster analysis on body type and age. We will try to cluster the profiles into several ethnicity groups and visualize the clustering result.
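The clustering step could look roughly like the sketch below: a small bottom-up (agglomerative) procedure with average linkage, written with the standard library only. It assumes each profile has already been turned into a numeric vector (for example, body type indicators plus a scaled age); the points and the choice of average linkage are hypothetical, not decisions stated in this proposal.

```python
import math


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical encoded profiles: [is_athletic, is_average, scaled_age]
POINTS = [
    [1.0, 0.0, 0.05],
    [1.0, 0.0, 0.10],
    [0.0, 1.0, 0.80],
    [0.0, 1.0, 0.90],
    [0.0, 1.0, 0.85],
]

print(agglomerative(POINTS, 2))
```

The returned lists hold row indices, so each cluster can be cross-tabulated against any other column afterwards.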
We will do a text analysis on essay question 2, “What are you good at?”, using users’ responses to this question to predict their single status. After cleaning the text and data, we will train a logistic regression model.
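To make the text pipeline concrete, the sketch below goes from raw responses to a trained logistic regression using only the standard library. The example responses, the labels (1 = single, 0 = not single), the binary bag-of-words features, and the learning rate and epoch count are all assumptions for illustration; the actual cleaning steps and model settings are still to be decided.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a very simple cleaning step)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Map every distinct token in the training texts to a column index."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}


def vectorize(text, vocab):
    """Binary bag-of-words vector: 1.0 where a vocabulary word occurs in the text."""
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(text, vocab, w, b):
    """Return 1 if the predicted probability of the positive class is at least 0.5."""
    x = vectorize(text, vocab)
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical responses and labels (1 = single, 0 = not single).
TEXTS = [
    "cooking and making people laugh",
    "cooking, hiking and dancing",
    "listening and planning family trips",
    "planning budgets and listening",
]
LABELS = [1, 1, 0, 0]

vocab = build_vocab(TEXTS)
X = [vectorize(t, vocab) for t in TEXTS]
w, b = train_logreg(X, LABELS)
print(predict("cooking dinner", vocab, w, b))
```

Words that do not appear in the training vocabulary (such as "dinner" above) are simply ignored at prediction time.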