Saurabh Binani
Kashyap Dave
Kun Zhou
Jay Patel
4/29/2021
Data extraction using the Stack Exchange API
Utilized 3 types of data: Posts, Users and Tags
Current focus is on the Stack Overflow child website data for further analysis
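A minimal sketch of how the three data types could be requested from the Stack Exchange API. The API version (2.3), the page size, and the helper names `build_url` and `fetch_items` are assumptions made for illustration; they are not taken from the project itself.

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed API root; version 2.3 is a guess at what the project used.
API_ROOT = "https://api.stackexchange.com/2.3"


def build_url(resource, site="stackoverflow", page=1, pagesize=100, **extra):
    """Build a request URL for a resource such as 'posts', 'users' or 'tags'."""
    params = {"site": site, "page": page, "pagesize": pagesize}
    params.update(extra)
    # Sort the parameters so the resulting URL is deterministic.
    return f"{API_ROOT}/{resource}?{urlencode(sorted(params.items()))}"


def fetch_items(resource, **params):
    """Hypothetical helper: download one page and return its 'items' list."""
    with urlopen(build_url(resource, **params)) as response:
        raw = response.read()
        # The API compresses its responses; urllib does not decompress them.
        if response.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))["items"]
```

Calling `fetch_items("tags", page=1)` would return one page of tag records for Stack Overflow; switching `site` to another child website name would target that site instead.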
The horizontal bar chart below shows the 10 most frequent tags on the website.
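The tag frequencies behind such a chart can be computed with a simple counter. This is a sketch only: it assumes each post is a dictionary with a `tags` list, as in Stack Exchange API question records, and the function name `top_tags` and the sample posts are hypothetical toy data rather than the project's data.

```python
from collections import Counter


def top_tags(posts, n=10):
    """Return the n most frequent tags as (tag, count) pairs."""
    counts = Counter(tag for post in posts for tag in post.get("tags", []))
    return counts.most_common(n)


# Toy records for illustration only.
sample_posts = [
    {"tags": ["python", "pandas"]},
    {"tags": ["python"]},
    {"tags": ["java"]},
]
ranking = top_tags(sample_posts, n=2)
```

The resulting pairs can be passed to any plotting library's horizontal bar function to reproduce a chart of this kind.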
We implemented two types of models: classification and regression. The classification models predict whether a post will get answered (0/1), and the regression models predict the view count of a post based on its tags.
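One way to set up both prediction tasks from the same records is sketched here. It assumes posts carry the `tags`, `is_answered` and `view_count` fields found in Stack Exchange API question records, and that tags are multi-hot encoded as features; the encoding, the function name `build_dataset`, and the toy data are assumptions, since the original feature set is not described.

```python
def build_dataset(posts, vocabulary):
    """Turn posts into a multi-hot tag matrix plus both prediction targets."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    features, answered, views = [], [], []
    for post in posts:
        row = [0] * len(vocabulary)
        for tag in post.get("tags", []):
            if tag in index:  # tags outside the vocabulary are ignored
                row[index[tag]] = 1
        features.append(row)
        # Classification target: 1 if the post was answered, else 0.
        answered.append(1 if post.get("is_answered") else 0)
        # Regression target: the view count of the post.
        views.append(post.get("view_count", 0))
    return features, answered, views


# Toy records for illustration only.
toy_posts = [
    {"tags": ["python", "pandas"], "is_answered": True, "view_count": 120},
    {"tags": ["java"], "is_answered": False, "view_count": 15},
]
X, y_answered, y_views = build_dataset(toy_posts, ["python", "java", "pandas"])
```

The matrix `X` with `y_answered` would feed a classifier such as a decision tree, while `X` with `y_views` would feed a regressor.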
After implementing the above ML and NLP techniques, we conclude that the Decision Tree algorithm achieves the highest accuracy, 81.4%, and we recommend using it to predict whether a question will get answered.
Sentiment analysis was not relevant to this project because, unlike movie reviews, posts do not carry positive or negative sentiment.
This project can also be applied to other Stack Exchange child websites for similar analysis and prediction, helping keep community engagement consistent and users frequently involved.