Shravan Honade
Ritesh Sengar
Nikhil Patil
Dhananjay Ghate
Neena Chaudhari
2023-10-25
Primary objectives:
This project aims to use data analysis and modeling to explore and predict tags for questions on Stack Exchange. The primary objectives are as follows:
Data Extraction
Exploratory Data Analysis (EDA)
Machine Learning Model Development
Real-world Application: Improving tag prediction for Stack Exchange and other platforms could benefit millions of users.
Data-Driven Insights: EDA can reveal user behavior, popular topics, and challenges on Stack Exchange.
Automation: Automating tag prediction can save time and improve efficiency for moderators and users.
User Experience Improvement: Accurate tagging improves user engagement by making it easier to find relevant content.
List of EDA tasks:
Monthly questions count
Percentage of questions marked as answered
Relationships between answers and users
Common words/tags in titles
Growth or decline of particular tags over time
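Several of the EDA tasks above reduce to simple counting once the questions are loaded. The following is a minimal sketch, written in Python purely for illustration (the project itself may do this in R). The sample records are hypothetical; the field names `title`, `tags`, `creation_date` and `is_answered` are assumed to match the Stack Exchange API question object, with `creation_date` in Unix epoch seconds.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample records for illustration only (not real project data).
questions = [
    {"title": "How to merge two data frames in R", "tags": ["r", "dataframe"],
     "creation_date": 1672617600, "is_answered": True},   # 2023-01-02 UTC
    {"title": "Change axis labels in a plot", "tags": ["r", "ggplot2"],
     "creation_date": 1675296000, "is_answered": False},  # 2023-02-02 UTC
    {"title": "Group rows by column value", "tags": ["python", "pandas"],
     "creation_date": 1675382400, "is_answered": True},   # 2023-02-03 UTC
    {"title": "Send a GET request with query parameters", "tags": ["r", "httr"],
     "creation_date": 1677715200, "is_answered": True},   # 2023-03-02 UTC
]


def month_key(epoch_seconds):
    """Convert a Unix timestamp to a 'YYYY-MM' string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m")


def monthly_question_counts(items):
    """Number of questions created in each month."""
    return Counter(month_key(q["creation_date"]) for q in items)


def answered_percentage(items):
    """Percentage of questions flagged as answered (0.0 for an empty list)."""
    if not items:
        return 0.0
    answered = sum(1 for q in items if q["is_answered"])
    return 100.0 * answered / len(items)


def monthly_tag_counts(items, tag):
    """Per-month count of questions carrying a given tag (growth or decline over time)."""
    return Counter(month_key(q["creation_date"]) for q in items if tag in q["tags"])


def top_tags(items, n=3):
    """The n most frequent tags as (tag, count) pairs."""
    return Counter(t for q in items for t in q["tags"]).most_common(n)
```

Each function returns a plain dictionary-like object or number, so the results can be passed directly to a plotting library for the monthly and per-tag charts.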
API Integration (httr package): httr will be used to interact with APIs and retrieve data for analysis.
Data Visualization (ggplot2): ggplot2 will be used to visualize data patterns and model performance.
Machine Learning Libraries (H2O, Keras): H2O, Keras, and possibly other libraries will serve as the machine learning frameworks.
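To make the API integration step more concrete, here is a minimal sketch of how a request for questions might be assembled and how the response could be trimmed to the fields needed later. It is written in Python for illustration only; the project plans to issue the equivalent GET request with httr in R. The API version (`2.3`), the default site, and the helper names `build_questions_url` and `extract_rows` are assumptions, not part of the original plan. No network call is made here.

```python
from urllib.parse import urlencode

# Assumed API root; the version number is an assumption for this sketch.
API_ROOT = "https://api.stackexchange.com/2.3"


def build_questions_url(site="stackoverflow", page=1, pagesize=100,
                        fromdate=None, todate=None):
    """Build a /questions URL; fromdate and todate are Unix epoch seconds."""
    params = {
        "site": site,
        "page": page,
        "pagesize": pagesize,
        "order": "desc",
        "sort": "creation",
    }
    if fromdate is not None:
        params["fromdate"] = fromdate
    if todate is not None:
        params["todate"] = todate
    return f"{API_ROOT}/questions?{urlencode(params)}"


def extract_rows(payload):
    """Keep only the fields used for EDA and modeling from a decoded JSON response.

    The response is assumed to wrap the questions in an 'items' list.
    """
    return [
        {
            "title": item["title"],
            "tags": item["tags"],
            "creation_date": item["creation_date"],
            "is_answered": item["is_answered"],
        }
        for item in payload.get("items", [])
    ]
```

A full extraction loop would request successive pages until the response reports no more results, while respecting the API's request quota.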
Objectives and goals:
Objective 1: Extract relevant data from the Stack Exchange API.
Objective 2: Conduct exploratory data analysis (EDA).
Objective 3: Develop a machine learning model.
Key Performance Indicators (KPIs):
KPI 1: Accuracy
KPI 2: Additional evaluation metrics (for example precision, recall, and F1 score)
KPI 3: Reduction in Manual Tagging
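Because a question can carry several tags, accuracy for tag prediction can be measured in more than one way. The sketch below, in Python for illustration, shows two common options: exact-match accuracy (the predicted tag set must equal the true tag set) and micro-averaged precision, recall, and F1 over individual tags. The choice of these particular measures and the function names are assumptions; the proposal does not specify which metrics will be used.

```python
def exact_match_accuracy(true_tags, predicted_tags):
    """Fraction of questions whose predicted tag set equals the true tag set."""
    if not true_tags:
        return 0.0
    matches = sum(1 for t, p in zip(true_tags, predicted_tags) if set(t) == set(p))
    return matches / len(true_tags)


def micro_precision_recall_f1(true_tags, predicted_tags):
    """Micro-averaged precision, recall, and F1 across all (question, tag) pairs."""
    tp = fp = fn = 0
    for t, p in zip(true_tags, predicted_tags):
        t, p = set(t), set(p)
        tp += len(t & p)   # tags predicted and correct
        fp += len(p - t)   # tags predicted but wrong
        fn += len(t - p)   # true tags that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical example: three questions with true and predicted tags.
true_tags = [["r", "ggplot2"], ["python"], ["r", "httr"]]
predicted_tags = [["r", "ggplot2"], ["python", "pandas"], ["r"]]
```

Exact-match accuracy is strict and tends to be low when questions carry many tags, so reporting it alongside the per-tag measures gives a fuller picture of model quality.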