Tag Prediction on Stack Exchange

Group 3

Shravan Honade
Ritesh Sengar
Nikhil Patil
Dhananjay Ghate
Neena Chaudhari

2023-10-25

Problem Description

Primary objectives:

This project uses data and modeling to explore and predict tags for questions on Stack Exchange. The primary objectives are:

  1. Data Extraction

  2. Exploratory Data Analysis (EDA)

  3. Machine Learning Model Development

Why is it interesting?

  1. Real-world Application: Improving tag prediction for Stack Exchange and other platforms could benefit millions of users.

  2. Data-Driven Insights: EDA can reveal user behavior, popular topics, and challenges on Stack Exchange.

  3. Automation: Automating tag prediction can save time and improve efficiency for moderators and users.

  4. User Experience Improvement: Accurate tagging improves user engagement by making it easier to find relevant content.

Analytics Plan - Exploratory Data Analysis

List of EDA tasks:

  1. Monthly questions count

  2. Percentage of questions marked as answered

  3. Relationships between answers and users

  4. Common words/tags in titles

  5. Growth or decline of particular tags over time
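
A minimal sketch of how tasks 1, 2, and 5 could be computed once question records are available. It is written in Python purely for illustration (the project itself plans to use R), and the sample records are made up. The field names creation_date, is_answered, and tags follow the question objects returned by the Stack Exchange API; the helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample records (assumption: values invented for illustration).
# creation_date is a Unix timestamp in seconds, as in the Stack Exchange API.
questions = [
    {"creation_date": 1672531200, "is_answered": True,  "tags": ["r", "ggplot2"]},
    {"creation_date": 1672617600, "is_answered": False, "tags": ["r"]},
    {"creation_date": 1675209600, "is_answered": True,  "tags": ["python"]},
    {"creation_date": 1675296000, "is_answered": True,  "tags": ["r", "dplyr"]},
]

def month_key(epoch_seconds):
    """Convert a Unix timestamp to a 'YYYY-MM' string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m")

def monthly_question_counts(records):
    """Task 1: number of questions asked in each month."""
    return Counter(month_key(q["creation_date"]) for q in records)

def answered_percentage(records):
    """Task 2: percentage of questions marked as answered."""
    if not records:
        return 0.0
    answered = sum(1 for q in records if q["is_answered"])
    return 100.0 * answered / len(records)

def monthly_tag_counts(records):
    """Task 5: count of each tag per month, keyed by (month, tag)."""
    counts = Counter()
    for q in records:
        month = month_key(q["creation_date"])
        for tag in q["tags"]:
            counts[(month, tag)] += 1
    return counts

if __name__ == "__main__":
    print(monthly_question_counts(questions))
    print(answered_percentage(questions))
    print(monthly_tag_counts(questions))
```

Comparing the (month, tag) counts across months gives a first view of which tags are growing or declining.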

Methods and Tools

  1. API Integration (httr package): httr will be used to query the Stack Exchange API and retrieve data for analysis.

  2. Data Visualization (ggplot2): ggplot2 will be used to visualize data patterns and model performance.

  3. Machine Learning Libraries (H2O, Keras): H2O, Keras, and possibly other libraries will serve as the machine learning frameworks.
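
A rough sketch of the data extraction step, shown in Python for illustration only (the project plans to use httr in R). It builds a request URL for the public /questions endpoint of the Stack Exchange API and pulls titles and tags out of a JSON response. The chosen site, the page size, and the sample payload are assumptions. No network call is made here.

```python
import json
from urllib.parse import urlencode

# Public Stack Exchange API root; version 2.3 is assumed here.
API_ROOT = "https://api.stackexchange.com/2.3"

def build_questions_url(site="stackoverflow", page=1, pagesize=100):
    """Build the URL for one page of questions, newest first."""
    params = {
        "site": site,          # which Stack Exchange site to query (assumption)
        "page": page,          # 1-based page index
        "pagesize": pagesize,  # the API caps this at 100
        "order": "desc",
        "sort": "creation",
    }
    return f"{API_ROOT}/questions?{urlencode(params)}"

def extract_title_tags(payload_text):
    """Return (title, tags) pairs from a /questions JSON response body."""
    payload = json.loads(payload_text)
    return [(item["title"], item["tags"]) for item in payload.get("items", [])]

# Hypothetical response body, trimmed to the fields used above.
sample_payload = (
    '{"items": [{"title": "How to merge data frames?", '
    '"tags": ["r", "merge"]}], "has_more": false}'
)

if __name__ == "__main__":
    print(build_questions_url())
    print(extract_title_tags(sample_payload))
```

The has_more field in the response indicates whether another page should be requested, so a full extraction would loop over page until it is false.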

Evaluation Plan

Objectives and goals:

Objective 1: Extract relevant data from the Stack Exchange API.

Objective 2: Conduct exploratory data analysis (EDA).

Objective 3: Develop a machine learning model.

Key Performance Indicators (KPIs):

KPI 1: Accuracy

KPI 2: Metrics

KPI 3: Reduction in Manual Tagging
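
One possible way to make KPI 1 and KPI 2 concrete, sketched in Python for illustration. Because a question can carry several tags, accuracy is taken here as the share of questions whose predicted tag set matches exactly, and "Metrics" is assumed to mean micro-averaged precision, recall, and F1. These definitions are assumptions, not choices stated in the plan, and the function name and sample data are hypothetical.

```python
def tag_metrics(true_tags, predicted_tags):
    """Compute exact-match accuracy and micro-averaged precision/recall/F1.

    true_tags and predicted_tags are parallel lists, one list of tags per question.
    """
    tp = fp = fn = exact = 0
    for truth, pred in zip(true_tags, predicted_tags):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # tags predicted and correct
        fp += len(pred - truth)   # tags predicted but wrong
        fn += len(truth - pred)   # correct tags that were missed
        exact += truth == pred    # whole tag set predicted correctly
    n = len(true_tags)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "exact_match_accuracy": exact / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical example: three questions with their true and predicted tags.
truth = [["r", "ggplot2"], ["python"], ["r"]]
predicted = [["r"], ["python"], ["r", "dplyr"]]

if __name__ == "__main__":
    print(tag_metrics(truth, predicted))
```

Exact-match accuracy is strict for multi-tag questions, which is why the per-tag precision and recall are worth reporting alongside it.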