3/25/2021
Data is retrieved using the Stack Exchange API, which needs to be set up first; details are in the documentation at https://api.stackexchange.com/docs (a minimal example request is sketched below)
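As an illustration only, the request below pulls recent questions from one Stack Exchange site with the httr R package; the endpoint is from the public v2.3 API, and the site and paging parameters are placeholders.

library(httr)

# Ten most recently active questions from one Stack Exchange site
resp <- GET("https://api.stackexchange.com/2.3/questions",
            query = list(site = "stackoverflow",
                         order = "desc",
                         sort = "activity",
                         pagesize = 10))
questions <- content(resp, as = "parsed")  # parsed JSON: items, quota fields, etc.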
The data includes video details like the number of videos and views, subscription details, counts of likes and dislikes on videos, comments, video suggestions, channel statistics, etc.
Data Collection procedure
Create an account on Google Cloud Platform and enable the YouTube Data API service
Create an API key and enable key restrictions to prevent unauthorized use and quota theft
Pull the data using functions like get_stats, get_comment_threads, get_subscriptions, etc. (a minimal pull is sketched below)
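A minimal sketch of the pull, assuming the tuber R package, which wraps the YouTube Data API and exposes helpers with the names listed above; note that tuber authenticates with OAuth credentials created on Google Cloud Platform, and the video ID below is a placeholder.

library(tuber)

# Authenticate with the OAuth client created on Google Cloud Platform
yt_oauth(app_id = "YOUR_CLIENT_ID", app_secret = "YOUR_CLIENT_SECRET")

# Video-level statistics: views, likes, dislikes, comment count
stats <- get_stats(video_id = "dQw4w9WgXcQ")

# Top-level comment threads for the same video
comments <- get_comment_threads(filter = c(video_id = "dQw4w9WgXcQ"),
                                max_results = 100)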
Clean the data by removing stop words from comments
Understand user sentiment from the likes and dislikes on videos in each category
Use text analysis to classify comments as positive, negative, or neutral (see the sentiment sketch after this list)
Analyse the impact of words in a video title on the view count
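The classification step could look like the sketch below, which scores each comment with tidytext and the Bing lexicon; the comments data frame and its textOriginal column are assumptions carried over from the pull above, and comments with no lexicon hits drop out of the join (they can be treated as neutral).

library(dplyr)
library(tidytext)

comment_sentiment <- comments %>%
  mutate(comment_id = row_number()) %>%
  unnest_tokens(word, textOriginal) %>%    # tokenize and lower-case
  anti_join(stop_words, by = "word") %>%   # drop stop words
  inner_join(get_sentiments("bing"), by = "word") %>%
  group_by(comment_id) %>%
  summarise(score = sum(ifelse(sentiment == "positive", 1, -1))) %>%
  mutate(label = case_when(
    score > 0 ~ "positive",
    score < 0 ~ "negative",
    TRUE      ~ "neutral"
  ))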
Data preparation: importing text, creating a corpus (a tough task); preprocessing: sentence segmentation, tokenization, normalization (lower casing, stemming, lemmatization), parts-of-speech tagging, document-term matrix (DTM), filtering (removing stopwords), and weighting (tf-idf)
Use NLP to perform tokenization and stop word removal (see the preprocessing sketch below)
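A sketch of these preparation steps with tidytext, building term counts, tf-idf weights, and a document-term matrix; the comments data frame and its textOriginal column are assumptions, and casting to a DTM requires the tm package to be installed.

library(dplyr)
library(tidytext)
library(SnowballC)

tokens <- comments %>%
  mutate(comment_id = row_number()) %>%
  unnest_tokens(word, textOriginal) %>%    # tokenization + lower casing
  anti_join(stop_words, by = "word") %>%   # stop word filtering
  mutate(word = wordStem(word))            # stemming (normalization)

# Term counts per comment with tf-idf weights, then a document-term matrix
weighted <- tokens %>%
  count(comment_id, word) %>%
  bind_tf_idf(word, comment_id, n)

dtm <- weighted %>% cast_dtm(comment_id, word, n)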
Libraries to be used: tidyverse, tidytext, wordcloud, ggplot2. ggplot2 and wordcloud will be used to analyze the frequency of words in each category
Visualizations to compare the number of positive and negative comments across categories (a plotting sketch follows below)
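One way the frequency plots could look with ggplot2 and wordcloud, assuming a tokenized data frame tokens with word and category columns (column names are assumptions); the positive/negative comparison across categories could use a similar geom_col on the comment_sentiment labels from the earlier sketch.

library(dplyr)
library(ggplot2)
library(wordcloud)

# Most frequent words per category
word_freq <- tokens %>% count(category, word, sort = TRUE)

word_freq %>%
  group_by(category) %>%
  slice_max(n, n = 10) %>%
  ggplot(aes(x = reorder(word, n), y = n)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~ category, scales = "free_y") +
  labs(x = NULL, y = "Word frequency")

# Word cloud of the overall comment vocabulary
with(count(tokens, word, sort = TRUE), wordcloud(word, n, max.words = 100))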
Use sentiment analysis of answers on the various Stack Exchange sites to identify the categories in which discussions happen
Predict the future usage of Stack Exchange websites by applying [] modeling techniques to estimate the number of users and popularity (by posts, questions, answers); an illustrative baseline is sketched after this list
Predict the reputation of users by analyzing historical data, and recommend a reward process to keep the Stack Exchange community engaged
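Purely as an illustration (the modeling techniques for the project are still to be chosen), a simple linear trend on monthly question counts could serve as a baseline forecast; monthly_counts and its month/questions columns are assumed to be built from the API data.

# Baseline: linear trend on monthly question counts
fit <- lm(questions ~ month, data = monthly_counts)

# Forecast the next six months (month is assumed to be a numeric index)
future <- data.frame(month = max(monthly_counts$month) + 1:6)
predict(fit, newdata = future)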