3/25/2021
Data is retrieved using the Stack Exchange API, which needs to be set up first; details are in the documentation at https://api.stackexchange.com/docs (a minimal example request is sketched below)
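As an illustration only, the request below pulls recent questions from one Stack Exchange site with the httr R package; the endpoint is from the public v2.3 API, and the site and paging parameters are placeholders.

library(httr)

# Ten most recently active questions from one Stack Exchange site
resp <- GET("https://api.stackexchange.com/2.3/questions",
            query = list(site = "stackoverflow",
                         order = "desc",
                         sort = "activity",
                         pagesize = 10))
questions <- content(resp, as = "parsed")  # parsed JSON: items, quota fields, etc.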
The data includes video details like the number of videos and views, subscription details, counts of likes and dislikes on videos, comments, video suggestions, channel statistics, etc.
Data Collection procedure
Create an account on Google Cloud Platform and enable the YouTube Data API service
Create an API key and enable key restrictions to prevent unauthorized use and quota theft
Pull the data using functions like get_stats, get_comment_threads, get_subscriptions, etc. (a minimal pull is sketched below)
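A minimal sketch of the pull, assuming the tuber R package, which wraps the YouTube Data API and exposes helpers with the names listed above; note that tuber authenticates with OAuth credentials created on Google Cloud Platform, and the video ID below is a placeholder.

library(tuber)

# Authenticate with the OAuth client created on Google Cloud Platform
yt_oauth(app_id = "YOUR_CLIENT_ID", app_secret = "YOUR_CLIENT_SECRET")

# Video-level statistics: views, likes, dislikes, comment count
stats <- get_stats(video_id = "dQw4w9WgXcQ")

# Top-level comment threads for the same video
comments <- get_comment_threads(filter = c(video_id = "dQw4w9WgXcQ"),
                                max_results = 100)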
Clean the data by removing stop words from comments
Understand user sentiment from the likes and dislikes on videos in each category
Use text analysis to classify comments as positive, negative, or neutral (see the sentiment sketch after this list)
Analyse the impact of words in a video title on the view count
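The classification step could look like the sketch below, which scores each comment with tidytext and the Bing lexicon; the comments data frame and its textOriginal column are assumptions carried over from the pull above, and comments with no lexicon hits drop out of the join (they can be treated as neutral).

library(dplyr)
library(tidytext)

comment_sentiment <- comments %>%
  mutate(comment_id = row_number()) %>%
  unnest_tokens(word, textOriginal) %>%    # tokenize and lower-case
  anti_join(stop_words, by = "word") %>%   # drop stop words
  inner_join(get_sentiments("bing"), by = "word") %>%
  group_by(comment_id) %>%
  summarise(score = sum(ifelse(sentiment == "positive", 1, -1))) %>%
  mutate(label = case_when(
    score > 0 ~ "positive",
    score < 0 ~ "negative",
    TRUE      ~ "neutral"
  ))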
Data preparation: importing text, creating a corpus (a tough task); preprocessing: sentence segmentation, tokenization, normalization (lower casing, stemming, lemmatization), parts-of-speech tagging, document-term matrix (DTM), filtering (removing stopwords), and weighting (tf-idf)
Use NLP to perform tokenization and stop word removal (see the preprocessing sketch below)
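A sketch of these preparation steps with tidytext, building term counts, tf-idf weights, and a document-term matrix; the comments data frame and its textOriginal column are assumptions, and casting to a DTM requires the tm package to be installed.

library(dplyr)
library(tidytext)
library(SnowballC)

tokens <- comments %>%
  mutate(comment_id = row_number()) %>%
  unnest_tokens(word, textOriginal) %>%    # tokenization + lower casing
  anti_join(stop_words, by = "word") %>%   # stop word filtering
  mutate(word = wordStem(word))            # stemming (normalization)

# Term counts per comment with tf-idf weights, then a document-term matrix
weighted <- tokens %>%
  count(comment_id, word) %>%
  bind_tf_idf(word, comment_id, n)

dtm <- weighted %>% cast_dtm(comment_id, word, n)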
Libraries to be used: tidyverse, tidytext, wordcloud, ggplot2. ggplot2 and wordcloud will be used to analyze the frequency of words in each category
Visualizations to compare the number of positive and negative comments across categories (a plotting sketch follows below)
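One way the frequency plots could look with ggplot2 and wordcloud, assuming a tokenized data frame tokens with word and category columns (column names are assumptions); the positive/negative comparison across categories could use a similar geom_col on the comment_sentiment labels from the earlier sketch.

library(dplyr)
library(ggplot2)
library(wordcloud)

# Most frequent words per category
word_freq <- tokens %>% count(category, word, sort = TRUE)

word_freq %>%
  group_by(category) %>%
  slice_max(n, n = 10) %>%
  ggplot(aes(x = reorder(word, n), y = n)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~ category, scales = "free_y") +
  labs(x = NULL, y = "Word frequency")

# Word cloud of the overall comment vocabulary
with(count(tokens, word, sort = TRUE), wordcloud(word, n, max.words = 100))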
Use sentiment analysis of answers on the various Stack Exchange sites to identify the categories in which discussions happen
Predict the future usage of Stack Exchange websites by applying [] modeling techniques to estimate the number of users and popularity (by posts, questions, answers); an illustrative baseline is sketched after this list
Predict the reputation of users by analyzing historical data, and recommend a reward process to keep the Stack Exchange community engaged
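Purely as an illustration (the modeling techniques for the project are still to be chosen), a simple linear trend on monthly question counts could serve as a baseline forecast; monthly_counts and its month/questions columns are assumed to be built from the API data.

# Baseline: linear trend on monthly question counts
fit <- lm(questions ~ month, data = monthly_counts)

# Forecast the next six months (month is assumed to be a numeric index)
future <- data.frame(month = max(monthly_counts$month) + 1:6)
predict(fit, newdata = future)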