3/25/2021
Data for YouTube channels is retrieved using the YouTube Data API, which must be set up first; setup details are at https://developers.google.com/youtube/v3/getting-started
The data includes video details such as the number of videos and views, subscription details, like and dislike counts on videos, comments, video suggestions, and channel statistics.
Data Collection procedure
Create an account on Google Cloud Platform and enable the YouTube API services
Create an API key and enable key restrictions to prevent unauthorized use and quota theft
Pull the data using functions such as get_stats, get_comment_threads, and get_subscriptions
Clean the data by removing stop words from comments
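To make the data pull concrete, here is a minimal Python sketch of the kind of HTTP request a statistics function might issue against the YouTube Data API v3 `videos` endpoint. It is an illustration under assumptions, not the project's actual code: `build_stats_url`, the video IDs, and the API key are hypothetical placeholders, and no request is sent.

```python
from urllib.parse import urlencode

# Base URL of the YouTube Data API v3 `videos` resource.
VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_stats_url(video_ids, api_key):
    """Return a videos.list URL asking for the `statistics` part of each video.

    Hypothetical helper for illustration; it only builds the URL.
    """
    params = {
        "part": "statistics",          # view, like, and comment counts
        "id": ",".join(video_ids),     # comma-separated list of video IDs
        "key": api_key,                # API key created in Google Cloud Platform
    }
    return f"{VIDEOS_ENDPOINT}?{urlencode(params)}"


# Placeholder values only; replace with real IDs and a restricted API key.
example_url = build_stats_url(["VIDEO_ID_1", "VIDEO_ID_2"], "YOUR_API_KEY")
print(example_url)
```

Keeping the key out of source files (for example in an environment variable) complements the key restrictions mentioned in the procedure.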
Understand user sentiment from the likes and dislikes on videos in their respective categories
Use text analysis to classify comments as positive, negative, or neutral
Analyse the impact of words in a video title on the view count
Use NLP to perform tokenization and stop-word removal
Libraries to be used: tidyverse, tidytext, wordcloud, ggplot2. ggplot2 and wordcloud will be used to analyze the frequency of words in each category
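The tokenization, stop-word removal, and word-frequency steps can be sketched as follows. This is a small illustration in Python using only the standard library, not the R packages listed above; the stop-word list, function names, and sample comments are made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller lexicon.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to", "was"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(comments):
    """Count tokens across all comments, skipping stop words."""
    counts = Counter()
    for comment in comments:
        counts.update(t for t in tokenize(comment) if t not in STOP_WORDS)
    return counts


# Made-up comments, used only to exercise the functions.
sample_comments = [
    "This is a great video",
    "The editing was great and the music was great",
]
freqs = word_frequencies(sample_comments)
print(freqs.most_common(3))
```

Running the same count once per video category would give the per-category frequencies that a bar chart or word cloud then displays.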
Create visualizations to compare the number of positive and negative comments across categories
Evaluate the percentage of positive and negative comments
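One simple way to obtain these percentages is a lexicon-based classifier: count positive and negative words in each comment and label the comment by the sign of the difference. The Python sketch below assumes a hypothetical mini-lexicon and made-up comments; it is not the classification model the project will ultimately use.

```python
from collections import Counter

# Hypothetical mini-lexicon; stands in for a real sentiment lexicon.
POSITIVE = {"good", "great", "love", "helpful"}
NEGATIVE = {"bad", "boring", "hate", "terrible"}


def classify(comment):
    """Label a comment by comparing its counts of positive and negative words."""
    words = comment.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_percentages(comments):
    """Return each label's share of all comments, as a percentage."""
    labels = ("positive", "negative", "neutral")
    total = len(comments)
    if total == 0:
        return {label: 0.0 for label in labels}
    counts = Counter(classify(c) for c in comments)
    return {label: 100.0 * counts[label] / total for label in labels}


# Made-up comments for illustration.
sample = ["love this great video", "boring and bad", "uploaded yesterday", "so helpful"]
shares = sentiment_percentages(sample)
print(shares)
```

Computing `sentiment_percentages` separately for each category yields the numbers needed for a cross-category comparison.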
Use a classification model to classify comments as positive, neutral, or negative, and a regression model to predict the number of views
Use statistics such as R-squared, adjusted R-squared, and p-values to evaluate our models
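For the regression on view counts, R-squared and adjusted R-squared can be computed directly from actual and predicted values. The following Python sketch uses the standard definitions; the function names and the sample numbers are invented for illustration and are not project results.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot, the share of variance explained by the model."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R^2 for the number of predictors used in the regression."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Invented view counts and predictions, for illustration only.
actual_views = [100, 200, 300, 400, 500]
predicted_views = [110, 190, 310, 390, 500]

r2 = r_squared(actual_views, predicted_views)
adj_r2 = adjusted_r_squared(r2, n_obs=len(actual_views), n_predictors=1)
print(round(r2, 4), round(adj_r2, 4))
```

Adjusted R-squared is the more useful of the two when comparing regressions that use different numbers of title-word predictors, since plain R-squared never decreases when a predictor is added.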