YouTube Video Statistics and Comments Analysis

Anuja Jain, Jayati Kaul, Shatawari Jain, Sithara Vanmerimeethal Paleri

Introduction

YouTube surfaces a huge list of videos uploaded by users in every country, ranging from a popular artist's new song to a new movie trailer and everything in between, to help viewers see what's happening on YouTube and around the world.

GOAL:

Analyze the top-performing channels across different categories to identify their key characteristics.

Data Overview

The two CSV files contain data about channels and their videos. For this project, we collected data on 28 channels and around 34,000 video comments spread across 6 different categories.
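
As a rough illustration of how two such files could be combined, the sketch below reads a channel table and a comment table and counts channels and comments per category. The column names (`channel_id`, `category`, `comment_text`) and the sample rows are assumptions made for this example, not the project's actual schema or data.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-ins for the two CSV files. Column names and rows
# are hypothetical; real files would be opened with open(path, newline="").
CHANNELS_CSV = """channel_id,channel_title,category
c1,Channel One,Music
c2,Channel Two,News
c3,Channel Three,Music
"""

COMMENTS_CSV = """channel_id,video_id,comment_text
c1,v1,Great song
c1,v2,Love this
c2,v3,Interesting report
c3,v4,Nice beat
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(channels, comments):
    """Count channels and comments per category, joining on channel_id."""
    category_of = {row["channel_id"]: row["category"] for row in channels}
    channel_counts = Counter(category_of.values())
    comment_counts = Counter(
        category_of[row["channel_id"]]
        for row in comments
        if row["channel_id"] in category_of  # skip comments with unknown channel
    )
    return channel_counts, comment_counts


channels = read_rows(CHANNELS_CSV)
comments = read_rows(COMMENTS_CSV)
channel_counts, comment_counts = summarize(channels, comments)
print(dict(channel_counts))
print(dict(comment_counts))
```

Joining on a shared channel identifier keeps the comment table narrow while still allowing per-category summaries.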

Channel Statistics

Video Comments

Peer Comments Addressed

  1. Add visualizations?
  2. Will you be considering domain for videos?
  3. Graphs to interpret channels, categories, age duration for all videos?
  4. Machine learning algorithms?
  5. Do you have the data for ‘device type’ in your API output?
  6. What kind of characteristics will you be evaluating and recommending to users?
  7. Region-wise analysis of sentiments can be done for political videos?
  8. Are you guys planning to categorize the comments to get the overall sentiment of the video?
  9. What other variables are you considering to validate your hypothesis?

Analytics plan Recap

Analyze the data based on comments, time of publishing, number of likes, view count, subscribers, etc.

Deep Dive Recap

  1. Sentiment analysis on comments to understand the relation between comments and performance of the video
  2. Using the analysis and EDA, we run an ML algorithm

Exploratory Data Analysis

Distribution of View Count Over Categories

Comments deviation based on Categories

Videos Distribution over the years based on Categories

Channel Uploading Maximum Number of Videos on YouTube

Natural Language Processing

NLP on Video Comments and Channel Description

  1. Tokenization
  2. Pre-Processing
  3. Stop-words and Rare-words removal
  4. Word Cloud
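
The first three steps above can be sketched with the standard library alone. This is a minimal illustration rather than the pipeline used in the project: the stop-word list, the `min_count` threshold for rare words, and the sample comments are all assumptions. The resulting word frequencies are the kind of input a word cloud is typically drawn from.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "is", "this", "and", "to", "of", "i", "it", "so"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def preprocess(comments, min_count=2):
    """Tokenize comments, then drop stop words and rare words.

    A word is treated as rare when it occurs fewer than min_count times
    across all comments (after stop-word removal).
    """
    tokenized = [
        [tok for tok in tokenize(comment) if tok not in STOP_WORDS]
        for comment in comments
    ]
    freq = Counter(tok for tokens in tokenized for tok in tokens)
    cleaned = [[tok for tok in tokens if freq[tok] >= min_count] for tokens in tokenized]
    kept_freq = Counter(tok for tokens in cleaned for tok in tokens)
    return cleaned, kept_freq


sample_comments = [
    "This song is AMAZING!!!",
    "Amazing video, love the song",
    "I love it so much <3",
]
cleaned, word_freq = preprocess(sample_comments)
print(cleaned)
print(word_freq.most_common(3))
```

Removing rare words after stop words keeps the frequency table focused on terms that recur across comments, which is what makes a word cloud readable.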

Wordcloud for Video Comments

Wordcloud for Channel Description

Sentiment Analysis on Comments

Most Common Positive and Negative Words

Sentiment Based On Scores
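
One simple way to turn a comment into a score and then a label is a word-list (lexicon) approach, sketched below. The slides do not say which scoring method was used, so this is an illustrative stand-in: the positive and negative word lists, the scoring formula, and the threshold are all assumptions.

```python
import re

# Hypothetical mini-lexicons; a real analysis would use a much larger one.
POSITIVE = {"love", "great", "amazing", "good", "best"}
NEGATIVE = {"hate", "bad", "boring", "worst", "terrible"}


def sentiment_score(comment):
    """Return (positive hits - negative hits) / token count, in [-1, 1]."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    if not tokens:
        return 0.0
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return (pos - neg) / len(tokens)


def sentiment_label(score, threshold=0.0):
    """Map a score to 'positive', 'negative', or 'neutral'."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for text in ["I love this great video", "Worst video ever, so boring", "Uploaded on Monday"]:
    score = sentiment_score(text)
    print(f"{text!r}: {score:+.2f} -> {sentiment_label(score)}")
```

Normalizing by token count keeps long comments from dominating, and raising the threshold widens the neutral band.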

Comment Sentiment Labels Based On Categories

Sentiment Analysis on Channel Description

Topic Modeling

  1. Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents.
  2. We performed topic modeling on our data to better understand the groups of words that define its textual content.
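
To make the idea concrete, the sketch below fits a small Latent Dirichlet Allocation (LDA) model with collapsed Gibbs sampling, using only the standard library. The slides do not name the topic-modeling algorithm or library that was used, so LDA is an assumption here, as are the hyperparameters (`alpha`, `beta`, `iterations`) and the toy documents. In practice an established library implementation would be the usual choice.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling over tokenized documents.

    Returns (topic_word, doc_topic): word counts per topic and
    topic counts per document.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        doc_assign = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """List up to n most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Toy tokenized documents, invented for illustration.
docs = [
    ["song", "music", "album", "song"],
    ["music", "concert", "album"],
    ["election", "vote", "policy"],
    ["policy", "vote", "election", "debate"],
]
topic_word, doc_topic = fit_lda(docs)
print(top_words(topic_word))
```

Each inner list printed at the end is one topic's most frequent words, which is the kind of per-topic word bucket a summary table would report.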

The table below shows the most probable words for each channel:

Key Takeaways

  1. By performing this exercise, we became familiar with the technique of extracting data via APIs.
  2. We did exploratory analysis on the data to identify its general trends and patterns.
  3. We then analyzed this data to understand the sentiment and mood of the public who comment on videos from YouTube's top-performing channels.
  4. We also analyzed the text to group terms into multiple buckets based on their usage and frequency. In addition to video comments, we also performed sentiment analysis on channel descriptions.