Introduction

The modern music industry is no longer driven solely by official releases. Social media, live performances, and online fan communities can generate massive attention for songs long before they become available on streaming platforms. A recent example is “ARE YOU ENTERTAINED?” by Sueco featuring Jeris Johnson, which was first performed live during the 2025 Warped Tour, before its official release. Fans recorded the performance, shared clips across social media, and discussed the song extensively online, helping it go viral while it was still unreleased. Due to reported creative differences between the artists, the official release was delayed for nearly a year, which allowed anticipation and online discussion to keep growing in the meantime.

This project explores how online audiences respond to music by analyzing comments on the song's official YouTube music video. Comments were collected with the YouTube Data API, cleaned, and analyzed through word frequency analysis. The findings are presented in a frequency table, bar chart, and word cloud to identify the most common topics, emotions, and reactions expressed by viewers. The analysis demonstrates how data analysis can be used to better understand fan engagement and public reception within today’s digital music industry.
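The collection step itself is not shown in this report, which loads a saved CSV instead. As a rough sketch of how it might look, the code below pages through the `commentThreads` endpoint of the YouTube Data API v3 and keeps the same five columns that appear in the saved file. The helper names `parse_comment_items()` and `fetch_comments()`, the `max_pages` limit, and the use of the httr and jsonlite packages are assumptions made for illustration, not the method actually used here.

```r
# Sketch only: hypothetical helpers, not the script used for this report.

# Return NA instead of NULL so a missing field does not break data.frame().
or_na <- function(x) if (is.null(x)) NA else x

# Turn the `items` list from one parsed API response page into a data frame.
parse_comment_items <- function(items) {
  rows <- lapply(items, function(item) {
    top <- item$snippet$topLevelComment
    data.frame(
      authorDisplayName = or_na(top$snippet$authorDisplayName),
      textOriginal      = or_na(top$snippet$textOriginal),
      publishedAt       = or_na(top$snippet$publishedAt),
      likeCount         = or_na(top$snippet$likeCount),
      id                = or_na(top$id),
      stringsAsFactors  = FALSE
    )
  })
  do.call(rbind, rows)
}

# Request up to `max_pages` pages of top-level comments (100 per page).
fetch_comments <- function(video_id, api_key, max_pages = 10) {
  pages <- list()
  page_token <- NULL
  for (i in seq_len(max_pages)) {
    resp <- httr::GET(
      "https://www.googleapis.com/youtube/v3/commentThreads",
      query = list(
        part       = "snippet",
        videoId    = video_id,
        maxResults = 100,
        textFormat = "plainText",
        pageToken  = page_token,  # NULL on the first page, so it is omitted
        key        = api_key
      )
    )
    httr::stop_for_status(resp)
    body <- jsonlite::fromJSON(
      httr::content(resp, as = "text", encoding = "UTF-8"),
      simplifyVector = FALSE
    )
    pages[[i]] <- parse_comment_items(body$items)
    page_token <- body$nextPageToken
    if (is.null(page_token)) break  # no further pages
  }
  do.call(rbind, pages)
}
```

A call might look like `fetch_comments("7iskEPUCtSE", Sys.getenv("YOUTUBE_API_KEY"))`, where the environment variable name is again an assumption. This sketch returns top-level comments only, not replies.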

Video used: https://www.youtube.com/watch?v=7iskEPUCtSE

Load Packages

library(dplyr)
library(readr)
library(tidytext)
library(stringr)
library(ggplot2)
library(wordcloud)
library(RColorBrewer)
library(knitr)

Load Saved Comments

comments_clean <- read_csv("youtube_comments.csv")

head(comments_clean)
## # A tibble: 6 × 5
##   authorDisplayName textOriginal             publishedAt         likeCount id   
##   <chr>             <chr>                    <dttm>                  <dbl> <chr>
## 1 @MikeNiemand-r1v  "Hello again. I am into… 2026-06-27 10:22:41         0 UgwK…
## 2 @turtle13x57      "This song fucking rule… 2026-06-26 20:46:27         0 Ugxw…
## 3 @EvilWilma        "Pay the limo company S… 2026-06-25 21:27:33         1 UgwO…
## 4 @relbuldii        "Damn this band went bi… 2026-06-25 16:46:44         0 Ugw6…
## 5 @GeminiWalker-s3x "Yes!!!!!! Get it \U000… 2026-06-25 15:05:03         0 Ugxa…
## 6 @Darkpilgrimdan   "damn good song  (Sueco… 2026-06-24 07:35:23         0 Ugwr…
glimpse(comments_clean)
## Rows: 614
## Columns: 5
## $ authorDisplayName <chr> "@MikeNiemand-r1v", "@turtle13x57", "@EvilWilma", "@…
## $ textOriginal      <chr> "Hello again. I am into the second round with beauti…
## $ publishedAt       <dttm> 2026-06-27 10:22:41, 2026-06-26 20:46:27, 2026-06-2…
## $ likeCount         <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0…
## $ id                <chr> "UgwKkJx9dTii9f8sU9V4AaABAg", "Ugxw-SkdLGTDLomwA4Z4A…

Number of Comments Collected

nrow(comments_clean)
## [1] 614

Clean Text for Analysis

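# Load the tidytext stop-word list (common English words such as "the" and "and")
data(stop_words)

words <- comments_clean %>%
  select(textOriginal) %>%
  # Split each comment into one lowercase word per row
  unnest_tokens(word, textOriginal) %>%
  # Drop stop words
  anti_join(stop_words, by = "word") %>%
  # Drop tokens made up only of digits
  filter(!str_detect(word, "^[0-9]+$")) %>%
  # Drop words shorter than three characters
  filter(str_length(word) > 2)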
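
One detail worth noting: the token i’m, written with a curly apostrophe, still appears in the frequency table even though contractions are normally treated as stop words. A likely cause is that the stop-word list spells contractions with a straight apostrophe, so the curly variant does not match. The sketch below shows one possible fix under that assumption, replacing curly single quotes with straight ones before tokenizing. The `toy` data frame is made-up example text, not real comments.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Made-up comments: one with a curly apostrophe, one with a straight one.
toy <- tibble(
  textOriginal = c("I\u2019m so ready for this", "I'm listening on repeat")
)

toy_words <- toy %>%
  # Replace curly single quotes (U+2018, U+2019) with a straight apostrophe
  mutate(textOriginal = str_replace_all(textOriginal, "[\u2018\u2019]", "'")) %>%
  unnest_tokens(word, textOriginal) %>%
  anti_join(stop_words, by = "word") %>%
  filter(!str_detect(word, "^[0-9]+$")) %>%
  filter(str_length(word) > 2)

toy_words
```

Applying the same `mutate()` step to `comments_clean` before `unnest_tokens()` would change the reported counts slightly, so the table and charts would need to be regenerated.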

Word Frequency Analysis

word_freq <- words %>%
  count(word, sort = TRUE)

head(word_freq, 20)
## # A tibble: 20 × 2
##    word            n
##    <chr>       <int>
##  1 song          102
##  2 sueco          86
##  3 love           64
##  4 music          41
##  5 entertained    33
##  6 jeris          33
##  7 wait           24
##  8 banger         22
##  9 finally        22
## 10 bro            21
## 11 tour           21
## 12 waiting        21
## 13 fucking        20
## 14 fan            16
## 15 fire           16
## 16 i’m            16
## 17 listening      16
## 18 live           16
## 19 time           16
## 20 album          15
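
These counts are token counts, so a single comment that repeats a word several times contributes several times. As a complementary view, the sketch below counts how many distinct comments contain each word, which may be a better basis for statements about how many viewers mentioned something. The function name `count_comments_per_word()` and the `toy` data are assumptions for illustration only.

```r
library(dplyr)
library(tidytext)

# Count the number of distinct comments (by id) in which each word appears.
count_comments_per_word <- function(comments) {
  comments %>%
    select(id, textOriginal) %>%
    unnest_tokens(word, textOriginal) %>%
    anti_join(stop_words, by = "word") %>%
    distinct(id, word) %>%   # keep each word at most once per comment
    count(word, name = "n_comments", sort = TRUE)
}

# Made-up example: "banger" occurs four times but in only two comments.
toy <- tibble(
  id = c("a", "b", "c"),
  textOriginal = c("banger banger banger", "absolute banger", "love it")
)

count_comments_per_word(toy)
```

With the real data, the call would be `count_comments_per_word(comments_clean)`.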

Top 20 Most Common Words

kable(
  head(word_freq, 20),
  caption = "Top 20 Most Common Words"
)
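Table: Top 20 Most Common Words

|word        |   n|
|:-----------|---:|
|song        | 102|
|sueco       |  86|
|love        |  64|
|music       |  41|
|entertained |  33|
|jeris       |  33|
|wait        |  24|
|banger      |  22|
|finally     |  22|
|bro         |  21|
|tour        |  21|
|waiting     |  21|
|fucking     |  20|
|fan         |  16|
|fire        |  16|
|i’m         |  16|
|listening   |  16|
|live        |  16|
|time        |  16|
|album       |  15|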

Bar Chart

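word_freq %>%
  # with_ties = FALSE keeps exactly 15 rows; by default slice_max() keeps
  # every word tied with the 15th-highest count, which would plot more than 15
  slice_max(n, n = 15, with_ties = FALSE) %>%
  ggplot(aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 15 Most Common Words in YouTube Comments",
    x = "Word",
    y = "Frequency"
  ) +
  theme_minimal()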

Word Cloud

set.seed(123)

wordcloud(
  words = word_freq$word,
  freq = word_freq$n,
  min.freq = 2,
  max.words = 100,
  random.order = FALSE,
  colors = brewer.pal(8, "Dark2")
)

Results and Discussion

A total of 614 YouTube comments were collected from the selected video.

The word frequency analysis shows which words appeared most often in the comments after stop words, numbers, and words shorter than three characters were removed. The most common words were song (102), Sueco (86), love (64), music (41), entertained (33), Jeris (33), wait (24), banger (22), and finally (22). These words suggest that viewers were mainly discussing the song itself, the artists, and their excitement about the release.

The table and bar chart make the most frequent words easy to compare, while the word cloud gives a visual summary of the comments. Larger words in the word cloud appeared more often, showing which ideas were most common among viewers.

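Based on the results, many viewers appear to have reacted positively to the video. Words such as love (64), banger (22), and fire (16) point to enjoyment of the music, while finally (22), wait (24), and waiting (21) likely reflect anticipation built up during the delayed release. Because word counts alone do not capture context or sarcasm, this reading is suggestive rather than conclusive. The frequent appearance of Sueco (86) and Jeris (33) also shows that viewers were directly engaging with the artists involved.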
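
One way to check this interpretation more systematically would be to look up the frequent words in a sentiment lexicon. The sketch below joins a few of the top words, with their counts from the frequency table, to the Bing lexicon that ships with tidytext. It is an illustrative addition rather than part of the original analysis, and general-purpose lexicons have known limits here: music slang may be missing entirely, and profanity used as an intensifier may be labeled negative.

```r
library(dplyr)
library(tidytext)

# A few frequent words and their counts, copied from the frequency table.
top_words <- tibble(
  word = c("love", "banger", "finally", "fucking", "fire"),
  n    = c(64, 22, 22, 20, 16)
)

# Words absent from the lexicon get NA in the sentiment column.
tagged <- top_words %>%
  left_join(get_sentiments("bing"), by = "word")

tagged
```

Rows with `NA` sentiment would need manual review before drawing conclusions about overall tone.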

Overall, the comments show strong audience engagement. Even though many YouTube comments are short, they still provide useful information about audience opinions, common reactions, and discussion trends surrounding the music video.

Conclusion

This project demonstrated how text mining techniques can be applied to social media data to better understand audience engagement. By collecting YouTube comments with the YouTube Data API and performing word frequency analysis, it was possible to identify the most common themes and reactions expressed by viewers. The table, bar chart, and word cloud provided clear visualizations that made these patterns easy to interpret.

The results show that even short, informal comments can provide meaningful insights into audience opinions and discussion trends. This type of analysis can help researchers, content creators, and businesses better understand how people respond to online content. Overall, the project highlights the value of combining data collection, data cleaning, and visualization techniques to transform unstructured text into useful information for analysis and decision-making.