Introduction

For this assignment, I looked up a watch video on YouTube to examine its comments. I was curious to see which watches people recommend.

Data Source Description

For this analysis, I collected user-generated comments from a YouTube video in which a watch YouTuber discusses “20 Watches Under $1,000 That Look Like They Cost $10,000.” Videos like this have recently been playing in the background while I work. I looked up one video about a watch, and now my algorithm suggests watch videos. I happen to find them interesting and kind of relaxing while I work, though I really don’t know why.

This particular video covers 20 watches that the creator recommends, so I was curious to see which watches commenters would mention or recommend in the comments.

# Preview the first six scraped comments (author, text, date, likes, comment ID)
head(comments_df)
## # A tibble: 6 × 5
##   authorDisplayName textOriginal                     publishedAt likeCount id   
##   <chr>             <chr>                            <chr>           <int> <chr>
## 1 @TheWatchBros     Loved this breakdown of watches… 2026-02-11…        42 Ugw5…
## 2 @seabiscuit726142 Ditch the TRUE GMT term, use TR… 2026-06-26…         0 Ugyz…
## 3 @blackknight5110  22:19 i have mr.jon silent …     2026-06-25…         0 Ugzt…
## 4 @davidbennett4395 Again you have not mentioned wi… 2026-06-25…         0 Ugzq…
## 5 @donpatisson      Interesting and certainly good … 2026-06-25…         0 UgyK…
## 6 @davidbennett4395 I noticed you did not put Wise … 2026-06-25…         0 UgzP…
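The report does not say how the comments were scraped. The five columns in the preview (`authorDisplayName`, `textOriginal`, `publishedAt`, `likeCount`, `id`) do, however, match field names in the YouTube Data API v3 `commentThreads` resource, so one plausible approach is sketched below under that assumption. The helper name `parse_comment_page()` is hypothetical, and it only parses a response that has already been downloaded; making the request itself (which needs an API key) is left out.

```r
# Hypothetical helper (assumes the YouTube Data API v3 was the source):
# turn one downloaded page of a commentThreads.list response, requested
# with part = "snippet", into a tibble with the same five columns as comments_df.
library(jsonlite)
library(tibble)
library(dplyr)

parse_comment_page <- function(json_text) {
  # simplifyVector = FALSE keeps the nested JSON as plain R lists
  page <- fromJSON(json_text, simplifyVector = FALSE)
  rows <- lapply(page$items, function(item) {
    # Top-level comment fields live under snippet$topLevelComment$snippet
    s <- item$snippet$topLevelComment$snippet
    tibble(
      authorDisplayName = s$authorDisplayName,
      textOriginal      = s$textOriginal,
      publishedAt       = s$publishedAt,
      likeCount         = as.integer(s$likeCount),
      id                = item$id
    )
  })
  # Stack the one-row tibbles into a single table
  bind_rows(rows)
}
```

Each response holds at most 100 comment threads. When more exist, the response includes a `nextPageToken`, which would be sent back as the `pageToken` request parameter to fetch the next page; the parsed pages could then be combined with `bind_rows()`. This covers top-level comments only, not replies.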

Text Analysis: Word Frequency & Visualization

Below are some visualizations of the most frequently used words. The word frequencies show that the top words are “watch” and “watches,” which is understandable, but it was also interesting to see watch brands among them, since finding brand mentions was the main point of scraping the YouTube comments.

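# Load the packages used below (tokenizing, plotting, word cloud)
library(dplyr)
library(tidytext)
library(ggplot2)
library(wordcloud)
library(RColorBrewer)

# Split each comment in the 'textOriginal' column into one lowercase word per row,
# then remove common English stop words such as "the" and "and"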
tokens <- comments_df %>%
  unnest_tokens(word, textOriginal) %>%
  anti_join(stop_words, by = "word")

# Frequency Table
top_words <- tokens %>%
  count(word, sort = TRUE)

# Word Cloud
wordcloud(words = top_words$word, 
          freq = top_words$n, 
          min.freq = 2, 
          max.words = 50, 
          colors = brewer.pal(8, "Dark2"))

# Bar Graph (Top 10 Words)
top_words %>%
  head(10) %>% 
  mutate(word = reorder(word, n)) %>% 
  ggplot(aes(x = word, y = n)) +
  geom_col(fill = "green") + 
  coord_flip() +
  labs(title = "Top 10 Most Frequent Words in Comments",
       x = "Word",
       y = "Frequency") +
  theme_minimal()
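To make the tokenizing and counting steps concrete, here is the same pipeline run on two made-up comments (the `toy_comments` data is invented purely for illustration). `unnest_tokens()` lowercases each word and strips punctuation, `anti_join()` against tidytext's `stop_words` drops filler words, and `count()` tallies whatever is left.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Two made-up comments, invented only to illustrate the pipeline
toy_comments <- tibble(
  id = c("c1", "c2"),
  textOriginal = c("The Seiko is my favorite watch",
                   "Seiko or Tudor? Both watches are nice")
)

toy_counts <- toy_comments %>%
  # one lowercase word per row; punctuation such as "?" is dropped
  unnest_tokens(word, textOriginal) %>%
  # remove stop words like "the", "is", and "or"
  anti_join(stop_words, by = "word") %>%
  # tally the remaining words, most frequent first
  count(word, sort = TRUE)

print(toy_counts)
```

Because no stemming is applied, “watch” and “watches” stay separate tokens, which is why both appear in the top-10 list for the real comments.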

Interpretation of Findings

Other than the expected words like “watch” and “watches,” I see quite a few brand names, such as “Certina,” “Bulova,” “Rolex,” “Seiko,” and “Tudor.” I was interested to see such a wide range of brands, from really expensive ones like Rolex and Tudor to more affordable ones like Bulova.
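The brand observation could be checked more directly by counting how many distinct comments mention each brand, rather than how many times the word appears overall, so that one commenter repeating a brand name does not inflate its count. This is a minimal sketch, assuming the `tokens` table from the pipeline above (which keeps each comment's `id` column after `unnest_tokens()`); the function name `comments_mentioning()` and the brand vector are hypothetical. It only matches single-word brand names, so a two-word name like “Grand Seiko” would be counted under “seiko.”

```r
library(dplyr)

# Hypothetical helper: count how many distinct comments mention each brand.
# `tokens` needs an `id` column (one per comment) and a lowercase `word` column,
# as produced by unnest_tokens(); `brands` is a character vector of brand names.
comments_mentioning <- function(tokens, brands) {
  tokens %>%
    filter(word %in% tolower(brands)) %>%
    # keep each brand at most once per comment
    distinct(id, word) %>%
    count(word, name = "comments", sort = TRUE)
}

# Usage with the real data (brand list taken from the interpretation above):
# comments_mentioning(tokens, c("Certina", "Bulova", "Rolex", "Seiko", "Tudor"))
```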