Lab4_CoreWeave

Introduction

For this assignment, I collected and analyzed YouTube comments related to CoreWeave, the AI cloud infrastructure company. This continues a thread from my earlier labs (the NewsAPI sentiment analysis and the Bluesky exercise), which tracked public discussion of CoreWeave’s role in the AI infrastructure buildout.

My Data Source

My data source is the comment section of a Big Technology Podcast episode on YouTube featuring a debate about CoreWeave (video ID: m1uh7Ka6868). I chose this video because it features substantive discussion of CoreWeave’s business model and of the broader debate over how AI infrastructure is financed, which makes its comment section a good source of public opinion on the topic.

My Data Collection Process

I collected the data through the YouTube Data API v3, using the tuber package in R with OAuth authentication. The steps were:

1. Authenticate with the YouTube Data API
2. Collect the comments using ‘get_all_comments()’
3. Save the raw data to CSV
4. Clean the text
5. Tokenize
6. Remove stopwords
7. Measure word frequency
8. Visualize via a bar chart and a word cloud
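
The first three steps can be sketched roughly as follows. This is a minimal sketch, not the exact script used for this lab: the helper `collect_comments()` and the environment variable names `YT_CLIENT_ID` and `YT_CLIENT_SECRET` are hypothetical, and the `fetch` argument exists only so the helper can be tried without API credentials.

```r
# Hypothetical helper: fetch every comment for one video and save the raw result.
# `fetch` defaults to tuber::get_all_comments(); a stand-in function can be
# passed instead when no credentials are available.
collect_comments <- function(video_id, out_path,
                             fetch = function(id) tuber::get_all_comments(video_id = id)) {
  comments <- fetch(video_id)
  write.csv(comments, out_path, row.names = FALSE)
  invisible(comments)
}

# Interactive use (needs a Google OAuth client; the env var names are assumptions):
# tuber::yt_oauth(app_id = Sys.getenv("YT_CLIENT_ID"),
#                 app_secret = Sys.getenv("YT_CLIENT_SECRET"))
# comments <- collect_comments("m1uh7Ka6868", "coreweave_comments_raw.csv")
```

Saving the raw pull before any cleaning means the later steps can be rerun from the CSV without calling the API again.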

library(tuber)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(stringr)
library(tidytext)
library(ggplot2)
library(wordcloud2)

comments <- read.csv("coreweave_comments_raw.csv", stringsAsFactors = FALSE)
nrow(comments)

## [1] 119

Summary of the collected data

Collecting the comments from the video produced 119 comments from 76 unique commenters, posted between January and June 2026, with an average of about 2.19 likes per comment. After removing stopwords, the data contained 761 unique tokens. The top three were “interview,” “coreweave,” and “providers,” followed by financial and infrastructure terms such as “company,” “money,” “cloud,” “data,” “gpus,” “chips,” and “bubble.” I think this means that viewers are engaged less with the podcast format itself and more with the underlying financial debate about CoreWeave’s business model.
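
The posting window is not computed in the code shown in this report. Here is a minimal sketch of one way to get it, assuming the raw CSV keeps a `publishedAt` column in ISO 8601 form. The timestamps below are made-up values for illustration, not rows from my data.

```r
# Made-up timestamps standing in for comments$publishedAt (assumed column name).
published_at <- c("2026-03-11T09:30:00Z",
                  "2026-01-03T14:05:00Z",
                  "2026-06-20T22:41:10Z")

# The first 10 characters of an ISO 8601 timestamp are the calendar date.
posted_on <- as.Date(substr(published_at, 1, 10))

# Earliest and latest posting dates.
posting_window <- range(posted_on)
print(posting_window)
```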

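# Keep only the raw comment text
texts <- comments %>% select(textOriginal)

# Lowercase, replace everything except letters and whitespace with a space,
# then collapse repeated spaces. Apostrophes are replaced too, so "don't"
# becomes "don t".
clean_text <- texts %>%
  mutate(text = str_to_lower(textOriginal)) %>%
  mutate(text = str_replace_all(text, "[^a-z\\s]", " ")) %>%
  mutate(text = str_squish(text))

# Stopword lexicon shipped with tidytext
data("stop_words")

# One row per word
tokens <- clean_text %>%
  select(text) %>%
  unnest_tokens(word, text)

# Drop stopwords and words of one or two characters
tokens_clean <- tokens %>%
  anti_join(stop_words, by = "word") %>%
  filter(str_length(word) > 2)

# Count each remaining word, most frequent first
word_counts <- tokens_clean %>%
  count(word, sort = TRUE)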

head(word_counts, 20)

##         word  n
## 1  interview 32
## 2  coreweave 20
## 3  providers 14
## 4    company 13
## 5      money 11
## 6       alex 10
## 7       guys 10
## 8      cloud  9
## 9       data  9
## 10       don  9
## 11 questions  9
## 12      gpus  8
## 13      tier  8
## 14    bubble  7
## 15     build  7
## 16     chips  7
## 17       gpu  7
## 18   michael  7
## 19   amazing  6
## 20  building  6
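
The token “don” in the table is a by-product of the cleaning step: the pattern `[^a-z\\s]` turns the apostrophe in “don’t” into a space, leaving “don” and “t”, and “don” is not in the stopword list. One possible fix, shown only as a sketch, is to keep apostrophes so the contraction stays whole and can be matched against `stop_words`. The helper name `clean_keep_apostrophes()` is hypothetical, and the counts above were produced without it.

```r
library(stringr)

# Hypothetical helper: same cleaning as in the report, except apostrophes survive.
clean_keep_apostrophes <- function(x) {
  x <- str_to_lower(x)
  x <- str_replace_all(x, "\u2019", "'")       # curly apostrophe -> straight
  x <- str_replace_all(x, "[^a-z'\\s]", " ")   # keep letters, apostrophes, whitespace
  str_squish(x)
}

cleaned <- clean_keep_apostrophes("I don\u2019t think GPUs are a bubble!")
print(cleaned)
```

With the contraction intact, the existing `anti_join(stop_words, by = "word")` step should be able to drop it, since the tidytext stopword list includes “don't”.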

word_counts %>%
  slice_max(n, n = 20, with_ties = FALSE) %>%  # exactly 20 rows even when counts tie
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = n, y = word, fill = n)) +
  geom_col(show.legend = FALSE) +
  scale_fill_gradient(low = "#a1d99b", high = "#006d2c") +
  labs(
    title = "Top 20 Most Frequent Words in CoreWeave YouTube Comments",
    subtitle = "Source: Big Technology Podcast — CoreWeave Debate",
    x = "Frequency",
    y = NULL
  ) +
  theme_minimal(base_size = 13)

Text Analysis Visualization: Word Cloud

wc_data <- word_counts %>%
  filter(n >= 2) %>%
  slice_max(n, n = 100)

wordcloud2(
  data = wc_data,
  size = 0.6,
  color = "random-dark",
  backgroundColor = "white"
)

Dataset Summary

cat("=== Dataset Summary ===\n")

## === Dataset Summary ===

cat("Total comments collected:    ", nrow(comments), "\n")

## Total comments collected:     119

cat("Unique commenters:           ", n_distinct(comments$authorDisplayName), "\n")

## Unique commenters:            76

cat("Avg. likes per comment:      ", round(mean(comments$likeCount, na.rm = TRUE), 2), "\n")

## Avg. likes per comment:       2.19

cat("Total unique tokens:         ", nrow(word_counts), "\n")

## Total unique tokens:          761

cat("Top 3 words:                 ", paste(head(word_counts$word, 3), collapse = ", "), "\n")

## Top 3 words:                  interview, coreweave, providers

Interpretation

The word frequency results for comments on the CoreWeave debate video suggest that audience engagement centered heavily on financial mechanics rather than on the podcast content itself. Beyond expected terms like “coreweave,” “interview,” and “providers,” high-frequency words such as “money,” “gpus,” “chips,” “cloud,” “bubble,” and “crypto” point to a viewer preoccupation with how CoreWeave is financed rather than what it builds. This mirrors an ongoing debate in the financial media: CoreWeave has used Nvidia GPUs as collateral for debt, a practice that Quartz (2026) describes as central to assessing whether AI infrastructure spending reflects durable demand or speculative excess. The presence of “bubble” and “crypto” among the frequent terms is notable, since CoreWeave originated as a cryptocurrency mining operation before pivoting to AI compute, and commenters appear to be drawing a connection between that history and skepticism about the durability of the current AI infrastructure boom. Overall, after looking at the comment section, I came to the conclusion that it serves less as a discussion of the podcast’s content and more as a place to revisit whether GPU debt financing represents a true infrastructure investment or a repeat of past financial bubbles.
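
To check how the terms named above actually rank, one option is a quick lookup against the frequency table. The sketch below re-enters the top-20 counts printed earlier in this report and uses base R only. The `terms` vector is my own selection for illustration. Any term that comes back in `not_in_top20` would need to be looked up in the full `word_counts` table instead.

```r
# Top-20 counts copied from the head(word_counts, 20) output in this report.
top20 <- data.frame(
  word = c("interview", "coreweave", "providers", "company", "money",
           "alex", "guys", "cloud", "data", "don",
           "questions", "gpus", "tier", "bubble", "build",
           "chips", "gpu", "michael", "amazing", "building"),
  n = c(32, 20, 14, 13, 11, 10, 10, 9, 9, 9, 9, 8, 8, 7, 7, 7, 7, 7, 6, 6),
  stringsAsFactors = FALSE
)

# Rank by frequency; tied words share the best (lowest) rank.
top20$rank <- rank(-top20$n, ties.method = "min")

# Terms discussed in the interpretation (illustrative selection).
terms <- c("money", "gpus", "chips", "cloud", "bubble", "crypto")

found <- top20[top20$word %in% terms, ]
not_in_top20 <- setdiff(terms, top20$word)

print(found)
print(not_in_top20)
```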

References

Quartz. (2026, May 8). GPU-collateralized debt explained: AI financing risks. https://qz.com/gpu-collateralized-debt-ai-neocloud-coreweave-financing-risks-050526

Lab4_CoreWeave_YouTube

Keith Concepcion

2026-06-26
