---
title: "CTH YouTube Discourse Analysis"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    theme: flatly
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(plotly)
library(dplyr)
engagement <- read.csv("engagement.csv", stringsAsFactors = FALSE)
format_stats <- read.csv("format_stats.csv", stringsAsFactors = FALSE)
# One fixed color per theme, reused by every chart
theme_colors <- c(
  "State Violence & Repression" = "#e63946",
  "Elite Accountability" = "#8338ec",
  "Right-wing Culture Critique" = "#2a9d8f",
  "Political Figure Ridicule" = "#f4a261",
  "Anti-imperialism & Foreign Policy" = "#e76f51",
  "Liberal Establishment Critique" = "#457b9d",
  "Liberal Media Critique" = "#264653"
)
# Per-theme video counts and view/like summaries for the Overview and Thematic pages
theme_summary <- engagement %>%
  group_by(theme) %>%
  summarise(
    n_videos = n(),
    total_views = sum(views),
    avg_views = round(mean(views)),
    avg_likes = round(mean(likes)),
    .groups = "drop"
  ) %>%
  arrange(desc(total_views))
```
Overview
=====================================
Row {data-height=120}
-------------------------------------
### Videos Analyzed
```{r}
valueBox(nrow(engagement), icon = "fa-video", color = "#2a9d8f")
```
### Total Views
```{r}
valueBox(format(sum(engagement$views), big.mark = ","), icon = "fa-eye", color = "#e76f51")
```
### Total Likes
```{r}
valueBox(format(sum(engagement$likes), big.mark = ","), icon = "fa-thumbs-up", color = "#f4a261")
```
### Total Comments
```{r}
valueBox(format(sum(engagement$comments), big.mark = ","), icon = "fa-comment", color = "#457b9d")
```
Row {data-height=500}
-------------------------------------
### About This Dashboard
A digital methods analysis of **26 Chapo Trap House YouTube videos** published between January 3 and February 10, 2026.
**Data Collection**: YouTube Data Tools (Rieder, 2015)
**Thematic Coding**: videos coded into 7 categories, with TF-IDF keywords extracted from each category's transcripts. Due to time constraints, the categories are based on a quick skim of the videos and on insights from previous scholarship:
- Higdon, N. & Lyons, J. (2022). "The Other Populist Media: The Rise of the Prog-Left and the Decline of Legacy Media?" Democratic Communiqué, 31(1). https://doi.org/10.7275/6x96-6y12
- Semley, J. (2018). "The Dirtbag Manifesto." Dissent Magazine. https://dissentmagazine.org/article/chapo-trap-house-book-dirtbag-manifesto-satire-liberalism-socialism/
- Frost, A. (2016). "The Necessity of Political Vulgarity." Current Affairs. On the "dirtbag left" discursive mode.
**Tools**: Python (pandas, scikit-learn) · Gephi · R (plotly, flexdashboard) · YouTube Data Tools
### Theme Distribution
```{r}
plot_ly(
  theme_summary,
  y = ~reorder(theme, total_views),
  x = ~total_views,
  type = "bar",
  orientation = "h",
  marker = list(color = theme_colors[theme_summary$theme]),
  text = ~paste0(format(total_views, big.mark = ","), " views (", n_videos, " videos)"),
  textposition = "outside",
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "Total Views"),
    yaxis = list(title = ""),
    margin = list(l = 220)
  )
```
Thematic Analysis
=====================================
Row {data-height=500}
-------------------------------------
### Videos per Theme
```{r}
plot_ly(
  theme_summary,
  y = ~reorder(theme, n_videos),
  x = ~n_videos,
  type = "bar",
  orientation = "h",
  marker = list(color = theme_colors[theme_summary$theme]),
  text = ~paste0(n_videos, " videos"),
  textposition = "outside",
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "Number of Videos"),
    yaxis = list(title = ""),
    margin = list(l = 220)
  )
```
### Average Views per Theme
```{r}
plot_ly(
  theme_summary,
  y = ~reorder(theme, avg_views),
  x = ~avg_views,
  type = "bar",
  orientation = "h",
  marker = list(color = theme_colors[theme_summary$theme]),
  text = ~paste0(format(avg_views, big.mark = ","), " avg views"),
  textposition = "outside",
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "Average Views per Video"),
    yaxis = list(title = ""),
    margin = list(l = 220)
  )
```
Row {data-height=100}
-------------------------------------
### Thematic Findings
**State Violence & Repression** and **Right-wing Culture Critique** each account for 6 videos, making them the dominant modes of CTH's discourse in this short period. **Anti-imperialism & Foreign Policy** has only 3 videos but generates 141K views (the second-highest total, roughly 47K per video), mostly covering Venezuela and Greenland. **Elite Accountability** (Epstein files, Elon Musk) averages ~45K views per video, on par with Anti-imperialism.
What I find interesting is the gap between volume and attention. In this short period, CTH talks about state violence the most, but their audience gravitates toward elite scandal and foreign policy: the Epstein and Greenland videos seem to punch well above their weight. I have seen something similar in previous work analyzing far-right discourse on social media, but that audience engaged mostly with short videos, whereas here Full Episodes draw the most views on average. I am curious whether more complex mechanisms are at play.
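One way to put a number on this volume-attention gap is to divide a theme's share of total views by its share of videos, so that values above 1 mean a theme draws more attention than its output alone would predict. This ratio is a hypothetical metric, not part of the analysis above. The minimal Python sketch below applies it to Anti-imperialism & Foreign Policy using only figures reported on this dashboard: 3 videos and ~141K views, against a channel total of 781,568 views reconstructed from the format table (7 × 48,893 + 10 × 28,233 + 9 × 17,443). Because those inputs are rounded, the result is approximate.

```python
# Hypothetical "attention index": a theme's share of views divided by its
# share of videos. Values above 1 mean more views than output alone predicts.

# (videos, average views) per format, as reported in the format table
FORMAT_TABLE = {
    "Full Episode": (7, 48_893),
    "Segment": (10, 28_233),
    "Short Clip": (9, 17_443),
}


def attention_index(theme_videos, theme_views, total_videos, total_views):
    """Return (theme share of views) / (theme share of videos)."""
    return (theme_views / total_views) / (theme_videos / total_videos)


# Channel totals reconstructed from the table; the averages are rounded
total_videos = sum(n for n, _ in FORMAT_TABLE.values())
total_views = sum(n * avg for n, avg in FORMAT_TABLE.values())

# Anti-imperialism & Foreign Policy: 3 videos, ~141K views (rounded figure)
anti_imperialism = attention_index(3, 141_000, total_videos, total_views)

if __name__ == "__main__":
    print(f"{total_videos} videos, {total_views:,} views")
    print(f"Anti-imperialism attention index: {anti_imperialism:.2f}")
```

This gives about 1.56 for Anti-imperialism, i.e. roughly one and a half times the attention its video count alone would suggest. The same ratio could be computed for every theme from the `n_videos` and `total_views` columns of `theme_summary`.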
Engagement
=====================================
Row {data-height=550}
-------------------------------------
### Views × Likes (size = duration, color = theme)
```{r}
plot_ly(
  engagement,
  x = ~views,
  y = ~likes,
  size = ~duration,
  color = ~theme,
  colors = theme_colors,
  type = "scatter",
  mode = "markers",
  marker = list(
    opacity = 0.8,
    sizemode = "diameter",
    sizeref = 2,
    line = list(width = 1, color = "#fff")
  ),
  text = ~paste0(
    "<b>", title, "</b><br>",
    "Views: ", format(views, big.mark = ","), "<br>",
    "Likes: ", format(likes, big.mark = ","), "<br>",
    "Duration: ", round(duration, 0), " min<br>",
    "Theme: ", theme
  ),
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "Views"),
    yaxis = list(title = "Likes"),
    legend = list(orientation = "h", y = -0.15)
  )
```
### Like Ratio by Theme
```{r}
plot_ly(
  engagement,
  x = ~theme,
  y = ~like_ratio,
  color = ~theme,
  colors = theme_colors,
  type = "scatter",
  mode = "markers",
  marker = list(size = 14, opacity = 0.7),
  text = ~paste0(title, "<br>", like_ratio, "%"),
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "", tickangle = -30),
    yaxis = list(title = "Like Ratio (%)"),
    showlegend = FALSE,
    margin = list(b = 120)
  )
```
Row {data-height=350}
-------------------------------------
### Engagement by Content Format
```{r}
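format_stats$format <- factor(
  format_stats$format,
  levels = c("Short Clip", "Segment", "Full Episode")
)
# Avg views are roughly 40 to 50 times avg likes, so likes get a right-hand axis;
# offsetgroup keeps the two bar series side by side instead of overlapping
plot_ly(format_stats, x = ~format) %>%
  add_bars(y = ~avg_views, name = "Avg Views", offsetgroup = "views",
           marker = list(color = "#2a9d8f")) %>%
  add_bars(y = ~avg_likes, name = "Avg Likes", offsetgroup = "likes", yaxis = "y2",
           marker = list(color = "#e76f51")) %>%
  layout(
    yaxis = list(title = "Avg Views"),
    yaxis2 = list(title = "Avg Likes", overlaying = "y", side = "right"),
    barmode = "group",
    margin = list(r = 60),
    legend = list(orientation = "h", y = -0.15)
  )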
```
### Format Insights
| Format | Videos | Avg Views | Avg Likes | Avg Like Ratio |
|--------|--------|-----------|-----------|----------------|
| Full Episode (>40 min) | 7 | 48,893 | 1,130 | 2.31% |
| Segment (10–40 min) | 10 | 28,233 | 565 | 2.00% |
| Short Clip (<10 min) | 9 | 17,443 | 442 | 2.53% |
**Full Episodes** account for 27% of videos (7 of 26) but roughly 44% of total views, the largest share of any format. **Short Clips** have the highest like ratio (2.53%), suggesting somewhat stronger per-view engagement for bite-sized content.
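The view shares above can be checked against the table itself: multiplying each format's video count by its average views approximates that format's total. The Python sketch below does this using only the table's rounded figures, and also recomputes each like ratio as average likes divided by average views.

```python
# Reported per-format figures: (videos, avg views, avg likes)
FORMATS = {
    "Full Episode": (7, 48_893, 1_130),
    "Segment": (10, 28_233, 565),
    "Short Clip": (9, 17_443, 442),
}


def format_shares(formats: dict) -> dict:
    """Return each format's share of videos, share of views, and like ratio."""
    total_videos = sum(n for n, _, _ in formats.values())
    # n * avg views approximates each format's total (averages are rounded)
    total_views = sum(n * views for n, views, _ in formats.values())
    return {
        name: {
            "video_share": n / total_videos,
            "view_share": n * views / total_views,
            # Ratio of averages (avg likes / avg views), in percent
            "like_ratio_pct": 100 * likes / views,
        }
        for name, (n, views, likes) in formats.items()
    }


if __name__ == "__main__":
    for name, s in format_shares(FORMATS).items():
        print(f"{name:13s} videos {s['video_share']:.0%}  "
              f"views {s['view_share']:.0%}  likes/views {s['like_ratio_pct']:.2f}%")
```

With these figures, Full Episodes come out at roughly 44% of views, Segments at 36% and Short Clips at 20%. The recomputed like ratios (2.31%, 2.00%, 2.53%) match the table to two decimals, which suggests the "Avg Like Ratio" column may be a ratio of averages rather than a mean of per-video ratios; the two generally differ.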
Network
=====================================
Row {data-height=400}
-------------------------------------
### How to Read This Network
- **Nodes** = TF-IDF keywords extracted from video transcripts
- **Edges** = keywords belonging to the same thematic cluster
- **Node size** = TF-IDF weight (keyword importance)
- **Node color** = thematic category
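For reference, if `TfidfVectorizer` was run with scikit-learn's defaults (`smooth_idf=True`, `norm="l2"`, raw term counts; an assumption, since the settings are not listed), the weight behind each node's size is:

$$
\operatorname{idf}(t) = \ln\frac{1 + n}{1 + \operatorname{df}(t)} + 1,
\qquad
w_{t,d} = \frac{\operatorname{tf}(t,d)\,\operatorname{idf}(t)}{\lVert \mathbf{v}_d \rVert_2}
$$

where $n$ is the number of documents, $\operatorname{df}(t)$ the number of documents containing term $t$, $\operatorname{tf}(t,d)$ the raw count of $t$ in document $d$, and $\mathbf{v}_d$ the vector of unnormalized tf·idf values for $d$. Assuming one document per theme (transcripts were grouped by theme), $n = 7$, so idf can only range from $\ln(8/8) + 1 = 1$ (a term found in every theme) to $\ln(8/2) + 1 \approx 2.39$ (a term unique to one theme), and raw counts within a theme can do much of the work in ranking keywords.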
**Key observations:** **ICE** and **Trump** are the largest nodes (highest TF-IDF weight). The **Right-wing Culture Critique** cluster (Scott Adams, Charlie Kirk, Kid Rock) forms a distinct community. **Epstein** connects to multiple elite figures. The **Anti-imperialism** cluster (Greenland, Denmark, NATO, Venezuela) is tightly cohesive. Network generated in **Gephi** with the Fruchterman-Reingold layout (Fruchterman & Reingold, 1991).

Honestly, I find this visualization more illustrative than analytically revealing. The clusters confirm the coding scheme rather than surfacing anything unexpected, which is partly by construction: edges only link keywords within the same theme, so the layout can only recover the groupings it was given. My background is more in BERTopic and LLM-based topic modeling on larger (formal) datasets, where the connections the models predict can carry meaningful value. But with a podcast like CTH, topics constantly bleed into each other through irony and digression. I got more out of watching the episodes and taking notes; the network came about somewhat unexpectedly. My themes here come from roughly skimming the videos and from previous research, and I wonder what a better approach would be.
Row {data-height=600}
-------------------------------------
### Network Visualization
```{r, out.width="100%", fig.align="center"}
knitr::include_graphics("Topic.png")
```
Methodology
=====================================
Row
-------------------------------------
### Data Pipeline
**1. Data Collection**
- YouTube Data Tools (Rieder, 2015)
- Channel: Chapo Trap House
- Period: January 3 – February 10, 2026
- N = 26 videos with full metadata and auto-generated transcripts
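The dashboard reads two files, `engagement.csv` and `format_stats.csv`, and its R code implies their columns (`title`, `views`, `likes`, `comments`, `duration` in minutes, `theme`, `like_ratio` in percent; and `format`, `avg_views`, `avg_likes`). The pandas sketch below shows one way to produce them. It assumes a video table with a manually coded `theme` column and durations in the time-only ISO 8601 form used by the YouTube Data API (e.g. `PT1H2M3S`); the input column names, including `duration_iso`, are hypothetical and not YouTube Data Tools' actual export schema. Format bins follow the format table's definitions (>40, 10–40 and <10 minutes).

```python
import re

import pandas as pd

# Time-only ISO 8601 durations, e.g. "PT1H2M3S" (day components not handled)
_ISO_DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")


def iso_to_minutes(iso: str) -> float:
    """Convert a time-only ISO 8601 duration to minutes."""
    match = _ISO_DURATION.fullmatch(iso)
    if match is None:
        raise ValueError(f"unrecognized duration: {iso!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes + seconds / 60


def format_label(minutes: float) -> str:
    """Bin a video by length: over 40 min, 10 to 40 min, under 10 min."""
    if minutes > 40:
        return "Full Episode"
    if minutes >= 10:
        return "Segment"
    return "Short Clip"


def build_tables(videos: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (engagement, format_stats) with the columns the R dashboard reads.

    Hypothetical input columns: title, duration_iso, views, likes, comments, theme.
    """
    eng = videos.copy()
    eng["duration"] = eng["duration_iso"].map(iso_to_minutes)
    eng["like_ratio"] = (100 * eng["likes"] / eng["views"]).round(2)  # percent
    eng["format"] = eng["duration"].map(format_label)
    format_stats = (
        eng.groupby("format", as_index=False)
        .agg(videos=("title", "count"),
             avg_views=("views", "mean"),
             avg_likes=("likes", "mean"))
        .round({"avg_views": 0, "avg_likes": 0})
    )
    cols = ["title", "views", "likes", "comments", "duration", "theme", "like_ratio"]
    return eng[cols], format_stats


if __name__ == "__main__":
    demo = pd.DataFrame({
        "title": ["ep", "segment", "clip"],
        "duration_iso": ["PT1H5M", "PT22M30S", "PT4M"],
        "views": [50_000, 30_000, 20_000],
        "likes": [1_100, 600, 500],
        "comments": [300, 150, 90],
        "theme": ["Elite Accountability"] * 3,
    })
    engagement, format_stats = build_tables(demo)
    engagement.to_csv("engagement.csv", index=False)
    format_stats.to_csv("format_stats.csv", index=False)
```

Run as a script, it writes both CSVs from a three-row demo table; in practice `build_tables` would be called on the full coded video list.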
**2. TF-IDF Keyword Extraction**
- Transcripts cleaned (filler words, contraction fragments removed)
- scikit-learn TfidfVectorizer with custom podcast stopword list
- Bigram-priority extraction to preserve named entities (e.g., "Charlie Kirk" not split into "Charlie" + "Kirk")
- Transcripts grouped by theme before TF-IDF computation
**3. Thematic Coding**
Seven categories coded:
| Theme | Description |
|-------|-------------|
| State Violence & Repression | ICE raids, police killings, militarized enforcement |
| Elite Accountability | Epstein files, corporate bribery, oligarch exposure |
| Right-wing Culture Critique | Satirical deconstruction of conservative cultural production |
| Political Figure Ridicule | Ad hominem satirical commentary on individual politicians |
| Anti-imperialism & Foreign Policy | Critique of US imperial power and interventionism |
| Liberal Establishment Critique | Critique of centrist Democrats |
| Liberal Media Critique | Critique of mainstream media complicity |
**4. Network Analysis**
- Keyword co-occurrence within themes (intra-theme edges)
- Visualized in Gephi (ForceAtlas2 layout)
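The network step could be reproduced along these lines: count how many videos mention each pair of keywords, keep only pairs from the same theme, and write node and edge tables using the column names Gephi's spreadsheet import recognizes (`Id`, `Label` for nodes; `Source`, `Target`, `Type`, `Weight` for edges). Using per-video co-occurrence as the edge weight, and the input structures (`videos` as one keyword collection per video, `keyword_theme`, `weights`), are assumptions for illustration; the steps above do not specify how edges were weighted.

```python
import csv
from collections import Counter
from itertools import combinations


def cooccurrence_edges(videos, keyword_theme, intra_theme_only=True):
    """Count, for each keyword pair, how many videos mention both.

    videos: one collection of matched keywords per video (hypothetical input).
    keyword_theme: maps each keyword to the theme it was extracted for.
    """
    counts = Counter()
    for keywords in videos:
        present = sorted({k for k in keywords if k in keyword_theme})
        for a, b in combinations(present, 2):
            # Keep intra-theme edges only, matching the network on this dashboard
            if intra_theme_only and keyword_theme[a] != keyword_theme[b]:
                continue
            counts[(a, b)] += 1
    return counts


def write_gephi_csv(counts, keyword_theme, weights, nodes_path, edges_path):
    """Write node and edge tables with Gephi's spreadsheet-import column names."""
    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label", "Theme", "Weight"])
        for kw, theme in sorted(keyword_theme.items()):
            # Weight: e.g. the keyword's TF-IDF score, usable for node size
            writer.writerow([kw, kw, theme, weights.get(kw, 0.0)])
    with open(edges_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Type", "Weight"])
        for (a, b), n in sorted(counts.items()):
            writer.writerow([a, b, "Undirected", n])
```

Because every edge here is intra-theme, the resulting clusters largely mirror the coding scheme. Calling `cooccurrence_edges(..., intra_theme_only=False)` keeps cross-theme pairs instead, which would let keywords that bridge themes within the same video show up in the layout.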
**5. Tools**
Python (pandas, scikit-learn) · Gephi · R (plotly, flexdashboard) · YouTube Data Tools