Overview

Row

Videos Analyzed

Total Views

781,567

Total Likes

17,539

Total Comments

6,390

Row

About This Dashboard

A digital methods analysis of 26 Chapo Trap House YouTube videos published between January 3 and February 10, 2026.

Data Collection: YouTube Data Tools (Rieder, 2015)

Thematic Coding: TF-IDF keyword extraction from video transcripts, classified into 7 categories (based on my own skim of the videos, given time constraints, and on insights from previous scholarship):

Higdon, N. & Lyons, J. (2022). “The Other Populist Media: The Rise of the Prog-Left and the Decline of Legacy Media?” Democratic Communiqué, 31(1). https://doi.org/10.7275/6x96-6y12
Semley, J. (2018). “The Dirtbag Manifesto.” Dissent Magazine. https://dissentmagazine.org/article/chapo-trap-house-book-dirtbag-manifesto-satire-liberalism-socialism/
Frost, A. (2016). “The Necessity of Political Vulgarity” — the “dirtbag left” discursive mode.

Tools: Python (pandas, scikit-learn) · Gephi · R (plotly, flexdashboard) · YouTube Data Tools

Theme Distribution

Thematic Analysis

Row

Videos per Theme

Average Views per Theme

Row

Thematic Findings

State Violence & Repression and Right-wing Culture Critique each account for 6 videos, making them the dominant modes of CTH’s discourse by volume in this short period. Anti-imperialism & Foreign Policy has only 3 videos but generates 141K views (the second-highest total), mostly covering Venezuela and Greenland. Elite Accountability (Epstein files, Elon Musk) generates the highest average views per video at ~45K.

What I find interesting is the gap between volume and attention. In this short period, CTH devotes the most videos to state violence and right-wing culture critique, but their audience gravitates toward elite scandal and foreign policy. The Epstein and Greenland content seems to punch well above its weight. I have seen something similar in earlier work analyzing far-right discourse on social media, but the audience in that case engaged mostly with short videos. I am curious whether more complex mechanisms are at play here.

Engagement

Row

Views × Likes (size = duration, color = theme)

Like Ratio by Theme

Row

Engagement by Content Format

Format Insights

Format	Videos	Avg Views	Avg Likes	Avg Like Ratio
Full Episode (>40 min)	7	48,893	1,130	2.31%
Segment (10–40 min)	10	28,233	565	2.00%
Short Clip (<10 min)	9	17,443	442	2.53%

Full Episodes account for 27% of videos but roughly 44% of total views (7 × 48,893 ≈ 342K of about 782K), the largest share of any format. Short Clips have the highest like ratio (2.53%), suggesting somewhat stronger per-view engagement for bite-sized content.

Network

Row

How to Read This Network

Nodes = TF-IDF keywords extracted from video transcripts
Edges = keywords belonging to the same thematic cluster
Node size = TF-IDF weight (keyword importance)
Node color = thematic category

Key observations: ICE and Trump are the highest-frequency nodes. The Right-wing Culture Critique cluster (Scott Adams, Charlie Kirk, Kid Rock) forms a distinct community. Epstein connects to multiple elite figures. The Anti-imperialism cluster (Greenland, Denmark, NATO, Venezuela) is tightly cohesive. Network generated in Gephi with the Fruchterman-Reingold layout (Fruchterman & Reingold, 1991).

Honestly, I find this visualization more illustrative than analytically revealing. The clusters confirm the coding scheme rather than surfacing anything unexpected. My background is more in BERTopic and LLM-based topic modeling on larger (formal) datasets, where the connections a model predicts can carry meaningful signal. But with a podcast like CTH, topics constantly bleed into each other through irony and digression. I enjoyed watching the videos and taking notes more, and the network came about somewhat unexpectedly. My themes here come from roughly skimming the videos and from previous research. I wonder what a better approach would be.

Row

Network Visualization

Methodology

Row

Data Pipeline

1. Data Collection

YouTube Data Tools (Rieder, 2015)
Channel: Chapo Trap House
Period: January 3 – February 10, 2026
N = 26 videos with full metadata and auto-generated transcripts

2. TF-IDF Keyword Extraction

Transcripts cleaned (filler words, contraction fragments removed)
scikit-learn TfidfVectorizer with custom podcast stopword list
Bigram-priority extraction to preserve named entities (e.g., “Charlie Kirk” not split into “Charlie” + “Kirk”)
Transcripts grouped by theme before TF-IDF computation

3. Thematic Coding

Seven categories coded:

Theme	Description
State Violence & Repression	ICE raids, police killings, militarized enforcement
Elite Accountability	Epstein files, corporate bribery, oligarch exposure
Right-wing Culture Critique	Satirical deconstruction of conservative cultural production
Political Figure Ridicule	Ad hominem satirical commentary on individual politicians
Anti-imperialism & Foreign Policy	Critique of US imperial power and interventionism
Liberal Establishment Critique	Critique of centrist Democrats
Liberal Media Critique	Critique of mainstream media complicity

4. Network Analysis

Keyword co-occurrence within themes (intra-theme edges)
Visualized in Gephi (ForceAtlas2 layout)

5. Tools

Python (pandas, scikit-learn) · Gephi · R (plotly, flexdashboard) · YouTube Data Tools

---
title: "CTH YouTube Discourse Analysis"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    theme: flatly
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(plotly)
library(dplyr)

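# Per-video metrics; columns used below: title, theme, views, likes, comments, duration, like_ratio
engagement <- read.csv("engagement.csv", stringsAsFactors = FALSE)
# Per-format aggregates; columns used below: format, avg_views, avg_likes
format_stats <- read.csv("format_stats.csv", stringsAsFactors = FALSE)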

theme_colors <- c(
  "State Violence & Repression"       = "#e63946",
  "Elite Accountability"              = "#8338ec",
  "Right-wing Culture Critique"       = "#2a9d8f",
  "Political Figure Ridicule"         = "#f4a261",
  "Anti-imperialism & Foreign Policy" = "#e76f51",
  "Liberal Establishment Critique"    = "#457b9d",
  "Liberal Media Critique"            = "#264653"
)

theme_summary <- engagement %>%
  group_by(theme) %>%
  summarise(
    n_videos    = n(),
    total_views = sum(views),
    avg_views   = round(mean(views)),
    avg_likes   = round(mean(likes)),
    .groups = "drop"
  ) %>%
  arrange(desc(total_views))
```

Overview
=====================================

Row {data-height=120}
-------------------------------------

### Videos Analyzed
```{r}
valueBox(26, icon = "fa-video", color = "#2a9d8f")
```

### Total Views
```{r}
valueBox(format(sum(engagement$views), big.mark = ","), icon = "fa-eye", color = "#e76f51")
```

### Total Likes
```{r}
valueBox(format(sum(engagement$likes), big.mark = ","), icon = "fa-thumbs-up", color = "#f4a261")
```

### Total Comments
```{r}
valueBox(format(sum(engagement$comments), big.mark = ","), icon = "fa-comment", color = "#457b9d")
```

Row {data-height=500}
-------------------------------------

### About This Dashboard

A digital methods analysis of **26 Chapo Trap House YouTube videos** published between January 3 and February 10, 2026.

**Data Collection**: YouTube Data Tools (Rieder, 2015)

**Thematic Coding**: TF-IDF keyword extraction from video transcripts, classified into 7 categories (based on my own skim of the videos, given time constraints, and on insights from previous scholarship):

- Higdon, N. & Lyons, J. (2022). "The Other Populist Media: The Rise of the Prog-Left and the Decline of Legacy Media?" Democratic Communiqué, 31(1). https://doi.org/10.7275/6x96-6y12
- Semley, J. (2018). "The Dirtbag Manifesto." Dissent Magazine. https://dissentmagazine.org/article/chapo-trap-house-book-dirtbag-manifesto-satire-liberalism-socialism/
- Frost, A. (2016). "The Necessity of Political Vulgarity" — the "dirtbag left" discursive mode.

**Tools**: Python (pandas, scikit-learn) · Gephi · R (plotly, flexdashboard) · YouTube Data Tools

### Theme Distribution
```{r}
plot_ly(
  theme_summary,
  y = ~reorder(theme, total_views),
  x = ~total_views,
  type = "bar",
  orientation = "h",
  marker = list(color = theme_colors[theme_summary$theme]),
  text = ~paste0(format(total_views, big.mark = ","), " views (", n_videos, " videos)"),
  textposition = "outside",
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "Total Views"),
    yaxis = list(title = ""),
    margin = list(l = 220)
  )
```

Thematic Analysis
=====================================

Row {data-height=500}
-------------------------------------

### Videos per Theme
```{r}
plot_ly(
  theme_summary,
  y = ~reorder(theme, n_videos),
  x = ~n_videos,
  type = "bar",
  orientation = "h",
  marker = list(color = theme_colors[theme_summary$theme]),
  text = ~paste0(n_videos, " videos"),
  textposition = "outside",
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "Number of Videos"),
    yaxis = list(title = ""),
    margin = list(l = 220)
  )
```

### Average Views per Theme
```{r}
plot_ly(
  theme_summary,
  y = ~reorder(theme, avg_views),
  x = ~avg_views,
  type = "bar",
  orientation = "h",
  marker = list(color = theme_colors[theme_summary$theme]),
  text = ~paste0(format(avg_views, big.mark = ","), " avg views"),
  textposition = "outside",
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "Average Views per Video"),
    yaxis = list(title = ""),
    margin = list(l = 220)
  )
```

Row {data-height=100}
-------------------------------------

### Thematic Findings

**State Violence & Repression** and **Right-wing Culture Critique** each account for 6 videos, making them the dominant modes of CTH's discourse by volume in this short period. **Anti-imperialism & Foreign Policy** has only 3 videos but generates 141K views (the second-highest total), mostly covering Venezuela and Greenland. **Elite Accountability** (Epstein files, Elon Musk) generates the highest average views per video at ~45K.

What I find interesting is the gap between volume and attention. In this short period, CTH devotes the most videos to state violence and right-wing culture critique, but their audience gravitates toward elite scandal and foreign policy. The Epstein and Greenland content seems to punch well above its weight. I have seen something similar in earlier work analyzing far-right discourse on social media, but the audience in that case engaged mostly with short videos. I am curious whether more complex mechanisms are at play here.

Engagement
=====================================

Row {data-height=550}
-------------------------------------

### Views × Likes (size = duration, color = theme)
```{r}
plot_ly(
  engagement,
  x = ~views,
  y = ~likes,
  size = ~duration,
  color = ~theme,
  colors = theme_colors,
  type = "scatter",
  mode = "markers",
  marker = list(
    opacity = 0.8,
    sizemode = "diameter",
    sizeref = 2,
    line = list(width = 1, color = "#fff")
  ),
  text = ~paste0(
    "<b>", title, "</b><br>",
    "Views: ", format(views, big.mark = ","), "<br>",
    "Likes: ", format(likes, big.mark = ","), "<br>",
    "Duration: ", round(duration, 0), " min<br>",
    "Theme: ", theme
  ),
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "Views"),
    yaxis = list(title = "Likes"),
    legend = list(orientation = "h", y = -0.15)
  )
```

### Like Ratio by Theme
```{r}
plot_ly(
  engagement,
  x = ~theme,
  y = ~like_ratio,
  color = ~theme,
  colors = theme_colors,
  type = "scatter",
  mode = "markers",
  marker = list(size = 14, opacity = 0.7),
  text = ~paste0(title, "<br>", like_ratio, "%"),
  hoverinfo = "text"
) %>%
  layout(
    xaxis = list(title = "", tickangle = -30),
    yaxis = list(title = "Like Ratio (%)"),
    showlegend = FALSE,
    margin = list(b = 120)
  )
```

Row {data-height=350}
-------------------------------------

### Engagement by Content Format
```{r}
format_stats$format <- factor(
  format_stats$format,
  levels = c("Short Clip", "Segment", "Full Episode")
)

plot_ly(format_stats, x = ~format) %>%
  add_bars(y = ~avg_views, name = "Avg Views", marker = list(color = "#2a9d8f")) %>%
  add_bars(y = ~avg_likes, name = "Avg Likes", marker = list(color = "#e76f51")) %>%
  layout(
    yaxis = list(title = "Average per Video"),
    barmode = "group",
    legend = list(orientation = "h", y = -0.15)
  )
```

### Format Insights

| Format | Videos | Avg Views | Avg Likes | Avg Like Ratio |
|--------|--------|-----------|-----------|----------------|
| Full Episode (>40 min) | 7 | 48,893 | 1,130 | 2.31% |
| Segment (10–40 min) | 10 | 28,233 | 565 | 2.00% |
| Short Clip (<10 min) | 9 | 17,443 | 442 | 2.53% |

**Full Episodes** account for 27% of videos but roughly 44% of total views (7 × 48,893 ≈ 342K of about 782K), the largest share of any format. **Short Clips** have the highest like ratio (2.53%), suggesting somewhat stronger per-view engagement for bite-sized content.
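
The shares discussed here can be rechecked from the table itself. The sketch below is a sanity check, not part of the dashboard. It multiplies each format's video count by its rounded average views, so the totals are approximate. It also computes the like ratio as average likes over average views, which is an assumption about how that column was derived (a mean of per-video ratios could differ slightly). The names `formats` and `format_shares` are illustrative.

```python
# Rounded per-format figures copied from the Format Insights table.
formats = {
    "Full Episode": {"videos": 7, "avg_views": 48_893, "avg_likes": 1_130},
    "Segment": {"videos": 10, "avg_views": 28_233, "avg_likes": 565},
    "Short Clip": {"videos": 9, "avg_views": 17_443, "avg_likes": 442},
}


def format_shares(formats):
    """Return each format's share of videos and of estimated views, plus its like ratio."""
    total_videos = sum(f["videos"] for f in formats.values())
    # Estimated views per format: video count x rounded average views.
    est_views = {name: f["videos"] * f["avg_views"] for name, f in formats.items()}
    total_views = sum(est_views.values())
    return {
        name: {
            "video_share_pct": round(100 * f["videos"] / total_videos, 1),
            "view_share_pct": round(100 * est_views[name] / total_views, 1),
            # Assumption: ratio of averages, not the mean of per-video ratios.
            "like_ratio_pct": round(100 * f["avg_likes"] / f["avg_views"], 2),
        }
        for name, f in formats.items()
    }


shares = format_shares(formats)
for name, row in shares.items():
    print(name, row)
```

Working from the per-video rows in `engagement.csv` instead of the rounded averages would give exact figures.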


Network
=====================================

Row {data-height=400}
-------------------------------------

### How to Read This Network

- **Nodes** = TF-IDF keywords extracted from video transcripts
- **Edges** = keywords belonging to the same thematic cluster
- **Node size** = TF-IDF weight (keyword importance)
- **Node color** = thematic category

**Key observations:** **ICE** and **Trump** are the highest-frequency nodes. The **Right-wing Culture Critique** cluster (Scott Adams, Charlie Kirk, Kid Rock) forms a distinct community. **Epstein** connects to multiple elite figures. The **Anti-imperialism** cluster (Greenland, Denmark, NATO, Venezuela) is tightly cohesive. Network generated in **Gephi** with the Fruchterman-Reingold layout (Fruchterman & Reingold, 1991).

Honestly, I find this visualization more illustrative than analytically revealing. The clusters confirm the coding scheme rather than surfacing anything unexpected. My background is more in BERTopic and LLM-based topic modeling on larger (formal) datasets, where the connections a model predicts can carry meaningful signal. But with a podcast like CTH, topics constantly bleed into each other through irony and digression. I enjoyed watching the videos and taking notes more, and the network came about somewhat unexpectedly. My themes here come from roughly skimming the videos and from previous research. I wonder what a better approach would be.

Row {data-height=600}
-------------------------------------

### Network Visualization
```{r, out.width="100%", fig.align="center"}
knitr::include_graphics("Topic.png")
```


Methodology
=====================================

Row
-------------------------------------

### Data Pipeline

**1. Data Collection**

- YouTube Data Tools (Rieder, 2015)
- Channel: Chapo Trap House
- Period: January 3 – February 10, 2026
- N = 26 videos with full metadata and auto-generated transcripts

**2. TF-IDF Keyword Extraction**

- Transcripts cleaned (filler words, contraction fragments removed)
- scikit-learn TfidfVectorizer with custom podcast stopword list
- Bigram-priority extraction to preserve named entities (e.g., "Charlie Kirk" not split into "Charlie" + "Kirk")
- Transcripts grouped by theme before TF-IDF computation
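
The steps above can be sketched in a few lines. This is a plain-Python stand-in, not the pipeline's actual code: it swaps scikit-learn's `TfidfVectorizer` for a hand-written version that uses a smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, and L2 normalization, so the example runs without dependencies. The toy transcripts, the stopword list, and the `top_keywords` rule (drop a unigram once a bigram containing it has been kept) are all hypothetical, and the last is only one possible reading of "bigram-priority".

```python
import math
import re
from collections import Counter

# Hypothetical toy transcripts, already merged into one document per theme.
theme_docs = {
    "Right-wing Culture Critique": "charlie kirk hosted kid rock and charlie kirk praised kid rock",
    "Anti-imperialism & Foreign Policy": "greenland denmark nato and greenland denmark venezuela",
}
STOPWORDS = {"and"}  # hypothetical stand-in for the custom podcast stopword list


def terms(text):
    """Tokenize, drop stopwords, and return unigrams plus adjacent-word bigrams."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf_by_theme(theme_docs):
    """Compute L2-normalized TF-IDF weights, treating each theme as one document."""
    counts = {theme: Counter(terms(text)) for theme, text in theme_docs.items()}
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts.values() for term in c)
    scores = {}
    for theme, c in counts.items():
        raw = {
            t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, tf in c.items()
        }
        norm = math.sqrt(sum(v * v for v in raw.values()))
        scores[theme] = {t: v / norm for t, v in raw.items()}
    return scores


def top_keywords(weights, k=2):
    """Rank by weight (bigrams win ties); skip unigrams already inside a kept bigram."""
    ranked = sorted(weights, key=lambda t: (-weights[t], " " not in t, t))
    kept = []
    for term in ranked:
        covered = " " not in term and any(term in b.split() for b in kept)
        if not covered:
            kept.append(term)
        if len(kept) == k:
            break
    return kept


scores = tfidf_by_theme(theme_docs)
for theme, weights in scores.items():
    print(theme, top_keywords(weights))
```

Treating each theme as a single document means IDF rewards terms that are distinctive to one theme, which is presumably why transcripts were grouped before the computation.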

**3. Thematic Coding**

Seven categories coded:

| Theme | Description | 
|-------|-------------|
| State Violence & Repression | ICE raids, police killings, militarized enforcement |
| Elite Accountability | Epstein files, corporate bribery, oligarch exposure | 
| Right-wing Culture Critique | Satirical deconstruction of conservative cultural production | 
| Political Figure Ridicule | Ad hominem satirical commentary on individual politicians | 
| Anti-imperialism & Foreign Policy | Critique of US imperial power and interventionism | 
| Liberal Establishment Critique | Critique of centrist Democrats | 
| Liberal Media Critique | Critique of mainstream media complicity | 

**4. Network Analysis**

- Keyword co-occurrence within themes (intra-theme edges)
- Visualized in Gephi (ForceAtlas2 layout)
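
A minimal sketch of how intra-theme edges could be built for import into Gephi, assuming every pair of keywords within a theme is linked. The keyword lists reuse names from the network notes but are otherwise hypothetical, as are the function names. The `Source` and `Target` column headers follow Gephi's spreadsheet import convention for edge tables.

```python
import csv
import io
from itertools import combinations

# Hypothetical keyword lists per theme, for illustration only.
theme_keywords = {
    "Right-wing Culture Critique": ["scott adams", "charlie kirk", "kid rock"],
    "Anti-imperialism & Foreign Policy": ["greenland", "denmark", "nato", "venezuela"],
}


def intra_theme_edges(theme_keywords):
    """Link every pair of keywords that share a theme; no edge crosses themes."""
    edges = []
    for theme, keywords in theme_keywords.items():
        for source, target in combinations(sorted(set(keywords)), 2):
            edges.append({"Source": source, "Target": target, "Theme": theme})
    return edges


def to_edge_csv(edges):
    """Serialize the edge list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Source", "Target", "Theme"])
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()


edges = intra_theme_edges(theme_keywords)
print(to_edge_csv(edges))
```

If edges are built this way, each theme forms a fully connected clique, so the clusters in the layout mirror the coding scheme by construction. Edges based on co-occurrence within individual videos, across themes, would be one way to let unexpected connections surface.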

**5. Tools**

Python (pandas, scikit-learn) · Gephi · R (plotly, flexdashboard) · YouTube Data Tools