X, one of the most widely known social media apps, makes it easy for individuals to share their thoughts, opinions, and news. Political figures in particular have taken to the platform to share updates on policy, making it easier for the public to learn where politicians stand on such matters. But what factors determine which policy issues political leaders discuss more frequently than others?
Conceptually grounded in the First-Level Agenda-Setting and Framing media theories, this study explores which of three topics individual members of the current U.S. Congress have posted about most often on their official X (formerly Twitter) accounts. The topics in question are Immigration, Economy, and the order to release the Epstein Files. Drawing upon the two media theories, the study examines which topics members have prioritized, how members have constructed narratives around the topics, and how political party affiliation and external political developments have influenced topic prioritization and narrative construction over time.
The data for the study consist of all X.com posts by members of the 119th United States Congress since Jan. 1, 2025 - more than 519,000 posts in all. The Scholars Week poster will include posts collected through March 21, the week preceding the poster session. The posts are being gathered via a Brandwatch query made possible by the MTSU School of Journalism and Strategic Media’s Social Media Insights lab. A custom R script was developed to compile and analyze the data. Posts from the first two weeks of September 2025 are missing, due to a failure in the connection between X.com and Brandwatch. We are investigating ways to acquire the missing posts. We are considering deploying inferential statistical analysis to assess volume differences by topic and party. However, inferential statistics may provide little additional value to the analysis, given that the dataset includes all Congressional X.com posts rather than a mere sample of the posts.
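If inferential testing is pursued, one option is a chi-square test of independence on a topic-by-party table of post counts. The sketch below is only an illustration: the counts are made up, the object names (`topic_party_counts`, `chi_result`, `cramers_v`) are hypothetical, and it assumes each post is counted under a single topic. Because the dataset covers all posts rather than a sample, an effect size such as Cramér's V may be more informative than the p-value.

```r
# Hypothetical counts for illustration only; these are NOT the study's data.
topic_party_counts <- matrix(
  c(1200,  900,
    1500, 1100,
     300,  700),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Topic = c("Immigration", "Economy", "Epstein"),
    Party = c("Rep", "Dem/Ind")
  )
)

# Chi-square test of independence between topic and party
chi_result <- chisq.test(topic_party_counts)
chi_result$statistic
chi_result$p.value

# Cramer's V: effect size scaled to the 0-1 range
n_posts <- sum(topic_party_counts)
cramers_v <- sqrt(
  unname(chi_result$statistic) / (n_posts * (min(dim(topic_party_counts)) - 1))
)
cramers_v
```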
This chart visualizes data collected so far.
Meanwhile, total posting volume shows week-to-week variation. Within a given week, however, Republican posting volume generally exceeds Democratic/Independent posting volume.
Republicans hold slim majorities in both chambers of Congress, so one would expect more Republican volume. But perhaps Republicans also just use the platform more frequently.
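One way to separate "more members" from "more posting per member" is to divide weekly post counts by the number of members who posted that week. The sketch below uses a toy data frame with the same `Author`, `party`, and `Week` column names as the study's data; the rows and the names `toy_posts` and `posts_per_member` are hypothetical. Dividing by the full party roster instead of active posters would be an equally reasonable denominator.

```r
library(dplyr)

# Toy stand-in for the study's Data frame (hypothetical rows)
toy_posts <- tibble::tibble(
  Author = c("A", "A", "B", "C", "C", "C", "D"),
  party  = c("Rep", "Rep", "Rep", "Dem/Ind", "Dem/Ind", "Dem/Ind", "Rep"),
  Week   = c(1, 1, 1, 1, 1, 1, 1)
)

# Posts, active members, and posts per active member, by party and week
posts_per_member <- toy_posts %>%
  group_by(party, Week) %>%
  summarize(
    Posts = n(),
    ActiveMembers = n_distinct(Author),
    PostsPerMember = Posts / ActiveMembers,
    .groups = "drop"
  )

posts_per_member
```

In this toy example the party with more total posts has the lower per-member rate, which is the kind of pattern raw volume alone would hide.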
Something kind of odd happens when you look at median “Like” counts by topic and party:
This code produces all of the visualizations shown above. Also see the "Topic frequencies by party during specified weeks" code and the "Member post counts by party for a topic" code at the bottom of the page.
# ============================================
# Congressional X Posts text analysis
# ============================================
# ============================================
# --- Load required libraries ---
# ============================================
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidytext")) install.packages("tidytext")
library(tidyverse)
library(tidytext)
# ============================================
# --- Load the data from project folder ---
# ============================================
Data <- readRDS("Latest119thData.rds")
# ============================================
# --- Add "Week" variable ---
# ============================================
Data <- Data %>%
mutate(Date = as.Date(Date)) %>%
mutate(
WeekStart = lubridate::floor_date(Date, unit = "week", week_start = 1)
)
first_week <- min(Data$WeekStart, na.rm = TRUE)
Data <- Data %>%
mutate(Week = as.integer(difftime(WeekStart, first_week, units = "weeks")) + 1)
# ============================================
# --- Combine Democrats and independents ---
# ============================================
Data <- Data %>%
mutate(
party = case_when(
party %in% c("Democrat", "Independent") ~ "Dem/Ind",
party == "Republican" ~ "Rep",
TRUE ~ "error"
)
)
# ============================================
# --- Optional prefilter ---
# (commented out; uncomment to use)
# ============================================
#
# phrases_prefilter <- c(
# "Trump",
# "the president",
# "our president",
# "White House",
# "Oval Office"
# )
#
# escaped_phrases_prefilter <- str_replace_all(
# phrases_prefilter,
# "([\\^$.|?*+()\\[\\]{}\\\\])",
# "\\\\\\1"
# )
#
# pattern_prefilter <- paste0("\\b", escaped_phrases_prefilter, "\\b", collapse = "|")
#
# Data <- Data %>%
# mutate(
# Full.Text.clean = str_squish(Full.Text),
# Prefilter = if_else(
# str_detect(Full.Text.clean, regex(pattern_prefilter, ignore_case = TRUE)),
# "Yes",
# "No"
# )
# )
#
# Data <- Data %>%
# filter(Prefilter == "Yes")
# ============================================
# --- Optional ngram analysis ---
# (commented out; uncomment to use)
# ============================================
#
# ngram_size <- 1 # <- change to 1, 2, 3, 4, etc.
#
# Ngram_Frequencies <- Data %>%
# mutate(Full.Text.clean = stringr::str_squish(Full.Text)) %>%
# unnest_tokens(
# output = "ngram",
# input = Full.Text.clean,
# token = "ngrams",
# n = ngram_size) %>%
# count(ngram, sort = TRUE) %>%
# filter(!is.na(ngram), ngram != "")
# ============================================
# --- Define topic labels for graphs ---
# ============================================
Topic1Label <- "Immigration"
Topic2Label <- "Economy"
Topic3Label <- "Epstein"
# ============================================
# --- Flag Topic1-related posts ---
# ============================================
phrases_topic1 <- c(
"immigration",
"immigrants",
"immigrant",
"asylum seekers",
"refugee",
"refugees",
"alien",
"aliens",
"illegals",
"border crossings",
"cross the border",
"crossed the border",
"crossing the border",
"the southern border",
"border security",
"secure the border",
"secure the borders",
"securing the border",
"securing the borders",
"secure border",
# Note: matching below is case-insensitive, so "ICE" also flags
# the ordinary word "ice" (e.g., "ice storm")
"ICE",
"I.C.E.",
"CBP",
"C.B.P.",
"Renee Good",
"Pretti"
)
escaped_phrases_topic1 <- str_replace_all(
phrases_topic1,
"([\\^$.|?*+()\\[\\]{}\\\\])",
"\\\\\\1"
)
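# Use lookarounds instead of \b so that phrases ending in a period
# ("I.C.E.", "C.B.P.") still match when followed by a space or end of text
pattern_topic1 <- paste0("(?<!\\w)", escaped_phrases_topic1, "(?!\\w)", collapse = "|")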
Data <- Data %>%
mutate(
Full.Text.clean = str_squish(Full.Text),
Topic1 = if_else(
str_detect(Full.Text.clean, regex(pattern_topic1, ignore_case = TRUE)),
"Yes",
"No"
)
)
# ============================================
# --- Flag Topic2-related posts ---
# ============================================
phrases_topic2 <- c(
"economy",
"economic",
"inflation",
"tariff",
"tariffs",
"trade",
"trading partners",
"prices",
"price of",
"paying more for",
"affordable",
"affordability",
"afford",
"affording"
)
escaped_phrases_topic2 <- str_replace_all(
phrases_topic2,
"([\\^$.|?*+()\\[\\]{}\\\\])",
"\\\\\\1"
)
pattern_topic2 <- paste0("\\b", escaped_phrases_topic2, "\\b", collapse = "|")
Data <- Data %>%
mutate(
Full.Text.clean = str_squish(Full.Text),
Topic2 = if_else(
str_detect(Full.Text.clean, regex(pattern_topic2, ignore_case = TRUE)),
"Yes",
"No"
)
)
# ============================================
# --- Flag Topic3-related posts ---
# ============================================
phrases_topic3 <- c(
"Epstein",
"release the files",
"Ghislaine Maxwell"
)
escaped_phrases_topic3 <- str_replace_all(
phrases_topic3,
"([\\^$.|?*+()\\[\\]{}\\\\])",
"\\\\\\1"
)
pattern_topic3 <- paste0("\\b", escaped_phrases_topic3, "\\b", collapse = "|")
Data <- Data %>%
mutate(
Full.Text.clean = str_squish(Full.Text),
Topic3 = if_else(
str_detect(Full.Text.clean, regex(pattern_topic3, ignore_case = TRUE)),
"Yes",
"No"
)
)
# ============================================
# --- Visualize weekly counts (stacked by party) ---
# ============================================
if (!require("plotly")) install.packages("plotly")
library(plotly)
# --- Build Week -> WeekStart lookup so hover can show a date ---
WeekDates <- Data %>%
distinct(Week, WeekStart) %>%
arrange(Week)
# --- Summarize weekly counts by Party for Topic1 ---
Topic1_weekly_party <- Data %>%
filter(party %in% c("Dem/Ind", "Rep"), Topic1 == "Yes") %>%
group_by(party, Week) %>%
summarize(Count = n(), .groups = "drop") %>%
mutate(Topic = Topic1Label)
# --- Summarize weekly counts by Party for Topic2 ---
Topic2_weekly_party <- Data %>%
filter(party %in% c("Dem/Ind", "Rep"), Topic2 == "Yes") %>%
group_by(party, Week) %>%
summarize(Count = n(), .groups = "drop") %>%
mutate(Topic = Topic2Label)
# --- Summarize weekly counts by Party for Topic3 ---
Topic3_weekly_party <- Data %>%
filter(party %in% c("Dem/Ind", "Rep"), Topic3 == "Yes") %>%
group_by(party, Week) %>%
summarize(Count = n(), .groups = "drop") %>%
mutate(Topic = Topic3Label)
# --- Combine Topic1 + Topic2 + Topic3 weekly counts ---
Weekly_counts_party <- bind_rows(Topic1_weekly_party, Topic2_weekly_party, Topic3_weekly_party) %>%
mutate(
# ensure all three topics exist (even if one has zero rows)
Topic = factor(Topic, levels = c(Topic1Label, Topic2Label, Topic3Label))
) %>%
tidyr::complete(
party,
Topic,
Week = full_seq(range(Data$Week, na.rm = TRUE), 1),
fill = list(Count = 0)
) %>%
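# Derive the date from the week number so weeks with no posts (e.g., the
# missing early-September 2025 data, plotted here as zero counts) still get a hover date
mutate(WeekStart = first_week + 7 * (Week - 1)) %>%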
arrange(party, Topic, Week)
# --- Compute a padded y max to avoid clipping the top point ---
y_max_raw <- max(Weekly_counts_party$Count, na.rm = TRUE)
y_max <- max(pretty(c(0, y_max_raw)))
# Alternative: y_max <- max(1, ceiling(y_max_raw * 1.05))
# --- Hover template (shows week number + WeekStart date) ---
hover_tpl <- paste0(
"<b>%{fullData.name}</b><br>",
"Week: %{x}<br>",
"Week start: %{customdata|%Y-%m-%d}<br>",
"Count: %{y}<extra></extra>"
)
# --- Choose a 3-color palette (Topic1, Topic2, Topic3)
topic_colors <- c("steelblue", "firebrick", "darkgreen")
# --- Build party-specific plots ---
# Show legend ONLY on the top subplot (Dem/Ind)
p_demind <- plot_ly(
data = Weekly_counts_party %>% filter(party == "Dem/Ind"),
x = ~Week,
y = ~Count,
color = ~Topic,
colors = topic_colors,
legendgroup = ~Topic, # group by topic so toggling applies to both panels
showlegend = TRUE, # legend here (top panel only)
type = "scatter",
mode = "lines+markers",
line = list(width = 2),
marker = list(size = 6),
customdata = ~WeekStart,
hovertemplate = hover_tpl
) %>%
layout(
title = "",
xaxis = list(title = ""),
yaxis = list(title = "Number of Posts", range = c(0, y_max)),
hovermode = "x unified",
legend = list(title = list(text = "Topic"))
)
# Hide legend on the bottom subplot (Rep)
p_rep <- plot_ly(
data = Weekly_counts_party %>% filter(party == "Rep"),
x = ~Week,
y = ~Count,
color = ~Topic,
colors = topic_colors,
legendgroup = ~Topic, # same legend groups as above
showlegend = FALSE, # hide legend here (bottom panel)
type = "scatter",
mode = "lines+markers",
line = list(width = 2),
marker = list(size = 6),
customdata = ~WeekStart,
hovertemplate = hover_tpl
) %>%
layout(
title = "",
xaxis = list(
title = "Week Number (Week 1 starts at first observed week in data)",
dtick = 1
),
yaxis = list(title = "Number of Posts", range = c(0, y_max)),
hovermode = "x unified"
)
# --- Tile them vertically (stacked) with title + styled subtitle ---
AS_party <- subplot(
p_demind, p_rep,
nrows = 2,
shareX = TRUE,
shareY = TRUE,
titleX = TRUE,
titleY = TRUE
) %>%
layout(
title = list(
text = paste0(
"Weekly topic mentions by party",
"<br><span style='font-size:0.80em; color:#6c757d;'>",
"Top = Dem/Ind, Bottom = Rep",
"</span>"
)
),
showlegend = TRUE,
# Force identical vertical scales on both subplots with padded max
yaxis = list(title = "Number of Posts", range = c(0, y_max)),
yaxis2 = list(title = "Number of Posts", range = c(0, y_max))
)
# ============================================
# --- Show the chart ---
# ============================================
AS_party
# ============================================
# --- Total weekly post volume by party ---
# ============================================
Total_weekly_party <- Data %>%
filter(party %in% c("Dem/Ind", "Rep")) %>%
group_by(party, Week) %>%
summarize(Count = n(), .groups = "drop") %>%
left_join(WeekDates, by = "Week") %>%
arrange(party, Week)
# --- Compute padded y max ---
y_max_tot_raw <- max(Total_weekly_party$Count, na.rm = TRUE)
y_max_tot <- max(pretty(c(0, y_max_tot_raw)))
# --- Hover template ---
hover_tot <- paste0(
"<b>%{fullData.name}</b><br>",
"Week: %{x}<br>",
"Week start: %{customdata|%Y-%m-%d}<br>",
"Count: %{y}<extra></extra>"
)
# --- Party colors ---
party_colors <- c("Dem/Ind" = "steelblue", "Rep" = "firebrick")
# ============================================
# --- Combined single-panel chart ---
# ============================================
AS_party_total_combined <- plot_ly(
data = Total_weekly_party,
x = ~Week,
y = ~Count,
color = ~party,
colors = party_colors,
type = "scatter",
mode = "lines+markers",
line = list(width = 2),
marker = list(size = 6),
customdata = ~WeekStart,
hovertemplate = hover_tot
) %>%
layout(
title = list(
text = paste0(
"Weekly total posts by party",
"<br><span style='font-size:0.80em; color:#6c757d;'>",
"Dem/Ind vs. Rep in a single combined panel",
"</span>"
)
),
xaxis = list(
title = "Week Number (Week 1 starts at first observed week in data)",
dtick = 1
),
yaxis = list(
title = "Number of Posts",
range = c(0, y_max_tot)
),
hovermode = "x unified",
legend = list(title = list(text = "Party"))
)
# ============================================
# --- Show the chart ---
# ============================================
AS_party_total_combined
# ============================================
# "Likes" by topic and party
# Step 1: Libraries
# ============================================
library(dplyr)
library(plotly)
library(scales) # for comma formatting of labels
# ============================================
# Step 2: Build derived variables (TopicCount, TopicLabel)
# ============================================
# Assumes Topic1/Topic2/Topic3 are exactly "Yes"/"No"
Data2 <- Data %>%
mutate(
TopicCount = rowSums(across(c(Topic1, Topic2, Topic3), ~ .x == "Yes")),
# Reuse the labels defined earlier so all charts stay in sync
TopicLabel = case_when(
TopicCount == 1 & Topic1 == "Yes" ~ Topic1Label,
TopicCount == 1 & Topic2 == "Yes" ~ Topic2Label,
TopicCount == 1 & Topic3 == "Yes" ~ Topic3Label,
TRUE ~ NA_character_
)
)
# ============================================
# Step 3: Summary of TopicCount
# ============================================
TopicCountSummary <- Data2 %>%
count(TopicCount, name = "Count")
# ============================================
# Step 4: Filter to exactly one topic flagged "Yes"
# ============================================
Data3 <- Data2 %>%
filter(TopicCount == 1)
# ============================================
# Step 5: Create a 2-level party grouping (Rep vs Dem/Ind)
# ============================================
# party was already recoded above to "Dem/Ind", "Rep", or "error".
# Drop any "error" rows so they are not counted as Dem/Ind.
Data3 <- Data3 %>%
filter(party %in% c("Dem/Ind", "Rep")) %>%
mutate(
PartyGroup = party
)
# ============================================
# Step 6: Medians by TopicLabel and PartyGroup
# ============================================
TopicMedians <- Data3 %>%
group_by(TopicLabel, PartyGroup) %>%
summarise(
n_posts = n(),
n_likes = sum(!is.na(X.Likes)),
Median = median(Engagement.Score, na.rm = TRUE),
MedLikes = median(X.Likes, na.rm = TRUE),
.groups = "drop"
)
# ============================================
# Step 7: Diagnostics — missing likes (overall and by group)
# ============================================
likes_missing_total <- sum(is.na(Data3$X.Likes))
LikesMissingByGroup <- Data3 %>%
group_by(TopicLabel, PartyGroup) %>%
summarise(
n_posts = n(),
na_likes = sum(is.na(X.Likes)),
.groups = "drop"
) %>%
arrange(desc(na_likes))
# ============================================
# Step 8: Visualization — Median Likes (MedLikes) by TopicLabel × PartyGroup (plotly)
# ============================================
# Order TopicLabel by overall MedLikes across groups for readability
topic_order <- TopicMedians %>%
group_by(TopicLabel) %>%
summarise(overall_medlikes = median(MedLikes, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(overall_medlikes)) %>%
pull(TopicLabel)
TopicMedians <- TopicMedians %>%
mutate(TopicLabel = factor(TopicLabel, levels = topic_order))
# Define colors: Rep = red, Dem/Ind = blue
party_colors <- c("Rep" = "#d62728", "Dem/Ind" = "#1f77b4")
fig <- plot_ly()
for (pg in c("Rep", "Dem/Ind")) {
df <- TopicMedians %>% filter(PartyGroup == pg) %>% arrange(TopicLabel)
fig <- fig %>%
add_bars(
data = df,
x = ~MedLikes, # Bars show MedLikes
y = ~TopicLabel,
name = pg,
orientation = "h",
marker = list(color = party_colors[[pg]]), # Color by party group
text = ~comma(MedLikes), # Static labels with formatted MedLikes
textposition = "outside",
textfont = list(size = 12),
hoverinfo = "skip" # No pop-up labels
)
}
fig <- fig %>%
layout(
barmode = "group",
title = list(text = ""),
xaxis = list(title = "Median Likes", rangemode = "tozero"),
yaxis = list(title = ""),
legend = list(orientation = "h", x = 0, y = 1.1),
margin = list(l = 160, r = 40)
)
fig
This custom ngram analysis can filter for a specified week range, then show ngram frequencies by party. Note that it can be run successfully only after a run of the main script, as it requires data frames produced by the main script.
It could be useful for detecting what one party posts about while the
other party is emphasizing a given topic or frame. Open
Ngram_Frequencies to see the ngram data.
The script also produces the Data_ngram data frame, which contains data for the posts from the selected period. Explore it to see posts with top engagement scores, particular words or phrases, and so on.
# ============================================
# --- Expanded ngram analysis with week filter
# --- Open Ngram_Frequencies to see results
# --- Open Data_ngram to explore posts from
# --- the selected period
# ============================================
ngram_size <- 3 # change to 1, 2, 3, etc.
StartWeek <- 54 # set desired week range
EndWeek <- 61 # set desired week range
library(tidytext)
library(dplyr)
library(tidyr)
library(stringr)
# --- Filter posts by week range (inclusive) ---
Data_ngram <- Data %>%
filter(Week >= StartWeek, Week <= EndWeek) %>%
mutate(Full.Text.clean = str_squish(Full.Text))
# --- Tokenize into n-grams ---
Ngrams <- Data_ngram %>%
unnest_tokens(
output = "ngram",
input = Full.Text.clean,
token = "ngrams",
n = ngram_size
) %>%
filter(!is.na(ngram), ngram != "")
# --- Count unique posts containing each n-gram by party ---
Ngram_Frequencies <- Ngrams %>%
distinct(post_id = Url, party, ngram) %>% # ensures 1 count per post per n-gram
group_by(ngram, party) %>%
summarize(PostCount = n(), .groups = "drop") %>%
pivot_wider(
names_from = party,
values_from = PostCount,
values_fill = 0
) %>%
mutate(
Total_Posts = `Dem/Ind` + Rep
) %>%
arrange(desc(Total_Posts))
# Print results
# Ngram_Frequencies
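To see what one party emphasizes while the other does not, the per-party counts can be converted to shares of each party's posts and ranked by the gap. The sketch below is a hedged illustration on a toy table shaped like `Ngram_Frequencies`; the ngrams, counts, party totals, and the names `toy_ngrams`, `total_posts`, and `ngram_skew` are all hypothetical.

```r
library(dplyr)

# Toy stand-in for Ngram_Frequencies (hypothetical counts)
toy_ngrams <- tibble::tibble(
  ngram     = c("secure the border", "due process rights", "the american people"),
  `Dem/Ind` = c(5, 40, 30),
  Rep       = c(45, 2, 28)
)

# Hypothetical number of posts per party in the selected weeks
total_posts <- c(`Dem/Ind` = 1000, Rep = 1200)

# Share of each party's posts containing the ngram, and the gap between them
ngram_skew <- toy_ngrams %>%
  mutate(
    DemInd_share = `Dem/Ind` / total_posts[["Dem/Ind"]],
    Rep_share    = Rep / total_posts[["Rep"]],
    Share_diff   = DemInd_share - Rep_share
  ) %>%
  arrange(desc(abs(Share_diff)))

ngram_skew
```

Positive `Share_diff` values lean Dem/Ind and negative values lean Rep; ngrams near zero are used at similar rates by both parties.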
This figure and table offer a closer look at exactly who posted, and how often, during the topic surge explored in the above code (the “Democratic immigration surge” during weeks 54 through 61).
Authors & Post Counts by Party for Topic

| Party | Unique authors for topic | Share of all party authors | Min posts/author | Max posts/author | Median posts/author | Mean posts/author | SD of posts/author |
|---|---|---|---|---|---|---|---|
| Rep | 242 | 87.4% | 1 | 113 | 5 | 13 | 18 |
| Dem/Ind | 245 | 91.8% | 1 | 215 | 16 | 22 | 23 |

Percentages use all distinct Authors in each party as the denominator (from counts_by_party).
The figure and table suggest that the surge had clear leaders, who posted about the topic much more frequently than their party colleagues did. There’s not much evidence that all, or even most, Democrats agreed to start posting about immigration as a broad-based, coordinated effort. Had that been the case, the blue histogram would have been “flatter,” with posting frequencies more similar across Democratic members. However, the proportion of Democrats who posted at least once about the topic (91.8%) is slightly larger than the proportion of Republicans who did (87.4%), and the median number of posts per author is also greater for Democrats (16) than for Republicans (5). So, Democrats did seem to post more frequently than Republicans overall. As Julia observed, Republicans were probably busy posting about their own take on the immigration topic: the “Save America Act.”
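One way to put a number on "clear leaders" is the share of a party's topic posts that comes from its most active authors. The sketch below is only an illustration on toy per-author counts shaped like `df_rep` or `df_demind`; the counts, the 20% cutoff, and the names `toy_counts`, `top_share`, and `share_top20` are assumptions.

```r
library(dplyr)

# Toy per-author post counts (hypothetical)
toy_counts <- tibble::tibble(
  Author = paste0("Member", 1:10),
  Count  = c(50, 20, 10, 5, 5, 3, 3, 2, 1, 1)
)

# Cumulative share of posts, from the most to the least active author
top_share <- toy_counts %>%
  arrange(desc(Count)) %>%
  mutate(
    Rank     = row_number(),
    CumShare = cumsum(Count) / sum(Count)
  )

# Share of all posts written by the top 20% of authors
top_n_authors <- ceiling(0.20 * nrow(top_share))
share_top20 <- top_share$CumShare[top_n_authors]
share_top20
```

A value close to 0.20 would indicate evenly spread posting (the "flatter" pattern), while a much larger value indicates that a few members drove the surge.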
Below is the code, which must be run after both of the above code
blocks have been run. Topic1 (immigration) is specified by
default. To change to a different topic, find this line in Section
2:
filter(Topic1 == "Yes") %>% # <--- Select topic here
… and change Topic1 to Topic2 (economy) or
Topic3 (Epstein).
After running the code, you can open the df_demind and
df_rep data frames to explore post frequency counts for
individual members. It’s a little easier than doing so from the pop-up
windows in the graphic.
To drill down to the individual post level, open the
Data_ngram data frame and filter for the
Author you are interested in and filter for a “Yes” on the
topic you are interested in. It can be illuminating. For example,
looking at the 113 individual posts for John Cornyn, the top Republican
poster about “immigration” during the “immigration” surge, reveals only
seven that mention “Minneapolis,” and none that mention Alex Pretti or
Renee Good. The top Democratic poster, Joaquin Castro, mentions all
three terms relatively often.
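The drill-down described above can also be done in code rather than by hand. The sketch below is a minimal example on a toy data frame that borrows the `Author`, `Topic1`, and `Full.Text` column names; the rows, the author names, and the objects `toy_Data_ngram`, `member_posts`, and `term_counts` are hypothetical.

```r
library(dplyr)
library(stringr)

# Toy stand-in for Data_ngram (hypothetical posts)
toy_Data_ngram <- tibble::tibble(
  Author = c("Member A", "Member A", "Member A", "Member B"),
  Topic1 = c("Yes", "Yes", "No", "Yes"),
  Full.Text = c(
    "Statement on immigration enforcement in Minneapolis.",
    "Border security remains a priority.",
    "Happy birthday to our staff!",
    "Immigration hearing today."
  )
)

# Keep one member's posts that were flagged for the topic
member_posts <- toy_Data_ngram %>%
  filter(Author == "Member A", Topic1 == "Yes")

# Count how many of those posts mention a term of interest
term_counts <- member_posts %>%
  summarize(
    n_posts = n(),
    n_minneapolis = sum(
      str_detect(Full.Text, regex("\\bMinneapolis\\b", ignore_case = TRUE))
    )
  )

term_counts
```

Swapping in a different `Author` value, topic column, or search term repeats the check for another member or phrase.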
library(dplyr)
library(forcats)
library(plotly)
library(tidyr)
library(gt)
# -------------------------------------------------------------------
# 1) Distinct Author counts by party (overall, not filtered to Topic1)
# -------------------------------------------------------------------
counts_by_party <- Data %>%
distinct(Author, party) %>%
count(party, name = "n_authors") %>%
tidyr::pivot_wider(
names_from = party,
values_from = n_authors,
values_fill = 0
) %>%
transmute(
RepCount = coalesce(`Rep`, 0L),
DemIndCount = coalesce(`Dem/Ind`, 0L)
)
# View if desired
counts_by_party
# -------------------------------------------------------------------
# 2) Count topic posts by member
# -------------------------------------------------------------------
Source_counts <- Data_ngram %>%
filter(Topic1 == "Yes") %>% # <--- Select topic here
group_by(Author, party) %>%
summarize(Count = n(), .groups = "drop")
# Split by party and ORDER DESCENDING by Count
df_rep <- Source_counts %>%
filter(party == "Rep") %>%
arrange(desc(Count)) %>%
mutate(Author = fct_reorder(Author, Count, .desc = TRUE))
df_demind <- Source_counts %>%
filter(party == "Dem/Ind") %>%
arrange(desc(Count)) %>%
mutate(Author = fct_reorder(Author, Count, .desc = TRUE))
# Shared x-axis max
x_max <- max(Source_counts$Count, na.rm = TRUE)
# -------------------------------------------------------------------
# 3) Plots (with suppressed "trace 0/1" hover extra box)
# -------------------------------------------------------------------
p_rep <- plot_ly(
data = df_rep,
x = ~Count, y = ~Author,
type = "bar",
orientation = "h",
marker = list(color = "#C00000"),
hovertemplate = "%{y}<br>%{x} posts<extra></extra>"
) %>%
layout(
title = list(text = "Republicans"),
xaxis = list(title = "Number of posts (Topic1 = Yes)", range = c(0, x_max)),
yaxis = list(title = "", autorange = "reversed"),
margin = list(l = 140)
)
p_demind <- plot_ly(
data = df_demind,
x = ~Count, y = ~Author,
type = "bar",
orientation = "h",
marker = list(color = "#1F6FEB"),
hovertemplate = "%{y}<br>%{x} posts<extra></extra>"
) %>%
layout(
title = list(text = "Democrats / Independents"),
xaxis = list(title = "Number of posts (Topic1 = Yes)", range = c(0, x_max)),
yaxis = list(title = "", autorange = "reversed"),
margin = list(l = 140)
)
p_combined <- subplot(
p_rep, p_demind,
nrows = 2, shareX = TRUE, titleY = TRUE
) %>%
layout(
showlegend = FALSE,
title = list(text = "Topic Posts by Member & Party"),
bargap = 0.25
)
# Print the plot
p_combined
# -------------------------------------------------------------------
# 4) GT summary table
# - Unique Authors in df_rep / df_demind (Topic1 = Yes subset)
# - Percent of all party Authors (denominator from counts_by_party)
# - Min / Max / Median / Mean / SD of Count (posts per Author in Topic1)
# -------------------------------------------------------------------
# Helper to safely compute summary stats (returns NA if no rows)
compute_stats <- function(x) {
if (length(x) == 0) {
c(min = NA_real_, max = NA_real_, median = NA_real_, mean = NA_real_, sd = NA_real_)
} else {
c(
min = min(x, na.rm = TRUE),
max = max(x, na.rm = TRUE),
median = stats::median(x, na.rm = TRUE),
mean = mean(x, na.rm = TRUE),
sd = stats::sd(x, na.rm = TRUE)
)
}
}
# Unique authors (Topic1 = Yes subset)
rep_unique <- dplyr::n_distinct(df_rep$Author)
demind_unique <- dplyr::n_distinct(df_demind$Author)
# Bases (all distinct Authors by party overall)
rep_base <- if ("RepCount" %in% names(counts_by_party)) counts_by_party$RepCount[[1]] else NA_integer_
demind_base <- if ("DemIndCount" %in% names(counts_by_party)) counts_by_party$DemIndCount[[1]] else NA_integer_
# Proportions (guard against divide-by-zero or missing base)
rep_prop <- if (!is.na(rep_base) && rep_base > 0) rep_unique / rep_base else NA_real_
demind_prop <- if (!is.na(demind_base) && demind_base > 0) demind_unique / demind_base else NA_real_
# Summary stats of posts-per-author (Topic1 = Yes)
rep_stats <- compute_stats(df_rep$Count)
demind_stats <- compute_stats(df_demind$Count)
summary_tbl <- tibble::tibble(
Party = c("Rep", "Dem/Ind"),
UniqueAuthors = c(rep_unique, demind_unique),
PctOfParty = c(rep_prop, demind_prop),
Min = c(rep_stats["min"], demind_stats["min"]),
Max = c(rep_stats["max"], demind_stats["max"]),
Median = c(rep_stats["median"], demind_stats["median"]),
Mean = c(rep_stats["mean"], demind_stats["mean"]),
SD = c(rep_stats["sd"], demind_stats["sd"])
)
summary_gt <- summary_tbl %>%
gt() %>%
tab_header(
title = md("**Authors & Post Counts by Party for Topic**")
) %>%
cols_label(
Party = "Party",
UniqueAuthors = "Unique authors for topic",
PctOfParty = "Share of all party authors",
Min = "Min posts/author",
Max = "Max posts/author",
Median = "Median posts/author",
Mean = "Mean posts/author",
SD = "SD of posts/author"
) %>%
fmt_number(
columns = c(UniqueAuthors, Min, Max, Median, Mean, SD),
decimals = 0
) %>%
fmt_percent(
columns = PctOfParty,
decimals = 1
) %>%
tab_source_note(
md("Percentages use *all* distinct Authors in each party as the denominator (from `counts_by_party`).")
)
# Print the table
summary_gt