Bluesky Word Frequency Analysis of Posts about Heather Cox Richardson
Introduction
For this project, I analyzed Bluesky posts related to Heather Cox Richardson, a historian and political commentator whose work is often discussed in relation to American politics, democracy, history, and current events.
The goal of this project was to collect posts from Bluesky, clean the text data, perform a word frequency analysis, identify the most common terms, and create a word cloud visualization.
Load Required Packages
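A minimal sketch of a package check, assuming only the packages whose functions appear in the code shown in this report: `request()` from httr2, `bind_rows()` from dplyr, `ggplot()` from ggplot2, `wordcloud()` from wordcloud, and `brewer.pal()` from RColorBrewer. Steps whose code is not shown may need additional packages.

```r
# Packages inferred from the function calls visible in this report
required_packages <- c("httr2", "dplyr", "ggplot2", "wordcloud", "RColorBrewer")

# Check which packages are installed without attaching them
is_installed <- vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Install before running: ", paste(missing_packages, collapse = ", "))
}

# Attach the packages that are available
for (pkg in required_packages[is_installed]) {
  library(pkg, character.only = TRUE)
}
```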
Bluesky Login Information
Authenticate with Bluesky
This code logs in to Bluesky using the AT Protocol API.
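The login step can be sketched as follows. This is an illustration rather than the exact code used for this project: it assumes the httr2 package, the `com.atproto.server.createSession` endpoint on `bsky.social`, and credentials stored in two hypothetical environment variables, `BLUESKY_HANDLE` and `BLUESKY_APP_PASSWORD`. The function name `bsky_login` is also hypothetical.

```r
# Sketch only: assumes httr2 is installed and that credentials live in the
# hypothetical environment variables BLUESKY_HANDLE and BLUESKY_APP_PASSWORD.
bsky_login <- function(handle = Sys.getenv("BLUESKY_HANDLE"),
                       app_password = Sys.getenv("BLUESKY_APP_PASSWORD")) {
  # Fail early if either credential is missing
  if (!nzchar(handle) || !nzchar(app_password)) {
    stop("Set BLUESKY_HANDLE and BLUESKY_APP_PASSWORD before logging in.")
  }

  # createSession takes a JSON body with the account identifier and password
  response <- httr2::request("https://bsky.social/xrpc/com.atproto.server.createSession") |>
    httr2::req_body_json(list(identifier = handle, password = app_password)) |>
    httr2::req_perform()

  # The access token is returned in the accessJwt field of the response
  session <- httr2::resp_body_json(response)
  session$accessJwt
}

# Usage (requires network access and valid credentials):
# access_token <- bsky_login()
```

Reading the credentials from environment variables keeps the handle and app password out of the report itself.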
Collect Bluesky Posts
The search topic for this project is:
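A minimal sketch of the two settings the collection code relies on, assuming the search phrase named in the Limitations section and the 100-post cap described in this section. The variable names `query` and `max_posts` are taken from the collection loop.

```r
# Search phrase sent to the Bluesky search endpoint
query <- "Heather Cox Richardson"

# Upper bound on the number of posts to collect
max_posts <- 100
```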
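This code searches Bluesky for posts related to Heather Cox Richardson, newest first. It requests 25 posts per page, follows the cursor returned by the API to fetch further pages, and pauses one second between requests. It stops once 100 posts have been collected, or sooner if fewer posts are available.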
# Accumulate one single-row data frame per post
posts_list <- list()
cursor <- NULL

while (length(posts_list) < max_posts) {
  # Build the search request for one page of results
  req <- request("https://bsky.social/xrpc/app.bsky.feed.searchPosts") |>
    req_headers(Authorization = paste("Bearer", access_token)) |>
    req_url_query(
      q = query,
      limit = 25,
      sort = "latest"
    )

  # Pass the cursor from the previous page, if there is one
  if (!is.null(cursor)) {
    req <- req |>
      req_url_query(cursor = cursor)
  }

  response <- req |>
    req_perform()
  data <- resp_body_json(response, simplifyVector = FALSE)

  # Stop when the API returns no more posts
  if (length(data$posts) == 0) {
    break
  }

  for (post in data$posts) {
    post_text <- post$record$text
    author <- post$author$handle
    created_at <- post$record$createdAt

    # Engagement counts can be missing, so default them to 0
    like_count <- if (is.null(post$likeCount)) 0 else post$likeCount
    repost_count <- if (is.null(post$repostCount)) 0 else post$repostCount
    reply_count <- if (is.null(post$replyCount)) 0 else post$replyCount

    posts_list[[length(posts_list) + 1]] <- data.frame(
      author = author,
      created_at = created_at,
      text = post_text,
      likes = like_count,
      reposts = repost_count,
      replies = reply_count,
      stringsAsFactors = FALSE
    )

    # Stop mid-page once the cap is reached
    if (length(posts_list) >= max_posts) {
      break
    }
  }

  # No cursor means there are no further pages
  if (is.null(data$cursor)) {
    break
  }
  cursor <- data$cursor

  # Pause between requests to avoid hammering the API
  Sys.sleep(1)
}

# Combine the per-post rows into one data frame
bluesky_posts <- bind_rows(posts_list)
View the Collected Data
## author created_at
## 1 stefanejones.bsky.social 2026-07-05T01:46:04.966Z
## 2 mikeymomo.bsky.social 2026-07-05T00:43:10.896Z
## 3 bsargentnoble1.bsky.social 2026-07-05T00:31:11.664Z
## 4 onedandelion.bsky.social 2026-07-04T23:17:02.880Z
## 5 arcticfox87.bsky.social 2026-07-04T23:05:06.032Z
## 6 darleneryan.bsky.social 2026-07-04T21:53:45.830Z
## text
## 1 Very Long Doomscrolling Break.\n\n"The Lincoln Portrait," narrated by Heather Cox Richardson.\n\n*"This is what he said. This is what Abe Lincoln said."*\n\nwww.youtube.com/watch?v=RL2z...
## 2 Heather Cox Richardson\nJul 04, 2026\nJuly 3, 2026\nopen.substack.com/pub/heatherc...
## 3 www.facebook.com/share/v/1CmC...\nHeather Cox Richardson
## 4 7/03/26\n"I just duped you into something that you thought it was real but it really wasn't..."🤢\nyoutube.com/shorts/ruwmZ...
## 5 @Wajahat\n I agree with you, but have a listen, at least to the first 20 min of today's conversation between Heather Cox Richardson and Sarah Longwell, for a little bit of hope, which we desperately need\nyoutu.be/xBE68EKHq2c?...
## 6 July 3, 2026\nHEATHER COX RICHARDSON\nJUL 4\n\nopen.substack.com/pub/heatherc...
## likes reposts replies
## 1 2 2 1
## 2 1 0 0
## 3 0 0 0
## 4 1 3 0
## 5 1 0 0
## 6 0 0 0
Save Raw Data
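One way to save the collected posts is as a CSV file. The sketch below uses base R's `write.csv()`. The stand-in data frame `example_posts`, the file name, and the use of a temporary directory are all assumptions for illustration; in the project itself the data frame would be `bluesky_posts`.

```r
# Stand-in for the collected data (hypothetical row, same kind of columns)
example_posts <- data.frame(
  author = "example.bsky.social",
  text = "Example post text",
  likes = 0,
  stringsAsFactors = FALSE
)

# Hypothetical output location; a project folder could be used instead
raw_path <- file.path(tempdir(), "bluesky_posts_raw.csv")

# row.names = FALSE avoids writing an extra index column
write.csv(example_posts, raw_path, row.names = FALSE)
```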
Clean the Text Data
This section cleans the Bluesky post text by removing URLs, mentions, punctuation, numbers, extra spaces, and common stopwords.
I also removed the words “Heather,” “Cox,” and “Richardson” because those words are part of the search term and would likely dominate the results.
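The text-normalization part of this step can be sketched in base R as below. This is an illustration under stated assumptions, not the exact cleaning code used here: the function name `clean_post_text` is hypothetical, and the second URL pattern is an added guess at how to catch links that Bluesky displays without `http://` or `www.` (for example `open.substack.com/pub/...`). Stopwords and the search terms are easier to drop after the text has been split into words.

```r
# Hypothetical helper: lowercases text and strips links, mentions,
# punctuation, digits, and extra whitespace.
clean_post_text <- function(text) {
  text <- tolower(text)

  # Links that start with http://, https://, or www.
  text <- gsub("(https?://|www\\.)\\S+", " ", text, perl = TRUE)

  # Scheme-less links such as "open.substack.com/pub/..." (domain, then a slash)
  text <- gsub("\\S+\\.[a-z]{2,}/\\S*", " ", text, perl = TRUE)

  # Mentions, including full handles such as @name.bsky.social
  text <- gsub("@\\S+", " ", text, perl = TRUE)

  # Drop straight apostrophes so "today's" becomes "todays" rather than "today s"
  text <- gsub("'", "", text, fixed = TRUE)

  # Replace everything except ASCII letters and whitespace
  # (punctuation, digits, emoji; accented letters are also lost)
  text <- gsub("[^a-z\\s]", " ", text, perl = TRUE)

  # Collapse runs of whitespace, including newlines, and trim the ends
  text <- gsub("\\s+", " ", text, perl = TRUE)
  trimws(text)
}

cleaned_example <- clean_post_text(
  "Heather Cox Richardson\nJul 04, 2026\nopen.substack.com/pub/heatherc..."
)
```

Removing links before punctuation matters: if punctuation is stripped first, a link collapses into a single long token that no URL pattern will match afterwards.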
Tokenize the Text
Tokenizing means splitting the text into individual words.
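A minimal base R sketch of tokenizing, assuming the text has already been lowercased and stripped of punctuation. The function name `tokenize_posts` and the very short stopword list are hypothetical; a real analysis would use a full stopword list.

```r
# Hypothetical helper: split cleaned text on whitespace and drop unwanted words
tokenize_posts <- function(cleaned_text, drop_words = character()) {
  tokens <- unlist(strsplit(cleaned_text, "\\s+"))
  tokens <- tokens[nzchar(tokens)]   # remove empty strings left by leading spaces
  tokens[!tokens %in% drop_words]
}

# Tiny illustrative stopword list (an assumption, not the list used in this project)
small_stopwords <- c("the", "a", "an", "and", "of", "to", "in", "is", "by", "for")

# Search terms removed so they do not dominate the counts
search_terms <- c("heather", "cox", "richardson")

example_tokens <- tokenize_posts(
  c("the lincoln portrait narrated by heather cox richardson",
    "a little bit of hope"),
  drop_words = c(small_stopwords, search_terms)
)
```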
Word Frequency Analysis
This code counts how often each word appears.
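The counting step can be sketched in base R with `table()`. The function name `count_words` is hypothetical, and the column names `word` and `n` are chosen to match the output shown in this report. Ties are ordered alphabetically, which is consistent with the order of the tied words in the table.

```r
# Hypothetical helper: count tokens and sort by frequency, then alphabetically
count_words <- function(tokens) {
  counts <- table(tokens)
  freq <- data.frame(
    word = names(counts),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
  freq <- freq[order(-freq$n, freq$word), ]
  rownames(freq) <- NULL
  freq
}

example_freq <- count_words(
  c("america", "history", "america", "july", "history", "america")
)
```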
## word n
## 1 america 27
## 2 july 21
## 3 history 20
## 4 american 19
## 5 historian 17
## 6 independence 13
## 7 declaration 12
## 8 americas 11
## 9 opensubstackcompubheatherc 11
## 10 people 11
## 11 awful 10
## 12 tells 10
## 13 bskysocial 9
## 14 citizens 9
## 15 democracy 9
## 16 dream 9
## 17 modern 9
## 18 day 5
## 19 hope 5
## 20 nation 5
Save Word Frequency Table
Top 20 Most Common Words
## word n
## 1 america 27
## 2 july 21
## 3 history 20
## 4 american 19
## 5 historian 17
## 6 independence 13
## 7 declaration 12
## 8 americas 11
## 9 opensubstackcompubheatherc 11
## 10 people 11
## 11 awful 10
## 12 tells 10
## 13 bskysocial 9
## 14 citizens 9
## 15 democracy 9
## 16 dream 9
## 17 modern 9
## 18 day 5
## 19 hope 5
## 20 nation 5
## 21 rights 5
## 22 sarah 5
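The table above has 22 rows under a "Top 20" heading. One plausible explanation, offered here as an assumption rather than a confirmed detail of the code, is that the selection keeps every word tied with the 20th-ranked count (here, n = 5). The hypothetical helper below reproduces that behavior on a small example.

```r
# Hypothetical helper: keep the top n_top rows plus any rows tied at the cutoff
top_n_with_ties <- function(freq, n_top) {
  if (nrow(freq) <= n_top) {
    return(freq)
  }
  cutoff <- sort(freq$n, decreasing = TRUE)[n_top]
  freq[freq$n >= cutoff, , drop = FALSE]
}

example_freq <- data.frame(
  word = c("a", "b", "c", "d", "e", "f"),
  n = c(5, 4, 3, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Asking for the top 3 returns 5 rows because three words share the cutoff count
top_3 <- top_n_with_ties(example_freq, 3)
```

If exactly 20 rows are wanted, `head(word_freq, 20)` on the sorted table cuts the tie arbitrarily instead.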
Bar Chart of Most Common Words
ggplot(top_20_words, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "red") +
  coord_flip() +
  labs(
    title = "Top 20 Most Common Words in Bluesky Posts about Heather Cox Richardson",
    x = "Word",
    y = "Frequency"
  ) +
  theme_minimal()
Word Cloud
set.seed(123)
wordcloud(
  words = word_freq$word,
  freq = word_freq$n,
  max.words = 100,
  random.order = FALSE,
  colors = brewer.pal(8, "Dark2")
)
Findings
After collecting and cleaning the Bluesky posts, I performed a word frequency analysis to identify the most common terms related to Heather Cox Richardson.
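The most frequent words are “america” (27), “july” (21), “history” (20), “american” (19), and “historian” (17), followed by “independence” (13) and “declaration” (12). This suggests that Bluesky users discussing Heather Cox Richardson in this sample most often connected her name with American history and the Fourth of July, which fits the early-July 2026 timestamps of the posts shown above. Words such as “people,” “citizens,” and “democracy” point to civic and political themes as well. Two entries, “opensubstackcompubheatherc” and “bskysocial,” are remnants of links and handles rather than real words, which indicates that the cleaning step did not catch every URL and mention. The bar chart shows the most frequent words, while the word cloud visually highlights the most repeated terms.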
Limitations
This analysis has several limitations. First, the dataset only includes posts available through Bluesky search at the time the data was collected. Second, the results depend on the search phrase “Heather Cox Richardson,” so different search terms such as “HCR” might produce different results. Third, word frequency analysis only counts how often words appear. It does not fully explain context, sarcasm, tone, or whether the posts are supportive or critical.
Conclusion
This project used the Bluesky API to collect posts related to Heather Cox Richardson. After cleaning the text, I used word frequency analysis to identify the most common terms in the dataset. The results provide a general overview of the themes and language commonly associated with discussions of Heather Cox Richardson on Bluesky.