All posts collected for this analysis are sourced from the subreddit r/StarWars. This analysis is conducted solely for educational purposes. No content is used for commercial gain, and all rights to original posts remain with their respective authors.
You might have already explored analyzing social media data using professional tools like SproutSocial, Brand24, or Numerous.AI (see the reference section of my tutorial). These platforms offer powerful features for monitoring Reddit sentiment, identifying trending topics, and tracking audience engagement.
For example, SproutSocial provides built-in Reddit sentiment analysis capabilities that help brands quickly assess how users feel about a topic or product (SproutSocial - Reddit Sentiment Analysis). Similarly, Brand24 offers comprehensive guides and tools for conducting Reddit sentiment analysis, making it easier to understand online conversations (How to Do Reddit Sentiment Analysis? Example & Guide). Numerous.AI also shares a simple, step-by-step guide for performing Reddit sentiment analysis using their subscription tool, even for beginners (Simple Step-by-Step Guide on Reddit Sentiment Analysis).
However, what if I told you that you can achieve similar results on your own, for free? With a bit of coding in R and RStudio, you can collect Reddit posts, clean the data, and perform your own sentiment analysis without relying on expensive third-party services.
In this tutorial, I’ll show you how to DIY your own Reddit analysis workflow, helping you build valuable skills while saving money.
Have you watched the new Star Wars series Andor yet? I am planning to watch it in May 2025, and before I do, I wanted to see what people are saying about it. In this tutorial, I’ll show you how Reddit discussions can be used to gauge opinions about Star Wars, with a focus on the new season of Andor, based on recent conversations and fan reactions. By cleaning and analyzing this data, we can uncover key themes, sentiment trends, and recurring topics across the Star Wars community.
If you have never used R or RStudio before, you may want to start with R for Data Science (https://r4ds.had.co.nz/) for a basic introduction. Do not worry: R is easy to learn, and it is a useful skill because many companies, large and small, including Morgan Stanley, Google, and Twitter, use R (naukri 2024).
First, let’s load the data, which was scraped from the subreddit r/StarWars (https://www.reddit.com/r/StarWars/) in April 2025.
# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(wordcloud)
## Loading required package: RColorBrewer
library(wordcloud2)
library(tm)
## Loading required package: NLP
##
## Attaching package: 'NLP'
##
## The following object is masked from 'package:ggplot2':
##
## annotate
library(sentimentr)
library(ggplot2)
library(tidytext)
library(tidyr)
library(DT)
# Read the data
reddit_data <- read.csv("starwars.csv", stringsAsFactors = FALSE)
head(reddit_data)
## X
## 1 S.H.Figuarts Obi Wan and Darth Vader Collectible Figures!
## 2 Kathleen Kennedy, Diego Luna, and Tony Gilroy Talk At Andor Season 2 Premiere Q&A
## 3 Did anyone else get this feeling about Mon Mothma and Luthen Rael?
## 4 Finally acquired my favorite hot toys figure!
## 5 Vehicles for my New Republic Army
## 6 Work in Progress
## date_utc timestamp
## 1 2025-04-17 1744855353
## 2 2025-04-11 1744386492
## 3 2025-04-10 1744272596
## 4 2025-04-07 1744042175
## 5 2025-04-04 1743802107
## 6 2025-03-31 1743442252
## title
## 1 S.H.Figuarts Obi Wan and Darth Vader Collectible Figures!
## 2 Kathleen Kennedy, Diego Luna, and Tony Gilroy Talk At Andor Season 2 Premiere Q&A
## 3 Did anyone else get this feeling about Mon Mothma and Luthen Rael?
## 4 Finally acquired my favorite hot toys figure!
## 5 Vehicles for my New Republic Army
## 6 Work in Progress
## text
## 1
## 2
## 3 **SPOILERS for Andor Season 1**\n\nI just finished binge-watching Andor Season 1 (finally!) in anticipation of Season 2 releasing in a couple of weeks. Brilliant show that I can't rave enough about!\n\nSomething occurred to me though around the episode 10-11 mark which sure puts a darker complexion on Mon Mothma's situation.\n\nLuthen Rael is shown to be a ruthless, amoral figure prepared to do *anything* for the Rebellion - including coercing his agent to remain with the ISB by subtly threatening hs family, letting Anto Kreegyr and his men die to protect said ISB mole, and sending Vel and Cinta to kill Andor simply because Andor had seen his face!\n\nKeeping this in mind, one wonders what he would be prepared to do to Mon Mothma if *she* became a liability to him. Which she would become if her illicit transactions to finance the Rebellion came to light and she was arrested and interrogated. Unlike Andor, she knows Luthien's name and a great deal about him and his network.\n\nWas *this* why Mon Mothma was desperate enough to let her daughter potentially be betrothed to a gangster's son?\n\nAnother further chilling element to this - Vel's stern insistence that Mothma's financial dealing *cannot* be exposed. Which leads you to realize that if Luthen *did* want Mon Mothma dead before she could expose him, he'd probably send Vel to kill her. So Vel was not only trying to save her cousin's life with her warning, she was also hoping to prevent a situation wherein *she* might be compelled to kill her cousin :O\n\nIn the long run, I'm curious to see how the Mon Mothma-Luthen Rael situation pans out, though we obviously know that Mon ends up becoming the leader of the Rebellion while Rael is seemingly forgotten (and dead?)
## 4
## 5 1. The Gargantuan\n\n2. T4-B heavy Tank\n\n3. HTT-26 heavy troop transport\n\n4. Arrow-23 landspeeder\n\n5. Heavy Tracker\n\n6. MCL-3 Light Tank\n\n7. V-Wing Airspeeder\n\n8. AT-ST\n\n9. A-A5 Speeder Truck\n\n10. AT-AT\n\n11. T2-B Repulsor Tank\n\n12. T3-B Heavy Tank\n\n13. Juggernaut \n\n14. T-47 Airspeeder with variants\n\n15. Overracer Speeder Bike\n\n16. Mobile Proton Torpedo Launcher-2a with Spotter Droid\n\n17. Manka-Class Walker\n\n18. UT-AT\n\n19. AT-TE\n\n20. Rebel Combat Speeder \n
## 6 I've had a great time building this model.
## subreddit comments
## 1 StarWars 0
## 2 StarWars 1
## 3 StarWars 9
## 4 StarWars 0
## 5 StarWars 0
## 6 StarWars 0
## url
## 1 https://www.reddit.com/r/StarWars/comments/1k11pxz/shfiguarts_obi_wan_and_darth_vader_collectible/
## 2 https://www.reddit.com/r/StarWars/comments/1jwt3de/kathleen_kennedy_diego_luna_and_tony_gilroy_talk/
## 3 https://www.reddit.com/r/StarWars/comments/1jvspo5/did_anyone_else_get_this_feeling_about_mon_mothma/
## 4 https://www.reddit.com/r/StarWars/comments/1jtp0st/finally_acquired_my_favorite_hot_toys_figure/
## 5 https://www.reddit.com/r/StarWars/comments/1jrn7tp/vehicles_for_my_new_republic_army/
## 6 https://www.reddit.com/r/StarWars/comments/1jo8ve3/work_in_progress/
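The tutorial does not show how starwars.csv was produced. The columns in the preview (date_utc, timestamp, title, text, subreddit, comments, url) match what the RedditExtractoR package returns from find_thread_urls(), so one plausible way to build a similar file is sketched below. Treat this as an assumption, not the exact method used here. The helper name collect_threads and its injectable fetch argument are hypothetical; the latter is only there so the function can be tried without network access.

```r
# Hypothetical helper: fetch recent threads from a subreddit and save them as CSV.
# Assumes the RedditExtractoR package is installed when the default `fetch` is used.
collect_threads <- function(subreddit, path,
                            fetch = RedditExtractoR::find_thread_urls) {
  threads <- fetch(subreddit = subreddit, sort_by = "new", period = "month")
  write.csv(threads, path, row.names = FALSE)
  invisible(threads)
}

# Real call (needs internet access and is subject to Reddit's rate limits):
# collect_threads("StarWars", "starwars.csv")

# Offline dry run with a stand-in fetcher that returns one made-up row
fake_fetch <- function(subreddit, sort_by, period) {
  data.frame(date_utc = "2025-04-17", title = "Example post",
             subreddit = subreddit, comments = 0L,
             stringsAsFactors = FALSE)
}
tmp <- tempfile(fileext = ".csv")
collect_threads("StarWars", tmp, fetch = fake_fetch)
demo <- read.csv(tmp, stringsAsFactors = FALSE)
print(demo)
```

Scraping a fresh copy will not reproduce the numbers in this tutorial exactly, since the subreddit changes every day.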
# Convert date_utc to proper date format
reddit_data$date <- as_datetime(reddit_data$date_utc)
# Basic summary statistics
summary(reddit_data$comments)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 5.00 21.00 97.23 83.00 6420.00
# Time series analysis - posts per day
daily_posts <- reddit_data %>%
  mutate(date = as.Date(date)) %>%
  count(date)
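The daily_posts table is computed above but never plotted. A minimal sketch of a posts-per-day chart follows. It uses a small made-up data frame (toy_posts) so that it runs on its own; with the real data you would pass daily_posts to ggplot() instead.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for reddit_data with only the date column
toy_posts <- data.frame(
  date = as.Date(c("2025-04-01", "2025-04-01", "2025-04-02", "2025-04-04"))
)

# Same counting step as above: one row per day with the number of posts
toy_daily <- toy_posts %>% count(date)

# Line chart of daily posting volume
p_daily <- ggplot(toy_daily, aes(x = date, y = n)) +
  geom_line(color = "steelblue") +
  geom_point(color = "steelblue") +
  labs(title = "Posts per Day", x = "Date", y = "Number of Posts") +
  theme_minimal()
print(p_daily)
```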
# Analyze comment engagement
ggplot(reddit_data, aes(x = comments)) +
  geom_histogram(bins = 30, fill = "steelblue") +
  labs(title = "Distribution of Comments on the Star Wars Series: Andor",
       x = "Number of Comments",
       y = "Count") +
  theme_minimal()
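The summary statistics above show a strongly right-skewed distribution (median 21, mean 97.23, maximum 6420), so on a linear axis most posts fall into the first bin. One option, sketched here on made-up values that echo those summary figures, is to plot log10(comments + 1) instead. The + 1 is my own choice to keep posts with zero comments on the axis.

```r
library(ggplot2)

# Made-up comment counts spanning the same range as the summary above
toy <- data.frame(comments = c(0, 0, 5, 5, 21, 21, 83, 250, 1100, 6420))

# log10(x + 1) keeps zero-comment posts (log10(1) = 0) and compresses the long tail
toy$log_comments <- log10(toy$comments + 1)

p_log <- ggplot(toy, aes(x = log_comments)) +
  geom_histogram(bins = 10, fill = "steelblue") +
  labs(title = "Distribution of Comments (log scale)",
       x = "log10(Number of Comments + 1)",
       y = "Count") +
  theme_minimal()
print(p_log)
```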
# Find most engaging posts (most comments)
top_posts <- reddit_data %>%
  arrange(desc(comments)) %>%
  select(date, title, comments) %>%
  head(10)
print(top_posts)
## date
## 1 2025-04-01
## 2 2025-04-23
## 3 2025-04-09
## 4 2025-04-23
## 5 2025-04-25
## 6 2025-04-18
## 7 2025-04-10
## 8 2025-04-23
## 9 2025-04-12
## 10 2025-04-15
## title
## 1 John Boyega Says \030Star Wars\031 Is \030So White That a Black Person Existing in It\031 Is a Big Deal: Toxic Fans Are \030Okay With Us Playing the Friend\031 but We \030Cant Touch Their Heroes\030
## 2 Andor (Season 2) - Episodes 1, 2 & 3 - Discussion Thread!
## 3 Why was Solo disliked?
## 4 Supposedly every confirmed Star Wars Project
## 5 How did it feel seeing this part for the first time where it showed that Yoda wasn\031t some frail old man?
## 6 Star Wars: Starfighter starring Ryan Gosling and directed by Shawn Levy will release on May 28, 2027.
## 7 Put aside your feelings about The Acolyte show, what did you think about Qimir?
## 8 Is there a reason Qui-Gon didn\031t let these EIGHTEEN (at least) other people with blasters help him fight Maul?
## 9 How do force users consistently forget that they have the Force?
## 10 What are some changes Disney made to the canon that are actually better?
## comments
## 1 6420
## 2 5737
## 3 4421
## 4 1436
## 5 1284
## 6 1126
## 7 1101
## 8 1086
## 9 1039
## 10 995
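Several titles in the output above contain the escapes \030 and \031 where quotation marks or apostrophes should be. A likely cause (an assumption on my part) is that curly quotes lost part of their encoding when the data was exported, leaving only control characters behind. A minimal cleanup sketch, shown on made-up strings, replaces both characters with a plain apostrophe:

```r
# Made-up titles containing the same stray control characters (\030 and \031)
titles <- c("Yoda wasn\031t some frail old man", "\030Star Wars\031 news")

# Replace both control characters (hex 18 and 19) with a plain apostrophe
clean_titles <- gsub("[\\x18\\x19]", "'", titles, perl = TRUE)
print(clean_titles)
```

With the real data, the same gsub() call could be applied to reddit_data$title and reddit_data$text before building the tables and corpora. It cannot restore characters that were dropped entirely, such as the apostrophe missing from "Cant" in the first title.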
# Display as an interactive table with formatting options using the DT package
datatable(top_posts,
          extensions = 'Buttons',  # required for the copy/csv/excel buttons to appear
          options = list(
            pageLength = 10,
            autoWidth = TRUE,
            columnDefs = list(list(
              targets = 1,  # zero-based index: the title column
              width = '60%'
            )),
            dom = 'Bfrtip',
            buttons = c('copy', 'csv', 'excel')
          ),
          caption = htmltools::tags$caption(
            style = 'caption-side: top; text-align: center; font-size: 16px; font-weight: bold;',
            'Top 10 Most Engaging Reddit Posts'
          ),
          rownames = FALSE,
          filter = 'top',
          class = 'cell-border stripe hover'
) %>%
  formatDate('date', method = 'toLocaleDateString') %>%
  formatStyle('comments',
              background = styleColorBar(c(0, max(top_posts$comments)), 'lightblue'),
              backgroundSize = '100% 90%',
              backgroundRepeat = 'no-repeat',
              backgroundPosition = 'center')
# Text analysis - create a corpus from post titles
corpus <- Corpus(VectorSource(reddit_data$title))
corpus <- corpus %>%
  tm_map(removePunctuation) %>%
  tm_map(removeNumbers) %>%
  tm_map(tolower) %>%
  tm_map(removeWords, stopwords("english")) %>%
  tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., tolower): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removeWords, stopwords("english")):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
# Create document-term matrix
dtm <- DocumentTermMatrix(corpus)
freq <- colSums(as.matrix(dtm))
word_freq <- data.frame(word = names(freq), freq = freq)
# Plot most common words
word_freq %>%
  arrange(desc(freq)) %>%
  head(20) %>%
  ggplot(aes(x = reorder(word, freq), y = freq)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(title = "Most Common Words in Reddit Post Titles for the Star Wars Series: Andor",
       x = "Word",
       y = "Frequency") +
  theme_minimal()
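The tidytext package is loaded at the top of this tutorial, and it offers an alternative route to the same word counts without building a tm corpus. The sketch below is not the method used above. It runs on three made-up titles, and its stop-word list (tidytext's stop_words) differs from tm's stopwords("english"), so the counts will not match exactly.

```r
library(dplyr)
library(tidytext)

# Made-up titles standing in for reddit_data$title
toy_titles <- data.frame(
  title = c("Andor is the best Star Wars show",
            "Why was Solo disliked?",
            "Andor Season 2 discussion thread"),
  stringsAsFactors = FALSE
)

title_words <- toy_titles %>%
  unnest_tokens(word, title) %>%          # one row per word, lowercased, punctuation stripped
  anti_join(stop_words, by = "word") %>%  # drop common English stop words
  count(word, sort = TRUE)                # most frequent words first
print(title_words)
```

The result is already a data frame with one row per word, so it can go straight into the ggplot() bar chart or the word cloud functions.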
# Text analysis - create a corpus from the post text (body), not the titles
corpus <- Corpus(VectorSource(reddit_data$text))
corpus <- corpus %>%
  tm_map(removePunctuation) %>%
  tm_map(removeNumbers) %>%
  tm_map(tolower) %>%
  tm_map(removeWords, stopwords("english")) %>%
  tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., tolower): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removeWords, stopwords("english")):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
# Create document-term matrix
dtm <- DocumentTermMatrix(corpus)
freq <- colSums(as.matrix(dtm))
word_freq <- data.frame(word = names(freq), freq = freq)
# Plot most common words
word_freq %>%
  arrange(desc(freq)) %>%
  head(30) %>%
  ggplot(aes(x = reorder(word, freq), y = freq)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(title = "Most Common Words in Reddit Post Text for the Star Wars Series: Andor",
       x = "Word",
       y = "Frequency") +
  theme_minimal()
# Generate word cloud
set.seed(123)
wordcloud(words = word_freq$word,
          freq = word_freq$freq,
          max.words = 100,
          colors = brewer.pal(8, "Dark2"))
# Create the interactive word cloud
wordcloud2(data = word_freq,
           size = 1,
           color = "random-dark",
           backgroundColor = "white",
           shape = "circle",
           rotateRatio = 0.3)
# Bing lexicon method on the cleaned data (labels each word as positive or negative)
sentiments <- word_freq %>%
  inner_join(get_sentiments("bing"), by = "word", relationship = "many-to-many") %>%
  # Weight each matched word by how often it occurs, not once per distinct word
  count(sentiment, wt = freq, sort = TRUE)
sentiments %>%
  ggplot(aes(x = sentiment, y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Sentiment Analysis of Reddit Post Text", x = "Sentiment", y = "Frequency")
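The lexicon join above scores individual words, so it cannot tell "good" from "not good". The sentimentr package, loaded at the top but not used so far, scores whole sentences and takes negators and intensifiers into account. A minimal sketch on two made-up posts follows. It is offered as a complementary view, not as the method behind the chart above.

```r
library(sentimentr)

# Made-up post texts standing in for reddit_data$text
toy_text <- c("I love this brilliant show.",
              "That was a terrible, boring episode.")

# Split each post into sentences, then average the sentence scores per post
post_sentiment <- sentiment_by(get_sentences(toy_text))

# ave_sentiment > 0 suggests a positive post, < 0 a negative one
print(as.data.frame(post_sentiment)[, c("element_id", "ave_sentiment")])
```

On the real data, the same two calls could be run on the non-empty entries of reddit_data$text to get one score per post.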
In this tutorial, I explored public perceptions of the current Star Wars series, Andor, using text data scraped from Reddit. While this approach cannot replace traditional data collection methods such as surveys, it offers valuable complementary insights that reflect the lived experiences and opinions of individuals.
Which analysis/graph did you like the best? Why? Feel free to follow me on Twitter at https://twitter.com/MKTJimmyxu, on YouTube at https://www.youtube.com/@webdatax, or visit my website at https://github.com/utjimmyx.
naukri 2024. Companies Using R Programming Language. https://www.naukri.com/code360/library/companies-using-r-programming-language
How to Do Reddit Sentiment Analysis? Example & Guide: https://brand24.com/blog/reddit-sentiment-analysis/
SproutSocial - Reddit Sentiment Analysis: https://sproutsocial.com/glossary/reddit-sentiment-analysis/
Simple Step-by-Step Guide On Reddit Sentiment Analysis: https://numerous.ai/blog/reddit-sentiment-analysis
R for Data Science: https://r4ds.had.co.nz/
Text Mining with R: https://www.tidytextmining.com/