All posts collected for this analysis are sourced from the subreddit r/StarWars. This analysis is conducted solely for educational purposes. No content is used for commercial gain, and all rights to original posts remain with their respective authors.
If you have never used R or RStudio before, you might want to visit R for Data Science (https://r4ds.had.co.nz/) for a basic refresher.
First, let’s load the data, which was scraped from the subreddit r/StarWars (https://www.reddit.com/r/StarWars/) in April 2025.
# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(wordcloud)
## Loading required package: RColorBrewer
library(wordcloud2)
library(tm)
## Loading required package: NLP
##
## Attaching package: 'NLP'
##
## The following object is masked from 'package:ggplot2':
##
## annotate
library(sentimentr)
library(ggplot2)
library(tidytext)
library(tidyr)
library(DT)
# Read the data
reddit_data <- read.csv("StarWars_Andor.csv", stringsAsFactors = FALSE)
head(reddit_data)
## X
## 1 S.H.Figuarts Obi Wan and Darth Vader Collectible Figures!
## 2 Kathleen Kennedy, Diego Luna, and Tony Gilroy Talk At Andor Season 2 Premiere Q&A
## 3 Did anyone else get this feeling about Mon Mothma and Luthen Rael?
## 4 Finally acquired my favorite hot toys figure!
## 5 Vehicles for my New Republic Army
## 6 Work in Progress
## date_utc timestamp
## 1 4/17/2025 1744855353
## 2 4/11/2025 1744386492
## 3 4/10/2025 1744272596
## 4 4/7/2025 1744042175
## 5 4/4/2025 1743802107
## 6 3/31/2025 1743442252
## title
## 1 S.H.Figuarts Obi Wan and Darth Vader Collectible Figures!
## 2 Kathleen Kennedy, Diego Luna, and Tony Gilroy Talk At Andor Season 2 Premiere Q&A
## 3 Did anyone else get this feeling about Mon Mothma and Luthen Rael?
## 4 Finally acquired my favorite hot toys figure!
## 5 Vehicles for my New Republic Army
## 6 Work in Progress
## text
## 1
## 2
## 3 **SPOILERS for Andor Season 1**\n\nI just finished binge-watching Andor Season 1 (finally!) in anticipation of Season 2 releasing in a couple of weeks. Brilliant show that I can't rave enough about!\n\nSomething occurred to me though around the episode 10-11 mark which sure puts a darker complexion on Mon Mothma's situation.\n\nLuthen Rael is shown to be a ruthless, amoral figure prepared to do *anything* for the Rebellion - including coercing his agent to remain with the ISB by subtly threatening hs family, letting Anto Kreegyr and his men die to protect said ISB mole, and sending Vel and Cinta to kill Andor simply because Andor had seen his face!\n\nKeeping this in mind, one wonders what he would be prepared to do to Mon Mothma if *she* became a liability to him. Which she would become if her illicit transactions to finance the Rebellion came to light and she was arrested and interrogated. Unlike Andor, she knows Luthien's name and a great deal about him and his network.\n\nWas *this* why Mon Mothma was desperate enough to let her daughter potentially be betrothed to a gangster's son?\n\nAnother further chilling element to this - Vel's stern insistence that Mothma's financial dealing *cannot* be exposed. Which leads you to realize that if Luthen *did* want Mon Mothma dead before she could expose him, he'd probably send Vel to kill her. So Vel was not only trying to save her cousin's life with her warning, she was also hoping to prevent a situation wherein *she* might be compelled to kill her cousin :O\n\nIn the long run, I'm curious to see how the Mon Mothma-Luthen Rael situation pans out, though we obviously know that Mon ends up becoming the leader of the Rebellion while Rael is seemingly forgotten (and dead?)
## 4
## 5 1. The Gargantuan\n\n2. T4-B heavy Tank\n\n3. HTT-26 heavy troop transport\n\n4. Arrow-23 landspeeder\n\n5. Heavy Tracker\n\n6. MCL-3 Light Tank\n\n7. V-Wing Airspeeder\n\n8. AT-ST\n\n9. A-A5 Speeder Truck\n\n10. AT-AT\n\n11. T2-B Repulsor Tank\n\n12. T3-B Heavy Tank\n\n13. Juggernaut \n\n14. T-47 Airspeeder with variants\n\n15. Overracer Speeder Bike\n\n16. Mobile Proton Torpedo Launcher-2a with Spotter Droid\n\n17. Manka-Class Walker\n\n18. UT-AT\n\n19. AT-TE\n\n20. Rebel Combat Speeder \n
## 6 I've had a great time building this model.
## subreddit comments
## 1 StarWars 0
## 2 StarWars 1
## 3 StarWars 9
## 4 StarWars 0
## 5 StarWars 0
## 6 StarWars 0
# Convert date_utc (month/day/year strings such as "4/17/2025") to Date with mdy()
reddit_data$date <- mdy(reddit_data$date_utc)
# Basic summary statistics
summary(reddit_data$comments)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 5.00 21.00 97.23 83.00 6420.00
# Time series analysis - posts per day
daily_posts <- reddit_data %>%
mutate(date = as.Date(date)) %>%
count(date)
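The `daily_posts` table is computed but never plotted. Here is a minimal sketch of that missing step, assuming `date_utc` holds month/day/year strings like those shown in the `head()` output. The `toy_posts` data frame and its values are made up for illustration and stand in for `reddit_data`.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for reddit_data; only the date_utc column matters here
toy_posts <- data.frame(
  date_utc = c("4/17/2025", "4/17/2025", "4/11/2025",
               "4/10/2025", "4/10/2025", "4/10/2025"),
  stringsAsFactors = FALSE
)

# mdy() parses month/day/year strings into Date objects;
# count() then gives one row per day with the number of posts
toy_daily <- toy_posts %>%
  mutate(date = mdy(date_utc)) %>%
  count(date, name = "posts")

# One bar per day
daily_plot <- ggplot(toy_daily, aes(x = date, y = posts)) +
  geom_col(fill = "steelblue") +
  labs(title = "Posts per Day (toy data)",
       x = "Date",
       y = "Number of Posts") +
  theme_minimal()

print(daily_plot)
```

With the real data, the same `ggplot()` call could be pointed at `daily_posts`, provided the `date` column was parsed successfully.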
# Analyze comment engagement
ggplot(reddit_data, aes(x = comments)) +
geom_histogram(bins = 30, fill = "steelblue") +
labs(title = "Distribution of Comments on the Star Wars Series: Andor",
x = "Number of Comments",
y = "Count") +
theme_minimal()
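The summary above reports a mean (97.23) well above the median (21), so a few very large threads dominate the raw histogram. One option, sketched here with made-up comment counts, is to plot `log1p(comments)`, which keeps posts with zero comments on the axis. The `toy_comments` data frame is hypothetical.

```r
library(ggplot2)

# Hypothetical comment counts with a long right tail
toy_comments <- data.frame(
  comments = c(0, 0, 1, 5, 9, 21, 40, 83, 995, 6420)
)

# log1p(x) = log(1 + x), so a post with 0 comments maps to 0 instead of -Inf
log_hist <- ggplot(toy_comments, aes(x = log1p(comments))) +
  geom_histogram(bins = 10, fill = "steelblue") +
  labs(title = "Distribution of Comments, Log Scale (toy data)",
       x = "log(1 + Number of Comments)",
       y = "Count") +
  theme_minimal()

print(log_hist)
```

The transformation changes only how the axis is read, not the data, so the original histogram remains useful for showing raw counts.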
# Find most engaging posts (most comments)
top_posts <- reddit_data %>%
arrange(desc(comments)) %>%
select(date, title, comments) %>%
head(10)
print(top_posts)
## date
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
## 7 <NA>
## 8 <NA>
## 9 <NA>
## 10 <NA>
## title
## 1 John Boyega Says ‘Star Wars’ Is ‘So White That a Black Person Existing in It’ Is a Big Deal: Toxic Fans Are ‘Okay With Us Playing the Friend’ but We ‘Cant Touch Their Heroes’
## 2 Andor (Season 2) - Episodes 1, 2 & 3 - Discussion Thread!
## 3 Why was Solo disliked?
## 4 Supposedly every confirmed Star Wars Project
## 5 How did it feel seeing this part for the first time where it showed that Yoda wasn’t some frail old man?
## 6 Star Wars: Starfighter starring Ryan Gosling and directed by Shawn Levy will release on May 28, 2027.
## 7 Put aside your feelings about The Acolyte show, what did you think about Qimir?
## 8 Is there a reason Qui-Gon didn’t let these EIGHTEEN (at least) other people with blasters help him fight Maul?
## 9 How do force users consistently forget that they have the Force?
## 10 What are some changes Disney made to the canon that are actually better?
## comments
## 1 6420
## 2 5737
## 3 4421
## 4 1436
## 5 1284
## 6 1126
## 7 1101
## 8 1086
## 9 1039
## 10 995
# Display as an interactive table with formatting options using the DT package
datatable(top_posts,
extensions = 'Buttons', # needed for the copy/csv/excel buttons requested below
options = list(
pageLength = 10,
autoWidth = TRUE,
columnDefs = list(list(
targets = 1,
width = '60%'
)),
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel')
),
caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-size: 16px; font-weight: bold;',
'Top 10 Most Engaging Reddit Posts'
),
rownames = FALSE,
filter = 'top',
class = 'cell-border stripe hover'
) %>%
formatDate('date', method = 'toLocaleDateString') %>%
formatStyle('comments',
background = styleColorBar(c(0, max(top_posts$comments)), 'lightblue'),
backgroundSize = '100% 90%',
backgroundRepeat = 'no-repeat',
backgroundPosition = 'center')
# Text analysis - create a corpus from post titles
corpus <- Corpus(VectorSource(reddit_data$title))
corpus <- corpus %>%
tm_map(removePunctuation) %>%
tm_map(removeNumbers) %>%
tm_map(tolower) %>%
tm_map(removeWords, stopwords("english")) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., tolower): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removeWords, stopwords("english")):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
# Create document-term matrix
dtm <- DocumentTermMatrix(corpus)
freq <- colSums(as.matrix(dtm))
word_freq <- data.frame(word = names(freq), freq = freq)
# Plot most common words
word_freq %>%
arrange(desc(freq)) %>%
head(20) %>%
ggplot(aes(x = reorder(word, freq), y = freq)) +
geom_col(fill = "darkgreen") +
coord_flip() +
labs(title = "Most Common Words in Reddit Post Titles for the Star Wars Series: Andor",
x = "Word",
y = "Frequency") +
theme_minimal()
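The same kind of word count can also be produced with `tidytext`, which is already loaded above. This is a sketch, assuming a tidy one-word-per-row table is acceptable. `toy_titles` reuses three titles from the `head()` output, and `tidy_freq` is a name introduced here. The counts may differ from the `tm` pipeline because the two packages tokenize differently and use different stop word lists.

```r
library(dplyr)
library(tidytext)

# Three titles taken from the head() output, used as a small stand-in
toy_titles <- data.frame(
  title = c("Finally acquired my favorite hot toys figure!",
            "Vehicles for my New Republic Army",
            "Work in Progress"),
  stringsAsFactors = FALSE
)

tidy_freq <- toy_titles %>%
  unnest_tokens(word, title) %>%           # one lowercase word per row, punctuation stripped
  anti_join(stop_words, by = "word") %>%   # drop words found in tidytext's stop_words table
  count(word, sort = TRUE, name = "freq")  # word frequencies, most common first

print(tidy_freq)
```

The resulting table has the same `word` and `freq` columns as `word_freq`, so it could feed the same bar chart.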
# Text analysis - create a corpus from post text (the body of each post)
corpus <- Corpus(VectorSource(reddit_data$text))
corpus <- corpus %>%
tm_map(removePunctuation) %>%
tm_map(removeNumbers) %>%
tm_map(tolower) %>%
tm_map(removeWords, stopwords("english")) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., tolower): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removeWords, stopwords("english")):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
# Create document-term matrix
dtm <- DocumentTermMatrix(corpus)
freq <- colSums(as.matrix(dtm))
word_freq <- data.frame(word = names(freq), freq = freq)
# Plot most common words
word_freq %>%
arrange(desc(freq)) %>%
head(30) %>%
ggplot(aes(x = reorder(word, freq), y = freq)) +
geom_col(fill = "darkgreen") +
coord_flip() +
labs(title = "Most Common Words in Reddit Post Text for the Star Wars Series: Andor",
x = "Word",
y = "Frequency") +
theme_minimal()
# Generate word cloud
set.seed(123)
wordcloud(words = word_freq$word,
freq = word_freq$freq,
max.words = 100,
colors = brewer.pal(8, "Dark2"))
# Create the interactive word cloud
wordcloud2(data = word_freq,
size = 1,
color = "random-dark",
backgroundColor = "white",
shape = "circle",
rotateRatio = 0.3)
# Bing lexicon method on the cleaned word frequencies
sentiments <- word_freq %>%
inner_join(get_sentiments("bing"), by = "word", relationship = "many-to-many") %>%
count(sentiment, sort = TRUE)
sentiments %>%
ggplot(aes(x = sentiment, y = n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
labs(title = "Sentiment Analysis of Reddit Post Text", x = "Sentiment", y = "Number of Distinct Words")
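Because `count(sentiment)` tallies rows, and `word_freq` has one row per word, the chart above counts distinct sentiment-bearing words rather than how often they occur. If totals weighted by occurrence are wanted, `count()` accepts a `wt` argument. This sketch uses a made-up `toy_freq` table standing in for `word_freq`. The words and frequencies are hypothetical.

```r
library(dplyr)
library(tidytext)

# Hypothetical word frequencies; "andor" is expected to have no lexicon match
toy_freq <- data.frame(
  word = c("great", "brilliant", "dead", "andor"),
  freq = c(5, 2, 3, 10),
  stringsAsFactors = FALSE
)

weighted_sentiments <- toy_freq %>%
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep only words in the Bing lexicon
  count(sentiment, wt = freq, sort = TRUE)             # sum freq per sentiment, not rows

print(weighted_sentiments)
```

Whether to weight by frequency depends on the question: distinct-word counts describe the vocabulary, while weighted counts describe the overall tone of the text.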
In this tutorial, we explored public perceptions of the current Star Wars series, Andor, using text data. Through data scraping, cleaning, and text analysis, we were able to capture recent conversations on r/StarWars and identify recurring themes, sentiments, and trends.
While this approach cannot replace traditional audience research, it offers complementary insights that reflect the experiences and opinions of individual fans.
Which analysis or graph did you like best, and why? Feel free to follow me on Twitter at https://twitter.com/MKTJimmyxu, on YouTube at https://www.youtube.com/@webdatax, or on GitHub at https://github.com/utjimmyx.
How to Do Reddit Sentiment Analysis? Example & Guide: https://brand24.com/blog/reddit-sentiment-analysis/
Reddit Sentiment Analysis (Sprout Social): https://sproutsocial.com/glossary/reddit-sentiment-analysis/
Simple Step-by-Step Guide On Reddit Sentiment Analysis: https://numerous.ai/blog/reddit-sentiment-analysis
R for Data Science: https://r4ds.had.co.nz/
Text Mining with R: https://www.tidytextmining.com/