This is a simple text analysis using data scraped from Reddit.com. Specifically, we want to see what users are talking about on r/digitaldetox.
We first load our libraries. You may need to install some of these packages if you do not already have them.
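If any are missing, a one-time setup along these lines installs them (a sketch; trim the vector to the packages you actually lack — dplyr, ggplot2, and lubridate ship with tidyverse):
# Install any packages from the list that are not yet present
pkgs <- c("magrittr", "tidyverse", "htmltools", "wordcloud", "wordcloud2",
          "tm", "sentimentr", "tidytext", "textdata", "RColorBrewer",
          "atrrr", "rsconnect")
install.packages(setdiff(pkgs, rownames(installed.packages())))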
library(magrittr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract() masks magrittr::extract()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ purrr::set_names() masks magrittr::set_names()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(htmltools)
library(lubridate)
library(wordcloud)
## Loading required package: RColorBrewer
library(wordcloud2)
library(tm)
## Loading required package: NLP
##
## Attaching package: 'NLP'
##
## The following object is masked from 'package:ggplot2':
##
## annotate
library(sentimentr)
library(ggplot2)
library(tidytext)
library(textdata)
library(RColorBrewer) # provides brewer.pal()
library(atrrr)
library(rsconnect)
Subreddit description: “A community dedicated to embracing periods of disconnection from electronic devices like smartphones and computers, to alleviate stress and enhance real-world social interactions.”
# Read the data
data <- read.csv("DigitalDetoxReddit.csv", stringsAsFactors = FALSE, quote = "")
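Before building a corpus, it is worth a quick look at what came in; the next step assumes the file has a text column holding the post bodies:
# Inspect the columns and a few example values
glimpse(data)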
# Build a corpus from the post text, then clean it: drop punctuation,
# numbers, case, English stopwords, and extra whitespace
corpus_text <- Corpus(VectorSource(data$text))
corpus_text <- corpus_text %>%
  tm_map(removePunctuation) %>%
  tm_map(removeNumbers) %>%
  tm_map(tolower) %>%
  tm_map(removeWords, stopwords("english")) %>%
  tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., tolower): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removeWords, stopwords("english")):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
The “transformation drops documents” warnings are a known quirk of tm_map() on a SimpleCorpus and can safely be ignored here.
# Create a document-term matrix for the post text
dtm_text <- DocumentTermMatrix(corpus_text)
# Total frequency of each word across all posts
freq_text <- colSums(as.matrix(dtm_text))
word_freq_text <- data.frame(word = names(freq_text), freq = freq_text)
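As a quick sanity check before plotting, tm’s findFreqTerms() lists every term at or above a frequency threshold (25 here is an arbitrary cutoff):
# Terms appearing at least 25 times across all posts
findFreqTerms(dtm_text, lowfreq = 25)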
This plot shows the frequency of the most common words in the subreddit’s most recent posts.
# Plot the 30 most common words in post content
word_freq_text %>%
  arrange(desc(freq)) %>%
  head(30) %>%
  ggplot(aes(x = reorder(word, freq), y = freq)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(title = "Most Common Words in Reddit Post Content",
       x = "Word",
       y = "Frequency") +
  theme_minimal()
We can then visualize these word frequencies as a word cloud.
# Plot an interactive word cloud
wordcloud2(word_freq_text, size = 1)
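wordcloud2() renders an interactive HTML widget, which static outputs such as PDF cannot display. Since the wordcloud and RColorBrewer packages are already loaded, a static alternative looks roughly like this (min.freq and max.words are arbitrary choices):
# Static word cloud as a fallback for non-HTML output
wordcloud(words = word_freq_text$word, freq = word_freq_text$freq,
          min.freq = 5, max.words = 100, random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))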
A few words stand out in the cloud, and they suggest why people might choose to detox from their smartphones and other digital devices: these users likely see that everyone, themselves included, is constantly on social media apps, and that those apps eat up a great deal of their time.
For further analysis, we would like to remove some clutter words. Words like “just” or “one” carry little meaning on their own; excluding that noise lets us examine more accurately why people choose to detox from their digital devices.
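A minimal sketch of that step, extending the existing corpus with a second removeWords pass and rebuilding the matrix (every entry in the list beyond “just” and “one” is illustrative):
# Remove custom clutter words and rebuild the document-term matrix
clutter_words <- c("just", "one", "also", "can", "get", "will")  # adjust to your data
corpus_text <- tm_map(corpus_text, removeWords, clutter_words)
dtm_text <- DocumentTermMatrix(corpus_text)
freq_text <- colSums(as.matrix(dtm_text))
word_freq_text <- data.frame(word = names(freq_text), freq = freq_text)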
Xu, Zhenning. (2025, June 29). Analyzing stock-related posts scraped from Reddit.com. RPubs. Retrieved from https://rpubs.com/utjimmyx/reddit_stocks