Introduction

This tutorial was originally written by Dr. Zhenning Xu. You can take a look at the original tutorial here: https://rpubs.com/utjimmyx/reddit_stocks

Load Libraries

The first step is to load these libraries. Note: you may need to install these packages if you have not used them before.

library(magrittr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract()   masks magrittr::extract()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ purrr::set_names() masks magrittr::set_names()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(htmltools)
library(lubridate)
library(wordcloud)
## Loading required package: RColorBrewer
library(wordcloud2)
library(tm)
## Loading required package: NLP
## 
## Attaching package: 'NLP'
## 
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library(sentimentr)
library(ggplot2)
library(tidytext)
library(textdata)
library(RColorBrewer)  # Added for brewer.pal
library(atrrr)
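
If any of the library() calls above fails with a "there is no package called ..." error, the sketch below may help find out which packages are missing. It only checks and reports. The install line is left commented out and assumes every package is available from your configured CRAN mirror, which this tutorial does not confirm. The names pkgs, is_installed, and missing_pkgs are illustrative and not part of the original tutorial.

```r
# Packages loaded in this tutorial
pkgs <- c("magrittr", "tidyverse", "dplyr", "htmltools", "lubridate",
          "wordcloud", "wordcloud2", "tm", "sentimentr", "ggplot2",
          "tidytext", "textdata", "RColorBrewer", "atrrr")

# TRUE for each package that is already installed on this machine
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!is_installed]
print(missing_pkgs)

# Uncomment to install them (assumption: all are available from your CRAN mirror)
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

An empty result means everything is already installed and the library() calls should run as written.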

Bluesky Authentication

Replace my Bluesky tag (handle) with your own. Be sure to copy down the password Bluesky gives you when you run this line. You will use the same password each time you rerun this code.

auth("danielfox7.bsky.social") # Replace with your tag. Do not include the '@'

Check followers

This step retrieves the accounts that follow us and the accounts we follow, then picks out the followers we do not follow back.

myfollowers <- get_followers(actor = "danielfox7.bsky.social", limit = 4000)
## ℹ Parsing 2 results.✔ Got 2 results. All done!
myfollows <- get_follows(actor = "danielfox7.bsky.social", limit = 4000)
## ℹ Parsing 7 results.✔ Got 7 results. All done!
# Find the followers who are not followed by us
not.followed.by.me <- myfollowers %>%
  dplyr::anti_join(myfollows, by = "did")
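
To see what the anti_join() step does without calling Bluesky, here is a minimal sketch on made-up data. The DIDs, handles, and the toy_* object names are invented for illustration. Only the did column name is taken from the code above.

```r
library(dplyr)

# Made-up followers (accounts that follow us)
toy_followers <- tibble(
  did    = c("did:plc:aaa", "did:plc:bbb", "did:plc:ccc"),
  handle = c("alice.example", "bob.example", "carol.example")
)

# Made-up follows (accounts that we follow)
toy_follows <- tibble(
  did    = c("did:plc:bbb", "did:plc:ddd"),
  handle = c("bob.example", "dave.example")
)

# Keep the rows of toy_followers whose did has no match in toy_follows
toy_not_followed <- anti_join(toy_followers, toy_follows, by = "did")
print(toy_not_followed)
```

Only alice.example and carol.example remain, because bob.example appears in both tables. Matching on did rather than on the handle should be more robust if an account changes its handle.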

Word Frequency Plot

Now we will create a bar chart that shows the most frequent words in a CSV file called "crypto2.csv", which is stored in our working directory.

# change working directory to your file path
setwd("C:/Users/danie/OneDrive/Documents/R")

# Read the data
data <- read.csv("crypto2.csv", stringsAsFactors = FALSE)
corpus_text <- Corpus(VectorSource(data$text))
corpus_text <- corpus_text %>%
  tm_map(removePunctuation) %>%
  tm_map(removeNumbers) %>%
  tm_map(tolower) %>%
  tm_map(removeWords, stopwords("english")) %>%
  tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., tolower): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removeWords, stopwords("english")):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
# Create document-term matrix for text content
dtm_text <- DocumentTermMatrix(corpus_text)
freq_text <- colSums(as.matrix(dtm_text))
word_freq_text <- data.frame(word = names(freq_text), freq = freq_text)

# Plot most common words in post content
word_freq_text %>%
  arrange(desc(freq)) %>%
  head(30) %>%
  ggplot(aes(x = reorder(word, freq), y = freq)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(title = "Most Common Words in Post Content",
       x = "Word",
       y = "Frequency") +
  theme_minimal()
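
To make the cleaning and counting steps above more concrete, here is a rough base R sketch on two made-up posts. It only approximates the tm pipeline. The posts, the tiny stop word list, and the toy_* names are invented for illustration. The real code uses stopwords("english"), and DocumentTermMatrix() by default also drops words shorter than three characters, which this sketch does not do.

```r
# Two made-up posts standing in for data$text
toy_posts <- c("Bitcoin is up, bitcoin to the moon!",
               "Selling my bitcoin and buying coffee")

# Lowercase, then strip punctuation and digits
toy_clean <- tolower(toy_posts)
toy_clean <- gsub("[[:punct:]]", "", toy_clean)
toy_clean <- gsub("[[:digit:]]", "", toy_clean)

# Split on whitespace into individual words
toy_tokens <- unlist(strsplit(toy_clean, "\\s+"))

# Drop a tiny illustrative stop word list and any empty strings
toy_stop_words <- c("is", "to", "the", "my", "and")
toy_tokens <- toy_tokens[!(toy_tokens %in% toy_stop_words) & nzchar(toy_tokens)]

# Count each word and sort from most to least frequent
toy_counts <- sort(table(toy_tokens), decreasing = TRUE)
toy_word_freq <- data.frame(word = names(toy_counts),
                            freq = as.integer(toy_counts))
print(toy_word_freq)
```

The result has the same two columns, word and freq, as word_freq_text, with "bitcoin" counted three times and every other word once.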

Create Word Cloud

This final part is the really neat part. We can create a word cloud in which each word's size increases with how often the word appears in our dataset.

# Plot word cloud
wordcloud2(word_freq_text, size = 1, shape = 'star')
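
With a large vocabulary the cloud can get crowded, so one option is to keep only the most frequent words before plotting. The sketch below assumes a data frame with word and freq columns, like word_freq_text. The helper top_words() and the toy data are hypothetical and not part of the original tutorial.

```r
# Hypothetical helper: keep the n most frequent rows of a word/freq data frame
top_words <- function(word_freq, n = 100) {
  ordered <- word_freq[order(word_freq$freq, decreasing = TRUE), , drop = FALSE]
  head(ordered, n)
}

# Made-up frequencies standing in for word_freq_text
toy_freq <- data.frame(word = c("bitcoin", "moon", "coffee"),
                       freq = c(3, 1, 2))

toy_top <- top_words(toy_freq, n = 2)
print(toy_top)

# Build the cloud only if wordcloud2 is installed; the widget is stored, not displayed
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  toy_cloud <- wordcloud2::wordcloud2(toy_top, size = 1)
}
```

In the tutorial's own workflow, this would correspond to passing top_words(word_freq_text) to wordcloud2() instead of the full table.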