This code reads an Excel file called “MyTweets.xlsx” and produces
word frequency counts for the text in a column called “text.” It
automatically removes standard “stop words” (a, and, the, etc.) from
the counts and, optionally, words from a user-customizable list
called my_stopwords. Finally, the script saves the word frequency
counts as an Excel file called “WordFrequencies.xlsx.”
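# Installing required packages if they are not already installed
# (readxl is installed as part of the tidyverse)
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidytext")) install.packages("tidytext")
if (!require("dplyr")) install.packages("dplyr")
if (!require("writexl")) install.packages("writexl")
# Loading the packages used by the script
library(readxl)
library(tidytext)
library(dplyr)
library(writexl)
# Reading the data file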
MyTweets <- read_excel("MyTweets.xlsx")
# Tokenizing by word and counting how often each word appears
tidy_text <- MyTweets %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)
# Deleting standard stop words
data("stop_words")
tidy_text <- tidy_text %>%
  anti_join(stop_words, by = "word")
# Deleting custom stop words (replace these examples with your own words)
my_stopwords <- tibble(word = c("and",
                                "but"))
tidy_text <- tidy_text %>%
  anti_join(my_stopwords, by = "word")
View(tidy_text)
# Saving word frequencies in Excel format
write_xlsx(tidy_text, "WordFrequencies.xlsx")
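To try the tokenizing and stop-word steps without an Excel file, the same pipeline can be run on a small made-up data frame. This is only a sketch: toy_tweets and toy_counts are hypothetical names that stand in for MyTweets and tidy_text, and the two sentences are invented for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the data read from MyTweets.xlsx
toy_tweets <- tibble(text = c("The cat and the dog",
                              "A dog barked"))

# Same steps as the script: tokenize, count, remove standard stop words
toy_counts <- toy_tweets %>%
  unnest_tokens(word, text) %>%          # one lowercase word per row
  count(word, sort = TRUE) %>%           # frequency of each word
  anti_join(stop_words, by = "word")     # drops "the", "and", "a"

print(toy_counts)
```

Note that unnest_tokens() lowercases words and strips punctuation by default, so “The” and “the” are counted together.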
You can use this code to add a user-named column to the data file. The code checks each cell in the data file’s “text” column for one or more user-specified words or phrases, then puts a 1 in the new column if any of them is found and a 0 if none is found.
Two examples are shown. The first adds a column called “Biden” and checks whether the terms “president,” “Joe Biden,” or “POTUS” appear in the text column. The second adds a column called “Trump” and checks whether the terms “former president” or “Donald Trump” appear in the text column. The search ignores capitalization and matches the terms anywhere in the text, so “president” also matches text containing “former president,” and such text is coded 1 in both columns.
By following the pattern in the two examples, users can code for as many sets of terms as they like.
Finally, the data with the newly added columns will be saved in an Excel file called “MyTweetsCoded.xlsx.”
# A basic coding operation
searchterms <- "president|Joe Biden|POTUS"
MyTweets$Biden <- ifelse(grepl(searchterms,
                               MyTweets$text,
                               ignore.case = TRUE), 1, 0)
searchterms <- "former president|Donald Trump"
MyTweets$Trump <- ifelse(grepl(searchterms,
                               MyTweets$text,
                               ignore.case = TRUE), 1, 0)
# Saving MyTweets file with coding results as MyTweetsCoded.xlsx
write_xlsx(MyTweets, "MyTweetsCoded.xlsx")
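Because grepl() matches the search terms anywhere in the text, a short term such as “president” also matches inside “former president.” The sketch below shows that overlap on three made-up sentences and one possible way to narrow the match, using a Perl-style negative lookbehind. The names toy_text, naive, and strict are hypothetical, and the stricter pattern is an illustrative assumption, not part of the original script.

```r
# Hypothetical example texts, not taken from MyTweets.xlsx
toy_text <- c("President Biden spoke today",
              "Former president Donald Trump held a rally",
              "The weather is nice")

# Pattern from the script: "president" matches the second text too
naive <- ifelse(grepl("president|Joe Biden|POTUS",
                      toy_text,
                      ignore.case = TRUE), 1, 0)
print(naive)   # 1 1 0

# Assumed alternative: skip "president" when it directly follows "former "
strict <- ifelse(grepl("(?<!former )president|Joe Biden|POTUS",
                       toy_text,
                       ignore.case = TRUE,
                       perl = TRUE), 1, 0)
print(strict)  # 1 0 0

# Counting how many texts were coded 1 under each pattern
print(c(naive = sum(naive), strict = sum(strict)))
```

Whether the stricter pattern is appropriate depends on the coding scheme; checking a sample of coded rows by hand is a reasonable way to confirm that a set of search terms captures what was intended.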