Word frequency counter

This code reads an Excel file called “MyTweets.xlsx” and counts how often each word appears in the file’s “text” column. During tokenization, words are converted to lowercase and punctuation is dropped. The script then removes standard “stop words” (a, and, the, etc.) and, optionally, any words in a user-customizable list called my_stopwords. Finally, it saves the remaining word frequency counts as an Excel file called “WordFrequencies.xlsx.”
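Here is a minimal sketch of the same pipeline. It runs on three made-up tweets held in memory instead of MyTweets.xlsx. The sample text, the `sample_tweets` and `word_counts` names, and the custom stop word “rt” are all hypothetical. The tidytext and dplyr calls are the same ones the script uses.

```r
library(dplyr)
library(tidytext)

data("stop_words", package = "tidytext")

# Hypothetical sample data standing in for MyTweets.xlsx
sample_tweets <- tibble(text = c("The president spoke about the economy.",
                                 "Economy news: the president and the economy!",
                                 "RT the economy is strong"))

# Hypothetical custom stop word list
my_stopwords <- tibble(word = c("rt"))

word_counts <- sample_tweets %>%
  unnest_tokens(word, text) %>%              # one lowercase word per row, punctuation dropped
  anti_join(stop_words, by = "word") %>%     # remove standard stop words
  anti_join(my_stopwords, by = "word") %>%   # remove custom stop words
  count(word, sort = TRUE)                   # tally each word, most frequent first

print(word_counts)
```

Because tokenization lowercases and strips punctuation, “Economy” and “economy!” are tallied as the same word.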

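# Installing any missing packages (readxl reads .xlsx files, writexl writes them)
if (!requireNamespace("readxl", quietly = TRUE)) install.packages("readxl")
if (!requireNamespace("tidytext", quietly = TRUE)) install.packages("tidytext")
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("writexl", quietly = TRUE)) install.packages("writexl")

# Loading the packages
library(readxl)
library(tidytext)
library(dplyr)
library(writexl)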

# Reading the spreadsheet (it must contain a column named "text")
MyTweets <- read_excel("MyTweets.xlsx")

# Tokenizing the text column into one word per row and counting each word
tidy_text <- MyTweets %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)

# Deleting standard stop words
data("stop_words")
tidy_text <- tidy_text %>%
  anti_join(stop_words, by = "word")

# Deleting custom stop words
# ("and" and "but" are already standard stop words; replace them with your own)
my_stopwords <- tibble(word = c("and",
                                "but"))
tidy_text <- tidy_text %>%
  anti_join(my_stopwords, by = "word")
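You may prefer to remove both lists in one step. In that case, stack the standard and custom stop words with dplyr’s bind_rows() and filter them out with a single anti_join(). The sketch below runs on hypothetical word counts. The `word_counts` and `all_stopwords` names and the “custom” lexicon label are illustrative and not part of the original script.

```r
library(dplyr)
library(tidytext)

data("stop_words", package = "tidytext")

# Hypothetical word counts standing in for tidy_text
word_counts <- tibble(word = c("the", "economy", "rt", "president"),
                      n    = c(10L, 4L, 3L, 2L))

# Hypothetical custom stop words, labelled so their origin stays visible
my_stopwords <- tibble(word = c("rt"), lexicon = "custom")

# Stacking both lists, then removing every listed word in one anti_join()
all_stopwords <- bind_rows(stop_words, my_stopwords)

filtered <- word_counts %>%
  anti_join(all_stopwords, by = "word")

print(filtered)
```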

# Opening the results in the data viewer (requires an interactive session, e.g. RStudio)
View(tidy_text)

# Saving word frequencies in Excel format
write_xlsx(tidy_text, "WordFrequencies.xlsx")

Basic word/phrase coder

You can use this code to add a user-named column to the data, check each cell in the “text” column for one or more user-specified words or phrases, and record a 1 in the new column if any of them is found or a 0 if none is. Matching ignores differences between upper- and lowercase. The code assumes MyTweets has already been read in and the packages loaded, as in the word frequency counter above.
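Here is a small, self-contained illustration of the coding step on three hypothetical tweets. It applies as.integer() to the TRUE/FALSE result of grepl(). This gives the same 1/0 values as ifelse(..., 1, 0), stored as integers rather than doubles. The `tweets` data frame is made up for the example.

```r
# Hypothetical sample data standing in for MyTweets.xlsx
tweets <- data.frame(text = c("Joe Biden gave a speech",
                              "POTUS signed the bill",
                              "Lunch was great"),
                     stringsAsFactors = FALSE)

# "|" separates alternatives: a row matches if any one term appears
searchterms <- "president|Joe Biden|POTUS"

# grepl() returns TRUE/FALSE per row; as.integer() converts that to 1/0
tweets$Biden <- as.integer(grepl(searchterms, tweets$text, ignore.case = TRUE))

print(tweets)
```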

Two examples are shown. The first adds a column called “Biden” and checks whether the terms “president,” “Joe Biden,” or “POTUS” appear in the text column. The second adds a column called “Trump” and checks whether the terms “former president” or “Donald Trump” appear in the text column.
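grepl() looks for the pattern anywhere in the text. As a result, the first example’s term “president” also fires on “former president” and on “presidential,” so a tweet about Trump can be coded 1 in the Biden column as well. The sketch below shows one possible refinement. It is an assumption, not part of the original script. It uses a Perl-style negative lookbehind, (?<!former ), together with word boundaries, \\b, and both features require perl = TRUE.

```r
# Hypothetical sample tweets
tweets <- data.frame(text = c("The former president held a rally",
                              "President Biden spoke today",
                              "Presidential debates are coming"),
                     stringsAsFactors = FALSE)

# Original pattern: "president" matches anywhere, even inside "former president"
naive <- "president|Joe Biden|POTUS"

# Hypothetical refined pattern: skip "president" when it follows "former ",
# and require word boundaries so "presidential" does not count
refined <- "(?<!former )\\bpresident\\b|Joe Biden|POTUS"

tweets$naive   <- as.integer(grepl(naive, tweets$text, ignore.case = TRUE))
tweets$refined <- as.integer(grepl(refined, tweets$text,
                                   ignore.case = TRUE, perl = TRUE))

print(tweets)
```

The refined pattern still depends on the exact wording “former president”. Other phrasings would need their own exclusions.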

By following the pattern in the two examples, you can code for as many sets of terms as you like.
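One way to scale this pattern to many term sets is to store the sets in a named list and loop over it. In this sketch, the `term_sets` list and the sample `tweets` data frame are hypothetical. Each list name becomes a new 1/0 column.

```r
# Hypothetical sample data standing in for MyTweets.xlsx
tweets <- data.frame(text = c("Joe Biden met the former president",
                              "Donald Trump posted again",
                              "Nothing political here"),
                     stringsAsFactors = FALSE)

# Each name becomes a column; each value is a "|"-separated set of terms
term_sets <- list(
  Biden = "president|Joe Biden|POTUS",
  Trump = "former president|Donald Trump"
)

# Adding one 1/0 column per term set
for (col in names(term_sets)) {
  tweets[[col]] <- as.integer(grepl(term_sets[[col]], tweets$text,
                                    ignore.case = TRUE))
}

print(tweets)
```

Adding another coding category then only requires one more entry in the list.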

Finally, the data with the newly added columns will be saved in an Excel file called “MyTweetsCoded.xlsx.”

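# A basic coding operation

# Column "Biden": 1 if any of these terms appears (case-insensitive), otherwise 0.
# Note that "president" also matches "former president" and "presidential".
searchterms <- "president|Joe Biden|POTUS"
MyTweets$Biden <- ifelse(grepl(searchterms,
                               MyTweets$text,
                               ignore.case = TRUE), 1, 0)

# Column "Trump": the same approach with a second set of terms
searchterms <- "former president|Donald Trump"
MyTweets$Trump <- ifelse(grepl(searchterms,
                               MyTweets$text,
                               ignore.case = TRUE), 1, 0)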

# Saving MyTweets file with coding results as MyTweetsCoded.xlsx
write_xlsx(MyTweets, "MyTweetsCoded.xlsx")