Below is the code written for the Week 14 Lab of MTSU’s JOUR 3841 course, Data Skills for Media Professionals, taught by Dr. Ken Blake.
After removing standard stop words and the custom terms “title” and “tca”, some of the most easily identifiable words in the “tidy_text” data frame are “board”, “school”, “department”, and “public”. These words appear 275, 245, 219, and 201 times, respectively. I therefore believe the most common topic in these bills is education, and that a majority of them focus on education and school boards.
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidytext")) install.packages("tidytext")
library(tidyverse)
library(tidytext)
# Reading the bill data from GitHub
mydata <- read.csv("https://raw.githubusercontent.com/drkblake/Data/main/TNBills22_23.csv")
# Opening the data in the viewer (interactive sessions only)
view(mydata)
# Splitting each description into single words and counting them
tidy_text <- mydata %>%
  unnest_tokens(word, description) %>%
  count(word, sort = TRUE)
# Deleting standard stop words
data("stop_words")
tidy_text <- tidy_text %>%
  anti_join(stop_words, by = "word")
# Deleting custom stop words that appear in nearly every description
my_stopwords <- tibble(word = c("title",
                                "tca"))
tidy_text <- tidy_text %>%
  anti_join(my_stopwords, by = "word")
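The word counts cited at the top come from looking at the first rows of the filtered data frame. As a minimal sketch of the same pipeline, the block below runs on three made-up bill descriptions, so it works without downloading the real file. The names toy_bills and toy_counts and the sample sentences are assumptions for illustration only and are not part of the lab data.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for the "description" column of the bills file
toy_bills <- tibble(description = c(
  "As introduced, requires the state board of education to report to the department.",
  "As introduced, authorizes a local school board to adopt a public school policy.",
  "As introduced, amends TCA Title 49 relative to the department of education."
))

# Same steps as the lab code: tokenize, count, drop standard and custom stop words
toy_counts <- toy_bills %>%
  unnest_tokens(word, description) %>%
  count(word, sort = TRUE) %>%
  anti_join(stop_words, by = "word") %>%
  anti_join(tibble(word = c("title", "tca")), by = "word")

# Show the most frequent remaining words
head(toy_counts, 10)
```

On the real data, running head(tidy_text, 10) after the last anti_join should list the most frequent remaining words with their counts.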