R Markdown

# Load packages

if (!require("tidyverse")) install.packages("tidyverse")
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
if (!require("tidytext")) install.packages("tidytext")
## Loading required package: tidytext
## Warning: package 'tidytext' was built under R version 4.3.3
library(tidyverse)
library(tidytext)

# Read the data

mydata <- read.csv("https://raw.githubusercontent.com/drkblake/Data/main/TNBills22_23.csv")

# Extract individual words to a "tidytext" data frame

tidy_text <- mydata %>% 
  unnest_tokens(word,description) %>% 
  count(word, sort = TRUE)

# Delete standard stop words

data("stop_words")
tidy_text <- tidy_text %>%
  anti_join(stop_words)
## Joining with `by = join_by(word)`
# Define search terms and count items that include them
# "School" terms are used as an example

searchterms <- "school|education"
mydata$SchoolTerm <- ifelse(grepl(searchterms,
                             mydata$description,
                             ignore.case = TRUE),1,0)
sum(mydata$SchoolTerm)
## [1] 171

I was able to determine that one of the most common topics that were discussed recently during bills that have been passed have been bills having to do with schools and bills concerning education. Using the description variable and using the code from monday’s lesson, I was able to determine that school was mentioned 171 times, making it one of the most mentioned topics in the data set.