What America’s Governors Are Talking About

The dataset contains every one-word phrase mentioned in at least 10 speeches and every two- or three-word phrase mentioned in at least five speeches by state governors.

[State of the State data](https://github.com/fivethirtyeight/data/tree/master/state-of-the-state)


The dataset consists of the following columns:

phrase: one-, two- or three-word phrase
category: thematic category of the phrase (blank for uncategorized phrases)
d_speeches: number of Democratic speeches containing the phrase
r_speeches: number of Republican speeches containing the phrase
total: total number of speeches containing the phrase
percent_of_d_speeches: percent of the 23 Democratic speeches containing the phrase
percent_of_r_speeches: percent of the 27 Republican speeches containing the phrase
chi2: chi^2 statistic
pval: p-value for the chi^2 test
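
The percentage and test columns can be reproduced from the speech counts. The sketch below is a minimal check, assuming the party totals of 23 Democratic and 27 Republican speeches given in the column descriptions and using the counts of two phrases copied from the dataset. The chi^2 calculation is an assumption: comparing each party's observed count with the count expected from its share of all 50 speeches (one degree of freedom) matches the published values for these rows, but the column descriptions do not spell out the method. The names `sample_rows` and `chi2_for_phrase` are illustrative only.

```r
# Party totals stated in the column descriptions
n_dem <- 23
n_rep <- 27

# Counts for two phrases, copied from the dataset
sample_rows <- data.frame(
  phrase     = c("minimum wage", "clean energy"),
  d_speeches = c(9, 11),
  r_speeches = c(0, 1)
)

# Percent of each party's speeches that contain the phrase
sample_rows$percent_of_d_speeches <- round(100 * sample_rows$d_speeches / n_dem, 2)
sample_rows$percent_of_r_speeches <- round(100 * sample_rows$r_speeches / n_rep, 2)

# Assumed method: observed counts per party versus the counts expected
# if the phrase were spread in proportion to each party's share of speeches
chi2_for_phrase <- function(d, r) {
  total      <- d + r
  expected_d <- total * n_dem / (n_dem + n_rep)
  expected_r <- total * n_rep / (n_dem + n_rep)
  (d - expected_d)^2 / expected_d + (r - expected_r)^2 / expected_r
}

sample_rows$chi2 <- chi2_for_phrase(sample_rows$d_speeches, sample_rows$r_speeches)
sample_rows$pval <- pchisq(sample_rows$chi2, df = 1, lower.tail = FALSE)

print(sample_rows)
```

For "minimum wage" this gives 39.13 percent of Democratic speeches, 0 percent of Republican speeches and a chi^2 of about 10.565, in line with the values in the dataset.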


Load Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Read the Original data from GitHub link
urlfile <- 'https://raw.githubusercontent.com/fivethirtyeight/data/master/state-of-the-state/words.csv'
datain <- read.csv(urlfile)
speech_data <- data.frame(datain)


Rename the columns

View sample data

colnames(speech_data) <- c("Phrase","Category", "Democratic_Speeches", "Republican_Speeches", "Total_Speeches", "%_of_Dem_Speeches", "%_of_Rep_Speeches", "Chi^2","Probability_Measure")
head(speech_data)
##            Phrase              Category Democratic_Speeches Republican_Speeches
## 1    minimum wage economy/fiscal issues                   9                   0
## 2    clean energy    energy/environment                  11                   1
## 3  climate change    energy/environment                  13                   2
## 4    gun violence         crime/justice                   8                   0
## 5 affordable care                                        10                   1
## 6   international                                         0                  10
##   Total_Speeches %_of_Dem_Speeches %_of_Rep_Speeches     Chi^2
## 1              9             39.13              0.00 10.565217
## 2             12             47.83              3.70 10.074611
## 3             15             56.52              7.41  9.986581
## 4              8             34.78              0.00  9.391304
## 5             11             43.48              3.70  8.931196
## 6             10              0.00             37.04  8.518519
##   Probability_Measure
## 1         0.001152355
## 2         0.001503264
## 3         0.001576851
## 4         0.002180170
## 5         0.002803407
## 6         0.003515506


Subset the data frame with Category and Total_Speeches columns

Display the Categories, order by sum of speeches

# Create Sub-set data
speech_byCategory_subdata <- subset(speech_data, select = c("Category","Total_Speeches"))

# Aggregate them
speech_byCategory_aggregate_data <-  aggregate(speech_byCategory_subdata$Total_Speeches, by=list(speech_byCategory_subdata$Category), FUN=sum)
#View(speech_byCategory_aggregate_data)

# Order them
speech_byCategory_order_aggregate_data <- speech_byCategory_aggregate_data[order(speech_byCategory_aggregate_data$x),] 
colnames(speech_byCategory_order_aggregate_data) <- c("Category", "Total_Associated_Speeches")

# Filter out blank categories (some contain only whitespace, so trim before comparing)
speech_byCategory_order_aggregate_data[!(!is.na(speech_byCategory_order_aggregate_data$Category) & trimws(speech_byCategory_order_aggregate_data$Category) == ""), ]
##                        Category Total_Associated_Speeches
## 8 mental health/substance abuse                       206
## 6            energy/environment                       226
## 3                 crime/justice                       424
## 7                   health care                       451
## 5                     education                      1275
## 4         economy/fiscal issues                      2651
View(speech_byCategory_order_aggregate_data)

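Among the named categories, the leading topics are “Economy/Fiscal issues”, “Education” and “Health Care”. Each total sums the speech counts of every phrase in the category, so a speech that uses several phrases from one category is counted more than once.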
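
The same subset, aggregate, order and filter steps can also be written as a single dplyr pipeline. The sketch below is one possible version, run on a small hand-built sample rather than the full data. `sample_speech_data` and `category_totals` are illustrative names, and the whitespace-only category in the sample is an assumption about how blank categories can appear in the file.

```r
library(dplyr)

# Small illustrative sample; the full analysis would use speech_data instead
sample_speech_data <- data.frame(
  Phrase = c("minimum wage", "clean energy", "climate change",
             "gun violence", "affordable care", "international"),
  Category = c("economy/fiscal issues", "energy/environment", "energy/environment",
               "crime/justice", " ", ""),
  Total_Speeches = c(9, 12, 15, 8, 11, 10),
  stringsAsFactors = FALSE
)

category_totals <- sample_speech_data %>%
  mutate(Category = trimws(Category)) %>%   # treat whitespace-only values as blank
  filter(Category != "") %>%                # drop uncategorized phrases
  group_by(Category) %>%
  summarise(Total_Associated_Speeches = sum(Total_Speeches)) %>%
  arrange(Total_Associated_Speeches)        # ascending, as in the base R version

print(category_totals)
```

Trimming before filtering removes both empty and whitespace-only categories in one step.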


Subset the data frame based on Words, Category used by Governors

Display the phrases, ordered by number of speeches, for each political affiliation

Top 10 Phrases used by Democratic Governors

# Create Sub-set data
Phrases_subdata <- subset(speech_data, select = c("Phrase", "Category","Democratic_Speeches", "Republican_Speeches"))

library(sqldf)
top_Dem_Phrases <- sqldf( "SELECT * FROM Phrases_subdata WHERE TRIM(Category) != '' ORDER BY Democratic_Speeches DESC LIMIT 10", row.names=FALSE)


knitr::kable(top_Dem_Phrases, format="html")
Phrase       Category               Democratic_Speeches  Republican_Speeches
health care  health care            23                   19
business     economy/fiscal issues  23                   24
health       health care            23                   25
economic     economy/fiscal issues  23                   25
budget       economy/fiscal issues  23                   25
students     education              23                   25
education    education              23                   27
school       education              23                   27
working      economy/fiscal issues  23                   27
economy      economy/fiscal issues  22                   24


Top 10 Phrases used by Republican Governors

top_Rep_Phrases <- sqldf( "SELECT * FROM Phrases_subdata WHERE TRIM(Category) != '' ORDER BY Republican_Speeches DESC LIMIT 10", row.names=FALSE)

knitr::kable(top_Rep_Phrases, format="html")
Phrase       Category               Democratic_Speeches  Republican_Speeches
education    education              23                   27
school       education              23                   27
working      economy/fiscal issues  23                   27
job          economy/fiscal issues  22                   26
jobs         economy/fiscal issues  22                   26
tax          economy/fiscal issues  18                   25
health       health care            23                   25
economic     economy/fiscal issues  23                   25
budget       economy/fiscal issues  23                   25
students     education              23                   25


Phrases used only by Democrats, never by Republicans

Phrases_DemsOnly <- sqldf( "SELECT * FROM Phrases_subdata WHERE TRIM(Category) != '' AND Republican_Speeches == 0 ORDER BY Democratic_Speeches DESC LIMIT 15", row.names=FALSE)

knitr::kable(Phrases_DemsOnly, format="html")
Phrase                   Category               Democratic_Speeches  Republican_Speeches
minimum wage             economy/fiscal issues  9                    0
gun violence             crime/justice          8                    0
education need           education              7                    0
students state           education              6                    0
gun safety               crime/justice          6                    0
pre existing conditions  health care            5                    0
reproductive health      health care            5                    0
educators deserve        education              5                    0
energy future            energy/environment     5                    0
economy works            economy/fiscal issues  5                    0
existing conditions      health care            5                    0
cost health              health care            5                    0


Phrases used only by Republicans, never by Democrats

Phrases_RepsOnly <- sqldf( "SELECT * FROM Phrases_subdata WHERE TRIM(Category) != '' AND Democratic_Speeches == 0 ORDER BY Republican_Speeches DESC LIMIT 15", row.names=FALSE)

knitr::kable(Phrases_RepsOnly, format="html")
Phrase                 Category               Democratic_Speeches  Republican_Speeches
doing business         economy/fiscal issues  0                    7
state income           economy/fiscal issues  0                    7
savings account        economy/fiscal issues  0                    5
schools safer          crime/justice          0                    5
local law enforcement  crime/justice          0                    5
prison population      crime/justice          0                    5
local law              crime/justice          0                    5
state income tax       economy/fiscal issues  0                    5
education workforce    education              0                    5
tax rates              economy/fiscal issues  0                    5
fully funding          economy/fiscal issues  0                    5


Top Phrases Plot by Democratic Governors

library(ggplot2)
library(ggbeeswarm)
dem_plot <- ggplot(data = top_Dem_Phrases,
  aes(y =Phrase , x = Democratic_Speeches)) +  geom_beeswarm()
dem_plot


Top Phrases Plot by Republican Governors

rep_plot <- ggplot(data = top_Rep_Phrases,
  aes(y =Phrase , x = Republican_Speeches)) +  geom_boxplot(notch=FALSE)
rep_plot
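
Each phrase has a single speech count, so a bar chart is one straightforward way to display these values. The sketch below is an alternative under stated assumptions, not a replacement for the plots above: it rebuilds the top Republican phrases by hand in `top_rep_sample` (an illustrative name) and assumes ggplot2 3.3.0 or later, where `geom_col()` accepts a discrete y axis.

```r
library(ggplot2)

# Top Republican phrases and counts, copied from the table above
top_rep_sample <- data.frame(
  Phrase = c("education", "school", "working", "job", "jobs",
             "tax", "health", "economic", "budget", "students"),
  Republican_Speeches = c(27, 27, 27, 26, 26, 25, 25, 25, 25, 25)
)

# Horizontal bars, with phrases ordered by their speech count
rep_bar_plot <- ggplot(
  data = top_rep_sample,
  aes(x = Republican_Speeches, y = reorder(Phrase, Republican_Speeches))
) +
  geom_col() +
  labs(x = "Republican speeches containing the phrase", y = "Phrase")

rep_bar_plot
```

Ordering the phrases with `reorder()` puts the most frequent phrases at the top of the chart.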


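Let’s see which phrases are used equally often by Governors from both parties

Add a new column, Variance, to the data frame: the absolute difference between the Democratic and Republican speech counts

List the phrases with zero Variance that appear in more than five Democratic speeches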

# Create Sub-set data
Category_subdata <- subset(speech_data, select = c("Phrase","Category","Democratic_Speeches", "Republican_Speeches"))
Category_subdata$Variance <- abs(Category_subdata$Democratic_Speeches - Category_subdata$Republican_Speeches)

Category_match_data <- sqldf( "SELECT * FROM Category_subdata WHERE TRIM(Category) != '' AND Variance == 0 AND Democratic_Speeches > 5", row.names=FALSE)

knitr::kable(Category_match_data, format="html")
Phrase                Category                       Democratic_Speeches  Republican_Speeches  Variance
cost                  economy/fiscal issues          20                   20                   0
teachers              education                      19                   19                   0
employees             economy/fiscal issues          19                   19                   0
economic development  economy/fiscal issues          14                   14                   0
opioid                mental health/substance abuse  11                   11                   0
spend                 economy/fiscal issues          11                   11                   0
educators             education                      10                   10                   0
colleges              education                      9                    9                    0
careers               economy/fiscal issues          9                    9                    0
education funding     education                      6                    6                    0
entrepreneurs         economy/fiscal issues          6                    6                    0
substance abuse       mental health/substance abuse  6                    6                    0
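
Because there are 23 Democratic and 27 Republican speeches, an equal count does not mean an equal share of each party's speeches. The sketch below is a minimal illustration of that arithmetic for three of the matched phrases. The column names `Dem_Share`, `Rep_Share` and `Share_Gap` are introduced here for illustration and are not part of the original analysis.

```r
# Party totals stated in the column descriptions
n_dem <- 23
n_rep <- 27

# Three phrases with identical counts in both parties, copied from the table above
matched <- data.frame(
  Phrase = c("cost", "teachers", "opioid"),
  Democratic_Speeches = c(20, 19, 11),
  Republican_Speeches = c(20, 19, 11)
)

# Share of each party's speeches containing the phrase, in percent
matched$Dem_Share <- round(100 * matched$Democratic_Speeches / n_dem, 2)
matched$Rep_Share <- round(100 * matched$Republican_Speeches / n_rep, 2)

# Gap between the two shares, in percentage points
matched$Share_Gap <- matched$Dem_Share - matched$Rep_Share

print(matched)
```

For example, “cost” appears in 20 speeches on each side, which is about 87 percent of Democratic speeches but about 74 percent of Republican speeches.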


List the phrases with zero Variance in the “Economy/Fiscal Issues” category

TopCategory_match_data <- sqldf( "SELECT * FROM Category_subdata WHERE Category == 'economy/fiscal issues' AND Variance == 0", row.names=FALSE)

knitr::kable(TopCategory_match_data, format="html")
Phrase                Category               Democratic_Speeches  Republican_Speeches  Variance
cost                  economy/fiscal issues  20                   20                   0
employees             economy/fiscal issues  19                   19                   0
economic development  economy/fiscal issues  14                   14                   0
spend                 economy/fiscal issues  11                   11                   0
careers               economy/fiscal issues  9                    9                    0
entrepreneurs         economy/fiscal issues  6                    6                    0
fiscally              economy/fiscal issues  5                    5                    0
tax credit            economy/fiscal issues  5                    5                    0
tax relief            economy/fiscal issues  4                    4                    0
fully fund            economy/fiscal issues  3                    3                    0
business leaders      economy/fiscal issues  3                    3                    0
budget includes       economy/fiscal issues  3                    3                    0
cut taxes             economy/fiscal issues  3                    3                    0
new taxes             economy/fiscal issues  3                    3                    0


To conclude, the analysis shows that Governors mostly talk about economy/fiscal issues, education and health care. Economy/fiscal issues rank first by a wide margin. Within that category, phrases such as “cost”, “employees”, “economic development” and “careers”, along with tax phrases such as “tax relief”, “tax credit”, “cut taxes” and “new taxes”, appear in the same number of Democratic and Republican speeches.

Democratic Governors also use the phrases “minimum wage”, “gun violence” and “education need”, which appear in no Republican speech.

Republican Governors, on the other hand, use the phrases “doing business”, “state income” and “savings account”, which appear in no Democratic speech.