What America’s Governors Are Talking About

The dataset contains every one-word phrase that was mentioned in at least 10 governors' State of the State speeches, and every two- or three-word phrase that was mentioned in at least five.

State of the State data (FiveThirtyEight on GitHub): https://github.com/fivethirtyeight/data/tree/master/state-of-the-state

The dataset consists of the following columns:

phrase: one-, two- or three-word phrase

category: thematic category (blank for uncategorized phrases)

d_speeches: number of Democratic speeches containing the phrase

r_speeches: number of Republican speeches containing the phrase

total: total number of speeches containing the phrase

percent_of_d_speeches: percent of the 23 Democratic speeches containing the phrase

percent_of_r_speeches: percent of the 27 Republican speeches containing the phrase

chi2: chi-squared statistic

pval: p-value for the chi-squared test
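The dataset does not say exactly how chi2 and pval were computed. The sample rows printed below are consistent with a one-degree-of-freedom goodness-of-fit test. That test compares a phrase's Democratic and Republican counts with the 23/27 split of speeches. The minimal sketch below assumes this test, and the helper name phrase_chi2 is hypothetical. Base R's chisq.test(c(9, 0), p = c(23, 27) / 50) gives the same statistic, but it warns that the approximation may be inaccurate because the expected counts are below 5.

```r
# Hypothetical helper. Assumption: chi2 is a 1-df goodness-of-fit test that
# compares a phrase's Democratic (d) and Republican (r) speech counts with
# the 23 Democratic / 27 Republican split of speeches
phrase_chi2 <- function(d, r, n_dem = 23, n_rep = 27) {
  total <- d + r
  share_dem <- n_dem / (n_dem + n_rep)
  expected_d <- total * share_dem        # expected Democratic count
  expected_r <- total * (1 - share_dem)  # expected Republican count
  chi2 <- (d - expected_d)^2 / expected_d + (r - expected_r)^2 / expected_r
  # Upper-tail probability of a chi-squared distribution with 1 degree of freedom
  pval <- pchisq(chi2, df = 1, lower.tail = FALSE)
  data.frame(d = d, r = r, chi2 = chi2, pval = pval)
}

# "minimum wage" (9 Democratic, 0 Republican) and "international" (0, 10)
phrase_chi2(d = c(9, 0), r = c(0, 10))
```

This returns chi2 values of about 10.565 and 8.519, which match the first and sixth rows of the sample output.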

Load Libraries

library(dplyr)


library(sqldf)


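# Read the original data from the FiveThirtyEight GitHub repository
urlfile <- 'https://raw.githubusercontent.com/fivethirtyeight/data/master/state-of-the-state/words.csv'
# read.csv() already returns a data frame; keep text columns as character rather than factors
speech_data <- read.csv(urlfile, stringsAsFactors = FALSE)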
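The category totals further down still show a blank category after the `== ""` filter. This suggests that some blank categories contain whitespace rather than nothing. The sketch below assumes that is the case and normalises the column once, right after loading, with trimws(). To stay runnable offline, it reads three rows in the raw words.csv layout from a string. The values come from the sample output in this document, but the whitespace-only category in the second row is an assumption for illustration.

```r
# Three rows in the words.csv column layout; values are taken from the sample
# output in this document, and the " " category in row 2 is an assumption
csv_text <- "phrase,category,d_speeches,r_speeches,total,percent_of_d_speeches,percent_of_r_speeches,chi2,pval
minimum wage,economy/fiscal issues,9,0,9,39.13,0.00,10.565217,0.001152355
affordable care, ,10,1,11,43.48,3.70,8.931196,0.002803407
international,,0,10,10,0.00,37.04,8.518519,0.003515506"

speech_data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Before trimming, only the truly empty field matches "" (returns 1)
sum(speech_data$category == "")

# Strip leading/trailing whitespace so every blank category becomes "" (returns 2)
speech_data$category <- trimws(speech_data$category)
sum(speech_data$category == "")
```

With this normalisation done once at load time, later filters can compare against "" directly instead of trimming in every query.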

Rename the columns

View sample data

colnames(speech_data) <- c("Phrase","Category", "Democratic_Speeches", "Republican_Speeches", "Total_Speeches", "%_of_Dem_Speeches", "%_of_Rep_Speeches", "Chi^2","Probability_Measure")
head(speech_data)

##            Phrase              Category Democratic_Speeches Republican_Speeches
## 1    minimum wage economy/fiscal issues                   9                   0
## 2    clean energy    energy/environment                  11                   1
## 3  climate change    energy/environment                  13                   2
## 4    gun violence         crime/justice                   8                   0
## 5 affordable care                                        10                   1
## 6   international                                         0                  10
##   Total_Speeches %_of_Dem_Speeches %_of_Rep_Speeches     Chi^2
## 1              9             39.13              0.00 10.565217
## 2             12             47.83              3.70 10.074611
## 3             15             56.52              7.41  9.986581
## 4              8             34.78              0.00  9.391304
## 5             11             43.48              3.70  8.931196
## 6             10              0.00             37.04  8.518519
##   Probability_Measure
## 1         0.001152355
## 2         0.001503264
## 3         0.001576851
## 4         0.002180170
## 5         0.002803407
## 6         0.003515506
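The total and percent columns follow directly from the speech counts and the 23/27 split described above. As a quick sanity check, this sketch recomputes them for the six rows printed by head(). The counts are copied from that output, and the new column names (Total, Pct_Dem, Pct_Rep) are only illustrative.

```r
# Counts copied from the head(speech_data) output above
sample_rows <- data.frame(
  Phrase = c("minimum wage", "clean energy", "climate change",
             "gun violence", "affordable care", "international"),
  Democratic_Speeches = c(9, 11, 13, 8, 10, 0),
  Republican_Speeches = c(0, 1, 2, 0, 1, 10),
  stringsAsFactors = FALSE
)

# Total speeches is the sum of the two party counts
sample_rows$Total <- sample_rows$Democratic_Speeches + sample_rows$Republican_Speeches
# Percent of the 23 Democratic / 27 Republican speeches, rounded to 2 places
sample_rows$Pct_Dem <- round(100 * sample_rows$Democratic_Speeches / 23, 2)
sample_rows$Pct_Rep <- round(100 * sample_rows$Republican_Speeches / 27, 2)
sample_rows
```

The recomputed values match the Total_Speeches, %_of_Dem_Speeches and %_of_Rep_Speeches columns shown above.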

Subset the data frame to the Category and Total_Speeches columns

Display the categories, ordered by the sum of their speech counts

# Create Sub-set data
speech_byCategory_subdata <- subset(speech_data, select = c("Category","Total_Speeches"))

# Aggregate them
speech_byCategory_aggregate_data <-  aggregate(speech_byCategory_subdata$Total_Speeches, by=list(speech_byCategory_subdata$Category), FUN=sum)
#View(speech_byCategory_aggregate_data)

# Order them
speech_byCategory_order_aggregate_data <- speech_byCategory_aggregate_data[order(speech_byCategory_aggregate_data$x),] 
colnames(speech_byCategory_order_aggregate_data) <- c("Category", "Total_Associated_Speeches")

# Filter out blank categories; trimws() also catches whitespace-only values that == "" misses
speech_byCategory_order_aggregate_data <- speech_byCategory_order_aggregate_data[trimws(speech_byCategory_order_aggregate_data$Category) != "", ]
speech_byCategory_order_aggregate_data

##                        Category Total_Associated_Speeches
## 8 mental health/substance abuse                       206
## 6            energy/environment                       226
## 3                 crime/justice                       424
## 7                   health care                       451
## 5                     education                      1275
## 4         economy/fiscal issues                      2651

View(speech_byCategory_order_aggregate_data)

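Of all the categories, the leading topics are “Economy/Fiscal issues”, “Education” and “Health Care”. Each total adds up the speech counts of every phrase in that category. It therefore measures how often the category's phrases appear across speeches, not how many distinct speeches mention the category (there are only 50 speeches in all).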
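The subset, aggregate, order and filter steps above can also be written with aggregate()'s formula interface. This version filters out blank categories before summing and sorts from largest to smallest, so the leading topics come first. The sketch runs on six rows copied from the sample output rather than on the full file, so its totals are illustrative only. Two of the rows have blank categories, and one of those is assumed to be whitespace-only.

```r
# Six rows from the sample output; the " " category is an assumed whitespace-only blank
phrases <- data.frame(
  Category = c("economy/fiscal issues", "energy/environment", "energy/environment",
               "crime/justice", " ", ""),
  Total_Speeches = c(9, 12, 15, 8, 11, 10),
  stringsAsFactors = FALSE
)

# Drop blank categories (trimws() also removes whitespace-only values), then sum per category
category_totals <- aggregate(Total_Speeches ~ Category,
                             data = subset(phrases, trimws(Category) != ""),
                             FUN = sum)

# Largest total first
category_totals <- category_totals[order(category_totals$Total_Speeches, decreasing = TRUE), ]
rownames(category_totals) <- NULL
category_totals
```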

Subset the data frame to the phrases and categories used by the governors

Display the phrases, ordered by the number of speeches from each political party

Top 10 Phrases used by Democratic Governors

# Create Sub-set data
Phrases_subdata <- subset(speech_data, select = c("Phrase", "Category","Democratic_Speeches", "Republican_Speeches"))

library(sqldf)
top_Dem_Phrases <- sqldf( "SELECT * FROM Phrases_subdata WHERE TRIM(Category) != '' ORDER BY Democratic_Speeches DESC LIMIT 10", row.names=FALSE)


knitr::kable(top_Dem_Phrases, format="html")

Phrase	Category	Democratic_Speeches	Republican_Speeches
health care	health care	23	19
business	economy/fiscal issues	23	24
health	health care	23	25
economic	economy/fiscal issues	23	25
budget	economy/fiscal issues	23	25
students	education	23	25
education	education	23	27
school	education	23	27
working	economy/fiscal issues	23	27
economy	economy/fiscal issues	22	24
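LIMIT 10 cuts through ties arbitrarily. Nine phrases appear in all 23 Democratic speeches, and the tenth slot goes to economy. Yet job and jobs (per the Republican table below) also appear in 22 Democratic speeches. A tie-aware alternative keeps every phrase whose rank falls within the top n. The helper top_n_with_ties is hypothetical, and the rows are copied from the two top-10 tables.

```r
# Hypothetical helper: keep every row ranked within the top n by count_col,
# giving tied rows the same (best) rank so ties are never split arbitrarily
top_n_with_ties <- function(df, count_col, n) {
  r <- rank(-df[[count_col]], ties.method = "min")
  kept <- df[r <= n, ]
  kept[order(-kept[[count_col]], kept$Phrase), ]
}

# Democratic counts copied from the top-10 tables in this document
dem_counts <- data.frame(
  Phrase = c("health care", "business", "health", "economic", "budget", "students",
             "education", "school", "working", "economy", "job", "jobs"),
  Democratic_Speeches = c(rep(23, 9), 22, 22, 22),
  stringsAsFactors = FALSE
)

top_n_with_ties(dem_counts, "Democratic_Speeches", n = 10)
```

With n = 10, this keeps all 12 rows, because economy, job and jobs share rank 10.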

Top 10 Phrases used by Republican Governors

top_Rep_Phrases <- sqldf( "SELECT * FROM Phrases_subdata WHERE TRIM(Category) != '' ORDER BY Republican_Speeches DESC LIMIT 10", row.names=FALSE)

knitr::kable(top_Rep_Phrases, format="html")

Phrase	Category	Democratic_Speeches	Republican_Speeches
education	education	23	27
school	education	23	27
working	economy/fiscal issues	23	27
job	economy/fiscal issues	22	26
jobs	economy/fiscal issues	22	26
tax	economy/fiscal issues	18	25
health	health care	23	25
economic	economy/fiscal issues	23	25
budget	economy/fiscal issues	23	25
students	education	23	25
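The two top-10 lists overlap heavily. Base R's set functions show which phrases are shared and which are party-specific. The vectors below are copied from the two tables. Both lists were cut with LIMIT 10 at tied counts, so the result depends on which tied phrases happened to be kept.

```r
# Phrases copied from the two top-10 tables above
top_dem <- c("health care", "business", "health", "economic", "budget",
             "students", "education", "school", "working", "economy")
top_rep <- c("education", "school", "working", "job", "jobs",
             "tax", "health", "economic", "budget", "students")

shared   <- intersect(top_dem, top_rep)  # in both top-10 lists
dem_only <- setdiff(top_dem, top_rep)    # only in the Democratic list
rep_only <- setdiff(top_rep, top_dem)    # only in the Republican list
shared; dem_only; rep_only
```

Seven phrases are shared. Health care, business and economy appear only in the Democratic list, and job, jobs and tax only in the Republican one.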

Phrases used by Democrats only, not by Republicans

Phrases_DemsOnly <- sqldf( "SELECT * FROM Phrases_subdata WHERE TRIM(Category) != '' AND Republican_Speeches == 0 ORDER BY Democratic_Speeches DESC LIMIT 15", row.names=FALSE)

knitr::kable(Phrases_DemsOnly, format="html")

Phrase	Category	Democratic_Speeches
minimum wage	economy/fiscal issues	9
gun violence	crime/justice	8
education need	education	7
students state	education	6
gun safety	crime/justice	6
pre existing conditions	health care	5
reproductive health	health care	5
educators deserve	education	5
energy future	energy/environment	5
economy works	economy/fiscal issues	5
existing conditions	health care	5
cost health	health care	5

Phrases used by Republicans only, not by Democrats

Phrases_RepsOnly <- sqldf( "SELECT * FROM Phrases_subdata WHERE TRIM(Category) != '' AND Democratic_Speeches == 0 ORDER BY Republican_Speeches DESC LIMIT 15", row.names=FALSE)

knitr::kable(Phrases_RepsOnly, format="html")

Phrase	Category	Republican_Speeches
doing business	economy/fiscal issues	7
state income	economy/fiscal issues	7
savings account	economy/fiscal issues	5
schools safer	crime/justice	5
local law enforcement	crime/justice	5
prison population	crime/justice	5
local law	crime/justice	5
state income tax	economy/fiscal issues	5
education workforce	education	5
tax rates	economy/fiscal issues	5
fully funding	economy/fiscal issues	5
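The two party-only queries differ only in which count must be zero, so one base R helper can cover both without SQL. The name party_only is hypothetical. Like the queries, the helper drops blank categories (here with trimws()) and keeps the phrase, its category and the party's own count. The example rows are copied from the sample output.

```r
# Hypothetical helper mirroring the sqldf queries: phrases with a category,
# a zero count for the other party, sorted by this party's count
party_only <- function(df, party_col, other_col, n = 15) {
  keep <- !is.na(df$Category) & trimws(df$Category) != "" & df[[other_col]] == 0
  out <- df[keep, c("Phrase", "Category", party_col)]
  out <- out[order(-out[[party_col]]), ]
  rownames(out) <- NULL
  head(out, n)
}

# Rows copied from the head(speech_data) output
sample_rows <- data.frame(
  Phrase = c("minimum wage", "clean energy", "climate change",
             "gun violence", "affordable care", "international"),
  Category = c("economy/fiscal issues", "energy/environment", "energy/environment",
               "crime/justice", "", ""),
  Democratic_Speeches = c(9, 11, 13, 8, 10, 0),
  Republican_Speeches = c(0, 1, 2, 0, 1, 10),
  stringsAsFactors = FALSE
)

party_only(sample_rows, "Democratic_Speeches", "Republican_Speeches")  # Democrats only
party_only(sample_rows, "Republican_Speeches", "Democratic_Speeches")  # Republicans only
```

International (0 Democratic, 10 Republican) is dropped because its category is blank, just as in the SQL version.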

Top Phrases Plot by Democratic Governors

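library(ggplot2)
# Each phrase has a single count, so a bar chart ranked by that count is easier to read
dem_plot <- ggplot(data = top_Dem_Phrases,
  aes(x = Democratic_Speeches, y = reorder(Phrase, Democratic_Speeches))) +
  geom_col() +
  labs(x = "Number of Democratic speeches", y = "Phrase")
dem_plot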

Top Phrases Plot by Republican Governors

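# A boxplot of one value per phrase collapses to a single line, so use ranked bars instead
rep_plot <- ggplot(data = top_Rep_Phrases,
  aes(x = Republican_Speeches, y = reorder(Phrase, Republican_Speeches))) +
  geom_col() +
  labs(x = "Number of Republican speeches", y = "Phrase")
rep_plot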

Let’s see which phrases governors from both parties use in the same number of speeches, first across all categories and then in the top category

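Add a new column, Variance, to the data frame. Despite its name, it holds the absolute difference between a phrase's Democratic and Republican speech counts, not a statistical variance. There are 23 Democratic and 27 Republican speeches, so a difference of 0 (equal counts) corresponds to a slightly larger share of the Democratic speeches.

List the phrases with a Variance of 0 (used in the same number of speeches by each party) that appear in more than five speeches per party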

# Create Sub-set data
Category_subdata <- subset(speech_data, select = c("Phrase","Category","Democratic_Speeches", "Republican_Speeches"))
Category_subdata$Variance <- abs(Category_subdata$Democratic_Speeches - Category_subdata$Republican_Speeches)

Category_match_data <- sqldf( "SELECT * FROM Category_subdata WHERE TRIM(Category) != '' AND Variance == 0 AND Democratic_Speeches > 5", row.names=FALSE)

knitr::kable(Category_match_data, format="html")

Phrase	Category	Democratic_Speeches	Republican_Speeches
cost	economy/fiscal issues	20	20
teachers	education	19	19
employees	economy/fiscal issues	19	19
economic development	economy/fiscal issues	14	14
opioid	mental health/substance abuse	11	11
spend	economy/fiscal issues	11	11
educators	education	10	10
colleges	education	9	9
careers	economy/fiscal issues	9	9
education funding	education	6	6
entrepreneurs	economy/fiscal issues	6	6
substance abuse	mental health/substance abuse	6	6
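The parties gave different numbers of speeches (23 Democratic, 27 Republican). This sketch converts the equal counts in the table above into each party's percentage share, as the dataset's percent columns do, to show how far apart the shares are. The rows are copied from that table, and the column names Speeches_Each and Pct_Point_Gap are illustrative.

```r
# Phrases used in the same number of speeches by both parties (from the table above)
both_parties <- data.frame(
  Phrase = c("cost", "teachers", "employees", "economic development", "opioid"),
  Speeches_Each = c(20, 19, 19, 14, 11),
  stringsAsFactors = FALSE
)

# Share of each party's speeches that contain the phrase
both_parties$Pct_Dem <- 100 * both_parties$Speeches_Each / 23
both_parties$Pct_Rep <- 100 * both_parties$Speeches_Each / 27
# Percentage-point gap; positive means a larger share of Democratic speeches
both_parties$Pct_Point_Gap <- round(both_parties$Pct_Dem - both_parties$Pct_Rep, 2)
both_parties
```

For cost, 20 speeches are about 87% of the Democratic speeches but about 74% of the Republican ones, a gap of 12.88 percentage points.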

List every “Economy/Fiscal Issues” phrase with a Variance of 0 (used in the same number of speeches by each party)

TopCategory_match_data <- sqldf( "SELECT * FROM Category_subdata WHERE Category == 'economy/fiscal issues' AND Variance == 0", row.names=FALSE)

knitr::kable(TopCategory_match_data, format="html")

Phrase	Category	Democratic_Speeches	Republican_Speeches
cost	economy/fiscal issues	20	20
employees	economy/fiscal issues	19	19
economic development	economy/fiscal issues	14	14
spend	economy/fiscal issues	11	11
careers	economy/fiscal issues	9	9
entrepreneurs	economy/fiscal issues	6	6
fiscally	economy/fiscal issues	5	5
tax credit	economy/fiscal issues	5	5
tax relief	economy/fiscal issues	4	4
fully fund	economy/fiscal issues	3	3
business leaders	economy/fiscal issues	3	3
budget includes	economy/fiscal issues	3	3
cut taxes	economy/fiscal issues	3	3
new taxes	economy/fiscal issues	3	3

To conclude, the analysis above shows that governors talk most about economy/fiscal issues, education and health care. In the top-ranked category, economy/fiscal issues, both parties use phrases such as cost, employees, economic development, careers, tax credit, tax relief, cut taxes and new taxes in the same number of speeches. Mental health/substance abuse phrases such as opioid and substance abuse are also used equally by both parties.

Democratic governors also talk about minimum wage, gun violence and “education need”, phrases that never appear in Republican speeches.

DATA 607 Homework
Words used in Speeches for State Of The States

Bikram Barua

8/25/2021


On the other hand, Republican governors talk about doing business, state income and savings account, phrases that never appear in Democratic speeches.
