Hypothesis

For this text analysis, my hypothesis was that during the 2020 Democratic debates, female pronouns would be surrounded by more negative words and male pronouns by more positive words. This is based on the fact that politics in the United States is largely dominated by men. There has never been a female president in the US, and the majority of the people on the debate stage were male, so I expected the words surrounding female pronouns to be more negative. The hypothesis is also based on the fact that debates are very competitive, so the male candidates might try to paint the female candidates in a negative light.

R Requirements

1. tidytext

2. tidyverse

3. textdata

4. Debate transcripts (data set found on Kaggle, produced from transcripts on rev.com)

library(tidytext)
library(tidyverse)
library(textdata)
debates <- readr::read_csv("/Users/Genna/Desktop/textanalysis/debate_transcripts.csv")

The data set that I used to test my hypothesis was found online at Kaggle. It takes the transcripts of 9 of the 2020 Democratic debates (from July 30, 2019 to March 25, 2020) from rev.com and transforms them into an R-friendly format. The variables included are as follows.

Variable                 Explanation
date                     date on which the debate took place
debate_name              name of the debate (including location)
debate_section           section of the debate (i.e. part 1, gun control)
speaker                  candidate speaking
speech                   what was said by the speaker
speaking_time_seconds    the amount of time spent speaking (seconds)

Figure 1. Table of all the variables and their descriptions.

In order to test my hypothesis I specifically focused on date, speaker and speech.

Methodology

I decided that the best way to test my hypothesis was to look at each speaker individually. To accomplish this, I filtered the full “debates” data set and created a separate data set for each candidate. I only used the candidates who were present in all of the debates, because they would have the most speech in total and they all took part in the same number of debates, which seemed the most fair. This left me with Joe Biden, Bernie Sanders, Elizabeth Warren, Amy Klobuchar and Pete Buttigieg.
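The report does not say how attendance was checked, so the following is only one possible way to find the speakers who appear in every debate. It is a minimal sketch on a toy table invented for illustration; the names toy_debates, n_debates and always_present are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Toy stand-in for the debates data (invented values, illustration only).
toy_debates <- tibble(
  debate_name = c("Debate 1", "Debate 1", "Debate 2", "Debate 2", "Debate 2"),
  speaker     = c("Speaker A", "Speaker B", "Speaker A", "Speaker A", "Speaker C")
)

# Total number of distinct debates in the data.
n_debates <- n_distinct(toy_debates$debate_name)

# Keep only speakers who spoke in every one of those debates.
always_present <- toy_debates %>%
  group_by(speaker) %>%
  summarise(debates_attended = n_distinct(debate_name)) %>%
  filter(debates_attended == n_debates) %>%
  pull(speaker)

always_present
```

On the real data, the same pattern would be applied to the debate_name and speaker columns of debates.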

debates %>% 
  filter(speaker %in% "Amy Klobuchar") -> AmyText
debates %>% 
  filter(speaker %in% "Joe Biden") -> BidenText
debates %>% 
  filter(speaker %in% "Pete Buttigieg") -> PeteText
debates %>% 
  filter(speaker %in% "Bernie Sanders") -> BernieText
debates %>% 
  filter(speaker %in% "Elizabeth Warren") -> lizText

I created data sets including all the speech from each of the five candidates I chose to analyze. From there I separated the speech into bigrams (groups of two consecutive words). I used pairs of words because I wanted to be able to find what comes both directly before and directly after male and female pronouns in the candidates' speeches.

BernieText %>% unnest_tokens(bigram, speech, token="ngrams", n=2) ->Sanders_bigrams
Sanders_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") -> sanders_bigrams_seperated


BidenText %>% unnest_tokens(bigram, speech, token="ngrams", n=2) ->biden_bigrams
biden_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") -> biden_bigrams_seperated

PeteText %>% unnest_tokens(bigram, speech, token="ngrams", n=2) ->pete_bigrams
pete_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") -> pete_bigrams_seperated

AmyText %>% unnest_tokens(bigram, speech, token="ngrams", n=2) ->amy_bigrams
amy_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") -> amy_bigrams_seperated


lizText %>% unnest_tokens(bigram, speech, token="ngrams", n=2) ->liz_bigrams
liz_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") -> liz_bigrams_seperated
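The five blocks above repeat the same two steps for each candidate. As a hedged sketch, the steps could be wrapped in one function. The helper speaker_bigrams and the toy data below are hypothetical and only illustrate the pattern; they are not part of the original analysis.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical helper: keep one speaker's rows, then split the speech
# into adjacent word pairs stored in word1 and word2.
speaker_bigrams <- function(data, speaker_name) {
  data %>%
    filter(speaker == speaker_name) %>%
    unnest_tokens(bigram, speech, token = "ngrams", n = 2) %>%
    separate(bigram, c("word1", "word2"), sep = " ")
}

# Toy stand-in for the debates data (invented text, illustration only).
toy_debates <- tibble(
  speaker = c("Speaker A", "Speaker B"),
  speech  = c("She told me the plan", "He has a plan")
)

toy_bigrams <- speaker_bigrams(toy_debates, "Speaker A")
toy_bigrams
```

Note that unnest_tokens() lowercases the text by default, so every value in word1 and word2 is lowercase.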

Below is one example of the output I got when separating the speech this way.

liz_bigrams_seperated 
## # A tibble: 27,121 x 7
##    date   debate_name   debate_section speaker speaking_time_s… word1 word2
##    <chr>  <chr>         <chr>          <chr>              <dbl> <chr> <chr>
##  1 1/14/… January Iowa… Entire Debate  Elizab…                0 30    years
##  2 1/14/… January Iowa… Entire Debate  Elizab…                0 years ago  
##  3 1/14/… January Iowa… Entire Debate  Elizab…                0 ago   it's 
##  4 1/14/… January Iowa… Entire Debate  Elizab…                0 it's  what 
##  5 1/14/… January Iowa… Entire Debate  Elizab…                0 what  we   
##  6 1/14/… January Iowa… Entire Debate  Elizab…                0 we    need 
##  7 1/14/… January Iowa… Entire Debate  Elizab…                0 need  to   
##  8 1/14/… January Iowa… Entire Debate  Elizab…                0 to    do   
##  9 1/14/… January Iowa… Entire Debate  Elizab…                1 we    need 
## 10 1/14/… January Iowa… Entire Debate  Elizab…                1 need  a    
## # … with 27,111 more rows

Figure 2. This is the output I got when separating Elizabeth Warren’s speech into bigrams. There is no longer a “speech” variable; instead there are “word1” and “word2” columns that I created to make the speech easier to analyze. These are the columns I will focus on for the bulk of the analysis.

The next step was to create vectors of male and female pronouns so that I could search for them within the debate data. When analyzing the female candidates I counted “I” and “me” as female pronouns, and when analyzing the male candidates I counted “I” and “me” as male pronouns. This meant that I had to create two different vectors for each pronoun group.

# Note: unnest_tokens() lowercases text by default, so the capitalized "I"
# below does not match any token in word1 or word2 as written.

# Pronouns used for the female candidates ("I" and "me" count as female)
pronounsFF <- c( "she", "hers", "her", "I", "me", "herself")
pronounsMF <- c("he", "his", "him", "himself")

# Pronouns used for the male candidates ("I" and "me" count as male)
pronounsFM <- c( "she", "hers", "her","herself")
pronounsMM <-c("he", "his", "him", "himself", "I", "me")

Now that the speech is separated into bigrams and there are vectors for the pronouns associated with each gender, the next step was to find the pronouns within the bigrams and note which words came directly before and after them. In order to do this I had to create four different variables for each candidate.

1. word before female: the word spoken directly before a female pronoun

2. word before male: the word spoken directly before a male pronoun

3. word after female: the word spoken directly after a female pronoun

4. word after male: the word spoken directly after a male pronoun

(Stop words were removed from all of these variables.)

liz_bigrams_filtered1F <- liz_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsFF) 
liz_bigrams_filtered1M <- liz_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsMF) 
liz_bigrams_filtered2F <- liz_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsFF) 
liz_bigrams_filtered2M <- liz_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsMF) 

liz_bigrams_filtered1F$word1 -> liz_word_before_female
liz_bigrams_filtered1M$word1 ->liz_word_before_male
as.data.frame(liz_word_before_female) -> word1_liz_female
as.data.frame(liz_word_before_male)-> word1_liz_male

liz_bigrams_filtered2F$word2 -> liz_word_after_female
liz_bigrams_filtered2M$word2 -> liz_word_after_male 
as.data.frame(liz_word_after_female) ->word2_liz_female
as.data.frame(liz_word_after_male) ->word2_liz_male


amy_bigrams_filtered1F <- amy_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsFF) 
amy_bigrams_filtered1M <- amy_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsMF) 
amy_bigrams_filtered2F <-amy_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsFF) 
amy_bigrams_filtered2M <- amy_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsMF) 

amy_bigrams_filtered1F$word1 -> amy_word_before_female
amy_bigrams_filtered1M$word1 ->amy_word_before_male
as.data.frame(amy_word_before_female) -> word1_amy_female
as.data.frame(amy_word_before_male)-> word1_amy_male

amy_bigrams_filtered2F$word2 -> amy_word_after_female
amy_bigrams_filtered2M$word2 -> amy_word_after_male 
as.data.frame(amy_word_after_female) ->word2_amy_female
as.data.frame(amy_word_after_male) ->word2_amy_male


pete_bigrams_filtered1F <- pete_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsFM) 
pete_bigrams_filtered1M <- pete_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsMM) 
pete_bigrams_filtered2F <- pete_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsFM) 
pete_bigrams_filtered2M <- pete_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsMM) 

pete_bigrams_filtered1F$word1 -> pete_word_before_female
pete_bigrams_filtered1M$word1 ->pete_word_before_male
as.data.frame(pete_word_before_female) -> word1_pete_female
as.data.frame(pete_word_before_male)-> word1_pete_male

pete_bigrams_filtered2F$word2 -> pete_word_after_female
pete_bigrams_filtered2M$word2 -> pete_word_after_male 
as.data.frame(pete_word_after_female) ->word2_pete_female
as.data.frame(pete_word_after_male) ->word2_pete_male


biden_bigrams_filtered1F <- biden_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsFM) 
biden_bigrams_filtered1M <- biden_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsMM) 
biden_bigrams_filtered2F <- biden_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsFM) 
biden_bigrams_filtered2M <- biden_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsMM) 

biden_bigrams_filtered1F$word1 -> biden_word_before_female
biden_bigrams_filtered1M$word1 ->biden_word_before_male
as.data.frame(biden_word_before_female) -> word1_biden_female
as.data.frame(biden_word_before_male)-> word1_biden_male

biden_bigrams_filtered2F$word2 -> biden_word_after_female
biden_bigrams_filtered2M$word2 -> biden_word_after_male 
as.data.frame(biden_word_after_female) ->word2_biden_female
as.data.frame(biden_word_after_male) ->word2_biden_male


sanders_bigrams_filtered1F <- sanders_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsFM) 
sanders_bigrams_filtered1M <- sanders_bigrams_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(word2 %in% pronounsMM) 
sanders_bigrams_filtered2F <- sanders_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsFM) 
sanders_bigrams_filtered2M <- sanders_bigrams_seperated %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(word1 %in% pronounsMM) 

sanders_bigrams_filtered1F$word1 -> sanders_word_before_female
sanders_bigrams_filtered1M$word1 ->sanders_word_before_male
as.data.frame(sanders_word_before_female) -> word1_sanders_female
as.data.frame(sanders_word_before_male)-> word1_sanders_male

sanders_bigrams_filtered2F$word2 -> sanders_word_after_female
sanders_bigrams_filtered2M$word2 -> sanders_word_after_male 
as.data.frame(sanders_word_after_female) ->word2_sanders_female
as.data.frame(sanders_word_after_male) ->word2_sanders_male
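The filtering above follows one pattern: keep bigrams where one word is a pronoun and the other is not a stop word. A minimal sketch of that pattern as a single function is shown here. The helper context_words and the toy bigrams are hypothetical. Unlike the code above, the sketch lowercases the pronoun vector so that “I” can match the lowercase tokens produced by unnest_tokens().

```r
library(dplyr)
library(tidytext)  # provides the stop_words data set

# Hypothetical helper: return the non-stop-words found directly before
# or directly after any pronoun in `pronouns`.
context_words <- function(bigrams, pronouns, position = c("before", "after")) {
  position <- match.arg(position)
  pronouns <- tolower(pronouns)  # tokens are lowercase, so match in lowercase
  if (position == "before") {
    bigrams %>%
      filter(word2 %in% pronouns, !word1 %in% stop_words$word) %>%
      pull(word1)
  } else {
    bigrams %>%
      filter(word1 %in% pronouns, !word2 %in% stop_words$word) %>%
      pull(word2)
  }
}

# Toy bigrams (invented, illustration only).
toy_bigrams <- tibble(
  word1 = c("she", "called", "me", "i", "protect"),
  word2 = c("called", "me", "the", "trust", "her")
)
pronouns_female <- c("she", "hers", "her", "I", "me", "herself")

before_words <- context_words(toy_bigrams, pronouns_female, "before")
after_words  <- context_words(toy_bigrams, pronouns_female, "after")
```

Because of the lowercasing step, results from this sketch would include words next to “i” and so would not necessarily reproduce the counts reported in this analysis.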

Below is an example of what the variables and data look like after performing these steps.

This is the output for the words that come before male and female pronouns from the speaker Elizabeth Warren.

word1_liz_female
##    liz_word_before_female
## 1                 brought
## 2                  called
## 3                    told
## 4                    join
## 5                  letter
## 6                    told
## 7                  police
## 8                   meant
## 9                  mother
## 10                partner
## 11              addressed
## 12              mentioned
## 13                    hug
## 14                   join
## 15            identifying
## 16                 called
## 17                trashed
## 18               terrific
## 19                 wished
## 20                protect
## 21                  trust
## 22                   send
## 23                  class
## 24                brought
## 25                   send
## 26                 wished
word1_liz_male
##    liz_word_before_male
## 1                  beat
## 2               sanders
## 3                  send
## 4               helping
## 5                    45
## 6                  push
## 7             dictators
## 8            understand
## 9              enriched
## 10                syria
## 11               crisis
## 12                  job
## 13              america
## 14           originally
## 15                 wait
## 16           fundraiser
## 17             diabetes
## 18             diabetes
## 19            neighbors
## 20             language
## 21               hiding
## 22             changing
## 23              elected
## 24            bloomberg
## 25                trust
## 26                means
## 27                 2016
## 28                 2012
## 29              release
## 30             released
## 31                taxes
## 32              protect
## 33                hands
## 34               rachel
## 35                 carl
## 36              killing
## 37                 care
## 38                cover
## 39                month
## 40                 beat

Afinn Sentiment Analysis

From here I took each of the variables (before/after, male/female) for each speaker and ran an “afinn” sentiment analysis. This sentiment analysis assigns scores between -5 and 5 to words (-5 being the most negative sentiment and 5 being the most positive). It is important to note that not every word has a sentiment value; words without a score are dropped by the join and treated as “neutral”. While this removes a lot of words from my data set, it is still telling, because the removed words are considered to carry no strong positive or negative connotation.

Speaker/Pronoun    Words before sentiment analysis    Words after sentiment analysis
Sanders/Female     8                                  1
Sanders/Male       67                                 20
Biden/Female       11                                 0
Biden/Male         193                                24
Pete/Female        15                                 1
Pete/Male          81                                 12
Amy/Female         49                                 9
Amy/Male           164                                21
Warren/Female      45                                 8
Warren/Male        77                                 11

Figure 3. Table of the number of words found around each pronoun group before and after afinn sentiment analysis, by speaker and pronoun gender.
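To make the drop in word counts concrete, here is a minimal sketch of how an inner join against a sentiment lexicon removes unscored words. The lexicon below is a toy table with the same column names as get_sentiments("afinn") (word and value); its scores are invented for illustration and are not real AFINN values.

```r
library(dplyr)

# Toy lexicon (invented scores, illustration only).
toy_lexicon <- tibble(
  word  = c("trust", "protect", "crisis"),
  value = c(1, 1, -3)
)

# Toy context words, as if taken from around a pronoun.
context <- tibble(context_word = c("trust", "letter", "crisis", "trust"))

# inner_join() keeps only words that appear in the lexicon, so "letter"
# is dropped; count() then tallies each remaining word.
scored <- context %>%
  inner_join(toy_lexicon, by = c(context_word = "word")) %>%
  count(context_word, value, sort = TRUE)

scored

# Number of context words lost because they have no score.
dropped <- nrow(context) - sum(scored$n)
dropped
```

The same comparison of row counts before and after the join is what the table in Figure 3 reports for each speaker and pronoun group.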

word1_sanders_female_afinn <- word1_sanders_female%>%
  inner_join(get_sentiments("afinn"), by = c(sanders_word_before_female = "word")) %>% 
  count(sanders_word_before_female, value, sort= TRUE) %>% 
  ungroup()
word1_sanders_male_afinn <- word1_sanders_male%>%
  inner_join(get_sentiments("afinn"), by = c(sanders_word_before_male = "word")) %>% 
  count(sanders_word_before_male, value, sort= TRUE) %>% 
  ungroup()
word2_sanders_female_afinn <- word2_sanders_female%>%
  inner_join(get_sentiments("afinn"), by = c(sanders_word_after_female = "word")) %>% 
  count(sanders_word_after_female, value, sort= TRUE) %>% 
  ungroup()
word2_sanders_male_afinn <- word2_sanders_male%>%
  inner_join(get_sentiments("afinn"), by = c(sanders_word_after_male = "word")) %>% 
  count(sanders_word_after_male, value, sort= TRUE) %>% 
  ungroup()

word1_biden_female_afinn <- word1_biden_female%>%
  inner_join(get_sentiments("afinn"), by = c(biden_word_before_female = "word")) %>% 
  count(biden_word_before_female, value, sort= TRUE) %>% 
  ungroup()
word1_biden_male_afinn <- word1_biden_male%>%
  inner_join(get_sentiments("afinn"), by = c(biden_word_before_male = "word")) %>% 
  count(biden_word_before_male, value, sort= TRUE) %>% 
  ungroup()
word2_biden_female_afinn <- word2_biden_female%>%
  inner_join(get_sentiments("afinn"), by = c(biden_word_after_female = "word")) %>% 
  count(biden_word_after_female, value, sort= TRUE) %>% 
  ungroup()
word2_biden_male_afinn <- word2_biden_male%>%
  inner_join(get_sentiments("afinn"), by = c(biden_word_after_male = "word")) %>% 
  count(biden_word_after_male, value, sort= TRUE) %>% 
  ungroup()

word1_pete_female_afinn <- word1_pete_female %>% 
  inner_join(get_sentiments("afinn"), by = c(pete_word_before_female = "word")) %>% 
  count(pete_word_before_female, value, sort= TRUE) %>% 
  ungroup()
word1_pete_male_afinn <-word1_pete_male %>% 
  inner_join(get_sentiments("afinn"), by= c(pete_word_before_male = "word")) %>% 
  count(pete_word_before_male, value, sort= TRUE) %>% 
  ungroup()
word2_pete_female_afinn <- word2_pete_female %>% 
  inner_join(get_sentiments("afinn"), by = c(pete_word_after_female = "word")) %>% 
  count(pete_word_after_female, value, sort= TRUE) %>% 
  ungroup()
word2_pete_male_afinn <- word2_pete_male %>%
  inner_join(get_sentiments("afinn"), by= c(pete_word_after_male= "word")) %>% 
  count(pete_word_after_male, value, sort= TRUE) %>% 
  ungroup()

word1_amy_female_afinn <- word1_amy_female%>%
  inner_join(get_sentiments("afinn"), by = c(amy_word_before_female = "word")) %>% 
  count(amy_word_before_female, value, sort= TRUE) %>% 
  ungroup()
word1_amy_male_afinn <- word1_amy_male %>% 
  inner_join(get_sentiments("afinn"), by= c(amy_word_before_male = "word")) %>% 
  count(amy_word_before_male, value, sort= TRUE) %>% 
  ungroup()
word2_amy_female_afinn <-word2_amy_female %>% 
  inner_join(get_sentiments("afinn"), by = c(amy_word_after_female = "word")) %>% 
  count(amy_word_after_female, value, sort= TRUE) %>% 
  ungroup()
word2_amy_male_afinn <- word2_amy_male %>%
  inner_join(get_sentiments("afinn"), by= c(amy_word_after_male= "word")) %>% 
  count(amy_word_after_male, value, sort= TRUE) %>% 
  ungroup()

word1_liz_female_afinn <- word1_liz_female%>%
  inner_join(get_sentiments("afinn"), by = c(liz_word_before_female = "word")) %>% 
  count(liz_word_before_female, value, sort= TRUE) %>% 
  ungroup()
word1_liz_male_afinn <- word1_liz_male %>% 
  inner_join(get_sentiments("afinn"), by= c(liz_word_before_male = "word")) %>% 
  count(liz_word_before_male, value, sort= TRUE) %>% 
  ungroup()
word2_liz_female_afinn <-word2_liz_female %>% 
  inner_join(get_sentiments("afinn"), by = c(liz_word_after_female = "word")) %>% 
  count(liz_word_after_female, value, sort= TRUE) %>% 
  ungroup()
word2_liz_male_afinn <- word2_liz_male %>%
  inner_join(get_sentiments("afinn"), by= c(liz_word_after_male= "word")) %>% 
  count(liz_word_after_male, value, sort= TRUE) %>% 
  ungroup()

Below are the bar graphs showing the negative and positive words that come before and after both female and male pronouns, separated by speaker.

Bernie Sanders

word1_sanders_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(sanders_word_before_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word1_sanders_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(sanders_word_before_male, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_sanders_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(sanders_word_after_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_sanders_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(sanders_word_after_male, value), value)) + 
  geom_col() + 
  coord_flip() 

Joe Biden

word1_biden_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(biden_word_before_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word1_biden_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(biden_word_before_male, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_biden_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(biden_word_after_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_biden_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(biden_word_after_male, value), value)) + 
  geom_col() + 
  coord_flip() 

Pete Buttigieg

word1_pete_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(pete_word_before_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word1_pete_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(pete_word_before_male, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_pete_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(pete_word_after_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_pete_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(pete_word_after_male, value), value)) + 
  geom_col() + 
  coord_flip() 

Amy Klobuchar

word1_amy_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(amy_word_before_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word1_amy_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(amy_word_before_male, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_amy_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(amy_word_after_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_amy_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(amy_word_after_male, value), value)) + 
  geom_col() + 
  coord_flip() 

Elizabeth Warren

word1_liz_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(liz_word_before_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word1_liz_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(liz_word_before_male, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_liz_female_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(liz_word_after_female, value), value)) + 
  geom_col() + 
  coord_flip() 

word2_liz_male_afinn %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  ggplot(aes(reorder(liz_word_after_male, value), value)) + 
  geom_col() + 
  coord_flip() 

Initial Analysis

From these graphs I drew a few initial conclusions. First, male pronouns are used more often overall. This makes sense because more of the candidates are male, and the current president of the United States, who is male, was referenced a lot during the debates. It is also clear from these first graphs that the female candidates use female pronouns much more often than the male candidates do. After sentiment analysis, there are only two instances of male candidates using female pronouns, and in both cases the words have a negative connotation. To show this more clearly, I created graphs of the total sentiment score per speaker for both male and female pronouns. In doing this I attempted to show the overall sentiment by speaker on the same scale and make it easier to draw conclusions about the sentiment surrounding these pronouns.

Total

Bernie Sanders

word1_sanders_male_afinn_total <-word1_sanders_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word1_sanders_female_afinn_total<- word1_sanders_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

word2_sanders_male_afinn_total <-word2_sanders_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word2_sanders_female_afinn_total<- word2_sanders_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

bind_rows(word2_sanders_male_afinn_total, word2_sanders_female_afinn_total, 
          word1_sanders_male_afinn_total, word1_sanders_female_afinn_total) %>% 
  ggplot(aes(pronoun,sentiment, fill=pronoun)) +
  geom_col(show.legend = FALSE)

This graph shows that in total, the words that Bernie Sanders used around female pronouns had an afinn sentiment score of -2 and the words he used around male pronouns had an afinn sentiment score of -14.

Joe Biden

This shows that during the debates when Joe Biden spoke, all of the words he used surrounding female pronouns had a neutral connotation while the words he used before and after male pronouns had an afinn sentiment score of -12.

word1_biden_male_afinn_total <-word1_biden_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word1_biden_female_afinn_total<- word1_biden_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

word2_biden_male_afinn_total <-word2_biden_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word2_biden_female_afinn_total<- word2_biden_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

bind_rows(word2_biden_male_afinn_total, word2_biden_female_afinn_total, 
          word1_biden_male_afinn_total, word1_biden_female_afinn_total) %>% 
  ggplot(aes(pronoun,sentiment, fill=pronoun)) +
  geom_col(show.legend = FALSE)

Pete Buttigieg

word1_pete_male_afinn_total <-word1_pete_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word1_pete_female_afinn_total<- word1_pete_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

word2_pete_male_afinn_total <-word2_pete_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word2_pete_female_afinn_total<- word2_pete_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

bind_rows(word2_pete_male_afinn_total, word2_pete_female_afinn_total, 
          word1_pete_male_afinn_total, word1_pete_female_afinn_total) %>% 
  ggplot(aes(pronoun,sentiment, fill=pronoun)) +
  geom_col(show.legend = FALSE)

This graph shows that the words that candidate Pete Buttigieg used surrounding female pronouns had an afinn sentiment score of -1 and the words he used around male pronouns had a score of positive 14.

Amy Klobuchar

This graph shows that the words Amy Klobuchar used surrounding female pronouns had a total afinn sentiment score of -3 and the words she used surrounding male pronouns had a total afinn sentiment score of -11.

word1_amy_male_afinn_total <-word1_amy_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word1_amy_female_afinn_total<- word1_amy_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

word2_amy_male_afinn_total <-word2_amy_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word2_amy_female_afinn_total<- word2_amy_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

bind_rows(word2_amy_male_afinn_total, word2_amy_female_afinn_total, 
          word1_amy_male_afinn_total, word1_amy_female_afinn_total) %>% 
  ggplot(aes(pronoun,sentiment, fill=pronoun)) +
  geom_col(show.legend = FALSE)

Elizabeth Warren

word1_liz_male_afinn_total <-word1_liz_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word1_liz_female_afinn_total<- word1_liz_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

word2_liz_male_afinn_total <-word2_liz_male_afinn %>%  
  summarise(sentiment= sum(value)) %>% 
  mutate(pronoun="Male") 

word2_liz_female_afinn_total<- word2_liz_female_afinn %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(pronoun="Female")

bind_rows(word2_liz_male_afinn_total, word2_liz_female_afinn_total, 
          word1_liz_male_afinn_total, word1_liz_female_afinn_total) %>% 
  ggplot(aes(pronoun,sentiment, fill=pronoun)) +
  geom_col(show.legend = FALSE)

This graph shows that the words that Elizabeth Warren used with female pronouns have an afinn sentiment score of 6 and the words she used with male pronouns have an afinn sentiment score of -3.

Total

This graph combines all of the speakers and shows the total sentiment score of all the scored words spoken around male and female pronouns during the debates. It shows that overall the words used with female pronouns are equally negative and positive, while the words used with male pronouns are overwhelmingly negative. However, it is important to remember (as restated in the conclusion) that male pronouns were used many more times overall, which could skew the results.

bind_rows(word2_liz_male_afinn_total, word2_liz_female_afinn_total, 
          word1_liz_male_afinn_total, word1_liz_female_afinn_total,
          word2_amy_male_afinn_total, word2_amy_female_afinn_total, 
          word1_amy_male_afinn_total, word1_amy_female_afinn_total,
          word2_pete_male_afinn_total, word2_pete_female_afinn_total, 
          word1_pete_male_afinn_total, word1_pete_female_afinn_total,
          word2_biden_male_afinn_total, word2_biden_female_afinn_total, 
          word1_biden_male_afinn_total, word1_biden_female_afinn_total,
          word2_sanders_male_afinn_total, word2_sanders_female_afinn_total, 
          word1_sanders_male_afinn_total, word1_sanders_female_afinn_total) %>% 
  ggplot(aes(pronoun,sentiment, fill=pronoun)) +
  geom_col(show.legend = FALSE)
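Because the totals above are sums, a group with more scored words will tend to have a larger total in either direction. One possible alternative, which was not used in this analysis, is to compare a mean sentiment per scored word. Summing value on a table produced by count() also adds each distinct word only once, so the sketch weights each score by its count n. The helper sentiment_summary and the toy tables are hypothetical, and the scores are invented.

```r
library(dplyr)

# Hypothetical helper: summarise a counted sentiment table (columns
# `value` and `n`, as produced by count()) into a count-weighted total
# and a mean score per scored word.
sentiment_summary <- function(counted, pronoun) {
  counted %>%
    summarise(
      words          = sum(n),
      weighted_total = sum(value * n),
      mean_sentiment = sum(value * n) / sum(n)
    ) %>%
    mutate(pronoun = pronoun)
}

# Toy counted tables (invented values, illustration only).
toy_male <- tibble(
  word  = c("beat", "trust", "crisis"),
  value = c(-1, 1, -3),
  n     = c(4, 2, 2)
)
toy_female <- tibble(
  word  = c("trust", "hug"),
  value = c(1, 2),
  n     = c(1, 1)
)

comparison <- bind_rows(
  sentiment_summary(toy_male, "Male"),
  sentiment_summary(toy_female, "Female")
)
comparison
```

A mean puts groups of different sizes on a comparable scale, although with very small groups it would still be sensitive to single words.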

Conclusion

Looking at all of the evidence compiled in this report, the original hypothesis is not supported, or at least lacks sufficient evidence. After running multiple filters on the data to isolate the different categories, the sample size became very small. The sentiment scores are also skewed because far more words were used with male pronouns in the first place, and again after sentiment analysis. More words contribute to the male sentiment totals, so comparing the summed scores on the same scale is not a fair measure. However, while the data ended up being inconclusive for the original hypothesis, it did show that even though there are female candidates participating in the debates, male pronouns are still used far more often. It also showed evidence that female candidates are more likely to speak about other women. This matches what most people would probably assume, and it is supported by this analysis, specifically by the sentiment graphs and by the table showing the frequencies before and after analysis (Figure 3). Overall, I would conclude that, looking only at these candidates in the 2020 Democratic debates, there is not enough evidence to confidently state that either male or female pronouns are used in association with more negative words.