The state of Texas adopted death by lethal injection in 1977. The first prisoner was lethally injected on December 7, 1982. His last words were, “I, at this very moment, have absolutely no fear of what may happen to this body… Verily unto Allah do we belong, verily unto him do we return. Be strong.” Since then, 559 other inmates have been lethally injected by the Texas Department of Criminal Justice.
In this project, I explore the word frequency and sentiment of the last statements made by death row inmates before their execution. I hypothesize that the frequency of positive words in these last statements will be at least five percent higher than the frequency of negative words. After all, who wants their last words on Earth to be negative? Additionally, I hypothesize that the top five most commonly used words will all be assigned a positive sentiment by the bing lexicon.
The original dataset used for this project contains information about criminals executed by the Texas Department of Criminal Justice from 1982 to November 8, 2017. The Kaggle dataset was slightly out of date, so I added 15 new observations from the Texas Department of Criminal Justice website to its existing 545 observations. The compiled data used in this project was last updated on March 1, 2019 and contains 560 observations. The last statements varied greatly in length, from 0 words (the 116 inmates who chose not to give a last statement) to 1,292 words. About 21 percent of death row inmates (116 of 560) decided not to give a last statement. Among those who did, the average statement was 112 words long and the median was 78.5 words. The code below shows how I calculated these figures and created my variables.
library(tidyverse)
library(tidytext)
library(dplyr)
#load dataset
lastWords <- read_csv("lastwords.csv")
#find occurrences of no last statement
lastWords %>%
  filter(LastStatement == "None") -> tidyNone
#what is the number of words in each statement?
#(runs of non-word characters plus one, so this is an approximation:
#trailing punctuation and apostrophes add to the count)
lastWords %>%
  filter(!LastStatement %in% "None") %>%
  mutate(numWords = lengths(gregexpr("\\W+", LastStatement)) + 1) -> lastwords2
#average number of words said
mean(lastwords2[["numWords"]])
#range of number of words said
range(lastwords2[["numWords"]])
#median of number of words said
median(lastwords2[["numWords"]])
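The `gregexpr("\\W+", ...)` count above treats every run of non-word characters as a word boundary, so a statement that ends in punctuation is counted as one word longer than it is. As a rough cross-check, the sketch below compares that approach with a simple whitespace split on two made-up statements. The helper names `count_boundary` and `count_whitespace` are hypothetical and not part of the original analysis, and the reported figures (112, 78.5, 1,292) come from the boundary-based count, not from this alternative.

```r
# Two made-up statements, used only for illustration
statements <- c("I love you all.", "Be strong")

# Approach used above: runs of non-word characters, plus one
count_boundary <- function(x) lengths(gregexpr("\\W+", x)) + 1

# Alternative (an assumption, not the author's method):
# split on whitespace after trimming the ends
count_whitespace <- function(x) lengths(strsplit(trimws(x), "\\s+"))

boundary_counts <- count_boundary(statements)
whitespace_counts <- count_whitespace(statements)

# The trailing period in the first statement adds one extra "word"
# to the boundary-based count
print(data.frame(statements, boundary_counts, whitespace_counts))
```

If the two methods disagree noticeably on the real data, the summary statistics would shift slightly, but the overall picture of widely varying statement lengths should not.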
I first found the words most frequently used by inmates in their last statements before execution, then graphed all of them in a word cloud and the top 15 in a bar chart. My code and visualizations are below.
tidyLastWords <- lastWords %>%
  unnest_tokens(word, LastStatement)
#get rid of stop words
tidyLastWords %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE) -> tidyLastWordsSorted
#make bar chart of top 15 last words
tidyLastWordsSorted %>%
  top_n(15, n) %>%
  ggplot(aes(reorder(word, n), n)) +
  geom_col(fill = "#00BA38") +
  coord_flip() +
  labs(y = "Word Frequency", x = "Word",
       title = "Frequency of Top 15 Words") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text = element_text(size = 15),
        axis.title = element_text(size = 20, face = "bold"),
        plot.title = element_text(size = 18, face = "bold")) -> top15
top15
#word cloud of last words
library(wordcloud2)
wordcloud2(tidyLastWordsSorted, size = 1.5)
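To make the tokenizing and stop-word steps above easier to follow, here is a minimal sketch that runs the same pipeline on two made-up statements. The `toy` data frame and its contents are invented for illustration, and the result says nothing about the real dataset.

```r
library(dplyr)
library(tidytext)

# Made-up statements, for illustration only
toy <- tibble(
  id = 1:2,
  LastStatement = c("I love you all", "Tell my family I love them")
)

toy_counts <- toy %>%
  unnest_tokens(word, LastStatement) %>%   # one lowercased word per row
  anti_join(stop_words, by = "word") %>%   # drop common function words
  count(word, sort = TRUE)                 # most frequent words first

toy_counts
```

Because `unnest_tokens()` lowercases by default, "God" and "god" would be counted together in the real data, which is why the word appears in lowercase in the charts.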
The top five words - love, family, God, life, and hope - appear to be strikingly positive for an inmate facing imminent death. Love, by far the most commonly used word, appeared 799 times. The word family was used 361 times, God 243 times, life 168 times, and hope 167 times.
I then visualized the overall sentiment of the last statements of all inmates who gave a last statement. I used the bing lexicon to assign each word either a positive or a negative sentiment.
#get bing sentiments
LWSentiment <- tidyLastWords %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
#total positive and negative word occurrences (sum the per-word counts)
LWSentiment2 <- LWSentiment %>%
  group_by(sentiment) %>%
  summarise(n = sum(n)) %>%
  ungroup()
#graph
LWSentiment2 %>%
  ggplot(aes(sentiment, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = c(negative = "#F8766D", positive = "#00BFC4")) +
  labs(title = "Sentiment of Last Words", y = "Frequency", x = "Sentiment") +
  coord_flip() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text = element_text(size = 15),
        axis.title = element_text(size = 20, face = "bold"),
        plot.title = element_text(size = 18, face = "bold"))
I found that, overall, not just in the top 15 words, the sentiment of the inmates’ last statements was more positive than negative. Inmates used 2,925 positive words and 1,589 negative words in their last statements.
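As a quick check on these totals, the sketch below turns the two reported counts into shares of all sentiment-bearing words. It also separates the gap in percentage points from the ratio of positive to negative words, since the two are easy to confuse. The variable names are mine, and only the two counts come from the analysis.

```r
# Counts reported above
positive <- 2925
negative <- 1589
total <- positive + negative   # words matched by the bing lexicon

positive_share <- positive / total
negative_share <- negative / total

# Gap in percentage points, and positive count relative to negative count
point_gap <- 100 * (positive_share - negative_share)
relative_ratio <- positive / negative

round(100 * c(positive = positive_share, negative = negative_share), 1)
round(point_gap, 1)
round(relative_ratio, 2)
```

The shares come out to roughly 65 and 35 percent, a gap of about 30 percentage points, while the positive count is about 1.8 times the negative count.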
I then visualized the top ten words that had a positive sentiment and the top ten words that had a negative sentiment using the bing lexicon.
LWSentiment %>%
  group_by(sentiment) %>%
  top_n(10, n) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  coord_flip() +
  labs(y = "Contribution to Sentiment", x = NULL,
       title = "Top 10 Positive and Negative Words") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text = element_text(size = 15),
        axis.title = element_text(size = 20, face = "bold"),
        plot.title = element_text(size = 18, face = "bold"))
I also visualized this using a comparison cloud where the most commonly used words are separated by sentiment.
#make positive to negative word cloud
library(reshape2)
library(wordcloud)
tidyLastWords %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"),
                   max.words = 100)
Love remained the most used positive word by far, with thank and peace as the next most common, but far less used, words. On the negative side, sorry was used twice as much as the next most used negative words, death and pain.
I then turned to the NRC lexicon to assign emotions to words and plot the emotions expressed by the last words of the inmates. I plotted the emotions of the top 50 most frequently used words.
#nrc sentiments
LWSentimentNRC <- tidyLastWords %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
#graph nrc sentiments of top 50 words -> 10 emotions
#(top_n ranks word + sentiment rows, so a word tagged with several
#emotions can take up more than one of the 50 rows)
LWSentimentNRC %>%
  top_n(50, n) %>%
  ggplot(aes(sentiment, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  labs(y = "Frequency", x = "NRC Sentiment",
       title = "NRC Sentiment of Top 50 Words") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text = element_text(size = 15),
        axis.title = element_text(size = 20, face = "bold"),
        plot.title = element_text(size = 18, face = "bold"))
Positivity was the most expressed emotion of the 50 most common words followed by joy, trust, anticipation, and fear. The top three emotions all expressed feelings that skew positive.
After looking at the popularity of individual words and the emotions they conveyed, I decided to start investigating the most frequently used bigrams, or pairs of words, to see what new context they might bring to the inmates’ last statements. Sometimes when individual words are pulled out of sentences they lose context. See below how I cleaned the bigrams for stop words and for “NA”, or prisoners who decided not to give a last statement, and visualized the top ten most frequently used bigrams.
#bigrams
lwBigrams <- lastWords %>%
  unnest_tokens(output = bigram, input = LastStatement, token = "ngrams", n = 2)
lwBigrams %>%
  count(bigram, sort = TRUE)
#get rid of stop words in bigrams
library(tidyr)
bigrams_separated <- lwBigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")
bigrams_filtered <- bigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)
#new bigram counts
bigram_counts <- bigrams_filtered %>%
  count(word1, word2, sort = TRUE)
#unite words into one column and get rid of NA NA
bigrams_united <- bigram_counts %>%
  unite(bigram, word1, word2, sep = " ") %>%
  filter(!bigram %in% "NA NA")
#plot top 10 bigrams
bigrams_united %>%
  top_n(10, n) %>%
  ggplot(aes(reorder(bigram, n), n)) +
  geom_col(fill = "#00BA38") +
  coord_flip() +
  labs(y = "Word Pair Frequency", x = "Word Pair",
       title = "Frequency of Top 10 Word Pairs") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text = element_text(size = 15),
        axis.title = element_text(size = 20, face = "bold"),
        plot.title = element_text(size = 18, face = "bold"))
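The bigram cleaning above can be traced on a tiny example. The sketch below uses one made-up statement plus a "None" entry, which has too few words to form a bigram. It drops missing bigrams with `is.na()` before uniting, an alternative to filtering the literal string "NA NA" afterwards; this is my variation for illustration, not the author's code.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Made-up data: one short statement and one "None" entry
toy <- tibble(LastStatement = c("Stay strong and stay strong", "None"))

toy_bigrams <- toy %>%
  unnest_tokens(bigram, LastStatement, token = "ngrams", n = 2)

toy_counts <- toy_bigrams %>%
  filter(!is.na(bigram)) %>%                        # drop entries with no bigram
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,               # drop pairs containing
         !word2 %in% stop_words$word) %>%           # any stop word
  count(word1, word2, sort = TRUE) %>%
  unite(bigram, word1, word2, sep = " ")

toy_counts
```

Only "stay strong" survives here, because every other pair in the toy statement contains the stop word "and".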
The bigram “stay strong” was used 52 times, followed by “God Bless” (40 times), “death row” (39 times), and “Jesus Christ” (31 times). Since the bigrams provided interesting context for the last statements, I expanded the search to trigrams, or groups of three words. I found and graphed the 12 most frequently used trigrams in inmates’ last statements. I chose to show the top 12 because these were the trigrams used 50 or more times. The trigram that ranks twelfth, “I am ready,” offers insight into what 50 prisoners were thinking, or at least saying, at the time of their execution.
#trigrams
lwTrigrams <- lastWords %>%
  unnest_tokens(output = trigram, input = LastStatement, token = "ngrams", n = 3)
lwTrigrams %>%
  count(trigram, sort = TRUE)
#separate words
trigrams_separated <- lwTrigrams %>%
  separate(trigram, c("word1", "word2", "word3"), sep = " ")
#plot top trigrams WITH stopwords
trigram_counts_all <- trigrams_separated %>%
  count(word1, word2, word3, sort = TRUE)
trigrams_united_all <- trigram_counts_all %>%
  unite(trigram, word1, word2, word3, sep = " ") %>%
  filter(!trigram %in% "NA NA NA")
trigrams_united_all %>%
  top_n(12, n) %>%
  ggplot(aes(reorder(trigram, n), n)) +
  geom_col(fill = "#00BA38") +
  coord_flip() +
  labs(y = "Trigram Frequency", x = "Trigram",
       title = "Frequency of Top 12 Trigrams") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text = element_text(size = 15),
        axis.title = element_text(size = 20, face = "bold"),
        plot.title = element_text(size = 18, face = "bold"))
#get rid of stop words in trigrams *not graphed here*
#(reuses trigrams_separated from above)
trigrams_filtered <- trigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(!word3 %in% stop_words$word)
#count the cleaned trigrams, skipping statements with no trigram
trigrams_filtered %>%
  filter(!is.na(word1)) %>%
  count(word1, word2, word3, sort = TRUE)
The most commonly used trigram, again by far, is “I love you.” I also cleaned the trigram data by removing stop words, which took out phrases like the second most popular trigram, “I would like,” but also took out phrases like “I love you” and “I am sorry.” Removing the stop words shrank the data so much that the most frequent remaining trigram was said only 10 times. Among the top cleaned trigrams were: Lord Jesus Christ (10), holy holy holy (7), marching black people (4), and love ya’ll (4).
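The reason “I love you” disappears is that the filter drops a trigram if any one of its three words is a stop word. The sketch below checks two of the phrases mentioned above against tidytext's `stop_words` list. The `phrases`, `stop_flags`, and `survives` names are hypothetical, introduced only for this illustration.

```r
library(tidytext)

# Lowercased words of two frequent trigrams from the analysis
phrases <- list(c("i", "love", "you"), c("i", "would", "like"))

# Flag each word that appears in the stop_words list
stop_flags <- lapply(phrases, function(words) {
  setNames(words %in% stop_words$word, words)
})
stop_flags

# A trigram survives the filter only if none of its words is a stop word
survives <- vapply(stop_flags, function(flags) !any(flags), logical(1))
survives
```

Both phrases are removed even though “love” itself is not a stop word. This is why the version with stop words kept is the one graphed.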
My first hypothesis, that the frequency of positive words spoken by death row inmates in their last statements would be at least five percent higher than the frequency of negative words, was supported. Overall, 65 percent of the words that were assigned sentiments by the bing lexicon were positive and 35 percent were negative, so the share of positive words was 30 percentage points higher than the share of negative words. I expected the inmates to be at least a bit positive in their last statements but was surprised at how positive their language was.
My second hypothesis, that the top five most commonly used words would all be assigned a positive sentiment, was only partially supported. The top five words (love, family, God, life, and hope) were not all assigned sentiments by bing. Love was classified as positive, but the other words, though seemingly positive, were not classified as either positive or negative. These words were, however, assigned feelings of positivity and joy, among other emotions, by the NRC lexicon.
This dataset also includes race, age, and geographical information and provides ample opportunity for further investigation. In this project, I investigated ties between race, age, or time spent on death row and the last words used, but did not include the results because the differences were not significant. Further questions for research include: Does the race of death row offenders frequently match the race of their victims? Are offenders with a previous crime more likely to die on death row? What is the overall gender and demographic breakdown of inmates on death row and the victims of their crimes? How does the language used in last statements change over time?