Introduction

Is the Death Penalty Still Alive?

If so, for whom?

In a time of political mistrust, the debate over capital punishment is still brewing. Using historical data, I hope to reveal how prevalent capital punishment has been over time and whether any biases skew the results. The data set is large, covering more than 15,000 executions over almost 400 years, from firing squads in colonial Jamestown to the media controversies of the early 2000s.

The scope of this study covers a lot of history, and even more legislation. I am interested in understanding the trends in capital punishment over time and how these changes correlate with pivotal events in history.

The Focus

To explore this relationship, I will use data from Executions in the United States, 1608-2002: The ESPY File. This data was collected in partnership with the National Science Foundation and the United States Department of Justice. Because the United States as we know it did not exist until 1776, the data includes earlier executions in any territories that would later become states.

Key Variables Include:

age
race
gender
occupation
date
method of execution
convicted crime

Analytical Approach

My proposed approach is to first plot the variables of interest by year. These graphs will help shed light on any biases in the capital punishment system and on how changes in legislation through the years have affected the number or method of executions.

I will then take a closer look at the years showing major shifts. The goal here would be to see if I can isolate an event, political or otherwise, that might help explain the results.

Mission

This analysis is intended to help consumers form an opinion on capital punishment based on sound data. The issue has been hotly contested for hundreds of years, so there is no shortage of op-ed pieces littering the internet. I hope that my analysis can help consumers, including myself, gain a clearer understanding of capital punishment, without biased interpretation.

Ultimately, I would like to understand whether any biases were still present in 2002 and, if so, whether they still exist today.

In 2017, capital punishment is still legal in 31 states.

Should it be?

Requirements

Required Packages

The following packages are required in order to run code without errors.

Package Name	Purpose
tidyverse	data import and manipulation (loads readr, dplyr, and ggplot2, among others)
ggplot2	plotting & visualizing data
maps	for geographical data
DT	to create functional tables in HTML
knitr	for dynamic report generation
rmarkdown	to convert R Markdown documents into a variety of formats
ggthemes	to implement theme across report
plotly	for dynamic plotting

library(tidyverse)     # data import and manipulation (readr, dplyr, and others)
library(ggplot2)       # plotting & visualizing data (already attached by tidyverse)
library(maps)          # for geographical data
library(DT)            # to create functional tables in HTML
library(knitr)         # for dynamic report generation
library(rmarkdown)     # to convert R Markdown documents into a variety of formats
library(ggthemes)      # to implement theme across report
library(plotly)        # for dynamic plotting

Data Import & Prep

Data Import

The data set contains information about executions performed under civil authority in the United States between 1608 and 2002. The data was collected between 1970 and 2002 with the help of records from the State Department of Corrections, newspapers, court proceedings, and historical recordkeepers.

First, we import the CSV file and specify column names. Several columns have no relevance to our analysis; I have named these columns with numbers to differentiate them from the variables of interest.

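# Columns named with numbers are not used in the analysis.
# The conviction column is named "Crime" to match the subset and data dictionary below.
raw_data <- read_csv("raw_data.csv",
                     col_names = c("1", "2",
                                   "3", "4",
                                   "Race", "Age",
                                   "Name", "5",
                                   "6", "Crime",
                                   "Method", "7",
                                   "8", "Year",
                                   "9", "State",
                                   "10", "11",
                                   "Gender", "12",
                                   "Occupation"),
                     skip = 1)   # skip the file's own header row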

Data Preparation

For our purposes, we will narrow down the data to 9 key variables.

Year
State
Name
Age
Gender
Race
Occupation
Crime
Method
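
Before subsetting by name, it can help to confirm that every requested column actually exists, since selecting a missing column stops with an error. The sketch below uses a small made-up data frame (toy_raw, a hypothetical stand-in for raw_data) rather than the real file.

```r
# Hypothetical stand-in for raw_data; the real file has 21 columns.
toy_raw <- data.frame(
  Year   = c(1608, 1692),
  State  = c("Virginia", "Massachusetts"),
  Method = c("Shot", "Hanging"),
  stringsAsFactors = FALSE
)

wanted <- c("Year", "State", "Crime", "Method")

# Columns we asked for that the data does not contain
missing_cols <- setdiff(wanted, names(toy_raw))
missing_cols

# Subset only once every requested column is present
if (length(missing_cols) == 0) {
  toy_scrubbed <- toy_raw[, wanted]
}
```

Here missing_cols names the column that would have caused the subset to fail, which is easier to act on than the error message alone.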

scrubbed <- raw_data[, c("Year", "State",
                         "Name", "Age",
                         "Gender", "Race",
                         "Occupation", "Crime",
                         "Method")]

Key Variables

To help round out the data, we will introduce two new categorical variables: Region and Era. These will help us better visualize geographical and historical trends.

1. Region

Groups states into the geographical divisions defined by the US Census Bureau.

# Each ifelse() tests one Census division; unmatched states fall through
# to the literal string "NA" (not a missing value).
scrubbed$Region <-
  ifelse(scrubbed$State %in% c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin"), "East North Central",
  ifelse(scrubbed$State %in% c("Alabama", "Kentucky", "Mississippi", "Tennessee"), "East South Central",
  ifelse(scrubbed$State %in% c("New Jersey", "New York", "Pennsylvania"), "Middle Atlantic",
  ifelse(scrubbed$State %in% c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah", "Wyoming"), "Mountain",
  ifelse(scrubbed$State %in% c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", "Vermont"), "New England",
  ifelse(scrubbed$State %in% c("Alaska", "California", "Hawaii", "Oregon", "Washington"), "Pacific",
  ifelse(scrubbed$State %in% c("Delaware", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "Washington, D.C.", "West Virginia"), "South Atlantic",
  ifelse(scrubbed$State %in% c("Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota"), "West North Central",
  ifelse(scrubbed$State %in% c("Arkansas", "Louisiana", "Oklahoma", "Texas"), "West South Central",
         "NA")))))))))
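
As an alternative to the nested ifelse calls, the same mapping could be expressed as a lookup table. This is only a sketch: region_lookup and the sample states vector are hypothetical, and only four states are listed to keep it short. One difference to be aware of is that unmatched states come back as a true missing value rather than the string "NA".

```r
# Named vector mapping state -> region (only a few states shown; a full
# table would list every state used in the nested ifelse above).
region_lookup <- c(
  "Illinois" = "East North Central",
  "Ohio"     = "East North Central",
  "Texas"    = "West South Central",
  "Virginia" = "South Atlantic"
)

# Hypothetical sample of State values, including one unmatched entry
states <- c("Texas", "Ohio", "Virginia", "Puerto Rico")

# Indexing by name returns NA for states missing from the lookup
region <- unname(region_lookup[states])
region
```

With a lookup table, adding or correcting a state is a one-line change, and na.omit will later drop unmatched rows because they are real missing values.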

2. Era

A somewhat subjective grouping of years based on US history.

# Each ifelse() tests one era; years outside 1608-2002 fall through
# to the literal string "NA".
scrubbed$Era <-
  ifelse(scrubbed$Year < 1630, "Early America",
  ifelse(scrubbed$Year >= 1630 & scrubbed$Year < 1763, "Colonial Period",
  ifelse(scrubbed$Year >= 1763 & scrubbed$Year < 1783, "Revolutionary Period",
  ifelse(scrubbed$Year >= 1783 & scrubbed$Year < 1815, "Young Republic",
  ifelse(scrubbed$Year >= 1815 & scrubbed$Year < 1860, "Expansionary Period",
  ifelse(scrubbed$Year >= 1860 & scrubbed$Year < 1876, "Civil War & Reconstruction",
  ifelse(scrubbed$Year >= 1876 & scrubbed$Year < 1914, "Second Industrial Revolution",
  ifelse(scrubbed$Year >= 1914 & scrubbed$Year < 1933, "WWI & Depression",
  ifelse(scrubbed$Year >= 1933 & scrubbed$Year < 1945, "New Deal & WWII",
  ifelse(scrubbed$Year >= 1945 & scrubbed$Year < 1960, "Postwar America",
  ifelse(scrubbed$Year >= 1960 & scrubbed$Year < 1980, "Vietnam Era",
  ifelse(scrubbed$Year >= 1980 & scrubbed$Year <= 2002, "Rise of Technology",
         "NA"))))))))))))
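
Because the eras are consecutive year ranges, base R's cut() could produce the same grouping from a single vector of boundaries. The sketch below assumes the same cutoffs as the code above; era_breaks, era_labels, and the sample years are names introduced here for illustration.

```r
# Era boundaries: each era covers [lower, upper), matching the >= / < tests
era_breaks <- c(-Inf, 1630, 1763, 1783, 1815, 1860, 1876,
                1914, 1933, 1945, 1960, 1980, 2003)
era_labels <- c("Early America", "Colonial Period", "Revolutionary Period",
                "Young Republic", "Expansionary Period",
                "Civil War & Reconstruction", "Second Industrial Revolution",
                "WWI & Depression", "New Deal & WWII", "Postwar America",
                "Vietnam Era", "Rise of Technology")

# Hypothetical sample years
years <- c(1608, 1692, 1862, 1944, 1945, 2002)

# right = FALSE makes each interval closed on the left, open on the right
era <- cut(years, breaks = era_breaks, labels = era_labels, right = FALSE)
as.character(era)
```

A side effect of cut() is that the result is a factor whose levels are already in chronological order, and years after 2002 become missing values instead of the string "NA".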

Clean Data!

Data Dictionary

Variable	Data Type	Description
Year	integer	Year of Execution
State	character	State of Execution
Name	character	Name of Offender
Age	integer	Age at Execution
Gender	character	Gender of Offender
Race	character	Race of Offender
Occupation	character	Occupation of Offender
Crime	character	Crime Committed
Method	character	Method of Execution
Region	character	Region of Execution
Era	character	Era of Execution

Capital Punishment Data

Data Subsets

Subsets Based on Key Variables

To easily run reports, we will create subsets.

1. Count of Years

First, we will create a subset based on the frequency of executions by Year. To add another dimension to the data, we will also incorporate our predefined variable Era. We will use this data in conjunction with geom_point to reveal trends in the number of executions over time.

count_Years <- scrubbed %>% group_by(Year, Era) %>%
  tally()
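
For readers less familiar with dplyr, the group_by() plus tally() pattern counts the rows in each group and stores the result in a column named n. The sketch below uses a made-up four-row table (toy) and shows that dplyr's count() gives the same counts in one step; note that tally() leaves the result grouped, so ungroup() is added before comparing.

```r
library(dplyr)

# Hypothetical miniature version of scrubbed
toy <- tibble(
  Year = c(1692, 1692, 1692, 1862),
  Era  = c("Colonial Period", "Colonial Period", "Colonial Period",
           "Civil War & Reconstruction")
)

# Pattern used in this report: group, then count rows per group
by_tally <- toy %>% group_by(Year, Era) %>% tally() %>% ungroup()

# Equivalent shorthand
by_count <- toy %>% count(Year, Era)

all.equal(by_tally, by_count)
```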

2. Count of Method

Next, we will create a subset that includes only the Year and Method variables. We will use this data in conjunction with geom_bar to show the prevalence of methods over time.

Method_Vars <- c("Year", "Method")
count_Method <- scrubbed[Method_Vars]
count_Method <- na.omit(count_Method)

3. Count of Crime

Next, we will create a subset based on the frequency of executions by Crime. We will use this data to assess the most prevalent convictions in capital punishment cases.

count_Crime <- scrubbed %>% group_by(Crime) %>%
  tally()

This variable is different in that some observations have values of NA. In order to create tidy graphs, we will need to eliminate these records.

count_Crime <- na.omit(count_Crime)
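
Since na.omit removes records silently, it may be worth recording how many values are missing before dropping them. A minimal base R sketch on made-up data (toy is hypothetical, not the ESPY file):

```r
# Hypothetical miniature data with some missing values
toy <- data.frame(
  Crime  = c("Murder", NA, "Rape", "Murder", NA),
  Method = c("Hanging", "Hanging", NA, "Shot", "Hanging"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
na_per_column

# Rows that survive na.omit(), i.e. rows with no missing value at all
nrow(na.omit(toy))
```

Reporting these counts alongside the graphs makes it clear how much of the data each plot actually represents.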

4. Count of State

Next, we will create a subset based on the frequency of executions by State. We will also include Region in this grouping, as it adds another dimension to the data. We will use this in conjunction with geom_polygon and geom_bar to reveal trends in capital punishment across the US.

count_State <- scrubbed %>% group_by(State, Region) %>% 
  tally()

5. Count of Gender

Next, we will create a subset based on the frequency of executions by Gender. To round out the data, we will incorporate Year and Method. We will use this in conjunction with geom_point to see the breakdown of executions by male and female.

count_Gender <- scrubbed %>% group_by(Year, Gender, Method) %>%
  tally()

This variable also contains observations that have values of NA; we will need to eliminate these records.

count_Gender <- na.omit(count_Gender)

6. Count of Age

Next, we will create a subset that includes only the Age, Race, and Gender variables. We will use this data in conjunction with geom_boxplot to assess the relationship between age and race.

Age_Vars <- c("Age", "Race", "Gender")
count_Age <- scrubbed[Age_Vars]

This variable also contains observations that have values of NA; we will need to eliminate these records.

count_Age <- na.omit(count_Age)

7. Count of Race

Next, we will create a subset based on the frequency of executions by Race. Here, we will use geom_point to reveal trends over time.

count_Race <- scrubbed %>% group_by(Year, Race, Era) %>% 
  tally()

This variable also contains observations that have values of NA; we will need to eliminate these records.

count_Race <- na.omit(count_Race)

8. Region and Race

Next, we will create a subset that shows Race and Region. To add historical context, we will also use our predefined variable Era. In conjunction with geom_bar, we use the data to show racial biases in capital punishment over time.

We will be creating a facet_grid by era. We will want these facets to show in sequential order. To do so, we will order the historical eras by applying levels.

RR_Vars <- c("Race", "Region", "Era")
Race_Region <- scrubbed[RR_Vars]
Race_Region$Era_order <- factor(Race_Region$Era, levels=c("Early America", "Colonial Period", 
                                  "Revolutionary Period", "Young Republic",
                                  "Expansionary Period", "Civil War & Reconstruction", 
                                  "Second Industrial Revolution", "WWI & Depression", 
                                  "New Deal & WWII","Postwar America", 
                                  "Vietnam Era","Rise of Technology"))

As we did for count_Race, we will also remove NA values.

Race_Region <- na.omit(Race_Region)
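
The reason for setting levels explicitly is that R otherwise orders factor levels alphabetically, which would scramble the timeline in a faceted plot. A small sketch with three of the eras (the eras vector here is a made-up sample):

```r
eras <- c("Vietnam Era", "Colonial Period", "Early America")

# Without explicit levels, factor levels are sorted alphabetically
levels(factor(eras))

# With explicit levels, the chronological order is preserved
era_order <- factor(eras, levels = c("Early America", "Colonial Period",
                                     "Vietnam Era"))
levels(era_order)

# Sorting follows the level order, not the alphabet
as.character(sort(era_order))
```

ggplot2 lays out facets and discrete axes in level order, so the same ordering carries through to the plots.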

9. State Data

Finally, in order to create a frequency map of executions by state, we will merge our count_State subset with geographical data pulled from the maps package. Using geom_polygon, we will create a heat map of the US that shows the number of executions by state.

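Since our data capitalizes each word of a state name (for example, "New York"), while the map data is all lowercase, we need a function that capitalizes the first letter of every word in the map data's state names before we can merge successfully.

all_states <- map_data("state")

# Capitalize the first letter of each word, so "new york" becomes "New York"
capFirst <- function(s) {
  gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)
}

all_states$region <- capFirst(all_states$region)

# Rename "region" to "State" so it matches the key in count_State
colnames(all_states) <- c("long", "lat", "group", "order", "State", "subregion")

# Keep every map polygon, even for states with no recorded executions
stateMap <- merge(all_states, count_State, by = "State", all.x = TRUE)
# Restore the drawing order of polygon vertices after the merge
stateMap <- stateMap[order(stateMap$order), ]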
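
A merge on names fails quietly when the spellings differ, so it can be useful to list the names that do not match on either side. The sketch below uses made-up name vectors (map_names and data_names are hypothetical) and a helper called title_case introduced here for illustration.

```r
# Hypothetical lowercase names, in the style used by map_data("state")
map_names <- c("new york", "virginia", "north carolina")

# Capitalize the first letter of every word
title_case <- function(s) gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)
map_names_tc <- title_case(map_names)

# Hypothetical State values from the execution counts
data_names <- c("New York", "Virginia", "Texas")

# Names on the map with no match in the data (these would plot as NA)
unmatched_map <- setdiff(map_names_tc, data_names)
unmatched_map

# Names in the data with no match on the map
unmatched_data <- setdiff(data_names, map_names_tc)
unmatched_data
```

Running the same two setdiff calls on the real State columns would show whether any state drops out of the map because of a spelling or capitalization difference.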

Visualizations

1. Trends in Capital Punishment Over Time

To evaluate the number of executions over time, we start by plotting key variables by Year. To flesh out the numerical data with a categorical dimension, like era, race, or gender, we will utilize color.

Segmented by Era

By mapping the color aesthetic to our predefined variable, Era, we will add some historical context to the graph. We can gain insights into questions like:

Which period of US History had the greatest number of executions?
How does war affect the number of capital punishment executions?

From this graph, we learn that, after the first execution in colonial Jamestown, the number of executions trended upward until WWII. From there, we see a drastic drop in the number of executions that lasts until the end of the Vietnam War. With the rise of technology and lethal injection, however, the numbers return to an upward trajectory.

Year_and_Era <-
  ggplot(data = count_Years, aes(x = Year, y = n)) +
  geom_point(aes(color = Era)) +
  ggtitle("Executions by Year") +
  theme_stata() +
  # theme() must come after the complete theme, or legend.position is overwritten
  theme(legend.position = "bottom")

ggplotly(Year_and_Era)

Segmented by Race

By mapping the color aesthetic to the variable Race, we will add some cultural context to the graph. We can gain insights into questions like:

Does the historical data reveal any racial bias?
Have these biases changed over time?

From this graph, we can see that, in general, the number of executions of black and white inmates follows approximately the same trend line over time. However, the number of black executions appears to be higher, especially in the years leading up to the 1960s. The graph also reveals one particularly troubling year for Native Americans, with 39 executions tied to the Dakota War of 1862. We can also see the emergence of Hispanic executions beginning in the early 20th century.

Year_and_Race <- 
  ggplot(data = count_Race, aes(x = Year, y = n)) +
  geom_point(aes(color = Race)) +
  ggtitle("Executions by Race") +
  theme_stata() 

ggplotly(Year_and_Race)

Segmented by Gender

By mapping the color aesthetic to the variable Gender, we will add some demographic context to the graph. We can gain insights into questions like:

Are men sentenced to death more often than women?
Have there been any reversals in this trend?

From this graph, we can see that males represent the vast majority of executions. The only time this trend was reversed was during the Salem witch trials: in 1692, the graph shows a total of 14 women executed, compared with only one man in the same year.

Year_and_Gender <-
  ggplot(data = count_Gender, aes(x = Year, y = n)) +
  geom_point(aes(color = Gender)) +
  ggtitle("Executions by Gender") +
  theme_stata()

ggplotly(Year_and_Gender)

Segmented by Method

By mapping the fill aesthetic to the variable Method, we will add some situational context to the graph. We can gain insights into questions like:

Have execution methods changed over time?
Has technology influenced these methods?

From this graph, we can see that early Americans used a variety of methods. From hanging, to bludgeoning, to burning, there was a diverse range of techniques. However, as technology advanced, the methods became more humane and less varied. By the end of 2002, injection was clearly the preferred method.

Year_and_Method <-
  ggplot(count_Method, aes(Year, fill = Method)) +
  geom_bar(position = "stack") +
  ggtitle("Executions by Year & Method",
           subtitle = "From 1608 - 2002") +
  theme_stata()

ggplotly(Year_and_Method)

Segmented by Crime

count_Crime <- count_Crime[order(-count_Crime$n),] %>%
  mutate(percent = n / sum(n))

kable(head(count_Crime))

Crime	n	percent
Murder	8577	0.5782767
Robbery-Murder	2670	0.1800162
Rape	947	0.0638484
Rape-Murder	472	0.0318231
Slave Revolt	277	0.0186758
Housebrkng-Burgl	251	0.0169229

2. Trends in Capital Punishment Across the U.S.

To evaluate the number of executions across the country, we will utilize the count_State subset we created. We will also incorporate our predefined variable Region to uncover if there are any biases present across regions.

Heat Map

By using the maps package, we can merge our count_State subset with geographic coordinates in order to create a heat map of total executions by state. Using this map, we can gain insights into questions like:

Which states have performed the most executions?
Which states have not performed any executions?

From this map, we can see that Virginia has performed the most executions, followed by Texas and Pennsylvania. We also learn that, with the exception of California, the number of executions decreases as you move west.

# Note: coord_map() relies on the mapproj package being installed
ggplot(stateMap, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = n)) +
  geom_path() +
  scale_fill_gradientn(colours = rev(rainbow(10)), na.value = "grey90") +
  coord_map() +
  ggtitle("Executions by State",
          subtitle = "From 1608 - 2002")

Executions by State

We can also use a bar graph to represent the total executions by state.
Again, we can gain insights into questions like:

Which states have performed the most executions?
Which states have not performed any executions?

We can use this bar chart to confirm our findings from the heat map. The two plots agree: Virginia has performed the most executions, followed by Texas and Pennsylvania. We also see that states in the South Atlantic, Middle Atlantic, and West South Central regions have higher execution numbers. This supports the theory that, with the exception of California, capital punishment numbers decrease as you move west.

ggplot(count_State, aes(x = reorder(State, n), y = n, fill = Region)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme(axis.title = element_blank()) +
  ggtitle("Executions by State",
          subtitle = "From 1608 - 2002")

Executions by Region

We can also represent the frequency of executions based on region. This will help us understand whether capital punishment trends are shared across states in close geographic proximity.

We can use this to answer questions like:

Which regions stand out in terms of number of executions?
Which regions have a low incidence of executions?

We can use this faceted bar chart to confirm our findings from the other two graphs. The three plots agree: states in the South Atlantic, Middle Atlantic, and West South Central regions have higher execution numbers. Once again, this supports the theory that, with the exception of California, capital punishment numbers decrease as you move west.

ggplot(count_State, aes(x = reorder(State, n), y = n, fill = Region)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_grid(~Region, labeller = label_wrap_gen(width=.1)) +
  theme(text = element_text(size = 22), axis.title.y = element_blank()) +
  ggtitle("Executions by Region",
          subtitle = "From 1608 - 2002")

3. Trends in Capital Punishment Across Race

To shine a light on racial bias, we will look at the number of executions by race. Ultimately, we will break this down further by Era, Region, and Age.

Segmented by Era

# Race_Region is the subset built earlier; Era_order keeps the facets chronological
ggplot(Race_Region, aes(Race, fill = Race)) +
  geom_bar(position = "stack") +
  ggtitle("Executions by Race & Era",
          subtitle = "From 1608 - 2002") +
  facet_grid(Race~Era_order, scales = "free", labeller = label_wrap_gen(width=.1)) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        text = element_text(size = 22))

Segmented by Region

Finally, we will look at trends in executions by race in conjunction with region.

ggplot(Race_Region, aes(Region, fill = Race)) +
  geom_bar(position = "stack") +
  ggtitle("Executions by Race & Region",
          subtitle = "From 1608 - 2002") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Segmented by Age

ggplot(count_Age, aes(x=Race, y=Age)) +
  geom_boxplot(aes(color = Race))
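
The center line of each box is the group median, so the boxplot can be cross-checked numerically. The sketch below computes the median age per race with base R's tapply on a made-up five-row table (toy is hypothetical, not the ESPY data).

```r
# Hypothetical miniature version of count_Age
toy <- data.frame(
  Race = c("Black", "Black", "White", "White", "White"),
  Age  = c(22, 30, 25, 35, 45)
)

# Median age within each race; this is the center line of each box
median_age <- tapply(toy$Age, toy$Race, median)
median_age

# Number of observations behind each box
group_sizes <- table(toy$Race)
group_sizes
```

Checking the group sizes alongside the medians helps flag boxes that rest on only a handful of records.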

Capital Punishment in America

Conclusions