Introduction

Is the Death Penalty Still Alive?

If so, for whom?

In a time of political mistrust, the debate over capital punishment is still brewing. Using historical data, I hope to reveal how prevalent capital punishment has been over time and whether any biases skew the results. The data set is large, covering more than 15,000 executions over almost 400 years, from firing squads in colonial Jamestown to the media controversies of the early 2000s.

The scope of this study covers a lot of history, and even more legislation. I am interested in understanding the trends in capital punishment over time and how these changes correlate with pivotal events in our history.

The Focus

To explore this relationship, I will be using data from Executions in the United States, 1608-2002: The ESPY File. This data was collected in partnership with the National Science Foundation and the United States Department of Justice. Because the United States, as we know it, did not exist until 1776, the data includes earlier executions in any territories that would later become states.

Key Variables Include:

age
race
gender
occupation
date
method of execution
convicted crime

Analytical Approach

My proposed approach is to first plot the variables of interest by year. These graphs will help to shed light on any biases in the capital punishment system and on how changes in legislation through the years have affected the number or method of executions.

I will then take a closer look at the years showing major shifts. The goal here would be to see if I can isolate an event, political or otherwise, that might help explain the results.

Mission

This analysis is intended to help readers form an opinion on capital punishment based on sound data. The issue has been hotly contested for hundreds of years, meaning there is no shortage of op-ed pieces littering the internet. I hope that my analysis can help readers, including myself, gain a clearer understanding of capital punishment, without biased interpretation.

Ultimately, I would like to understand whether any biases were still present in 2002 and, if so, whether they still exist today.

As of 2017, capital punishment is still legal in 31 states.

Should it be?

Requirements

Required Packages

The following packages are required in order to run code without errors.

Package Name	Purpose
library(tidyverse)	to load the core tidyverse packages (ggplot2, dplyr, and others)
library(readr)	to easily import delimited data
library(maps)	for geographical data
library(mapproj)	to convert latitude/longitude into projected coordinates
library(DT)	to create functional tables in HTML
library(knitr)	for dynamic report generation
library(rmarkdown)	to convert R Markdown documents into a variety of formats
library(ggthemes)	to implement theme across report
library(plotly)	for dynamic plotting

# to preload necessary packages 

list.of.packages <- c("tidyverse", "readr", "maps", "DT", "knitr", "rmarkdown", "ggthemes", "plotly", "mapproj")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos = "http://cran.us.r-project.org" )

library(tidyverse)     # to load the core tidyverse packages (ggplot2, dplyr, and others)
library(readr)         # to easily import delimited data
library(maps)          # for geographical data
library(mapproj)       # to convert latitude/longitude into projected coordinates
library(DT)            # to create functional tables in HTML
library(knitr)         # for dynamic report generation
library(rmarkdown)     # to convert R Markdown documents into a variety of formats
library(ggthemes)      # to implement theme across report
library(plotly)        # for dynamic plotting

Data Import & Prep

Data Import

The data set contains information about executions performed under civil authority in the United States between 1608 and 2002. The data was collected between 1970 and 2002 with the help of records from the State Department of Corrections, newspapers, court proceedings, and historical recordkeepers.

First, we must import the csv file and specify column names. There are several columns that have no relevance for our analysis. I have coded these columns as numbers in order to differentiate them from the variables of interest.

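# to read in CSV; numbered names mark the columns we will not use

raw_data <- read_csv("raw_data.csv",
                     col_names = c("1", "2",
                     "3", "4",
                     "Race", "Age",
                     "Name", "5",
                     "6", "Crime",      # named "Crime" so the subset below can select it
                     "Method", "7",
                     "8", "Year",
                     "9", "State",
                     "10", "11",
                     "Gender", "12",
                     "Occupation"),
                     skip = 1)          # skip the file's own header row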

Data Preparation

For our purposes, we will narrow down the data to 9 key variables.

Year
State
Name
Age
Gender
Race
Occupation
Crime
Method

# to save select variables to new subset

scrubbed <- raw_data[, c("Year", "State",
                         "Name", "Age",
                         "Gender", "Race",
                         "Occupation", "Crime",
                         "Method")]

Key Variables

To help round out the data, we will introduce two new categorical variables: Region and Era. These will help us to better visualize geographical and historical trends.

1. Region

Groups states by the nine geographic divisions defined by the U.S. Census Bureau (referred to here as regions).

# to group states by region

scrubbed$Region <- ifelse(scrubbed$State %in% c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin"), "East North Central", 

ifelse(scrubbed$State %in% c("Alabama", "Kentucky", "Mississippi", "Tennessee"), "East South Central",   
       
ifelse(scrubbed$State %in% c("New Jersey", "New York", "Pennsylvania"), "Middle Atlantic",
       
ifelse(scrubbed$State %in% c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah", "Wyoming"), "Mountain",

ifelse(scrubbed$State %in% c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", "Vermont"), "New England",
                                                      
ifelse(scrubbed$State %in% c("Alaska", "California", "Hawaii", "Oregon", "Washington"), "Pacific",

ifelse(scrubbed$State %in% c("Delaware", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "Washington, D.C.", "West Virginia"), "South Atlantic",
                                                                    
ifelse(scrubbed$State %in% c("Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota"), "West North Central",
                                                                           
ifelse(scrubbed$State %in% c("Arkansas", "Louisiana", "Oklahoma", "Texas"), "West South Central", "NA")))))))))
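The same grouping can also be expressed as a lookup table, which avoids deep nesting. The sketch below is a minimal illustration in base R under stated assumptions: `region_lookup` and `states` are hypothetical names, and only a handful of states are listed. Unlike the nested ifelse above, which assigns the string "NA", a lookup returns a true missing value for unmatched states, so na.omit would later drop those rows.

```r
# Hypothetical lookup table: state name -> Census division (only a few shown)
region_lookup <- c(
  "Illinois" = "East North Central",
  "Ohio"     = "East North Central",
  "Alabama"  = "East South Central",
  "New York" = "Middle Atlantic",
  "Texas"    = "West South Central",
  "Virginia" = "South Atlantic"
)

# Toy state column standing in for scrubbed$State
states <- c("Texas", "Virginia", "Puerto Rico", "Ohio")

# Indexing a named vector by name returns NA for names it does not contain
regions <- unname(region_lookup[states])
print(regions)
```

Adding or correcting a state then means editing one entry in the table rather than one branch of the nested expression.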

2. Era

A somewhat subjective grouping based on periods of U.S. history.

# to group years by era

scrubbed$Era <- ifelse(scrubbed$Year < 1630, "Early America", 
                       
ifelse(scrubbed$Year >= 1630 & scrubbed$Year < 1763, "Colonial Period",
       
ifelse(scrubbed$Year >= 1763 & scrubbed$Year < 1783, "Revolutionary Period",
       
ifelse(scrubbed$Year >= 1783 & scrubbed$Year < 1815, "Young Republic",
        
ifelse(scrubbed$Year >= 1815 & scrubbed$Year < 1860, "Expansionary Period",        

ifelse(scrubbed$Year >= 1860 & scrubbed$Year < 1876, "Civil War & Reconstruction",
       
ifelse(scrubbed$Year >= 1876 & scrubbed$Year < 1914, "Second Industrial Revolution",      
       
ifelse(scrubbed$Year >= 1914 & scrubbed$Year < 1933, "WWI & Depression",

ifelse(scrubbed$Year >= 1933 & scrubbed$Year < 1945, "New Deal & WWII", 

ifelse(scrubbed$Year >= 1945 & scrubbed$Year < 1960, "Postwar America",
       
ifelse(scrubbed$Year >= 1960 & scrubbed$Year < 1980, "Vietnam Era",
        
ifelse(scrubbed$Year >= 1980 & scrubbed$Year <= 2002, "Rise of Technology", "NA"))))))))))))
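As an alternative sketch, the same year ranges can be binned with base R's cut(). The break points and labels below are copied from the ifelse chain above; `era_breaks`, `era_labels`, and the sample `years` are hypothetical names used only for illustration. Setting right = FALSE makes each interval include its lower bound and exclude its upper bound, which matches the `>=` and `<` comparisons above.

```r
# Lower bounds of each era, plus an upper bound of 2003 so that 2002 is included
era_breaks <- c(-Inf, 1630, 1763, 1783, 1815, 1860, 1876,
                1914, 1933, 1945, 1960, 1980, 2003)

era_labels <- c("Early America", "Colonial Period", "Revolutionary Period",
                "Young Republic", "Expansionary Period",
                "Civil War & Reconstruction", "Second Industrial Revolution",
                "WWI & Depression", "New Deal & WWII", "Postwar America",
                "Vietnam Era", "Rise of Technology")

# Toy years standing in for scrubbed$Year
years <- c(1608, 1630, 1692, 1862, 1944, 1945, 2002)

# Intervals are [lower, upper), so 1630 falls in "Colonial Period"
era <- cut(years, breaks = era_breaks, labels = era_labels, right = FALSE)
print(as.character(era))
```

Because cut() returns a factor whose levels follow the order of the labels, the result is already in chronological order for plotting.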

Clean Data!

Data Dictionary

Variable	Data Type	Description
Year	integer	Year of Execution
State	character	State of Execution
Name	character	Name of Offender
Age	integer	Age at Execution
Gender	character	Gender of Offender
Race	character	Race of Offender
Occupation	character	Occupation of Offender
Crime	character	Crime Committed
Method	character	Method of Execution
Region	character	Region of Execution
Era	character	Era of Execution

Capital Punishment Data

Data Subsets

Subsets Based on Key Variables

To easily run reports, we will create subsets.

1. Count of Years

First, we will create a subset based on the frequency of executions by Year. To add another dimension to the data, we will also incorporate our predefined variable Era. We will use this data in conjunction with geom_point to reveal trends in the number of capital punishment executions over time.

# count of executions by year & era
                        
count_Years <- scrubbed %>% group_by(Year, Era) %>%
  tally()
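To make the shape of this subset concrete, here is a small sketch on made-up records; `toy`, `by_tally`, and `by_count` are hypothetical names, and the rows are not taken from the ESPY data. It also shows dplyr's count(), a shorthand that gives the same frequencies.

```r
library(dplyr)

# Hypothetical toy records standing in for `scrubbed`
toy <- data.frame(
  Year = c(1692, 1692, 1692, 1862, 1862),
  Era  = c("Colonial Period", "Colonial Period", "Colonial Period",
           "Civil War & Reconstruction", "Civil War & Reconstruction")
)

# The pattern used above: group, then tally the rows in each group
by_tally <- toy %>% group_by(Year, Era) %>% tally()

# Shorthand with the same counts
by_count <- toy %>% count(Year, Era)

# One row per Year/Era pair, with the frequency in column n
print(by_count)
```

One difference worth knowing: the tally() result is still grouped by Year, while the count() result on an ungrouped data frame is ungrouped.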

2. Count of Method

Next, we will create a subset that includes only the Year and Method variables. We will use this data in conjunction with geom_bar to show the prevalence of methods over time.

# to save variables year & method to new subset

Method_Vars <- c("Year", "Method")
count_Method <- scrubbed[Method_Vars]
count_Method <- na.omit(count_Method)

3. Count of Crime

Next, we will create a subset based on the frequency of executions by Crime. We will use this data to assess the most prevalent convictions in capital punishment cases.

# count of executions by crime

count_Crime <- scrubbed %>% group_by(Crime) %>%
  tally()

This variable is different in that some observations have values of NA. In order to create tidy graphs, we will need to eliminate these records.

count_Crime <- na.omit(count_Crime)

4. Count of State

Next, we will create a subset based on the frequency of executions by State. We will also include Region in this grouping, as it adds another dimension to the data. We will use this in conjunction with geom_polygon and geom_bar to reveal trends in capital punishment across the US.

# count of executions by state & region

count_State <- scrubbed %>% group_by(State, Region) %>% 
  tally()

5. Count of Gender

Next, we will create a subset based on the frequency of executions by Gender. To round out the data, we will incorporate Year and Method. We will use this in conjunction with geom_point to see the breakdown of executions by gender.

# count of executions by year, gender, & method

count_Gender <- scrubbed %>% group_by(Year, Gender, Method) %>%
  tally()

This variable also contains observations that have values of NA; we will need to eliminate these records.

count_Gender <- na.omit(count_Gender)
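One detail to keep in mind is that na.omit drops a row when any of its columns is NA, not only the column of interest. The sketch below uses made-up counts (`toy` is a hypothetical data frame, not the real subset) to show the difference between that and filtering on Gender alone.

```r
# Hypothetical toy counts standing in for count_Gender
toy <- data.frame(
  Year   = c(1700, 1700, 1900, 1950),
  Gender = c("Female", "Male", NA, "Male"),
  Method = c("Hanging", "Hanging", "Hanging", NA),
  n      = c(2, 5, 3, 4)
)

# na.omit removes rows with an NA in ANY column (here: rows 3 and 4)
dropped_any <- na.omit(toy)

# Filtering on one column keeps the row whose Method is unknown
dropped_gender_only <- toy[!is.na(toy$Gender), ]

print(nrow(dropped_any))
print(nrow(dropped_gender_only))
```

For a plot that only uses Year and Gender, the second form would retain executions whose method was not recorded.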

6. Count of Age

Next, we will create a subset that includes only the Age, Race, and Gender variables. We will use this data in conjunction with geom_boxplot to assess the relationship between age and race.

# to save variables age, race, & gender to new subset

Age_Vars<- c("Age", "Race", "Gender")
count_Age <- scrubbed[Age_Vars]

This variable also contains observations that have values of NA; we will need to eliminate these records.

count_Age <- na.omit(count_Age)

7. Count of Race

Next, we will create a subset based on the frequency of executions by Race. Here, we will use geom_point to reveal trends over time.

# count of executions by year, era, and race

count_Race <- scrubbed %>% group_by(Year, Race, Era) %>% 
  tally()

This variable also contains observations that have values of NA; we will need to eliminate these records.

count_Race <- na.omit(count_Race)

8. Region and Race

Next, we will create a subset that shows Race and Region. To add historical context, we will also use our predefined variable Era. In conjunction with geom_bar, we use the data to show racial biases in capital punishment over time.

We will be creating a facet_grid by era. We will want these facets to show in sequential order. To do so, we will order the historical eras by applying levels.

# to save variables race, region, & era to new subset

RR_Vars <- c("Race", "Region", "Era")
Race_Region <- scrubbed[RR_Vars]



# to order variables for faceted bar chart

Race_Region$Era_order <- factor(Race_Region$Era, levels=c("Early America", "Colonial Period", 
                                  "Revolutionary Period", "Young Republic",
                                  "Expansionary Period", "Civil War & Reconstruction", 
                                  "Second Industrial Revolution", "WWI & Depression", 
                                  "New Deal & WWII","Postwar America", 
                                  "Vietnam Era","Rise of Technology"))

As we did for count_Race, we will also remove NA values.

Race_Region <- na.omit(Race_Region)
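The effect of the explicit levels can be seen on a toy vector. In this sketch, `eras` and `era_order` are hypothetical names and only three eras are used. Two things happen: the levels follow the order given rather than alphabetical order, and any value that is not among the levels (such as the string "NA" assigned by the Era grouping) becomes a true missing value, which na.omit then removes.

```r
# Toy era labels, including the placeholder string "NA"
eras <- c("Vietnam Era", "Colonial Period", "NA", "Young Republic")

# Levels listed chronologically; "NA" is deliberately not among them
era_order <- factor(eras,
                    levels = c("Colonial Period", "Young Republic", "Vietnam Era"))

# Facets follow the level order, not alphabetical order
print(levels(era_order))

# The unmatched string has become a real missing value
print(is.na(era_order))
```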

9. State Data

Finally, in order to create a frequency map of executions by state, we will merge our count_State subset with geographical data pulled from the maps package. Using geom_polygon, we will create a heat map of the US that shows the number of executions by state.

Our data capitalizes each word of a state name (for example, "New York"), while the map data stores state names in lowercase. Before we can merge successfully, we will need a function that capitalizes the first letter of each word in the map data's state names.

# to save longitude/latitude data

all_states <- map_data("state")



# to capitalize the first letter of each word in a state name

capFirst <- function(s) {
  gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)
} 
  
all_states$region <- capFirst(all_states$region)


colnames(all_states) <- c("long", "lat", "group", "order", "State", "subregion")



# to merge state data with count data

stateMap <- merge(all_states, count_State, by = "State", all.x = TRUE)
stateMap <- stateMap[order(stateMap$order),]
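The two steps above, a left join followed by a re-sort, can be illustrated on toy data. In this sketch, `map_rows` and `counts` are hypothetical stand-ins for all_states and count_State, and the numbers are made up. With all.x = TRUE, every map row is kept and states without a count receive NA; the re-sort matters because merge() reorders rows by the key, which would otherwise scramble the polygon outlines.

```r
# Toy map rows: several vertices per state, with a drawing order
map_rows <- data.frame(
  State = c("Texas", "Texas", "Virginia", "Wyoming"),
  order = c(2, 1, 3, 4)
)

# Toy counts: no entry for Wyoming
counts <- data.frame(
  State = c("Texas", "Virginia"),
  n     = c(10, 20)
)

# Left join: keep all map rows, fill missing counts with NA
merged <- merge(map_rows, counts, by = "State", all.x = TRUE)

# Restore the original vertex order after the merge
merged <- merged[order(merged$order), ]
print(merged)
```

States left with NA are the ones drawn with the na.value fill color in a gradient scale.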

Visualizations

1. Trends in Capital Punishment Over Time

To evaluate the number of executions over time, we start by plotting key variables by Year. To flesh out the numerical data with a categorical dimension, like era, race, or gender, we will utilize color.

Segmented by Era

By mapping the color aesthetic to our predefined variable, Era, we will add some historical context to the graph. We can gain insights into questions like:

Which period of US History had the greatest number of executions?
How does war affect the number of capital punishment executions?

From this graph, we learn that, after the first execution in colonial Jamestown, the number of executions trended upward until WWII. From there, we see a drastic drop in the number of executions that lasts until the end of the Vietnam War. With the rise of technology and lethal injection, however, the numbers return to an upward trajectory.

# executions by year (era)

Year_and_Era <- 
  ggplot(data = count_Years, aes(x = Year, y = n)) +
  geom_point(aes(color = Era)) +
  ggtitle("Executions by Year") +
  theme_stata() +
  theme(legend.position = "bottom")   # set after theme_stata() so it is not overridden

ggplotly(Year_and_Era)

Segmented by Race

By mapping the color aesthetic to the variable Race, we will add some cultural context to the graph. We can gain insights into questions like:

Does the historical data reveal any racial bias?
Have these biases changed over time?

From this graph, we can see that, in general, the numbers of executions of black and white inmates follow approximately the same trend line over time. However, the number of black executions appears to be higher, especially in the years leading up to the 1960s. The graph also reveals one particularly troubling year for Native Americans, with 39 executions tied to the Dakota War of 1862. We can also see the emergence of Hispanic executions beginning in the early 20th century.

# executions by year (race)

Year_and_Race <- 
  ggplot(data = count_Race, aes(x = Year, y = n)) +
  geom_point(aes(color = Race)) +
  ggtitle("Executions by Race") +
  theme_stata() 

ggplotly(Year_and_Race)

Segmented by Gender

By mapping the color aesthetic to the variable Gender, we will add some demographic context to the graph. We can gain insights into questions like:

Are men sentenced to death more often than women?
Have there been any reversals in this trend?

From this graph, we can see that males represent the vast majority of capital punishment executions. The one clear reversal of this trend coincides with the Salem witch trials: for 1692, the graph shows 14 executions of women and only one of a man.

# executions by year (gender)

Year_and_Gender <-
  ggplot(data = count_Gender, aes(x = Year, y = n)) +
  geom_point(aes(color = Gender)) +
  ggtitle("Executions by Gender") +
  theme_stata() 

ggplotly(Year_and_Gender)

Segmented by Method

By mapping the fill aesthetic to the variable Method, we will add some situational context to the graph. We can gain insights into questions like:

Have execution methods changed over time?
Has technology influenced these methods?

From this graph, we can see that early Americans used a wide range of methods, from hanging, to bludgeoning, to burning. As technology advanced, however, the methods became more humane and less varied. By the end of 2002, lethal injection was clearly the preferred method.

# executions by year (method)

Year_and_Method <-
  ggplot(count_Method, aes(Year, fill = Method)) +
  geom_bar(position = "stack") +
  ggtitle("Executions by Year & Method",
           subtitle = "From 1608 - 2002") +
  theme_stata()

ggplotly(Year_and_Method)

Segmented by Crime

Which crimes have led to the most executions?

From this table, we can see that over 50% of capital punishment executions (about 58%) are tied to murder convictions. If we include convictions like Robbery-Murder and Rape-Murder, the share increases to about 79%. We can also see the racial bias stemming from the Civil War era. Of the 15k+ observations from 1608 to 2002, slave revolts account for about 2% of the executions.

Number of Capital Punishment Executions by Crime

# table of executions by crime

count_Crime <- count_Crime[order(-count_Crime$n),] %>%
  mutate(percent = n / sum(n))

kable(head(count_Crime))

Crime	n	percent
Murder	8577	0.5782767
Robbery-Murder	2670	0.1800162
Rape	947	0.0638484
Rape-Murder	472	0.0318231
Slave Revolt	277	0.0186758
Housebrkng-Burgl	251	0.0169229
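The combined share of murder-related convictions can be checked directly from the table. The sketch below copies the percent column shown above into a named vector; `crime_share` and `murder_related` are hypothetical names introduced for illustration. Note that these shares are computed over records with a known crime, since the NA rows were removed earlier.

```r
# Shares copied from the percent column of the table above
crime_share <- c(
  "Murder"           = 0.5782767,
  "Robbery-Murder"   = 0.1800162,
  "Rape"             = 0.0638484,
  "Rape-Murder"      = 0.0318231,
  "Slave Revolt"     = 0.0186758,
  "Housebrkng-Burgl" = 0.0169229
)

# Convictions that involve a murder
murder_related <- c("Murder", "Robbery-Murder", "Rape-Murder")

# Combined share of murder-related convictions, as a percentage
total <- sum(crime_share[murder_related])
print(round(100 * total, 1))
```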

2. Trends in Capital Punishment Across the U.S.

To evaluate the number of executions across the country, we will utilize the count_State subset we created. We will also incorporate our predefined variable Region to uncover if there are any biases present across regions.

Heat Map

By using the maps package, we can merge our count_State subset with geographic coordinates in order to create a heat map of total executions by state. Using this map, we can gain insights into questions like:

Which states have performed the most executions?
Which states have not performed any executions?

From this map, we can see that Virginia has performed the most executions, followed by Texas and Pennsylvania. We also learn that, with the exception of California, the number of executions decreases as you move west.

# frequency map of total executions by state

ggplot(stateMap, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = n)) +
  geom_path() +
  scale_fill_gradientn(colours = rev(rainbow(10)), na.value = "grey90") +
  coord_map() +
  ggtitle("Executions by State",
          subtitle = "From 1608 - 2002")

Executions by State

We can also use a bar graph to represent the total executions by state. Again, we can gain insights into questions like:

Which states have performed the most executions?
Which states have not performed any executions?

We can use this bar chart to confirm our findings from the heat map. The two plots agree: Virginia has performed the most executions, followed by Texas and Pennsylvania. We also see that states in the South Atlantic, Middle Atlantic, and West South Central regions have higher execution numbers. This supports the theory that, with the exception of California, capital punishment numbers decrease as you move west.

# bar graph of total executions by state

ggplot(count_State, aes(x = reorder(State, n), y = n, fill = Region)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme(axis.title = element_blank()) +
  ggtitle("Executions by State",
          subtitle = "From 1608 - 2002")

Executions by Region

We can also represent the frequency of executions based on region. This will help us to understand whether there are trends in capital punishment across states in close geographic proximity.

We can use this to answer questions like:

Which regions stand out in terms of number of executions?
Which regions have a low incidence of executions?

We can use this faceted bar chart to confirm our findings from the other two graphs. The three plots agree: states in the South Atlantic, Middle Atlantic, and West South Central regions have higher execution numbers. Once again, this supports the theory that, with the exception of California, capital punishment numbers decrease as you move west.

# faceted bar graph to show executions by region

ggplot(count_State, aes(x = reorder(State, n), y = n, fill = Region)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_grid(~Region, labeller = label_wrap_gen(width=.1)) +
  theme(text = element_text(size = 22), axis.title.y = element_blank()) +
  ggtitle("Executions by Region",
          subtitle = "From 1608 - 2002")

3. Trends in Capital Punishment Across Race

To shine a light on racial bias, we will look at the number of executions by race. Ultimately, we will break this down further by Era, Region, and Age.

Segmented by Era

We can use a faceted bar chart to represent the frequency of executions by race across different eras. This will help us to understand the presence of racial biases across notable periods of US history.

We can use this to answer questions like:

Which races have the most executions?
Have these levels changed over time?

Across the board, we see that the Second Industrial Revolution had the highest number of executions and that execution levels tend to drop off in times of war. For Native Americans, we can see that execution numbers increase in the Expansionary Period, as Americans begin to explore the Wild West. For Asian-Pacific Islanders, we see executions begin in the Expansionary Period and increase dramatically during the Second Industrial Revolution. Though the total number of Hispanic executions is relatively low, we can see a potential uptick during the Rise of Technology. As for African Americans, we can see biases start developing in the Colonial Period and continue to build during the Expansionary Period. These numbers drop back off during the Civil War and then increase dramatically during the Second Industrial Revolution. From WWI through the end of WWII, the levels of African American executions remain elevated above the levels of other races.

# faceted bar graph to show executions by race & era

ggplot(Race_Region, aes(Race, fill = Race)) +
  geom_bar(position = "stack") +
  ggtitle("Executions by Race & Era",
          subtitle = "From 1608 - 2002") +
  facet_grid(Race~Era_order, scales = "free", labeller = label_wrap_gen(width=.1)) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        text = element_text(size = 22))

Segmented by Region

We can use a standard bar chart to represent the frequency of executions by race across different regions. This will help us to understand the presence of racial biases across the United States.

We can use this to answer questions like:

Which regions have the most executions?
Do these numbers reveal any racial biases?

Here, we can easily see the racial bias against African Americans in the South Atlantic region. Not only has the South Atlantic region performed the most executions, but the majority of these executions were carried out on African American individuals. We can also see that the western regions, including the Mountain, Pacific, and West South Central regions, have higher rates of executions carried out on Native American, Hispanic, and Asian-Pacific Islander individuals.

# bar chart to show executions by region & race

ggplot(Race_Region, aes(Region, fill = Race)) +
  geom_bar(position = "stack") +
  ggtitle("Executions by Race & Region",
          subtitle = "From 1608 - 2002") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Segmented by Age

Finally, we can use a box plot to evaluate the distribution of ages across race.

We can use this to answer questions like:

What is the average age of individuals at the time of execution?
Is this age consistent across races?

By using the box plot, we can see that the median age at time of execution for White and Asian-Pacific Islander individuals is around 30. For African American, Hispanic, and Native American individuals, the median age at time of execution is lower, at around 25. We can also use this graph to assess variability and outliers within each race. We can see that there are significant outliers in the African American and White box plots.

# boxplot to show distribution of age by race

ggplot(count_Age, aes(x=Race, y=Age)) +
  geom_boxplot(aes(color = Race)) +
  ggtitle("Age at Time of Execution by Race")

Conclusions

Problem Statement

This analysis is intended to help readers form an opinion on capital punishment based on sound data. Supported by graphical representation, the focus of this analysis is the prevalence of capital punishment over time and any biases skewing the results.

Methodology

In order to gain a clearer understanding of trends in capital punishment over time, we started by graphing the number of executions by year. After seeing the overall trend, we used categorical variables to add context to the graphs. Then, to gauge trends by geographical region, we looked at executions across the United States. Finally, we focused on the interaction between race and region.

Insights

In general, the number of executions decreases during times of war
Due to advances in technology, methods have become increasingly humane over time
The east coast has a long history with capital punishment beginning in colonial Virginia
With the exception of California, the number of executions tends to decrease as you move West
Trends in racial prejudice vary from region to region

Implications

This analysis can be used to gain an understanding of how events in history have affected capital punishment levels. We can see that the number of executions decreases in times of war. This trend is particularly visible during the Vietnam War, a time of anti-war marches and protest songs. The loss of American soldiers on the battlefield seems to have a palpable effect on the use of capital punishment. We can also see that racial prejudice against African Americans, cultivated during the Civil War, has persisted and that a new bias, against the Hispanic population, may have emerged.

Limitations

This analysis was limited by the lack of trial data. We know that racial prejudices have affected trends in capital punishment over time. Next, I would like to incorporate data from capital punishment trials to reveal whether the same biases have led to convictions. To do so, we would bring in variables pertaining to the victim, like race, gender, and age.

Capital Punishment in America
