Capital Punishment in America

Are We Biased?

Introduction

Is the Death Penalty Still Alive?

If so, for whom?

In a time of political mistrust, the debate over capital punishment is still brewing. Using historical data, I hope to reveal the prevalence of capital punishment over time and any biases skewing the results. The data set is large, covering over 15,000 executions across almost 400 years, from firing squads in colonial Jamestown to the media controversies of the early 2000s.

The scope of this study covers a lot of history, and even more legislation. I am interested in understanding the trends in capital punishment over time and how these changes correlate with pivotal events in history.

The Focus

To explore this relationship, I will be using data from Executions in the United States, 1608-2002: The ESPY File. This data was collected in partnership with the National Science Foundation and the United States Department of Justice. Since the United States, as we know it, did not exist until 1776, the data includes earlier executions in any territories that would later become states.

Key Variables Include:

  • age
  • race
  • gender
  • occupation
  • date
  • method of execution
  • convicted crime

Analytical Approach

My proposed approach is to first plot the variables of interest by year. These graphs will help shed light on any biases in the capital punishment system and on how changes in legislation over the years have affected the number or method of executions.

I will then take a closer look at the years showing major shifts. The goal here would be to see if I can isolate an event, political or otherwise, that might help explain the results.

Mission

This analysis is intended to help readers form an opinion on capital punishment based on sound data. The issue has been hotly contested for hundreds of years, so there is no shortage of op-ed pieces littering the internet. I hope that my analysis can help readers, including myself, gain a clearer understanding of capital punishment, without biased interpretation.

Ultimately, I would like to understand whether any biases were still present in 2002 and, if so, whether they still exist today.


In 2017, capital punishment is still legal in 31 states.

Should it be?

Requirements

Required Packages

The following packages are required in order to run code without errors.


Package Name         Purpose
library(tidyverse)   to load the core tidyverse packages (dplyr, ggplot2, and others)
library(readr)       to easily import delimited data
library(maps)        for geographical data
library(mapproj)     to convert latitude/longitude into projected coordinates
library(DT)          to create functional tables in HTML
library(knitr)       for dynamic report generation
library(rmarkdown)   to convert R Markdown documents into a variety of formats
library(ggthemes)    to implement theme across report
library(plotly)      for dynamic plotting

# to install any missing packages, then load all required packages

list.of.packages <- c("tidyverse", "readr", "maps", "DT", "knitr", "rmarkdown", "ggthemes", "plotly", "mapproj")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos = "http://cran.us.r-project.org" )
library(tidyverse)     # to load the core tidyverse packages (dplyr, ggplot2, and others)
library(readr)         # to easily import delimited data
library(maps)          # for geographical data
library(mapproj)       # to convert latitude/longitude into projected coordinates
library(DT)            # to create functional tables in HTML
library(knitr)         # for dynamic report generation
library(rmarkdown)     # to convert R Markdown documents into a variety of formats
library(ggthemes)      # to implement theme across report
library(plotly)        # for dynamic plotting

Data Import & Prep

Data Import

The data set contains information about executions performed under civil authority in the United States between 1608 and 2002. The data was collected between 1970 and 2002 with the help of records from state Departments of Corrections, newspapers, court proceedings, and historical record keepers.

First, we must import the CSV file and specify column names. Several columns have no relevance for our analysis. I have named these columns with numbers in order to differentiate them from the variables of interest.

# to read in CSV

# the conviction column is named "Crime" so that it matches the subsets below
raw_data <- read_csv("raw_data.csv",
                     col_names = c("1", "2",
                                   "3", "4",
                                   "Race", "Age",
                                   "Name", "5",
                                   "6", "Crime",
                                   "Method", "7",
                                   "8", "Year",
                                   "9", "State",
                                   "10", "11",
                                   "Gender", "12",
                                   "Occupation"),
                     skip = 1)
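As a minimal sketch of how col_names and skip = 1 work together, the example below writes a small, hypothetical three-column file to a temporary path and reads it back under replacement names. The file contents and the toy object are illustrative assumptions, not part of the ESPY File.

```r
library(readr)

# Hypothetical stand-in for raw_data.csv (three columns only)
path <- tempfile(fileext = ".csv")
writeLines(c("RACE,AGE,YEAR",
             "White,34,1901",
             "Black,NA,1902"), path)

# skip = 1 drops the file's own header row; col_names supplies our names
toy <- read_csv(path,
                col_names = c("Race", "Age", "Year"),
                col_types = cols(Race = col_character(),
                                 Age  = col_integer(),
                                 Year = col_integer()),
                skip = 1)

# By default, readr treats the literal text "NA" in the file as a missing value
toy
```

Declaring col_types explicitly is optional, but it makes the import fail loudly if a column does not hold the expected kind of value.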

Data Preparation

For our purposes, we will narrow down the data to 9 key variables.

  • Year
  • State
  • Name
  • Age
  • Gender
  • Race
  • Occupation
  • Crime
  • Method
# to save select variables to new subset

scrubbed <- raw_data[, c("Year", "State",
                         "Name", "Age",
                         "Gender", "Race",
                         "Occupation", "Crime",
                         "Method")]

Key Variables

To help round out the data, we will introduce two new categorical variables: Region and Era. These will help us better visualize geographical and historical trends.

1. Region

Groups states by the geographic divisions defined by the US Census Bureau.

# to group states by region

scrubbed$Region <- ifelse(scrubbed$State %in% c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin"), "East North Central", 

ifelse(scrubbed$State %in% c("Alabama", "Kentucky", "Mississippi", "Tennessee"), "East South Central",   
       
ifelse(scrubbed$State %in% c("New Jersey", "New York", "Pennsylvania"), "Middle Atlantic",
       
ifelse(scrubbed$State %in% c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah", "Wyoming"), "Mountain",

ifelse(scrubbed$State %in% c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", "Vermont"), "New England",
                                                      
ifelse(scrubbed$State %in% c("Alaska", "California", "Hawaii", "Oregon", "Washington"), "Pacific",

ifelse(scrubbed$State %in% c("Delaware", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "Washington, D.C.", "West Virginia"), "South Atlantic",
                                                                    
ifelse(scrubbed$State %in% c("Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota"), "West North Central",
                                                                           
ifelse(scrubbed$State %in% c("Arkansas", "Louisiana", "Oklahoma", "Texas"), "West South Central", "NA")))))))))
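The nested ifelse() calls above could also be written with dplyr::case_when(), which evaluates its conditions in order and is often easier to read. The sketch below is one possible alternative, shown on a hypothetical toy data frame with only two of the nine divisions. Note one deliberate difference: it returns a true missing value (NA_character_) for unmatched states, whereas the code above returns the text "NA", which na.omit() does not treat as missing.

```r
library(dplyr)

# Hypothetical toy data; "Puerto Rico" stands in for any unmatched value
toy <- tibble(State = c("Ohio", "Texas", "Puerto Rico"))

toy <- toy %>%
  mutate(Region = case_when(
    State %in% c("Illinois", "Indiana", "Michigan",
                 "Ohio", "Wisconsin")              ~ "East North Central",
    State %in% c("Arkansas", "Louisiana",
                 "Oklahoma", "Texas")              ~ "West South Central",
    TRUE                                           ~ NA_character_
  ))

toy
```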

2. Era

A somewhat subjective grouping based on US History.

# to group years by era

scrubbed$Era <- ifelse(scrubbed$Year < 1630, "Early America", 
                       
ifelse(scrubbed$Year >= 1630 & scrubbed$Year < 1763, "Colonial Period",
       
ifelse(scrubbed$Year >= 1763 & scrubbed$Year < 1783, "Revolutionary Period",
       
ifelse(scrubbed$Year >= 1783 & scrubbed$Year < 1815, "Young Republic",
        
ifelse(scrubbed$Year >= 1815 & scrubbed$Year < 1860, "Expansionary Period",        

ifelse(scrubbed$Year >= 1860 & scrubbed$Year < 1876, "Civil War & Reconstruction",
       
ifelse(scrubbed$Year >= 1876 & scrubbed$Year < 1914, "Second Industrial Revolution",      
       
ifelse(scrubbed$Year >= 1914 & scrubbed$Year < 1933, "WWI & Depression",

ifelse(scrubbed$Year >= 1933 & scrubbed$Year < 1945, "New Deal & WWII", 

ifelse(scrubbed$Year >= 1945 & scrubbed$Year < 1960, "Postwar America",
       
ifelse(scrubbed$Year >= 1960 & scrubbed$Year < 1980, "Vietnam Era",
        
ifelse(scrubbed$Year >= 1980 & scrubbed$Year <= 2002, "Rise of Technology", "NA"))))))))))))                     
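Because the eras are consecutive year ranges, the same grouping can be sketched with base R's cut(). The breaks and labels below mirror the boundaries used above; the toy_years vector is a hypothetical example. With right = FALSE, each interval includes its lower bound and excludes its upper bound, matching the >= and < comparisons. One assumption differs from the code above: the last interval is left open, so any year after 2002 would also be labeled "Rise of Technology".

```r
# Era boundaries taken from the ifelse() chain above
era_breaks <- c(-Inf, 1630, 1763, 1783, 1815, 1860, 1876,
                1914, 1933, 1945, 1960, 1980, Inf)

era_labels <- c("Early America", "Colonial Period",
                "Revolutionary Period", "Young Republic",
                "Expansionary Period", "Civil War & Reconstruction",
                "Second Industrial Revolution", "WWI & Depression",
                "New Deal & WWII", "Postwar America",
                "Vietnam Era", "Rise of Technology")

# Hypothetical years, including values that sit exactly on a boundary
toy_years <- c(1608, 1630, 1776, 1979, 1980, 2002)

# right = FALSE gives intervals of the form [lower, upper)
toy_era <- cut(toy_years, breaks = era_breaks,
               labels = era_labels, right = FALSE)

data.frame(Year = toy_years, Era = toy_era)
```

A side effect of this approach is that the result is a factor whose levels are already in chronological order.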

Clean Data!

Data Dictionary

Variable     Data Type   Description
Year         integer     Year of Execution
State        character   State of Execution
Name         character   Name of Offender
Age          integer     Age at Execution
Gender       character   Gender of Offender
Race         character   Race of Offender
Occupation   character   Occupation of Offender
Crime        character   Crime Committed
Method       character   Method of Execution
Region       character   Region of Execution
Era          character   Era of Execution

Capital Punishment Data

Data Subsets

Subsets Based on Key Variables

To easily run reports, we will create subsets.

1. Count of Years

First, we will create a subset based on the frequency of executions by Year. To add another dimension to the data, we will also incorporate our predefined variable Era. We will use this data in conjunction with geom_point to reveal trends in the number of executions over time.

# count of executions by year & era
                        
count_Years <- scrubbed %>% group_by(Year, Era) %>%
  tally()
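As a rough sketch of how such a count feeds into geom_point, the example below uses a hypothetical four-row data frame. It uses dplyr::count(), which is shorthand for group_by() followed by tally(); the toy values and the plot styling are assumptions for illustration only.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy records: one row per execution
toy <- tibble(
  Year = c(1900, 1900, 1901, 1950),
  Era  = c("Second Industrial Revolution", "Second Industrial Revolution",
           "Second Industrial Revolution", "Postwar America")
)

# One row per Year/Era combination, with the frequency stored in column n
toy_counts <- toy %>% count(Year, Era)

# Scatter plot of executions per year, colored by era
p <- ggplot(toy_counts, aes(x = Year, y = n, colour = Era)) +
  geom_point() +
  labs(x = "Year", y = "Number of executions")
```

Printing p draws the plot; the count column is always named n unless a different name is supplied.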

2. Count of Method

Next, we will create a subset that includes only the Year and Method variables. We will use this data in conjunction with geom_bar to show the prevalence of methods over time.

# to save variables year & method to new subset

Method_Vars <- c("Year", "Method")
count_Method <- scrubbed[Method_Vars]
count_Method <- na.omit(count_Method)

3. Count of Crime

Next, we will create a subset based on the frequency of executions by Crime. We will use this data to assess the most prevalent convictions in capital punishment cases.

# count of executions by crime

count_Crime <- scrubbed %>% group_by(Crime) %>%
  tally()

This variable is different in that some observations have values of NA. In order to create tidy graphs, we will need to eliminate these records.

count_Crime <- na.omit(count_Crime)
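To make the effect of na.omit() concrete, the sketch below applies it to a hypothetical toy count. na.omit() drops every row that has NA in any column; for a single grouping variable this is equivalent to filtering on that variable explicitly, which can be safer when a table has other columns where NA should be kept.

```r
library(dplyr)

# Hypothetical toy records with one missing crime
toy <- tibble(Crime = c("Murder", "Murder", NA, "Robbery"))

toy_counts <- toy %>% count(Crime)

# Option 1: drop rows with NA in any column
dropped_a <- na.omit(toy_counts)

# Option 2: drop rows only where Crime itself is missing
dropped_b <- toy_counts %>% filter(!is.na(Crime))
```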

4. Count of State

Next, we will create a subset based on the frequency of executions by State. We will also include Region in this grouping, as it adds another dimension to the data. We will use this in conjunction with geom_polygon and geom_bar to reveal trends in capital punishment across the US.

# count of executions by state & region

count_State <- scrubbed %>% group_by(State, Region) %>% 
  tally()

5. Count of Gender

Next, we will create a subset based on the frequency of executions by Gender. To round out the data, we will incorporate Year and Method. We will use this in conjunction with geom_point to see the breakdown of executions by male and female.

# count of executions by year, gender, & method

count_Gender <- scrubbed %>% group_by(Year, Gender, Method) %>%
  tally()

This subset also contains observations that have values of NA; we will need to eliminate these records.

count_Gender <- na.omit(count_Gender)

6. Count of Age

Next, we will create a subset that includes only the Age, Race, and Gender variables. We will use this data in conjunction with geom_boxplot to assess the relationship between age and race.

# to save variables age, race, & gender to new subset

Age_Vars<- c("Age", "Race", "Gender")
count_Age <- scrubbed[Age_Vars]

This subset also contains observations that have values of NA; we will need to eliminate these records.

count_Age <- na.omit(count_Age)

7. Count of Race

Next, we will create a subset based on the frequency of executions by Race. Here, we will use geom_point to reveal trends over time.

# count of executions by year, era, and race

count_Race <- scrubbed %>% group_by(Year, Race, Era) %>% 
  tally()

This subset also contains observations that have values of NA; we will need to eliminate these records.

count_Race <- na.omit(count_Race)

8. Region and Race

Next, we will create a subset that shows Race and Region. To add historical context, we will also use our predefined variable Era. In conjunction with geom_bar, we will use the data to show racial biases in capital punishment over time.

We will be creating a facet_grid by era. We will want these facets to show in sequential order. To do so, we will order the historical eras by applying levels.

# to save variables race, region, & era to new subset

RR_Vars <- c("Race", "Region", "Era")
Race_Region <- scrubbed[RR_Vars]



# to order variables for faceted bar chart

Race_Region$Era_order <- factor(Race_Region$Era, levels=c("Early America", "Colonial Period", 
                                  "Revolutionary Period", "Young Republic",
                                  "Expansionary Period", "Civil War & Reconstruction", 
                                  "Second Industrial Revolution", "WWI & Depression", 
                                  "New Deal & WWII","Postwar America", 
                                  "Vietnam Era","Rise of Technology"))

As we did for count_Race, we will also remove NA values.

Race_Region <- na.omit(Race_Region)
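The reason for setting levels explicitly on the era variable is that R otherwise sorts factor levels alphabetically, and facets follow the level order. The sketch below illustrates the difference on a hypothetical three-value vector.

```r
# Hypothetical subset of era labels, in no particular order
eras <- c("Young Republic", "Colonial Period", "Early America")

# Default: levels are sorted alphabetically
default_levels <- levels(factor(eras))

# Explicit: levels follow the chronological order we supply
chrono_levels <- levels(factor(eras,
                               levels = c("Early America",
                                          "Colonial Period",
                                          "Young Republic")))

default_levels
chrono_levels
```

Any value that is missing from the supplied levels becomes NA in the factor, so the level list needs to cover every era label exactly as spelled.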

9. State Data

Finally, in order to create a frequency map of executions by state, we will merge our count_State subset with geographical data pulled from the maps package (via the map_data function in ggplot2). Using geom_polygon, we will create a heat map of the US that shows the number of executions by state.

Since our data capitalizes each word of a state name (for example, "New York"), while the map data stores state names in lowercase, we will need a function that capitalizes the first letter of every word in the map data before we can merge successfully.

# to save longitude/latitude data

all_states <- map_data("state")



# to capitalize the first letter of each word in the state name

capFirst <- function(s) {
  gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)
}

all_states$region <- capFirst(all_states$region)


colnames(all_states) <- c("long", "lat", "group", "order", "State", "subregion")



# to merge state data with count data

stateMap <- merge(all_states, count_State, by = "State", all.x = TRUE)
stateMap <- stateMap[order(stateMap$order),]
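The merge step can be sketched on toy data as follows. The helper cap_words and both data frames are hypothetical stand-ins: map_states plays the role of the map data (lowercase names plus a drawing order) and counts plays the role of the per-state tally. With all.x = TRUE, every map row is kept, and states without a matching count receive NA.

```r
# Hypothetical helper: capitalize the first letter of every word
cap_words <- function(s) gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)

# Toy stand-in for the map data
map_states <- data.frame(State = cap_words(c("new york", "ohio", "utah")),
                         order = 1:3)

# Toy stand-in for the per-state counts (no row for Utah)
counts <- data.frame(State = c("New York", "Ohio"),
                     n     = c(10L, 5L))

# Left join: keep every map row, fill missing counts with NA
merged <- merge(map_states, counts, by = "State", all.x = TRUE)

# merge() re-sorts rows by the key, so restore the drawing order
merged <- merged[order(merged$order), ]

merged
```

Restoring the original order matters for geom_polygon, which connects points in the order the rows appear.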

Visualizations

Conclusions

Problem Statement

This analysis is intended to help readers form an opinion on capital punishment based on sound data. Supported by graphical representation, the focus of this analysis is the prevalence of capital punishment over time and any biases skewing the results.

Methodology

In order to gain a clearer understanding of trends in capital punishment over time, we started by graphing the number of executions by year. After seeing the overall trend, we used categorical variables to add context to the graphs. Then, to gauge trends by geographical region, we looked at executions across the United States. Finally, we homed in on the interaction between race and region.

Insights

  • In general, the number of executions decreases during times of war
  • Due to advances in technology, methods have become increasingly humane over time
  • The east coast has a long history with capital punishment beginning in colonial Virginia
  • With the exception of California, the number of executions tends to decrease as you move West
  • Trends in racial prejudice vary from region to region

Implications

This analysis can be used to gain an understanding of how events in history have affected capital punishment levels. We can see that the number of executions decreases in times of war. This trend is particularly visible during the Vietnam War, a time of anti-war marches and protest songs. The loss of American soldiers on the battlefield seems to have a palpable effect on the use of capital punishment. We can also see that racial prejudice against African Americans, cultivated during the Civil War, has persisted and that a new bias, against the Hispanic population, may have emerged.

Limitations

This analysis was limited by the lack of trial data. We know that racial prejudices have affected trends in capital punishment over time. Next, I would like to incorporate data from capital punishment trials to reveal whether the same biases have led to convictions. To do so, we would bring in variables pertaining to the victim, such as race, gender, and age.