Capital Punishment in America

Are We Biased?

Introduction

Is the Death Penalty Still Alive?

If so, for whom?

In a time of political mistrust, the debate over capital punishment is still brewing. Using historical data, I hope to reveal how prevalent capital punishment has been over time and whether any biases skew the results. The data set is large, covering more than 15,000 executions over almost 400 years, ranging from firing squads in colonial Jamestown to the media controversies of the early 2000s.

The scope of this study covers a lot of history, and even more legislation. I am interested in understanding the trends in capital punishment over time and how these changes correlate with pivotal events in our history.

The Focus

To explore this relationship, I will use data from Executions in the United States, 1608-2002: The ESPY File. The data was collected in partnership with the National Science Foundation and the United States Department of Justice. Because the United States, as we know it, did not exist until 1776, the data includes earlier executions in any territories that would later become states.

Key Variables Include:

  • age
  • race
  • gender
  • occupation
  • date
  • method of execution
  • convicted crime

Analytical Approach

My proposed approach is to first plot the variables of interest by year. These graphs will help shed light on any biases in the capital punishment system and on how changes in legislation over the years have affected the number or method of executions.

I will then take a closer look at the years showing major shifts. The goal here is to see whether I can isolate an event, political or otherwise, that might help explain the results.

Mission

This analysis is intended to help readers form an opinion on capital punishment based on sound data. The issue has been hotly contested for hundreds of years, so there is no shortage of op-ed pieces littering the internet. I hope that my analysis can help readers, including myself, gain a clearer understanding of capital punishment, without biased interpretation.

Ultimately, I would like to understand whether any biases were still present in 2002 and, if so, whether they still exist today.


In 2017, capital punishment is still legal in 31 states.

Should it be?

Requirements

Required Packages

The following packages are required in order to run code without errors.


Package Name          Purpose
library(tidyverse)    installs and loads the core tidyverse packages (readr, dplyr, ggplot2, ...)
library(ggplot2)      plotting & visualizing data
library(maps)         geographical data
library(DT)           functional tables in HTML
library(knitr)        dynamic report generation
library(rmarkdown)    converting R Markdown documents into a variety of formats
library(ggthemes)     implementing a theme across the report
library(plotly)       dynamic plotting

library(tidyverse)     # installs and loads the core tidyverse packages (readr, dplyr, ggplot2, ...)
library(ggplot2)       # plotting & visualizing data
library(maps)          # for geographical data
library(DT)            # to create functional tables in HTML
library(knitr)         # for dynamic report generation
library(rmarkdown)     # to convert R Markdown documents into a variety of formats
library(ggthemes)      # to implement theme across report
library(plotly)        # for dynamic plotting
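
Before loading the packages, it can help to confirm that each one is installed. The check below is a minimal sketch and not part of the original report; the names `required_pkgs`, `installed` and `missing_pkgs` are my own.

```r
# Hypothetical helper vector: the packages loaded above.
required_pkgs <- c("tidyverse", "ggplot2", "maps", "DT", "knitr",
                   "rmarkdown", "ggthemes", "plotly")

# requireNamespace() returns FALSE (rather than raising an error)
# when a package is not installed.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before running: ", paste(missing_pkgs, collapse = ", "))
}
```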

Data Import & Prep

Data Import

The data set contains information about executions performed under civil authority in the United States between 1608 and 2002. The data was collected between 1970 and 2002 with the help of records from the State Department of Corrections, newspapers, court proceedings, and historical recordkeepers.

First, we must import the CSV file and specify column names. Several columns have no relevance to our analysis; I have named these columns with numbers to differentiate them from the variables of interest.

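# skip = 1 drops the file's own header row so that the names below are used instead.
# Numbered names mark columns that are not used in this analysis.
raw_data <- read_csv("raw_data.csv",
                     col_names = c("1", "2",
                     "3", "4",
                     "Race", "Age",
                     "Name", "5",
                     "6", "Crime",      # conviction; named "Crime" to match the subsets below
                     "Method", "7",
                     "8", "Year",
                     "9", "State",
                     "10", "11",
                     "Gender", "12",
                     "Occupation"),
                     skip = 1)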
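
To see how `col_names` and `skip = 1` interact, here is a small sketch on an inline CSV. The three-column sample is invented for illustration and is not the ESPY layout. It assumes readr 2.0 or later, where `I()` marks a string as literal data rather than a file path.

```r
library(readr)

# Invented sample: a header row followed by two records.
toy_csv <- I("V1,V2,V3\n1,White,34\n2,Black,27\n")

# skip = 1 discards the original header; col_names supplies our own names.
toy <- read_csv(toy_csv,
                col_names = c("1", "Race", "Age"),
                skip = 1,
                show_col_types = FALSE)

names(toy)   # the replacement names
nrow(toy)    # only the data rows remain
```

Without `skip = 1`, the original header row would be read in as a data record.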

Data Preparation

For our purposes, we will narrow down the data to 9 key variables.

  • Year
  • State
  • Name
  • Age
  • Gender
  • Race
  • Occupation
  • Crime
  • Method

# Keep only the nine variables of interest.
scrubbed <- raw_data[, c("Year", "State",
                         "Name", "Age",
                         "Gender", "Race",
                         "Occupation", "Crime",
                         "Method")]

Key Variables

To help round out the data, we will introduce two new categorical variables: Region and Era. These will help us better visualize geographical and historical trends.

1. Region

Groups states by the geographic divisions defined by the US Census Bureau.

# Assign each state to its Census division. case_when() checks the conditions in order.
# As in the nested ifelse() version, unmatched states receive the string "NA"
# (not a missing value), so na.omit() will not drop them.
scrubbed$Region <- case_when(
  scrubbed$State %in% c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin") ~ "East North Central",
  scrubbed$State %in% c("Alabama", "Kentucky", "Mississippi", "Tennessee") ~ "East South Central",
  scrubbed$State %in% c("New Jersey", "New York", "Pennsylvania") ~ "Middle Atlantic",
  scrubbed$State %in% c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah", "Wyoming") ~ "Mountain",
  scrubbed$State %in% c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", "Vermont") ~ "New England",
  scrubbed$State %in% c("Alaska", "California", "Hawaii", "Oregon", "Washington") ~ "Pacific",
  scrubbed$State %in% c("Delaware", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "Washington, D.C.", "West Virginia") ~ "South Atlantic",
  scrubbed$State %in% c("Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota") ~ "West North Central",
  scrubbed$State %in% c("Arkansas", "Louisiana", "Oklahoma", "Texas") ~ "West South Central",
  TRUE ~ "NA"
)
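
An alternative to a long chain of conditions is a lookup table. The sketch below uses a named character vector on a handful of states. The names `division_lookup`, `toy_states` and `toy_region` are illustrative, and only four states are listed to keep it short.

```r
# Illustrative lookup: state name -> Census division (abbreviated list).
division_lookup <- c(
  "Ohio"     = "East North Central",
  "Alabama"  = "East South Central",
  "New York" = "Middle Atlantic",
  "Texas"    = "West South Central"
)

toy_states <- c("Texas", "Ohio", "Puerto Rico", "New York")

# Indexing by name returns NA for any state missing from the lookup.
toy_region <- unname(division_lookup[toy_states])
toy_region
```

Note that this approach yields a true missing value for unmatched states, which `na.omit()` would remove, whereas the code above assigns the string "NA".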

2. Era

A somewhat subjective grouping based on US history.

# case_when() checks the conditions in order, so each line only needs the upper bound
# of its era; the lower bound is implied by the line before it.
# Records with a missing Year (or a Year after 2002) fall through to a missing Era.
scrubbed$Era <- case_when(
  scrubbed$Year < 1630  ~ "Early America",
  scrubbed$Year < 1763  ~ "Colonial Period",
  scrubbed$Year < 1783  ~ "Revolutionary Period",
  scrubbed$Year < 1815  ~ "Young Republic",
  scrubbed$Year < 1860  ~ "Expansionary Period",
  scrubbed$Year < 1876  ~ "Civil War & Reconstruction",
  scrubbed$Year < 1914  ~ "Second Industrial Revolution",
  scrubbed$Year < 1933  ~ "WWI & Depression",
  scrubbed$Year < 1945  ~ "New Deal & WWII",
  scrubbed$Year < 1960  ~ "Postwar America",
  scrubbed$Year < 1980  ~ "Vietnam Era",
  scrubbed$Year <= 2002 ~ "Rise of Technology",
  TRUE ~ NA_character_
)
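
The same binning can also be sketched with base R's `cut()`, which maps a numeric vector onto labeled intervals. This is an illustrative alternative, not the method used in this report; `era_breaks`, `era_labels` and `toy_years` are names I introduce here.

```r
# Lower bound of each era (inclusive); the final Inf closes the last interval.
era_breaks <- c(-Inf, 1630, 1763, 1783, 1815, 1860, 1876,
                1914, 1933, 1945, 1960, 1980, Inf)
era_labels <- c("Early America", "Colonial Period", "Revolutionary Period",
                "Young Republic", "Expansionary Period",
                "Civil War & Reconstruction", "Second Industrial Revolution",
                "WWI & Depression", "New Deal & WWII", "Postwar America",
                "Vietnam Era", "Rise of Technology")

toy_years <- c(1608, 1630, 1776, 1865, 2002)

# right = FALSE makes each interval [lower, upper), i.e. ">= lower & < upper".
toy_era <- cut(toy_years, breaks = era_breaks, labels = era_labels, right = FALSE)
as.character(toy_era)
```

One design difference: the last interval here is open-ended, so any year after 2002 would also be labeled "Rise of Technology". The result is a factor whose levels are already in chronological order.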

Clean Data!

Data Dictionary

Variable      Data Type    Description
Year          integer      Year of Execution
State         character    State of Execution
Name          character    Name of Offender
Age           integer      Age at Execution
Gender        character    Gender of Offender
Race          character    Race of Offender
Occupation    character    Occupation of Offender
Crime         character    Crime Committed
Method        character    Method of Execution
Region        character    Region of Execution
Era           character    Era of Execution

Capital Punishment Data

Data Subsets

Subsets Based on Key Variables

To easily run reports, we will create subsets.

1. Count of Years

First, we will create a subset based on the frequency of executions by Year. To add another dimension to the data, we will also incorporate our predefined variable Era. We will use this data in conjunction with geom_point to reveal trends in the number of executions over time.

count_Years <- scrubbed %>% group_by(Year, Era) %>%
  tally()
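
As a small illustration of what this grouping produces, the sketch below runs the same pattern on three invented records and compares it with dplyr's `count()`, which is shorthand for `group_by()` followed by `tally()`. The data frame `toy` is made up for the example.

```r
library(dplyr)

# Three invented records: two in 1900, one in 1901.
toy <- tibble(
  Year = c(1900, 1900, 1901),
  Era  = rep("Second Industrial Revolution", 3)
)

# group_by() + tally(), as used above; ungroup() drops the leftover grouping.
by_tally <- toy %>% group_by(Year, Era) %>% tally() %>% ungroup()

# count() performs the same grouping and counting in one call.
by_count <- toy %>% count(Year, Era)

by_count$n
```

Both versions return one row per Year and Era combination, with the number of records in a column named `n`.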

2. Count of Method

Next, we will create a subset that includes only the Year and Method variables. We will use this data in conjunction with geom_bar to show the prevalence of methods over time.

Method_Vars <- c("Year", "Method")
count_Method <- scrubbed[Method_Vars]
count_Method <- na.omit(count_Method)

3. Count of Crime

Next, we will create a subset based on the frequency of executions by Crime. We will use this data to assess the most prevalent convictions in capital punishment cases.

count_Crime <- scrubbed %>% group_by(Crime) %>%
  tally()

This variable is different in that some observations have values of NA. To create tidy graphs, we will need to eliminate these records.

count_Crime <- na.omit(count_Crime)
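
To make the effect of `na.omit()` concrete, here is a minimal sketch on invented data. It also shows an alternative I am adding for comparison, filtering on the one column before counting; the names `toy`, `counted`, `counted_clean` and `filtered` are illustrative.

```r
library(dplyr)

# Invented sample with two missing convictions.
toy <- tibble(Crime = c("Murder", NA, "Murder", "Robbery", NA))

# Counting first leaves one row whose Crime is NA...
counted <- toy %>% group_by(Crime) %>% tally()

# ...which na.omit() then removes, because it drops any row containing an NA.
counted_clean <- na.omit(counted)

# Filtering before counting gives the same counts and targets only the Crime column.
filtered <- toy %>% filter(!is.na(Crime)) %>% count(Crime)
```

The distinction matters more for wider subsets: `na.omit()` removes a row if any of its columns is missing, while `filter(!is.na(...))` only looks at the column named.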

4. Count of State

Next, we will create a subset based on the frequency of executions by State. We will also include Region in this grouping, as it adds another dimension to the data. We will use this in conjunction with geom_polygon and geom_bar to reveal trends in capital punishment across the US.

count_State <- scrubbed %>% group_by(State, Region) %>% 
  tally()

5. Count of Gender

Next, we will create a subset based on the frequency of executions by Gender. To round out the data, we will incorporate Year and Method. We will use this in conjunction with geom_point to see the breakdown of executions by gender.

count_Gender <- scrubbed %>% group_by(Year, Gender, Method) %>%
  tally()

This variable also contains observations that have values of NA; we will need to eliminate these records.

count_Gender <- na.omit(count_Gender)

6. Count of Age

Next, we will create a subset that includes only the Age, Race, and Gender variables. We will use this data in conjunction with geom_boxplot to assess the relationship between age and race.

Age_Vars <- c("Age", "Race", "Gender")
count_Age <- scrubbed[Age_Vars]

This subset also contains observations that have values of NA; we will need to eliminate these records.

count_Age <- na.omit(count_Age)

7. Count of Race

Next, we will create a subset based on the frequency of executions by Race. Here, we will use geom_point to reveal trends over time.

count_Race <- scrubbed %>% group_by(Year, Race, Era) %>% 
  tally()

This variable also contains observations that have values of NA; we will need to eliminate these records.

count_Race <- na.omit(count_Race)

8. Region and Race

Next, we will create a subset that shows Race and Region. To add historical context, we will also use our predefined variable Era. In conjunction with geom_bar, we will use the data to show racial biases in capital punishment over time.

We will be creating a facet_grid by era. We will want these facets to show in sequential order. To do so, we will order the historical eras by applying levels.

RR_Vars <- c("Race", "Region", "Era")
Race_Region <- scrubbed[RR_Vars]
Race_Region$Era_order <- factor(Race_Region$Era, levels=c("Early America", "Colonial Period", 
                                  "Revolutionary Period", "Young Republic",
                                  "Expansionary Period", "Civil War & Reconstruction", 
                                  "Second Industrial Revolution", "WWI & Depression", 
                                  "New Deal & WWII","Postwar America", 
                                  "Vietnam Era","Rise of Technology"))

As we did for count_Race, we will also remove NA values.

Race_Region <- na.omit(Race_Region)
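
The role of the explicit levels can be seen in a small sketch. The three-row data frame `toy` is invented for the example, with the eras deliberately out of order.

```r
# Invented sample with eras out of chronological order.
toy <- data.frame(
  Era = c("Young Republic", "Early America", "Colonial Period"),
  stringsAsFactors = FALSE
)

# Without explicit levels, factor() sorts the labels alphabetically.
levels(factor(toy$Era))

# With explicit levels, the order follows the vector we supply.
toy$Era_order <- factor(toy$Era,
                        levels = c("Early America", "Colonial Period", "Young Republic"))
levels(toy$Era_order)
```

ggplot2 lays out facets in the order of the factor levels, which is why faceting on Era_order rather than Era produces panels in chronological order.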

9. State Data

Finally, to create a frequency map of executions by state, we will merge our count_State subset with geographical data pulled from the maps package. Using geom_polygon, we will create a heat map of the US that shows the number of executions by state.

Our data capitalizes state names, while the map data stores them in lowercase, so we will need a function that capitalizes the state names in the map data before we can merge successfully.

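all_states <- map_data("state")

# Capitalize the first letter of every word, so that multi-word names
# such as "new york" become "New York" and match the State column.
capFirst <- function(s) {
  gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)
}

all_states$region <- capFirst(all_states$region)

# Rename "region" to "State" so that both data sets share a merge key.
colnames(all_states) <- c("long", "lat", "group", "order", "State", "subregion")

# all.x = TRUE keeps every map polygon, even for states with no recorded executions.
stateMap <- merge(all_states, count_State, by = "State", all.x = TRUE)
# merge() sorts by the key, so restore the drawing order of the polygon points.
stateMap <- stateMap[order(stateMap$order), ]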
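
The merge-then-reorder pattern can be checked on a toy example. The two small data frames below are invented stand-ins for the map data and the state counts; `toy_map`, `toy_counts` and `toy_merged` are illustrative names.

```r
# Invented stand-ins for the map data and the state counts.
toy_map <- data.frame(
  State = c("Ohio", "Ohio", "Utah", "Utah"),
  order = c(3, 4, 1, 2),
  stringsAsFactors = FALSE
)
toy_counts <- data.frame(State = "Ohio", n = 5L, stringsAsFactors = FALSE)

# all.x = TRUE keeps Utah's polygon points even though it has no count.
toy_merged <- merge(toy_map, toy_counts, by = "State", all.x = TRUE)

# Restore the drawing order after the merge.
toy_merged <- toy_merged[order(toy_merged$order), ]
toy_merged

# States in the counts that found no polygon (none here) would be dropped silently.
setdiff(toy_counts$State, toy_map$State)
```

Running `setdiff()` on the real count_State and all_states data is a quick way to spot state names that failed to match, for example because of spelling or capitalization differences.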

Visualizations

Conclusions