Are We Biased?
In a time of political mistrust, the debate over capital punishment is still brewing. With historical data, I hope to reveal the prevalence of capital punishment over time and any biases skewing the results. The data set is large, with over 15,000 executions spanning almost 400 years, ranging from firing squads in colonial Jamestown all the way to the media controversies of the early 2000s.
The scope of this study covers a lot of history, and even more legislation. I am interested in understanding the trends in capital punishment over time and how these changes correlate with pivotal events in our history.
To explore this relationship, I will be using data from Executions in the United States, 1608-2002: The ESPY File. This data was collected in partnership with the National Science Foundation and the United States Department of Justice. Since the United States, as we know it, did not exist until 1776, the data includes earlier executions in any territories that would later become states.
Key Variables Include:
My proposed approach is to first plot the variables of interest by year. These graphs will help shed light on any biases in the capital punishment system and on how changes in legislation over the years have affected the number or method of executions.
I will then take a closer look at the years showing major shifts. The goal here would be to see if I can isolate an event, political or otherwise, that might help explain the results.
This analysis is intended to help consumers form an opinion on capital punishment based on sound data. The issue has been hotly contested for hundreds of years, meaning there is no shortage of op-ed pieces littering the internet. I hope that my analysis can help consumers, including myself, gain a clearer understanding of capital punishment, without biased interruption.
Ultimately, I would like to understand whether any biases were still present in 2002 and, if so, whether they still exist today.
In 2017, capital punishment is still legal in 31 states.
Should it be?
The following packages are required in order to run code without errors.
| Package Name | Purpose |
|---|---|
| library(tidyverse) | easy installation of packages |
| library(readr) | to easily import delimited data |
| library(maps) | for geographical data |
| library(mapproj) | to convert latitude/longitude into projected coordinates |
| library(DT) | to create functional tables in HTML |
| library(knitr) | for dynamic report generation |
| library(rmarkdown) | to convert R Markdown documents into a variety of formats |
| library(ggthemes) | to implement theme across report |
| library(plotly) | for dynamic plotting |
# to preload necessary packages
list.of.packages <- c("tidyverse", "readr", "maps", "DT", "knitr", "rmarkdown", "ggthemes", "plotly", "mapproj")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos = "http://cran.us.r-project.org")
library(tidyverse) # easy installation of packages
library(readr) # to easily import delimited data
library(maps) # for geographical data
library(mapproj) # to convert latitude/longitude into projected coordinates
library(DT) # to create functional tables in HTML
library(knitr) # for dynamic report generation
library(rmarkdown) # to convert R Markdown documents into a variety of formats
library(ggthemes) # to implement theme across report
library(plotly) # for dynamic plotting
The data set contains information about executions performed under civil authority in the United States between 1608 and 2002. The data was collected between 1970 and 2002 with the help of records from the State Department of Corrections, newspapers, court proceedings, and historical recordkeepers.
First, we must import the CSV file and specify column names. Several columns have no relevance for our analysis. I have given these columns numeric names in order to differentiate them from the variables of interest.
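To illustrate how skipping the original header row and supplying replacement names works, here is a minimal sketch on an inline string. It uses base R's `read.csv` rather than `read_csv` only so that it runs without extra packages; the toy columns and values are hypothetical and are not taken from the ESPY file.

```r
# Toy CSV text with its own header row (hypothetical contents)
csv_text <- "id,race,age\n1,White,30\n2,Black,25\n"

# skip = 1 drops the original header; col.names supplies our own names.
# check.names = FALSE keeps the numeric placeholder name "1" as written.
toy <- read.csv(text = csv_text,
                header = FALSE,
                skip = 1,
                col.names = c("1", "Race", "Age"),
                check.names = FALSE,
                stringsAsFactors = FALSE)

# Columns with placeholder names can then be left out by selecting the rest
kept <- toy[, c("Race", "Age")]
print(kept)
```

The same idea applies to the import that follows: the first row of the file is skipped, and every column receives either a meaningful name or a numeric placeholder.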
# to read in CSV
raw_data <- read_csv("raw_data.csv",
                     col_names = c("1", "2",
                                   "3", "4",
                                   "Race", "Age",
                                   "Name", "5",
                                   "6", "Crime",  # named Crime to match the subsets that follow
                                   "Method", "7",
                                   "8", "Year",
                                   "9", "State",
                                   "10", "11",
                                   "Gender", "12",
                                   "Occupation"),
                     skip = 1)
For our purposes, we will narrow down the data to 9 key variables.
# to save select variables to new subset
scrubbed <- raw_data[, c("Year", "State",
                         "Name", "Age",
                         "Gender", "Race",
                         "Occupation", "Crime",
                         "Method")]
To help round out the data, we will introduce two new categorical variables: Region and Era. These will help us better visualize geographical and historical trends.
Region groups states based on the geographical regions specified by the US Census.
# to group states by region
scrubbed$Region <- ifelse(scrubbed$State %in% c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin"), "East North Central",
ifelse(scrubbed$State %in% c("Alabama", "Kentucky", "Mississippi", "Tennessee"), "East South Central",
ifelse(scrubbed$State %in% c("New Jersey", "New York", "Pennsylvania"), "Middle Atlantic",
ifelse(scrubbed$State %in% c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah", "Wyoming"), "Mountain",
ifelse(scrubbed$State %in% c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", "Vermont"), "New England",
ifelse(scrubbed$State %in% c("Alaska", "California", "Hawaii", "Oregon", "Washington"), "Pacific",
ifelse(scrubbed$State %in% c("Delaware", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "Washington, D.C.", "West Virginia"), "South Atlantic",
ifelse(scrubbed$State %in% c("Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota"), "West North Central",
ifelse(scrubbed$State %in% c("Arkansas", "Louisiana", "Oklahoma", "Texas"), "West South Central", "NA")))))))))
Era is a somewhat subjective grouping of years based on US history.
# to group years by era
scrubbed$Era <- ifelse(scrubbed$Year < 1630, "Early America",
ifelse(scrubbed$Year >= 1630 & scrubbed$Year < 1763, "Colonial Period",
ifelse(scrubbed$Year >= 1763 & scrubbed$Year < 1783, "Revolutionary Period",
ifelse(scrubbed$Year >= 1783 & scrubbed$Year < 1815, "Young Republic",
ifelse(scrubbed$Year >= 1815 & scrubbed$Year < 1860, "Expansionary Period",
ifelse(scrubbed$Year >= 1860 & scrubbed$Year < 1876, "Civil War & Reconstruction",
ifelse(scrubbed$Year >= 1876 & scrubbed$Year < 1914, "Second Industrial Revolution",
ifelse(scrubbed$Year >= 1914 & scrubbed$Year < 1933, "WWI & Depression",
ifelse(scrubbed$Year >= 1933 & scrubbed$Year < 1945, "New Deal & WWII",
ifelse(scrubbed$Year >= 1945 & scrubbed$Year < 1960, "Postwar America",
ifelse(scrubbed$Year >= 1960 & scrubbed$Year < 1980, "Vietnam Era",
ifelse(scrubbed$Year >= 1980 & scrubbed$Year <= 2002, "Rise of Technology", "NA"))))))))))))
| Variable | Data Type | Variable Description |
|---|---|---|
| Year | integer | Year of Execution |
| State | character | State of Execution |
| Name | character | Name of Offender |
| Age | integer | Age at Execution |
| Gender | character | Gender of Offender |
| Race | character | Race of Offender |
| Occupation | character | Occupation of Offender |
| Crime | character | Crime Committed |
| Method | character | Method of Execution |
| Region | character | Region of Execution |
| Era | character | Era of Execution |
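The two derived variables, Region and Era, can also be built without deeply nested `ifelse` calls. The sketch below is one possible alternative, not the method used in this report: a named vector acts as a lookup table for Region, and `cut()` bins years into eras. The toy rows are hypothetical, only three states are listed to keep the example short, and the upper break of 2003 is an assumption chosen so that whole-number years through 2002 fall in the last era.

```r
# Toy input (hypothetical rows, not taken from the ESPY file)
toy <- data.frame(
  State = c("Virginia", "Texas", "Ohio", "Puerto Rico"),
  Year  = c(1608L, 1982L, 1862L, 1945L),
  stringsAsFactors = FALSE
)

# Named vector as a lookup table: names are states, values are regions
region_lookup <- c(
  "Virginia" = "South Atlantic",
  "Texas"    = "West South Central",
  "Ohio"     = "East North Central"
)
toy$Region <- unname(region_lookup[toy$State])
# Unmatched states come back as a true NA; the report's code stores the string "NA"
toy$Region[is.na(toy$Region)] <- "NA"

# With right = FALSE, cut() uses half-open intervals [lower, upper)
era_breaks <- c(-Inf, 1630, 1763, 1783, 1815, 1860, 1876,
                1914, 1933, 1945, 1960, 1980, 2003)
era_labels <- c("Early America", "Colonial Period", "Revolutionary Period",
                "Young Republic", "Expansionary Period",
                "Civil War & Reconstruction", "Second Industrial Revolution",
                "WWI & Depression", "New Deal & WWII", "Postwar America",
                "Vietnam Era", "Rise of Technology")
toy$Era <- cut(toy$Year, breaks = era_breaks, labels = era_labels, right = FALSE)

print(toy)
```

Note that `cut()` returns a factor whose levels are already in chronological order, whereas the table above lists Era as a character variable.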
To easily run reports, we will create subsets.
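Most of the subsets in this section follow the same pattern: count rows per group, then drop any rows that contain missing values. Here is a minimal sketch of that pattern on toy data (the rows are hypothetical), assuming the dplyr package is installed.

```r
library(dplyr)

# Toy data (hypothetical), with one missing Crime value
toy <- data.frame(
  Year  = c(1900L, 1900L, 1901L, 1901L),
  Crime = c("Murder", "Murder", "Rape", NA),
  stringsAsFactors = FALSE
)

# group_by() + tally() returns one row per group, with the count in column n;
# missing values form a group of their own
counts <- toy %>%
  group_by(Crime) %>%
  tally()

# na.omit() drops every row that contains an NA, here the NA group
counts_clean <- na.omit(counts)
print(counts_clean)
```

Because `na.omit()` removes a row if any column is missing, applying it to a subset with several columns drops more records than applying it to a single-column count.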
First, we will create a subset based on the frequency of executions by Year. To add another dimension to the data, we will also incorporate our predefined variable Era. We will use this data in conjunction with geom_point to reveal trends in the number of executions over time.
# count of executions by year & era
count_Years <- scrubbed %>%
  group_by(Year, Era) %>%
  tally()
Next, we will create a subset that includes only the Year and Method variables. We will use this data in conjunction with geom_bar to show the prevalence of methods over time.
# to save variables year & method to new subset
Method_Vars <- c("Year", "Method")
count_Method <- scrubbed[Method_Vars]
count_Method <- na.omit(count_Method)
Next, we will create a subset based on the frequency of executions by Crime. We will use this data to assess the most prevalent convictions in capital punishment cases.
# count of executions by crime
count_Crime <- scrubbed %>%
  group_by(Crime) %>%
  tally()
This variable is different in that some observations have values of NA. In order to create tidy graphs, we will need to eliminate these records.
count_Crime <- na.omit(count_Crime)
Next, we will create a subset based on the frequency of executions by State. We will also include Region in this grouping, as it adds another dimension to the data. We will use this in conjunction with geom_polygon and geom_bar to reveal trends in capital punishment across the US.
# count of executions by state & region
count_State <- scrubbed %>%
  group_by(State, Region) %>%
  tally()
Next, we will create a subset based on the frequency of executions by Gender. To round out the data, we will incorporate Year and Method. We will use this in conjunction with geom_point to see the breakdown of executions by male and female.
# count of executions by year, gender, & method
count_Gender <- scrubbed %>%
  group_by(Year, Gender, Method) %>%
  tally()
This variable also contains observations that have values of NA; we will need to eliminate these records.
count_Gender <- na.omit(count_Gender)
Next, we will create a subset that includes the Age, Race, and Gender variables. We will use this data in conjunction with geom_boxplot to assess the relationship between age and race.
# to save variables age, race, & gender to new subset
Age_Vars <- c("Age", "Race", "Gender")
count_Age <- scrubbed[Age_Vars]
This subset also contains observations that have values of NA; we will need to eliminate these records.
count_Age <- na.omit(count_Age)
Next, we will create a subset based on the frequency of executions by Race. Here, we will use geom_point to reveal trends over time.
# count of executions by year, era, and race
count_Race <- scrubbed %>%
  group_by(Year, Race, Era) %>%
  tally()
This variable also contains observations that have values of NA; we will need to eliminate these records.
count_Race <- na.omit(count_Race)
Next, we will create a subset that shows Race and Region. To add historical context, we will also use our predefined variable Era. In conjunction with geom_bar, we will use the data to show racial biases in capital punishment over time.
We will be creating a facet_grid by era. We will want these facets to show in sequential order. To do so, we will order the historical eras by applying levels.
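As a minimal sketch of why explicit levels matter, the toy vector below (hypothetical values) shows that a factor sorts its levels alphabetically by default, while supplying `levels` fixes the order that plots and facets will follow. It also shows that any value missing from `levels` becomes NA.

```r
# Toy era labels in arbitrary order (hypothetical)
eras <- c("Vietnam Era", "Early America", "Colonial Period", "Early America")

# Default behaviour: levels are sorted alphabetically
default_levels <- levels(factor(eras))

# Explicit levels: the order given here is the order used when plotting
era_levels <- c("Early America", "Colonial Period", "Vietnam Era")
ordered_eras <- factor(eras, levels = era_levels)

# Values not listed in levels turn into NA, including the string "NA"
with_unknown <- factor(c("Early America", "NA"), levels = era_levels)

print(default_levels)
print(levels(ordered_eras))
print(with_unknown)
```

This second effect is worth keeping in mind: a record whose era is stored as the string "NA" becomes a true missing value once the factor is built, and is then removed by `na.omit()`.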
# to save variables race, region, & era to new subset
RR_Vars <- c("Race", "Region", "Era")
Race_Region <- scrubbed[RR_Vars]
# to order variables for faceted bar chart
Race_Region$Era_order <- factor(Race_Region$Era,
                                levels = c("Early America", "Colonial Period",
                                           "Revolutionary Period", "Young Republic",
                                           "Expansionary Period", "Civil War & Reconstruction",
                                           "Second Industrial Revolution", "WWI & Depression",
                                           "New Deal & WWII", "Postwar America",
                                           "Vietnam Era", "Rise of Technology"))
As we did for count_Race, we will also remove NA values.
Race_Region <- na.omit(Race_Region)
Finally, in order to create a frequency map of executions by state, we will merge our count_State subset with geographical data pulled from the maps package. Using geom_polygon, we will create a heat map of the US that shows the number of executions by state.
Since our data begins each state name with a capital letter, we will need a helper function that capitalizes the state names in the map data before we can merge successfully.
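To see why the case of the key matters, here is a minimal sketch on toy data (all values hypothetical). It also shows what `all.x = TRUE` does to rows without a match. Multi-word names such as "new york" need every word capitalized to match "New York", so the helper `cap_words` below, an illustrative name, capitalizes each word.

```r
# Toy stand-ins for the map data and the count table (values are made up)
map_toy <- data.frame(State = c("virginia", "new york", "texas"),
                      stringsAsFactors = FALSE)
counts_toy <- data.frame(State = c("Virginia", "New York"),
                         n = c(3L, 2L),
                         stringsAsFactors = FALSE)

# Capitalize the first letter of every word (perl = TRUE enables \\U)
cap_words <- function(s) gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)

# Without fixing the case, no key matches and every n is NA
unmatched <- merge(map_toy, counts_toy, by = "State", all.x = TRUE)

# After capitalizing, the keys line up; Texas has no count and stays NA
map_toy$State <- cap_words(map_toy$State)
matched <- merge(map_toy, counts_toy, by = "State", all.x = TRUE)
print(matched)
```

Because `all.x = TRUE` keeps every row of the map data, states without a count still get drawn and simply carry a missing value for `n`.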
# to save longitude/latitude data
all_states <- map_data("state")
# to capitalize the first letter of each word, so that "new york" becomes "New York"
capFirst <- function(s) {
  gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)
}
all_states$region <- capFirst(all_states$region)
colnames(all_states) <- c("long", "lat", "group", "order", "State", "subregion")
# to merge state data with count data
stateMap <- merge(all_states, count_State, by = "State", all.x = TRUE)
stateMap <- stateMap[order(stateMap$order), ]
To evaluate the number of executions over time, we start by plotting key variables by Year. To flesh out the numerical data with a categorical dimension, like era, race, or gender, we will use color.
By mapping the color aesthetic to our predefined variable, Era, we will add some historical context to the graph. We can gain insights into questions like:
From this graph, we learn that, after the first execution in colonial Jamestown, the number of executions trended upwards. That is, until WWII. Here, we see a drastic drop in the number of executions until the end of the Vietnam War. However, with the rise of technology and lethal injection, the levels return to an upward trajectory.
# executions by year (era)
Year_and_Era <-
  ggplot(data = count_Years, aes(x = Year, y = n)) +
  geom_point(aes(color = Era)) +
  ggtitle("Executions by Year") +
  theme_stata() +
  # set after theme_stata(), since a complete theme would otherwise reset it
  theme(legend.position = "bottom")
ggplotly(Year_and_Era)
By mapping the color aesthetic to our predefined variable, Race, we will add some cultural context to the graph. We can gain insights into questions like:
From this graph, we can see that, in general, the numbers of executions of black and white inmates follow approximately the same trend over time. However, the number of black executions appears to be higher, especially in the years leading up to the 1960s. The graph also reveals one particularly troubling year for Native Americans, with 39 executions tied to the Dakota War of 1862. We can also see the emergence of Hispanic executions beginning in the early 20th century.
# executions by year (race)
Year_and_Race <-
  ggplot(data = count_Race, aes(x = Year, y = n)) +
  geom_point(aes(color = Race)) +
  ggtitle("Executions by Race") +
  theme_stata()
ggplotly(Year_and_Race)
By mapping the color aesthetic to our predefined variable, Gender, we will add some demographic context to the graph. We can gain insights into questions like:
From this graph, we can see that males represent the vast majority of executions. The only exception to this trend is the Salem witch trials: in 1692, a total of 14 women were executed, compared with only one man in the same year.
# executions by year (gender)
Year_and_Gender <-
  ggplot(data = count_Gender, aes(x = Year, y = n)) +
  geom_point(aes(color = Gender)) +
  ggtitle("Executions by Gender") +
  theme_stata()
ggplotly(Year_and_Gender)
By mapping the color aesthetic to our predefined variable, Method, we will add some situational context to the graph. We can gain insights into questions like:
From this graph, we can see that early Americans used a variety of methods. From hanging, to bludgeoning, to burning, there was a diverse range of techniques. However, as technology advanced, the methods became more humane and less diversified. By the end of 2002, injection was the clear preferred method.
# executions by year (method)
Year_and_Method <-
  ggplot(count_Method, aes(Year, fill = Method)) +
  geom_bar(position = "stack") +
  ggtitle("Executions by Year & Method",
          subtitle = "From 1608 - 2002") +
  theme_stata()
ggplotly(Year_and_Method)
From this table, we can see that over 50% of executions are tied to murder convictions. If we include convictions like Robbery-Murder and Rape-Murder, the share increases to about 79%. We can also see the racial bias stemming from the Civil War era. Of the 15k+ observations, from 1608 to 2002, slave revolts account for about 2% of the executions.
# table of executions by crime
count_Crime <- count_Crime[order(-count_Crime$n), ] %>%
  mutate(percent = n / sum(n))
kable(head(count_Crime))
| Crime | n | percent |
|---|---|---|
| Murder | 8577 | 0.5782767 |
| Robbery-Murder | 2670 | 0.1800162 |
| Rape | 947 | 0.0638484 |
| Rape-Murder | 472 | 0.0318231 |
| Slave Revolt | 277 | 0.0186758 |
| Housebrkng-Burgl | 251 | 0.0169229 |
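The combined share of murder-related convictions can be checked directly from the table above. The short sketch below copies the three relevant values from the percent column and adds them up. It also backs out the implied number of records with a known crime from the Murder row.

```r
# Shares copied from the percent column of the table above
shares <- c("Murder" = 0.5782767,
            "Robbery-Murder" = 0.1800162,
            "Rape-Murder" = 0.0318231)

# Combined share of the three murder-related convictions
combined <- sum(shares)
print(round(100 * combined, 1))  # roughly 79 percent

# Implied number of records with a known crime: count divided by share
implied_total <- round(8577 / 0.5782767)
print(implied_total)
```

Since the percent column is computed after records with a missing crime are removed, these shares describe executions with a recorded crime rather than every row in the file.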
To evaluate the number of executions across the country, we will use the count_State subset we created. We will also incorporate our predefined variable Region to uncover whether any biases are present across regions.
By using the maps package, we can merge our count_State subset with geographic coordinates in order to create a heat map of total executions by state. Using this map, we can gain insights into questions like:
Which states have performed the most executions?
Which states have not performed any executions?
From this map, we can see that Virginia has performed the most executions, followed by Texas and Pennsylvania. We also learn that, with the exception of California, the number of executions decreases as you move west.
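Both questions can also be answered numerically from a count table. Here is a minimal sketch on toy counts (the values are hypothetical), using base R's built-in `state.name` vector of the 50 state names to find states that never appear.

```r
# Toy count table in the same shape as a state-level tally (made-up values)
toy_counts <- data.frame(
  State = c("Virginia", "Texas", "Ohio", "Utah"),
  n     = c(12L, 9L, 4L, 1L),
  stringsAsFactors = FALSE
)

# Which states have performed the most executions? Sort by count, descending
top3 <- head(toy_counts[order(-toy_counts$n), ], 3)

# Which states have no recorded executions? Compare against all 50 state names
no_executions <- setdiff(state.name, toy_counts$State)

print(top3)
print(length(no_executions))
```

Note that `state.name` does not include Washington, D.C., so a district-level entry in the data would need to be handled separately.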
# frequency map of total executions by state
ggplot(stateMap, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = n)) +
  geom_path() +
  scale_fill_gradientn(colours = rev(rainbow(10)), na.value = "grey90") +
  coord_map() +
  ggtitle("Executions by State",
          subtitle = "From 1608 - 2002")
We can also use a bar graph to represent the total executions by state.
Again, we can gain insights into questions like:
Which states have performed the most executions?
Which states have not performed any executions?
We can use this bar chart to confirm our findings from the heat map. The two plots agree, and we can confirm that Virginia has performed the most executions, followed by Texas and Pennsylvania. We also see that states in the South Atlantic, Middle Atlantic, and West South Central regions have higher execution numbers. This supports the theory that, with the exception of California, capital punishment numbers decrease as you move west.
# bar graph of total executions by state
ggplot(count_State, aes(x = reorder(State, n), y = n, fill = Region)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme(axis.title = element_blank()) +
  ggtitle("Executions by State",
          subtitle = "From 1608 - 2002")
We can also represent the frequency of executions by region. This will help us understand whether there are trends in capital punishment across states in close geographic proximity.
We can use this to answer questions like:
Which regions stand out in terms of number of executions?
Which regions have a low incidence of executions?
We can use this faceted bar chart to confirm our findings from the other two graphs. The three plots agree: states in the South Atlantic, Middle Atlantic, and West South Central regions have higher execution numbers. Once again, this supports the theory that, with the exception of California, capital punishment numbers decrease as you move west.
# faceted bar graph to show executions by region
ggplot(count_State, aes(x = reorder(State, n), y = n, fill = Region)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_grid(~Region, labeller = label_wrap_gen(width = .1)) +
  theme(text = element_text(size = 22), axis.title.y = element_blank()) +
  ggtitle("Executions by Region",
          subtitle = "From 1608 - 2002")
To shine a light on racial bias, we will look at the number of executions by race. Ultimately, we will break this down further by Era, Region, and Age.
We can use a faceted bar chart to represent the frequency of executions by race across different eras. This will help us to understand the presence of racial biases across notable periods of US history.
We can use this to answer questions like:
Which races have the most executions?
Have these levels changed over time?
Across the board, we see that the Second Industrial Revolution had the highest number of executions and that execution levels tend to drop off in times of war. For Native Americans, we can see that execution numbers increase in the Expansionary Period, as Americans begin to explore the Wild West. For Asian-Pacific Islanders, we see executions begin in the Expansionary Period and increase dramatically during the Second Industrial Revolution. Though the total number of Hispanic executions is relatively low, we can see a potential uptick during the Rise of Technology. As for African Americans, we can see biases start developing in the Colonial Period and continue to build during the Expansionary Period. These numbers drop back off during the Civil War and then increase dramatically during the Second Industrial Revolution. From WWI through the end of WWII, the levels of African American executions remain elevated above the levels of other races.
# faceted bar graph to show executions by race & era
ggplot(Race_Region, aes(Race, fill = Race)) +
  geom_bar(position = "stack") +
  ggtitle("Executions by Race & Era",
          subtitle = "From 1608 - 2002") +
  facet_grid(Race ~ Era_order, scales = "free", labeller = label_wrap_gen(width = .1)) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        text = element_text(size = 22))
We can use a standard bar chart to represent the frequency of executions by race across different regions. This will help us to understand the presence of racial biases across the United States.
We can use this to answer questions like:
Which regions have the most executions?
Do these numbers reveal any racial biases?
Here, we can easily see the racial bias against African Americans in the South Atlantic region. Not only has the South Atlantic region performed the most executions, but the majority of these executions were carried out on African American individuals. We can also see that the western regions, including the Mountain, Pacific, and West South Central regions, have higher rates of executions carried out on Native American, Hispanic, and Asian-Pacific Islander individuals.
# bar chart to show executions by region & race
ggplot(Race_Region, aes(Region, fill = Race)) +
  geom_bar(position = "stack") +
  ggtitle("Executions by Race & Region",
          subtitle = "From 1608 - 2002") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Finally, we can use a box plot to evaluate the distribution of ages across race.
We can use this to answer questions like:
What is the average age of individuals at the time of execution?
Is this age consistent across races?
By using the box plot, we can see that the median age at time of execution for White and Asian-Pacific Islander individuals is around 30. For African American, Hispanic, and Native American individuals, the median age at time of execution is lower, at around 25. We can also use this graph to assess variability and outliers within each race. We can see that there are significant outliers in the African American and White box plots.
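The medians read off a box plot can also be computed directly, which gives a numerical check on the visual estimate. Here is a minimal sketch on hypothetical ages (not taken from the ESPY file), using base R's `aggregate` for the per-group medians and `boxplot.stats` to flag values beyond 1.5 times the interquartile range, the same convention a box plot uses for its whiskers.

```r
# Hypothetical ages for two groups
toy_age <- data.frame(
  Race = c("White", "White", "White", "Hispanic", "Hispanic", "Hispanic"),
  Age  = c(28, 30, 41, 22, 25, 33),
  stringsAsFactors = FALSE
)

# Median age per group: the value drawn as the middle line of each box
medians <- aggregate(Age ~ Race, data = toy_age, FUN = median)

# Values beyond 1.5 * IQR from the box are reported as outliers;
# an extreme age of 90 is appended here purely for illustration
outliers <- boxplot.stats(c(toy_age$Age, 90))$out

print(medians)
print(outliers)
```

Running the same `aggregate` call on the age subset would return the exact medians behind the approximate values quoted above.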
# boxplot to show distribution of age by race
ggplot(count_Age, aes(x = Race, y = Age)) +
  geom_boxplot(aes(color = Race)) +
  ggtitle("Age at Time of Execution by Race")
This analysis is intended to help readers form an opinion on capital punishment based on sound data. Supported by graphical representation, the focus of this analysis is the prevalence of capital punishment over time and any biases skewing the results.
In order to gain a clearer understanding of trends in capital punishment over time, we started by graphing the number of executions by year. After seeing the overall trend, we used categorical variables to add context to the graphs. Then, to gauge trends by geographical region, we looked at executions across the United States. Finally, we homed in on the interaction between race and region.
This analysis can be used to gain an understanding of how events in history have affected capital punishment levels. We can see that the number of executions decreases in times of war. This trend is particularly visible during the Vietnam War, a time of anti-war marches and protest songs. The loss of American soldiers on the battlefield seems to have a palpable effect on the usage of capital punishment. We can also see that racial prejudice against African Americans, cultivated during the Civil War, has persisted and that a new bias, against the Hispanic population, may have emerged.
This analysis was limited by the lack of trial data. We know that racial prejudices have affected trends in capital punishment over time. Next, I would like to incorporate data from capital punishment trials to reveal whether the same biases have led to convictions. To do so, we would bring in variables pertaining to the victim, like race, gender, and age.