Intro - Exploratory Data Analysis

Before we get going, I should warn you, I’m a pretty fast typist. Don’t worry about keeping up with my narrative (i.e., the white part).

Do keep up with the code!

Since this is your first time programming, there will be lots you don’t understand! That’s ok! Don’t try to memorize – just play the game with me. You’ll start to understand the logic as you get more experience.

First, before we get to work, we’ll need our “tools”. Programmers call them libraries.

library(tidyverse)

Ok, we’ve got our tools out. But, we need data! Let’s load it.

deathPenaltyData <- read_csv("death penalty data.csv")

#let's take a look:

Ok cool, we’ve got data! But, now what? The awesome and terrifying answer is: it’s up to us!

The process is called Exploratory Data Analysis:

  1. Explore! Ask question. Be creative. What looks interesting to you? What qualities and relationships do you notice? Make graphs! Compute stats! Have fun!

  2. Analyze and conclude. What behaviors did you observe? Why are those interesting? What conclusions about the dataset can you make?

  3. Communicate! What results are worthy of sharing? Why might people care? What action should we take, if any? Policy?

I notice that age, sex, and race seem interesting. Let’s start there!

Age, Sex, and Race

#plot age:

qplot(Age, data=deathPenaltyData, 
      ylab = "Count", 
      main="Age of Prisoners", 
      bins=10)

summary(deathPenaltyData$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22.00   34.00   40.00   41.48   47.00   77.00

Observations:

  1. It looks like prisoners tend to be about 40 years old when they’re executed.

  2. There’s some ‘right skew’ in the distribution: sometimes, older folks get excuted too! Common ages range from 34 to 47.

#now, let's look at Sex


qplot(Sex, data=deathPenaltyData,
      ylab = "Count",
      main = "Sex of prisoners")

Observations:

  1. Almost all executed prisoners are men.
  2. It’s probably difficult to say too much about the women, since n=16 is such a small sample size.
qplot(Race, data=deathPenaltyData,
      ylab = "Count",
      main="Race of executed prisoners",
      fill = Race)

Observations:

  1. In truth, about 12% of americans are black. But, it looks like a much higher proportion of african americans are receiving the death penalty. Does this mean that black americans are being targeted for capitol punishment?
  2. It would be helpful to consult another source about the demographics of the US for the dates we’ve seen. How much deviation do we see here?

Executions Over Time

######### Warning:  Fancy Footwork Ahead

#goal:  find out how many executions in each year, then plot

#problem 1:  we have to count them

class(deathPenaltyData$Date)
## [1] "character"
#problem 2:  teach R the date format (data wrangling!!)


deathPenaltyData$Date <- as.Date(deathPenaltyData$Date, format = "%m/%d/%Y")

deathPenaltyData$Date <- format(deathPenaltyData$Date, "%Y")

yearTotals <- count(deathPenaltyData, Date)


qplot(Date, n, data=yearTotals, geom="line", group=1)

Woah! What happened in 2000? It seems like the death penalty became more and more “popular” up to 2000, then a sharp decline. So, wht’s the story there? Election? Policy change? Public outrage?