Lab or Lecture Title

Wakeup Challenge
Back to death penalty
Using select()
Summarise and Group_by
Faceting Graphs

Wakeup Challenge

Star Wars characters with yellow eyes:

starwars %>% filter(eye_color == "yellow") %>% nrow()

## [1] 11

Back to death penalty

Basic plots: age, sex, race

qplot(Age, data = deathPenaltyData,
      xlab = "Age",
      ylab = "Count",
      main = "Age of executed prisoners")

qplot(Sex, data=deathPenaltyData,
      xlab="Sex",
      ylab="Count",
      main = "Sex of executed prisoners")

Question: what percent of executed prisoners are women?

Answer: filter and nrow!

#how many in whole data set?

nrow(deathPenaltyData)

## [1] 1442

head(deathPenaltyData)

## # A tibble: 6 x 17
##   Date  Name    Age Sex   Race  Crime `Victim Count` `Victim Sex`
##   <chr> <chr> <dbl> <chr> <chr> <chr>          <dbl> <chr>       
## 1 01/1… Gary…    36 Male  White Murd…              1 Male        
## 2 05/2… John…    30 Male  White Murd…              1 Male        
## 3 10/2… Jess…    46 Male  White Murd…              1 Male        
## 4 03/0… Stev…    24 Male  White Murd…              4 2 Male, 2 F…
## 5 08/1… Fran…    38 Male  White Murd…              1 Male        
## 6 12/0… Char…    40 Male  Black Murd…              1 Male        
## # … with 9 more variables: `Victim Race` <chr>, County <chr>, State <chr>,
## #   Region <chr>, Method <chr>, Juvenile <chr>, Volunteer <chr>,
## #   Federal <chr>, `Foreign National` <chr>

deathPenaltyData %>% filter(Sex == "Female") %>% nrow()

## [1] 16

#find the proportion (multiply by 100 to get the percent)
16/1442

## [1] 0.0110957

Wow! Of the 1442 executed prisoners in this data set, only 16 (1.1%) were women!
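Typing 16/1442 by hand works, but the numbers go stale if the data ever change. Here is a minimal sketch of letting R do the division. It uses a small made-up tibble (toyExecutions is hypothetical, not the real data set) so that it runs on its own:

```r
library(dplyr)

# Hypothetical toy data standing in for deathPenaltyData
toyExecutions <- tibble(
  Name = c("A", "B", "C", "D", "E"),
  Sex  = c("Male", "Female", "Male", "Male", "Male")
)

# Count the women, then divide by the total number of rows
nFemale   <- toyExecutions %>% filter(Sex == "Female") %>% nrow()
pctFemale <- 100 * nFemale / nrow(toyExecutions)

# Same answer in one step: the mean of TRUE/FALSE values is the share of TRUEs
pctFemaleShortcut <- 100 * mean(toyExecutions$Sex == "Female")

pctFemale
pctFemaleShortcut
```

Swapping toyExecutions for deathPenaltyData should give the same 1.1% found above.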

Using select()

select() lets us choose which variables (columns) we want to keep:

deathPenaltyData %>% head()

## # A tibble: 6 x 17
##   Date  Name    Age Sex   Race  Crime `Victim Count` `Victim Sex`
##   <chr> <chr> <dbl> <chr> <chr> <chr>          <dbl> <chr>       
## 1 01/1… Gary…    36 Male  White Murd…              1 Male        
## 2 05/2… John…    30 Male  White Murd…              1 Male        
## 3 10/2… Jess…    46 Male  White Murd…              1 Male        
## 4 03/0… Stev…    24 Male  White Murd…              4 2 Male, 2 F…
## 5 08/1… Fran…    38 Male  White Murd…              1 Male        
## 6 12/0… Char…    40 Male  Black Murd…              1 Male        
## # … with 9 more variables: `Victim Race` <chr>, County <chr>, State <chr>,
## #   Region <chr>, Method <chr>, Juvenile <chr>, Volunteer <chr>,
## #   Federal <chr>, `Foreign National` <chr>

deathPenaltyData %>% select(Date, Age, Sex, Race, State, Method, `Victim Count`)

## # A tibble: 1,442 x 7
##    Date         Age Sex   Race  State Method           `Victim Count`
##    <chr>      <dbl> <chr> <chr> <chr> <chr>                     <dbl>
##  1 01/17/1977    36 Male  White UT    Firing Squad                  1
##  2 05/25/1979    30 Male  White FL    Electrocution                 1
##  3 10/22/1979    46 Male  White NV    Gas Chamber                   1
##  4 03/09/1981    24 Male  White IN    Electrocution                 4
##  5 08/10/1982    38 Male  White VA    Electrocution                 1
##  6 12/07/1982    40 Male  Black TX    Lethal Injection              1
##  7 04/22/1983    33 Male  White AL    Electrocution                 1
##  8 09/02/1983    34 Male  White MS    Gas Chamber                   1
##  9 11/30/1983    36 Male  White FL    Electrocution                 1
## 10 12/14/1983    31 Male  Black LA    Electrocution                 1
## # … with 1,432 more rows

#We picked our variables, but don't forget to save them!  Below we use -> at the end (you could also use <- at the start)

deathPenaltyData %>% 
  select(Date, Age, Sex, Race, State, Method, `Victim Count`) ->
  myDeathPenaltyData

#Did it work?  Does myDeathPenaltyData have the right columns?

#Let's look with the head() command:  it shows you the first 6 rows

myDeathPenaltyData %>% head()

## # A tibble: 6 x 7
##   Date         Age Sex   Race  State Method           `Victim Count`
##   <chr>      <dbl> <chr> <chr> <chr> <chr>                     <dbl>
## 1 01/17/1977    36 Male  White UT    Firing Squad                  1
## 2 05/25/1979    30 Male  White FL    Electrocution                 1
## 3 10/22/1979    46 Male  White NV    Gas Chamber                   1
## 4 03/09/1981    24 Male  White IN    Electrocution                 4
## 5 08/10/1982    38 Male  White VA    Electrocution                 1
## 6 12/07/1982    40 Male  Black TX    Lethal Injection              1

#is there other stuff head() can do?  Use the help() command!

help("head")
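One thing the help page describes is the n argument, which sets how many rows head() shows. Here is a small sketch using dplyr's built-in starwars data. The object names firstThree and withoutHeight are made up for illustration:

```r
library(dplyr)

# Keep three columns, then show only the first 3 rows instead of the default 6
firstThree <- starwars %>%
  select(name, height, eye_color) %>%
  head(n = 3)

firstThree

# A minus sign inside select() drops a column instead of keeping it
withoutHeight <- firstThree %>% select(-height)

names(withoutHeight)
```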

Summarise and Group_by

These let us “collapse” data along a variable. Let me show you!

#Find total victim count in each state

#i.e., group by state, then summarise by total:

myDeathPenaltyData %>% group_by(State) %>% 
                       summarise(totalVictimCount = sum(`Victim Count`))

## # A tibble: 35 x 2
##    State totalVictimCount
##    <chr>            <dbl>
##  1 AL                  72
##  2 AR                  58
##  3 AZ                  57
##  4 CA                  32
##  5 CO                   1
##  6 CT                   4
##  7 DE                  25
##  8 FE                 172
##  9 FL                 145
## 10 GA                  93
## # … with 25 more rows

# what about average victim count?  Add mean() alongside sum()

myDeathPenaltyData %>% group_by(State) %>%
                       summarise(avgVictimCount = mean(`Victim Count`),
                                 totalVictimCount = sum(`Victim Count`))

## # A tibble: 35 x 3
##    State avgVictimCount totalVictimCount
##    <chr>          <dbl>            <dbl>
##  1 AL              1.24               72
##  2 AR              2.15               58
##  3 AZ              1.54               57
##  4 CA              2.46               32
##  5 CO              1                   1
##  6 CT              4                   4
##  7 DE              1.56               25
##  8 FE             57.3               172
##  9 FL              1.58              145
## 10 GA              1.35               93
## # … with 25 more rows
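To see exactly what the "collapse" does, it can help to try it on data small enough to check by hand. Here is a minimal sketch with a made-up five-row tibble (toyData and toySummary are hypothetical names, not part of the real data set):

```r
library(dplyr)

# Hypothetical toy data: one row per execution
toyData <- tibble(
  State = c("TX", "TX", "TX", "FL", "FL"),
  `Victim Count` = c(1, 2, 3, 1, 1)
)

# Five rows go in, one row per state comes out
toySummary <- toyData %>%
  group_by(State) %>%
  summarise(
    avgVictimCount   = mean(`Victim Count`),
    totalVictimCount = sum(`Victim Count`)
  )

toySummary
```

By hand, the three TX rows add up to 6 victims (average 2) and the two FL rows add up to 2 (average 1), which is what the summary should show.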

Useful shortcut: count() does the group_by and the summarise in one step, giving the number of rows in each group (in a column called n):

myDeathPenaltyData %>% count(Race, State)

## # A tibble: 81 x 3
##    Race  State     n
##    <chr> <chr> <int>
##  1 Asian CA        1
##  2 Asian NV        1
##  3 Asian OK        2
##  4 Asian TX        2
##  5 Black AL       25
##  6 Black AR        7
##  7 Black AZ        1
##  8 Black CA        2
##  9 Black DE        7
## 10 Black FE        1
## # … with 71 more rows
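To make the shortcut concrete, here is a sketch that does the same count two ways on a made-up tibble (toyData, viaCount and viaSummarise are hypothetical names). The long way uses n(), which counts the rows in each group:

```r
library(dplyr)

# Hypothetical toy data: one row per execution
toyData <- tibble(
  Race  = c("White", "White", "Black", "White"),
  State = c("TX", "TX", "TX", "FL")
)

# The shortcut
viaCount <- toyData %>% count(Race, State)

# The long way: group, then count the rows in each group
viaSummarise <- toyData %>%
  group_by(Race, State) %>%
  summarise(n = n(), .groups = "drop")

viaCount
viaSummarise
```

Both tables should list the same Race and State combinations with the same n.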

Faceting Graphs

Often, we want to break up our visual summaries by group. Ex:

qplot(myDeathPenaltyData$State)

#Ok, what about by race?

qplot(data=myDeathPenaltyData, State, fill=Race)

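The above technique (using fill=Race) breaks up our visual by a categorical variable; in these notes we loosely call this “faceting”. Strictly speaking, ggplot2 uses “faceting” for drawing a separate panel per category (for example with qplot's facets argument), whereas fill=Race colours the pieces of each bar.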

Q: would it make sense to facet by the variable: Age?

qplot(data=myDeathPenaltyData, State, fill=Age)

It doesn’t make sense to facet on quantitative variables! You need a categorical variable, and ideally one without too many categories.
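ggplot2 offers two ways to break a bar chart up by a categorical variable: colouring the bars with fill, as above, or drawing one small panel per category with facet_wrap(). Here is a minimal sketch of both. It is written with ggplot() rather than qplot(), and it uses a made-up toyData data frame (hypothetical, not the real data) so that it runs on its own:

```r
library(ggplot2)

# Hypothetical toy data standing in for myDeathPenaltyData
toyData <- data.frame(
  State = c("TX", "TX", "TX", "FL", "FL", "VA"),
  Race  = c("White", "Black", "White", "White", "Black", "White")
)

# Colour the bars by Race (same idea as fill=Race in qplot)
filledPlot <- ggplot(toyData, aes(x = State, fill = Race)) +
  geom_bar()

# Draw one panel per Race instead
panelPlot <- ggplot(toyData, aes(x = State)) +
  geom_bar() +
  facet_wrap(~ Race)

filledPlot
panelPlot
```

Either way, Race works because it has only a few categories. A variable like Age would give far too many colours or panels to read.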