Introduction:

When Curtis Flowers, a black man accused of murdering four people, was first convicted by an all-white jury and then convicted three more times by nearly all-white juries, it is hard not to suspect that having more black jurors would have made a difference in the jury’s decision. The racial makeup of Flowers’ juries hardly reflects the overall racial makeup of Mississippi’s Fifth Circuit Court District, which is 40% black. Were black jurors in the Fifth Circuit Court District underrepresented in criminal cases because of the prosecution’s race-based jury selection practices? If so, the prosecution would have violated the 1986 U.S. Supreme Court ruling in Batson v. Kentucky, which held that it is unconstitutional to strike a juror because of their race. Establishing a Batson violation is very difficult, since jury selection involves many other factors that can affect a juror’s eligibility. However, we can examine the outcome of jury selection in each criminal case in the Fifth Circuit Court District from 1992 to 2017, the tenure of Doug Evans, Curtis Flowers’ prosecutor.

Using the trials dataset of 306 trials that took place in the Fifth Circuit Court District from 1992 to 2017, and the jurors dataset of 14,875 prospective jurors who were considered for those trials, we can examine the prospective jurors in each case. The trials dataset includes information about the participants in each trial, such as the defendant, the defense attorney, the prosecutor, and the judge, as well as about the trial itself, such as the year, offense, appeal status, and whether a Batson claim was made. The trial ID attached to each juror in the jurors dataset tells us which trial that juror was considered for. If there were no race-based irregularity in jury selection during this period, we would expect the racial makeup of both the rejected and the accepted jurors to roughly reflect that of the district: 60% white and 40% black.
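
To make the 60/40 benchmark concrete: if selection were race-neutral and jurors were independent draws from the district population (an idealisation that real jury pools only approximate), the number of black jurors among n jurors of known race would be roughly binomial with success probability 0.4. The sketch below uses base R's binom.test on hypothetical counts that are not taken from the dataset.

```r
# Hypothetical counts for illustration only; not taken from the dataset.
n_known <- 120  # jurors of known race (black or white) in one pool
n_black <- 33   # how many of those jurors are black

# Null hypothesis: each known-race juror is black with probability 0.4,
# the district's black share of the population.
res <- binom.test(n_black, n_known, p = 0.4)

observed_share <- unname(res$estimate)  # equals n_black / n_known
expected_black <- 0.4 * n_known         # black jurors expected under the null
print(c(observed_share = observed_share,
        expected_black = expected_black,
        p_value = res$p.value))
```

A small p-value only says the pool is unlikely under this idealised model; it does not by itself identify why the pool differs.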

Overall, our main question is how the racial makeup of juries compares to that of the overall population. In particular, we wanted to know which stages of the jury selection process might change the demographic makeup of juries, what factors affect those stages, and whether those factors changed significantly over time.

Methodology:

As most of our efforts centered on the makeup of the jury, we focused on the jurors and trials datasets. Our first task was to join the two datasets. This was relatively easy, as the datasets were collected together and are linked by the trial__id feature in jurors and the id feature in trials. Next, to determine how the racial makeup of juries changed over time, we extracted the year from the trial feature of the joined table, which required some string processing. We also added indicator features for each juror’s race (and for whether the defendant is black) to the joined data, so that racial proportions could be computed easily with summarise from the dplyr package.
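
The join, year extraction, and proportion step can be sketched in base R, with merge() standing in for dplyr's left_join() and aggregate() for summarise(). The two tiny data frames are hypothetical, and the sketch assumes, as the appendix code's str_sub(trial, 1, 4) does, that the trial field begins with a four-digit year.

```r
# Hypothetical toy data; only the key names mirror the real datasets.
toy_jurors <- data.frame(
  trial__id = c(1, 1, 1, 2, 2, 2),
  trial = c(rep("1994-0001", 3), rep("2001-0002", 3)),
  race = c("White", "Black", "Unknown", "White", "White", "Black"),
  stringsAsFactors = FALSE
)
toy_trials <- data.frame(
  id = c(1, 2),
  defendant_race = c("Black", "White"),
  stringsAsFactors = FALSE
)

# Left join: keep every juror and attach the matching trial's columns.
juror_trial <- merge(toy_jurors, toy_trials,
                     by.x = "trial__id", by.y = "id", all.x = TRUE)

# Year = first four characters of the trial string.
juror_trial$year <- as.numeric(substr(juror_trial$trial, 1, 4))

# 0/1 indicators make proportions a simple mean.
juror_trial$is_white <- as.integer(juror_trial$race == "White")
juror_trial$is_black <- as.integer(juror_trial$race == "Black")

# Share of white and black jurors per year (denominators include Unknown).
by_year <- aggregate(cbind(is_white, is_black) ~ year,
                     data = juror_trial, FUN = mean)
print(by_year)
```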

From that point on, much of the exploration was straightforward in theory, although we often had to deal with small inconveniences, such as oddly named variables. There were some other problematic points: we consistently needed to filter out the unknown racial group, and unknown values also needed to be removed when examining the different reasons jurors might be struck from the pool. Further, we often had to rename or group together features of jurors, as the default names were unwieldy or there were too many possible values that were not of interest. Again, this is most visible in our analysis of why a juror might be struck from the pool, where many different reasons needed to be grouped under “not struck” or “other”. Overall, however, once we figured out how to extract racial proportions easily, much of the data wrangling was fairly simple; rather, it was the analysis of and comparison between multiple visualizations that took the bulk of the effort for this project.
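
A minimal sketch of that recoding step, assuming a hypothetical struck_by column with made-up codes (the real dataset's column names and values differ): the codes of interest are mapped to readable labels through a named lookup vector, every other code falls into “Other”, and only black and white jurors are kept.

```r
# Hypothetical toy data; the codes in struck_by are invented for illustration.
toy <- data.frame(
  race = c("White", "Black", "Unknown", "Black", "White", "White"),
  struck_by = c("state", "defense", "cause", "state", "served", "no show"),
  stringsAsFactors = FALSE
)

# Named lookup vector: raw code -> readable group label.
strike_labels <- c(state = "Struck by state", defense = "Struck by defense",
                   cause = "Struck for cause", served = "Not struck",
                   alternate = "Not struck")

# Codes missing from the lookup (including NA) come back as NA -> "Other".
toy$strike_group <- unname(strike_labels[toy$struck_by])
toy$strike_group[is.na(toy$strike_group)] <- "Other"

# Keep only black and white jurors (drops Unknown and any NA race).
known <- toy[toy$race %in% c("White", "Black"), ]
print(table(known$strike_group, known$race))
```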

Results:

First, we examine the potential effect of missing data to make sure it does not compromise our analysis. When examining the overall juror pool, we noticed that the race of a significant number of jurors is marked as ‘unknown’. If a trial’s jury pool contains too many candidates of unknown race, we cannot reliably assess whether that pool represents the overall racial makeup of the district. As we can see from the following graph, the proportion of jurors of unknown race is very high in trials before 1995 and after 2012. When that proportion is too high, we cannot accurately assess the difference between the proportions of black and white jurors. However, when the proportion of jurors of unknown race is below 25%, the proportions of black and white jurors roughly resemble the 40/60 split between the black and white populations of the district. Therefore, moving forward, it is important to focus on the relative proportions of the racial groups in each year rather than on their absolute values over time.
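
The check described above can be sketched as two per-year summaries: the share of jurors whose race is unknown (flagged against the 25% cutoff from the text), and a relative measure, the black share among jurors whose race is known. The data below are hypothetical.

```r
# Hypothetical toy data; not taken from the dataset.
toy <- data.frame(
  year = c(rep(1993, 4), rep(2005, 5)),
  race = c("Unknown", "Unknown", "White", "Black",
           "White", "White", "Black", "Black", "Unknown"),
  stringsAsFactors = FALSE
)

# Share of jurors with unknown race in each year, flagged at the 25% cutoff.
toy$is_unknown <- as.numeric(toy$race == "Unknown")
unknown_share <- aggregate(is_unknown ~ year, data = toy, FUN = mean)
unknown_share$usable <- unknown_share$is_unknown < 0.25

# Relative measure: black share among jurors whose race is known.
known <- toy[toy$race %in% c("White", "Black"), ]
known$is_black <- as.numeric(known$race == "Black")
relative_black <- aggregate(is_black ~ year, data = known, FUN = mean)

print(merge(unknown_share, relative_black, by = "year"))
```

The relative measure is unaffected by how many unknown-race jurors a year has, which is why it is the safer quantity to compare across years.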

After examining the baseline makeup of the jury pools, we turn to the factors that cause jurors to be removed from the pool. Figure 2 shows the proportion of white and black jurors struck from (or kept in) the jury over time; the dashed line represents the proportion of white adults in the district’s population. This suggests that jury selection may make juries whiter than the overall population. This is worrying, because Figure 0 shows that black people make up a large proportion of defendants, disproportionately so given that white people make up 60% of the district’s population. We therefore turn to Figure 3 for a deeper analysis of how jury selection works. There are three major ways jurors can be removed: by a peremptory strike from the defense, by a peremptory strike from the prosecution, or for cause, where the judge agrees that there is some legal reason (such as a conflict of interest) that a juror cannot serve. Three of the sub-figures of Figure 3 correspond to these three causes of removal, while the “other” category covers a variety of cases, such as jurors excluded for failing to appear or jurors for whom the data give no cause. We also include the jurors who are kept in the pool. Note that because we include only black and white jurors, and the number of jurors of unknown race varies wildly over time, the exact shape of the curves matters less than their relative values.
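
The comparison behind Figure 2 can be sketched as the white share of jurors, by year and by whether they were kept or struck, set against the 60% white benchmark. The data and the status column are hypothetical.

```r
# Hypothetical toy data; the status column is invented for illustration.
toy <- data.frame(
  year = c(2000, 2000, 2000, 2000, 2010, 2010, 2010, 2010),
  race = c("White", "White", "White", "Black",
           "White", "Black", "White", "Black"),
  status = c("kept", "kept", "kept", "struck",
             "kept", "struck", "kept", "kept"),
  stringsAsFactors = FALSE
)
white_pop_share <- 0.6  # white share of the district's adult population

# White share within each (year, status) group.
toy$is_white <- as.numeric(toy$race == "White")
white_share <- aggregate(is_white ~ year + status, data = toy, FUN = mean)

# Positive gap = group is whiter than the district benchmark.
white_share$gap <- white_share$is_white - white_pop_share
print(white_share)
```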

One of the first things to note from Figure 3 is that there is no strong association between a juror’s race and whether that juror is struck for cause that persists over time. A slight association may exist, but Figure 3 alone is not enough to establish it. What we can conclude is that any effect that strongly differentiates black and white jurors is probably not due to racial differences in strikes for cause. Further, because the prosecution and the defense both have a limited number of peremptory challenges, both sides are likely to try to remove jurors they find unfavorable through strikes for cause before using a peremptory strike. Strikes for cause therefore act on a pool that peremptory strikes have not yet reshaped, so this lack of a clear, strong association with race is unlikely to be an artifact of peremptory strikes altering the racial makeup of the jury pool.
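
One way to check whether a slight association between race and strikes for cause is distinguishable from chance is a test of independence on a 2x2 table. The sketch below applies Fisher's exact test (base R's fisher.test) to hypothetical counts for a single year; it pools trials and ignores the clustering of jurors within trials, so it should be read as a rough check rather than a definitive analysis.

```r
# Hypothetical counts for one year; not taken from the dataset.
counts <- matrix(c(30, 170,   # black jurors: struck for cause, not struck for cause
                   40, 260),  # white jurors: struck for cause, not struck for cause
                 nrow = 2, byrow = TRUE,
                 dimnames = list(race = c("Black", "White"),
                                 for_cause = c("Yes", "No")))

# Rate of for-cause strikes within each racial group.
for_cause_rate <- counts[, "Yes"] / rowSums(counts)

# Fisher's exact test of independence; estimate is a conditional MLE odds ratio.
test <- fisher.test(counts)
print(for_cause_rate)
print(c(odds_ratio = unname(test$estimate), p_value = test$p.value))
```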

Taking this into account, we can move on to how prosecutors and defense attorneys use their peremptory challenges depending on the race of jurors. As stated above, because of the large but variable population of jurors of unknown race, it is difficult to draw meaning from the shape of the curves. However, Figure 3 shows very clearly that defense attorneys are much more likely to remove white jurors than black jurors, while the opposite holds for prosecutors. Indeed, this holds regardless of the race of the defendant, as Figures 4 and 5 show. This trend is consistent over time for both sides and, combined with the relatively weak association between race and strikes for cause, indicates an association between a juror’s race and being struck from the pool via a peremptory challenge. But there is a problem: Batson v. Kentucky set the precedent that jurors cannot be removed from the pool on the basis of race alone. If this rule were perfectly enforced, we would expect this racial disparity to be driven by some other, confounding factor. We therefore turn to Figures 6 and 7 to understand how Batson challenges are used in practice.
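
The comparison in Figure 3 amounts to looking at each side's strikes and asking what share of them fall on black versus white jurors, relative to the 40/60 baseline. A minimal sketch with hypothetical data, using row proportions from base R's prop.table:

```r
# Hypothetical toy data; strike_group labels are invented for illustration.
toy <- data.frame(
  strike_group = c("Struck by state", "Struck by state", "Struck by state",
                   "Struck by defense", "Struck by defense", "Struck by defense",
                   "Not struck", "Not struck"),
  race = c("Black", "Black", "White",
           "White", "White", "Black",
           "White", "Black"),
  stringsAsFactors = FALSE
)

# Counts of jurors by strike group (rows) and race (columns).
tab <- table(toy$strike_group, toy$race)

# Row proportions: within each strike group, the share of black and white jurors.
shares <- prop.table(tab, margin = 1)
print(round(shares, 2))
```

A state row with a black share well above 0.4, or a defense row with a white share well above 0.6, is the kind of pattern the paragraph describes.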

The first thing to note from Figures 6 and 7 is that Batson challenges are rare: only 54 of the 305 trials feature one, although the recorded number is likely lower than the actual number because of limitations in data collection. This provides some evidence that the racially disparate impact of peremptory challenges is not based on race alone. However, the trial of a black defendant is more likely to feature a Batson challenge than the trial of a white defendant, a trend that holds for challenges by both the prosecution and the defense, so the distribution of Batson challenges is not random. Because trials of both white and black defendants have disproportionately white juries, the fact that most Batson challenges occur in the trials of black defendants hints that, despite the formal definition of a Batson challenge, the nature of a peremptory strike is not the only factor that leads an attorney to make one. This casts doubt on the tentative conclusion that unchallenged peremptory strikes are not based on race.
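
The count of trials with a Batson challenge treats a trial with claims by both sides as one trial, which is what the inclusion-exclusion step in the appendix computes. The sketch below checks that equivalence on hypothetical trial-level flags (column names follow the trials dataset) and then computes the share of trials with any claim by defendant race; missing flags are treated as "no claim".

```r
# Hypothetical trial-level flags; not taken from the dataset.
toy_trials <- data.frame(
  defendant_race = c("Black", "Black", "Black", "White", "White", "Black"),
  batson_claim_by_defense = c(TRUE, FALSE, TRUE, FALSE, FALSE, NA),
  batson_claim_by_state = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# A trial counts once if either side raised a claim (NA %in% TRUE is FALSE).
any_claim <- with(toy_trials,
                  batson_claim_by_defense %in% TRUE | batson_claim_by_state %in% TRUE)

# Inclusion-exclusion gives the same count: defense + state - both.
n_def <- sum(toy_trials$batson_claim_by_defense, na.rm = TRUE)
n_state <- sum(toy_trials$batson_claim_by_state, na.rm = TRUE)
n_both <- sum(toy_trials$batson_claim_by_defense & toy_trials$batson_claim_by_state,
              na.rm = TRUE)
stopifnot(sum(any_claim) == n_def + n_state - n_both)

# Share of trials with any Batson claim, by defendant race.
claim_rate <- tapply(any_claim, toy_trials$defendant_race, mean)
print(claim_rate)
```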

The fact that prosecutors seem to prefer white jurors while defense attorneys seem to ‘prefer’ black jurors suggests that black jurors may be, or may be perceived by attorneys as, more lenient than white jurors. Re-examining Figure 1 alongside the “Juror not struck” panel of Figure 3, we can see that the racial makeup of the jury has shifted once all strikes are complete: the proportion of white jurors increases after selection, and the proportion of black jurors decreases. The Batson challenge data give some evidence that these racially disparate effects may not be based on race alone, but it is also clear that treating Batson challenges as the sole arbiter of racial bias is overly simplistic at best. It is also important to note that racially disparate impacts can matter just as much as racially disparate decisions: regardless of why an attorney removes a juror, peremptory challenges shift the racial demographics of the jury pool and produce a jury that is whiter than the overall population.
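
The shift described above can be measured per trial as the white share of the seated jury minus the white share of the initial pool. A minimal sketch with hypothetical data; the trial_id and seated columns are invented for illustration.

```r
# Hypothetical toy data; column names are invented for illustration.
toy <- data.frame(
  trial_id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  race = c("White", "White", "Black", "Black", "White",
           "Black", "White", "Black", "White", "Black"),
  seated = c(TRUE, TRUE, FALSE, FALSE, TRUE,
             FALSE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
toy$is_white <- as.numeric(toy$race == "White")

# White share of the full pool and of the seated jury, per trial.
pool_white <- tapply(toy$is_white, toy$trial_id, mean)
seated_white <- tapply(toy$is_white[toy$seated], toy$trial_id[toy$seated], mean)

# Align by trial ID (a trial with no seated jurors would be absent from seated_white).
shift <- seated_white - pool_white[names(seated_white)]
print(round(shift, 3))  # positive = jury whiter than its pool
```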

This has some serious negative implications. First, it undermines the ideal of being tried by a jury of one’s peers, because the makeup of the jury differs from the makeup of the overall population. This is made worse by the disproportionate representation of black people among defendants (Figure 0). Further, as noted above, black jurors might tend to be more lenient than white jurors, so this demographic shift might have the secondary effect of making courtroom decisions harsher. Overall, the jury selection process clearly reduces the proportion of black jurors relative to the initial jury pool, and this shift is driven largely by the peremptory strikes used by the prosecution, rather than by peremptory strikes by the defense or by strikes for cause. However, further research is needed to overcome the limitations of the dataset used here, such as the large proportion of jurors of unknown race and the lack of voir dire records for many trials. Additionally, it is important to determine what causes the association between peremptory strikes and race, and why that association differs between strikes used by the state and by the defense, work we have only just begun here.

R Appendix:

See the PairHW2.Rmd file for the full R code. The snippets below were used to determine exact values and proportions and to explore the data, but are not used directly in the analysis above.

# Join the jurors and trials data, extract the trial year, and create
# indicator variables for juror race and defendant race
library(dplyr)
library(stringr)

juror_trial <- left_join(jurors, trials, by = c("trial__id" = "id")) %>%
  mutate(year = as.numeric(str_sub(trial, start = 1, end = 4)),  # year from trial string
         is_white = ifelse(race == "White", 1, 0),
         is_black = ifelse(race == "Black", 1, 0),
         is_unknown = race == "Unknown",
         is_other = !(race %in% c("White", "Black", "Unknown")),
         def_black = ifelse(defendant_race == "Black", "Black", "Not Black"))
# Filter the trials data to count the cases in which the defense made a Batson claim for a non-black defendant
trials %>%
  filter(defendant_race != "Black", batson_claim_by_defense == TRUE)
## # A tibble: 4 x 38
##      id defendant_name cause_number state_strikes defense_strikes county
##   <dbl> <chr>          <chr>        <lgl>         <lgl>           <chr> 
## 1     7 Billy Joe Bar… 1994-9943    FALSE         FALSE           Attala
## 2    64 Randy Burton   2001-0054    TRUE          TRUE            Choct…
## 3   259 Lester Austin  1993-4141    TRUE          TRUE            Webst…
## 4   291 Ricky Lenard   1995-7009[2] TRUE          TRUE            Grena…
## # … with 32 more variables: defendant_race <chr>,
## #   second_defendant_race <chr>, third_defendant_race <chr>,
## #   fourth_defendant_race <chr>, more_than_four_defendants <lgl>,
## #   judge <chr>, prosecutor_1 <chr>, prosecutor_2 <chr>,
## #   prosecutor_3 <chr>, prosecutors_more_than_three <lgl>,
## #   def_attny_1 <chr>, def_attny_2 <chr>, def_attny_3 <chr>,
## #   def_attnys_more_than_three <lgl>, offense_code_1 <chr>,
## #   offense_title_1 <chr>, offense_code_2 <chr>, offense_title_2 <chr>,
## #   offense_code_3 <chr>, offense_title_3 <chr>, offense_code_4 <chr>,
## #   offense_title_4 <chr>, offense_code_5 <chr>, offense_title_5 <chr>,
## #   offense_code_6 <chr>, offense_title_6 <chr>, more_than_six <lgl>,
## #   verdict <chr>, case_appealed <lgl>, batson_claim_by_defense <lgl>,
## #   batson_claim_by_state <lgl>, voir_dire_present <lgl>
# Validate our assumption that very few trials in the dataset have a
#   defendant who is neither black nor white: trials with white or black
#   defendants make up the vast majority of trials during this period.

filter(trials, defendant_race %in% c("Latino", "Asian"))
## # A tibble: 2 x 38
##      id defendant_name cause_number state_strikes defense_strikes county
##   <dbl> <chr>          <chr>        <lgl>         <lgl>           <chr> 
## 1    93 Dung Tran      1999-0181    TRUE          TRUE            Grena…
## 2   207 Jose Guerrero  2004-0066    TRUE          TRUE            Attala
## # … with 32 more variables: defendant_race <chr>,
## #   second_defendant_race <chr>, third_defendant_race <chr>,
## #   fourth_defendant_race <chr>, more_than_four_defendants <lgl>,
## #   judge <chr>, prosecutor_1 <chr>, prosecutor_2 <chr>,
## #   prosecutor_3 <chr>, prosecutors_more_than_three <lgl>,
## #   def_attny_1 <chr>, def_attny_2 <chr>, def_attny_3 <chr>,
## #   def_attnys_more_than_three <lgl>, offense_code_1 <chr>,
## #   offense_title_1 <chr>, offense_code_2 <chr>, offense_title_2 <chr>,
## #   offense_code_3 <chr>, offense_title_3 <chr>, offense_code_4 <chr>,
## #   offense_title_4 <chr>, offense_code_5 <chr>, offense_title_5 <chr>,
## #   offense_code_6 <chr>, offense_title_6 <chr>, more_than_six <lgl>,
## #   verdict <chr>, case_appealed <lgl>, batson_claim_by_defense <lgl>,
## #   batson_claim_by_state <lgl>, voir_dire_present <lgl>
# Exploratory analysis: count how often a Batson challenge is made.
num_batson_defense <- nrow(filter(trials, batson_claim_by_defense == TRUE))
num_batson_state <- nrow(filter(trials, batson_claim_by_state == TRUE))
num_batson_both <- nrow(filter(trials, batson_claim_by_state == TRUE, batson_claim_by_defense == TRUE))
# Trials with at least one Batson claim; inclusion-exclusion counts
# trials with claims by both sides only once
num_batson <- num_batson_defense + num_batson_state - num_batson_both # 54