Synthetic data on bail reform in the United States

Overview

Note: This file uses synthetic data. Synthetic data is artificially generated data that mimics the statistical properties and patterns of real-world data without containing any actual real-world information.

This file is provided as a preliminary resource until official data is added to the critstats package. You may also use this code to gather data for your class project, thesis, or other academic tasks beyond what is provided below. The content in this file draws on several different sources, which you should be familiar with before accessing and analyzing any data.

Set up your work environment

Open up a new .Rmd file.

Use {r setup, include=FALSE} in your first code chunk.

knitr::opts_chunk$set(echo = TRUE)

# Load necessary libraries
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lubridate)

# Set seed for reproducibility (uncomment so every run draws the same sample)
# set.seed(123)

# readr and dplyr are already attached by library(tidyverse) above
library(here)  # provides here() for building the output path when saving
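The seed call in the chunk above is commented out, so each knit of this file draws a different sample. As a minimal sketch of what setting a seed does, the base R function set.seed() restarts the random number stream, so the same sample() call returns the same values. The object names first_draw and second_draw are made up for this illustration.

```r
# Base R only; no packages are needed for this illustration.
set.seed(123)
first_draw <- sample(1:100, 5)

# Resetting the same seed restarts the random number stream,
# so the next draw repeats the first one.
set.seed(123)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE
```

If you want the tables in your own report to stay the same between knits, call set.seed() once before generating the data.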

Reports at the Bail Project

There are resources available at the Bail Project, with some information on cash bail here. Other data sources are also available for quick download and analysis.

Synthetic data on bail reform

For the purposes of code development, synthetic data is used here to help model the main outcomes and explore the hypotheses.

# Create pseudo data frame
bail_reform_data <- tibble(
  case_id = 1:1000,
  arrest_date = sample(seq(as.Date('2020-01-01'), as.Date('2022-12-31'), by="day"), 1000, replace=TRUE),
  offense_type = sample(c("Misdemeanor", "Felony"), 1000, replace=TRUE, prob=c(0.7, 0.3)),
  bail_amount = sample(c(0, 500, 1000, 5000, 10000, 50000), 1000, replace=TRUE, prob=c(0.3, 0.2, 0.2, 0.15, 0.1, 0.05)),
  pretrial_release = sample(c("Yes", "No"), 1000, replace=TRUE, prob=c(0.6, 0.4)),
  days_in_jail = ifelse(pretrial_release == "No", sample(0:90, 1000, replace=TRUE), 0),
  court_appearance = sample(c("Appeared", "Failed to Appear"), 1000, replace=TRUE, prob=c(0.9, 0.1)),
  rearrest_within_30days = sample(c("Yes", "No"), 1000, replace=TRUE, prob=c(0.05, 0.95)),
  race = sample(c("White", "Black", "Hispanic", "Other"), 1000, replace=TRUE, prob=c(0.5, 0.3, 0.15, 0.05)),
  age = sample(18:70, 1000, replace=TRUE),
  gender = sample(c("Male", "Female"), 1000, replace=TRUE, prob=c(0.75, 0.25)),
  reform_period = ifelse(arrest_date < as.Date('2021-07-01'), "Pre-Reform", "Post-Reform")
)
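Each categorical column above is drawn with sample() and a prob vector, so the realized shares should land close to the stated probabilities without matching them exactly. The sketch below checks this for one column. It rebuilds a small stand-in data frame so it can run on its own; the names demo_data, n_cases, and offense_shares, and the seed value, are assumptions made for this illustration.

```r
library(dplyr)
library(tibble)

set.seed(2024)   # arbitrary seed, used only for this illustration
n_cases <- 1000  # same number of rows as the data frame above

# Stand-in data with the same design probabilities as offense_type above
demo_data <- tibble(
  offense_type = sample(c("Misdemeanor", "Felony"), n_cases,
                        replace = TRUE, prob = c(0.7, 0.3))
)

# Realized share of each category; count() stores the tallies in column n
offense_shares <- demo_data %>%
  count(offense_type) %>%
  mutate(share = n / sum(n))

offense_shares
```

The same count() and mutate() pattern can be applied to bail_reform_data for any of its categorical columns.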

View the data.

# Display the first few rows of the dataset
head(bail_reform_data)

## # A tibble: 6 × 12
##   case_id arrest_date offense_type bail_amount pretrial_release days_in_jail
##     <int> <date>      <chr>              <dbl> <chr>                   <dbl>
## 1       1 2021-03-01  Misdemeanor            0 No                         70
## 2       2 2020-05-27  Misdemeanor            0 Yes                         0
## 3       3 2020-12-28  Felony               500 Yes                         0
## 4       4 2022-01-19  Felony                 0 Yes                         0
## 5       5 2022-12-23  Misdemeanor        50000 Yes                         0
## 6       6 2021-07-03  Felony               500 Yes                         0
## # ℹ 6 more variables: court_appearance <chr>, rearrest_within_30days <chr>,
## #   race <chr>, age <int>, gender <chr>, reform_period <chr>

# Summary statistics
summary(bail_reform_data)

##     case_id        arrest_date         offense_type        bail_amount   
##  Min.   :   1.0   Min.   :2020-01-01   Length:1000        Min.   :    0  
##  1st Qu.: 250.8   1st Qu.:2020-10-23   Class :character   1st Qu.:    0  
##  Median : 500.5   Median :2021-07-24   Mode  :character   Median :  500  
##  Mean   : 500.5   Mean   :2021-07-05                      Mean   : 4182  
##  3rd Qu.: 750.2   3rd Qu.:2022-03-11                      3rd Qu.: 5000  
##  Max.   :1000.0   Max.   :2022-12-31                      Max.   :50000  
##  pretrial_release    days_in_jail  court_appearance   rearrest_within_30days
##  Length:1000        Min.   : 0.0   Length:1000        Length:1000           
##  Class :character   1st Qu.: 0.0   Class :character   Class :character      
##  Mode  :character   Median : 0.0   Mode  :character   Mode  :character      
##                     Mean   :19.8                                            
##                     3rd Qu.:41.0                                            
##                     Max.   :90.0                                            
##      race                age           gender          reform_period     
##  Length:1000        Min.   :18.00   Length:1000        Length:1000       
##  Class :character   1st Qu.:32.00   Class :character   Class :character  
##  Mode  :character   Median :46.00   Mode  :character   Mode  :character  
##                     Mean   :44.95                                        
##                     3rd Qu.:59.00                                        
##                     Max.   :70.00
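In the summary() output above, character columns are reported only by length, class, and mode, so the categorical outcomes need to be tabulated separately. One possible way to compare outcomes across the reform periods is sketched below. It rebuilds a reduced stand-in data frame so it runs on its own; demo_data, n_cases, period_summary, and the seed are hypothetical names and values chosen for this example. Because the generator draws each outcome independently of arrest_date, any gap between the two periods in this synthetic data is sampling noise rather than an effect of reform.

```r
library(dplyr)
library(tibble)

set.seed(42)  # arbitrary seed, used only for this illustration
n_cases <- 1000
all_dates <- seq(as.Date("2020-01-01"), as.Date("2022-12-31"), by = "day")

# Reduced stand-in for bail_reform_data, using the same design probabilities
demo_data <- tibble(
  arrest_date = sample(all_dates, n_cases, replace = TRUE),
  pretrial_release = sample(c("Yes", "No"), n_cases,
                            replace = TRUE, prob = c(0.6, 0.4)),
  court_appearance = sample(c("Appeared", "Failed to Appear"), n_cases,
                            replace = TRUE, prob = c(0.9, 0.1))
) %>%
  mutate(reform_period = ifelse(arrest_date < as.Date("2021-07-01"),
                                "Pre-Reform", "Post-Reform"))

# Case counts and outcome rates within each reform period
period_summary <- demo_data %>%
  group_by(reform_period) %>%
  summarise(
    cases = n(),
    release_rate = mean(pretrial_release == "Yes"),
    fta_rate = mean(court_appearance == "Failed to Appear"),
    .groups = "drop"
  )

period_summary
```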

Save the data.

# here::here() builds the path from the project root without attaching the here package
write_csv(bail_reform_data, here::here("bail_reform_data.csv"))

Access the saved data here.
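When the saved file is read back into R, the column types are guessed again unless you state them. The sketch below shows one way to declare them with readr. It writes a three-row stand-in to a temporary file so it runs on its own; demo_data, csv_path, and reloaded are hypothetical names, and the temporary path stands in for wherever bail_reform_data.csv was saved.

```r
library(readr)
library(tibble)

# Three-row stand-in with a few of the columns from bail_reform_data
demo_data <- tibble(
  case_id = 1:3,
  arrest_date = as.Date(c("2020-05-27", "2021-03-01", "2022-01-19")),
  offense_type = c("Misdemeanor", "Misdemeanor", "Felony")
)

# Temporary file used only for this illustration
csv_path <- tempfile(fileext = ".csv")
write_csv(demo_data, csv_path)

# Declare the column types explicitly instead of relying on guessing
reloaded <- read_csv(
  csv_path,
  col_types = cols(
    case_id = col_integer(),
    arrest_date = col_date(format = "%Y-%m-%d"),
    offense_type = col_character()
  )
)

str(reloaded)
```

Declaring the types keeps case_id as an integer and arrest_date as a Date, matching the types shown in the head() output earlier in this file.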