Note: This file makes use of synthetic data. Synthetic data is artificially generated data that mimics the statistical properties and patterns of real-world data without containing any actual real-world information.
This file is provided as a preliminary resource until official data is added to the critstats package. You may also use this code to gather data for your class project, thesis, or other academic tasks beyond what is provided below. Content in this file comes from a number of different sources, which you should be familiar with before accessing and analyzing any data.
Open a new .Rmd file and use {r setup, include=FALSE} as the header of your first code chunk.
knitr::opts_chunk$set(echo = TRUE)
# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
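# Set seed for reproducibility (set.seed() is the base R function)
# The printed output below was not generated with this seed, so your values may differ
set.seed(123)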
library(here)  # provides here(), used below to build the path for write_csv()
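Reproducibility in base R comes from set.seed() (there is no set_seed() in base R). The short sketch below, using arbitrary illustrative draws, shows that resetting the same seed before sampling reproduces the same values, which is what makes the synthetic data below regenerable.

```r
# Resetting the same seed before each draw reproduces the same pseudo-random sequence
set.seed(123)
first_draw <- sample(1:1000, 5)

set.seed(123)
second_draw <- sample(1:1000, 5)

identical(first_draw, second_draw)  # TRUE
```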
There are resources available from the Bail Project, including some information on cash bail. However, other data sources also exist for quick downloading and analysis.
For the purposes of code development, synthetic data is used here to help model the main outcomes and explore the hypotheses.
# Create pseudo data frame: 1,000 synthetic cases with independently sampled attributes
bail_reform_data <- tibble(
  case_id = 1:1000,
  # Random arrest dates between 2020-01-01 and 2022-12-31
  arrest_date = sample(seq(as.Date('2020-01-01'), as.Date('2022-12-31'), by = "day"), 1000, replace = TRUE),
  offense_type = sample(c("Misdemeanor", "Felony"), 1000, replace = TRUE, prob = c(0.7, 0.3)),
  bail_amount = sample(c(0, 500, 1000, 5000, 10000, 50000), 1000, replace = TRUE,
                       prob = c(0.3, 0.2, 0.2, 0.15, 0.1, 0.05)),
  pretrial_release = sample(c("Yes", "No"), 1000, replace = TRUE, prob = c(0.6, 0.4)),
  # Only defendants held pretrial accrue jail days (0 to 90); released defendants get 0
  days_in_jail = ifelse(pretrial_release == "No", sample(0:90, 1000, replace = TRUE), 0),
  court_appearance = sample(c("Appeared", "Failed to Appear"), 1000, replace = TRUE, prob = c(0.9, 0.1)),
  rearrest_within_30days = sample(c("Yes", "No"), 1000, replace = TRUE, prob = c(0.05, 0.95)),
  race = sample(c("White", "Black", "Hispanic", "Other"), 1000, replace = TRUE, prob = c(0.5, 0.3, 0.15, 0.05)),
  age = sample(18:70, 1000, replace = TRUE),
  gender = sample(c("Male", "Female"), 1000, replace = TRUE, prob = c(0.75, 0.25)),
  # Arrests on or after 2021-07-01 fall in the post-reform period
  reform_period = ifelse(arrest_date < as.Date('2021-07-01'), "Pre-Reform", "Post-Reform")
)
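Two columns in the generator are derived from others: days_in_jail depends on pretrial_release, and reform_period depends on arrest_date. A minimal sanity check of that logic is sketched below on a smaller hypothetical sample of 200 cases (the name check_data and the sample size are illustrative assumptions), so it runs on its own; with bail_reform_data in your session, the same two checks apply directly to it.

```r
library(tibble)

set.seed(123)
n_cases <- 200  # hypothetical smaller sample, for illustration only

check_data <- tibble(
  arrest_date = sample(seq(as.Date("2020-01-01"), as.Date("2022-12-31"), by = "day"),
                       n_cases, replace = TRUE),
  pretrial_release = sample(c("Yes", "No"), n_cases, replace = TRUE, prob = c(0.6, 0.4)),
  days_in_jail = ifelse(pretrial_release == "No", sample(0:90, n_cases, replace = TRUE), 0),
  reform_period = ifelse(arrest_date < as.Date("2021-07-01"), "Pre-Reform", "Post-Reform")
)

# Released defendants should never have jail days
released_ok <- all(check_data$days_in_jail[check_data$pretrial_release == "Yes"] == 0)

# "Post-Reform" should hold exactly for arrests on or after the cutoff date
period_ok <- all((check_data$arrest_date >= as.Date("2021-07-01")) ==
                   (check_data$reform_period == "Post-Reform"))

table(check_data$reform_period)
```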
View the data.
# Display the first few rows of the dataset
head(bail_reform_data)
## # A tibble: 6 × 12
## case_id arrest_date offense_type bail_amount pretrial_release days_in_jail
## <int> <date> <chr> <dbl> <chr> <dbl>
## 1 1 2021-03-01 Misdemeanor 0 No 70
## 2 2 2020-05-27 Misdemeanor 0 Yes 0
## 3 3 2020-12-28 Felony 500 Yes 0
## 4 4 2022-01-19 Felony 0 Yes 0
## 5 5 2022-12-23 Misdemeanor 50000 Yes 0
## 6 6 2021-07-03 Felony 500 Yes 0
## # ℹ 6 more variables: court_appearance <chr>, rearrest_within_30days <chr>,
## # race <chr>, age <int>, gender <chr>, reform_period <chr>
# Summary statistics
summary(bail_reform_data)
## case_id arrest_date offense_type bail_amount
## Min. : 1.0 Min. :2020-01-01 Length:1000 Min. : 0
## 1st Qu.: 250.8 1st Qu.:2020-10-23 Class :character 1st Qu.: 0
## Median : 500.5 Median :2021-07-24 Mode :character Median : 500
## Mean : 500.5 Mean :2021-07-05 Mean : 4182
## 3rd Qu.: 750.2 3rd Qu.:2022-03-11 3rd Qu.: 5000
## Max. :1000.0 Max. :2022-12-31 Max. :50000
## pretrial_release days_in_jail court_appearance rearrest_within_30days
## Length:1000 Min. : 0.0 Length:1000 Length:1000
## Class :character 1st Qu.: 0.0 Class :character Class :character
## Mode :character Median : 0.0 Mode :character Mode :character
## Mean :19.8
## 3rd Qu.:41.0
## Max. :90.0
## race age gender reform_period
## Length:1000 Min. :18.00 Length:1000 Length:1000
## Class :character 1st Qu.:32.00 Class :character Class :character
## Mode :character Median :46.00 Mode :character Mode :character
## Mean :44.95
## 3rd Qu.:59.00
## Max. :70.00
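summary() describes each column separately. To look at the main outcomes by reform period, one option is a grouped dplyr summary of release, failure-to-appear, and rearrest rates. The sketch below rebuilds only the relevant columns (same probabilities as the generator, under a hypothetical name, outcomes) so it runs standalone; with bail_reform_data loaded you could pipe it in directly. Because every outcome here is sampled independently of arrest_date, any pre/post differences in this synthetic data are sampling noise by construction.

```r
library(dplyr)
library(tibble)

set.seed(123)
n_cases <- 1000

# Hypothetical re-creation of the outcome columns from the synthetic generator
outcomes <- tibble(
  arrest_date = sample(seq(as.Date("2020-01-01"), as.Date("2022-12-31"), by = "day"),
                       n_cases, replace = TRUE),
  pretrial_release = sample(c("Yes", "No"), n_cases, replace = TRUE, prob = c(0.6, 0.4)),
  court_appearance = sample(c("Appeared", "Failed to Appear"), n_cases, replace = TRUE,
                            prob = c(0.9, 0.1)),
  rearrest_within_30days = sample(c("Yes", "No"), n_cases, replace = TRUE, prob = c(0.05, 0.95)),
  reform_period = ifelse(arrest_date < as.Date("2021-07-01"), "Pre-Reform", "Post-Reform")
)

# Share of cases with each outcome, computed separately for each reform period
outcome_rates <- outcomes %>%
  group_by(reform_period) %>%
  summarise(
    n = n(),
    release_rate = mean(pretrial_release == "Yes"),
    fta_rate = mean(court_appearance == "Failed to Appear"),
    rearrest_rate = mean(rearrest_within_30days == "Yes"),
    .groups = "drop"
  )

outcome_rates
```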
Save the data.
write_csv(bail_reform_data, here("bail_reform_data.csv"))
Access the saved data here.
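To check that the saved CSV round-trips cleanly, a small sketch can write a hypothetical stand-in (example_data, three made-up rows) to a temporary file with readr::write_csv() and read it back with readr::read_csv(). It declares the column types explicitly rather than relying on readr's type guessing, so arrest_date comes back as a Date.

```r
library(readr)
library(tibble)

# Small hypothetical stand-in for bail_reform_data
example_data <- tibble(
  case_id = 1:3,
  arrest_date = as.Date(c("2020-01-15", "2021-07-01", "2022-12-31")),
  bail_amount = c(0, 500, 50000),
  reform_period = c("Pre-Reform", "Post-Reform", "Post-Reform")
)

csv_path <- tempfile(fileext = ".csv")
write_csv(example_data, csv_path)

# Spell out column types so dates and numbers are parsed as intended
reloaded <- read_csv(
  csv_path,
  col_types = cols(
    case_id = col_integer(),
    arrest_date = col_date(),
    bail_amount = col_double(),
    reform_period = col_character()
  )
)

reloaded
```

The same col_types pattern can be used when reading bail_reform_data.csv back into a later session.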