Pretrial release decisions represent a critical juncture in the criminal justice system, balancing public safety against the presumption of innocence. This project investigates the predictive power of Public Safety Assessment (PSA)-like scores and case-level factors on rearrest and release decisions in Queens County, New York, from 2023 to 2024. Using comprehensive court data, we reconstructed PSA risk scores and developed local models. Our analysis reveals that while the PSA framework shows predictive value, a locally calibrated model significantly outperforms its generic application. Furthermore, we identify important disparities in prediction accuracy across racial groups. These findings highlight the necessity of localized validation and careful implementation of risk assessment tools to ensure both effectiveness and equity.
Pretrial release decisions are among the most consequential moments in the criminal justice process. Judges must determine whether to release a defendant before trial, weighing concerns for public safety against the presumption of innocence. In recent years, data-driven risk assessment tools like the Public Safety Assessment (PSA) have been introduced to guide these decisions objectively.
This project examines whether PSA scores and case-level factors predict two key outcomes in Queens County between 2023 and 2024:
- rearrest while on release
- judicial release decisions
The study period is particularly relevant as it follows New York’s significant bail reform measures, providing insight into how risk assessment operates in this reformed landscape.
I chose this topic not only for its pressing importance in criminal justice reform but also because of its direct alignment with my professional aspirations. As someone pursuing opportunities to become a court officer, understanding the mechanics of pretrial decision-making is highly relevant to my future work. A personal experience further motivated this research: I was recently in a car accident where the other driver fled the scene, leaving me with damages and no recourse. This incident underscored how profoundly we rely on the justice system to deliver fairness, accountability, and safety, the same concerns that are central to pretrial risk assessment.
Pretrial risk assessment and the PSA
Actuarial pretrial tools aim to estimate a defendant’s likelihood of failing to appear and/or being rearrested pretrial. The Public Safety Assessment (PSA), developed by Arnold Ventures, uses nine factors covering age, the current charge, and criminal history to generate two risk scores, failure to appear (FTA) and new criminal activity (NCA), plus a new violent criminal activity (NVCA) flag; it is used across hundreds of jurisdictions and comes with core implementation requirements.
Validation studies generally find that PSA scores are predictive of pretrial outcomes in multiple sites, though performance varies by locality. For example, a San Francisco validation using local data reported meaningful discrimination for FTA and NCA outcomes and discussed the importance of local calibration and policy cutoffs.
Evidence on bail reform and pretrial outcomes
New Jersey’s 2017 reform (with PSA statewide) substantially reduced the pretrial jail population while maintaining public safety and court appearance rates, according to independent evaluations (MDRC and later academic work). Recent analyses continue to find no associated increase in violent crime.
In New York, the 2019–2020 bail reforms (with subsequent amendments in 2020 and 2022) sought to limit money bail and reduce pretrial detention. Two-year follow-ups documented large shifts in release decisions, bail amounts, and racial disparities, while policy explainers clarify how “qualifying” vs. “non-qualifying” offenses structure eligibility for bail and detention today. This study’s Queens focus (2023–2024) lands squarely in this evolving policy period, making local, time-bounded analysis important.
Accuracy, bias, and fairness debates
Scholars have raised concerns about whether algorithmic tools outperform simple or human baselines and about potential disparate impacts. Dressel & Farid (2018) found that a widely used tool (COMPAS) performed comparably to lay predictions, sparking broader debates about transparency and construct validity; COMPAS is not the PSA, but the critique motivates careful, local validation and subgroup analyses. Theoretical work (Kleinberg, Mullainathan & Raghavan) proves that commonly desired fairness criteria cannot all be satisfied at once except in special cases, underscoring the unavoidable trade-offs policymakers face when converting scores into detention or supervision rules.
Gaps this study addresses
Most PSA research is jurisdiction-specific and often pre-2022. There is limited public analysis of PSA-like scores specifically for Queens in 2023–2024 amid New York’s post-amendment landscape. By (a) reconstructing NCA/NVCA measures from case-level data, (b) benchmarking them against realized rearrest and release outcomes, and (c) comparing simple PSA thresholds with logistic and random-forest models, this project adds a timely, local validation and explores model transparency vs. accuracy trade-offs relevant to current NYC practice.
This project addresses three primary research questions:
1- Predictive Accuracy:
Do reconstructed PSA risk scores (NCA and NVCA) predict general and violent rearrest in Queens County?
2- Model Comparison:
Does a locally trained statistical model outperform the standard PSA scoring thresholds?
3- Fairness Assessment:
Are there significant disparities in prediction accuracy across racial groups?
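One way to make the fairness question concrete is to compare error rates across groups at a fixed cutoff. The sketch below is illustrative only: the data frame, its column names (`group`, `score`, `rearrest`), and the cutoff of 4 are hypothetical and are not drawn from the Queens data. It uses base R to compute the false positive rate and false negative rate per group.

```r
# Hypothetical toy data: group label, risk score (1-6), observed rearrest (0/1)
toy <- data.frame(
  group    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  score    = c(1, 2, 5, 4, 1, 4, 5, 6),
  rearrest = c(0, 0, 1, 0, 0, 0, 1, 1)
)

# Flag "high risk" at an assumed cutoff of 4 (illustrative only)
toy$flag <- as.integer(toy$score >= 4)

# False positive and false negative rates for one group's rows
error_rates <- function(d) {
  fp <- sum(d$flag == 1 & d$rearrest == 0)
  tn <- sum(d$flag == 0 & d$rearrest == 0)
  fn <- sum(d$flag == 0 & d$rearrest == 1)
  tp <- sum(d$flag == 1 & d$rearrest == 1)
  c(fpr = fp / (fp + tn), fnr = fn / (fn + tp))
}

# One row per group, one column per error rate
rates_by_group <- t(sapply(split(toy, toy$group), error_rates))
print(rates_by_group)
```

A gap between groups in either column is the kind of disparity the third research question asks about. Which error rate matters more is a policy judgment rather than a statistical one.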
This project uses the NYC Pretrial Release dataset (2023–2024). The dataset includes individual-level court case information, criminal history variables, release decisions, and rearrest outcomes, from which PSA-like risk scores are reconstructed. In this project, I focus on Queens County for three reasons:
Variables analyzed include:
The dataset comes from publicly available court and pretrial release records for New York, covering 2023–2024. The data can be downloaded from the NYC Official Pre-Trial Release Data page.
Primary Data Filtering and Cleaning
# Load required libraries
library(tidyverse)
library(knitr)
library(naniar)
library(caret)
library(randomForest)
library(pROC)
library(snakecase)
library(logistf)
library(gt)
library(patchwork)
# Load data
NYC <- read.csv("NYS for Web 2024 copy_try copy.csv", na.strings = c("NULL", " ", "\\s+", ""))
| Internal_Case_ID | Gender | Race | Ethnicity | Age_at_Crime | Age_at_Arrest | Court_Name | Court_ORI | County_Name | District | Region | Court_Type | Judge_Name | Offense_Month | Offense.Year | Arrest_Month | Arrest.Year | Arrest_Type | Top_Arrest_Law | Top_Arrest_Article_Section | Top_Arrest_Attempt_Indicator | Top_Charge_at_Arrest | Top_Charge_Severity_at_Arrest | Top_Charge_Weight_at_Arrest | Top_Charge_at_Arrest_Violent_Felony_Ind | Case_Type | First_Arraign_Date | Top_Arraign_Law | Top_Arraign_Article_Section | Top_Arraign_Attempt_Indicator | Top_Charge_at_Arraign | Top_Severity_at_Arraign | Top_Charge_Weight_at_Arraign | Top_Charge_at_Arraign_Violent_Felony_Ind | Hate_Crime_Ind | Arraign.Charge.Category | Representation_Type | App_Count_Arraign_to_Dispo_Released | App_Count_Arraign_to_Dispo_Detained | App_Count_Arraign_to_Dispo_Total | Def_Attended_Sched_Pretrials | Remanded_to_Jail_at_Arraign | ROR_at_Arraign | Bail_Set_and_Posted_at_Arraign | Bail_Set_and_Not_Posted_at_Arraign | NMR_at_Arraign | Release.Decision.at.Arraign | Representation_at_Securing_Order | Pretrial_Supervision_at_Arraign | Contact_Pretrial_Service_Agency | Electronic_Monitoring | Travel_Restrictions | Passport_Surrender | No_Firearms_or_Weapons | Maintain_Employment | Maintain_Housing | Maintain_School | Placement_in_Mandatory_Program | Removal_to_Hospital | Obey_Order_of_Protection | Obey_Court_Conditions.Family_Offense | Other_NMR | Order_of_Protection | First_Bail_Set_Cash | First_Bail_Set_Credit | First_Insurance_Company_Bail_Bond | First_Secured_Surety_Bond | First_Secured_App_Bond | First_Unsecured_Surety_Bond | First_Unsecured_App_Bond | First_Partially_Secured_Surety_Bond | Partially_Secured_Surety_Bond_Perc | First_Partially_Secured_App_Bond | Partially_Secured_App_Bond_Perc | Bail_Made_Indicator | NotRequestedFlag | RemandRequestedFlag | NMRRequestedFlag | RORRequestedFlag | UnspecifiedTypeRequestedAmount | CashRequestedAmount | CreditRequestedAmount | InsuranceCompanyRequestedAmount | SecuredSuretyRequestedAmount | SecuredAppRequestedAmount | UnsecuredSuretyRequestedAmount | UnsecuredAppRequestedAmount | PartiallySecuredSuretyRequestedAmount | PartiallySecuredAppRequestedAmount | UnspecifiedBondTypeRequestedAmount | Warrant_Ordered_btw_Arraign_and_Dispo | DAT_WO_WS_Prior_to_Arraign | First_Bench_Warrant_Month | First_Bench_Warrant_Year | Non_Stayed_WO | Num_of_Stayed_WO | Num_of_ROW | Docket_Status | Disposition_Type | Disposition_Detail | Dismissal_Reason | Disposition_Date | Most_Severe_Sentence | Top_Conviction_Law | Top_Conviction_Article_Section | Top_Conviction_Attempt_Indicator | Top_Charge_at_Conviction | Top_Charge_Severity_at_Conviction | Top_Charge_Weight_at_Conviction | Top_Charge_at_Conviction_Violent_Felony_Ind | Days_Arraign_Remand_First_Released | Known_Days_in_Custody | Days_Arraign_Bail_Set_to_First_Posted | Days_Arraign_Bail_Set_to_First_Release | Days_Arraign_to_Dispo | MinImpTopConvDays | MaxImpTopConvDays | UCMSLiveDate | prior_vfo_cnt | prior_nonvfo_cnt | prior_misd_cnt | pend_vfo | pend_nonvfo | pend_misd | supervision | rearrest | rearrest_date | rearrest_firearm | rearrest_date_firearm | arr_cycle_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4.887186e+27 | Male | White | Hispanic | 21 | 21 | New York Criminal Court | NY030033J | New York | District 1 | NYC | Local | Tatham, Beverly S. | Jan | 2024 | Jan | 2024 | Custody | PL | 215.51 | NA | PL 215.51 BII EF Crim Contempt-1st:Follows | Felony | EF | N | Docket | 1/1/2024 | PL | 215.51 | NA | PL 215.51 BII EF Crim Contempt-1st:Follows | Felony | EF | N | N | Criminal Contempt | 18B (Assigned Counsel) | 1 | 0 | 1 | 1 | Y | N | N | N | N | Remanded | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Family Offense | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | N | N | NA | NA | NA | NA | 0 | Disposed | GJ/Trans | Transfer to Superior Court | NA | 1/1/2024 | NA | NA | NA | NA | NA | NA | NA | N | 1 | 1 | NA | NA | 1 | NA | NA | NA | 0 | 0 | 0 | 0 | 0 | 1 | 0 | No Arrest | NA | 0 | NA | 1370269 |
| 2.813440e+27 | Male | Black | Non Hispanic | 19 | 19 | Kings Criminal Court | NY023033J | Kings | District 2 | NYC | Local | Tubridy, Jennifer A. | Apr | 2024 | Jun | 2024 | Custody | PL | 215.5 | NA | PL 215.50 03 AM Crim Contempt-2nd:Disobey Crt | Misdemeanor | AM | N | Docket | 6/1/2024 | PL | 215.5 | NA | PL 215.50 03 AM Crim Contempt-2nd:Disobey Crt | Misdemeanor | AM | N | N | Criminal Contempt | Legal Aid | 3 | 0 | 3 | 2 | N | Y | N | N | N | ROR | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Non-Family Offense | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | Y | N | N | N | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | N | N | NA | NA | NA | 1 | 0 | Pending | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | N | NA | 0 | NA | NA | NA | NA | NA | NA | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Misdemeanor | 7/1/2024 | 0 | NA | 1252047 |
| 3.546526e+27 | Unknown | Unknown | Unknown | 0 | 0 | Nassau District Court | NY029013J | Nassau | District 10N | ONYC | Local | Mccormack, Marie F. | Oct | 2023 | Oct | 2023 | DAT | NC-FPO | 13.11 | NA | NC-FPO 13.11 UM Failure to Comply | Misdemeanor | UM | N | Docket | 1/1/2024 | NC-FPO | 13.11 | NA | NC-FPO 13.11 UM Failure to Comply | Misdemeanor | UM | N | N | Other | Retained Attorney | 1 | 0 | 1 | NA | N | N | N | N | N | Disposed at arraign | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | N | N | NA | NA | NA | NA | 0 | Disposed | Plea | Pled Guilty | NA | 1/1/2024 | Fine | NC-FPO | 1.7 | NA | NC-FPO 1.7 V Failing To Comply | Violation | V | N | NA | 0 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| 2.159048e+27 | Male | Unknown | Unknown | 23 | 23 | Mount Vernon City Court | NY059031J | Westchester | District 9 | ONYC | Local | Johnson, Nichelle | Aug | 2022 | Aug | 2022 | DAT | VTL | 511 | NA | VTL 0511 02A2 UM Agg Unlicensed Operation-2nd | Misdemeanor | UM | N | Docket | 5/1/2024 | VTL | 511 | NA | VTL 0511 02A2 UM Agg Unlicensed Operation-2nd | Misdemeanor | UM | N | N | Unlicensed Operation | 18B (Assigned Counsel) | 1 | 0 | 1 | 1 | N | N | N | N | N | Disposed at arraign | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | N | N | NA | NA | NA | NA | 0 | Disposed | Plea | Pled Guilty | NA | 5/1/2024 | Surcharge | VTL | 509 | NA | VTL 0509 01 I MV License Viol:No License | Infraction | I | N | NA | 0 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| 4.715249e+27 | Male | White | Hispanic | 40 | 40 | Queens Criminal Court | NY040033J | Queens | District 11 | NYC | Local | Battisti, Anthony M. | Jan | 2024 | Jan | 2024 | Custody | PL | 155.25 | NA | PL 155.25 AM Petit Larceny | Misdemeanor | AM | N | Docket | 1/1/2024 | PL | 155.25 | NA | PL 155.25 AM Petit Larceny | Misdemeanor | AM | N | N | Larceny | Legal Aid | 4 | 0 | 4 | 2 | N | Y | N | N | N | ROR | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Non-Family Offense | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | Y | N | Apr | 2024 | 1 | NA | 1 | Disposed | Plea | Pled Guilty | NA | 8/1/2024 | Conditional Discharge | PL | 240.2 | NA | PL 240.20 V Disorderly Conduct | Violation | V | N | NA | 0 | NA | NA | 189 | NA | NA | NA | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Misdemeanor | 2/1/2024 | 0 | NA | 1419467 |
# Standardize column names
names(NYC) <- snakecase::to_snake_case(names(NYC))
# Check structure before filtering
# glimpse(NYC)
# Filter for Queens County 2023-2024 with key criteria
queens_data <- NYC |>
  filter(county_name == "Queens",
         arrest_year %in% c(2023, 2024),
         arrest_type == "Custody",
         docket_status == "Disposed",
         !release_decision_at_arraign %in% c("Disposed at arraign", "Unknown", "Remanded"),
         rearrest != "Unknown") |>
  # Strip a "+", "|", or ">" marker from the count fields, then convert to numeric
  mutate(across(c(prior_vfo_cnt, prior_nonvfo_cnt, prior_misd_cnt, pend_vfo, pend_nonvfo, pend_misd),
                ~ as.numeric(str_replace(., "[+|>]", "")))) |>
  select(-arr_cycle_id) |>
  drop_na(prior_vfo_cnt, prior_nonvfo_cnt, prior_misd_cnt, pend_vfo, pend_nonvfo, pend_misd)
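The count-cleaning step in the pipeline above can be tried in isolation. This is a minimal sketch with made-up inputs: the raw values `"3+"` and `">5"` are assumptions about how capped counts might be encoded, not values confirmed from the source file. It uses base R `sub()`, which like `str_replace()` replaces only the first match.

```r
# Hypothetical raw values; the actual encodings in the source file may differ
raw_counts <- c("0", "2", "3+", ">5", NA)

# "[+|>]" is a character class: it matches one "+", "|", or ">" character
clean_counts <- as.numeric(sub("[+|>]", "", raw_counts))
print(clean_counts)
```

A capped value such as `"3+"` becomes 3 after cleaning, so counts at the cap are best read as lower bounds rather than exact totals.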
# Remove systematically missing variables
variables_to_remove <- c(
"first_secured_surety_bond", "first_secured_app_bond",
"first_unsecured_app_bond", "first_partially_secured_app_bond",
"partially_secured_app_bond_perc", "unsecured_surety_requested_amount",
"unsecured_app_requested_amount", "days_arraign_remand_first_released",
"secured_app_requested_amount", "unspecified_bond_type_requested_amount"
)
queens_clean <- queens_data |>
select(-any_of(variables_to_remove))
# Handle missing data and create analysis variables
queens_clean <- queens_clean |>
  mutate(
    # Impute numeric columns (except IDs and dates) with the column median
    across(where(is.numeric) & !contains("id") & !contains("date"),
           ~ ifelse(is.na(.), median(., na.rm = TRUE), .)),
    # Label missing character values explicitly
    across(where(is.character),
           ~ ifelse(is.na(.), "Unknown", .)),
    # NOTE: any rearrest value not listed below falls through to "Unknown" and is
    # removed by the filter at the end of this step. The raw preview shows
    # "Misdemeanor" as one such value, so those rows are excluded here.
    rearrest_binary = case_when(
      rearrest %in% c("No Arrest") ~ "No",
      rearrest %in% c("Non-violent felony", "Violent felony", "Yes") ~ "Yes",
      TRUE ~ "Unknown"
    ),
    rearrest_type = case_when(
      rearrest == "No Arrest" ~ "None",
      rearrest == "Non-violent felony" ~ "Non-violent",
      rearrest == "Violent felony" ~ "Violent",
      rearrest == "Yes" ~ "Unknown Type",
      TRUE ~ "Unknown"
    )
  ) |>
  # Remove impossible age values and restrict to ages 16 and older
  filter(age_at_arrest >= 16, age_at_arrest <= 100,
         rearrest_binary != "Unknown")
# queens_clean is already restricted to ages 16-100, so it is reused as-is
queens_psa_ready <- queens_clean
# Save the cleaned dataset
write.csv(queens_clean, "queens_2023_2024.csv", row.names = FALSE)
kable(head(queens_clean, 5))
| internal_case_id | gender | race | ethnicity | age_at_crime | age_at_arrest | court_name | court_ori | county_name | district | region | court_type | judge_name | offense_month | offense_year | arrest_month | arrest_year | arrest_type | top_arrest_law | top_arrest_article_section | top_arrest_attempt_indicator | top_charge_at_arrest | top_charge_severity_at_arrest | top_charge_weight_at_arrest | top_charge_at_arrest_violent_felony_ind | case_type | first_arraign_date | top_arraign_law | top_arraign_article_section | top_arraign_attempt_indicator | top_charge_at_arraign | top_severity_at_arraign | top_charge_weight_at_arraign | top_charge_at_arraign_violent_felony_ind | hate_crime_ind | arraign_charge_category | representation_type | app_count_arraign_to_dispo_released | app_count_arraign_to_dispo_detained | app_count_arraign_to_dispo_total | def_attended_sched_pretrials | remanded_to_jail_at_arraign | ror_at_arraign | bail_set_and_posted_at_arraign | bail_set_and_not_posted_at_arraign | nmr_at_arraign | release_decision_at_arraign | representation_at_securing_order | pretrial_supervision_at_arraign | contact_pretrial_service_agency | electronic_monitoring | travel_restrictions | passport_surrender | no_firearms_or_weapons | maintain_employment | maintain_housing | maintain_school | placement_in_mandatory_program | removal_to_hospital | obey_order_of_protection | obey_court_conditions_family_offense | other_nmr | order_of_protection | first_bail_set_cash | first_bail_set_credit | first_insurance_company_bail_bond | first_unsecured_surety_bond | first_partially_secured_surety_bond | partially_secured_surety_bond_perc | bail_made_indicator | not_requested_flag | remand_requested_flag | nmr_requested_flag | ror_requested_flag | unspecified_type_requested_amount | cash_requested_amount | credit_requested_amount | insurance_company_requested_amount | secured_surety_requested_amount | partially_secured_surety_requested_amount | partially_secured_app_requested_amount | warrant_ordered_btw_arraign_and_dispo | dat_wo_ws_prior_to_arraign | first_bench_warrant_month | first_bench_warrant_year | non_stayed_wo | num_of_stayed_wo | num_of_row | docket_status | disposition_type | disposition_detail | dismissal_reason | disposition_date | most_severe_sentence | top_conviction_law | top_conviction_article_section | top_conviction_attempt_indicator | top_charge_at_conviction | top_charge_severity_at_conviction | top_charge_weight_at_conviction | top_charge_at_conviction_violent_felony_ind | known_days_in_custody | days_arraign_bail_set_to_first_posted | days_arraign_bail_set_to_first_release | days_arraign_to_dispo | min_imp_top_conv_days | max_imp_top_conv_days | ucms_live_date | prior_vfo_cnt | prior_nonvfo_cnt | prior_misd_cnt | pend_vfo | pend_nonvfo | pend_misd | supervision | rearrest | rearrest_date | rearrest_firearm | rearrest_date_firearm | rearrest_binary | rearrest_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.557388e+27 | Female | Black | Non Hispanic | 39 | 39 | Queens Criminal Court | NY040033J | Queens | District 11 | NYC | Local | Daniels, Edward F. | May | 2024 | May | 2024 | Custody | PL | 120.05 | Unknown | PL 120.05 12 DF Aslt-2: Injure Vic 65 Or Older | Felony | DF | Y | Docket | 5/1/2024 | PL | 120.05 | Unknown | PL 120.05 12 DF Aslt-2: Injure Vic 65 Or Older | Felony | DF | Y | N | Assault | Public Defender | 2 | 0 | 2 | 2 | N | N | N | Y | N | Bail-set | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Family Offense | 5000 | 10000 | 5000 | 1 | 5000 | 10 | Unknown | N | N | N | N | 5000 | 20000 | 60000 | 45000 | 42500 | 60000 | 45000 | N | N | Unknown | 2024 | 1 | 1 | 0 | Disposed | Plea | Pled Guilty | Unknown | 5/1/2024 | Conditional Discharge | PL | 240.2 | Unknown | PL 240.20 V Disorderly Conduct | Violation | V | N | 1 | 2 | 3 | 3 | 90 | Unknown | Unknown | 0 | 0 | 0 | 1 | 0 | 0 | 0 | No Arrest | Unknown | 0 | Unknown | No | None |
| 3.352332e+27 | Male | Black | Non Hispanic | 37 | 37 | Queens Criminal Court | NY040033J | Queens | District 11 | NYC | Local | Gershuny, Jeffrey A. | Feb | 2024 | Feb | 2024 | Custody | PL | 120 | Unknown | PL 120.00 01 AM Aslt 3-W/Int Cause Phys Injury | Misdemeanor | AM | N | Docket | 2/1/2024 | PL | 120 | Unknown | PL 120.00 01 AM Aslt 3-W/Int Cause Phys Injury | Misdemeanor | AM | N | N | Assault | Public Defender | 2 | 0 | 2 | 2 | N | Y | N | N | N | ROR | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Non-Family Offense | 10000 | 10000 | 25000 | 1 | 30000 | 10 | Unknown | N | N | N | Y | 20000 | 20000 | 60000 | 45000 | 42500 | 60000 | 45000 | N | N | Unknown | 2024 | 1 | 1 | 0 | Disposed | Plea | Pled Guilty | Unknown | 3/1/2024 | Conditional Discharge | PL | 240.2 | Unknown | PL 240.20 V Disorderly Conduct | Violation | V | N | 0 | 2 | 6 | 24 | 90 | Unknown | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | No Arrest | Unknown | 0 | Unknown | No | None |
| 3.301952e+27 | Male | White | Hispanic | 31 | 32 | Queens Criminal Court | NY040033J | Queens | District 11 | NYC | Local | Gershuny, Jeffrey A. | Apr | 2023 | Jan | 2024 | Custody | PL | 120 | Unknown | PL 120.00 01 AM Aslt 3-W/Int Cause Phys Injury | Misdemeanor | AM | N | Docket | 1/1/2024 | PL | 120 | Unknown | PL 120.00 01 AM Aslt 3-W/Int Cause Phys Injury | Misdemeanor | AM | N | N | Assault | Retained Attorney | 2 | 0 | 2 | 1 | N | Y | N | N | N | ROR | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Non-Family Offense | 10000 | 10000 | 25000 | 1 | 30000 | 10 | Unknown | Unknown | Unknown | Unknown | Unknown | 20000 | 20000 | 60000 | 45000 | 42500 | 60000 | 45000 | N | N | Unknown | 2024 | 1 | 1 | 0 | Disposed | Dismissed | Dismissed | Uncooperative Witness (CPL 170.30 (1)(f)) | 4/1/2024 | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | N | 0 | 2 | 6 | 96 | 90 | Unknown | Unknown | 0 | 0 | 2 | 1 | 0 | 0 | 0 | No Arrest | Unknown | 0 | Unknown | No | None |
| 3.339975e+27 | Male | Black | Non Hispanic | 26 | 26 | Queens Criminal Court | NY040033J | Queens | District 11 | NYC | Local | Gonzalez, Maria T. | Mar | 2024 | Apr | 2024 | Custody | PL | 120 | Unknown | PL 120.00 01 AM Aslt 3-W/Int Cause Phys Injury | Misdemeanor | AM | N | Docket | 4/1/2024 | PL | 120 | Unknown | PL 120.00 01 AM Aslt 3-W/Int Cause Phys Injury | Misdemeanor | AM | N | N | Assault | Public Defender | 4 | 0 | 4 | 3 | N | N | N | Y | N | Bail-set | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Family Offense | 3000 | 10000 | 9000 | 1 | 9000 | 10 | Bond | N | N | N | N | 20000 | 3000 | 60000 | 45000 | 42500 | 90000 | 45000 | N | N | Unknown | 2024 | 1 | 1 | 0 | Disposed | Plea | Pled Guilty | Unknown | 5/1/2024 | Conditional Discharge | PL | 240.2 | Unknown | PL 240.20 V Disorderly Conduct | Violation | V | N | 6 | 6 | 6 | 33 | 90 | Unknown | Unknown | 1 | 0 | 0 | 0 | 0 | 1 | 0 | No Arrest | Unknown | 0 | Unknown | No | None |
| 1.854242e+27 | Male | White | Hispanic | 37 | 37 | Queens Criminal Court | NY040033J | Queens | District 11 | NYC | Local | Gonzalez, Maria T. | Jun | 2024 | Jun | 2024 | Custody | PL | 160.05 | Attempt | PL 110-160.05 EF Robbery-3rd | Felony | EF | N | Docket | 6/1/2024 | PL | 160.05 | Attempt | PL 110-160.05 EF Robbery-3rd | Felony | EF | N | N | Robbery | Legal Aid | 1 | 2 | 3 | 3 | N | N | N | Y | N | Bail-set | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Family Offense | 5000 | 10000 | 10000 | 1 | 10000 | 10 | Unknown | N | N | N | N | 20000 | 75000 | 60000 | 45000 | 42500 | 225000 | 45000 | N | N | Unknown | 2024 | 1 | 1 | 0 | Disposed | Plea | Pled Guilty | Unknown | 8/1/2024 | Imprisonment-Not Time Served | PL | 240.26 | Unknown | PL 240.26 01 V Harassment-2nd:Physical Cntact | Violation | V | N | 49 | 2 | 52 | 52 | 15 | 15 | Unknown | 1 | 1 | 4 | 0 | 1 | 0 | 1 | No Arrest | Unknown | 0 | Unknown | No | None |
# Calculate NCA Score (New Criminal Activity)
nca_scored <- queens_psa_ready |>
mutate(
pend_charge = ifelse(pend_vfo + pend_nonvfo + pend_misd > 0, 1, 0),
age_at_arrest_score = ifelse(age_at_arrest <= 22, 2, 0),
pending_charge_score = ifelse(pend_charge == 1, 3, 0),
prior_misd_score = ifelse(prior_misd_cnt > 0, 1, 0),
prior_felony_score = ifelse(prior_nonvfo_cnt > 0 | prior_vfo_cnt > 0, 1, 0),
prior_violent_score = case_when(
prior_vfo_cnt %in% c(1, 2) ~ 1,
prior_vfo_cnt >= 3 ~ 2,
TRUE ~ 0
),
nca_score_raw = age_at_arrest_score + pending_charge_score + prior_misd_score + prior_felony_score + prior_violent_score,
nca_score = case_when(
nca_score_raw %in% c(0, 1) ~ 1,
nca_score_raw %in% c(5, 6) ~ 5,
nca_score_raw >= 7 ~ 6, # the raw total can reach 9 (2 + 3 + 1 + 1 + 2), so cap the scale at 6
TRUE ~ nca_score_raw
)
)
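To see how the NCA point values combine, the sketch below scores single hypothetical defendants with a small helper. The function name `nca_score_sketch` and its arguments are illustrative, not part of the analysis code. The point values follow the NCA code above, with the assumption that raw totals of 7 or more map to a scaled score of 6.

```r
# Illustrative helper (hypothetical name); point values follow the NCA code above
nca_score_sketch <- function(age, pending_any, prior_misd, prior_nonvfo, prior_vfo) {
  raw <- ifelse(age <= 22, 2, 0) +                       # young at arrest
    ifelse(pending_any, 3, 0) +                          # any pending charge
    ifelse(prior_misd > 0, 1, 0) +                       # any prior misdemeanor
    ifelse(prior_nonvfo > 0 | prior_vfo > 0, 1, 0) +     # any prior felony
    ifelse(prior_vfo >= 3, 2, ifelse(prior_vfo >= 1, 1, 0))  # prior violent felonies
  # Rescale: 0-1 -> 1, 2-4 unchanged, 5-6 -> 5, 7 and above -> 6 (assumption)
  scaled <- ifelse(raw <= 1, 1, ifelse(raw <= 4, raw, ifelse(raw <= 6, 5, 6)))
  c(raw = raw, scaled = scaled)
}

# A 21-year-old with a pending charge and one prior misdemeanor
young_pending <- nca_score_sketch(21, TRUE, 1, 0, 0)
# A 40-year-old with no pending charges and no record
older_clean <- nca_score_sketch(40, FALSE, 0, 0, 0)
print(young_pending)
print(older_clean)
```

In this scheme the pending-charge item carries the most weight (3 points), so it alone moves a defendant from the lowest score to a scaled score of 3.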
# Calculate NVCA Score (New Violent Criminal Activity)
nvca_scored <- queens_psa_ready |>
mutate(
pend_charge = ifelse(pend_vfo + pend_nonvfo + pend_misd > 0, 1, 0),
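# NOTE: arraign_charge_category holds offense categories (the data previews show
# values such as "Assault", "Robbery", and "Larceny"). If "Violent" never occurs
# as a category, both current-offense terms below are always 0.
current_violent_offense_score = ifelse(arraign_charge_category == "Violent", 2, 0),
current_violent_20_under = ifelse(arraign_charge_category == "Violent" & age_at_arrest <= 20, 1, 0),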
pending_charge_score = ifelse(pend_charge == 1, 1, 0),
prior_misd_or_felony_score = ifelse(prior_misd_cnt > 0 | prior_nonvfo_cnt > 0 | prior_vfo_cnt > 0, 1, 0),
prior_violent_score = case_when(
prior_vfo_cnt %in% c(1, 2) ~ 1,
prior_vfo_cnt >= 3 ~ 2,
TRUE ~ 0
),
nvca_score_raw = current_violent_offense_score + current_violent_20_under + pending_charge_score + prior_misd_or_felony_score + prior_violent_score,
nvca_score = case_when(
nvca_score_raw %in% c(0, 1) ~ 1,
TRUE ~ nvca_score_raw
)
)
# Add scores to main dataset
queens_psa_ready$nca_score <- nca_scored$nca_score
queens_psa_ready$nvca_score <- nvca_scored$nvca_score
# Save scored data
write.csv(queens_psa_ready, "nca_scored.csv", row.names = FALSE)
write.csv(nvca_scored, "nvca_scored.csv", row.names = FALSE)
# Create modeling dataframe with proper variables
model_data <- queens_psa_ready |>
mutate(
current_violent = ifelse(arraign_charge_category %in% c("Assault", "Strangulation", "Rape", "Homicide Related", "Robbery"), 1, 0),
current_property = ifelse(arraign_charge_category %in% c("Larceny", "Burglary"), 1, 0),
rearrest_binary = ifelse(rearrest_binary == "Yes", 1, 0),
rearrest_violent_binary = ifelse(rearrest_type == "Violent", 1, 0),
current_misd = ifelse(top_severity_at_arraign == "Misdemeanor", 1, 0),
gender_male = ifelse(gender == "Male", 1, 0),
race_black = ifelse(race == "Black", 1, 0),
race_white = ifelse(race == "White", 1, 0),
race_hispanic = ifelse(ethnicity == "Hispanic", 1, 0),
released_ror = ifelse(release_decision_at_arraign == "ROR", 1, 0),
released_nmr = ifelse(release_decision_at_arraign == "Nonmonetary release", 1, 0),
detained = ifelse(release_decision_at_arraign == "Bail-set", 1, 0)
) |>
select(gender_male, race_black, race_white, race_hispanic, age_at_arrest,
current_violent, current_misd, current_property,
prior_vfo_cnt, prior_nonvfo_cnt, prior_misd_cnt,
pend_vfo, pend_nonvfo, pend_misd,
nca_score, nvca_score,
released_ror, released_nmr, detained,
rearrest_binary, rearrest_violent_binary)
# Remove any remaining missing values
model_data <- model_data |> drop_na()
# Split data into training and testing sets
set.seed(613)
train_index <- createDataPartition(model_data$rearrest_binary, p = 0.8, list = FALSE)
train <- model_data[train_index, ]
test <- model_data[-train_index, ]
# Add continuous NVCA scores (taken from the split data so rows stay aligned
# even if drop_na() removed rows from model_data)
train$nvca_continuous <- train$nvca_score
test$nvca_continuous <- test$nvca_score
Descriptive Statistics
# Create descriptive tables
desc_stats <- list()
# Sample characteristics
desc_stats$sample_size <- nrow(queens_psa_ready)
desc_stats$rearrest_rate <- mean(queens_psa_ready$rearrest_binary == "Yes") * 100
desc_stats$violent_rearrest_rate <- mean(queens_psa_ready$rearrest_type == "Violent", na.rm = TRUE) * 100
# Release decisions
release_dist <- queens_psa_ready |>
count(release_decision_at_arraign) |>
mutate(Percentage = round(n / sum(n) * 100, 1))
# Demographic characteristics
gender_dist <- queens_psa_ready |>
count(gender) |>
mutate(Percentage = round(n / sum(n) * 100, 1))
race_dist <- queens_psa_ready |>
count(race) |>
mutate(Percentage = round(n / sum(n) * 100, 1))
# Age distribution
age_stats <- queens_psa_ready |>
summarize(
Mean = round(mean(age_at_arrest), 1),
Median = median(age_at_arrest),
SD = round(sd(age_at_arrest), 1),
Min = min(age_at_arrest),
Max = max(age_at_arrest)
)
# Criminal history
criminal_history <- queens_psa_ready |>
summarize(
`Prior Violent Felonies` = mean(prior_vfo_cnt),
`Prior Non-Violent Felonies` = mean(prior_nonvfo_cnt),
`Prior Misdemeanors` = mean(prior_misd_cnt),
`Any Prior Record (%)` = round(mean(prior_vfo_cnt > 0 | prior_nonvfo_cnt > 0 | prior_misd_cnt > 0) * 100, 1)
)
# Display key statistics
cat("## Sample Characteristics\n")
Sample Characteristics
- Total cases: 9027
- Rearrest rate: 8.8%
- Violent rearrest rate: 1.9%
Release Decisions
| Release Decision | Count | Percentage |
|---|---|---|
| Bail-set | 1937 | 21.5 |
| Nonmonetary release | 1744 | 19.3 |
| ROR | 5346 | 59.2 |
Demographic Characteristics
| Gender | Count | Percentage |
|---|---|---|
| Female | 1718 | 19 |
| Male | 7309 | 81 |
| Race | Count | Percentage |
|---|---|---|
| American Indian/Alaskan Native | 52 | 0.6 |
| Asian/Pacific Islander | 1336 | 14.8 |
| Black | 3671 | 40.7 |
| Unknown | 64 | 0.7 |
| White | 3904 | 43.2 |
Age Distribution
| Mean | Median | SD | Min | Max |
|---|---|---|---|---|
| 35.5 | 33 | 12.1 | 16 | 88 |
Criminal History
| Mean Prior Violent Felonies | Mean Prior Non-Violent Felonies | Mean Prior Misdemeanors | Any Prior Record (%) |
|---|---|---|---|
| 0.15 | 0.27 | 1.02 | 31.2 |
Correlation Analysis
Correlation with General Rearrest (rearrest_binary)
| Variable | Correlation Coefficient |
|---|---|
| rearrest_binary | 1.000 |
| nca_score | 0.125 |
| pend_misd | 0.101 |
| pend_nonvfo | 0.095 |
| nvca_score | 0.059 |
| prior_misd_cnt | 0.054 |
| prior_nonvfo_cnt | 0.047 |
| pend_vfo | 0.046 |
| prior_vfo_cnt | 0.022 |
| age_at_arrest | -0.034 |
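The code that produced this table is not shown, so the following is only a sketch of what such a coefficient measures. With a 0/1 outcome, the Pearson correlation is the point-biserial correlation, which depends on the gap in mean score between rearrested and non-rearrested cases and on the base rate. The toy vectors are hypothetical and are not the Queens data.

```r
# Hypothetical toy vectors, not the Queens data
score    <- c(1, 1, 2, 3, 3, 4, 5, 6)
rearrest <- c(0, 0, 0, 0, 1, 0, 1, 1)

# Pearson correlation between a numeric score and a binary outcome
r_pearson <- cor(score, rearrest)

# The same quantity from the point-biserial formula
m1    <- mean(score[rearrest == 1])              # mean score, rearrested
m0    <- mean(score[rearrest == 0])              # mean score, not rearrested
p     <- mean(rearrest)                          # base rate of the outcome
s_pop <- sqrt(mean((score - mean(score))^2))     # population SD of the score
r_pb  <- (m1 - m0) / s_pop * sqrt(p * (1 - p))

print(c(pearson = r_pearson, point_biserial = r_pb))
```

Because the formula includes the factor sqrt(p(1 - p)), a rare outcome such as the 8.8% rearrest rate reported above tends to produce small coefficients even when mean scores differ between the two groups.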

PSA Score Performance

```r
library(gridExtra)

# NCA score performance: rearrest rate at each score
nca_performance <- nca_scored |>
  group_by(nca_score) |>
  summarize(
    Cases = n(),
    Rearrests = sum(rearrest_binary == "Yes"),
    Rearrest_Rate = round(Rearrests / Cases * 100, 1)
  )

# NVCA score performance: violent rearrest rate at each score
nvca_performance <- nvca_scored |>
  group_by(nvca_score) |>
  summarize(
    Cases = n(),
    Violent_Rearrests = sum(rearrest_type == "Violent", na.rm = TRUE),
    Violent_Rearrest_Rate = round(Violent_Rearrests / Cases * 100, 2)
  )

# Plot NCA performance
nca_plot <- ggplot(nca_performance, aes(x = factor(nca_score), y = Rearrest_Rate)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.7) +
  geom_text(aes(label = Rearrest_Rate), vjust = -0.5) +
  labs(title = "Rearrest Rate by NCA Score",
       x = "NCA Score", y = "Rearrest Rate (%)") +
  theme_minimal()

# Plot NVCA performance
nvca_plot <- ggplot(nvca_performance, aes(x = factor(nvca_score), y = Violent_Rearrest_Rate)) +
  geom_bar(stat = "identity", fill = "darkred", alpha = 0.7) +
  geom_text(aes(label = Violent_Rearrest_Rate), vjust = -0.5) +
  labs(title = "Violent Rearrest Rate by NVCA Score",
       x = "NVCA Score", y = "Violent Rearrest Rate (%)") +
  theme_minimal()

# Side by side
grid.arrange(nca_plot, nvca_plot, ncol = 2)
```

### NCA Score Performance

| NCA Score | Cases | Rearrests | Rearrest Rate (%) |
|---|---|---|---|
| 1 | 4803 | 268 | 5.6 |
| 2 | 1185 | 95 | 8.0 |
| 3 | 1359 | 198 | 14.6 |
| 4 | 485 | 74 | 15.3 |
| 5 | 1162 | 151 | 13.0 |
| 6 | 33 | 8 | 24.2 |
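
The rate for score 6 rests on only 33 cases, so it is worth seeing how much uncertainty sits behind each row. The following is a minimal sketch, not part of the original analysis, that attaches exact (Clopper-Pearson) 95% intervals to the counts in the table above using base R's `binom.test`; the column names `ci_low_pct` and `ci_high_pct` are illustrative.

```r
# Counts copied from the NCA Score Performance table
nca <- data.frame(
  score     = 1:6,
  cases     = c(4803, 1185, 1359, 485, 1162, 33),
  rearrests = c(268, 95, 198, 74, 151, 8)
)

# Exact binomial 95% confidence interval for each score's rearrest rate
ci <- t(mapply(
  function(k, n) binom.test(k, n)$conf.int,
  nca$rearrests, nca$cases
))

nca$rate_pct    <- round(100 * nca$rearrests / nca$cases, 1)
nca$ci_low_pct  <- round(100 * ci[, 1], 1)
nca$ci_high_pct <- round(100 * ci[, 2], 1)
nca
```

Wide intervals at the top of the scale would caution against reading too much into the jump at score 6 or the dip at score 5.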

### NVCA Score Performance

```r
kable(nvca_performance, col.names = c("NVCA Score", "Cases", "Violent Rearrests", "Violent Rearrest Rate (%)"))
```

| NVCA Score | Cases | Violent Rearrests | Violent Rearrest Rate (%) |
|---|---|---|---|
| 1 | 7102 | 115 | 1.62 |
| 2 | 1398 | 46 | 3.29 |
| 3 | 527 | 12 | 2.28 |
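
The violent rearrest rate is higher at score 2 than at score 3, so the table alone does not show whether risk rises with the score overall. One way to check, sketched here under the assumption that the three scores can be treated as equally spaced, is a chi-squared test for trend in proportions with base R's `prop.trend.test`. This test was not part of the original analysis.

```r
# Counts copied from the NVCA Score Performance table
violent <- c(115, 46, 12)      # violent rearrests at NVCA scores 1, 2, 3
cases   <- c(7102, 1398, 527)  # total cases at each score

# Chi-squared test for a linear trend in proportions across ordered groups
trend <- prop.trend.test(violent, cases)
trend$statistic
trend$p.value
```

A small p-value would indicate an overall upward or downward trend even though the individual rates are not strictly ordered.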

```r
library(pROC)          # roc(), auc(), coords(), roc.test()
library(randomForest)

# PSA threshold approach (standard approach): NCA score of 4 or more is high risk
test_psa <- test |>
  mutate(psa_high_risk = ifelse(nca_score >= 4, 1, 0))
roc_psa <- roc(test_psa$rearrest_binary, test_psa$psa_high_risk)
auc_psa <- auc(roc_psa)

# Logistic regression model with class weighting (rearrests weighted 6:1)
logit_model <- glm(rearrest_binary ~ current_violent + current_misd + current_property + nca_score,
                   data = train, family = "binomial",
                   weights = ifelse(train$rearrest_binary == 1, 6, 1))
prob_logit <- predict(logit_model, test, type = "response")
roc_logit <- roc(test$rearrest_binary, prob_logit)
auc_logit <- auc(roc_logit)

# Find optimal threshold for logistic model (maximizes Youden's J)
optimal_threshold <- coords(roc_logit, "best", ret = "threshold", best.method = "youden")$threshold
pred_logit <- ifelse(prob_logit > optimal_threshold, 1, 0)

# Random forest model with balanced sampling
set.seed(613)
rf_model <- randomForest(factor(rearrest_binary) ~ current_violent + current_misd + current_property + nca_score,
                         data = train,
                         strata = factor(train$rearrest_binary),
                         sampsize = rep(min(table(train$rearrest_binary)), 2),
                         ntree = 300)
prob_rf <- predict(rf_model, test, type = "prob")[, "1"]
roc_rf <- roc(test$rearrest_binary, prob_rf)
auc_rf <- auc(roc_rf)

# Compare all approaches
roc_comparison <- tibble(
  Model = c("PSA Threshold", "Logistic Regression", "Random Forest"),
  AUC = as.numeric(c(auc_psa, auc_logit, auc_rf))
)

# ROC curve data; legend labels use the computed AUCs so they match the table
roc_data <- rbind(
  data.frame(Sensitivity = roc_psa$sensitivities,
             Specificity = roc_psa$specificities,
             Model = sprintf("PSA Threshold (AUC = %.2f)", as.numeric(auc_psa))),
  data.frame(Sensitivity = roc_logit$sensitivities,
             Specificity = roc_logit$specificities,
             Model = sprintf("Logistic Regression (AUC = %.2f)", as.numeric(auc_logit))),
  data.frame(Sensitivity = roc_rf$sensitivities,
             Specificity = roc_rf$specificities,
             Model = sprintf("Random Forest (AUC = %.2f)", as.numeric(auc_rf)))
)

roc_plot <- ggplot(roc_data, aes(x = 1 - Specificity, y = Sensitivity, color = Model)) +
  geom_line(linewidth = 1) +
  geom_abline(linetype = "dashed", color = "gray") +
  labs(title = "ROC Curve Comparison", x = "False Positive Rate", y = "True Positive Rate") +
  theme_minimal() +
  theme(legend.position = "bottom")

# Display results
roc_plot
```

### Model Performance Comparison

| Model | AUC |
|---|---|
| PSA Threshold | 0.560 |
| Logistic Regression | 0.633 |
| Random Forest | 0.640 |
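
The PSA Threshold row measures a binary cutoff (NCA score of 4 or more), which discards the ordering within the low and high groups. To illustrate how much that matters, the sketch below computes AUC directly from grouped counts, once for the full six-level score and once for the cutoff. It uses the full-sample counts from the NCA Score Performance table rather than the held-out test set, so its values are not directly comparable with the table above. The helper `auc_from_counts` is a hypothetical name introduced here.

```r
# Full-sample counts by NCA score 1..6, from the NCA Score Performance table
pos <- c(268, 95, 198, 74, 151, 8)                  # rearrested
neg <- c(4803, 1185, 1359, 485, 1162, 33) - pos     # not rearrested

# AUC = P(rearrested case scores higher than non-rearrested case) + 0.5 * P(tie)
auc_from_counts <- function(pos, neg) {
  neg_below <- cumsum(neg) - neg  # non-rearrested cases with a strictly lower score
  (sum(pos * neg_below) + 0.5 * sum(pos * neg)) / (sum(pos) * sum(neg))
}

# AUC of the full ordinal score
auc_raw <- auc_from_counts(pos, neg)

# AUC after collapsing to the binary cutoff (scores 1-3 vs. 4-6)
high <- 4:6
auc_cut <- auc_from_counts(
  c(sum(pos[-high]), sum(pos[high])),
  c(sum(neg[-high]), sum(neg[high]))
)

round(c(raw_score = auc_raw, cutoff = auc_cut), 3)
```

If the full score discriminates noticeably better than the cutoff, part of the gap between the PSA threshold and the local models comes from dichotomizing the score rather than from the score itself.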

```r
# Statistical comparison
roc_test <- roc.test(roc_psa, roc_logit)
cat("\nStatistical comparison between PSA threshold and logistic regression: p =",
    format.pval(roc_test$p.value, digits = 3), "\n")
```

```
## Statistical comparison between PSA threshold and logistic regression: p = 3.85e-05
```

```r
# Add predictions to test data
test$predicted_risk <- predict(logit_model, test, type = "response")

# Evaluate fairness across racial groups
fairness_metrics <- test |>
  group_by(race_black) |>
  summarize(
    N = n(),
    Actual_Rearrest_Rate = mean(rearrest_binary) * 100,
    Predicted_Risk = mean(predicted_risk) * 100,
    Calibration_Error = abs(Actual_Rearrest_Rate - Predicted_Risk),
    TPR = sum(rearrest_binary == 1 & predicted_risk > optimal_threshold) / sum(rearrest_binary == 1),
    FPR = sum(rearrest_binary == 0 & predicted_risk > optimal_threshold) / sum(rearrest_binary == 0),
    FNR = sum(rearrest_binary == 1 & predicted_risk <= optimal_threshold) / sum(rearrest_binary == 1),
    PPV = sum(rearrest_binary == 1 & predicted_risk > optimal_threshold) / sum(predicted_risk > optimal_threshold)
  )

# Calculate disparity ratios (larger group value divided by smaller)
disparity_ratios <- fairness_metrics |>
  summarize(
    FPR_Ratio = max(FPR) / min(FPR),
    TPR_Ratio = max(TPR) / min(TPR),
    PPV_Ratio = max(PPV) / min(PPV)
  )

# Create visualization
fairness_plot <- fairness_metrics |>
  select(race_black, Actual_Rearrest_Rate, Predicted_Risk) |>
  pivot_longer(cols = -race_black, names_to = "Metric", values_to = "Value") |>
  mutate(Metric = ifelse(Metric == "Actual_Rearrest_Rate", "Actual Rearrest Rate", "Predicted Risk"),
         Race = ifelse(race_black == 1, "Black", "Non-Black")) |>
  ggplot(aes(x = Race, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Actual vs Predicted Rearrest Rates by Race",
       y = "Percentage", fill = "") +
  theme_minimal() +
  theme(legend.position = "bottom")

# Display results
fairness_plot
```

### Fairness Metrics by Race

```r
kable(fairness_metrics, col.names = c("Black", "N", "Actual Rearrest Rate (%)", "Predicted Risk (%)",
                                      "Calibration Error (pp)", "TPR", "FPR", "FNR", "PPV"), digits = 3)
```

| Black | N | Actual Rearrest Rate (%) | Predicted Risk (%) | Calibration Error (pp) | TPR | FPR | FNR | PPV |
|---|---|---|---|---|---|---|---|---|
| 0 | 1077 | 7.985 | 34.778 | 26.793 | 0.581 | 0.351 | 0.419 | 0.126 |
| 1 | 728 | 9.341 | 36.734 | 27.394 | 0.662 | 0.442 | 0.338 | 0.134 |
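
For reference, the four classification metrics in the table follow the standard confusion-matrix definitions, which is also how the code above computes them. Within each group, let TP, FP, TN, and FN be the counts of true positives, false positives, true negatives, and false negatives when a case is flagged as high risk if its predicted risk exceeds the chosen threshold:

$$
\begin{aligned}
\mathrm{TPR} &= \frac{TP}{TP + FN}, &
\mathrm{FPR} &= \frac{FP}{FP + TN}, \\
\mathrm{FNR} &= \frac{FN}{TP + FN} = 1 - \mathrm{TPR}, &
\mathrm{PPV} &= \frac{TP}{TP + FP}.
\end{aligned}
$$

The identity $\mathrm{FNR} = 1 - \mathrm{TPR}$ can be checked against the table: 0.581 + 0.419 = 1 and 0.662 + 0.338 = 1.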

### Disparity Ratios

| FPR Ratio | TPR Ratio | PPV Ratio |
|---|---|---|
| 1.26 | 1.138 | 1.063 |
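
Mean predicted risk in the fairness table is about 35 to 37%, against observed rearrest rates of 8 to 9%. One plausible contributor is the 6:1 class weighting used when fitting the logistic regression. Under the assumption that weighting rearrests by a factor `w` behaves like oversampling them by that factor, the fitted odds are inflated by roughly `w`, and dividing the odds by `w` approximately undoes the shift. The sketch below applies that correction; `unweight_prob` is a hypothetical helper and was not part of the original analysis.

```r
# Undo the odds inflation from weighting the positive class by w.
# Assumes weighted fitting shifts the log-odds by log(w), as oversampling would.
unweight_prob <- function(p, w) {
  p / (p + w * (1 - p))
}

# Applied to the group-level mean predicted risks from the fairness table.
# Correcting a mean is only approximate; in practice the correction would be
# applied to each case's predicted probability before averaging.
mean_predicted <- c(non_black = 0.34778, black = 0.36734)
adjusted <- unweight_prob(mean_predicted, w = 6)
round(100 * adjusted, 1)
```

If the adjusted values land near the observed rates, most of the apparent miscalibration is an artifact of the weighting rather than a failure of the model to rank cases, and the raw predictions are better read as relative risk scores than as probabilities.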

This study set out to evaluate the predictive performance of PSA-style scores in Queens County (2023–2024) and to compare them with locally trained models. The descriptive statistics show a low overall rearrest rate (8.8%) and violent rearrest rate (1.9%), meaning that most defendants released pretrial were not rearrested during the study period. This baseline matters: even modest predictive gains must be interpreted against low event rates.

The PSA NCA and NVCA scores showed some predictive value. Rearrest rates generally rose with the NCA score, from 5.6% at score 1 to 24.2% at score 6, although the pattern was not strictly monotonic (13.0% at score 5 versus 15.3% at score 4) and the score 6 estimate rests on only 33 cases. Violent rearrest rates were likewise higher at NVCA scores 2 and 3 (3.29% and 2.28%) than at score 1 (1.62%). However, the correlation between the NCA score and rearrest was modest (0.125), and the standard high-risk cutoff of NCA score 4 or more discriminated only weakly (AUC ≈ 0.56). This indicates that, while the PSA captures some risk-related variation, its predictive power in Queens is weaker than often reported in multi-jurisdiction validations.
When benchmarked against statistical models, both logistic regression (AUC ≈ 0.63) and random forest (AUC ≈ 0.64) outperformed the simple PSA cutoff (AUC ≈ 0.56), and the improvement of the logistic regression over the PSA threshold was statistically significant (p < 0.001). Both local models include the NCA score alongside current-charge indicators, so the comparison is between a binary cutoff and models that use the full score plus local case information. The finding underscores the importance of local calibration: even relatively simple statistical models can achieve higher accuracy when trained on site-specific data, and courts could benefit from tailoring predictive tools to local contexts rather than relying on generic thresholds.

The fairness analysis adds an important dimension. Predicted risk exceeded the observed rearrest rate in both Black and non-Black groups, with calibration errors of about 27 percentage points in each (mean predicted risk of roughly 35 to 37% against observed rates of 8 to 9%). Because the logistic regression was fit with rearrests weighted 6:1, its predicted probabilities are likely inflated in large part by construction and are better read as relative risk rankings than as calibrated probabilities. Disparities in TPR, FPR, and PPV were present but not extreme: the false positive rate was 0.442 for Black defendants and 0.351 for non-Black defendants, a ratio of 1.26. This suggests that while racial disparities exist in predictive accuracy, they may not be the primary driver of inequities in this dataset. Still, even modest disparities can accumulate into meaningful differences in pretrial detention decisions, highlighting the need for transparency and ongoing monitoring.

Overall, the results present a mixed picture: risk assessment tools like the PSA capture real patterns but may underperform in specific jurisdictions if applied without local validation. Local models provide measurable accuracy gains but still face challenges related to fairness and calibration.
Pretrial decision-making in Queens County reflects the broader tension between public safety, fairness, and efficiency. This analysis demonstrates that while the PSA offers a structured and transparent framework, its predictive power is limited in this setting. Locally trained models, even of modest complexity, deliver stronger predictive accuracy, underscoring the value of jurisdiction-specific validation. At the same time, fairness concerns remain. Although disparities between Black and non-Black defendants were not extreme, the overprediction of risk in both groups points to the limits of statistical models in resolving deeply rooted inequities in the justice system. Policymakers and practitioners should therefore view predictive tools as one piece of a larger decision-making process, not as replacements for judicial discretion or broader reform.
In practice, this means:

- Local calibration of tools like the PSA should be routine, not optional.
- Performance should be assessed not only on accuracy but also on fairness metrics across groups.
- Risk assessments should complement, rather than substitute for, transparent judicial reasoning.

Ultimately, the findings suggest that effective pretrial reform requires both technical improvements to predictive tools and a broader commitment to equity and accountability in decision-making.