Final Presentation CSC 530

Connie Rodriguez

2024-12-05

Issue Description

In recent years, bias in machine learning predictive algorithms has become a critical area of investigation because of its implications across industries such as law enforcement, healthcare, and hiring. Machine learning models, while powerful predictors, can inherit biases from their training data, leading to skewed and sometimes harmful outcomes. For instance, facial recognition systems used by police have been shown to have higher error rates for individuals with darker skin tones, which has led to wrongful arrests such as that of Robert Williams in 2020, as reported by The New York Times. Additionally, Amazon’s AI recruiting tool, which was meant to streamline hiring, was found to be biased against female candidates because it learned from past hiring data that favored men, as Reuters reported in 2018. These real-world examples underscore the need to investigate the root causes of these biases and to apply mitigation strategies in machine learning models.

One of the most well-documented areas of bias is facial recognition technology, as shown in the Gender Shades study by Joy Buolamwini and Timnit Gebru (2018). Their research found that commercial facial analysis systems that classify gender had significantly higher error rates for darker-skinned women than for lighter-skinned men, with error rates reaching as high as 34.7%. This is problematic because these systems are increasingly used in high-stakes applications, including surveillance and identification by law enforcement. Another significant example is the COMPAS algorithm used in the U.S. criminal justice system, which ProPublica reported in 2016 to be racially biased against African American defendants: those who did not go on to re-offend were labeled higher risk more often than white defendants who did not re-offend. These findings show how biased algorithms can reinforce existing social inequalities, making this an essential area of study.

The significance of studying bias in machine learning goes beyond technical flaws; it is about ensuring fairness and equity in systems that affect people’s lives. A comprehensive analysis of biased algorithms can reveal how training data, particularly data with historical biases, influences model predictions. Understanding these biases can lead to more equitable algorithms through diverse data sets and fairness constraints introduced during model training. By investigating the common types of bias in predictive algorithms and exploring solutions like fairness-aware machine learning, this project aims to contribute to the ongoing efforts to build more transparent and ethical AI systems.

Questions

What types of biases are most prevalent in machine learning models, and how do they impact prediction accuracy?
Can training algorithms with more diverse data sets reduce biases in predictive outcomes?

Data Source

The data source for this project is the COMPAS Recidivism Data set, which was published by ProPublica as part of their investigative reporting on algorithmic bias in risk assessment tools. The data set was collected from Broward County, Florida, and includes demographic, criminal history, and risk assessment data.

Documentation

Link: https://github.com/propublica/compas-analysis

File: compas-scores.csv

The file is raw data, and the repository must be cloned to obtain it. The file was then converted into an Excel file, and I used read.csv() to view the data.

Is there a data dictionary?

Yes, the data set comes with a data dictionary, which can be found in the documentation provided by ProPublica. Below is an explanation of the key columns:

age: The age of the defendant.
sex: The gender of the defendant (Male or Female).
race: The racial category of the defendant (e.g., African-American, Caucasian).
priors_count: The number of prior offenses committed by the defendant.
c_charge_degree: The degree of the current charge (M = Misdemeanor, F = Felony; the file also contains a third level, O).
days_b_screening_arrest: The number of days between the screening date and the arrest date.
is_recid: A variable indicating whether the defendant was rearrested within two years (1 = Yes, 0 = No). The file also contains -1 for some records, which falls outside this coding.
score_text: The risk score category assigned by COMPAS (Low, Medium, or High).
decile_score: The numerical risk score (from 1 to 10) assigned by COMPAS.
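two_year_recid: The actual outcome of whether the defendant was rearrested within two years (1 = Yes, 0 = No). This column does not appear among the 47 variables of compas-scores.csv loaded below, so the analysis uses is_recid instead.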
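
The summary output further down shows a minimum of -1 for is_recid and decile_score and an "N/A" level in score_text, so it can help to count such codes before any analysis. The following is a minimal sketch on a made-up data frame; the toy object and its values are assumptions for illustration, not rows from compas-scores.csv.

```r
# Toy data frame standing in for compas-scores.csv (values are made up).
toy <- data.frame(
  is_recid     = c(0, 1, -1, 0, 1),
  decile_score = c(3, 8, -1, 5, 10),
  score_text   = c("Low", "High", "N/A", "Medium", "High")
)

# Count the rows that hold a placeholder code in each column of interest.
sentinel_counts <- c(
  is_recid     = sum(toy$is_recid == -1),
  decile_score = sum(toy$decile_score == -1),
  score_text   = sum(toy$score_text == "N/A")
)
print(sentinel_counts)
```

Running the same counts on the real data frame would show how many rows carry each code and whether a later filter removes them.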

Description of the Data

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

compas_data <- read.csv("C:/Users/User/OneDrive/CSC 530 Data Analysis/R studio/compas-scores.csv", stringsAsFactors = TRUE)

# View the structure of the dataset
str(compas_data)

## 'data.frame':    11757 obs. of  47 variables:
##  $ id                     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ name                   : Factor w/ 11584 levels "aajah herrington",..: 8013 7919 6512 3255 7285 1141 7474 3334 10304 3427 ...
##  $ first                  : Factor w/ 4058 levels "aajah","aaliyah",..: 2636 2623 2103 1091 2478 410 2532 1107 3487 1141 ...
##  $ last                   : Factor w/ 5921 levels "aaron","abadia",..: 2515 4768 1551 4305 742 4323 3728 4623 5215 5376 ...
##  $ compas_screening_date  : Factor w/ 704 levels "2013-01-01","2013-01-02",..: 207 704 27 104 13 85 312 393 223 418 ...
##  $ sex                    : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 1 ...
##  $ dob                    : Factor w/ 7800 levels "1919-10-14","1929-09-14",..: 92 4777 4080 6444 6911 2483 2248 2693 2495 2983 ...
##  $ age                    : int  69 31 34 24 23 43 44 41 43 39 ...
##  $ age_cat                : Factor w/ 3 levels "25 - 45","Greater than 45",..: 2 1 1 3 3 1 1 1 1 1 ...
##  $ race                   : Factor w/ 6 levels "African-American",..: 6 3 1 1 1 6 6 3 6 3 ...
##  $ juv_fel_count          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ decile_score           : int  1 5 3 4 8 1 1 6 4 1 ...
##  $ juv_misd_count         : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ juv_other_count        : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ priors_count           : int  0 0 0 4 1 2 0 14 3 0 ...
##  $ days_b_screening_arrest: int  -1 NA -1 -1 NA NA 0 -1 -1 -1 ...
##  $ c_jail_in              : Factor w/ 10578 levels "","2013-01-01 01:31:55",..: 3469 1 510 2092 1 1 5100 6303 3745 6645 ...
##  $ c_jail_out             : Factor w/ 10518 levels "","2013-01-02 01:12:01",..: 3090 1 496 1718 1 1 4738 5984 3344 6296 ...
##  $ c_case_number          : Factor w/ 11016 levels "","00004068CF10A",..: 4173 1 1409 2812 1176 738 6182 6988 4368 7492 ...
##  $ c_offense_date         : Factor w/ 1037 levels "","1987-11-07",..: 533 1 336 413 322 1 642 722 1 747 ...
##  $ c_arrest_date          : Factor w/ 803 levels "","1997-06-18",..: 1 1 1 1 1 203 1 1 402 1 ...
##  $ c_days_from_compas     : int  1 NA 1 1 1 76 0 1 1 1 ...
##  $ c_charge_degree        : Factor w/ 3 levels "F","M","O": 1 3 1 1 1 1 2 1 1 2 ...
##  $ c_charge_desc          : Factor w/ 532 levels "","Abuse Without Great Harm",..: 21 1 213 376 373 36 62 361 36 62 ...
##  $ is_recid               : int  0 -1 1 1 0 0 0 1 0 0 ...
##  $ num_r_cases            : logi  NA NA NA NA NA NA ...
##  $ r_case_number          : Factor w/ 3704 levels "","13000349MM10A",..: 1 1 257 323 1 1 1 1161 1 1 ...
##  $ r_charge_degree        : Factor w/ 3 levels "F","M","O": 3 3 1 2 3 3 3 1 3 3 ...
##  $ r_days_from_arrest     : int  NA NA NA 0 NA NA NA 0 NA NA ...
##  $ r_offense_date         : Factor w/ 1091 levels "","2013-01-03",..: 1 1 141 126 1 1 1 401 1 1 ...
##  $ r_charge_desc          : Factor w/ 353 levels "","Agg Assault W/int Com Fel Dome",..: 1 1 121 85 1 1 1 203 1 1 ...
##  $ r_jail_in              : Factor w/ 985 levels "","2013-01-04",..: 1 1 1 107 1 1 1 354 1 1 ...
##  $ r_jail_out             : Factor w/ 954 levels "","2013-01-05",..: 1 1 1 87 1 1 1 328 1 1 ...
##  $ is_violent_recid       : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ num_vr_cases           : logi  NA NA NA NA NA NA ...
##  $ vr_case_number         : Factor w/ 883 levels "","13001383CF10A",..: 1 1 49 1 1 1 1 1 1 1 ...
##  $ vr_charge_degree       : Factor w/ 10 levels "","(F1)","(F2)",..: 1 1 4 1 1 1 1 1 1 1 ...
##  $ vr_offense_date        : Factor w/ 600 levels "","2013-01-28",..: 1 1 48 1 1 1 1 1 1 1 ...
##  $ vr_charge_desc         : Factor w/ 89 levels "","Agg Assault Law Enforc Officer",..: 1 1 61 1 1 1 1 1 1 1 ...
##  $ v_type_of_assessment   : Factor w/ 1 level "Risk of Violence": 1 1 1 1 1 1 1 1 1 1 ...
##  $ v_decile_score         : int  1 2 1 3 6 1 1 2 3 1 ...
##  $ v_score_text           : Factor w/ 4 levels "High","Low","Medium",..: 2 2 2 2 3 2 2 2 2 2 ...
##  $ v_screening_date       : Factor w/ 704 levels "2013-01-01","2013-01-02",..: 207 704 27 104 13 85 312 393 223 418 ...
##  $ type_of_assessment     : Factor w/ 1 level "Risk of Recidivism": 1 1 1 1 1 1 1 1 1 1 ...
##  $ decile_score.1         : int  1 5 3 4 8 1 1 6 4 1 ...
##  $ score_text             : Factor w/ 4 levels "High","Low","Medium",..: 2 3 2 2 1 2 2 3 2 2 ...
##  $ screening_date         : Factor w/ 704 levels "2013-01-01","2013-01-02",..: 207 704 27 104 13 85 312 393 223 418 ...

# Summarize the dataset
summary(compas_data)

##        id                        name               first      
##  Min.   :    1   carlos vasquez    :    4   michael    :  264  
##  1st Qu.: 2940   john brown        :    4   christopher:  162  
##  Median : 5879   michael cunningham:    4   anthony    :  132  
##  Mean   : 5879   robert taylor     :    4   james      :  131  
##  3rd Qu.: 8818   anthony jackson   :    3   john       :  130  
##  Max.   :11757   anthony smith     :    3   robert     :  128  
##                  (Other)           :11735   (Other)    :10810  
##        last       compas_screening_date     sex               dob       
##  williams:  145   2013-03-20:   39      Female:2421   1984-07-06:    6  
##  brown   :  130   2013-04-20:   38      Male  :9336   1986-01-03:    6  
##  johnson :  120   2013-09-23:   35                    1988-04-15:    6  
##  smith   :  112   2013-02-20:   34                    1988-07-25:    6  
##  jones   :   94   2013-02-22:   33                    1989-04-27:    6  
##  davis   :   68   2013-09-26:   33                    1989-09-27:    6  
##  (Other) :11088   (Other)   :11545                    (Other)   :11721  
##       age                   age_cat                   race     
##  Min.   :18.00   25 - 45        :6649   African-American:5813  
##  1st Qu.:25.00   Greater than 45:2668   Asian           :  58  
##  Median :32.00   Less than 25   :2440   Caucasian       :4085  
##  Mean   :35.14                          Hispanic        :1100  
##  3rd Qu.:43.00                          Native American :  40  
##  Max.   :96.00                          Other           : 661  
##                                                                
##  juv_fel_count       decile_score    juv_misd_count     juv_other_count   
##  Min.   : 0.00000   Min.   :-1.000   Min.   : 0.00000   Min.   : 0.00000  
##  1st Qu.: 0.00000   1st Qu.: 2.000   1st Qu.: 0.00000   1st Qu.: 0.00000  
##  Median : 0.00000   Median : 4.000   Median : 0.00000   Median : 0.00000  
##  Mean   : 0.06158   Mean   : 4.371   Mean   : 0.07604   Mean   : 0.09356  
##  3rd Qu.: 0.00000   3rd Qu.: 7.000   3rd Qu.: 0.00000   3rd Qu.: 0.00000  
##  Max.   :20.00000   Max.   :10.000   Max.   :13.00000   Max.   :17.00000  
##                                                                           
##   priors_count    days_b_screening_arrest               c_jail_in    
##  Min.   : 0.000   Min.   :-597.000                           : 1180  
##  1st Qu.: 0.000   1st Qu.:  -1.000        2013-01-01 01:31:55:    1  
##  Median : 1.000   Median :  -1.000        2013-01-01 03:16:15:    1  
##  Mean   : 3.082   Mean   :  -0.878        2013-01-01 03:28:03:    1  
##  3rd Qu.: 4.000   3rd Qu.:  -1.000        2013-01-01 04:17:22:    1  
##  Max.   :43.000   Max.   :1057.000        2013-01-01 04:29:04:    1  
##                   NA's   :1180            (Other)            :10572  
##                c_jail_out          c_case_number      c_offense_date
##                     : 1180                :  742             :2600  
##  2013-09-12 10:31:00:    4   00004068CF10A:    1   2013-03-20:  29  
##  2014-02-12 10:41:00:    4   00022077MM10A:    1   2013-01-14:  28  
##  2013-08-22 11:38:00:    3   00037912TC10A:    1   2013-02-22:  27  
##  2013-09-14 05:58:00:    3   01004839CF10A:    1   2013-01-09:  26  
##  2013-09-18 10:30:00:    3   01006487CF10D:    1   2013-01-11:  26  
##  (Other)            :10560   (Other)      :11010   (Other)   :9021  
##     c_arrest_date  c_days_from_compas c_charge_degree
##            :9899   Min.   :   0.00    F:7232         
##  2013-01-10:   9   1st Qu.:   1.00    M:3771         
##  2013-01-15:   9   Median :   1.00    O: 754         
##  2013-02-06:   9   Mean   :  63.59                   
##  2013-05-15:   9   3rd Qu.:   2.00                   
##  2013-02-19:   8   Max.   :9485.00                   
##  (Other)   :1814   NA's   :742                       
##                        c_charge_desc     is_recid       num_r_cases   
##  arrest case no charge        :1858   Min.   :-1.0000   Mode:logical  
##  Battery                      :1811   1st Qu.: 0.0000   NA's:11757    
##                               : 749   Median : 0.0000                 
##  Possession of Cocaine        : 703   Mean   : 0.2538                 
##  Grand Theft in the 3rd Degree: 688   3rd Qu.: 1.0000                 
##  Driving While License Revoked: 255   Max.   : 1.0000                 
##  (Other)                      :5693                                   
##        r_case_number  r_charge_degree r_days_from_arrest    r_offense_date
##               :8054   F:1202          Min.   : -1.00               :8054  
##  13000349MM10A:   1   M:2499          1st Qu.:  0.00     2014-12-08:  12  
##  13000445MM20A:   1   O:8056          Median :  0.00     2015-01-28:  11  
##  13000677MM20A:   1                   Mean   : 20.41     2015-02-10:  11  
##  13000758MM30A:   1                   3rd Qu.:  1.00     2014-04-03:  10  
##  13000785MM30A:   1                   Max.   :993.00     2014-06-05:  10  
##  (Other)      :3698                   NA's   :9297       (Other)   :3649  
##                            r_charge_desc       r_jail_in         r_jail_out  
##                                   :8114             :9297             :9297  
##  Driving License Suspended        : 279   2014-05-27:   9   2014-02-18:  10  
##  Possess Cannabis/20 Grams Or Less: 269   2014-07-10:   9   2015-05-15:  10  
##  Resist/Obstruct W/O Violence     : 209   2013-11-22:   8   2014-07-11:   9  
##  Battery                          : 207   2014-04-29:   8   2014-12-09:   9  
##  Operating W/O Valid License      : 184   2014-06-05:   8   2013-11-13:   8  
##  (Other)                          :2495   (Other)   :2418   (Other)   :2414  
##  is_violent_recid  num_vr_cases         vr_case_number  vr_charge_degree
##  Min.   :0.00000   Mode:logical                :10875          :10875   
##  1st Qu.:0.00000   NA's:11757     13001383CF10A:    1   (M1)   :  372   
##  Median :0.00000                  13001876CF10A:    1   (F3)   :  243   
##  Mean   :0.07502                  13002119CF10A:    1   (F2)   :  174   
##  3rd Qu.:0.00000                  13002277CF10A:    1   (F1)   :   43   
##  Max.   :1.00000                  13002546CF10A:    1   (M2)   :   20   
##                                   (Other)      :  877   (Other):   30   
##    vr_offense_date                         vr_charge_desc 
##            :10875                                 :10875  
##  2015-08-15:    6   Battery                       :  356  
##  2015-09-04:    5   Aggravated Assault W/Dead Weap:   42  
##  2013-11-14:    4   Felony Battery (Dom Strang)   :   41  
##  2014-02-18:    4   Battery on Law Enforc Officer :   40  
##  2014-04-05:    4   Aggrav Battery w/Deadly Weapon:   38  
##  (Other)   :  859   (Other)                       :  365  
##        v_type_of_assessment v_decile_score   v_score_text    v_screening_date
##  Risk of Violence:11757     Min.   :-1.000   High  :1116   2013-03-20:   39  
##                             1st Qu.: 1.000   Low   :7968   2013-04-20:   38  
##                             Median : 3.000   Medium:2668   2013-09-23:   35  
##                             Mean   : 3.571   N/A   :   5   2013-02-20:   34  
##                             3rd Qu.: 5.000                 2013-02-22:   33  
##                             Max.   :10.000                 2013-09-26:   33  
##                                                            (Other)   :11545  
##           type_of_assessment decile_score.1    score_text      screening_date 
##  Risk of Recidivism:11757    Min.   :-1.000   High  :2208   2013-03-20:   39  
##                              1st Qu.: 2.000   Low   :6607   2013-04-20:   38  
##                              Median : 4.000   Medium:2927   2013-09-23:   35  
##                              Mean   : 4.371   N/A   :  15   2013-02-20:   34  
##                              3rd Qu.: 7.000                 2013-02-22:   33  
##                              Max.   :10.000                 2013-09-26:   33  
##                                                             (Other)   :11545

Cleaning and Preparation

Step 1. Data Cleaning

What was done: Narrowed the data set to include only key columns relevant for analysis, removed missing or invalid values, and converted categorical variables to factors for easier analysis.

Why: Ensures that the data is clean, consistent, and focused on the variables necessary for investigating bias.

# Focus on key columns
compas_data <- compas_data %>%
  select(age, sex, race, priors_count, days_b_screening_arrest, c_charge_degree, is_recid, score_text)

# Handle missing values
compas_data <- compas_data %>%
  filter(!is.na(days_b_screening_arrest), score_text != "N/A")

# Convert factors for analysis
compas_data$score_text <- factor(compas_data$score_text, levels = c("Low", "Medium", "High"))
compas_data$is_recid <- as.factor(compas_data$is_recid)

Step 2. Bias Analysis

False Positive and False Negative Rates:

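What was done: Calculated false positive rates (individuals predicted as “High” risk who did not re-offend) and false negative rates (individuals predicted as “Low” risk who did re-offend) for each demographic group (race, gender). Note that the two breakdowns use different denominators: by race, each rate is a share of all defendants in the group; by gender, each rate is a share of the defendants who received that label. The race and gender figures are therefore not directly comparable.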

Why: Identifies whether certain groups are disproportionately affected by prediction errors, which is a key indicator of model bias.

Comparison with Overall Rates:

What was done: Compared group-specific recidivism rates to the overall recidivism rate to measure relative disparities.

Why: Highlights whether certain groups are over- or under-represented in re-offending outcomes, independent of prediction errors.
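
For reference, the quantities involved can be written out as follows. This is a sketch under stated assumptions: FP counts defendants labeled “High” who were not rearrested, FN counts defendants labeled “Low” who were rearrested, N_g is the number of defendants in group g, and N_High and N_Low are the numbers in that group labeled “High” and “Low”. How “Medium” labels enter TN and TP in the conventional rates is not specified in this report, so that part is left open.

```latex
\begin{align*}
\text{Conventional rates:} \quad & \mathrm{FPR} = \frac{FP}{FP + TN}, \quad \mathrm{FNR} = \frac{FN}{FN + TP} \\
\text{By race (code below):} \quad & \frac{FP}{N_g}, \quad \frac{FN}{N_g} \\
\text{By gender (code below):} \quad & \frac{FP}{N_{\text{High}}}, \quad \frac{FN}{N_{\text{Low}}}
\end{align*}
```

The three rows answer different questions, which is why a rate in one breakdown should not be compared with a rate in another.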

False Positive by Race:

Numeric Representation

# Share of each racial group that was labeled "High" but not rearrested
# (denominator: all defendants in the group)
bias_analysis <- compas_data %>%
  group_by(race) %>%
  summarize(
    false_positive_rate = mean(score_text == "High" & is_recid == "0")
  )

print(bias_analysis)

## # A tibble: 6 × 2
##   race             false_positive_rate
##   <fct>                          <dbl>
## 1 African-American              0.123 
## 2 Asian                         0.0189
## 3 Caucasian                     0.0541
## 4 Hispanic                      0.0523
## 5 Native American               0     
## 6 Other                         0.0167

Graphical Representation

# Bar plot of false positive rates by race
ggplot(bias_analysis, aes(x = race, y = false_positive_rate, fill = race)) +
  geom_bar(stat = "identity") +
  labs(
    title = "False Positive Rates by Race",
    x = "Race",
    y = "False Positive Rate"
  ) +
  scale_fill_brewer(palette = "Set3") +
  theme_minimal()

Key findings for individuals incorrectly labeled “High” risk who did not re-offend:

African-Americans have the highest false positive rate (12.3%), more than twice the rate for Caucasians (5.4%), indicating they are disproportionately predicted as “High” risk even when they do not re-offend.

The Native American and Other groups have the lowest false positive rates (0% and 1.7%), suggesting fewer individuals from these groups are incorrectly labeled as “High” risk. The Native American group is small (40 defendants in the raw data), so its rate should be read with caution.

Overall: The highest false positive rate belongs to African-Americans, which indicates potential bias, as this group is more likely to be unfairly flagged as “High” risk. Caucasians and Hispanics have similar false positive rates (5.4% and 5.2%).
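
A rate of 0% in a very small group does not mean the underlying rate is zero. As a rough illustration, the sketch below uses binom.test() from base R to compare the exact 95% confidence interval for zero flagged cases in a small group with the same 0% in a much larger one. The group sizes 33 and 3300 are hypothetical and not counts taken from the cleaned COMPAS data.

```r
# Hypothetical counts: 0 flagged cases in a small group of 33 people.
small <- binom.test(x = 0, n = 33)
# The same observed 0% in a hypothetical group of 3300 people.
large <- binom.test(x = 0, n = 3300)

# Exact (Clopper-Pearson) 95% confidence intervals for the underlying rate.
print(small$conf.int)
print(large$conf.int)
```

The interval for the small group is far wider, so its rate is compatible with a much larger range of true values than the point estimate suggests.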

False Negative by Race:

Numeric Representation

# Share of each racial group that was labeled "Low" but rearrested
# (denominator: all defendants in the group)
bias_analysis <- compas_data %>%
  group_by(race) %>%
  summarize(
    false_negative_rate = mean(score_text == "Low" & is_recid == "1")
  )

print(bias_analysis)

## # A tibble: 6 × 2
##   race             false_negative_rate
##   <fct>                          <dbl>
## 1 African-American              0.118 
## 2 Asian                         0.0755
## 3 Caucasian                     0.143 
## 4 Hispanic                      0.143 
## 5 Native American               0.0606
## 6 Other                         0.165

Graphical Representation

# Bar plot of false negative rates by race
ggplot(bias_analysis, aes(x = race, y = false_negative_rate, fill = race)) +
  geom_bar(stat = "identity") +
  labs(
    title = "False Negative Rates by Race",
    x = "Race",
    y = "False Negative Rate"
  ) +
  scale_fill_brewer(palette = "Set3") +
  theme_minimal()

Key findings for individuals incorrectly labeled “Low” risk who did re-offend:

The Other racial group has the highest false negative rate at 16.5%, meaning they are more likely to be misclassified as “Low” risk when they do re-offend.

Native Americans have the lowest false negative rate at 6.1%, although this group is small (40 defendants in the raw data), so the estimate is unstable.

Overall: Caucasians and Hispanics both have false negative rates of 14.3%, higher than the 11.8% for African-Americans, meaning the model more often underestimates their risk of re-offending.
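
The rates by race above divide by every defendant in the group. For comparison, the conventional false positive rate divides by those who did not re-offend, and the conventional false negative rate by those who did. Below is a minimal base R sketch on made-up records, assuming “High” is the only positive prediction. The names toy, pred_pos, and rates_for are hypothetical, and none of the values come from the COMPAS file.

```r
# Hypothetical toy records; not taken from the COMPAS file.
toy <- data.frame(
  group      = c("A", "A", "A", "A", "B", "B", "B", "B"),
  score_text = c("High", "High", "Low", "Low", "High", "Low", "Low", "Low"),
  is_recid   = c(0, 1, 0, 1, 0, 0, 0, 1)
)

# Assumption: "High" counts as a positive prediction, everything else as negative.
toy$pred_pos <- toy$score_text == "High"

# Conventional error rates computed within one group's rows.
rates_for <- function(d) {
  fp <- sum(d$pred_pos & d$is_recid == 0)   # flagged, did not re-offend
  tn <- sum(!d$pred_pos & d$is_recid == 0)  # not flagged, did not re-offend
  fn <- sum(!d$pred_pos & d$is_recid == 1)  # not flagged, re-offended
  tp <- sum(d$pred_pos & d$is_recid == 1)   # flagged, re-offended
  data.frame(
    group = d$group[1],
    false_positive_rate = fp / (fp + tn),
    false_negative_rate = fn / (fn + tp)
  )
}

# Apply the function to each group and stack the results.
rates <- do.call(rbind, lapply(split(toy, toy$group), rates_for))
print(rates)
```

Applying the same idea to the real data would require deciding how “Medium” scores are treated, which changes both rates.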

False Positive by Gender:

Numeric Representation:

# Among defendants labeled "High" risk, the share who were not rearrested, by gender
gender_bias_analysis <- compas_data %>%
  filter(score_text == "High") %>%  # Keep only cases predicted as "High" risk
  group_by(sex) %>%
  summarize(
    false_positive_rate = mean(is_recid == "0")  # Share of "High" labels not followed by rearrest
  )

# View results
print(gender_bias_analysis)

## # A tibble: 2 × 2
##   sex    false_positive_rate
##   <fct>                <dbl>
## 1 Female               0.554
## 2 Male                 0.437

Graphical Representation

# Bar plot of false positive rates by gender
ggplot(gender_bias_analysis, aes(x = sex, y = false_positive_rate, fill = sex)) +
  geom_bar(stat = "identity") +
  labs(
    title = "False Positive Rates by Gender",
    x = "Gender",
    y = "False Positive Rate"
  ) +
  scale_fill_manual(values = c("Female" = "pink", "Male" = "blue")) +
  theme_minimal()

Key findings for individuals labeled as “High” risk who did not re-offend:

Females have a higher false positive rate than males (55.4% versus 43.7%), a gap of 11.7 percentage points.

Overall: This indicates that females are more likely than males to be unfairly flagged as “High” risk by the COMPAS model, despite not re-offending. The higher false positive rate for females suggests potential bias in how the model assigns risk scores. This could lead to unnecessary consequences for females flagged as “High” risk when they are not likely to re-offend.
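
Whether a gap of this size is larger than chance alone would produce can be checked with a two-sample test of proportions. The sketch below uses prop.test() from base R's stats package. The counts are hypothetical placeholders chosen only to give proportions close to those reported above; they are not counts from the COMPAS file.

```r
# Hypothetical counts (not from the COMPAS file): people labeled "High" and,
# among them, those who were not rearrested.
not_rearrested <- c(Female = 166, Male = 787)
labeled_high   <- c(Female = 300, Male = 1800)

# Two-sample test for equality of proportions (continuity correction by default).
result <- prop.test(x = not_rearrested, n = labeled_high)

print(result$estimate)  # the two observed proportions
print(result$p.value)   # small values suggest the gap is unlikely to be chance alone
```

With the real counts in place of the placeholders, the p-value would indicate whether the gender gap can be described as statistically significant.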

False Negative by Gender:

Numeric Representation

# Among defendants labeled "Low" risk, the share who were rearrested, by gender
gender_bias_analysis <- compas_data %>%
  filter(score_text == "Low") %>%  # Keep only cases predicted as "Low" risk
  group_by(sex) %>%
  summarize(
    false_negative_rate = mean(is_recid == "1")  # Share of "Low" labels followed by rearrest
  )

# View results
print(gender_bias_analysis)

## # A tibble: 2 × 2
##   sex    false_negative_rate
##   <fct>                <dbl>
## 1 Female               0.175
## 2 Male                 0.252

Graphical Representation

# Bar plot of false negative rates by gender
ggplot(gender_bias_analysis, aes(x = sex, y = false_negative_rate, fill = sex)) +
  geom_bar(stat = "identity") +
  labs(
    title = "False Negative Rates by Gender",
    x = "Gender",
    y = "False Negative Rate"
  ) +
  scale_fill_manual(values = c("Female" = "red", "Male" = "blue")) +
  theme_minimal()

Key findings for individuals labeled as “Low” risk who did re-offend:

The algorithm underestimated re-offending for both males and females: about 25% of males labeled “Low” risk re-offended, compared with 17.52% of females.

Overall: The COMPAS algorithm has a higher likelihood of underestimating the risk of re-offending for males compared to females. This could lead to insufficient monitoring or intervention for males who may actually pose a higher risk. The disparity in false negative rates between genders suggests potential bias in how the algorithm evaluates male and female defendants. This bias could be rooted in the training data or the algorithm’s feature weighting.

Final Results

Approach to Questions:

What types of biases are most prevalent in machine learning models, and how do they impact prediction accuracy?

By analyzing false positive and false negative rates in the COMPAS data set, I identified disparities in how the model evaluates different demographic groups. For example, African-Americans had the highest false positive rates, while males had higher false negative rates than females.

These disparities suggest that the model’s biases lead to systematic misclassifications, unfairly burdening certain groups while underestimating the risk posed by others.

Can training algorithms with more diverse data sets reduce biases in predictive outcomes?

Although this project did not retrain or adjust the algorithm due to the constraints of the data set, the results underscore the importance of representative data in training. The observed disparities likely stem from imbalanced training data that reflect historical societal biases.

What Was Accomplished

Data Cleaning:

Cleaned the data set by focusing on key variables, removing invalid or missing values, and converting categorical variables to factors. This preparation ensured a reliable basis for statistical analysis.

Bias Analysis for False Positive Rates:

For race, African-Americans experienced the highest false positive rate (12.3%), showing they were disproportionately flagged as “High” risk despite not re-offending. Native Americans and the “Other” racial group had the lowest rates (0% and 1.7%, respectively).

For gender, females had a higher false positive rate (11.7 percentage points greater than males), meaning they were more likely to be incorrectly flagged as “High” risk.

False Negative Rates:

For race, the “Other” racial group had the highest false negative rate (16.5%), indicating underestimation of risk. Native Americans had the lowest rate (6.1%).

For gender, males experienced a higher false negative rate (25.16%) than females (17.52%), suggesting that the algorithm underestimated the re-offending risk for males more frequently.

What It Means

Algorithmic Bias:

The disparities in false positive and false negative rates reflect bias in the COMPAS algorithm. These biases likely originate from historical and societal inequalities embedded in the training data.

Real-World Implications:

False Positives: Unfairly flagging African-Americans and females as “High” risk imposes unnecessary legal burdens, such as stricter parole conditions.

False Negatives: Underestimating risk for males or certain racial groups could lead to inadequate monitoring, potentially compromising public safety.

Need for Mitigation:

The results stress the importance of fairness-aware machine learning practices. Addressing these disparities requires diverse training data and fairness constraints during model development.
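
As one illustration of a fairness-aware step, the sketch below computes instance weights in the style of reweighing (Kamiran and Calders), a preprocessing technique that weights each training record so that group membership and outcome label look statistically independent. This technique is not used in this project, the data frame train and its values are hypothetical, and it is shown only to make the idea concrete.

```r
# Hypothetical training records; group and label values are made up.
train <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "B", "B"),
  label = c(1, 1, 0, 1, 0, 0, 0, 0)
)

n <- nrow(train)
p_group <- table(train$group) / n               # P(group)
p_label <- table(train$label) / n               # P(label)
p_joint <- table(train$group, train$label) / n  # P(group, label)

# Weight = share expected under independence / share actually observed.
g <- as.character(train$group)
l <- as.character(train$label)
train$weight <- as.numeric(p_group[g]) * as.numeric(p_label[l]) /
  p_joint[cbind(g, l)]

print(train)
```

Records from group and label combinations that are under-represented receive weights above 1, and a model trained with these weights sees the same weighted positive rate in every group.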

Conclusion

This project analyzed biases in the COMPAS algorithm, focusing on disparities in false positive and false negative rates by race and gender.

The findings demonstrate the urgent need for fairness-aware practices in machine learning to ensure that predictive models do not reinforce societal inequities.

While this project was limited to evaluating an existing algorithm, it lays the groundwork for future efforts to retrain models with diverse datasets and implement fairness-aware techniques. These steps are critical to building more transparent and ethical AI systems. Moving forward, we must embrace the ethical responsibility to challenge biases in AI and create tools that are not only powerful but also equitable and just.