library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(ggplot2)

juvenile<-read_csv("Juvenile_Justice_Policy_and_Oversight_Committee_Equity_Metrics_-_Detentions.csv")
## Rows: 5940 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): Gender, Race/Ethnicity, Charge, Geography, Period, Year, Date
## dbl (6): Quarter, Detentions, Referrals, Period Type, Rate Per 100, Relative...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Dictionary

About the dataset

The raw dataset contains 5,940 observations and 13 variables. The cleaned subset used in the analysis below contains 108 observations.

glimpse(juvenile)
## Rows: 5,940
## Columns: 13
## $ Gender                <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F"…
## $ `Race/Ethnicity`      <chr> "Hispanic", "Hispanic", "Hispanic", "Hispanic", …
## $ Charge                <chr> "any", "any", "any", "any", "any", "any", "any",…
## $ Geography             <chr> "Connecticut", "Connecticut", "Connecticut", "Co…
## $ Quarter               <dbl> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, …
## $ Detentions            <dbl> 5, 6, 5, 7, 5, 0, 5, 5, 5, 0, 5, 5, 5, 5, 5, 6, …
## $ Referrals             <dbl> 150, 179, 129, 182, 124, 47, 64, 55, 66, 47, 67,…
## $ Period                <chr> "quarterly", "quarterly", "quarterly", "quarterl…
## $ Year                  <chr> "2019", "2019", "2019", "2019", "2020", "2020", …
## $ Date                  <chr> "03/31/2019 12:00:00 AM", "06/30/2019 12:00:00 A…
## $ `Period Type`         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ `Rate Per 100`        <dbl> 3.333333, 3.351955, 3.100775, 3.846154, 2.419355…
## $ `Relative Rate Index` <dbl> 2.6666667, 2.6480447, 1.9689922, 0.0000000, 3.24…

The raw data mix quarterly and annual periods, and they include combined totals alongside the individual categories (for example, a "both" value for gender and an "all" value for race/ethnicity). The cleaned dataset below keeps only annual, statewide (Connecticut) observations for both genders combined, broken out by race/ethnicity.

juvenile_clean <- juvenile |> filter(Period=="annually", Gender=="both", `Race/Ethnicity`!="all", Geography =="Connecticut")
summary(juvenile_clean)
##     Gender          Race/Ethnicity        Charge           Geography        
##  Length:108         Length:108         Length:108         Length:108        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     Quarter      Detentions       Referrals          Period         
##  Min.   : NA   Min.   :   0.0   Min.   :   11.0   Length:108        
##  1st Qu.: NA   1st Qu.:  10.5   1st Qu.:  279.8   Class :character  
##  Median : NA   Median :  45.5   Median :  881.5   Mode  :character  
##  Mean   :NaN   Mean   : 113.3   Mean   : 1453.1                     
##  3rd Qu.: NA   3rd Qu.: 144.8   3rd Qu.: 1753.2                     
##  Max.   : NA   Max.   :1069.0   Max.   :10639.0                     
##  NA's   :108                                                        
##      Year               Date            Period Type     Rate Per 100   
##  Length:108         Length:108         Min.   :1.000   Min.   : 0.000  
##  Class :character   Class :character   1st Qu.:1.000   1st Qu.: 2.442  
##  Mode  :character   Mode  :character   Median :1.000   Median : 7.006  
##                                        Mean   :1.667   Mean   : 8.670  
##                                        3rd Qu.:1.000   3rd Qu.:12.330  
##                                        Max.   :5.000   Max.   :26.486  
##                                                                        
##  Relative Rate Index
##  Min.   :0.000      
##  1st Qu.:1.000      
##  Median :1.892      
##  Mean   :2.071      
##  3rd Qu.:2.754      
##  Max.   :6.933      
## 

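The summary statistics for the Rate Per 100 variable show that the statewide annual detention rate per 100 referred youth has a mean of 8.67 and a median of 7.01, with values ranging from 0.00 to 26.49. The mean exceeding the median points to a right-skewed distribution with considerable spread: the largest values sit well above the median, although they remain close to what the interquartile range would lead one to expect.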

summary(juvenile_clean$`Rate Per 100`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.442   7.006   8.670  12.330  26.486
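
As a rough check on whether the largest value counts as an outlier, the sketch below applies the common 1.5 × IQR rule to the quartiles printed above. The variable names are illustrative, and working from the rounded summary values rather than the raw column is an assumption made for brevity.

```r
# Quartiles and maximum copied from the summary() output above (rounded values)
q1 <- 2.442
q3 <- 12.330
max_rate <- 26.486

# Upper fence under the 1.5 * IQR rule
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr

upper_fence             # about 27.16
max_rate > upper_fence  # FALSE: the maximum falls just inside the fence
```

On the full data, `boxplot.stats()` applied to the Rate Per 100 column should give a comparable answer in its `out` element, although it works from hinges rather than these quartiles, so the fence may differ slightly.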

The summary statistics for the Relative Rate Index variable show a mean of 2.07 and a median of 1.89, with values ranging from 0.00 to 6.93. In a typical observation, a group's detention rate is therefore roughly twice the rate for Non-Hispanic White youth.

summary(juvenile_clean$`Relative Rate Index`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   1.892   2.071   2.754   6.933
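
The Relative Rate Index can be read as one group's detention rate divided by the rate for Non-Hispanic White youth. The sketch below illustrates that reading using the first row of the `glimpse()` output. `relative_rate_index()` is a hypothetical helper, not something supplied with the dataset, and the reference rate it backs out is implied by the assumed definition rather than shown in the data. How the published index treats a reference rate of zero is not documented here, so returning `NA` in that case is also an assumption.

```r
# Hypothetical helper: ratio of a group's rate to the reference group's rate.
# Returning NA for a zero reference rate is an assumption of this sketch.
relative_rate_index <- function(group_rate, reference_rate) {
  ifelse(reference_rate == 0, NA_real_, group_rate / reference_rate)
}

# First row of the glimpse() output: rate 3.333333, published RRI 2.6666667
group_rate <- 3.333333
published_rri <- 2.6666667

# Reference rate implied by the assumed definition (not shown in the data)
implied_reference <- group_rate / published_rri
implied_reference                        # about 1.25 detentions per 100 referrals

# Vectorised use with made-up rates, including a zero reference rate
relative_rate_index(c(10, 5), c(5, 0))   # 2 and NA
```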

Histograms

This histogram shows the statewide detention rates per 100 referred youth. Most observations fall between 0 and 10, reflecting generally modest detention rates. The distribution is right-skewed with a long tail: a small number of observations have much higher detention rates.

ggplot(juvenile_clean, aes(x = `Rate Per 100`)) + geom_histogram(binwidth = 0.5) + labs(title = "Histogram of Rate Per 100", x = "Rate Per 100")

This histogram shows that the Relative Rate Index has a right-skewed distribution. Most statewide observations fall between 0 and 3. With a median of 1.89, values above 1 (a higher rate than Non-Hispanic White youth) are common, while extreme disparities are rarer. The right tail shows that in some observations the disparity was considerably larger, reaching 6.93.

ggplot(juvenile_clean, aes(x = `Relative Rate Index`)) + geom_histogram(binwidth = 0.1) + labs(title = "Histogram of Relative Rate Index", x = "Relative Rate Index")

Scatterplot

This scatterplot shows the relationship between the detention rate per 100 referred youth and the Relative Rate Index (RRI). The RRI measures racial disparity by comparing one group's detention rate to the detention rate of Non-Hispanic White youth. A value of 1 indicates the same rate as Non-Hispanic White youth, while a value higher than 1 indicates a higher rate. The line of best fit indicates a weak positive relationship, suggesting that higher detention rates are associated with somewhat higher levels of racial disparity relative to Non-Hispanic White youth.

ggplot(juvenile_clean, aes(x = `Rate Per 100`, y = `Relative Rate Index`)) + labs(title = "Association between Detention Rate Per 100 and Relative Rate Index", x = "Rate Per 100 Referred Youth", y = "Relative Rate Index") + geom_point() + geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

Correlation

The correlation between the detention rate per 100 and the RRI is positive but weak (r ≈ 0.249). Higher detention rates are associated with higher racial disparity, but the relationship is modest, consistent with the wide scatter of points around the line of best fit.

cor(juvenile_clean$`Rate Per 100`, juvenile_clean$`Relative Rate Index`, use="complete.obs")
## [1] 0.2494242
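
To put the size of this correlation in context, the sketch below converts it to a shared-variance figure and a t statistic. It assumes all 108 rows of `juvenile_clean` contribute a complete pair (the `summary()` output shows no missing values for either column) and that the rows can be treated as independent, which is doubtful because the same groups appear across several years. The resulting p-value should be read as indicative only.

```r
r <- 0.2494242   # correlation reported above
n <- 108         # rows in juvenile_clean; assumes no missing pairs

r_squared <- r^2                              # share of variance in common
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)     # t statistic for H0: rho = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

r_squared
t_stat
p_value
```

Running `cor.test()` on the two columns of the full data performs the same Pearson test directly and also reports a confidence interval.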

Average detention rate per 100 referred youth:

There are clear differences in average detention rates across race and ethnicity. Hispanic youth have the highest average rate at 12.35, followed by Non-Hispanic Black youth at 11.04 and Non-Hispanic Other youth at 6.77. Non-Hispanic White youth have the lowest average rate at 4.52. These statistics suggest meaningful differences in detention intensity across racial and ethnic groups, with Hispanic and Non-Hispanic Black youth experiencing the highest rates relative to Non-Hispanic White youth.

aggregate(`Rate Per 100`~ `Race/Ethnicity`, data=juvenile_clean, FUN=mean)
##       Race/Ethnicity Rate Per 100
## 1           Hispanic    12.345159
## 2 Non-Hispanic Black    11.043156
## 3 Non-Hispanic Other     6.767115
## 4 Non-Hispanic White     4.523815
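
The gaps are easier to compare as ratios to the Non-Hispanic White average. The sketch below works from the group means printed above; `mean_rate` and `ratio_to_white` are illustrative names. A ratio of average rates is not the same quantity as the average Relative Rate Index, so the two should not be expected to match.

```r
# Group means copied from the aggregate() output above
mean_rate <- c(
  "Hispanic"           = 12.345159,
  "Non-Hispanic Black" = 11.043156,
  "Non-Hispanic Other" = 6.767115,
  "Non-Hispanic White" = 4.523815
)

# Ratio of each group's mean rate to the Non-Hispanic White mean
ratio_to_white <- mean_rate / mean_rate["Non-Hispanic White"]
round(ratio_to_white, 2)
```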

Boxplot

This boxplot compares detention rates per 100 referred youth across race/ethnicity. Detention rates vary clearly across racial/ethnic groups. Hispanic and Non-Hispanic Black youth show higher median detention rates and greater variability than Non-Hispanic Other and Non-Hispanic White youth, who show much lower median rates and narrower distributions.

boxplot(`Rate Per 100` ~ `Race/Ethnicity`, data=juvenile_clean, main="Detention Rates by Race/Ethnicity")