library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(ggplot2)
juvenile<-read_csv("Juvenile_Justice_Policy_and_Oversight_Committee_Equity_Metrics_-_Detentions.csv")
## Rows: 5940 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): Gender, Race/Ethnicity, Charge, Geography, Period, Year, Date
## dbl (6): Quarter, Detentions, Referrals, Period Type, Rate Per 100, Relative...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The raw dataset contains 5,940 observations and 13 variables.
glimpse(juvenile)
## Rows: 5,940
## Columns: 13
## $ Gender <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F"…
## $ `Race/Ethnicity` <chr> "Hispanic", "Hispanic", "Hispanic", "Hispanic", …
## $ Charge <chr> "any", "any", "any", "any", "any", "any", "any",…
## $ Geography <chr> "Connecticut", "Connecticut", "Connecticut", "Co…
## $ Quarter <dbl> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, …
## $ Detentions <dbl> 5, 6, 5, 7, 5, 0, 5, 5, 5, 0, 5, 5, 5, 5, 5, 6, …
## $ Referrals <dbl> 150, 179, 129, 182, 124, 47, 64, 55, 66, 47, 67,…
## $ Period <chr> "quarterly", "quarterly", "quarterly", "quarterl…
## $ Year <chr> "2019", "2019", "2019", "2019", "2020", "2020", …
## $ Date <chr> "03/31/2019 12:00:00 AM", "06/30/2019 12:00:00 A…
## $ `Period Type` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ `Rate Per 100` <dbl> 3.333333, 3.351955, 3.100775, 3.846154, 2.419355…
## $ `Relative Rate Index` <dbl> 2.6666667, 2.6480447, 1.9689922, 0.0000000, 3.24…
The dataset mixes aggregate rows with the detailed breakdowns: it includes totals across genders, quarterly as well as annual periods, and an "all" race/ethnicity category. The cleaned data set below keeps only annual, statewide (Connecticut) observations for both genders combined, broken out by race/ethnicity. This leaves 108 observations.
juvenile_clean <- juvenile |> filter(Period=="annually", Gender=="both", `Race/Ethnicity`!="all", Geography =="Connecticut")
summary(juvenile_clean)
## Gender Race/Ethnicity Charge Geography
## Length:108 Length:108 Length:108 Length:108
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Quarter Detentions Referrals Period
## Min. : NA Min. : 0.0 Min. : 11.0 Length:108
## 1st Qu.: NA 1st Qu.: 10.5 1st Qu.: 279.8 Class :character
## Median : NA Median : 45.5 Median : 881.5 Mode :character
## Mean :NaN Mean : 113.3 Mean : 1453.1
## 3rd Qu.: NA 3rd Qu.: 144.8 3rd Qu.: 1753.2
## Max. : NA Max. :1069.0 Max. :10639.0
## NA's :108
## Year Date Period Type Rate Per 100
## Length:108 Length:108 Min. :1.000 Min. : 0.000
## Class :character Class :character 1st Qu.:1.000 1st Qu.: 2.442
## Mode :character Mode :character Median :1.000 Median : 7.006
## Mean :1.667 Mean : 8.670
## 3rd Qu.:1.000 3rd Qu.:12.330
## Max. :5.000 Max. :26.486
##
## Relative Rate Index
## Min. :0.000
## 1st Qu.:1.000
## Median :1.892
## Mean :2.071
## 3rd Qu.:2.754
## Max. :6.933
##
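As a sanity check on the subsetting logic used above, the sketch below applies the same four conditions to a tiny made-up data frame. It uses base R's `subset()` rather than `dplyr::filter()` so that it runs without extra packages; the `toy` data and its values are hypothetical and are not taken from the detention file.

```r
# Hypothetical toy data: six rows that each test one of the filter conditions.
toy <- data.frame(
  Period = c("annually", "annually", "quarterly", "annually", "annually", "annually"),
  Gender = c("both", "both", "both", "F", "both", "both"),
  `Race/Ethnicity` = c("Hispanic", "all", "Hispanic", "Hispanic",
                       "Non-Hispanic White", "Non-Hispanic White"),
  Geography = c("Connecticut", "Connecticut", "Connecticut", "Connecticut",
                "Hartford", "Connecticut"),
  check.names = FALSE,       # keep the slash in the column name
  stringsAsFactors = FALSE
)

# All conditions must hold at once (logical AND), as with comma-separated
# conditions inside dplyr::filter().
toy_clean <- subset(
  toy,
  Period == "annually" & Gender == "both" &
    `Race/Ethnicity` != "all" & Geography == "Connecticut"
)

# Count the rows that survive, overall and per race/ethnicity group.
n_kept <- nrow(toy_clean)
kept_by_group <- table(toy_clean$`Race/Ethnicity`)
print(n_kept)
print(kept_by_group)
```

Running the same kind of count on `juvenile_clean` (for example with `table()` on the race/ethnicity column) is one way to confirm that no aggregate rows slipped through.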
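The summary statistics for the Rate Per 100 variable show that the statewide annual detention rate per 100 referred youth has a mean of 8.67 and a median of 7.01, with values ranging from 0.00 to 26.49. The mean exceeds the median, which points to a right-skewed distribution with a long upper tail. The maximum nonetheless lies within 1.5 interquartile ranges of the third quartile, so the pooled data show substantial spread rather than isolated extreme outliers.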
summary(juvenile_clean$`Rate Per 100`)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.442 7.006 8.670 12.330 26.486
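Whether the upper tail contains outliers can be checked with the common 1.5 × IQR rule. The sketch below is only a rough check: it plugs in the rounded quartiles printed above rather than recomputing them from `juvenile_clean`, and it treats all race/ethnicity groups as one pooled sample, so a per-group boxplot may still flag individual points.

```r
# Values copied from the summary() output above (rounded as printed).
q1       <- 2.442
q3       <- 12.330
max_rate <- 26.486

# Tukey's rule: points beyond Q3 + 1.5 * IQR are flagged as upper outliers.
iqr         <- q3 - q1
upper_fence <- q3 + 1.5 * iqr

# TRUE would mean the largest pooled value sits beyond the fence.
has_upper_outlier <- max_rate > upper_fence

print(c(iqr = iqr, upper_fence = upper_fence))
print(has_upper_outlier)
```

With these inputs the fence is 27.162, slightly above the observed maximum of 26.486, so the pooled distribution has a long right tail but no value beyond the fence.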
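The summary statistics for the Relative Rate Index variable show a mean of 2.07 and a median of 1.89, indicating that the typical detention rate in this subset is roughly twice the rate for Non-Hispanic White youth. Because the subset also includes the Non-Hispanic White reference group, whose RRI should be 1 by definition, the averages for the other groups alone are likely higher.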
summary(juvenile_clean$`Relative Rate Index`)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 1.892 2.071 2.754 6.933
This histogram shows the statewide detention rates per 100 referred youth. Most observations fall between 0 and 10, reflecting generally modest detention rates. The distribution is right-skewed with a long tail: a small number of observations have much higher detention rates.
ggplot(juvenile_clean, aes(x = `Rate Per 100`)) + geom_histogram(binwidth = 0.5) + labs(title = "Histogram of Rate Per 100", x = "Rate Per 100")
This histogram shows the Relative Rate Index, which has a right-skewed distribution. Most of the statewide observations fall between 0 and 3. This indicates that some disparity relative to Non-Hispanic White youth is common, but extreme disparity is less so. The right tail suggests that for some observations the disparities were considerably larger.
ggplot(juvenile_clean, aes(x = `Relative Rate Index`)) + geom_histogram(binwidth = 0.1) + labs(title = "Histogram of Relative Rate Index", x = "Relative Rate Index")
This scatterplot shows the relationship between the detention rate per 100 referred youth and the Relative Rate Index (RRI). The RRI measures racial disparity by comparing one group’s detention rate to the detention rate of Non-Hispanic White youth. A value of 1 indicates the same rate as Non-Hispanic White youth, while a value higher than 1 indicates a higher rate. The line of best fit indicates a weak positive relationship, suggesting that higher detention rates are associated with somewhat higher levels of racial disparity relative to Non-Hispanic White youth.
ggplot(juvenile_clean, aes(x = `Rate Per 100`, y = `Relative Rate Index`)) + labs(title = "Association between Detention Rate Per 100 and Relative Rate Index", x = "Rate Per 100 Referred Youth", y = "Relative Rate Index") + geom_point() + geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
The correlation between the detention rate per 100 and the RRI is positive, but weak, at 0.249. Higher detention rates are associated with higher racial disparity, but the relationship is modest as the line of best fit shows.
cor(juvenile_clean$`Rate Per 100`, juvenile_clean$`Relative Rate Index`, use="complete.obs")
## [1] 0.2494242
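To put the size of this correlation in context, the sketch below converts the reported coefficient into a share of explained variance and a t statistic. It assumes n = 108 complete pairs, which is consistent with the summary output showing no missing values for either variable but is not verified here; `cor.test()` on the actual columns would be the more direct route.

```r
# Reported Pearson correlation from the cor() output above.
r <- 0.2494242

# Assumption: all 108 rows of juvenile_clean contribute to the correlation.
n <- 108

# Share of variance in one variable linearly associated with the other.
r_squared <- r^2

# t statistic and two-sided p-value for H0: true correlation is zero.
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(c(r_squared = r_squared, t_stat = t_stat, p_value = p_value), 4))
```

An r of 0.249 corresponds to an r² of about 0.062, so the linear relationship accounts for roughly 6% of the variance, which is in line with describing the association as weak.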
There are clear differences in average detention rates across race and ethnicity. Hispanic youth have the highest average rate at 12.35, followed by Non-Hispanic Black youth at 11.04 and Non-Hispanic Other youth at 6.77. Non-Hispanic White youth have a substantially lower average rate at 4.52. These statistics suggest meaningful differences in detention intensity across racial and ethnic groups, with Hispanic and Non-Hispanic Black youth experiencing the highest rates relative to Non-Hispanic White youth.
aggregate(`Rate Per 100`~ `Race/Ethnicity`, data=juvenile_clean, FUN=mean)
## Race/Ethnicity Rate Per 100
## 1 Hispanic 12.345159
## 2 Non-Hispanic Black 11.043156
## 3 Non-Hispanic Other 6.767115
## 4 Non-Hispanic White 4.523815
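The group means above can be expressed relative to the Non-Hispanic White mean to give a rough sense of scale. This is a simple ratio of averages computed from the printed values; it is not the same quantity as the Relative Rate Index column, which is recorded per observation, so the two should not be expected to match exactly.

```r
# Mean Rate Per 100 by group, copied from the aggregate() output above.
mean_rate <- c(
  "Hispanic"           = 12.345159,
  "Non-Hispanic Black" = 11.043156,
  "Non-Hispanic Other" = 6.767115,
  "Non-Hispanic White" = 4.523815
)

# Ratio of each group's mean rate to the Non-Hispanic White mean rate.
ratio_to_white <- mean_rate / mean_rate[["Non-Hispanic White"]]

print(round(ratio_to_white, 2))
```

On these averages, the Hispanic and Non-Hispanic Black rates are roughly 2.7 and 2.4 times the Non-Hispanic White rate, and the Non-Hispanic Other rate is about 1.5 times.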
This boxplot compares detention rates per 100 referred youth across race/ethnicity. Detention rates vary clearly across groups. Hispanic and Non-Hispanic Black juveniles show higher median detention rates and greater variability, while Non-Hispanic Other and Non-Hispanic White juveniles show much lower medians and narrower distributions.
boxplot(`Rate Per 100` ~ `Race/Ethnicity`, data=juvenile_clean, main="Detention Rates by Race/Ethnicity")