## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.0
## Rows: 24676 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): RACE, GENDER, AGE_GROUP, OFFENSE, OFFENSE_TYPE, HOMICIDE_TYPE, OFF...
## dbl (14): OBJECTID, RID, GENERIC_CASE_ID, GENERIC_OFFENDER_ID, CHARGE_NUMBER...
## lgl (4): CREATOR, CREATED, EDITOR, EDITED
The dataset used for this analysis is the “Felony Sentences” dataset from the District of Columbia, publicly available on Data.gov: https://catalog.data.gov/dataset/felony-sentences-0299e. It contains felony sentencing information from 2010 onward, including offender demographics such as gender, race, and age group, as well as sentencing details such as offense type, offense severity group, and sentence length. No separate codebook is provided, so the column names serve as the only documentation. Key variables include ‘OFFENSE_TYPE’ (the category of offense, e.g. violent, property, or weapon), ‘SENTENCE_TO_SERVE_MONTHS’ (the number of months an offender must serve), and ‘AGE_GROUP’ (the age range of the offender). Other important columns include ‘GENDER’ and ‘RACE’, which allow analysis of demographic patterns in sentencing. The dataset was published on Data.gov by the District of Columbia Sentencing Commission. The purpose of the data collection is not stated directly, but it was most likely to promote transparency in sentencing practices and to support research on incarceration and criminal justice trends in Washington, D.C.
The felony sentences dataset was already fairly clean, but I made a few adjustments to make it easier to work with. First, I removed rows with missing values in key columns such as ‘GENDER’, ‘RACE’, ‘OFFENSE_TYPE’, ‘SENTENCE_IMPOSED_MONTHS’, and ‘SENTENCE_YEAR’, so that every visualization is based on complete observations for the variables it uses. Next, I converted the relevant categorical variables (‘OFFENSE_TYPE’, ‘OFFENSE_SEVERITY_GROUP’, ‘AGE_GROUP’, ‘GENDER’, ‘RACE’, and ‘SENTENCE_TYPE’) into factors, which makes it easier to group and summarize the data. Finally, I removed duplicate rows so that each record represents a single sentencing instance. These steps produced a clean dataset that can be used reliably for plotting and further analysis.
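The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the `raw` tibble is a toy stand-in with invented values, and only a subset of the listed columns is included.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the raw file; all values are invented for illustration.
raw <- tibble::tibble(
  GENDER = c("M", "F", NA, "M", "M"),
  RACE = c("Black", "White", "Black", "Black", "Black"),
  OFFENSE_TYPE = c("Violent", "Property", "Drug", "Violent", "Violent"),
  SENTENCE_TYPE = c("Incarceration", "Probation", "Probation",
                    "Incarceration", "Incarceration"),
  SENTENCE_IMPOSED_MONTHS = c(60, 12, 18, NA, 60),
  SENTENCE_YEAR = c(2012, 2015, 2016, 2018, 2012)
)

# Columns that must be complete, and columns to treat as categorical.
key_cols <- c("GENDER", "RACE", "OFFENSE_TYPE",
              "SENTENCE_IMPOSED_MONTHS", "SENTENCE_YEAR")
factor_cols <- c("OFFENSE_TYPE", "GENDER", "RACE", "SENTENCE_TYPE")

clean <- raw |>
  drop_na(all_of(key_cols)) |>                      # drop rows missing a key value
  mutate(across(all_of(factor_cols), as.factor)) |> # categorical columns -> factors
  distinct()                                        # remove exact duplicate rows

nrow(clean)
```

In the toy data, two rows are dropped for missing values and one as an exact duplicate. Note that `distinct()` with no arguments only removes rows that match on every column.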
To support the findings from my analysis of the felony sentences dataset, I incorporated four sources that relate directly to sentencing information, disparities, and policy changes in the District of Columbia. Takefuji (2025) analyzes felony sentence disparities by gender and race using a District of Columbia sentencing dataset, showing statistically significant disparity trends that parallel patterns observed in this analysis. The District of Columbia Sentencing Commission (2024) provides local evidence of differences in sentence length by race in a paper examining overall sentencing trends, offering valuable insight within the same jurisdiction as my analysis. Ulmer and Bowman (2025) present a peer-reviewed study on how race and gender shape sentencing outcomes, providing theoretical support for understanding demographic influences on sentencing decisions. Finally, the District of Columbia Sentencing Commission (2011) outlines the Fine Proportionality Act, a policy implemented to standardize fines based on offense severity, which helps explain the sharp decline in fines observed after 2010. Together, these sources provide both empirical and policy context for analyzing sentencing patterns in D.C., highlighting the importance of demographics, offense type, and legislation in shaping outcomes.
This first graph provides an overview of the dataset by visualizing the number of sentences across offense types, split by gender. This bar chart allows us to see which offenses are most common and whether sentencing patterns differ between males and females. By looking at counts across gender, we can observe general trends, such as whether certain offenses disproportionately involve one gender. For this bar chart, the whole dataset was used, but for some later graphs, the data were summarized or visualized in different ways (such as grouping by year or offense type) to highlight specific patterns across felony sentencing in Washington, D.C.
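A grouped bar chart of this kind could be built along the following lines. This is a sketch under assumptions: the `sentences` tibble holds invented values, and the dodged layout and axis labels are illustrative choices rather than the report's exact settings.

```r
library(dplyr)
library(ggplot2)

# Toy data with invented values; the real analysis uses the full cleaned dataset.
sentences <- tibble::tibble(
  OFFENSE_TYPE = c("Violent", "Violent", "Drug", "Drug", "Drug", "Property"),
  GENDER = c("M", "F", "M", "M", "F", "M")
)

# One row per offense type and gender, with the number of sentences in each cell.
counts <- count(sentences, OFFENSE_TYPE, GENDER, name = "n_sentences")

# Side-by-side bars make the gender split within each offense type easy to compare.
p_bar <- ggplot(counts, aes(x = OFFENSE_TYPE, y = n_sentences, fill = GENDER)) +
  geom_col(position = "dodge") +
  labs(x = "Offense type", y = "Number of sentences", fill = "Gender")

p_bar
```

Counting first and plotting with `geom_col()` keeps the summary table available for inspection; `geom_bar()` on the raw rows would give the same bars.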
This heatmap visualizes disparities in sentencing by race across different offense types. Darker red colors represent longer average sentences, while darker green colors indicate shorter average sentences. By presenting the data this way, it is easier to identify patterns, such as whether certain racial groups receive longer sentences for specific offenses. This visualization complements the previous graph by shifting focus from counts to sentencing outcomes, helping us understand not only who is being sentenced but also how severe those sentences are across demographic groups.
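The heatmap rests on a two-way summary of mean sentence length. A minimal sketch, assuming `SENTENCE_IMPOSED_MONTHS` is the measure being averaged (the text does not say which sentence-length column was used) and using invented values:

```r
library(dplyr)
library(ggplot2)

# Toy data with invented values.
sentences <- tibble::tibble(
  RACE = c("Black", "Black", "White", "White", "Black"),
  OFFENSE_TYPE = c("Violent", "Violent", "Violent", "Drug", "Drug"),
  SENTENCE_IMPOSED_MONTHS = c(60, 40, 30, 12, 24)
)

# Mean sentence length for every race x offense-type combination.
avg_by_group <- sentences |>
  group_by(RACE, OFFENSE_TYPE) |>
  summarise(mean_months = mean(SENTENCE_IMPOSED_MONTHS), .groups = "drop")

# Green marks shorter average sentences, red marks longer ones.
p_heat <- ggplot(avg_by_group, aes(x = OFFENSE_TYPE, y = RACE, fill = mean_months)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "darkgreen", high = "darkred") +
  labs(x = "Offense type", y = "Race", fill = "Mean months")

p_heat
```

Cells built from only a handful of cases can produce extreme means, so it may be worth adding `n = n()` to the summary and checking cell sizes before reading too much into any single tile.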
The line plot shows trends over time for both average sentence length and fines. Tracking these measures year by year allows us to observe changes in sentencing patterns, such as increases or decreases in sentence severity or fine amounts. While this visualization does not account for all demographic variables, it provides context for sentencing and fine amounts across multiple years, which can be explored further in combination with offense type or race. The steep decline in average fines after 2010 corresponds with the implementation of the Fine Proportionality Act in 2011, which standardized fines relative to offense severity, reducing excessively high fines for lower-level crimes.
This violin plot examines the distribution of sentence lengths for different sentence types, specifically comparing incarceration and probation. Unlike the earlier heatmap, which focused on average sentence lengths by race and offense type, this visualization highlights how sentence lengths are distributed and how much they vary within each sentence type.
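The violin comparison of incarceration and probation can be sketched as below. The data are invented, and the choice of `SENTENCE_IMPOSED_MONTHS` as the plotted measure is an assumption; a numeric summary is included so the shapes can be checked against medians.

```r
library(dplyr)
library(ggplot2)

# Toy data with invented values.
sentences <- tibble::tibble(
  SENTENCE_TYPE = rep(c("Incarceration", "Probation"), each = 4),
  SENTENCE_IMPOSED_MONTHS = c(12, 36, 60, 120, 6, 12, 18, 24)
)

# Median and interquartile range per sentence type, as a numeric companion to the plot.
spread <- sentences |>
  group_by(SENTENCE_TYPE) |>
  summarise(
    median_months = median(SENTENCE_IMPOSED_MONTHS),
    iqr_months = IQR(SENTENCE_IMPOSED_MONTHS),
    .groups = "drop"
  )

# Each violin shows the estimated density of sentence lengths within one sentence type.
p_violin <- ggplot(sentences,
                   aes(x = SENTENCE_TYPE, y = SENTENCE_IMPOSED_MONTHS,
                       fill = SENTENCE_TYPE)) +
  geom_violin() +
  labs(x = "Sentence type", y = "Sentence length (months)") +
  theme(legend.position = "none")

p_violin
```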
This scatter plot explores the relationship between probation sentence length and incarceration sentence length across different offense types. The full dataset was used for this visualization because the goal is to observe how these two sentencing components interact across the entire population of felony cases provided. Because the majority of observations fall between 0 and 100 months of probation, the plot focuses on this range to better visualize the clustering of cases. Each point represents an individual case, and the use of jitter reduces overlap between points, making patterns easier to observe. Coloring the points by offense type helps reveal whether certain categories of crimes tend to receive longer probation or incarceration periods. This visualization adds to earlier graphs by examining how different sentencing components can be analyzed together.
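A jittered scatter plot restricted to the 0 to 100 month range might look like the sketch below. The column names `PROBATION_MONTHS` and `INCARCERATION_MONTHS` are hypothetical placeholders, since the text does not name the columns used, and the values are invented.

```r
library(dplyr)
library(ggplot2)

# Toy data; PROBATION_MONTHS and INCARCERATION_MONTHS are hypothetical column names.
sentences <- tibble::tibble(
  OFFENSE_TYPE = c("Drug", "Drug", "Violent", "Property", "Violent"),
  PROBATION_MONTHS = c(12, 24, 36, 60, 120),
  INCARCERATION_MONTHS = c(6, 12, 48, 18, 90)
)

# Share of cases inside the range the plot focuses on.
share_in_range <- mean(between(sentences$PROBATION_MONTHS, 0, 100))

p_scatter <- ggplot(sentences,
                    aes(x = PROBATION_MONTHS, y = INCARCERATION_MONTHS,
                        color = OFFENSE_TYPE)) +
  geom_jitter(width = 1, height = 1, alpha = 0.6) +  # small random offsets reduce overplotting
  coord_cartesian(xlim = c(0, 100)) +                # zoom in without discarding any rows
  labs(x = "Probation (months)", y = "Incarceration (months)", color = "Offense type")

p_scatter
```

`coord_cartesian()` is used here because it only changes the visible window; setting limits with `xlim()` or `scale_x_continuous(limits = ...)` would instead remove out-of-range cases before drawing.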
This treemap provides a visual overview of the distribution of offense types within the cleaned sentencing dataset. Each rectangle represents an offense category, and the size of the rectangle corresponds to the number of cases in that category. Labels include both the offense type and the number of observations to make the proportions easier to interpret. This visualization uses the full cleaned dataset rather than a subset, allowing readers to quickly see which types of offenses appear most frequently in the data and how they compare to one another in terms of case volume.
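One way to produce such a treemap is with the `treemapify` package; whether the original figure used it is not stated, so treat this as an assumption. The sketch builds the count-based labels with `dplyr` and draws the plot only if `treemapify` is installed. The data are invented.

```r
library(dplyr)
library(ggplot2)

# Toy data with invented values.
clean <- tibble::tibble(
  OFFENSE_TYPE = c("Drug", "Drug", "Drug", "Violent", "Violent", "Property")
)

# Cases per offense type, plus a two-line label: offense type, then count.
offense_counts <- clean |>
  count(OFFENSE_TYPE, name = "n_cases", sort = TRUE) |>
  mutate(label = paste0(OFFENSE_TYPE, "\n", n_cases))

# Assumption: treemapify is available; rectangle area is proportional to n_cases.
if (requireNamespace("treemapify", quietly = TRUE)) {
  p_tree <- ggplot(offense_counts,
                   aes(area = n_cases, fill = OFFENSE_TYPE, label = label)) +
    treemapify::geom_treemap() +
    treemapify::geom_treemap_text(colour = "white", place = "centre") +
    theme(legend.position = "none")
  print(p_tree)
}
```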
The visualizations greatly enhanced my understanding of the felony sentences dataset. The bar chart showed which offense types are most common and highlighted gender differences in sentencing counts. The heatmap revealed that sentence lengths vary across racial groups and offense types, which would have been harder to detect in a table of numbers. The line plot of yearly trends made it obvious how fines dropped sharply after 2010, linking clearly to the 2011 Fine Proportionality Act (District of Columbia Sentencing Commission, 2011). Scatterplots and violin plots helped me see the distribution and interaction of sentence types, showing patterns of probation versus incarceration lengths that were not apparent from simply looking at the dataset. Overall, these visualizations provided both context and clarity, allowing me to detect patterns, disparities, and trends that might otherwise be overlooked.
Based on the observed trends in the data, several predictions can be made. First, minor offenses are likely to continue receiving probation over incarceration given the current distributions. Second, fines for low-level offenses will likely remain proportional and relatively low. Third, disparities in sentence lengths by race and offense type suggest that certain demographic groups may continue to receive longer average sentences, unless additional policy reforms or oversight measures are implemented.
This analysis demonstrates that sentencing in Washington, D.C. varies by offense type, sentence type, and demographic factors. Visualizations showed that certain offenses are more common, that sentence lengths vary widely across offense categories and demographic groups, and that legislation can directly impact sentencing outcomes. By examining trends over time, the analysis not only clarifies how sentencing operates in practice, but also provides insight into potential disparities and areas for policy attention. Overall, the dataset demonstrates the complexity of sentencing decisions in D.C., revealing predictable patterns and disparities that warrant further research.
Learning R has been a challenging yet rewarding experience. One of the easiest parts was using ggplot2 to create clear, professional visualizations. With a few lines of code, it’s possible to generate bar charts, heatmaps, scatterplots, and violin plots that reveal patterns in an aesthetically pleasing way. Cleaning the data took more effort. Handling missing values, interpreting the warnings R gave, knitting the document properly, and making sure my plots were readable were some of the greatest challenges. However, working through them strengthened my problem-solving skills and my patience with the software. I’ve come to see R as a versatile tool for any future research or analysis in criminal justice or other fields.
District of Columbia Sentencing Commission. (2011). An Examination of Fine Proportionality in the District of Columbia. https://scdc.dc.gov/sites/default/files/dc/sites/scdc/publication/attachments/SCCRC_2011_Issues_Paper_No_1_Fines.pdf
District of Columbia Sentencing Commission. (2024). An Examination of Sentencing Trends by Defendant Race. https://scdc.dc.gov/sites/default/files/dc/sites/scdc/page_content/attachments/Racial%20Disparities%20Focused%20Paper%20Final.pdf
Takefuji, Y. (2025). Visualizing disparity trends on felony sentence-imposed months by gender and race with generative AI. ScienceDirect. https://www.sciencedirect.com/science/article/abs/pii/S0264275125000678
Ulmer, J. T., & Bowman, R. (2025, August). Race and gender differences in the age-sentencing relationship. Sage Journals. https://journals.sagepub.com/doi/10.1177/00111287251359281