This project examines patterns of homicide in the United States using data from the FBI's Uniform Crime Reporting (UCR) Program. The data provide information on homicide victims, including their demographics, the weapons used, and the location of each incident. Understanding these trends helps in designing public safety policies and community interventions.
Data Source: FBI Uniform Crime Reporting Program, U.S. Department of Justice. UCR data are submitted by law enforcement agencies across the United States.
Variables Used:
- victim_age: Age of victim (numeric)
- victim_sex: Sex of victim (categorical)
- victim_race: Race/ethnicity of victim (categorical)
- weapon_used: Type of weapon used (categorical)
- region: Geographic region (categorical)
- year: Year of incident (numeric)
Research Questions:
What is the age distribution of homicide victims?
How does homicide differ across regions?
Which types of weapons are used most often?
Load Libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
New names:
• `` -> `...1`
Rows: 389730 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): region, state, victim_age, victim_sex, victim_race, victim_race_pl...
dbl  (5): ...1, year, month, multiple_victim_count, incident_id
lgl  (1): additional_victim
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(homicide_data)
# A tibble: 6 × 21
...1 year month region state victim_age victim_sex victim_race
<dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 0 1985 2 Southeast AL 27 Male White (includes Mexic…
2 1 1985 3 Southeast AL 61 Male White (includes Mexic…
3 2 1985 4 Southeast AL 29 Female White (includes Mexic…
4 3 1985 5 Southeast AL 45 Male White (includes Mexic…
5 4 1985 7 Southeast AL 30 Male White (includes Mexic…
6 5 1985 7 Southeast AL 31 Male White (includes Mexic…
# ℹ 13 more variables: victim_race_plus_hispanic <chr>, victim_ethnicity <chr>,
# weapon_used <chr>, victim_offender_split <chr>,
# offenders_relationship_to_victim <chr>,
# offenders_relationship_to_victim_grouping <chr>, offender_sex <chr>,
# circumstance <chr>, circumstance_grouping <chr>,
# extra_circumstance_info <chr>, multiple_victim_count <dbl>,
# incident_id <dbl>, additional_victim <lgl>
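The later sections work with a cleaned table, `clean_data`, that carries the derived columns `decade`, `age_group`, and `weapon_simple`, but the cleaning code itself is not shown. The following is a minimal sketch of how such a table could be built. The helper name `clean_homicides`, the age limits, the age-group breaks, and the weapon text patterns are all assumptions made for illustration, and the toy rows are invented rather than taken from the FBI data.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: the report does not show its cleaning step, so the
# age limits, age-group breaks, and weapon patterns below are assumptions.
clean_homicides <- function(raw) {
  raw %>%
    # victim_age is read as character; non-numeric entries become NA
    mutate(victim_age = suppressWarnings(as.numeric(victim_age))) %>%
    # drop missing and implausible ages (assumed valid range: 0 to 99)
    filter(!is.na(victim_age), victim_age >= 0, victim_age <= 99) %>%
    mutate(
      # 1985 -> "1980s", 2016 -> "2010s"
      decade = paste0(floor(year / 10) * 10, "s"),
      # five age brackets (assumed breaks)
      age_group = cut(
        victim_age,
        breaks = c(-Inf, 17, 24, 34, 49, Inf),
        labels = c("Under 18", "18-24", "25-34", "35-49", "50+")
      ),
      # collapse detailed weapon labels into broad categories (assumed patterns)
      weapon_simple = case_when(
        str_detect(str_to_lower(weapon_used), "handgun") ~ "Handgun",
        str_detect(str_to_lower(weapon_used), "rifle|shotgun") ~ "Long Gun",
        str_detect(str_to_lower(weapon_used), "firearm") ~ "Firearm (unspecified)",
        TRUE ~ "Other"
      )
    )
}

# Toy rows for illustration only (not taken from the FBI data)
toy_raw <- data.frame(
  year = c(1985, 1994, 2003, 2016),
  victim_age = c("27", "unknown", "61", "19"),
  weapon_used = c("Handgun", "Rifle", "Shotgun", "Firearm, type not stated"),
  stringsAsFactors = FALSE
)

toy_clean <- clean_homicides(toy_raw)
print(toy_clean)
```

In this sketch the row with a non-numeric age is dropped, so three of the four toy rows remain. The real cleaning rules may differ, and the thresholds should be adjusted to match them.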
Model Equation: victim_age = 92.13 + (-0.0433) * year
Model Interpretation: The linear regression model indicates a statistically significant but practically negligible relationship between year and victim age (p = 0.0063). Although the model estimates that victim age decreases by about 0.043 years with each passing calendar year, the very low R-squared value of 0.0096 shows that year explains only 0.96 percent of the variance in victim age. Over a 20-year span, the predicted mean age of victims would change by less than one year (20 × 0.0433 ≈ 0.87), which is not of substantive importance.
The results suggest that victim demographics have remained remarkably consistent over the past decades, in line with criminological research on long-standing age patterns among victims of violent crime (Zahn & McCall, 1999; Zeoli, 2023). The low explanatory power implies that time itself contributes little to the characteristics of homicide victims, and that other factors (demographic, social, or geographic) play a more significant role than year-to-year fluctuations.
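The model-fitting code is not shown in the report. Assuming the model was an ordinary least squares fit of `victim_age` on `year` using `lm()`, the quantities discussed above (slope, R-squared, p-value, and the implied 20-year change) could be extracted roughly as follows. The helper `trend_summary` and the toy data are hypothetical. Only the two `reported_*` values come from the text.

```r
# Hypothetical helper: fit victim_age ~ year and collect the quantities
# quoted in the interpretation above.
trend_summary <- function(data) {
  fit <- lm(victim_age ~ year, data = data)
  fit_summary <- summary(fit)
  slope <- unname(coef(fit)["year"])
  list(
    intercept = unname(coef(fit)["(Intercept)"]),
    slope = slope,
    r_squared = fit_summary$r.squared,
    p_value = fit_summary$coefficients["year", "Pr(>|t|)"],
    change_20yr = 20 * slope  # implied change in mean age over 20 years
  )
}

# Toy data for illustration only (not the report's data)
toy_trend <- data.frame(
  year = 2000:2009,
  victim_age = c(31, 29, 30, 28, 29, 27, 28, 26, 27, 25)
)
toy_result <- trend_summary(toy_trend)
print(toy_result)

# Arithmetic on the values reported in the text
reported_slope <- -0.0433
reported_r_squared <- 0.0096
reported_change_20yr <- 20 * reported_slope          # -0.866 years over two decades
reported_pct_variance <- 100 * reported_r_squared    # 0.96 percent of variance
```

Separating statistical significance (the p-value) from effect size (the slope and R-squared) in one summary makes it easier to see why a significant trend can still be too small to matter in practice.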
Visualization 1: Bar Graph - Homicides by Region and Gender
# Count victims by region and sex, then compute each sex's share within its region
region_gender_summary <- clean_data %>%
  count(region, victim_sex) %>%
  group_by(region) %>%
  mutate(
    total_region = sum(n),
    percentage = round((n / total_region) * 100, 1)
  ) %>%
  ungroup()

# Grouped bar chart with regions ordered by total homicide count
p1 <- ggplot(region_gender_summary, aes(x = reorder(region, total_region), y = n, fill = victim_sex)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.8) +
  scale_fill_manual(
    values = c("Male" = "blue", "Female" = "green", "Unknown" = "orange"),
    name = "Gender"
  ) +
  labs(
    title = "Homicides by Region and Gender",
    subtitle = "Distribution showing gender patterns across US regions",
    x = "Region",
    y = "Number of Homicides",
    caption = "Source: FBI Uniform Crime Reporting Program"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "top",
    plot.title = element_text(size = 14, face = "bold")
  ) +
  # Label each bar with its within-region percentage
  geom_text(
    aes(label = paste0(percentage, "%")),
    position = position_dodge(width = 0.9),
    vjust = -0.5,
    size = 3
  ) +
  annotate(
    "text",
    x = 3,
    y = max(region_gender_summary$n) * 0.9,
    label = "Males represent majority\nacross all regions",
    color = "purple",
    fontface = "bold"
  )
print(p1)
This grouped bar chart shows marked differences in the gender composition of homicide victims across US regions, with males making up the largest share of victims in every region. Males account for between about 55% of victims in the Northeast and more than 61% in the Southeast and Midwest, indicating that men face a disproportionately higher risk of being killed regardless of region. This pattern is consistent with criminological research showing that males are much more likely to be victims of violent crime, especially homicide (Zahn & McCall, 1999). The Southeast has the highest number of homicides as well as a high proportion of male victims (61.2%), which is consistent with the broader pattern of elevated violence in Southern states reported in recent studies of crime trends (Lopez & Boxerman, 2024).
Visualization 2: Histogram - Age Distribution by Decade
# Keep only rows with a known decade
age_decade_data <- clean_data %>%
  filter(!is.na(decade)) %>%
  select(victim_age, decade)

# One histogram panel per decade, each with its own y-axis scale
p2 <- ggplot(age_decade_data, aes(x = victim_age, fill = decade)) +
  geom_histogram(bins = 25, alpha = 0.7, color = "white") +
  facet_wrap(~decade, scales = "free_y", ncol = 2) +
  scale_fill_manual(
    values = c("1980s" = "orange", "1990s" = "yellow", "2000s" = "blue", "2010s" = "pink"),
    name = "Decade"
  ) +
  labs(
    title = "Age Distribution of Homicide Victims by Decade",
    subtitle = "Comparing victim age patterns across time periods",
    x = "Victim Age",
    y = "Frequency",
    caption = "Source: FBI Uniform Crime Reporting Program"
  ) +
  theme_light() +
  theme(
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(size = 14, face = "bold"),
    legend.position = "bottom"
  ) +
  # Reference line at the overall mean age (same value in every panel)
  geom_vline(
    xintercept = mean(clean_data$victim_age, na.rm = TRUE),
    linetype = "dashed",
    color = "red",
    alpha = 0.8
  ) +
  annotate(
    "text",
    x = 50,
    y = Inf,
    label = "Red line = Overall mean age",
    color = "purple",
    vjust = 2,
    fontface = "bold"
  )
print(p2)
These histograms show that the age pattern of homicide victimization was consistent across all four decades, with young adults the most vulnerable group in each. The distribution is strongly concentrated between ages 20 and 30, with the highest frequencies in the early to mid-20s. The red dashed line marking the overall mean age sits in a similar position relative to each decade's distribution, implying that despite the variation in total homicide numbers documented in earlier studies (Zahn & McCall, 1999), the age profile of homicide victims has been very stable. The 1990s panel shows the highest counts, consistent with the peak homicide rates of that decade, while the 2000s and 2010s panels show lower frequencies with the same distributional shape.
The consistency of these age patterns over time supports criminological research showing that young adults face a disproportionate risk of violent victimization regardless of broader social and economic change (Zeoli, 2023). A steep drop in victim counts after age 35 and very few victims over age 50 appear in every decade. This homogeneity implies that the risk factors for homicide, including lifestyle patterns, social connections, and high-risk exposures, are concentrated among younger people even as the general level of crime shifts (Lopez & Boxerman, 2024). The plot conveys that although homicide rates have varied considerably over the years, the basic demographic attributes of victims did not change, which is a useful guide for targeting prevention efforts.
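Because the red line marks the overall mean and is therefore identical in every panel, the stability claim is easier to check with per-decade summaries. The following is a minimal sketch of such a check. The helper `age_by_decade` is hypothetical, and the toy rows are invented for illustration rather than drawn from the report's data.

```r
library(dplyr)

# Hypothetical helper: victim count, mean age, and median age per decade
age_by_decade <- function(data) {
  data %>%
    filter(!is.na(decade), !is.na(victim_age)) %>%
    group_by(decade) %>%
    summarise(
      n = n(),
      mean_age = mean(victim_age),
      median_age = median(victim_age),
      .groups = "drop"
    )
}

# Toy rows for illustration only (not the report's data)
toy_ages <- data.frame(
  decade = c("1980s", "1980s", "1990s", "1990s", "1990s", NA),
  victim_age = c(20, 30, 22, 28, 40, 35),
  stringsAsFactors = FALSE
)

toy_summary <- age_by_decade(toy_ages)
print(toy_summary)
```

Applied to `clean_data`, a table like this would show directly whether the mean and median ages differ by more than a year or two between decades.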
Visualization 3: Heatmap - Weapon Type by Age Group and Region (Interactive)
# Share of each weapon type within every age group and region,
# then keep the main weapon categories for display
heatmap_data <- clean_data %>%
  count(age_group, weapon_simple, region) %>%
  group_by(age_group, region) %>%
  mutate(
    total_age_region = sum(n),
    percentage = round((n / total_age_region) * 100, 1)
  ) %>%
  ungroup() %>%
  filter(weapon_simple %in% c("Handgun", "Firearm (unspecified)", "Long Gun", "Knife"))

# One heatmap panel per region; tile borders use linewidth
# (the size aesthetic for lines is deprecated since ggplot2 3.4.0)
p3 <- ggplot(heatmap_data, aes(x = age_group, y = weapon_simple, fill = percentage)) +
  geom_tile(color = "white", linewidth = 0.5) +
  facet_wrap(~region, scales = "free", ncol = 2) +
  scale_fill_gradient2(
    low = "blue", mid = "red", high = "yellow",
    midpoint = 50,
    name = "Percentage\nof Cases"
  ) +
  labs(
    title = "Weapon Types by Age Group Across US Regions",
    subtitle = "Percentage distribution showing regional and demographic patterns",
    x = "Age Group",
    y = "Weapon Type",
    caption = "Source: FBI Uniform Crime Reporting Program"
  ) +
  theme_classic() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold", size = 10),
    plot.title = element_text(size = 14, face = "bold"),
    legend.position = "right"
  ) +
  # Print the percentage only on tiles above 10%
  geom_text(
    aes(label = ifelse(percentage > 10, paste0(percentage, "%"), "")),
    color = "green",
    fontface = "bold",
    size = 3
  ) +
  annotate(
    "text",
    x = 3,
    y = 1,
    label = "Higher percentages\nindicate dominant patterns",
    color = "black",
    fontface = "bold",
    size = 3
  )
This heatmap shows substantial regional and demographic patterns in the weapons used in homicide cases. Handguns dominate across regions and age groups, accounting for about 60-70% of cases in most demographic categories, similar to national figures (Gramlich, 2024). The Southeast shows the highest rates of handgun use, especially among younger victims (the 18-34 age groups), where handguns account for up to about 70% of cases. This pattern is consistent with evidence that Southern states have historically recorded higher levels of gun violence (Zahn & McCall, 1999). In the Northeast, by contrast, weapon types are slightly more varied, with a somewhat higher percentage of other weapons.
The age-related trends show that cases involving younger victims (18-34) are more concentrated in handgun incidents in every region, in line with criminological research showing that young adults are disproportionately affected by gun violence (Zeoli, 2023).
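The section title describes the heatmap as interactive, and `plotly` is loaded, but the conversion step is not shown. One common approach, assumed here rather than confirmed by the report, is to pass the ggplot object to `plotly::ggplotly()`. The sketch below uses a small toy plot with invented values. The names `toy_heatmap`, `p_static`, and `p_interactive` are hypothetical.

```r
library(ggplot2)
library(plotly)

# Toy percentages for illustration only (not results from the report)
toy_heatmap <- data.frame(
  age_group = rep(c("18-24", "25-34"), times = 2),
  weapon_simple = rep(c("Handgun", "Long Gun"), each = 2),
  percentage = c(70, 65, 20, 25)
)

p_static <- ggplot(toy_heatmap, aes(x = age_group, y = weapon_simple, fill = percentage)) +
  geom_tile(color = "white", linewidth = 0.5) +
  labs(x = "Age Group", y = "Weapon Type", fill = "Percentage\nof Cases")

# ggplotly() converts a ggplot object into an interactive plotly widget;
# the tooltip argument selects which mapped aesthetics appear on hover
p_interactive <- ggplotly(p_static, tooltip = c("x", "y", "fill"))

# In an interactive session or a rendered document, printing the object
# displays the widget: print(p_interactive)
```

Not every ggplot2 layer converts cleanly, so text annotations and free facet scales should be checked in the rendered widget.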
Conclusion
Key Findings and Discussion
The analysis of 775 homicide cases from 1985 to 2018 reveals important patterns that have not changed over the years. Cases were most concentrated in the Southeast (268 cases, 34.6%), followed by the Midwest (219 cases, 28.3%) and the Southwest (159 cases, 20.5%). Weapon use was dominated by handguns (506 cases, 65.3%), followed by long guns (162 cases, 20.9%) and unspecified firearms (97 cases, 12.5%). The regression model shows that victim age has changed little over time: year explains only 0.96 percent of the variance in victim age, which supports the finding that basic victim characteristics are stable regardless of overall crime levels (Zahn & McCall, 1999; Zeoli, 2023).
Data preparation consisted of removing missing values and implausible age entries from the initial dataset, leaving 775 valid cases. The cleaning procedure kept cases from five regions (Midwest, Northeast, Northwest, Southeast, Southwest) and created simplified weapon categories that combine several firearm types into four broad groups: Handgun (506 cases), Long Gun (162 cases), Firearm unspecified (97 cases), and Other (10 cases). Ages were grouped into five standard brackets, and a decade variable was added to simplify temporal analysis while preserving the integrity of the data.
The regularity of these trends indicates that evidence-based interventions should direct resources toward the areas with the most cases, such as the Southeast (34.6% of cases), sustain long-term prevention among younger adult cohorts, and address handgun availability, since handguns were used in 65.3% of cases.
The interdisciplinary perspective recommended in recent work (Zeoli, 2023) is well suited to the complex character of these patterns. Future research could examine how these demographic trends interact with other factors, such as socioeconomic conditions and community-level interventions, to provide more insight into the dynamics of homicide.
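The percentages quoted in the key findings can be re-derived from the reported counts. The short check below uses only the counts stated in the text. The helper name `share` is introduced here for illustration.

```r
# Counts as reported in the key findings
total_cases <- 775
region_counts <- c(Southeast = 268, Midwest = 219, Southwest = 159)
weapon_counts <- c(
  "Handgun" = 506,
  "Long Gun" = 162,
  "Firearm (unspecified)" = 97,
  "Other" = 10
)

# Percentage of the total, rounded to one decimal place
share <- function(counts, total) round(counts / total * 100, 1)

region_shares <- share(region_counts, total_cases)
weapon_shares <- share(weapon_counts, total_cases)
print(region_shares)
print(weapon_shares)

# The four weapon categories should account for every cleaned case
weapons_cover_all <- sum(weapon_counts) == total_cases

# Cases not in the three listed regions (Northeast and Northwest combined)
remaining_region_cases <- total_cases - sum(region_counts)
```

The three listed regions cover 646 of the 775 cases, so the Northeast and Northwest together account for the remaining 129.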
References
Gramlich, J. (2024). What the data says about gun deaths in the U.S. *Pew Research Center.* https://www.pewresearch.org/short-reads/2025/03/05/what-the-data-says-about-gun-deaths-in-the-us/
Lopez, E., & Boxerman, B. (2024). *Crime trends in U.S. cities: Mid-year 2024 update.* Council on Criminal Justice. https://counciloncj.org/crime-trends-in-u-s-cities-mid-year-2024-update/
Zahn, M. A., & McCall, P. L. (1999). Homicide in the 20th-century United States: Trends and patterns. In *Studying and preventing homicide: Issues and challenges* (pp. 10-30). SAGE Publications, Inc.
Zeoli, A. M. (2023). Editorial introduction to homicide studies special issue: Interdisciplinary and transdisciplinary approaches to the study of homicide. *Homicide Studies*, 27(4), 407-410.