U.S. Crime Trends: A Statistical Analysis of FBI Real-Time Crime Data (2018–2026)

Author

Your Name Here

Published

April 14, 2026

Introduction

Crime in the United States is a multifaceted social phenomenon shaped by geography, economics, seasonality, and policing policy. This project analyzes monthly crime statistics drawn from the Real-Time Crime Index (RTCI), a dataset compiled by AH Datalytics that aggregates voluntary submissions from local law enforcement agencies across the country. The data spans January 2018 through early 2026 and covers seven offense categories: murder, rape, robbery, aggravated assault, burglary, theft, and motor vehicle theft.

Research Questions:

Can we predict violent crime counts from individual crime sub-categories (murder, robbery, aggravated assault) using multiple linear regression?
How has property crime varied by U.S. Census region over time, and are there notable seasonal patterns?
Did the COVID-19 pandemic (2020–2021) visibly disrupt national crime trends?

The dataset was sourced from the AH Datalytics Real-Time Crime Index (source link), which compiles FBI NIBRS and UCR agency-level monthly reports into a unified CSV format updated regularly.

Setup: Load Libraries

Code

library(tidyverse)
library(lubridate)
library(scales)
library(ggplot2)
library(broom)
library(knitr)
library(kableExtra)
library(patchwork)

Step 1: Load the Dataset

Code

crime_raw <- read_csv("C:/Users/amanu/Downloads/final_sample.csv", show_col_types = FALSE)
# Preview structure
glimpse(crime_raw)

Rows: 74,413
Columns: 34
$ Month                          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, …
$ Year                           <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 201…
$ Date                           <chr> "January 2018", "February 2018", "March…
$ Agency                         <chr> "Agencies of < 100K", "Agencies of < 10…
$ State                          <chr> "Nationwide", "Nationwide", "Nationwide…
$ Region                         <chr> "Other", "Other", "Other", "Other", "Ot…
$ Agency_State                   <chr> "Agencies of < 100K, Nationwide", "Agen…
$ Murder                         <dbl> 63, 24, 41, 45, 44, 37, 45, 53, 45, 46,…
$ Rape                           <dbl> 566, 470, 545, 513, 632, 609, 599, 626,…
$ Robbery                        <dbl> 977, 760, 791, 868, 834, 848, 906, 920,…
$ `Aggravated Assault`           <dbl> 2293, 2053, 2393, 2492, 2750, 2876, 285…
$ Burglary                       <dbl> 4796, 4145, 4514, 4467, 4938, 4695, 519…
$ Theft                          <dbl> 23337, 19892, 22067, 22221, 24310, 2399…
$ `Motor Vehicle Theft`          <dbl> 2872, 2452, 2565, 2450, 2708, 2590, 281…
$ `Violent Crime`                <dbl> 3899, 3307, 3770, 3918, 4260, 4370, 440…
$ `Property Crime`               <dbl> 31005, 26489, 29146, 29138, 31956, 3128…
$ Murder_mvs_12mo                <dbl> 537, 511, 516, 518, 518, 509, 506, 524,…
$ Burglary_mvs_12mo              <dbl> 63155, 62661, 62399, 61737, 61108, 6045…
$ Rape_mvs_12mo                  <dbl> 6289, 6330, 6371, 6323, 6369, 6415, 646…
$ Robbery_mvs_12mo               <dbl> 11752, 11658, 11561, 11500, 11302, 1116…
$ `Aggravated Assault_mvs_12mo`  <dbl> 29808, 29830, 29808, 29906, 30027, 3027…
$ `Motor Vehicle Theft_mvs_12mo` <dbl> 32905, 32894, 32938, 32719, 32656, 3245…
$ Theft_mvs_12mo                 <dbl> 298448, 296650, 295528, 293932, 292343,…
$ `Violent Crime_mvs_12mo`       <dbl> 48386, 48329, 48256, 48247, 48216, 4836…
$ `Property Crime_mvs_12mo`      <dbl> 394508, 392205, 390865, 388388, 386107,…
$ Source.Link                    <chr> "https://ah-datalytics.github.io/rtci/l…
$ Source.Type                    <chr> "Aggregate", "Aggregate", "Aggregate", …
$ Source.Method                  <chr> "All agencies with complete data throug…
$ FBI.Population.Covered         <dbl> 16311785, 16311785, 16311785, 16311785,…
$ Number.of.Agencies             <dbl> 235, 235, 235, 235, 235, 235, 235, 235,…
$ Latitude                       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Longitude                      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Comment                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ `Last Updated`                 <chr> "2026-03-23 11:37:36 EDT", "2026-03-23 …

The raw dataset contains 74,413 rows and 34 columns, representing monthly crime observations from agencies across 48 states and the District of Columbia.

Step 2: Data Cleaning

2.1 Initial Inspection

Code

# Count missing values per column
missing_summary <- crime_raw |>
  summarise(across(everything(), ~sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Column", values_to = "Missing") |>
  filter(Missing > 0) |>
  arrange(desc(Missing))

kable(missing_summary, caption = "Columns with Missing Values") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Columns with Missing Values
Column	Missing
Latitude	74413
Longitude	74413
Comment	70666
Violent Crime	21160
Property Crime	21160
Violent Crime_mvs_12mo	21160
Property Crime_mvs_12mo	21160
Rape_mvs_12mo	1654
Rape	1541
Aggravated Assault_mvs_12mo	1171
Aggravated Assault	1058
Burglary_mvs_12mo	790
Theft_mvs_12mo	781
Motor Vehicle Theft_mvs_12mo	705
Robbery_mvs_12mo	694
Burglary	677
Theft	669
Robbery	582
Motor Vehicle Theft	582
Murder_mvs_12mo	218
Murder	105
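
The counts above are easier to compare as shares of the 74,413 rows. A minimal base-R sketch, using a handful of counts copied from the table (the names `missing_counts` and `missing_pct` are illustrative, not part of the original analysis):

```r
# Missing-value counts copied from the table above (subset for illustration)
missing_counts <- c(Latitude = 74413, Comment = 70666,
                    `Violent Crime` = 21160, Rape = 1541, Murder = 105)
n_rows <- 74413

# Share of rows missing, as a percentage rounded to two decimals
missing_pct <- round(100 * missing_counts / n_rows, 2)
print(missing_pct)
```

On these figures the Comment column is just under 95% missing, and the composite totals are missing in roughly 28% of rows.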

2.2 Cleaning Steps

Code

crime_clean <- crime_raw |>
  # 1. Parse Date ("January 2018") into a proper date; my() reads month-year
  #    strings, whereas mdy() expects a day component as well
  mutate(
    Date_parsed = my(Date),
    Year = as.integer(Year),
    Month = as.integer(Month)
  ) |>
  # 2. Remove rows where both Violent Crime and Property Crime are NA
  #    (these are structurally incomplete agency records)
  filter(!(is.na(`Violent Crime`) & is.na(`Property Crime`))) |>
  # 3. Drop columns with 100% missingness (Latitude, Longitude)
  select(-Latitude, -Longitude) |>
  # 4. Remove the Comment column (about 95% missing, no analytical value)
  select(-Comment) |>
  # 5. Standardize Region: relabel "Other" to "Nationwide/Aggregate"
  mutate(
    Region = if_else(Region == "Other", "Nationwide/Aggregate", Region),
    Region = factor(Region, levels = c("Northeast", "South", "Midwest", "West", "Nationwide/Aggregate"))
  ) |>
  # 6. Rename columns with spaces for easier handling
  rename(
    Aggravated_Assault = `Aggravated Assault`,
    Motor_Vehicle_Theft = `Motor Vehicle Theft`,
    Violent_Crime = `Violent Crime`,
    Property_Crime = `Property Crime`
  )

cat("Rows after cleaning:", nrow(crime_clean), "\n")

Rows after cleaning: 53253

Code

cat("Columns after cleaning:", ncol(crime_clean), "\n")

Columns after cleaning: 32
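
The date-parsing step can be checked in isolation. The sketch below assumes the Date column always follows the "Month YYYY" pattern seen in the preview and uses three hand-written strings rather than the real file:

```r
library(lubridate)

# Month-year strings in the same style as the Date column
dates_chr <- c("January 2018", "February 2018", "March 2026")

# my() reads "month year" and assigns the first day of the month
dates_parsed <- my(dates_chr)
print(dates_parsed)

# The parsed values should agree with the separate Year and Month columns
stopifnot(year(dates_parsed) == c(2018, 2018, 2026),
          month(dates_parsed) == c(1, 2, 3))
```

Comparing the parsed dates against the existing Year and Month columns in this way is a cheap guard against silent parsing failures.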

2.3 Create Analytical Subsets

Code

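# "Full Sample" aggregate rows with a violent-crime total. The 4,559 rows
# returned cover roughly 98 months, so there are many rows per month (the State
# column likely separates state-level aggregates from the Nationwide one).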
national <- crime_clean |>
  filter(Agency == "Full Sample", !is.na(Violent_Crime))

# Agency-level data (exclude aggregate rows), with complete violent crime
agency_data <- crime_clean |>
  filter(Agency != "Full Sample",
         !is.na(Violent_Crime),
         Region != "Nationwide/Aggregate")

cat("National aggregate rows:", nrow(national), "\n")

National aggregate rows: 4559

Code

cat("Agency-level rows:", nrow(agency_data), "\n")

Agency-level rows: 48306
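
Because the `national` subset holds 4,559 rows for roughly 98 months, it presumably contains several "Full Sample" rows per month. If those include both state-level aggregates and a Nationwide row (an assumption that would need checking against the State column), summing across them would count each offense twice. A sketch of that check on made-up data, with hypothetical state names and counts:

```r
library(dplyr)

# Toy stand-in for the "Full Sample" rows: two state aggregates plus a
# Nationwide row per month (values are made up for illustration)
toy <- tibble(
  Year  = 2018L,
  Month = rep(1:2, each = 3),
  State = rep(c("Texas", "Ohio", "Nationwide"), times = 2),
  Violent_Crime = c(60, 40, 100, 55, 35, 90)
)

# More than one row per month signals that sums may double-count
rows_per_month <- toy |> count(Year, Month, name = "n_rows")
print(rows_per_month)

# Summing everything vs. keeping only the Nationwide row
naive_total <- sum(toy$Violent_Crime)
national_total <- toy |>
  filter(State == "Nationwide") |>
  summarise(total = sum(Violent_Crime)) |>
  pull(total)
cat("Naive:", naive_total, " Nationwide only:", national_total, "\n")
```

The equivalent check on the real data would be `national |> count(Year, Month)`; a count above 1 for any month means the subset is not a single national series.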

Step 3: Exploratory Data Analysis

3.1 National Crime Totals by Year

Code

annual_national <- national |>
  group_by(Year) |>
  summarise(
    Total_Violent = sum(Violent_Crime, na.rm = TRUE),
    Total_Property = sum(Property_Crime, na.rm = TRUE),
    Total_Murder = sum(Murder, na.rm = TRUE),
    .groups = "drop"
  ) |>
  filter(Year < 2026)  # 2026 is partial year

annual_national |>
  # Store Year as text so big.mark does not render it as "2,018"
  mutate(Year = as.character(Year)) |>
  kable(col.names = c("Year", "Total Violent Crime", "Total Property Crime", "Total Murders"),
        caption = "Annual National Crime Totals (Full Sample Agencies)",
        format.args = list(big.mark = ",")) |>
  kable_styling(bootstrap_options = c("striped", "hover"))

Annual National Crime Totals (Full Sample Agencies)
Year	Total Violent Crime	Total Property Crime	Total Murders
2018	1,109,962	5,491,626	15,576
2019	1,101,358	5,392,292	15,970
2020	1,168,586	5,049,744	21,272
2021	1,192,344	5,116,386	22,382
2022	1,210,802	5,520,632	21,600
2023	1,198,842	5,528,252	19,302
2024	1,154,780	5,107,256	16,494
2025	1,032,358	4,481,350	13,574
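
Year-over-year percent changes make the pandemic-era shifts in this table easier to read. A base-R sketch using the totals copied from the table above (the helper `pct_change` is a hypothetical name introduced here):

```r
# Annual totals copied from the table above
annual <- data.frame(
  Year     = 2018:2025,
  Violent  = c(1109962, 1101358, 1168586, 1192344, 1210802, 1198842, 1154780, 1032358),
  Property = c(5491626, 5392292, 5049744, 5116386, 5520632, 5528252, 5107256, 4481350),
  Murder   = c(15576, 15970, 21272, 22382, 21600, 19302, 16494, 13574)
)

# Percent change from the previous year; the first year has no predecessor
pct_change <- function(x) c(NA, round(100 * diff(x) / head(x, -1), 1))

yoy <- data.frame(
  Year         = annual$Year,
  Violent_pct  = pct_change(annual$Violent),
  Property_pct = pct_change(annual$Property),
  Murder_pct   = pct_change(annual$Murder)
)
print(yoy)
```

On these figures, murders rose by about a third from 2019 to 2020 while property crime fell by roughly 6%.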

3.2 Seasonal Patterns

Code

monthly_avg <- national |>
  group_by(Month) |>
  summarise(
    Avg_Violent = mean(Violent_Crime, na.rm = TRUE),
    Avg_Property = mean(Property_Crime, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(Month_Name = month(Month, label = TRUE, abbr = TRUE))

monthly_avg |>
  kable(col.names = c("Month #", "Avg Violent Crime", "Avg Property Crime", "Month"),
        caption = "Average Monthly Crime Counts Across All Years",
        digits = 0,
        format.args = list(big.mark = ",")) |>
  kable_styling(bootstrap_options = c("striped", "hover"))

Average Monthly Crime Counts Across All Years
Month #	Avg Violent Crime	Avg Property Crime	Month
1	1,868	9,053	Jan
2	1,701	8,289	Feb
3	1,921	8,844	Mar
4	1,953	8,689	Apr
5	2,193	9,346	May
6	2,185	9,295	Jun
7	2,271	9,767	Jul
8	2,205	9,750	Aug
9	2,108	9,327	Sep
10	2,112	9,700	Oct
11	1,915	9,094	Nov
12	1,914	9,442	Dec
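
Months differ in length, so part of the February dip reflects fewer days rather than less crime per day. The sketch below converts the averages in the table above to per-day rates and a seasonal index; using 28.25 days for February is a simplifying assumption that averages over leap years:

```r
# Average monthly violent crime copied from the table above
avg_violent <- c(1868, 1701, 1921, 1953, 2193, 2185,
                 2271, 2205, 2108, 2112, 1915, 1914)
names(avg_violent) <- month.abb

# Days per month, with February averaged over leap and non-leap years
days <- c(31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Per-day rate and a seasonal index relative to the mean daily rate (= 100)
per_day <- avg_violent / days
seasonal_index <- round(100 * per_day / mean(per_day), 1)
print(seasonal_index)
```

After the adjustment, January and February are nearly level on a per-day basis, while July remains the peak.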

Step 4: Multiple Linear Regression

4.1 Model Setup

We model Violent Crime as a function of three of its four sub-components (Murder, Robbery, and Aggravated Assault), plus Month as a seasonal control.

Because Violent Crime is defined as the sum of these three components plus rape, including all four would simply reproduce that identity: every coefficient would equal 1 and the residuals would be zero, leaving nothing to estimate. We therefore use Murder, Robbery, and Aggravated Assault as predictors, omit Rape, and enter Month as a numeric seasonal control.
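
The effect of regressing a total on its own components can be seen in a small simulation. This is a sketch on synthetic Poisson counts, not the RTCI data, and the components are drawn independently, which real crime counts are not:

```r
set.seed(1)
n <- 200

# Synthetic component counts (not RTCI data)
murder  <- rpois(n, 5)
rape    <- rpois(n, 20)
robbery <- rpois(n, 40)
assault <- rpois(n, 120)
violent <- murder + rape + robbery + assault  # the defining identity

# All four components: the fit just recovers the identity
full <- lm(violent ~ murder + rape + robbery + assault)
print(round(coef(full), 6))

# Rape omitted: its counts move into the residuals (and, when it is
# correlated with the other components, into their coefficients)
reduced <- lm(violent ~ murder + robbery + assault)
print(round(coef(reduced), 3))
cat("Residual SD, full:", signif(sigma(full), 3),
    " reduced:", signif(sigma(reduced), 3), "\n")
```

With all four components the slopes come out at 1 and the residual spread is numerically zero; dropping one component leaves a residual roughly the size of that component's own variation.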

Code

reg_data <- national |>
  select(Violent_Crime, Murder, Robbery, Aggravated_Assault, Month, Year) |>
  filter(
    !is.na(Violent_Crime),
    !is.na(Murder),
    !is.na(Robbery),
    !is.na(Aggravated_Assault)
  )

cat("Regression dataset rows:", nrow(reg_data), "\n")

Regression dataset rows: 4559

4.2 Fit the Model

Code

model <- lm(Violent_Crime ~ Murder + Robbery + Aggravated_Assault + Month,
            data = reg_data)

summary(model)


Call:
lm(formula = Violent_Crime ~ Murder + Robbery + Aggravated_Assault + 
    Month, data = reg_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-695.68  -12.77    1.67   15.34  733.43 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        14.332458   2.285138   6.272 3.89e-10 ***
Murder              0.451636   0.054950   8.219 2.65e-16 ***
Robbery             1.146789   0.003121 367.451  < 2e-16 ***
Aggravated_Assault  1.085604   0.001660 654.152  < 2e-16 ***
Month              -2.064781   0.309521  -6.671 2.85e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 72.69 on 4554 degrees of freedom
Multiple R-squared:  0.9999,    Adjusted R-squared:  0.9999 
F-statistic: 1.057e+07 on 4 and 4554 DF,  p-value: < 2.2e-16

4.3 Regression Equation

Based on the fitted coefficients:

\[ Violent\_Crime = \beta_0 + \beta_1 \cdot Murder + \beta_2 \cdot Robbery + \beta_3 \cdot Aggravated\_Assault + \beta_4 \cdot Month + \varepsilon \]

\[ \widehat{Violent\_Crime} = 14.33 + 0.4516 \cdot Murder + 1.1468 \cdot Robbery + 1.0856 \cdot Aggravated\_Assault - 2.0648 \cdot Month \]

Code

coefs <- coef(model)
cat(sprintf(
  "Violent_Crime = %.2f + %.4f·Murder + %.4f·Robbery + %.4f·Aggravated_Assault + %.4f·Month\n",
  coefs["(Intercept)"], coefs["Murder"], coefs["Robbery"],
  coefs["Aggravated_Assault"], coefs["Month"]
))

Violent_Crime = 14.33 + 0.4516·Murder + 1.1468·Robbery + 1.0856·Aggravated_Assault + -2.0648·Month
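
As a worked example, the fitted equation can be applied by hand to one row. The sketch below uses the first row shown in the glimpse() output (January 2018, "Agencies of < 100K"); that row is not in the "Full Sample" subset used for fitting, so it is an out-of-sample illustration only:

```r
# Coefficients copied from the summary(model) output above
b <- c(intercept = 14.332458, murder = 0.451636, robbery = 1.146789,
       assault = 1.085604, month = -2.064781)

# First row of the glimpse() output; not part of the fitting sample
x <- c(murder = 63, rape = 566, robbery = 977, assault = 2293, month = 1)
observed <- 3899

predicted <- b[["intercept"]] + b[["murder"]] * x[["murder"]] +
  b[["robbery"]] * x[["robbery"]] + b[["assault"]] * x[["assault"]] +
  b[["month"]] * x[["month"]]

cat("Predicted:", round(predicted, 1), " Observed:", observed, "\n")

# The four components add up to the reported total
component_sum <- sum(x[c("murder", "rape", "robbery", "assault")])
cat("Sum of components:", component_sum, "\n")
```

For this row the four components sum exactly to the reported Violent Crime total, which illustrates why a regression on three of them fits so closely.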

4.4 Model Performance Summary

Code

model_stats <- glance(model)

kable(
  model_stats |> select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs),
  col.names = c("R²", "Adj. R²", "Residual SE", "F-Statistic", "p-value", "df", "Observations"),
  caption = "Multiple Linear Regression Model Summary",
  digits = 4
) |>
  kable_styling(bootstrap_options = c("striped", "hover"))

Multiple Linear Regression Model Summary
R²	Adj. R²	Residual SE	F-Statistic	p-value	df	Observations
0.9999	0.9999	72.6944	10568530	0	4	4559

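Interpretation:

The adjusted R² ≈ 1 is largely mechanical: Violent Crime is the sum of murder, rape, robbery, and aggravated assault, so three of its four components necessarily account for nearly all of its variance. It should not be read as evidence of predictive skill on new data.
All three crime sub-categories have p-values < 0.001. The Robbery and Aggravated Assault coefficients exceed 1 (1.15 and 1.09), which is consistent with those terms absorbing part of the omitted Rape count.
Month enters as a single numeric term, so its coefficient (−2.06) imposes a linear change per calendar month rather than a seasonal profile; treating Month as a factor would be needed to capture the summer peak seen in Section 3.2.
The model’s overall F-statistic is highly significant (p < 0.0001), as expected when the response is built from the predictors.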
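
The difference between a numeric and a categorical month term can be sketched on synthetic data. The series below is simulated with a July peak and is purely illustrative; `sim`, `fit_linear`, and `fit_factor` are names introduced here:

```r
set.seed(2)

# Synthetic monthly series with a summer peak (illustrative only)
sim <- data.frame(Month = rep(1:12, times = 8))
sim$y <- 100 + 15 * sin((sim$Month - 4) * pi / 6) + rnorm(nrow(sim), sd = 3)

# Month as one numeric slope vs. Month as 11 seasonal dummies
fit_linear <- lm(y ~ Month, data = sim)
fit_factor <- lm(y ~ factor(Month), data = sim)

r2_linear <- summary(fit_linear)$r.squared
r2_factor <- summary(fit_factor)$r.squared

cat("Coefficients, numeric Month:", length(coef(fit_linear)), "\n")
cat("Coefficients, factor Month: ", length(coef(fit_factor)), "\n")
cat("R-squared, numeric vs factor:", round(r2_linear, 3), round(r2_factor, 3), "\n")
```

A single slope cannot follow a rise-and-fall pattern within the year, whereas the factor version estimates a separate level for each month at the cost of ten extra parameters.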

4.5 Coefficient Table

Code

coef_tbl <- tidy(model)

coef_tbl |>
  # Show very small p-values as "< 0.0001" instead of letting them round to 0
  mutate(p.value = if_else(p.value < 1e-4, "< 0.0001", sprintf("%.4f", p.value))) |>
  kable(
    col.names = c("Term", "Estimate", "Std. Error", "t-Statistic", "p-value"),
    caption = "Regression Coefficients",
    digits = 4
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover")) |>
  row_spec(which(coef_tbl$p.value < 0.05), bold = TRUE, color = "white", background = "#2c7bb6")

Regression Coefficients
Term	Estimate	Std. Error	t-Statistic	p-value
(Intercept)	14.3325	2.2851	6.2720	< 0.0001
Murder	0.4516	0.0549	8.2191	< 0.0001
Robbery	1.1468	0.0031	367.4506	< 0.0001
Aggravated_Assault	1.0856	0.0017	654.1516	< 0.0001
Month	-2.0648	0.3095	-6.6709	< 0.0001

4.6 Diagnostic Plots

Code

par(mfrow = c(2, 2))
plot(model, which = 1:4)

Code

par(mfrow = c(1, 1))

Diagnostic Interpretation:

Residuals vs. Fitted: The residuals show some heteroscedasticity at higher fitted values, which is expected given that the “Full Sample” aggregates agencies of vastly different sizes. A log-transform could improve this in future work.
Q-Q Plot: The residuals follow a roughly normal distribution in the middle range, with heavier tails at extremes — a common pattern in count data aggregated across time and agencies.
Scale-Location: Slight fanning of residuals at higher fitted values confirms mild heteroscedasticity.
Cook’s Distance: A small number of influential points exist (likely early-pandemic months or large-agency outliers), but none exceed the conventional Cook’s D threshold of 1.0, so no observations are removed.
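
The Cook's distance check described above can also be done numerically rather than by eye. The sketch below uses the built-in mtcars data as a stand-in, since the crime file is not bundled here; the same calls would apply to the `model` object:

```r
# Built-in data as a stand-in for the crime regression
fit <- lm(mpg ~ wt + hp, data = mtcars)

cd <- cooks.distance(fit)

# Two common rules of thumb: D > 1 (used in the text) and the stricter D > 4/n
flag_one <- which(cd > 1)
flag_4_n <- which(cd > 4 / nrow(mtcars))

cat("Largest Cook's D:", round(max(cd), 3), "\n")
cat("Flagged at D > 1:", length(flag_one), " at D > 4/n:", length(flag_4_n), "\n")
```

The 4/n cutoff is a screening heuristic rather than a formal test, so flagged rows are candidates for inspection, not automatic removal.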

Step 5: Data Visualization

5.1 Property Crime by Region Over Time (2018–2025)

Code

# Annual regional property crime totals
regional_annual <- agency_data |>
  filter(Year <= 2025) |>
  group_by(Year, Region) |>
  summarise(
    Total_Property_Crime = sum(Property_Crime, na.rm = TRUE),
    Total_Violent_Crime  = sum(Violent_Crime, na.rm = TRUE),
    .groups = "drop"
  )

# Custom color palette — NOT ggplot defaults
region_colors <- c(
  "Northeast" = "#1a6b8a",
  "South"     = "#e05c2a",
  "Midwest"   = "#4caf72",
  "West"      = "#9b59b6"
)

ggplot(regional_annual, aes(x = Year, y = Total_Property_Crime / 1000,
                             color = Region, group = Region)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3, aes(shape = Region)) +
  # Shade the COVID era
  annotate("rect", xmin = 2020, xmax = 2021.5,
           ymin = -Inf, ymax = Inf,
           alpha = 0.08, fill = "#888888") +
  annotate("text", x = 2020.75, y = Inf, vjust = 1.5,
           label = "COVID-19\nPeriod", size = 3, color = "#555555") +
  scale_color_manual(values = region_colors,
                     name = "U.S. Census Region") +
  scale_shape_manual(values = c(16, 17, 15, 18),
                     name = "U.S. Census Region") +
  scale_x_continuous(breaks = 2018:2025) +
  scale_y_continuous(labels = label_comma(suffix = "K")) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(color = "#555555", size = 11),
    legend.position = "right",
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(angle = 30, hjust = 1)
  ) +
  labs(
    title    = "Annual Property Crime Totals by U.S. Census Region (2018–2025)",
    subtitle = "Reporting agencies with complete monthly data · Counts in thousands",
    x        = "Year",
    y        = "Total Property Crimes (Thousands)",
    caption  = "Source: FBI Real-Time Crime Index via AH Datalytics · github.com/ah-datalytics/rtci"
  )

Source: FBI Real-Time Crime Index via AH Datalytics (https://ah-datalytics.github.io/rtci/list/list.html). Only agencies reporting complete monthly data are included. ‘Full Sample’ aggregate rows are excluded; only individual agency data is used.

5.2 Seasonal Violent Crime Heatmap by Year and Month

Code

heatmap_data <- national |>
  filter(Year >= 2018, Year <= 2025) |>
  group_by(Year, Month) |>
  summarise(Total_Violent = sum(Violent_Crime, na.rm = TRUE), .groups = "drop") |>
  mutate(Month_Name = factor(month.abb[Month], levels = month.abb))

ggplot(heatmap_data, aes(x = Month_Name, y = factor(Year), fill = Total_Violent)) +
  geom_tile(color = "white", linewidth = 0.4) +
  scale_fill_gradient2(
    low      = "#f7f7f7",
    mid      = "#74add1",
    high     = "#d73027",
    midpoint = median(heatmap_data$Total_Violent, na.rm = TRUE),
    labels   = label_comma(),
    name     = "Violent\nCrimes"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text  = element_text(size = 10),
    panel.grid = element_blank()
  ) +
  labs(
    title   = "Monthly Violent Crime Heatmap: National Aggregate (2018–2025)",
    subtitle = "Red = more violent crimes; blue = near the median; pale gray = fewer",
    x       = "Month",
    y       = "Year",
    caption = "Source: FBI Real-Time Crime Index via AH Datalytics · Full-Sample national aggregate rows only."
  )

Source: FBI RTCI Full-Sample National Aggregate, 2018–2025.

5.3 Violent vs. Property Crime: Scatter by Region

Code

set.seed(42)
scatter_data <- agency_data |>
  filter(
    Year <= 2025,
    Violent_Crime > 0, Property_Crime > 0,
    Violent_Crime < 5000, Property_Crime < 30000
  ) |>
  slice_sample(n = 3000)

ggplot(scatter_data, aes(x = Violent_Crime, y = Property_Crime,
                          color = Region)) +
  # alpha is a fixed setting, so it belongs outside aes()
  geom_point(size = 1.2, alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.1) +
  scale_color_manual(values = region_colors, name = "U.S. Census Region") +
  scale_x_continuous(labels = label_comma()) +
  scale_y_continuous(labels = label_comma()) +
  facet_wrap(~Region, ncol = 2, scales = "free") +
  theme_light(base_size = 12) +
  theme(
    strip.background = element_rect(fill = "#2c3e50"),
    strip.text = element_text(color = "white", face = "bold"),
    plot.title = element_text(face = "bold")
  ) +
  labs(
    title    = "Violent Crime vs. Property Crime by Region (Sample of Agency-Month Observations)",
    subtitle = "Linear trend lines fitted per region",
    x        = "Monthly Violent Crime Count",
    y        = "Monthly Property Crime Count",
    caption  = "Source: FBI RTCI via AH Datalytics · Random sample of 3,000 observations · Extreme outliers removed for readability."
  )

Source: FBI RTCI agency-level data, 2018–2025. Each point represents one agency-month observation.

Closing Essay

A. Data Cleaning Process

The raw dataset (final_sample.csv) contained 74,413 rows and 34 columns, spanning monthly crime observations from January 2018 through early 2026 across approximately 693 unique law enforcement agencies in 48 states.

Key cleaning decisions made:

1. Handling missing values. Two columns, Latitude and Longitude, were entirely null (100% missing) across all 74,413 rows and were dropped. The Comment column was about 95% null (70,666 of 74,413 rows) with no consistent values, so it was also removed. Violent Crime and Property Crime were each missing in 21,160 rows; inspection revealed these corresponded to agencies that reported individual crime subcategories without providing totals. Rather than imputing the totals as sums (which would require assuming no unreported subcategories), rows missing both composite totals were dropped from crime_clean, which feeds the regression and the visualizations; the subcategory counts for those rows remain available in crime_raw.

2. Date parsing. The Date column was stored as a character string (e.g., "January 2018"). Month-year strings like this call for lubridate::my(), which returns the first day of the month as a proper R date object for accurate time-series ordering; lubridate::mdy() expects a day component as well. The Year and Month columns were cast to integers for grouping operations.

3. Agency segmentation. The dataset contains two conceptually distinct row types: individual agency submissions and a special Agency == "Full Sample" aggregate that combines all agencies with complete data in a given month. These were separated into two subsets — national for the aggregate and agency_data for individual agencies — to avoid double-counting in visualizations.

4. Region recoding. The Region variable contained a catch-all value "Other", applied to rows associated with nationwide or cross-regional aggregates. This was relabeled "Nationwide/Aggregate" and converted to a factor with logical level ordering to improve clarity in plots.

5. Column renaming. Columns with spaces (e.g., Aggravated Assault, Violent Crime) were renamed using underscores to facilitate R’s formula syntax in regression modeling.

After cleaning, the dataset retained 53,253 rows. Of these, 48,306 are agency-level observations with a violent-crime total and 4,559 are "Full Sample" aggregate rows.

B. Visualization Findings

Regional Property Crime Trends (2018–2025): The South consistently reports the highest total property crime counts of any region, followed closely by the West. This reflects both higher populations in Southern states and the large geographic footprint of agencies in the dataset from California and Texas. A striking observation is the sharp drop in property crime across all regions in 2020, coinciding with the COVID-19 pandemic — lockdowns reduced retail activity and public movement, which likely suppressed crimes like theft and burglary. A partial rebound is visible through 2021–2022, though totals have not returned to pre-pandemic levels in the Northeast and Midwest as of 2025.

Seasonal Heatmap: The heatmap reveals a clear summer peak in violent crime in most years, with July and August consistently among the highest-crime months nationally. The year 2020 shows notably lighter colors in the spring months (March–May), again reflecting pandemic disruption, before returning to typical summer peaks in June–August 2020.

Violent vs. Property Crime Scatter: Across all regions, there is a strong positive correlation between violent and property crime counts. This is unsurprising: the plot uses raw counts rather than per-capita rates, so larger jurisdictions record more of both. The Midwest shows a slightly steeper regression slope than the Northeast, suggesting that in Midwestern agencies, incremental increases in violent crime are associated with larger simultaneous increases in property crime.

C. Limitations and Wishlist

Several analytical directions were explored but could not be fully completed within the scope of this project:

Agency-level regression with fixed effects: A more rigorous model would include agency-level fixed effects (or random effects) to control for the fact that agencies vary enormously in size, reporting completeness, and local crime environment. A multilevel model (lme4::lmer) was started but not completed due to convergence challenges with the unbalanced panel structure.
Per-capita normalization: Ideally, crime counts would be divided by FBI.Population.Covered to produce per-capita rates, enabling fairer comparisons between large metropolitan agencies and small-town departments. Population coverage was available but varied across time for the same agency, complicating the normalization.
Interactive visualization: A plotly or leaflet map showing per-state crime rates would have been informative, but Latitude and Longitude were entirely missing, and the State field would require joining with a geographic shapefile — a task deferred for future work.
Time-series forecasting: Given the monthly structure of the data, an ARIMA or ETS model predicting future violent crime counts would be a natural extension, especially given the visible seasonal patterns in the heatmap.
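
The per-capita normalization mentioned above amounts to one division per row. A minimal sketch using the murder count and population from the first row of the glimpse() output, expressed as a monthly rate per 100,000 residents covered (the variable names are illustrative):

```r
# Values from the first row of the glimpse() output
murders    <- 63
population <- 16311785  # FBI.Population.Covered for that row

# Monthly rate per 100,000 residents covered
rate_per_100k <- murders / population * 1e5
cat("Murders per 100,000:", round(rate_per_100k, 3), "\n")

# In a data-frame pipeline the same division would be applied row by row,
# so that changes in population coverage over time are respected
```

Dividing by the population recorded on the same row, rather than a single fixed figure per agency, is one way to handle coverage that varies over time.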

Dataset source: AH Datalytics Real-Time Crime Index, https://realtimecrimeindex.com/