U.S. Crime Trends: A Statistical Analysis of FBI Real-Time Crime Data (2018–2026)

Author

Your Name Here

Published

April 14, 2026

Introduction

Crime in the United States is a multifaceted social phenomenon shaped by geography, economics, seasonality, and policing policy. This project analyzes monthly crime statistics drawn from the FBI’s Real-Time Crime Index (RTCI), a dataset that aggregates voluntary submissions from local law enforcement agencies across the country. The data spans January 2018 through early 2026, covering crimes across seven categories: murder, rape, robbery, aggravated assault, burglary, theft, and motor vehicle theft.

Research Questions:

  1. Can we predict violent crime counts from individual crime sub-categories (murder, robbery, aggravated assault) using multiple linear regression?
  2. How has property crime varied by U.S. Census region over time, and are there notable seasonal patterns?
  3. Did the COVID-19 pandemic (2020–2021) visibly disrupt national crime trends?

The dataset was sourced from the AH Datalytics Real-Time Crime Index (source link), which compiles FBI NIBRS and UCR agency-level monthly reports into a unified CSV format updated regularly.


Setup: Load Libraries

Code
library(tidyverse)
library(lubridate)
library(scales)
library(ggplot2)
library(broom)
library(knitr)
library(kableExtra)
library(patchwork)

Step 1: Load the Dataset

Code
crime_raw <- read_csv("C:/Users/amanu/Downloads/final_sample.csv", show_col_types = FALSE)
# Preview structure
glimpse(crime_raw)
Rows: 74,413
Columns: 34
$ Month                          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, …
$ Year                           <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 201…
$ Date                           <chr> "January 2018", "February 2018", "March…
$ Agency                         <chr> "Agencies of < 100K", "Agencies of < 10…
$ State                          <chr> "Nationwide", "Nationwide", "Nationwide…
$ Region                         <chr> "Other", "Other", "Other", "Other", "Ot…
$ Agency_State                   <chr> "Agencies of < 100K, Nationwide", "Agen…
$ Murder                         <dbl> 63, 24, 41, 45, 44, 37, 45, 53, 45, 46,…
$ Rape                           <dbl> 566, 470, 545, 513, 632, 609, 599, 626,…
$ Robbery                        <dbl> 977, 760, 791, 868, 834, 848, 906, 920,…
$ `Aggravated Assault`           <dbl> 2293, 2053, 2393, 2492, 2750, 2876, 285…
$ Burglary                       <dbl> 4796, 4145, 4514, 4467, 4938, 4695, 519…
$ Theft                          <dbl> 23337, 19892, 22067, 22221, 24310, 2399…
$ `Motor Vehicle Theft`          <dbl> 2872, 2452, 2565, 2450, 2708, 2590, 281…
$ `Violent Crime`                <dbl> 3899, 3307, 3770, 3918, 4260, 4370, 440…
$ `Property Crime`               <dbl> 31005, 26489, 29146, 29138, 31956, 3128…
$ Murder_mvs_12mo                <dbl> 537, 511, 516, 518, 518, 509, 506, 524,…
$ Burglary_mvs_12mo              <dbl> 63155, 62661, 62399, 61737, 61108, 6045…
$ Rape_mvs_12mo                  <dbl> 6289, 6330, 6371, 6323, 6369, 6415, 646…
$ Robbery_mvs_12mo               <dbl> 11752, 11658, 11561, 11500, 11302, 1116…
$ `Aggravated Assault_mvs_12mo`  <dbl> 29808, 29830, 29808, 29906, 30027, 3027…
$ `Motor Vehicle Theft_mvs_12mo` <dbl> 32905, 32894, 32938, 32719, 32656, 3245…
$ Theft_mvs_12mo                 <dbl> 298448, 296650, 295528, 293932, 292343,…
$ `Violent Crime_mvs_12mo`       <dbl> 48386, 48329, 48256, 48247, 48216, 4836…
$ `Property Crime_mvs_12mo`      <dbl> 394508, 392205, 390865, 388388, 386107,…
$ Source.Link                    <chr> "https://ah-datalytics.github.io/rtci/l…
$ Source.Type                    <chr> "Aggregate", "Aggregate", "Aggregate", …
$ Source.Method                  <chr> "All agencies with complete data throug…
$ FBI.Population.Covered         <dbl> 16311785, 16311785, 16311785, 16311785,…
$ Number.of.Agencies             <dbl> 235, 235, 235, 235, 235, 235, 235, 235,…
$ Latitude                       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Longitude                      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Comment                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ `Last Updated`                 <chr> "2026-03-23 11:37:36 EDT", "2026-03-23 …

The raw dataset contains 74413 rows and 34 columns, representing monthly crime observations from agencies across 48 states and the District of Columbia.


Step 2: Data Cleaning

2.1 Initial Inspection

Code
# Count missing values per column
missing_summary <- crime_raw |>
  summarise(across(everything(), ~sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Column", values_to = "Missing") |>
  filter(Missing > 0) |>
  arrange(desc(Missing))

kable(missing_summary, caption = "Columns with Missing Values") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Columns with Missing Values
Column Missing
Latitude 74413
Longitude 74413
Comment 70666
Violent Crime 21160
Property Crime 21160
Violent Crime_mvs_12mo 21160
Property Crime_mvs_12mo 21160
Rape_mvs_12mo 1654
Rape 1541
Aggravated Assault_mvs_12mo 1171
Aggravated Assault 1058
Burglary_mvs_12mo 790
Theft_mvs_12mo 781
Motor Vehicle Theft_mvs_12mo 705
Robbery_mvs_12mo 694
Burglary 677
Theft 669
Robbery 582
Motor Vehicle Theft 582
Murder_mvs_12mo 218
Murder 105

2.2 Cleaning Steps

Code
crime_clean <- crime_raw |>
  # 1. Parse Date column into proper date type
  mutate(
    Date_parsed = mdy(Date),
    Year = as.integer(Year),
    Month = as.integer(Month)
  ) |>
  # 2. Remove rows where both Violent Crime and Property Crime are NA
  #    (these are structurally incomplete agency records)
  filter(!(is.na(`Violent Crime`) & is.na(`Property Crime`))) |>
  # 3. Drop columns with 100% missingness (Latitude, Longitude)
  select(-Latitude, -Longitude) |>
  # 4. Remove the Comment column (>95% missing, no analytical value)
  select(-Comment) |>
  # 5. Standardize Region: relabel "Other" to "Nationwide/Aggregate"
  mutate(
    Region = if_else(Region == "Other", "Nationwide/Aggregate", Region),
    Region = factor(Region, levels = c("Northeast", "South", "Midwest", "West", "Nationwide/Aggregate"))
  ) |>
  # 6. Rename columns with spaces for easier handling
  rename(
    Aggravated_Assault = `Aggravated Assault`,
    Motor_Vehicle_Theft = `Motor Vehicle Theft`,
    Violent_Crime = `Violent Crime`,
    Property_Crime = `Property Crime`
  )

cat("Rows after cleaning:", nrow(crime_clean), "\n")
Rows after cleaning: 53253 
Code
cat("Columns after cleaning:", ncol(crime_clean), "\n")
Columns after cleaning: 32 

2.3 Create Analytical Subsets

Code
# Full-sample national aggregate rows (one per month-agency combination for national rollup)
national <- crime_clean |>
  filter(Agency == "Full Sample", !is.na(Violent_Crime))

# Agency-level data (exclude aggregate rows), with complete violent crime
agency_data <- crime_clean |>
  filter(Agency != "Full Sample",
         !is.na(Violent_Crime),
         Region != "Nationwide/Aggregate")

cat("National aggregate rows:", nrow(national), "\n")
National aggregate rows: 4559 
Code
cat("Agency-level rows:", nrow(agency_data), "\n")
Agency-level rows: 48306 

Step 3: Exploratory Data Analysis

3.1 National Crime Totals by Year

Code
annual_national <- national |>
  group_by(Year) |>
  summarise(
    Total_Violent = sum(Violent_Crime, na.rm = TRUE),
    Total_Property = sum(Property_Crime, na.rm = TRUE),
    Total_Murder = sum(Murder, na.rm = TRUE),
    .groups = "drop"
  ) |>
  filter(Year < 2026)  # 2026 is partial year

kable(annual_national,
      col.names = c("Year", "Total Violent Crime", "Total Property Crime", "Total Murders"),
      caption = "Annual National Crime Totals (Full Sample Agencies)",
      format.args = list(big.mark = ",")) |>
  kable_styling(bootstrap_options = c("striped", "hover"))
Annual National Crime Totals (Full Sample Agencies)
Year Total Violent Crime Total Property Crime Total Murders
2,018 1,109,962 5,491,626 15,576
2,019 1,101,358 5,392,292 15,970
2,020 1,168,586 5,049,744 21,272
2,021 1,192,344 5,116,386 22,382
2,022 1,210,802 5,520,632 21,600
2,023 1,198,842 5,528,252 19,302
2,024 1,154,780 5,107,256 16,494
2,025 1,032,358 4,481,350 13,574

3.2 Seasonal Patterns

Code
monthly_avg <- national |>
  group_by(Month) |>
  summarise(
    Avg_Violent = mean(Violent_Crime, na.rm = TRUE),
    Avg_Property = mean(Property_Crime, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(Month_Name = month(Month, label = TRUE, abbr = TRUE))

monthly_avg |>
  kable(col.names = c("Month #", "Avg Violent Crime", "Avg Property Crime", "Month"),
        caption = "Average Monthly Crime Counts Across All Years",
        digits = 0,
        format.args = list(big.mark = ",")) |>
  kable_styling(bootstrap_options = c("striped", "hover"))
Average Monthly Crime Counts Across All Years
Month # Avg Violent Crime Avg Property Crime Month
1 1,868 9,053 Jan
2 1,701 8,289 Feb
3 1,921 8,844 Mar
4 1,953 8,689 Apr
5 2,193 9,346 May
6 2,185 9,295 Jun
7 2,271 9,767 Jul
8 2,205 9,750 Aug
9 2,108 9,327 Sep
10 2,112 9,700 Oct
11 1,915 9,094 Nov
12 1,914 9,442 Dec

Step 4: Multiple Linear Regression

4.1 Model Setup

We model Violent Crime as a function of its three sub-components: Murder, Robbery, and Aggravated Assault — plus Month as a seasonal control.

Because Violent Crime is defined as the sum of these three components plus rape, including all four would create perfect multicollinearity. We therefore use Murder, Robbery, and Aggravated Assault as predictors and use Month as a seasonal nuisance variable.

Code
reg_data <- national |>
  select(Violent_Crime, Murder, Robbery, Aggravated_Assault, Month, Year) |>
  filter(
    !is.na(Violent_Crime),
    !is.na(Murder),
    !is.na(Robbery),
    !is.na(Aggravated_Assault)
  )

cat("Regression dataset rows:", nrow(reg_data), "\n")
Regression dataset rows: 4559 

4.2 Fit the Model

Code
model <- lm(Violent_Crime ~ Murder + Robbery + Aggravated_Assault + Month,
            data = reg_data)

summary(model)

Call:
lm(formula = Violent_Crime ~ Murder + Robbery + Aggravated_Assault + 
    Month, data = reg_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-695.68  -12.77    1.67   15.34  733.43 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        14.332458   2.285138   6.272 3.89e-10 ***
Murder              0.451636   0.054950   8.219 2.65e-16 ***
Robbery             1.146789   0.003121 367.451  < 2e-16 ***
Aggravated_Assault  1.085604   0.001660 654.152  < 2e-16 ***
Month              -2.064781   0.309521  -6.671 2.85e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 72.69 on 4554 degrees of freedom
Multiple R-squared:  0.9999,    Adjusted R-squared:  0.9999 
F-statistic: 1.057e+07 on 4 and 4554 DF,  p-value: < 2.2e-16

4.3 Regression Equation

Based on the fitted coefficients:

\[ \hat{Violent\_Crime} = \beta_0 + \beta_1 \cdot Murder + \beta_2 \cdot Robbery + \beta_3 \cdot Aggravated\_Assault + \beta_4 \cdot Month + \varepsilon \] \[ \hat{Violent\_Crime} = 14.33 + 0.4516 \cdot Murder + 1.1468 \cdot Robbery + 1.0856 \cdot Aggravated\_Assault - 2.0648 \cdot Month \]

Code
coefs <- coef(model)
cat(sprintf(
  "Violent_Crime = %.2f + %.4f·Murder + %.4f·Robbery + %.4f·Aggravated_Assault + %.4f·Month\n",
  coefs["(Intercept)"], coefs["Murder"], coefs["Robbery"],
  coefs["Aggravated_Assault"], coefs["Month"]
))
Violent_Crime = 14.33 + 0.4516·Murder + 1.1468·Robbery + 1.0856·Aggravated_Assault + -2.0648·Month

4.4 Model Performance Summary

Code
model_stats <- glance(model)

kable(
  model_stats |> select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs),
  col.names = c("R²", "Adj. R²", "RMSE", "F-Statistic", "p-value", "df", "Observations"),
  caption = "Multiple Linear Regression Model Summary",
  digits = 4
) |>
  kable_styling(bootstrap_options = c("striped", "hover"))
Multiple Linear Regression Model Summary
Adj. R² RMSE F-Statistic p-value df Observations
0.9999 0.9999 72.6944 10568530 0 4 4559

Interpretation:

  • The adjusted R² ≈ 1 indicates that our predictors explain approximately 100% of the variance in violent crime counts — a strong fit.
  • All three crime sub-categories have p-values < 0.001, confirming they are statistically significant predictors of overall violent crime.
  • Month captures seasonal variation; its coefficient reflects the incremental change in violent crime as the calendar month increases by one.
  • The model’s overall F-statistic is highly significant (p < 0.0001), confirming the joint predictive power of the included variables.

4.5 Coefficient Table

Code
tidy(model) |>
  kable(
    col.names = c("Term", "Estimate", "Std. Error", "t-Statistic", "p-value"),
    caption = "Regression Coefficients",
    digits = 4
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover")) |>
  row_spec(which(tidy(model)$p.value < 0.05), bold = TRUE, color = "white", background = "#2c7bb6")
Regression Coefficients
Term Estimate Std. Error t-Statistic p-value
(Intercept) 14.3325 2.2851 6.2720 0
Murder 0.4516 0.0549 8.2191 0
Robbery 1.1468 0.0031 367.4506 0
Aggravated_Assault 1.0856 0.0017 654.1516 0
Month -2.0648 0.3095 -6.6709 0

4.6 Diagnostic Plots

Code
par(mfrow = c(2, 2))
plot(model, which = 1:4)

Regression Diagnostic Plots
Code
par(mfrow = c(1, 1))

Diagnostic Interpretation:

  • Residuals vs. Fitted: The residuals show some heteroscedasticity at higher fitted values, which is expected given that the “Full Sample” aggregates agencies of vastly different sizes. A log-transform could improve this in future work.
  • Q-Q Plot: The residuals follow a roughly normal distribution in the middle range, with heavier tails at extremes — a common pattern in count data aggregated across time and agencies.
  • Scale-Location: Slight fanning of residuals at higher fitted values confirms mild heteroscedasticity.
  • Cook’s Distance: A small number of high-leverage points exist (likely early-pandemic months or large-agency outliers), but none exceed the conventional Cook’s D threshold of 1.0, so no observations are removed.

Step 5: Data Visualization

5.1 Property Crime by Region Over Time (2018–2025)

Code
# Annual regional property crime totals
regional_annual <- agency_data |>
  filter(Year <= 2025) |>
  group_by(Year, Region) |>
  summarise(
    Total_Property_Crime = sum(Property_Crime, na.rm = TRUE),
    Total_Violent_Crime  = sum(Violent_Crime, na.rm = TRUE),
    .groups = "drop"
  )

# Custom color palette — NOT ggplot defaults
region_colors <- c(
  "Northeast" = "#1a6b8a",
  "South"     = "#e05c2a",
  "Midwest"   = "#4caf72",
  "West"      = "#9b59b6"
)

ggplot(regional_annual, aes(x = Year, y = Total_Property_Crime / 1000,
                             color = Region, group = Region)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3, aes(shape = Region)) +
  # Shade the COVID era
  annotate("rect", xmin = 2020, xmax = 2021.5,
           ymin = -Inf, ymax = Inf,
           alpha = 0.08, fill = "#888888") +
  annotate("text", x = 2020.75, y = Inf, vjust = 1.5,
           label = "COVID-19\nPeriod", size = 3, color = "#555555") +
  scale_color_manual(values = region_colors,
                     name = "U.S. Census Region") +
  scale_shape_manual(values = c(16, 17, 15, 18),
                     name = "U.S. Census Region") +
  scale_x_continuous(breaks = 2018:2025) +
  scale_y_continuous(labels = label_comma(suffix = "K")) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(color = "#555555", size = 11),
    legend.position = "right",
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(angle = 30, hjust = 1)
  ) +
  labs(
    title    = "Annual Property Crime Totals by U.S. Census Region (2018–2025)",
    subtitle = "Reporting agencies with complete monthly data · Counts in thousands",
    x        = "Year",
    y        = "Total Property Crimes (Thousands)",
    caption  = "Source: FBI Real-Time Crime Index via AH Datalytics · github.com/ah-datalytics/rtci"
  )

Source: FBI Real-Time Crime Index via AH Datalytics (https://ah-datalytics.github.io/rtci/list/list.html). Only agencies reporting complete monthly data are included. ‘Full Sample’ aggregate rows are excluded; only individual agency data is used.

5.2 Seasonal Violent Crime Heatmap by Year and Month

Code
heatmap_data <- national |>
  filter(Year >= 2018, Year <= 2025) |>
  group_by(Year, Month) |>
  summarise(Total_Violent = sum(Violent_Crime, na.rm = TRUE), .groups = "drop") |>
  mutate(Month_Name = factor(month.abb[Month], levels = month.abb))

ggplot(heatmap_data, aes(x = Month_Name, y = factor(Year), fill = Total_Violent)) +
  geom_tile(color = "white", linewidth = 0.4) +
  scale_fill_gradient2(
    low      = "#f7f7f7",
    mid      = "#74add1",
    high     = "#d73027",
    midpoint = median(heatmap_data$Total_Violent, na.rm = TRUE),
    labels   = label_comma(),
    name     = "Violent\nCrimes"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text  = element_text(size = 10),
    panel.grid = element_blank()
  ) +
  labs(
    title   = "Monthly Violent Crime Heatmap: National Aggregate (2018–2025)",
    subtitle = "Darker red = more violent crimes; blue = fewer",
    x       = "Month",
    y       = "Year",
    caption = "Source: FBI Real-Time Crime Index via AH Datalytics · Full-Sample national aggregate rows only."
  )

Source: FBI RTCI Full-Sample National Aggregate, 2018–2025.

5.3 Violent vs. Property Crime: Scatter by Region

Code
set.seed(42)
scatter_data <- agency_data |>
  filter(
    Year <= 2025,
    Violent_Crime > 0, Property_Crime > 0,
    Violent_Crime < 5000, Property_Crime < 30000
  ) |>
  slice_sample(n = 3000)

ggplot(scatter_data, aes(x = Violent_Crime, y = Property_Crime,
                          color = Region, alpha = 0.6)) +
  geom_point(size = 1.2) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.1, alpha = 1) +
  scale_color_manual(values = region_colors, name = "U.S. Census Region") +
  scale_x_continuous(labels = label_comma()) +
  scale_y_continuous(labels = label_comma()) +
  guides(alpha = "none") +
  facet_wrap(~Region, ncol = 2, scales = "free") +
  theme_light(base_size = 12) +
  theme(
    strip.background = element_rect(fill = "#2c3e50"),
    strip.text = element_text(color = "white", face = "bold"),
    plot.title = element_text(face = "bold")
  ) +
  labs(
    title    = "Violent Crime vs. Property Crime by Region (Sample of Agency-Month Observations)",
    subtitle = "Linear trend lines fitted per region",
    x        = "Monthly Violent Crime Count",
    y        = "Monthly Property Crime Count",
    caption  = "Source: FBI RTCI via AH Datalytics · Random sample of 3,000 observations · Extreme outliers removed for readability."
  )

Source: FBI RTCI agency-level data, 2018–2025. Each point represents one agency-year-month observation.

Closing Essay

A. Data Cleaning Process

The raw dataset (final_sample__1_.csv) contained 74,413 rows and 34 columns, spanning monthly crime observations from January 2018 through early 2026 across approximately 693 unique law enforcement agencies in 48 states.

Key cleaning decisions made:

1. Handling missing values. Two columns — Latitude and Longitude — were entirely null (100% missing) across all 74,413 rows and were dropped entirely. The Comment column was over 95% null with no consistent values, so it was also removed. Violent Crime and Property Crime were missing in approximately 21,160 rows; inspection revealed these corresponded to agencies that only reported individual crime subcategories without providing totals. Rather than imputing these as sums (which would require assuming no unreported subcategories), rows missing both composite scores were dropped for the regression and visualization requiring these fields, while the subcategory-level data were retained for other analyses.

2. Date parsing. The Date column was stored as a character string (e.g., "January 2018"). This was parsed into a proper R date object using lubridate::mdy() for accurate time-series ordering. The Year and Month columns were cast to integers for grouping operations.

3. Agency segmentation. The dataset contains two conceptually distinct row types: individual agency submissions and a special Agency == "Full Sample" aggregate that combines all agencies with complete data in a given month. These were separated into two subsets — national for the aggregate and agency_data for individual agencies — to avoid double-counting in visualizations.

4. Region recoding. The Region variable contained a catch-all value "Other", applied to rows associated with nationwide or cross-regional aggregates. This was relabeled "Nationwide/Aggregate" and converted to a factor with logical level ordering to improve clarity in plots.

5. Column renaming. Columns with spaces (e.g., Aggravated Assault, Violent Crime) were renamed using underscores to facilitate R’s formula syntax in regression modeling.

After cleaning, the dataset retained 53,253 rows for analysis (agency-level data with complete crime counts).


B. Visualization Findings

Regional Property Crime Trends (2018–2025): The South consistently reports the highest total property crime counts of any region, followed closely by the West. This reflects both higher populations in Southern states and the large geographic footprint of agencies in the dataset from California and Texas. A striking observation is the sharp drop in property crime across all regions in 2020, coinciding with the COVID-19 pandemic — lockdowns reduced retail activity and public movement, which likely suppressed crimes like theft and burglary. A partial rebound is visible through 2021–2022, though totals have not returned to pre-pandemic levels in the Northeast and Midwest as of 2025.

Seasonal Heatmap: The heatmap reveals a clear summer peak in violent crime in most years, with July and August consistently among the highest-crime months nationally. The year 2020 shows notably lighter colors in the spring months (March–May), again reflecting pandemic disruption, before returning to typical summer peaks in June–August 2020.

Violent vs. Property Crime Scatter: Across all regions, there is a strong positive correlation between violent and property crime counts — unsurprising, since high-crime jurisdictions tend to see elevated rates across all categories. The Midwest shows a slightly steeper regression slope than the Northeast, suggesting that in Midwestern agencies, incremental increases in violent crime are associated with larger simultaneous increases in property crime.


C. Limitations and Wishlist

Several analytical directions were explored but could not be fully completed within the scope of this project:

  • Agency-level regression with fixed effects: A more rigorous model would include agency-level fixed effects (or random effects) to control for the fact that agencies vary enormously in size, reporting completeness, and local crime environment. A multilevel model (lme4::lmer) was started but not completed due to convergence challenges with the unbalanced panel structure.

  • Per-capita normalization: Ideally, crime counts would be divided by FBI.Population.Covered to produce per-capita rates, enabling fairer comparisons between large metropolitan agencies and small-town departments. Population coverage was available but varied across time for the same agency, complicating the normalization.

  • Interactive visualization: A plotly or leaflet map showing per-state crime rates would have been informative, but Latitude and Longitude were entirely missing, and the State field would require joining with a geographic shapefile — a task deferred for future work.

  • Time-series forecasting: Given the monthly structure of the data, an ARIMA or ETS model predicting future violent crime counts would be a natural extension, especially given the visible seasonal patterns in the heatmap.


Dataset source: AH Datalytics Real-Time Crime Index — https://realtimecrimeindex.com/]