To what extent do murder frequencies vary by time of day and borough in New York City?

Author

SID 560617919

Client: Jessica S. Tisch

Jessica S. Tisch is the police commissioner of New York City as of November 2024. As police commissioner, she is responsible for the NYPD's overall crime-fighting strategy, budget management, and the enforcement of all laws.

Code
knitr::include_graphics("image.png")

Recommendation

It is recommended that Jessica S. Tisch implement a targeted policing strategy that increases patrols during 19:00–23:00 and 00:00–04:00 (together about 53% of murders in the data set occur during these periods) and concentrates murder task forces in The Bronx and Brooklyn, which account for 62% of murders in the data set (54 of 87).

Evidence

Research Question: To what extent do murder frequencies vary by time of day and borough in New York City?

Code
library(tidyverse)
Code
library(lubridate)
library(ggplot2)
library(plotly)

Code
data = read.csv("NYPD_Complaint_Data_2024.csv")
data2 <- data %>%
  filter(OFNS_DESC == "MURDER & NON-NEGL. MANSLAUGHTER") %>%
  mutate(parsed = parse_date_time(CMPLNT_FR_TM, orders = c("HMS","HM","IMp","H"))) %>%
  mutate(hour = hour(parsed)) %>%
  filter(!is.na(hour))
p = ggplot(data2, aes(x = hour)) + 
    geom_histogram(bins = 24, fill = "lavenderblush", colour = "black") + # One bin per hour of the day (the default is 30 bins)
    scale_x_continuous(breaks = 0:23) +
    labs(x = "Time Of Day", y = "Number of murder incidents", title = "Figure 1: When during the day do murders most commonly occur?") 
ggplotly(p)

Figure 1 suggests that murder incidents are not evenly distributed throughout the day. 26.4% of the murders occurred between 19:00 and 23:00, although this period represents only 20.8% of the day. The same share, 26.4%, occurred from 00:00 to 04:00. This suggests that murders are disproportionately concentrated during late evening and early morning hours. This aligns with research from the Australian Institute of Criminology, which found that participation in night-time leisure and intoxication is associated with an increased risk of homicide. Increasing police patrols during late evening and early morning periods is therefore supported by the statistical evidence.
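The comparison between each window's share of murders and its share of the day can be reproduced with a few lines of R. This is a minimal sketch that hard-codes the three time-group counts reported in the appendix working table (23, 41 and 23) instead of re-reading the CSV; the variable names are illustrative only.

```r
# Counts per time group, copied from the appendix working table
counts <- c("00:00-04:00" = 23, "05:00-18:00" = 41, "19:00-23:00" = 23)

# Number of clock hours each group covers (hours 0-4, 5-18 and 19-23)
hours <- c(5, 14, 5)

share_of_murders <- counts / sum(counts)
share_of_day <- hours / 24

# A ratio above 1 means the window holds more murders than its length alone would suggest
summary_table <- data.frame(
  share_of_murders = round(100 * share_of_murders, 1),
  share_of_day = round(100 * share_of_day, 1),
  ratio = round(share_of_murders / share_of_day, 2)
)
summary_table
```

Each five-hour window holds 26.4% of murders against 20.8% of the day, while the fourteen-hour daytime group has a ratio below 1.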

Code
data = read.csv("NYPD_Complaint_Data_2024.csv")
murders <- data %>%
  filter(OFNS_DESC == "MURDER & NON-NEGL. MANSLAUGHTER")

murders_by_boro <- murders %>%
  count(BORO_NM, sort = TRUE) %>%
  slice_head(n = 20)   
print(murders_by_boro)
        BORO_NM  n
1         BRONX 30
2      BROOKLYN 24
3     MANHATTAN 16
4        QUEENS 14
5 STATEN ISLAND  3
Code
p = ggplot(murders_by_boro, aes(x = fct_reorder(BORO_NM, n), y = n)) +
  geom_col(fill = "plum", colour = "black") +
  coord_flip() +
  labs(x = "Borough", y = "Number of Murders", title = "Figure 2: Distribution of Murders Across NYC Boroughs") +
  theme_minimal()
ggplotly(p)

Figure 2 shows that murders in NYC are not evenly distributed across boroughs, so police task forces addressing murders should be targeted rather than equally allocated. Together, The Bronx and Brooklyn account for 54 of the 87 murders, or 62% of all murders in the data set. The Bronx accounts for 34.5% of total murders and Brooklyn for 27.6%, which indicates that these boroughs should be prioritised for intervention strategies and a larger number of police task forces. In particular, directing additional specialised murder task forces within the NYPD toward The Bronx is likely to be the most effective strategy for addressing the high concentration of murders there. External research supports this recommendation: according to the National Institute of Justice, concentrating police resources and specialised violent crime units often reduces violent crime rates. Since The Bronx and Brooklyn reported the highest numbers of murders in the data set, increasing specialised murder task forces in these boroughs is supported by both the statistical analysis and external criminology research.
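The borough shares can be checked directly against the printed borough table. The sketch below hard-codes those five counts (an assumption made so the example runs without the CSV) and computes each borough's percentage share and the combined share of the two boroughs with the most murders.

```r
# Murder counts per borough, copied from the printed table
boro <- c(BRONX = 30, BROOKLYN = 24, MANHATTAN = 16, QUEENS = 14, "STATEN ISLAND" = 3)

total <- sum(boro)                     # total murders across all boroughs
share <- round(100 * boro / total, 1)  # percentage share per borough

# Combined share of the two boroughs with the most murders
top_two <- sort(boro, decreasing = TRUE)[1:2]
top_two_share <- round(100 * sum(top_two) / total, 1)

share
names(top_two)
top_two_share
```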

Limitations

The key limitation of this recommendation is that the data set is filtered to include only the last 3 months of 2024. This is a short time period and may not accurately represent murder trends across NYC boroughs. The data set contains only 87 murders, a relatively small sample that reduces the generalisability of the conclusions. While the data supports prioritising specialised murder task forces in The Bronx and Brooklyn, further analysis with a larger sample would strengthen the validity of the recommendation.

Ethics Statement

This investigation adhered to the ethical principle of objectivity by making recommendations based entirely on statistical evidence and reliable external research rather than personal opinion. The shared value of professionalism was applied by using a reliable statistical model and supporting recommendations with credible external articles.

AI usage statement

ChatGPT 5.5, published by OpenAI (URL: https://chatgpt.com/), was used to help me find and understand the external articles.

Acknowledgements

  1. Long, J. (2010, May 17). Combine a list of data frames into one data frame by row. Stack Overflow https://stackoverflow.com/questions/2851327/combine-a-list-of-data-frames-into-one-data-frame-by-row
  2. Nicholas, T. (2025, April 9). 12 Citing Articles & Bibliography Styles. Quarto for Scientists https://qmd4sci.njtierney.com/citations-and-styles.html

References

  1. Tomsen, S., & Payne, J. (2016). Homicide and the night-time economy. Trends & issues in crime and criminal justice, 521. https://www.aic.gov.au/sites/default/files/2020-05/ti521_homicide_and_the_night-time_economy.pdf

  2. Braga, A., Turchan, B., Papachristos, A., & Hureau, D. (2019). Hot spots policing and crime reduction: An update of an ongoing systematic review and meta-analysis. Journal of Experimental Criminology, 15(3). https://nij.ojp.gov/library/publications/hot-spots-policing-and-crime-reduction-update-ongoing-systematic-review-and

Appendix

The research question for both analyses was: Are murder incidents more likely to occur during certain times of the day and in specific boroughs of NYC, rather than being evenly distributed?

H0: Murders are evenly distributed across times of day and across boroughs

$p_{\text{high-risk}} = p_{\text{other}}$

H1: Murders are not evenly distributed and are more likely to occur during late-night/early-morning hours and in specific boroughs

$p_{\text{high-risk}} > p_{\text{other}}$

Assumptions:

  • Observations are independent (each murder is counted once)

  • The sample is representative of NYC murder distributions during the study period
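A further condition that is often checked before a chi-square goodness-of-fit test, and which is not listed above, is that every expected count is reasonably large (a common rule of thumb is at least 5). The sketch below runs that check for the borough analysis, assuming a uniform null and hard-coding the borough counts from the main analysis.

```r
# Observed murders per borough, copied from the main analysis
boro <- c(BRONX = 30, BROOKLYN = 24, MANHATTAN = 16, QUEENS = 14, "STATEN ISLAND" = 3)

# Under a uniform null every borough has the same expected count
expected <- rep(sum(boro) / length(boro), length(boro))
names(expected) <- names(boro)

# Rule-of-thumb check: all expected counts should be at least 5
min_expected_ok <- all(expected >= 5)

expected
min_expected_ok
```

The check looks at expected counts, so the small observed count for Staten Island does not by itself break the condition.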

TEST STATISTIC

A chi-square goodness-of-fit test was carried out for murders by borough and for murders by time of day in the NYC data set.

When do murders occur during the day?

Code
library(tidyverse)
library(lubridate)
# Load data
data <- read.csv("NYPD_Complaint_Data_2024.csv")

# Filter murders and extract hour
data2 <- data %>%
  filter(OFNS_DESC == "MURDER & NON-NEGL. MANSLAUGHTER") %>%
  mutate(parsed = parse_date_time(CMPLNT_FR_TM, orders = c("HMS","HM","IMp","H"))) %>%
  mutate(hour = hour(parsed)) %>%
  filter(!is.na(hour))

# -----------------------------
# 1. CREATE TIME GROUPS
# -----------------------------
data2 <- data2 %>%
  mutate(time_group = case_when(
    hour >= 0  & hour <= 4  ~ "00:00–04:00",
    hour >= 5  & hour <= 18 ~ "05:00–18:00",
    hour >= 19 & hour <= 23 ~ "19:00–23:00"
  ))

# -----------------------------
# 2. OBSERVED FREQUENCIES
# -----------------------------
obs_table <- data2 %>%
  count(time_group) %>%
  complete(time_group = c("00:00–04:00", "05:00–18:00", "19:00–23:00"),
           fill = list(n = 0)) %>%
  arrange(time_group)

# -----------------------------
# 3. EXPECTED FREQUENCIES (uniform across time)
# -----------------------------
total_murders <- sum(obs_table$n)

exp_table <- obs_table %>%
  mutate(expected = total_murders / 3)

# -----------------------------
# 4. CHI-SQUARE WORKING TABLE
# -----------------------------
chi_table <- exp_table %>%
  mutate(
    diff = n - expected,
    chi_component = (diff^2) / expected
  )

# View the chi-square working table
chi_table
# A tibble: 3 × 5
  time_group      n expected  diff chi_component
  <chr>       <int>    <dbl> <dbl>         <dbl>
1 00:00–04:00    23       29    -6          1.24
2 05:00–18:00    41       29    12          4.97
3 19:00–23:00    23       29    -6          1.24
Code
# -----------------------------
# 5. TEST STATISTIC
# -----------------------------
chi_square <- sum(chi_table$chi_component)
chi_square
[1] 7.448276
Code
# -----------------------------
# 6. P-VALUE
# -----------------------------
p_value <- pchisq(chi_square, df = 2, lower.tail = FALSE)
p_value
[1] 0.0241339
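The working table above sets the expected count for every group to one third of the total, although the three groups cover 5, 14 and 5 clock hours. If the null hypothesis is that murders are spread evenly over the 24 hours of the day, one alternative is to make the expected counts proportional to the length of each group. The sketch below does this with base R's `chisq.test()` and its `p` argument, reusing the observed counts from the table. It is offered as a sensitivity check under that assumption and is not the analysis reported above.

```r
# Observed counts per time group, copied from the working table
observed <- c("00:00-04:00" = 23, "05:00-18:00" = 41, "19:00-23:00" = 23)

# Null probabilities proportional to the number of hours in each group
hours_in_group <- c(5, 14, 5)
null_probs <- hours_in_group / sum(hours_in_group)

# Goodness-of-fit test against the hours-weighted null
weighted_test <- chisq.test(x = observed, p = null_probs)

weighted_test$expected   # expected counts under the hours-weighted null
weighted_test$statistic  # chi-square statistic (df = 2)
weighted_test$p.value
```

With these counts the statistic is about 4.50 on 2 degrees of freedom (p ≈ 0.106), so the time-of-day result appears sensitive to how the expected counts are defined.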

Murders by borough

Code
library(tidyverse)

# Load data
data <- read.csv("NYPD_Complaint_Data_2024.csv")

# Filter murders
data2 <- data %>%
  filter(OFNS_DESC == "MURDER & NON-NEGL. MANSLAUGHTER")

# -----------------------------
# 1. OBSERVED FREQUENCIES
# -----------------------------
obs_borough <- data2 %>%
  count(BORO_NM) %>%
  arrange(desc(n))

# -----------------------------
# 2. EXPECTED FREQUENCIES (uniform distribution)
# -----------------------------
total_murders <- sum(obs_borough$n)

exp_borough <- obs_borough %>%
  mutate(expected = total_murders / n())

# -----------------------------
# 3. CHI-SQUARE WORKING TABLE
# -----------------------------
chi_borough <- exp_borough %>%
  mutate(
    diff = n - expected,
    chi_component = (diff^2) / expected
  )

# View the final chi-square working table
chi_borough
        BORO_NM  n expected  diff chi_component
1         BRONX 30     17.4  12.6     9.1241379
2      BROOKLYN 24     17.4   6.6     2.5034483
3     MANHATTAN 16     17.4  -1.4     0.1126437
4        QUEENS 14     17.4  -3.4     0.6643678
5 STATEN ISLAND  3     17.4 -14.4    11.9172414
Code
# -----------------------------
# 4. TEST STATISTIC
# -----------------------------
chi_square_borough <- sum(chi_borough$chi_component)
chi_square_borough
[1] 24.32184
Code
# -----------------------------
# 5. P-VALUE
# -----------------------------
p_value_borough <- pchisq(chi_square_borough,
                          df = nrow(obs_borough) - 1,
                          lower.tail = FALSE)

p_value_borough
[1] 6.884398e-05
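The manual borough calculation can be cross-checked with base R's `chisq.test()`, which by default tests a vector of counts against equal probabilities. The sketch below hard-codes the borough counts from the working table so that it runs without the CSV; the Pearson residuals give a rough indication of which boroughs contribute most to the statistic.

```r
# Observed murders per borough, copied from the working table
boro <- c(BRONX = 30, BROOKLYN = 24, MANHATTAN = 16, QUEENS = 14, "STATEN ISLAND" = 3)

# Default null hypothesis: every category is equally likely
boro_test <- chisq.test(boro)

boro_test$statistic  # should agree with chi_square_borough
boro_test$p.value    # should agree with p_value_borough

# Pearson residuals: (observed - expected) / sqrt(expected)
round(boro_test$residuals, 2)
```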

These test statistics were used to determine whether the observed counts differed significantly from those expected under the null hypothesis.

Code
# 6. P-VALUE
# -----------------------------
p_value <- pchisq(chi_square, df = 2, lower.tail = FALSE)
p_value
[1] 0.0241339

For the time-of-day analysis of the murder distribution, the p-value was p = 0.0241. Since this value is less than the significance level of 0.05, we reject the null hypothesis, meaning murders are not evenly distributed across the times of the day.

Code
p_value_borough <- pchisq(chi_square_borough,
                          df = nrow(obs_borough) - 1,
                          lower.tail = FALSE)

p_value_borough
[1] 6.884398e-05

For the borough analysis, the p-value was p = 6.88 × 10⁻⁵. This value is below 0.05, so we reject the null hypothesis. This provides very strong statistical evidence that murders are not evenly distributed across NYC boroughs.

Statistical conclusion: Reject H0, the data provides sufficient evidence against H0.

Scientific conclusion: There is an uneven distribution of murders across the time of day and across boroughs in NYC.