DATA1001 Project 2 EDA

Author

560094224

#| message: false
#| warning: false
library(tidyverse)  # Load required libraries

library(readr)
library(dplyr)
library(ggplot2)
library(lubridate)

# 1. Load the dataset
df <- read_csv("NYPD_Complaint_Data_2024.csv")

New names:
• `` -> `...1`
Rows: 136402 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (24): CMPLNT_NUM, CMPLNT_FR_DT, CMPLNT_TO_DT, CMPLNT_TO_TM, RPT_DT, OFN...
dbl  (11): ...1, ADDR_PCT_CD, KY_CD, PD_CD, JURISDICTION_CODE, HOUSING_PSA, ...
time  (1): CMPLNT_FR_TM

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
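
The "New names" note above means the first column of the CSV had an empty header, which readr renamed to `...1`. A minimal sketch of how the load step could drop that column, read the "(null)" placeholder as missing, and silence the column-specification message. It assumes the unnamed column is not needed for the analysis. The inline CSV text and its values are made up for illustration; only the column names come from the output above.

```r
library(readr)
library(dplyr)

# Hypothetical three-row stand-in for NYPD_Complaint_Data_2024.csv.
# The first header field is empty, as in the real file.
csv_text <- I(",CMPLNT_NUM,BORO_NM,CMPLNT_FR_TM
1,A001,BROOKLYN,14:30:00
2,A002,QUEENS,02:15:00
3,A003,(null),23:05:00
")

df_demo <- read_csv(
  csv_text,
  na = c("", "NA", "(null)"),  # read the "(null)" placeholder as NA
  show_col_types = FALSE        # suppress the column-specification message
) %>%
  select(-any_of("...1"))       # drop the unnamed first column if present

df_demo
```

With "(null)" read as NA at load time, later filters would only need `!is.na(...)`.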

# 2. Crimes by Borough
# Drop rows with a missing or placeholder borough, then count per borough
df_boro <- df %>%
  filter(!is.na(BORO_NM), BORO_NM != "(null)") %>%
  count(BORO_NM, sort = TRUE)  # sort = TRUE orders by descending n

# Bar chart ordered from most to fewest complaints
ggplot(df_boro, aes(x = reorder(BORO_NM, -n), y = n, fill = BORO_NM)) +
  geom_col(show.legend = FALSE) +
  theme_minimal() +
  labs(title = "Total Crime Complaints by Borough (2024)",
       x = "Borough",
       y = "Number of Complaints") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# 3. Top 10 Offenses
# Count complaints per offense and keep the 10 largest counts
df_offense <- df %>%
  filter(!is.na(OFNS_DESC), OFNS_DESC != "(null)") %>%
  count(OFNS_DESC) %>%
  slice_max(order_by = n, n = 10) %>%  # replaces the superseded top_n()
  arrange(n)

# Horizontal bar chart, largest count at the top
ggplot(df_offense, aes(x = n, y = reorder(OFNS_DESC, n))) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(title = "Top 10 Crime Offenses Reported in NYC (2024)",
       x = "Number of Complaints",
       y = "Offense Type")
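
The top-10 step keeps every row tied at the cutoff, so it can return more than ten rows. A minimal sketch on made-up counts (the offense labels and numbers are illustrative, not taken from the dataset) of how ties behave with `slice_max()` and how `with_ties = FALSE` forces an exact row count:

```r
library(dplyr)
library(tibble)

# Made-up offense counts for illustration; "C" and "D" tie at 80.
offense_counts <- tibble(
  OFNS_DESC = c("A", "B", "C", "D", "E"),
  n = c(50, 120, 80, 80, 10)
)

# Default behaviour: ties at the cutoff are kept, so asking for 2 rows returns 3.
top_with_ties <- offense_counts %>%
  slice_max(order_by = n, n = 2)

# with_ties = FALSE returns exactly 2 rows.
top_exact <- offense_counts %>%
  slice_max(order_by = n, n = 2, with_ties = FALSE)
```

Whether ties should be kept is a judgment call; keeping them avoids arbitrarily dropping an offense that has the same count as one that was retained.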

# 4. Crimes by Time of Day
# CMPLNT_FR_TM was parsed as a time column on load, so missing values are NA
df_time <- df %>%
  filter(!is.na(CMPLNT_FR_TM)) %>%
  mutate(Hour = hour(hms(CMPLNT_FR_TM))) %>%  # hour of day, 0-23
  count(Hour)
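
The hourly counts are computed here but not plotted. A minimal sketch of one way to chart them, in the same style as the earlier plots. The `hourly_demo` table and its values are hypothetical stand-ins for `df_time`, not results from the dataset.

```r
library(ggplot2)
library(tibble)

# Hypothetical stand-in for df_time: one row per hour of the day (made-up counts)
hourly_demo <- tibble(
  Hour = 0:23,
  n = c(40, 30, 25, 20, 15, 18, 25, 40, 55, 60, 65, 70,
        85, 80, 82, 90, 95, 100, 98, 92, 85, 75, 60, 50)
)

# Bar chart of complaints per hour, with an x-axis tick every 3 hours
p_hour <- ggplot(hourly_demo, aes(x = Hour, y = n)) +
  geom_col(fill = "steelblue") +
  scale_x_continuous(breaks = seq(0, 23, by = 3)) +
  theme_minimal() +
  labs(title = "Complaints by Hour of Day (illustrative data)",
       x = "Hour of Day (0-23)",
       y = "Number of Complaints")

p_hour
```

Replacing `hourly_demo` with `df_time` would plot the real counts, since both have an `Hour` column and an `n` column.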

This R script uses the tidyverse to explore the 2024 NYPD complaint dataset (136,402 rows, 36 columns). After loading the raw data, the analysis proceeds in three steps.

First, it counts complaints per NYC borough, excluding rows where the borough is missing or recorded as "(null)", and plots the totals as a bar chart ordered from most to fewest complaints. Next, it counts complaints by offense description, keeps the 10 most common offenses, and shows them in a horizontal bar chart so they are easy to compare. Finally, it looks at when incidents occur: it extracts the hour from each complaint's time field (CMPLNT_FR_TM) and counts complaints per hour, which prepares the data for examining how complaint volume changes from morning to night.

Acknowledgements

Includes an acknowledgement of anything that informed the content in the Report, e.g. the URL of a Stack Overflow answer, the URL of an Ed post, or the date and details of a drop-in session with a tutor.

An AI usage statement including:
- Gemini AI
- the publisher
- the URL
- a brief description of the context in which the tool was used.

If no AI was used, this must be clearly stated.