This report explores the NYPD shooting incident data from NYC Open Data using R. The workflow demonstrates how to acquire, clean, and visualize open data in a reproducible way.
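# Packages used throughout this report
library(tidyverse)  # dplyr, stringr, ggplot2
library(lubridate)  # date and time parsing
library(knitr)      # kable()
# API call to NYC Open Data
url <- "https://data.cityofnewyork.us/resource/833y-fsy8.json"
shooting_data <- jsonlite::fromJSON(url)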
# Peek at data
head(shooting_data)
##   incident_key              occur_date occur_time     boro loc_of_occur_desc
## 1    298699604 2024-12-31T00:00:00.000   19:16:00 BROOKLYN           OUTSIDE
## 2    298699604 2024-12-31T00:00:00.000   19:16:00 BROOKLYN           OUTSIDE
## 3    298672095 2024-12-30T00:00:00.000   20:32:00    BRONX            INSIDE
## 4    298672096 2024-12-30T00:00:00.000   16:45:00    BRONX           OUTSIDE
## 5    298672096 2024-12-30T00:00:00.000   16:45:00    BRONX           OUTSIDE
## 6    298672096 2024-12-30T00:00:00.000   16:45:00    BRONX           OUTSIDE
##   precinct jurisdiction_code loc_classfctn_desc           location_desc
## 1       69                 0             STREET                  (null)
## 2       69                 0             STREET                  (null)
## 3       41                 0           DWELLING MULTI DWELL - APT BUILD
## 4       47                 0             STREET                  (null)
## 5       47                 0             STREET                  (null)
## 6       47                 0             STREET                  (null)
##   statistical_murder_flag perp_age_group perp_sex perp_race vic_age_group
## 1                   FALSE          25-44        M     BLACK         25-44
## 2                   FALSE          25-44        M     BLACK         18-24
## 3                    TRUE          18-24        M     BLACK         25-44
## 4                   FALSE         (null)   (null)    (null)         25-44
## 5                   FALSE         (null)   (null)    (null)           <18
## 6                   FALSE         (null)   (null)    (null)         18-24
##   vic_sex       vic_race x_coord_cd y_coord_cd  latitude  longitude
## 1       M          BLACK  1,015,120    173,870 40.643866 -73.888761
## 2       M          BLACK  1,015,120    173,870 40.643866 -73.888761
## 3       M          BLACK  1,012,201    240,878 40.827795 -73.899003
## 4       F WHITE HISPANIC  1,021,316    259,277 40.878261 -73.865964
## 5       F WHITE HISPANIC  1,021,316    259,277 40.878261 -73.865964
## 6       M          BLACK  1,021,316    259,277 40.878261 -73.865964
##   geocoded_column.type geocoded_column.coordinates :@computed_region_yeji_bk3q
## 1                Point         -73.88876, 40.64387                           2
## 2                Point         -73.88876, 40.64387                           2
## 3                Point           -73.8990, 40.8278                           5
## 4                Point         -73.86596, 40.87826                           5
## 5                Point         -73.86596, 40.87826                           5
## 6                Point         -73.86596, 40.87826                           5
##   :@computed_region_92fq_4b7q :@computed_region_sbqj_enih
## 1                           8                          42
## 2                           8                          42
## 3                          43                          25
## 4                           2                          30
## 5                           2                          30
## 6                           2                          30
##   :@computed_region_efsh_h5xi :@computed_region_f5dn_yrer
## 1                       13827                           5
## 2                       13827                           5
## 3                       10937                          34
## 4                       11605                          29
## 5                       11605                          29
## 6                       11605                          29
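The API call returned 1000 rows of shooting incident records. That number matches the default page size of Socrata API endpoints, so these rows are most likely a sample rather than the full dataset.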
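A single request to this endpoint returned 1000 rows, which is probably one page of a larger dataset. The sketch below shows one way to ask for more rows or for later pages. It assumes the endpoint accepts the Socrata `$limit`, `$offset`, and `$order` query parameters; `soda_page_url()` is a hypothetical helper written for this report, not part of any package.

```r
# Hypothetical helper: build a paged request URL for a Socrata endpoint.
# Assumption: the endpoint honors $limit, $offset, and $order, and ordering
# by the system field :id keeps rows in a stable order between pages.
soda_page_url <- function(base_url, limit = 1000, offset = 0) {
  paste0(
    base_url,
    "?$limit=", format(limit, scientific = FALSE),
    "&$offset=", format(offset, scientific = FALSE),
    "&$order=:id"
  )
}

# URL for the second page of 1000 rows
page2_url <- soda_page_url(
  "https://data.cityofnewyork.us/resource/833y-fsy8.json",
  limit = 1000,
  offset = 1000
)

# To download that page (requires network access):
# page2 <- jsonlite::fromJSON(page2_url)
```

`format(..., scientific = FALSE)` keeps large values such as 100000 from being written as `1e+05` in the URL.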
shooting_clean <- shooting_data %>%
  # Step 1: Remove rows missing occur_date
  filter(!is.na(occur_date)) %>%
  # Step 2: Parse date and time, then create time_of_day and borough
  mutate(
    # occur_date arrives as "2024-12-31T00:00:00.000", so ymd() cannot parse it
    occur_date = as_date(ymd_hms(occur_date)),
    # occur_time arrives as "19:16:00" (with seconds), so use hms() instead of hm()
    occur_time = hms(occur_time),
    hour = hour(occur_time),
    time_of_day = case_when(
      hour >= 5 & hour < 12 ~ "Morning",
      hour >= 12 & hour < 17 ~ "Afternoon",
      hour >= 17 & hour < 21 ~ "Evening",
      TRUE ~ "Night"
    ),
    borough = str_to_title(boro)
  )
I cleaned the dataset by dropping rows with missing dates, creating a
time_of_day variable based on the hour of the incident, and
standardizing borough names.
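Because time_of_day depends on the date and time strings parsing correctly, it can help to try the parsers on a few values first. The sketch below uses two made-up rows in the same string formats shown in the head() output; the data frame `sample_rows` is illustrative only.

```r
library(dplyr)
library(lubridate)

# Two made-up rows in the same string formats as the API output
sample_rows <- data.frame(
  occur_date = c("2024-12-31T00:00:00.000", "2024-12-30T00:00:00.000"),
  occur_time = c("19:16:00", "03:05:00"),
  stringsAsFactors = FALSE
)

parsed <- sample_rows %>%
  mutate(
    # Full ISO timestamp, reduced to a Date
    occur_date = as_date(ymd_hms(occur_date)),
    # Hours:minutes:seconds, reduced to the hour of the day
    hour = hour(hms(occur_time)),
    time_of_day = case_when(
      hour >= 5 & hour < 12 ~ "Morning",
      hour >= 12 & hour < 17 ~ "Afternoon",
      hour >= 17 & hour < 21 ~ "Evening",
      TRUE ~ "Night"
    )
  )

parsed
```

If a parser does not match the input format, lubridate returns NA with a warning, and every NA hour falls through to the "Night" branch of case_when(), so a check like this catches the problem before it reaches the plots.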
# Distribution of incidents by borough
borough_counts <- shooting_clean %>%
  count(borough, sort = TRUE)
borough_counts
##         borough   n
## 1         Bronx 366
## 2      Brooklyn 296
## 3     Manhattan 190
## 4        Queens 137
## 5 Staten Island  11
The table above shows the distribution of incidents across boroughs. The borough with the most shootings is the Bronx, with 366 of the 1000 retrieved records, followed by Brooklyn with 296.
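Raw counts are easier to compare when each borough's share of the total is shown next to them. The sketch below rebuilds the counts from the table above by hand and adds a `share` column; in the report itself the same mutate() step could be applied to borough_counts directly.

```r
library(dplyr)

# Counts copied from the table above
borough_counts_example <- data.frame(
  borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  n = c(366, 296, 190, 137, 11),
  stringsAsFactors = FALSE
)

# Add each borough's proportion of all retrieved records
borough_shares <- borough_counts_example %>%
  mutate(share = n / sum(n))

borough_shares
```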
# Create a clean table with kable
kable(borough_counts, caption = "Distribution of Shooting Incidents by Borough")
| borough | n | 
|---|---|
| Bronx | 366 | 
| Brooklyn | 296 | 
| Manhattan | 190 | 
| Queens | 137 | 
| Staten Island | 11 | 
ggplot(shooting_clean, aes(x = time_of_day)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Shooting Incidents by Time of Day",
       x = "Time of Day",
       y = "Count of Incidents") +
  theme_minimal()
The bar plot above shows the number of incidents in each time-of-day category (Morning, Afternoon, Evening, and Night).
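By default ggplot2 orders a character variable alphabetically on the x-axis, which would place Afternoon before Morning. A minimal sketch of one way to put the bars in chronological order is to convert time_of_day to a factor with explicit levels. The data frame `example` holds made-up values for illustration only.

```r
library(ggplot2)

# Made-up example values; in the report this column comes from shooting_clean
example <- data.frame(
  time_of_day = c("Night", "Evening", "Morning", "Night", "Afternoon"),
  stringsAsFactors = FALSE
)

# Fix the level order so the x-axis runs from Morning to Night
day_levels <- c("Morning", "Afternoon", "Evening", "Night")
example$time_of_day <- factor(example$time_of_day, levels = day_levels)

p_time <- ggplot(example, aes(x = time_of_day)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Shooting Incidents by Time of Day",
       x = "Time of Day",
       y = "Count of Incidents") +
  theme_minimal()
```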
ggplot(shooting_clean, aes(x = borough)) +
  geom_bar(fill = "darkred") +
  labs(title = "Shooting Incidents by Borough",
       x = "Borough",
       y = "Count of Incidents") +
  theme_minimal()
The second plot shows the number of incidents in each borough. In this sample the Bronx and Brooklyn have the most incidents, and Staten Island has the fewest.
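The borough comparison can be easier to read when the bars are sorted by count instead of alphabetically. The sketch below is one possible variation, not the plot used in this report: it draws the bars from the counts in the borough table (copied by hand here) with geom_col() and sorts them with reorder().

```r
library(ggplot2)

# Counts copied from the borough table in this report
counts <- data.frame(
  borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  n = c(366, 296, 190, 137, 11),
  stringsAsFactors = FALSE
)

# reorder() sorts the levels by -n, so the largest count comes first
counts$borough_sorted <- reorder(counts$borough, -counts$n)

p_borough <- ggplot(counts, aes(x = borough_sorted, y = n)) +
  geom_col(fill = "darkred") +
  labs(title = "Shooting Incidents by Borough",
       x = "Borough",
       y = "Count of Incidents") +
  theme_minimal()
```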
Learning R Markdown this week has shown me how to keep my code and explanations together in one place, which makes my work easier to follow. I can see how this would help with my thesis on climate change and peer anxiety/influence, because I’ll be able to keep track of my analysis steps and show exactly how I got my results. It also makes me think more carefully about how to explain what the numbers and graphs mean, not just how to calculate them. Using R Markdown has also helped me learn better ways to share my results and findings with others.