Data source: NYC Open Data Portal — NYPD Shooting Incident Data (Historic)
Let’s retrieve the data.
library(tidyverse) # provides %>%, dplyr, tidyr, stringr, and ggplot2, all used below
endpoint <- "https://data.cityofnewyork.us/resource/833y-fsy8.json"
resp <- httr::GET(
endpoint,
query = list(
"$select" = paste(
c("occur_date","occur_time","boro","precinct",
"perp_race","vic_race","vic_sex","longitude","latitude"),
collapse = ","
),
"$limit" = 30000,
"$order" = "occur_date DESC"
)
)
httr::stop_for_status(resp)
txt <- httr::content(resp, as = "text", encoding = "UTF-8")
shooting_data <- jsonlite::fromJSON(txt, flatten = TRUE)
names(shooting_data) <- make.names(names(shooting_data), unique = TRUE) # safety
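Because the query caps the result at `$limit` rows, it may be worth confirming that the response was not cut off as the dataset grows. The following is a minimal sketch, not part of the original analysis: `hit_row_limit()` is a hypothetical helper name (it does not come from httr or jsonlite), and a toy data frame stands in for the real response.

```r
# Hypothetical helper: TRUE when a result has at least as many rows as the
# requested limit, which suggests the API may have truncated it.
hit_row_limit <- function(df, limit) {
  nrow(df) >= limit
}

# Toy stand-in for the downloaded data (3 rows).
toy <- data.frame(boro = c("BRONX", "QUEENS", "BROOKLYN"))

hit_row_limit(toy, limit = 3)      # limit reached: result may be incomplete
hit_row_limit(toy, limit = 30000)  # well under the limit

# With the real data, one option is to warn rather than stop:
# if (hit_row_limit(shooting_data, 30000)) {
#   warning("Result may be truncated; raise $limit or page with $offset.")
# }
```

If the check ever fires, raising `$limit` or requesting further pages with the `$offset` query parameter would be the usual next step.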
Let’s make all borough names in the dataset lowercase. Let’s also split the ‘occur_time’ column into three columns for the hour, minute, and second the shooting occurred. The hour lets us classify each shooting as occurring in the morning (04:00 to 11:59), in the afternoon (12:00 to 19:59), or at night (20:00 to 03:59).
shooting_data <- shooting_data %>%
mutate(boro = str_to_lower(boro))
shooting_data <- shooting_data %>% separate(
col = occur_time,
into = c("Hour","Minute","Second"),
sep = ":"
)
# separate() leaves Hour as a character column, so convert it to a number
# before comparing. Comparing the strings directly gives wrong results
# (for example, "09" >= 4 is FALSE because 4 is coerced to "4").
shooting_data <- shooting_data %>%
  mutate(
    Hour_num = as.numeric(Hour),
    time_of_day = case_when(
      Hour_num >= 4 & Hour_num < 12 ~ "Morning",
      Hour_num >= 12 & Hour_num < 20 ~ "Afternoon",
      TRUE ~ "Night" # 20:00 to 03:59; any missing hour also lands here
    ),
    time_of_day = factor(time_of_day, levels = c("Morning", "Afternoon", "Night"))
  )
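One detail worth checking when bucketing by hour is that `separate()` produces character columns, so the hour needs a numeric conversion before it is compared with numbers. The sketch below illustrates this on a toy data frame (the `toy` object and its values are made up for illustration), using the same cutoffs as the analysis.

```r
library(dplyr)

# Toy hours as they come out of separate(): zero-padded character strings.
toy <- data.frame(Hour = c("03", "09", "12", "19", "23"), stringsAsFactors = FALSE)

# Comparing the strings directly is not a numeric comparison:
"09" >= 4              # FALSE: 4 is coerced to "4", and "09" sorts before "4"
as.numeric("09") >= 4  # TRUE

toy <- toy %>%
  mutate(
    Hour_num = as.numeric(Hour),
    time_of_day = case_when(
      Hour_num >= 4 & Hour_num < 12 ~ "Morning",
      Hour_num >= 12 & Hour_num < 20 ~ "Afternoon",
      TRUE ~ "Night"
    )
  )

toy$time_of_day
# "Night" "Morning" "Afternoon" "Afternoon" "Night"
```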
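Here are some basic insights into the data. The first two tables show which borough had the most shootings, as raw counts and as percentages of the total. The next two show the ten precincts with the most shootings citywide and within Brooklyn, the borough with the highest count.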
shooting_data %>%
count(boro)
## boro n
## 1 bronx 8834
## 2 brooklyn 11685
## 3 manhattan 3977
## 4 queens 4426
## 5 staten island 822
shooting_data %>%
count(boro) %>%
mutate(pct = n / sum(n) * 100)
## boro n pct
## 1 bronx 8834 29.700108
## 2 brooklyn 11685 39.285234
## 3 manhattan 3977 13.370764
## 4 queens 4426 14.880312
## 5 staten island 822 2.763583
shooting_data %>%
count(precinct) %>%
arrange(desc(n)) %>%
head(10)
## precinct n
## 1 75 1680
## 2 73 1561
## 3 67 1288
## 4 44 1159
## 5 79 1073
## 6 47 1048
## 7 46 1044
## 8 40 1002
## 9 42 936
## 10 48 879
shooting_data %>%
filter(boro == "brooklyn") %>%
count(precinct) %>%
arrange(desc(n)) %>%
head(10)
## precinct n
## 1 75 1680
## 2 73 1561
## 3 67 1288
## 4 79 1073
## 5 77 856
## 6 81 839
## 7 71 609
## 8 83 528
## 9 69 503
## 10 70 491
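The Brooklyn filter above could be generalized to every borough at once. Here is a rough sketch of one way to do that, assuming dplyr 1.0.0 or later for `slice_max()`. The `toy` data frame and its values are invented for illustration and are not taken from the real dataset.

```r
library(dplyr)

# Toy incidents: one row per shooting, as in shooting_data.
toy <- data.frame(
  boro = c("brooklyn", "brooklyn", "brooklyn", "bronx", "bronx", "bronx"),
  precinct = c(75, 75, 73, 44, 47, 47)
)

# Count incidents per precinct, then keep the busiest precinct in each borough.
top_by_boro <- toy %>%
  count(boro, precinct) %>%
  group_by(boro) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

top_by_boro
```

Raising the `n = 1` argument of `slice_max()` to 10 would give a top-ten list per borough, similar to the Brooklyn table.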
Feast your eyes on this table and these incredible graphs.
tod_summary <- shooting_data %>%
count(time_of_day, name = "n") %>%
mutate(pct = round(100 * n / sum(n), 1))
knitr::kable(
tod_summary,
caption = "Shootings by Time of Day",
col.names = c("Time of Day", "Count", "Percent")
)
| Time of Day | Count | Percent |
|---|---|---|
| Morning | 4263 | 14.3 |
| Afternoon | 8460 | 28.4 |
| Night | 17021 | 57.2 |
ggplot(shooting_data, aes(x = time_of_day, fill = time_of_day))+
geom_bar() +
labs(
title = "Shootings by Time of Day",
x = "Time",
y = "# of Shootings"
)
ggplot(shooting_data, aes(x = time_of_day, fill = boro))+
geom_bar(position = "dodge")+
facet_wrap(~ boro)+
labs(
title = "Shootings by Time of Day Across Boroughs",
x = "Time",
y = "# of Shootings",
fill = "Borough"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 16),
axis.title = element_text(size = 14),
axis.text = element_text(size = 12),
strip.text = element_text(face = "bold", size = 13) # facet labels
) +
scale_fill_brewer(palette = "Set1")
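Since the boroughs differ a lot in total shootings, raw counts in the faceted plot can be hard to compare across panels. One possible companion, sketched here on made-up toy data rather than the real dataset, is a table of time-of-day shares within each borough.

```r
library(dplyr)

# Toy incidents with a borough and a time-of-day label per row.
toy <- data.frame(
  boro = c("bronx", "bronx", "bronx", "bronx", "queens", "queens"),
  time_of_day = c("Night", "Night", "Night", "Morning", "Night", "Afternoon")
)

# Percent of each borough's shootings that fall in each time of day.
shares <- toy %>%
  count(boro, time_of_day) %>%
  group_by(boro) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

shares
```

Grouping by `boro` before computing `pct` makes the percentages sum to 100 within each borough rather than across the whole dataset.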
Honestly, I’m not certain yet how I will use R Markdown in my thesis research. Assuming I publish research in the future, it could help me keep my data, code, analyses, and visuals in one place, which I could upload as supplementary material in support of open science practices. R Markdown may also be a useful place to save the steps behind each analysis and figure, so that I can rerun them later instead of redoing the analysis and rebuilding the visuals from scratch.