R Markdown Assignment

Data source: NYC Open Data Portal, NYPD Shooting Incident Data (Historic)

Let’s retrieve the data from the portal’s JSON endpoint, requesting only the columns we need.

endpoint <- "https://data.cityofnewyork.us/resource/833y-fsy8.json"

resp <- httr::GET(
  endpoint,
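  query = list(
    # Request only the columns used in this analysis
    "$select" = paste(
      c("occur_date","occur_time","boro","precinct",
        "perp_race","vic_race","vic_sex","longitude","latitude"),
      collapse = ","
    ),
    # The API returns 1,000 rows by default; raise the cap to get every record
    "$limit" = 30000,
    "$order" = "occur_date DESC"
  )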
)
httr::stop_for_status(resp)

txt <- httr::content(resp, as = "text", encoding = "UTF-8")
shooting_data <- jsonlite::fromJSON(txt, flatten = TRUE)
names(shooting_data) <- make.names(names(shooting_data), unique = TRUE)  # ensure valid, unique column names
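
Before analyzing, it is worth checking column types with str(shooting_data). JSON responses like this one often deliver dates and coordinates as text. The sketch below assumes that is the case here; the toy rows and their values are made up for illustration and are not taken from the dataset.

```r
# Toy rows in the shape the endpoint is ASSUMED to return (values are made up)
toy <- data.frame(
  occur_date = c("2023-12-29T00:00:00.000", "2023-12-28T00:00:00.000"),
  longitude  = c("-73.95", "-73.88"),
  latitude   = c("40.65", "40.84"),
  stringsAsFactors = FALSE
)

# Keep the first 10 characters (YYYY-MM-DD) and parse them as a Date
toy$occur_date <- as.Date(substr(toy$occur_date, 1, 10))

# Convert coordinate strings to numbers
toy$longitude <- as.numeric(toy$longitude)
toy$latitude  <- as.numeric(toy$latitude)

str(toy)
```

If str() shows the real columns are already typed correctly, this step can be skipped.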

Let’s make all borough names lowercase. Let’s also split the ‘occur_time’ column into three columns for the hour, minute, and second at which the shooting occurred. From the hour we can then classify each shooting as Morning (04:00 to 11:59), Afternoon (12:00 to 19:59), or Night (20:00 to 03:59).

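library(dplyr)    # %>%, mutate(), count(), case_when()
library(tidyr)    # separate()
library(stringr)  # str_to_lower()
library(ggplot2)  # plots further down

shooting_data <- shooting_data %>%
  mutate(boro = str_to_lower(boro))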

shooting_data <- shooting_data %>% separate(
  col = occur_time, 
  into = c("Hour","Minute","Second"),
  sep = ":"
)

# separate() returns character columns, so convert Hour to a number
# before comparing it against the bin boundaries.
shooting_data <- shooting_data %>%
  mutate(
    Hour_num = as.numeric(Hour),
    time_of_day = case_when(
      Hour_num >= 4  & Hour_num < 12 ~ "Morning",
      Hour_num >= 12 & Hour_num < 20 ~ "Afternoon",
      TRUE ~ "Night"   # 20:00 to 03:59 (a missing hour also lands here)
    ),
    time_of_day = factor(time_of_day, levels = c("Morning", "Afternoon", "Night"))
  )
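
The numeric conversion before binning is not optional. separate() leaves Hour as text, and when R compares text with a number it first turns the number into text, so the comparison is alphabetical rather than numeric. A small sketch with made-up hour strings (hypothetical values, not rows from the dataset) shows the difference:

```r
# Hypothetical hour strings, as separate() would produce them from "HH:MM:SS"
hours_chr <- c("03", "09", "10", "15", "23")

# Character vs. number: 4 is coerced to "4" and the strings are compared
chr_result <- hours_chr >= 4

# Numeric comparison after explicit conversion
num_result <- as.numeric(hours_chr) >= 4

print(data.frame(hours_chr, chr_result, num_result))
```

Every string here starts with a character that sorts before "4", so the text comparison marks all five hours as earlier than 4, while the numeric comparison correctly flags only "03".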

Here are some basic insights into the data. You can see which borough had the most shootings, both in raw counts and as a percentage of the total. We can also see which precincts had the most shootings, citywide and within Brooklyn.

shooting_data %>%
  count(boro) 
##            boro     n
## 1         bronx  8834
## 2      brooklyn 11685
## 3     manhattan  3977
## 4        queens  4426
## 5 staten island   822
shooting_data %>% 
  count(boro) %>% 
  mutate(pct = n / sum(n) * 100)
##            boro     n       pct
## 1         bronx  8834 29.700108
## 2      brooklyn 11685 39.285234
## 3     manhattan  3977 13.370764
## 4        queens  4426 14.880312
## 5 staten island   822  2.763583
shooting_data %>%
  count(precinct) %>%
  arrange(desc(n)) %>%
  head(10)
##    precinct    n
## 1        75 1680
## 2        73 1561
## 3        67 1288
## 4        44 1159
## 5        79 1073
## 6        47 1048
## 7        46 1044
## 8        40 1002
## 9        42  936
## 10       48  879
shooting_data %>%
  filter(boro == "brooklyn") %>%
  count(precinct) %>%
  arrange(desc(n)) %>%
  head(10)
##    precinct    n
## 1        75 1680
## 2        73 1561
## 3        67 1288
## 4        79 1073
## 5        77  856
## 6        81  839
## 7        71  609
## 8        83  528
## 9        69  503
## 10       70  491
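
As a quick sanity check, the percentages can be reproduced by hand from the borough counts printed above. This sketch copies those five counts into a named vector (the vector name boro_n is just for illustration):

```r
# Borough counts copied from the count(boro) output above
boro_n <- c(bronx = 8834, brooklyn = 11685, manhattan = 3977,
            queens = 4426, `staten island` = 822)

# Total number of shootings retrieved
total <- sum(boro_n)

# Each borough's share of the total, rounded to one decimal place
boro_pct <- round(100 * boro_n / total, 1)

print(total)
print(boro_pct)
```

The total is 29,744, which is below the "$limit" of 30000 used in the request, so the query did not cut off any rows.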

Feast your eyes on this table and these incredible graphs.

tod_summary <- shooting_data %>%
  count(time_of_day, name = "n") %>%
  mutate(pct = round(100 * n / sum(n), 1))

knitr::kable(
  tod_summary,
  caption = "Shootings by Time of Day",
  col.names = c("Time of Day", "Count", "Percent")
)
Table: Shootings by Time of Day

|Time of Day | Count| Percent|
|:-----------|-----:|-------:|
|Morning     |  4263|    14.3|
|Afternoon   |  8460|    28.4|
|Night       | 17021|    57.2|

ggplot(shooting_data, aes(x = time_of_day, fill = time_of_day))+
  geom_bar() +
  labs(
    title = "Shootings by Time of Day",
    x = "Time",
    y = "# of Shootings"
  )

ggplot(shooting_data, aes(x = time_of_day, fill = boro))+
  geom_bar(position = "dodge")+
  facet_wrap(~ boro)+
  labs(
    title = "Shootings by Time of Day Across Boroughs",
    x = "Time",
    y = "# of Shootings",
    fill = "Borough"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    strip.text = element_text(face = "bold", size = 13)  # facet labels
  ) +
  scale_fill_brewer(palette = "Set1")
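
Because the boroughs differ so much in total shootings, raw counts are hard to compare across panels. One possible complement is to look at within-borough shares instead. The sketch below uses base R's table() and prop.table() on a few made-up rows (illustrative values only, not results from the dataset):

```r
# Made-up rows for illustration only
toy <- data.frame(
  boro = c("bronx", "bronx", "bronx", "bronx", "queens", "queens"),
  time_of_day = factor(
    c("Night", "Night", "Morning", "Afternoon", "Night", "Afternoon"),
    levels = c("Morning", "Afternoon", "Night")
  )
)

# Cross-tabulate borough by time of day
counts <- table(toy$boro, toy$time_of_day)

# margin = 1 divides each row by its own total, giving shares within each borough
shares <- round(100 * prop.table(counts, margin = 1), 1)

print(shares)
```

Applied to the real data, the same two calls on shooting_data$boro and shooting_data$time_of_day would show whether the time-of-day pattern differs between boroughs once their size is taken out.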

Honestly, I’m not yet certain how I will use R Markdown in my thesis research. If I publish research in the future, it could help me keep my data, code, analyses, and visuals in one place that I can upload as supplementary material, which supports open science practices. It would also preserve the steps behind each analysis and figure, so I can rerun them later instead of re-creating them from scratch.