Introduction

This analysis examines the 911 calls placed in Baltimore City in 2022. A number of graphs show the district the calls were placed in, the priority assigned to each call, and the time of day the calls came in. Together these graphs show what kinds of 911 calls are made in Baltimore and when they arrive. The analysis can also help readers understand which areas of Baltimore are safest and which times of day are safest to be outside. Overall, understanding the timing and causes of 911 calls can help individuals identify safety concerns in the community.

Dataset

The data used in this analysis was taken from the Baltimore City Government website, which publishes a large amount of public data, from government spending to traffic violations. Three variables drive the analysis. The first is call description, which classifies the reason for the call. The second is priority, which indicates how urgently the caller needs attention. The third is time, which is recorded in the data set as Call Date Time. Four of the five graphs use some measure of time, whether days, hours, or months.
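Since most of the graphs derive a month, hour, or weekday from Call Date Time, a minimal sketch of that step may help. The rows below are hypothetical and the timestamp format is an assumption; only the column name callDateTime and the lubridate functions ymd_hms(), month(), and hour() come from the code in this report. The names calls, parsed, and dow are introduced here for illustration.

```r
library(lubridate)

# Hypothetical rows for illustration only; the real timestamp format may differ.
calls <- data.frame(
  callDateTime = c("2022-10-31 22:15:00", "2022-12-25 11:05:00"),
  priority     = c("Medium", "Non-Emergency"),
  stringsAsFactors = FALSE
)

# Parse the text into date-times (UTC unless a time zone is given).
parsed <- ymd_hms(calls$callDateTime)

calls$month  <- month(parsed)                 # 1 to 12
calls$hour24 <- hour(parsed)                  # 0 to 23
calls$dow    <- wday(parsed, week_start = 1)  # 1 = Monday, 7 = Sunday

calls
```

Using a numeric weekday here avoids locale-dependent day names; the report's own code uses weekdays() with abbreviated names instead.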

Findings

District Calls

This graph shows the frequency of calls within each district of Baltimore City. The y-axis gives the total call count for a district, while the x-axis lists the 9 main districts represented in the data set. The Southern and Southeastern districts led in total volume of 911 calls placed, while the Western and Southwestern districts had the fewest calls.

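# Packages used by the code in this report
library(dplyr); library(lubridate); library(ggplot2); library(scales)
library(ggthemes); library(ggrepel); library(plotly)

# Count calls per police district, sorted from most to fewest
callcount <- data.frame(count(df, PoliceDistrict))
callcount <- callcount[order(callcount$n, decreasing = TRUE), ]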


ggplot(callcount[1:9,], aes(x = reorder(PoliceDistrict, -n), y = n)) +
  geom_bar(color="darkgreen", fill="lightgreen", stat="identity") +
  labs(title = "Top Baltimore City District 911 Calls 2022", x = "District", y = "Call Count") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels=comma)

Call Description

This graph shows the call descriptions along with the month in which the calls were placed. The 10 most frequent reasons for 911 calls are listed, together with an “Other” category that pools every call whose description is not among those 10. Other is the leading category because it combines a large number of less common descriptions. The other leading reasons are Directed Patrol and Business Checks, which are used to deter potential crime.

df_call <- count(df, description)
df_call <- df_call[order(df_call$n, decreasing = TRUE), ]


top_calls <- df_call$description[1:10]


new_df <- df %>%
  filter(description %in% top_calls) %>%
  select(callDateTime, description) %>%
  mutate(month = month(ymd_hms(callDateTime))) %>%
  group_by(description, month) %>%
  summarise(n = length(description), .groups = 'keep') %>%
  data.frame()

other_df <- df %>%
  filter(!description %in% top_calls) %>%
  select(callDateTime) %>%
  mutate(month = month(ymd_hms(callDateTime)), description = "Other") %>%
  group_by(description, month) %>%
  summarise(n = length(description), .groups = 'keep') %>%
  data.frame()


new_df <- rbind(new_df, other_df)


df_tot <- new_df %>%
  select(description, n) %>%
  group_by(description) %>%
  summarise(tot = sum(n), .groups = 'keep') %>%
  data.frame()



new_df$month <- as.factor(new_df$month)
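# round_any() comes from plyr; calling it with plyr:: avoids attaching plyr, which would mask dplyr verbs
max_y <- plyr::round_any(max(df_tot$tot), 100000, ceiling)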
ggplot(new_df, aes(x = reorder(description, n, sum), y = n, fill = month)) +
  geom_bar(stat="identity", position = position_stack(reverse = TRUE)) + 
  coord_flip() +
  labs(title = "Call Count by Call Description", x = "", y = "Call Count", fill = "Month") +
  theme_tufte() + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette="Set3") +
  geom_text(data = df_tot, aes(x = description, y = tot, label = scales::comma(tot), fill = NULL), hjust = -0.1, size=4) +
  scale_y_continuous(labels = comma, 
                     breaks = seq(0, max_y, by= 50000),
                     limits=c(0, max_y))
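The "top descriptions plus Other" grouping used above can be hard to follow inside the longer pipeline, so here is a compact sketch of the same idea on made-up data. The data frame toy, the cutoff of 2, and the names top_desc and lumped are hypothetical; the sketch assumes dplyr 1.0.0 or later for slice_head().

```r
library(dplyr)

# Hypothetical call descriptions, for illustration only.
toy <- data.frame(
  description = c("A", "A", "A", "B", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Find the most frequent descriptions (top 2 here; the report uses the top 10).
top_desc <- toy %>%
  count(description, sort = TRUE) %>%
  slice_head(n = 2) %>%
  pull(description)

# Relabel everything outside the top group as "Other", then recount.
lumped <- toy %>%
  mutate(description = if_else(description %in% top_desc, description, "Other")) %>%
  count(description, sort = TRUE)

lumped
```

In this toy example "Other" holds the two calls with descriptions C and D, which illustrates why a pooled category can rank highly even when each of its members is rare.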

Calls by Hour

The graph below shows the volume of 911 calls in Baltimore by time of day. The lowest volume of calls comes between 11am and 12pm, while the highest volume comes between 10pm and 11pm. In general, the most calls come in when it is dark outside.

hours_df <- df %>%
  select(callDateTime) %>%
  mutate(hour24 = hour(ymd_hms(callDateTime))) %>%
  group_by(hour24) %>%
  summarise(n = length(callDateTime), .groups = 'keep') %>%
  data.frame()




# One axis label per hour present in the data
x_axis_labels <- min(hours_df$hour24):max(hours_df$hour24)


hi_low <- hours_df %>%
  filter(n == min(n) | n == max(n)) %>%
  data.frame()

ggplot(hours_df, aes(x = hour24, y = n)) + 
  geom_line(color='black', linewidth=1) +
  geom_point(shape=21, size=4, color='red', fill='white') +
  labs(x="Hour", y = "Call Count", title= "Call Count by Hour", caption="Source: Baltimore City Website") +
  scale_y_continuous(labels = comma) +
  theme_light() +
  theme(plot.title = element_text(hjust=0.5)) +
  scale_x_continuous(labels = x_axis_labels, breaks = x_axis_labels, minor_breaks = NULL) +
  geom_point(data = hi_low, aes(x = hour24, y = n), shape=21, size=4, fill='red', color='red') +
  geom_label_repel(aes(label= ifelse(n == max(n) | n == min(n), scales::comma(n) , "") ), 
                   box.padding = .75, 
                   point.padding = .75, 
                   size = 4, color='gray50', 
                   segment.color='darkblue')

Call Priority

This pie chart shows how 911 calls in Baltimore were prioritized. Over 2/3 of calls were classified as Non-Emergency. Low and Medium priority calls each made up around 14% of the chart. The smallest shares were calls classified as High priority or Emergency. The “Other” and “null” categories represent non-callers or calls that could not be classified.

# Quick look at the raw priority counts, largest first
count(df, priority) %>% arrange(desc(n))

# Bucket any unexpected priority label into "Other". Rows with a missing priority
# stay NA and are not filtered out; these are presumably the "null" slice in the chart.
priority_df <- df %>%
  select(priority, callDateTime) %>%
  mutate(month = month(ymd_hms(callDateTime)),
         myPriority = ifelse(priority == "Non-Emergency", "Non-Emergency",
                      ifelse(priority == "Medium", "Medium",
                      ifelse(priority == "Low", "Low",
                      ifelse(priority == "High", "High",
                      ifelse(priority == "Emergency", "Emergency", "Other")))))) %>%
  group_by(month, myPriority) %>%
  summarise(n = length(myPriority), .groups = 'keep') %>%
  data.frame()

priority_df %>%
  plot_ly(., labels = ~myPriority, values = ~n, type = "pie",
          textposition = "outside", textinfo = "label+percent") %>%
  layout(title="Call Priority in Baltimore (2022)")
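The percentages quoted for the pie chart can also be checked as a plain table. The sketch below uses base R on a hypothetical vector of priority labels, so the numbers it prints are not the report's results; the names priority and shares are introduced here for illustration.

```r
# Hypothetical priority labels, for illustration only (not the real data).
priority <- c(rep("Non-Emergency", 7), "Low", "Medium", "High")

# Share of each priority level as a percentage of all calls.
shares <- round(100 * prop.table(table(priority)), 1)

# Largest share first.
sort(shares, decreasing = TRUE)
```

Applied to the priority column of the real data, the same two calls would give the exact shares behind statements such as "over 2/3" and "around 14%".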

Heatmap

This heatmap shows the number of calls that came in on each day of the week during each month. Each cell sums every occurrence of that weekday within the month. In an average month the calls are spread fairly evenly across the days of the week. It is interesting to note that Halloween fell on a Monday in 2022, which coincides with the spike on Mondays in October. There is also a spike on Wednesdays in November, most likely because of the day before Thanksgiving. Christmas fell on a Sunday in 2022, and Sunday had the lowest volume of calls in December.

days_df <- df %>%
  select(callDateTime) %>%
  mutate(month = month(ymd_hms(callDateTime)),
         dayoftheweek = weekdays(ymd_hms(callDateTime), abbreviate = TRUE)) %>%
  group_by(month, dayoftheweek) %>%
  summarise(n = length(callDateTime), .groups='keep') %>%
  data.frame()
  

 

mylevels <- c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')
days_df$dayoftheweek <- factor(days_df$dayoftheweek, levels = mylevels)


ggplot(days_df, aes(x = month, y = dayoftheweek, fill=n)) + 
  geom_tile(color="black") + 
  geom_text(aes(label=comma(n))) +
  coord_equal(ratio=1) + 
  labs(title="Heatmap: Calls by Day of the Week",
       x = "Month", 
       y = "Days of the Week", 
       fill = "Call Count") + 
  theme_minimal() + 
  theme(plot.title = element_text(hjust=0.5)) +
  scale_y_discrete(limits = rev(levels(days_df$dayoftheweek)))+
  scale_x_continuous(breaks = 1:12, minor_breaks = NULL) +  # one tick per month, independent of the hourly plot's labels
  scale_fill_continuous(low="white", high="red") + 
  guides(fill = guide_legend(reverse=TRUE, override.aes=list(color="black")))
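Because each heatmap cell sums every occurrence of a weekday in a month, a weekday that falls five times in a month can look busier than one that falls four times. The sketch below is not part of the original analysis; it counts weekday occurrences per month of 2022 using base R only, and the name occurrences is introduced here for illustration. Dividing each heatmap cell by the matching count would give an average per calendar day, which is one possible way to check the holiday interpretation.

```r
# Every calendar day of 2022.
dates <- seq(as.Date("2022-01-01"), as.Date("2022-12-31"), by = "day")

# Numeric weekday codes (0 = Sunday ... 6 = Saturday) keep the result locale-independent.
wday_code <- as.integer(format(dates, "%w"))
month_num <- as.integer(format(dates, "%m"))

# Rows are months 1 to 12, columns are weekdays.
occurrences <- table(month = month_num, weekday = wday_code)
colnames(occurrences) <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

occurrences
```

In 2022, October contains five Mondays and November contains five Wednesdays, while December contains only four Sundays, so part of the pattern in those cells may come from the calendar rather than from the holidays themselves.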

Summary

Bringing this data and analysis together gives a big-picture view of where and when people call 911 in Baltimore. Ultimately this can help people be safer in the city while also understanding why the majority of 911 calls are made. The bar chart of the major districts showed that most calls came from the Southern and Southeastern districts, while the fewest came from the Southwestern district. Among calls not categorized as “Other”, the most common reasons were Directed Patrol and Business Checks, practices largely used to deter potential crime. As the pie chart shows, over 2/3 of calls are considered non-emergencies, and High priority and Emergency calls are by far the least common. The heatmap showed that the largest spikes in call volume occur around holidays, except for Christmas, which saw relatively low call volume in 2022. By call volume, this analysis suggests the most dangerous time in Baltimore in 2022 was during the holiday season, in the Southern district, from 10pm to 11pm. The key takeaway is that 911 call frequency is heavily influenced by district, time of day, and seasonal holidays. Another interesting takeaway is that the large majority of 911 calls were labeled as non-emergencies; most calls were made to either deter or mitigate potential incidents in the city.