baltimore911set

Introduction

My name is Ruby Nguyen, and I am working with the Baltimore 911 dataset. With this dataset, I have created several data visualizations, such as multiple line charts, bar charts, and heat maps, to convey my findings about what types of calls Baltimore residents place and when and where those calls come from. Analyzing this data can help direct more resources to each type of call and reveal patterns in emergency calls, common types of accidents, and more.

Dataset

This dataset comes from Baltimore City and records 911 calls and their types. It includes variables such as callDateTime, Neighborhood, priority, description, and more. The data ranges from 2017 through 2023.

Descriptive Statistics

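Below are the mean, mode, and median of the hour of day in which 911 calls are made.

library(lubridate)

baltimore911$hr <- hour(ymd_hms(baltimore911$callDateTime))

mean(baltimore911$hr)

## [1] 12.08505

# Base R's mode() returns the storage mode ("numeric"), not the most frequent value,
# so tabulate the hours and take the most common one instead.
as.integer(names(which.max(table(baltimore911$hr))))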

median(baltimore911$hr)

## [1] 13

Findings

In general, I noticed that many of the calls 911 operators receive are non-emergency calls. I also noticed that call volume drops on weekends. The most common call type is business checks.

Tab 1

This graph shows 911 calls by hour. Call volume is at its lowest at 11am and reaches its peak around 10pm. Dispatchers can expect the most calls late at night and in the early morning.

library(lubridate)
library(dplyr)
library(scales)
library(ggplot2)
library(ggrepel)
hours_df <- baltimore911 %>% 
  select(callDateTime) %>%
  mutate(hour24 = hour(ymd_hms(callDateTime))) %>%  # parse the timestamp first, as in the statistics above
  group_by(hour24) %>%
  summarise(n = n(), .groups = 'keep') %>%          # n() counts the rows in each hour
  data.frame()



x_axis_labels = min(hours_df$hour24):max(hours_df$hour24)



hi_lo <- hours_df %>%
  filter(n == min(n) | n == max(n)) %>%
  data.frame()

ggplot(hours_df, aes(x = hour24, y = n)) + 
  # linewidth replaces size for lines as of ggplot2 3.4.0
  geom_line(color = 'black', linewidth = 1) + geom_point(shape=21, size = 4, color = 'red', fill='white') + 
  labs(x="Hour", y = "Call Count", title = "911 Calls by Hour", caption = "Source: Baltimore City") +
  scale_y_continuous(labels=comma) + 
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) + 
  scale_x_continuous(labels=x_axis_labels, breaks = x_axis_labels, minor_breaks = NULL) +
  geom_point(data= hi_lo, aes(x=hour24, y= n), shape=21, size=4, fill = 'red', color = 'red') +
  geom_label_repel(aes(label=n), box.padding = 1, point.padding = 1, size=4, color = 'Grey50', segment.color = 'darkblue')
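The hourly aggregation for this chart can be checked on a handful of rows. The following is a minimal base R sketch, not the pipeline used for the chart: the timestamps are made up for illustration, and `stamps`, `hours_toy`, and `hi_lo_toy` are hypothetical names.

```r
# Made-up timestamps in the same "YYYY-MM-DD HH:MM:SS" shape that ymd_hms() expects
stamps <- c("2023-01-01 22:15:00", "2023-01-01 22:40:00",
            "2023-01-02 11:05:00", "2023-01-02 22:59:00",
            "2023-01-03 03:30:00", "2023-01-03 03:45:00")

# Parse the strings and pull out the hour of day (0-23)
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
hr <- as.integer(format(parsed, "%H"))

# Count calls per hour; hours without any calls do not appear in the result
hours_toy <- as.data.frame(table(hour24 = hr), responseName = "n",
                           stringsAsFactors = FALSE)
hours_toy$hour24 <- as.integer(hours_toy$hour24)

# Keep only the quietest and busiest hours, as hi_lo does for the chart
hi_lo_toy <- hours_toy[hours_toy$n %in% range(hours_toy$n), ]
print(hi_lo_toy)
```

Hours with no calls are absent from the table rather than listed with a count of zero, which is worth remembering when reading the minimum off a sparse sample.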

Tab 2

My second graph is a heat map that shows the density of 911 calls across the years, broken down by day of the week. Each square contains the actual call count. The color legend runs from lightest to darkest, with the darkest squares marking the highest counts. As the years go on, call counts grow throughout the week. Most 911 calls appear to be made Wednesday through Friday.

days_df <- baltimore911 %>%
  select(callDateTime) %>% 
  mutate(year = year(ymd_hms(callDateTime)),
          dayoftheweek = weekdays(ymd_hms(callDateTime), abbreviate = TRUE)) %>%
  group_by(year, dayoftheweek) %>%
  summarise(n= length(callDateTime), .groups = 'keep')%>%
  data.frame()
  

days_df$year <- as.factor(days_df$year)
day_order <- factor(days_df$dayoftheweek, level = c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'))



mylevels <-c('Mon','Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')
days_df$dayoftheweek <- factor(days_df$dayoftheweek, levels = mylevels)
# Legend breaks at every 25,000 calls, from 0 up to the largest count
breaks <- seq(0, max(days_df$n), by = 25000)


ggplot(days_df, aes(x = year, y =dayoftheweek, fill =n)) +
  geom_tile(color = "black") + 
  geom_text(aes(label= comma(n)), size = 2) +
  coord_equal(ratio=1) + 
  labs(title = "Heatmap: Call count by day of the week", 
       x = "Year", 
       y = "Days of the week",
       fill = "Call Count") +
  theme_minimal() + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_discrete(limits = rev(levels(days_df$dayoftheweek))) +
  scale_fill_continuous(low="white",high="red", breaks = breaks) +
  guides(fill = guide_legend(reverse = TRUE, override.aes=list(colour = "black")))
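The factor step matters because abbreviated day names sort alphabetically by default, which would scramble the rows of the heat map. A small sketch with made-up dates shows the effect. It derives the day from the `wday` field of `as.POSIXlt()` instead of `weekdays()`, a choice made here only so the labels do not depend on the system locale; `dates`, `day_labels`, and `toy_levels` are hypothetical names.

```r
# Made-up dates: 2023-01-02 is a Monday
dates <- as.Date(c("2023-01-02", "2023-01-03", "2023-01-07",
                   "2023-01-08", "2023-01-09"))

# as.POSIXlt()$wday counts from 0 = Sunday, so shift by one to index the labels
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
dow <- day_labels[as.POSIXlt(dates)$wday + 1]

# Without explicit levels, the days come out in alphabetical order
alphabetical <- levels(factor(dow))

# With explicit levels, the days follow the calendar week
toy_levels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
dow_ordered <- factor(dow, levels = toy_levels)
counts <- table(dow_ordered)

print(alphabetical)
print(counts)
```

Setting the levels also keeps days with no observations (Wed, Thu, Fri in this sample) in the table with a count of zero.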

Tab 3

For the third graph, I used a trellis of pie charts, one per year, showing the priority level of the 911 calls. I compared Medium and Non-Emergency calls against all remaining priorities, which are grouped as “other”. As the years go by, a larger share of 911 calls is classified as “Non-Emergency”. This could help the police department find a way to route these types of calls to another department.

library(plotly)
top_priority <- count(baltimore911, priority)
top_priority <- top_priority[order(-top_priority$n), ]


priority_df <- baltimore911 %>%
  select(priority, callDateTime) %>%
  mutate (year= year(ymd_hms(callDateTime)), 
          myPriority = ifelse(priority == "Non-Emergency", "Non-Emergency", ifelse(priority == "Medium", "Medium", "other"))) %>%
  group_by(year, myPriority) %>%
  summarise(n=length(myPriority), .groups = 'keep')%>%
  group_by(year)%>%
  mutate(percent_of_total = round(100*n/sum(n), 1))%>%
  ungroup() %>%
  data.frame()




ggplot(data = priority_df, aes(x="", y=n, fill = myPriority)) + 
  geom_bar(stat = "identity", position = "fill") +
  coord_polar(theta = "y", start = 0) +
  labs(fill= "priority", x =NULL, y = NULL, title = "Count of calls by Year and by Priority")+
  theme_light() + 
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_blank(), 
        axis.ticks = element_blank(), 
        panel.grid = element_blank()) +
  facet_wrap(~year, ncol=3, nrow=3) +
  scale_fill_brewer(palette = "Reds") +
  geom_text(aes(x=1.7, label = paste0(percent_of_total, "%")), 
            size=2,
            position = position_fill(vjust = 0.5))
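The percentage labels on each pie are shares within a single year, not shares of the whole dataset. A minimal base R sketch of that calculation on made-up rows follows; the priority values "High" and "Low" and the names `toy` and `counts` are assumptions for illustration only.

```r
# Made-up calls: four in 2017, five in 2018
toy <- data.frame(
  year = c(2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018, 2018),
  priority = c("Non-Emergency", "Medium", "High", "Low",
               "Non-Emergency", "Non-Emergency", "Medium", "High",
               "Non-Emergency"),
  stringsAsFactors = FALSE
)

# Keep the two priorities of interest and lump everything else into "other"
toy$myPriority <- ifelse(toy$priority %in% c("Non-Emergency", "Medium"),
                         toy$priority, "other")

# Count calls per year and priority group
counts <- as.data.frame(table(year = toy$year, myPriority = toy$myPriority),
                        responseName = "n")

# Divide each count by the total for its own year, so every year sums to 100
counts$percent_of_total <- round(100 * counts$n /
                                   ave(counts$n, counts$year, FUN = sum), 1)
print(counts)
```

Because each value is rounded separately, the percentages in one year of real data can add up to slightly more or less than 100.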

Tab 4

For my fourth graph, I used a bar graph to display call counts by call type in the top 5 neighborhoods of Baltimore. Baltimore has many neighborhoods, so I ended up picking the ones that had the most calls. The most common call types across the 5 neighborhoods were business checks and directed patrols. This graph uses data from all years, 2017 through 2023. The chart is also interactive: hovering over a bar shows the call count, the neighborhood, and the call type, and clicking a square in the legend on the right hides that bar.

library(plotly)

df_Reason_for_call <- baltimore911 %>%
  filter(grepl("^[A-Za-z]", description)) %>%  
  count(description, sort = TRUE)              



df_Reason_for_call <- df_Reason_for_call[order(df_Reason_for_call$n, decreasing = TRUE),]

top_reasons <- df_Reason_for_call$description[1:10]


top_neighborhoods <-count(baltimore911, Neighborhood) 


top_neighborhoods <- top_neighborhoods[order(-top_neighborhoods$n), ]

top5neighborhoods<- top_neighborhoods %>%
  filter(Neighborhood %in% c("Downtown",
                             "Sandtown-Winchester",
                             "Brooklyn",
                             "Frankford",
                             "Belair-Edison")) %>%
  mutate(prop = n / sum(n))
heat_df <- baltimore911 %>%
  filter(
    Neighborhood %in% top5neighborhoods$Neighborhood,
    description %in% top_reasons) %>%
  group_by(Neighborhood, description) %>%
  summarise(n = n(), .groups = "drop") %>%  # pipe into data.frame() so it converts heat_df
  data.frame()

p <- ggplot(heat_df, aes(x = Neighborhood, y = n, fill = description, text = paste0( "Neighborhood: ", Neighborhood, "<br>", "Call Type: ", description, "<br>", "Count: ", n ))) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  labs(title = "Call Types in Top 5 Neighborhoods",
       x = "Neighborhood",
       y = "Call Count",
       fill = "Call Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_brewer(palette = "Paired")
ggplotly(p, tooltip = "text")
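The five neighborhoods in the filter are written out by hand. If the list should follow the data instead, the top entries can be read off the sorted counts. This is a minimal base R sketch on made-up rows, not the selection used for the chart; `calls`, `top_n`, and `top_toy` are hypothetical names and the counts are invented.

```r
# Made-up neighborhood column, including one missing value
calls <- c(rep("Downtown", 5), rep("Brooklyn", 3), rep("Frankford", 2),
           "Belair-Edison", NA)

# table() drops NA by default; sort puts the busiest neighborhoods first
neighborhood_counts <- sort(table(calls), decreasing = TRUE)

# Take the names of the first top_n entries
top_n <- 2
top_toy <- names(neighborhood_counts)[seq_len(top_n)]
print(top_toy)
```

Empty strings are not treated as missing, so blank neighborhood values would need to be filtered out before counting.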

Tab 5

For my last graph, I picked a multiple line chart to illustrate the trend in call counts over the years. The number of 911 calls increases sharply from 2017 to 2018. However, the weekly pattern stays the same in every year: most 911 calls come from Tuesday to Friday, and volume starts to drop on the weekends. Since the weekend is a rest day for most people, the chance of someone going out and getting hurt or needing assistance may be a little lower than on weekdays.

day_dfbaltimore <- baltimore911 %>% 
  select(callDateTime) %>%
  mutate(year = year(ymd_hms(callDateTime)),
         dayoftheweek = weekdays(ymd_hms(callDateTime), abbreviate = TRUE)) %>%
  group_by(year, dayoftheweek) %>%
  summarise(n= length(callDateTime), .groups='keep') %>%
  data.frame()



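# Convert this data frame's own columns to factors, with days in calendar order
day_dfbaltimore$year <- as.factor(day_dfbaltimore$year)
day_dfbaltimore$dayoftheweek <- factor(day_dfbaltimore$dayoftheweek, levels = c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'))
ggplot(day_dfbaltimore, aes(x = dayoftheweek, y=n, group = year)) +
  geom_line(aes(color = year), linewidth = 3) +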
  labs(title = "Calls by Day and By Year", x = "Days of the week", y = "Call Count") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_point(shape = 21, size = 5, color = "black", fill = "white") +
  scale_y_continuous(labels = comma)
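The weekday versus weekend comparison can also be put into numbers by averaging the daily counts in each group. This is a sketch with invented counts, not figures from the dataset; `toy_days`, `day_type`, `avg_calls`, and `pct_drop` are hypothetical names.

```r
# Invented call counts for one year, one row per day of the week
toy_days <- data.frame(
  dayoftheweek = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
  n = c(100, 120, 125, 122, 118, 90, 80),
  stringsAsFactors = FALSE
)

# Label each day as weekday or weekend
toy_days$day_type <- ifelse(toy_days$dayoftheweek %in% c("Sat", "Sun"),
                            "weekend", "weekday")

# Average calls per day within each group
avg_calls <- tapply(toy_days$n, toy_days$day_type, mean)

# Relative drop from the weekday average to the weekend average, in percent
pct_drop <- round(100 * (1 - avg_calls[["weekend"]] / avg_calls[["weekday"]]), 1)
print(avg_calls)
print(pct_drop)
```

Averaging per day, rather than summing, keeps the comparison fair even though there are five weekdays and only two weekend days.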

Conclusion

Overall, the analysis of the Baltimore 911 dataset reveals meaningful patterns in emergency call activity across time and call type. The data shows that certain hours of the day experience higher call volumes, particularly the late evening, while other times, such as late morning, tend to have fewer calls. Additionally, patterns across days of the week show that demand is higher on weekdays and drops on weekends. Even though I expected 911 calls to decrease during 2020, my findings show that call counts in 2020 were still higher than in previous years. These findings help show when emergency services may experience the greatest demand and where resource allocation may be most critical. By understanding these trends, city officials can make more informed decisions regarding staffing, dispatch planning, and public safety procedures.