Understanding What’s Ahead: Introducing Our Data:

The data used in this analysis comes courtesy of “Open Baltimore”, the go-to source for publicly available data on Baltimore City. For this exploration, we will be looking at a data set that covers the 311 calls logged across the whole of 2024 (January 1st - December 31st). From streetlight repairs and water main breaks to Animal Control and building demolition, it’s all in here.

As a life-long Baltimore City resident, I find it fascinating to see how well (or not) our 311 specialists are achieving what is asked of them. In the following sections we will go through the background of the data itself and then explore a few particular questions about the workings of this service.

Source of the data used in this project: https://data.baltimorecity.gov/datasets/baltimore::311-customer-service-requests-2024/about.

Breakdown of the Variables:

While there are 29 “unique” variables in the data set, fewer than half of them are used (8 of the 29 appear in the visualizations), and this is not without reason. Some of the variables are unneeded as they pertain to internal record keeping for Baltimore City; those being:

  • RowID -> This is maintained syntax from a data set containing all 311 Calls within Baltimore City

  • SRRecordID

  • IsDeleted

  • NeedsSync

  • ESRI-OID

Others are otherwise made redundant by other columns within the same dataset:

  • x corresponds to Longitude

  • y corresponds to Latitude

  • Geolocation is a collection of Latitude and Longitude in a single column format (Latitude, Longitude)

And others remained un-used as they did not find a home in the final visualizations:

  • ServiceRequestNum

  • SRStatus

  • LastActivity

  • LastActivityDate

  • Address

  • ZipCode

  • Neighborhood

  • CouncilDistrict

  • PoliceDistrict

  • PolicePost

That leaves us with those 8 remaining variables that were used in our deeper analysis:

  • SRType: Service Response Type indicates what the nature of the call is such as “Pot Holes”, “Downed Tree”, etc.

  • MethodReceived: How the call was introduced into the system, such as “Phone”, “Email”, etc.

  • CreatedDate: When the alert was made by a user.

  • DueDate: When the project was projected to be due as ordained by Baltimore City.

  • CloseDate: When the project was officially closed/ completed by the responsible parties.

  • Agency: What party was responsible for completing the project.

  • Latitude and Longitude: Self-explanatory, provides the latitudinal and longitudinal coordinates of the alert respectively.

In total we have 1,048,575 rows of data to work with across those 29 columns, so let’s get started with our deeper analysis into the Baltimore City 311 calls for 2024.

Different Plots for the 311 Data Set:

Plot 1: The Top Types of 311 Calls in 2024 (Including a Condensed “Other” Category):

Before getting into the nitty-gritty of what this plot means, it is important to understand the forces at play. In total there are 293 unique Service Response Types in this set, far too many to plot individually and still get a readable picture of the calls made in 2024. To avoid that visual clutter, it made sense to isolate the ten most prevalent call types and relegate the rest to an “Other” group. This choice also takes care of instances where the count (n) of a particular Response Type is incredibly low. “WW-Water Turn Off Seasonal”, for example, had a single instance over the entirety of 2024, making it a drop in the bucket in the grand scheme of things.

For our purposes a bar chart was the best means of conveying the data in question. The plot shows our top 10 types plus the “Other” group, which merges the remaining 283 Response Types. It makes sense that “Other” has the largest number of calls purely from a quantity perspective. While it includes a few types with an incredibly small number of instances, such as removing sandbags or removing broken parking meters (1 instance each), it also encompasses major services like street sweeping, pot-hole repairs, and towing obstructed vehicles (each with more than 5,000 instances). With that in mind let’s turn our attention to the stand-alone cases. “ECC-Information Request”, with 307,133 records, has the highest total for an individual response type. This covers any communication with the Emergency Communication Center (ECC), including those that pertain to the organization/dispatch of emergency services attached to 911. Since this is a major metropolitan area, the number of calls that filter through the ECC would be quite high, which helps explain the massive gap between it and the next-highest individual type, “SW-Rat Rubout Proactive”, with 144,288 responses. Again, since this is a major metropolitan area, there is a large population of rats within the city limits, and they are periodically exterminated to quell their ever-expanding population.

What is perhaps most intriguing about this plot is that (excluding Other) the 1st and 2nd both exceed 100,000 responses by a fair margin while the remaining 8 fall short of 50,000 responses squarely. My guess is that these responses remain the vast majority of calls from year to year since they are fairly static factors in the area.

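# Packages used by the plots in this report
library(dplyr)      # count(), mutate(), group_by(), summarise(), slice()
library(ggplot2)    # plotting
library(scales)     # comma(), comma_format()
library(lubridate)  # mdy_hm(), month()

# df is assumed to already hold the 2024 311 data set
# Count the calls per SRType and sort from most to least frequent
report_type = data.frame(count(df, SRType))
report_type = report_type[order(report_type$n, decreasing = TRUE), ]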

type_percent = report_type %>%
  select(SRType, n) %>%
  mutate(prcnt = (n / sum(n)) * 100) %>%
  group_by(SRType) %>%
  data.frame()

top_type = slice(type_percent, (1:10))

type_grouped = type_percent %>%
  mutate(SRType_grouped = ifelse(SRType %in% top_type$SRType, SRType, "Other"))

type_summary = type_grouped %>%
  group_by(SRType_grouped) %>%
  summarize(n = sum(n)) %>%
  data.frame()

type_summary$SRType_grouped = factor(type_summary$SRType_grouped, levels = type_summary$SRType_grouped[order(type_summary$n, decreasing = TRUE)])

ggplot(type_summary, aes(x = SRType_grouped, y = n, fill = SRType_grouped)) +
  geom_col() +
  geom_text(aes(label = comma(n)), vjust = -0.5) +  
  scale_x_discrete(limits = levels(type_summary$SRType_grouped)) + 
  scale_y_continuous(labels = comma_format()) +
  theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1, size = 7.5)) + 
  labs(title = "Top 10 Baltimore 311 Call Types Across 2024 (Including a Condensed 'Other' Grouping):",
       x = "Call Type", 
       y = "Count",
       caption = "'Other' contains the remaining 283 SRType groups that were not within the Top 10 for 2024.\nThe Total number of calls made was 1048575") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Set3")  # Set3 offers 12 colors, enough for the 11 bars; scale_fill_brewer() has no n argument
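
The "top 10 plus Other" grouping described above can be checked on a small scale. The following is a minimal base R sketch on made-up counts, not the 311 data; the helper name `lump_top_k` and the toy values are hypothetical and only there to show that lumping keeps the overall total intact.

```r
# Toy counts per request type (made-up values, not taken from the 311 data)
toy_counts = data.frame(
  SRType = c("A", "B", "C", "D", "E"),
  n = c(50, 30, 10, 7, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the k most frequent types and merge the rest
lump_top_k = function(counts, k) {
  counts = counts[order(counts$n, decreasing = TRUE), ]
  keep = head(counts$SRType, k)
  counts$group = ifelse(counts$SRType %in% keep, counts$SRType, "Other")
  out = aggregate(n ~ group, data = counts, FUN = sum)
  out[order(out$n, decreasing = TRUE), ]
}

toy_summary = lump_top_k(toy_counts, k = 2)
print(toy_summary)  # A = 50, B = 30, Other = 20

# Lumping must not change the total number of calls
stopifnot(sum(toy_summary$n) == sum(toy_counts$n))
```

The same check (grouped total equals original total) is a cheap way to confirm that no rows were dropped while building the “Other” bar.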

Plot 2: How is Baltimore City Getting Its 311 Calls Logged:

Problems are popping up all over the place, and it takes people to find them and let the appropriate parties know what’s happening so that they can look into fixing them. How are people getting the word out about what needs fixing? Let’s look at the chart below to find out!

Much like the “SRType” plot above, there were too many methods of reception to have all of them represented on the chart. In particular, the methods ‘Chat’, ‘Community’, ‘Email’, ‘Mail’, ‘Twitter’, and ‘Web’ each accounted for < 1% of the logged reports, so it made sense to merge them into an “Other” group of their own.

Taking a look at the plot we can see that “Phone” accounts for the largest portion of all reports logged in 2024 at ~60.19% (rounded to 2 decimal places for readability). Most people carry cellphones at all times, and when they see something worth reporting to 311, so long as the situation is safe, their inclination will be to call the service number. It is important to note that the “Phone” category also includes text messages.

method_received_df = df %>%
  select(MethodReceived) %>%
  group_by(MethodReceived) %>%
  summarise(n = n(), .groups = 'drop') %>%
  mutate(Percentage = (n / sum(n)) * 100) %>%
  mutate(MethodReceived_grouped = ifelse(Percentage < 1, "Other Method", MethodReceived)) %>% 
  group_by(MethodReceived_grouped) %>% 
  summarise(n = sum(n), .groups = 'drop') %>% 
  mutate(Percentage = (n / sum(n)) * 100) %>% 
  mutate(Label = paste0(round(Percentage, 2), "%")) 


ggplot(method_received_df, aes(x = "", y = n, fill = MethodReceived_grouped)) + 
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0) +
  labs(title = "Pie Chart of How 311 Reports are Received in 2024",
       caption = "Methods of Reception that totaled less than 1% were condensed into the 'Other Method' grouping.\n This includes: 'Chat' 'Community' 'Email' 'Mail' 'Twitter' 'Web'") +
  theme(axis.line = element_blank(), axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.caption = element_text(hjust = 0, size = 10)) +
  geom_text(aes(label = Label, y = n),
            position = position_stack(vjust = 0.5),
            color = "black", size = 4)
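
Unlike the top 10 rule used for Plot 1, the grouping here is threshold based: any method under 1% of all reports is merged. A small base R sketch of that rule on invented counts follows; the method labels and numbers are illustrative assumptions, not figures from the data set.

```r
# Toy counts per reception method (invented numbers for illustration only)
toy_methods = data.frame(
  MethodReceived = c("Phone", "App", "Web", "Email", "Mail"),
  n = c(600, 392, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Share of each method, then merge everything below the 1% threshold
toy_methods$pct = toy_methods$n / sum(toy_methods$n) * 100
toy_methods$group = ifelse(toy_methods$pct < 1, "Other Method",
                           toy_methods$MethodReceived)

# Re-aggregate and recompute the shares for the merged groups
grouped = aggregate(n ~ group, data = toy_methods, FUN = sum)
grouped$pct = grouped$n / sum(grouped$n) * 100
print(grouped)

# The shares of the final groups should still add up to 100%
stopifnot(abs(sum(grouped$pct) - 100) < 1e-9)
```

One design point worth keeping in mind: the threshold applies to each method individually, so the merged group can end up above 1% even though every member is below it.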

Plot 3: Where is This All Happening? Using Heatmaps to Plot Report Density:

No two parts of Baltimore City are exactly alike. Some are more residential while others focus on experiences and entertainment, while others still are epicenters of business and development within the state of Maryland. It would make sense then that the number of reports being filed would differ from place to place. Some locations are bound to be magnets for 311 reports while others have them once in a blue moon.

Right from the get-go we can see that this assumption holds true. Around the periphery of the plot (denoted by the white and very light orange tiles), fewer 311 reports are being filed, with counts hovering in the hundreds to low thousands. As we move further inward towards the Inner Harbor - Federal Hill area, we can see that reports are concentrated largely within a region spanning 39.275 to 39.31 degrees latitude and -76.675 to -76.625 degrees longitude.

This region is incredibly popular as a place of entertainment and is home to a substantial number of businesses. In short it is a highly trafficked area within Baltimore, which means that not only are more things likely to occur that would require 311 intervention, but also that there are generally more people who can identify issues and report them to the authorities.

location_df = df %>%
  mutate(Latitude = round(Latitude, 2), Longitude = round(Longitude, 2)) %>%
  filter(!is.na(Latitude) & !is.na(Longitude)) %>% 
  group_by(Latitude, Longitude) %>%
  summarise(n = n(), .groups = "drop") %>%
  data.frame()

location_df_trimmed = location_df %>%
  filter(n > 100)  

ggplot(location_df_trimmed, aes(x = Longitude, y = Latitude, fill = n)) +
  geom_tile(color = "black") +
  geom_text(aes(label = comma(n)), size = 2) +
  labs(x = "Longitude", y = "Latitude", 
       title = "Density of Baltimore 311 Call Locations",
       caption = "Cells with 100 or fewer calls are removed for visual clarity") +
  coord_equal(ratio = 1) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_continuous(low = "white", high = "orange", labels = comma) +
  guides(fill = guide_legend(reverse = TRUE, override.aes = list(colour = "black")))
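
Rounding Latitude and Longitude to 2 decimal places is what turns individual points into heatmap cells. The base R sketch below uses made-up coordinates and a rough conversion of about 111 km per degree of latitude (an approximation, as is the latitude chosen for Baltimore) to show how points fall into cells and roughly how large one cell is on the ground.

```r
# Rough size of a 0.01-degree cell near Baltimore (approximate constants)
km_per_deg_lat = 111                  # roughly 111 km per degree of latitude
lat_baltimore  = 39.29                # approximate latitude, an assumption
km_per_deg_lon = km_per_deg_lat * cos(lat_baltimore * pi / 180)

cell_height_km = 0.01 * km_per_deg_lat   # about 1.1 km north to south
cell_width_km  = 0.01 * km_per_deg_lon   # somewhat under 0.9 km east to west

# Toy points (made up): the first two land in the same rounded cell
toy_points = data.frame(
  Latitude  = c(39.2861, 39.2899, 39.3012),
  Longitude = c(-76.6123, -76.6149, -76.6201)
)
toy_points$lat_bin = round(toy_points$Latitude, 2)
toy_points$lon_bin = round(toy_points$Longitude, 2)
toy_points$n = 1

# Count the points per cell, mirroring group_by() + summarise(n = n())
cell_counts = aggregate(n ~ lat_bin + lon_bin, data = toy_points, FUN = sum)
print(cell_counts)
```

Under these assumptions each tile covers very roughly one square kilometer, so a tile count reflects a neighborhood-sized area and not a single address.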

(Map of Baltimore City Sourced from: https://jfiksel.github.io/2017-02-07-visualizing-life-expectancy-in-baltimore-with-ggmap/)

Plot 4: Individual Agency Contribution by Month in 2024:

As we have seen above, nothing is entirely static when it comes to 311 reporting. So why not take that perspective and apply it directly to the agencies affiliated with 311’s available services? Who is most active during 2024, and when? Let’s find out!

Looking at the Trellis Chart, Solid Waste and the ECC are the top 2 agencies throughout all of 2024, trading back and forth for the largest proportion from month to month. They each hover around the ~30% mark in all 12 months, with the remaining 5 agencies each comprising a much smaller proportion. Thinking back to how the #1 type of 311 report for 2024 was “ECC-Information Request”, it follows that the ECC was on the hook for responding to and cataloging those reports. Likewise, many of the more “populated” report types involve the Solid Waste division, be it for trash removal, illegal dumping transfer, or otherwise.

# Names of the 7 agencies with the most calls
top_agencies = df %>%
  filter(!is.na(Agency)) %>% 
  count(Agency, sort = TRUE) %>%
  slice(1:7) %>%
  pull(Agency) #-> A much better way than initially designed (Re-oriented using Google Gemini example)

# Keep only closed requests handled by those agencies
df_top_agencies = df %>%
  filter(Agency %in% top_agencies, !is.na(CloseDate))

# The pipeline below was re-written using Google Gemini for improved efficiency
trellis_df = df_top_agencies %>%
  select(Agency, CloseDate) %>%
  mutate(month_num = month(mdy_hm(CloseDate))) %>%
  mutate(month_complete = month.abb[month_num]) %>%
  group_by(month_complete, Agency) %>%
  summarise(n = n(), .groups = 'drop') %>%
  group_by(month_complete) %>%
  mutate(percentage = (n / sum(n)) * 100) %>%
  ungroup() %>%
  mutate(month_complete = factor(month_complete, levels = month.abb)) %>%
  mutate(Label = paste0(round(percentage, 2), "%")) %>%
  data.frame()
  
ggplot(trellis_df, aes(x = "", y = percentage, fill = Agency)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +
  facet_wrap(~month_complete, ncol = 6) +
  labs(title = "311 Call Distribution by Top 7 Agencies and Month", fill = "Agency") +
  theme_void() +
  theme(strip.text = element_text(size = 9),
        legend.position = "bottom",
        panel.spacing = unit(0.5, "lines"),
        plot.title = element_text(hjust = 0.5)) +
  geom_text(aes(label = Label, x = 1.6), 
            position = position_stack(vjust = 0.5), size = 3)
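
The per-month percentages behind the trellis chart are row-wise shares: within each closing month, every agency's count is divided by that month's total. A compact base R sketch of the same calculation on a handful of made-up records is shown here; the date strings assume a "month/day/year hour:minute" layout, in line with the `mdy_hm()` call above, and the records themselves are invented.

```r
# Toy closed requests (invented records; date layout is an assumption)
toy_closed = data.frame(
  Agency = c("Solid Waste", "ECC", "ECC", "Solid Waste", "Solid Waste", "ECC"),
  CloseDate = c("1/5/2024 9:30", "1/9/2024 14:00", "1/20/2024 8:15",
                "2/2/2024 10:00", "2/14/2024 16:45", "2/28/2024 11:05"),
  stringsAsFactors = FALSE
)

# Parse the timestamps and extract an ordered month label
closed_at = as.POSIXct(toy_closed$CloseDate, format = "%m/%d/%Y %H:%M", tz = "UTC")
toy_closed$month = factor(month.abb[as.integer(format(closed_at, "%m"))],
                          levels = month.abb[1:2])

# Rows are months, columns are agencies; margin = 1 makes each row sum to 100
shares = prop.table(table(toy_closed$month, toy_closed$Agency), margin = 1) * 100
print(round(shares, 2))

# Every month's shares should add up to 100%
stopifnot(all(abs(rowSums(shares) - 100) < 1e-9))
```

Because the shares are computed within each month, a facet shows which agencies dominated that month and says nothing about how busy the month was overall.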

Plot 5: How Well are Agencies Getting the Job Done?

We just saw that certain agencies are more active from month to month. With an influx of calls that need responding to, it raises the question: are they all getting done on time? Are the 311 reports being resolved in a timely manner, are they being delayed for whatever reason, or are certain agencies more prone to “taking their time” with getting things done?

Save for 3 agencies, those being the ECC (with a staggering 2.11% On Time rate), Public Works, and Parks and Recreation, the remaining 13 agencies complete more than 50% of their tasks on time. The ECC is once again at the extreme with a 97.86% rate of being Past Due (by at most 7 days); all other agencies are below 50% for being Past Due. Now looking at the Incredibly Late group (more than 7 days past the due date), we can see that both Parks and Recreation and Public Works show a propensity for being slower than the other agencies. Public Works had 72.67% of all projects from 2024 fall into the Incredibly Late group, while Parks and Recreation had 35.73%.

We can then infer that those agencies who remain on time with their work tend to maintain that level of punctuality throughout the year while those that take their time tend to keep that trend the whole year.

completion_df = df %>%
  filter(!is.na(Agency), !is.na(DueDate), !is.na(CloseDate)) %>%
  mutate(DueDate = mdy_hm(DueDate), CloseDate = mdy_hm(CloseDate)) %>% 
  mutate(DaysLate = as.numeric(difftime(CloseDate, DueDate, units = "days"))) %>%
  #-----------------------------------------------------------------------------
  #This chunk was generated by Google Gemini to condense my initial iteration of
  #this piece. Much cleaner and more in line with how I visualized this in my head...
  mutate(CompletionStatus = case_when(
    DaysLate <= 0 ~ "On Time",
    DaysLate > 0 & DaysLate <= 7 ~ "Past Due",
    DaysLate > 7 ~ "Incredibly Late")) %>%
  #-----------------------------------------------------------------------------
  group_by(Agency) %>%
  count(Agency, CompletionStatus) %>%
  mutate(Percentage = n / sum(n) * 100) %>%
  ungroup() %>%
  arrange(Agency, CompletionStatus) %>%
  mutate(Agency = factor(Agency, levels = unique(Agency))) %>%
    mutate(CompletionStatus = factor(CompletionStatus, levels = c("On Time", "Past Due", "Incredibly Late"))) %>%
  data.frame()
  
ggplot(completion_df, aes(x = CompletionStatus, y = Percentage, group = Agency, color = Agency)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 3) +
  labs(title = "311 Call Completion Performance by Agency",
       x = "Completion Status",
       y = "Percentage of Calls", color = "Agency",
       caption = "On Time: Completed Date is before or on the Prescribed Due Date \nPast Due: A week or less (7 days) over the due date\nIncredibly Late: Exceeds the 1 Week Grace Period") +
  theme_bw() +
  theme(legend.position = "bottom", plot.title = element_text(hjust = 0.5))
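
The three completion groups come down to where the gap between CloseDate and DueDate falls relative to 0 and 7 days. As a small worked example, the base R sketch below classifies five made-up requests with `cut()`; the dates are invented and the "month/day/year hour:minute" layout is an assumption based on the `mdy_hm()` parsing used above.

```r
# Toy requests that all share one due date (invented values)
toy_requests = data.frame(
  DueDate   = rep("3/10/2024 12:00", 5),
  CloseDate = c("3/9/2024 12:00", "3/10/2024 12:00", "3/13/2024 12:00",
                "3/17/2024 12:00", "3/25/2024 12:00"),
  stringsAsFactors = FALSE
)

# Parse both timestamps in UTC so daylight saving shifts cannot skew the gap
due    = as.POSIXct(toy_requests$DueDate,   format = "%m/%d/%Y %H:%M", tz = "UTC")
closed = as.POSIXct(toy_requests$CloseDate, format = "%m/%d/%Y %H:%M", tz = "UTC")
days_late = as.numeric(difftime(closed, due, units = "days"))

# Right-closed intervals: (-Inf, 0] On Time, (0, 7] Past Due, (7, Inf) Incredibly Late
status = cut(days_late,
             breaks = c(-Inf, 0, 7, Inf),
             labels = c("On Time", "Past Due", "Incredibly Late"))

print(data.frame(days_late, status))
# days_late: -1, 0, 3, 7, 15
# status:    On Time, On Time, Past Due, Past Due, Incredibly Late
```

The boundary cases matter here: a request closed exactly on its due date counts as On Time, and one closed exactly 7 days late still counts as Past Due, matching the `case_when()` conditions above.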

Conclusion:

There are a lot of factors at play in a large city like Baltimore, and in particular there are a lot of moving parts to resolving the many thousands of issues that pop up every single day. It takes everyone working together to make the necessary parties aware of what may be going wrong in their respective communities.

As a fledgling data scientist, I have a new-found appreciation for the Baltimore 311 Team. Things may seem like they are progressing rather slowly as I go from place to place, but the data doesn’t lie. I look forward to replicating this once 2025 is over to see how much has changed since 2024.