This analysis focuses on exploring and interpreting the 911 call log dataset from Baltimore City to uncover key trends and insights about the city’s emergency response system. By examining factors such as call frequency, urgency, and geographic distribution, the analysis aims to identify patterns in the types of incidents reported, response times, and how effectively emergency services are allocated across neighborhoods and districts. Additionally, the data’s integrity and completeness will be assessed to ensure accurate conclusions can be drawn. The goal of this analysis is to provide a deeper understanding of Baltimore’s 911 system, highlighting areas of strength and potential opportunities for improvement in service delivery.
The dataset is a 911 call log from Baltimore City, capturing detailed information about emergency calls made to the city’s response system. It includes columns such as callKey (unique identifier), callDateTime (date and time of the call), and priority (call urgency), which provide a snapshot of when and how urgent the calls were. Additional fields like district, Neighborhood, and PoliceDistrict offer geographic context, enabling analysis of call distribution across different areas of the city. The dataset also contains incident-specific details in the description column, while administrative fields like NeedsSync, IsDeleted, and HashedRecord are used to track data integrity. This dataset allows for the examination of emergency response patterns, call frequency, and service efficiency within Baltimore.
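One way to start the completeness check is to count blank or missing entries in each column. The sketch below is only illustrative: `calls_sample` and its values are hypothetical stand-ins rather than rows from the real file, and it borrows three of the column names described above. It uses base R so it runs on its own.

```r
# Hypothetical sample rows; the real data would come from the CSV file.
calls_sample <- data.frame(
  callKey      = c("A1", "A2", "A3", "A4"),
  priority     = c("Low", "", "High", "Non-Emergency"),
  Neighborhood = c("Downtown", "", "", "Canton"),
  stringsAsFactors = FALSE
)

# Count entries that are NA or empty strings in each column.
blank_counts <- sapply(calls_sample, function(col) sum(is.na(col) | col == ""))
print(blank_counts)
```

Checking for empty strings as well as `NA` matters here because the counts shown later list blank values as `""` rather than as missing.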
The analysis of the 911 call log dataset uncovered several key insights. Just over half of the calls (538,542 of 1,048,575) were categorized as Non-Emergency, indicating that many incidents did not require immediate attention. The Northeastern police district recorded the highest call volume, with most calls in this district also being Non-Emergency, suggesting a high frequency of lower-priority incidents. Among named neighborhoods, Downtown had the highest number of reported calls (38,666), marking it as a significant area of emergency activity. The analysis also identified 4:00 PM (the 16:00 hour, with 59,880 calls) as the peak hour for call volume, a pattern that could inform staffing and resource deployment during high-traffic times. These findings point to opportunities for optimizing emergency response efforts across the city.
For my first visualization, I created a bar chart of 2023 call volume by neighborhood. The neighborhoods are listed on the x-axis and the call counts on the y-axis. The first row of the count table has a blank neighborhood name (10,547 calls with no neighborhood recorded), so it is left out of the chart.
library(data.table)
library(ggplot2)
library(dplyr)
library(lubridate)
library(scales)
library(ggthemes)
library(RColorBrewer)
library(plotly)
library(ggrepel)
filename <- "911_Calls_For_Service_2023.csv"
df <- fread(filename)
count(df, Neighborhood)
## Neighborhood n
## <char> <int>
## 1: 10547
## 2: Abell 3585
## 3: Allendale 3687
## 4: Arcadia 1013
## 5: Arlington 5920
## ---
## 275: Wrenlane 560
## 276: Wyman Park 2265
## 277: Wyndhurst 909
## 278: Yale Heights 1497
## 279: York-Homeland 1133
neighborhoodcount <- data.frame(count(df, Neighborhood))
neighborhoodcount
## Neighborhood n
## 1 10547
## 2 Abell 3585
## 3 Allendale 3687
## 4 Arcadia 1013
## 5 Arlington 5920
## 6 Armistead Gardens 3280
## 7 Ashburton 4137
## 8 Baltimore Highlands 7735
## 9 Barclay 5791
## 10 Barre Circle 580
## 11 Bayview 3664
## 12 Beechfield 1895
## 13 Belair-Edison 24781
## 14 Belair-Parkside 562
## 15 Bellona-Gittings 81
## 16 Belvedere 1293
## 17 Berea 8113
## 18 Better Waverly 7421
## 19 Beverly Hills 443
## 20 Biddle Street 2262
## 21 Blythewood 65
## 22 Bolton Hill 5124
## 23 Boyd-Booth 1019
## 24 Brewers Hill 2335
## 25 Bridgeview/Greenlawn 3905
## 26 Broadway East 10593
## 27 Broening Manor 2295
## 28 Brooklyn 26892
## 29 Burleith-Leighton 749
## 30 Butcher's Hill 1308
## 31 CARE 2838
## 32 Callaway-Garrison 2574
## 33 Cameron Village 1217
## 34 Canton 9368
## 35 Canton Industrial Area 5510
## 36 Carroll - Camden Industrial Area 6260
## 37 Carroll Park 1187
## 38 Carroll-South Hilton 1872
## 39 Carrollton Ridge 10268
## 40 Cedarcroft 1494
## 41 Cedmont 2389
## 42 Cedonia 2292
## 43 Central Forest Park 2585
## 44 Central Park Heights 15902
## 45 Charles North 11914
## 46 Charles Village 6401
## 47 Cherry Hill 16263
## 48 Cheswolde 3217
## 49 Chinquapin Park 1742
## 50 Clifton Park 3431
## 51 Coldspring 2657
## 52 Coldstream Homestead Montebello 9690
## 53 Concerned Citizens Of Forest Park 1449
## 54 Coppin Heights/Ash-Co-East 2612
## 55 Cross Country 2868
## 56 Cross Keys 867
## 57 Curtis Bay 8934
## 58 Curtis Bay Industrial Area 238
## 59 Cylburn 1918
## 60 Darley Park 1844
## 61 Dickeyville 275
## 62 Dolfield 5258
## 63 Dorchester 1990
## 64 Downtown 38666
## 65 Downtown West 3422
## 66 Druid Heights 5704
## 67 Druid Hill Park 1901
## 68 Dunbar-Broadway 5666
## 69 Dundalk Marine Terminal 47
## 70 East Arlington 1230
## 71 East Baltimore Midway 13243
## 72 Easterwood 1896
## 73 Eastwood 418
## 74 Edgewood 3446
## 75 Edmondson Village 2265
## 76 Ednor Gardens-Lakeside 4768
## 77 Ellwood Park/Monument 7075
## 78 Evergreen 1827
## 79 Evergreen Lawn 780
## 80 Evesham Park 300
## 81 Fairfield Area 4562
## 82 Fairmont 804
## 83 Fallstaff 4160
## 84 Federal Hill 5708
## 85 Fells Point 9818
## 86 Forest Park 2899
## 87 Forest Park Golf Course 142
## 88 Four By Four 1993
## 89 Frankford 26452
## 90 Franklin Square 5405
## 91 Franklintown 2483
## 92 Franklintown Road 3649
## 93 Garwyn Oaks 1209
## 94 Gay Street 5311
## 95 Glen 9885
## 96 Glen Oaks 5437
## 97 Glenham-Belhar 5213
## 98 Graceland Park 4828
## 99 Greektown 3769
## 100 Greenmount Cemetery 143
## 101 Greenmount West 2478
## 102 Greenspring 3156
## 103 Grove Park 1156
## 104 Guilford 2215
## 105 Gwynns Falls 3239
## 106 Gwynns Falls/Leakin Park 451
## 107 Hamilton Hills 8750
## 108 Hampden 9192
## 109 Hanlon-Longwood 2766
## 110 Harlem Park 5094
## 111 Harwood 3357
## 112 Hawkins Point 915
## 113 Heritage Crossing 1508
## 114 Herring Run Park 259
## 115 Highlandtown 5030
## 116 Hillen 2924
## 117 Hoes Heights 875
## 118 Holabird Industrial Park 636
## 119 Hollins Market 3649
## 120 Homeland 4749
## 121 Hopkins Bayview 3240
## 122 Howard Park 7158
## 123 Hunting Ridge 1430
## 124 Idlewood 2018
## 125 Inner Harbor 10382
## 126 Irvington 8030
## 127 Johns Hopkins Homewood 647
## 128 Johnston Square 4576
## 129 Jones Falls Area 539
## 130 Jonestown 3303
## 131 Kenilworth Park 784
## 132 Kernewood 1562
## 133 Keswick 288
## 134 Kresson 1230
## 135 Lake Evesham 221
## 136 Lake Walker 1094
## 137 Lakeland 6921
## 138 Langston Hughes 1567
## 139 Lauraville 3255
## 140 Levindale 2793
## 141 Liberty Square 4324
## 142 Little Italy 1342
## 143 Loch Raven 5540
## 144 Locust Point 631
## 145 Locust Point Industrial Area 1839
## 146 Lower Edmondson Village 828
## 147 Lower Herring Run Park 208
## 148 Loyola/Notre Dame 443
## 149 Lucille Park 820
## 150 Madison Park 3625
## 151 Madison-Eastend 2650
## 152 Mayfield 759
## 153 McElderry Park 10082
## 154 Medfield 2680
## 155 Medford 2309
## 156 Mid-Govans 2048
## 157 Mid-Town Belvedere 9221
## 158 Middle Branch/Reedbird Parks 3053
## 159 Middle East 4155
## 160 Midtown-Edmondson 3810
## 161 Millhill 5518
## 162 Milton-Montford 3557
## 163 Mondawmin 7377
## 164 Montebello 660
## 165 Moravia-Walther 641
## 166 Morgan Park 129
## 167 Morgan State University 1645
## 168 Morrell Park 11804
## 169 Mosher 3262
## 170 Mount Holly 3091
## 171 Mount Vernon 7196
## 172 Mount Washington 1496
## 173 Mount Winans 958
## 174 Mt Pleasant Park 154
## 175 New Northwood 6272
## 176 New Southwest/Mount Clare 5556
## 177 North Harford Road 3930
## 178 North Roland Park/Poplar Hill 669
## 179 Northwest Community Action 3498
## 180 O'Donnell Heights 1700
## 181 Oakenshawe 1698
## 182 Oaklee 402
## 183 Old Goucher 4125
## 184 Oldtown 5288
## 185 Oliver 7610
## 186 Orangeville 3852
## 187 Orangeville Industrial Area 1325
## 188 Orchard Ridge 2063
## 189 Original Northwood 706
## 190 Otterbein 2528
## 191 Overlea 367
## 192 Panway/Braddish Avenue 999
## 193 Park Circle 4460
## 194 Parklane 3586
## 195 Parkside 1372
## 196 Parkview/Woodbrook 2922
## 197 Patterson Park 908
## 198 Patterson Park Neighborhood 7400
## 199 Patterson Place 1693
## 200 Pen Lucy 3343
## 201 Penn North 10910
## 202 Penn-Fallsway 9119
## 203 Penrose/Fayette Street Outreach 7323
## 204 Perkins Homes 209
## 205 Perring Loch 1674
## 206 Pimlico Good Neighbors 2047
## 207 Pleasant View Gardens 1376
## 208 Poppleton 6226
## 209 Port Covington 457
## 210 Pulaski Industrial Area 8617
## 211 Purnell 423
## 212 Radnor-Winston 2077
## 213 Ramblewood 1883
## 214 Reisterstown Station 10764
## 215 Remington 5285
## 216 Reservoir Hill 6241
## 217 Richnor Springs 1530
## 218 Ridgely's Delight 968
## 219 Riverside 4642
## 220 Rognel Heights 4684
## 221 Roland Park 2516
## 222 Rosebank 1629
## 223 Rosemont 4177
## 224 Rosemont East 2264
## 225 Rosemont Homeowners/Tenants 2125
## 226 Sabina-Mattfeldt 386
## 227 Saint Agnes 2561
## 228 Saint Helena 447
## 229 Saint Josephs 2372
## 230 Saint Paul 144
## 231 Sandtown-Winchester 14392
## 232 Seton Business Park 3106
## 233 Seton Hill 2620
## 234 Sharp-Leadenhall 1689
## 235 Shipley Hill 2615
## 236 South Baltimore 2171
## 237 South Clifton Park 3447
## 238 Spring Garden Industrial Area 437
## 239 Stadium Area 1232
## 240 Stonewood-Pentwood-Winston 751
## 241 Taylor Heights 95
## 242 Ten Hills 1103
## 243 The Orchards 184
## 244 Towanda-Grantley 2070
## 245 Tremont 1086
## 246 Tuscany-Canterbury 651
## 247 Union Square 1746
## 248 University Of Maryland 3692
## 249 Uplands 1225
## 250 Upper Fells Point 2026
## 251 Upton 17754
## 252 Villages Of Homeland 70
## 253 Violetville 3831
## 254 Wakefield 1540
## 255 Walbrook 3816
## 256 Waltherson 4417
## 257 Washington Hill 4895
## 258 Washington Village/Pigtown 10876
## 259 Waverly 2918
## 260 West Arlington 1735
## 261 West Forest Park 1567
## 262 West Hills 1277
## 263 Westfield 3775
## 264 Westgate 1327
## 265 Westport 3152
## 266 Wilhelm Park 1048
## 267 Wilson Park 1206
## 268 Winchester 2067
## 269 Windsor Hills 1008
## 270 Winston-Govans 2183
## 271 Woodberry 1160
## 272 Woodbourne Heights 1612
## 273 Woodbourne-McCabe 1441
## 274 Woodmere 9635
## 275 Wrenlane 560
## 276 Wyman Park 2265
## 277 Wyndhurst 909
## 278 Yale Heights 1497
## 279 York-Homeland 1133
head(neighborhoodcount, 11)
## Neighborhood n
## 1 10547
## 2 Abell 3585
## 3 Allendale 3687
## 4 Arcadia 1013
## 5 Arlington 5920
## 6 Armistead Gardens 3280
## 7 Ashburton 4137
## 8 Baltimore Highlands 7735
## 9 Barclay 5791
## 10 Barre Circle 580
## 11 Bayview 3664
str(neighborhoodcount)
## 'data.frame': 279 obs. of 2 variables:
## $ Neighborhood: chr "" "Abell" "Allendale" "Arcadia" ...
## $ n : int 10547 3585 3687 1013 5920 3280 4137 7735 5791 580 ...
neighborhoodcount$n <- as.numeric(neighborhoodcount$n)
str(neighborhoodcount)
## 'data.frame': 279 obs. of 2 variables:
## $ Neighborhood: chr "" "Abell" "Allendale" "Arcadia" ...
## $ n : num 10547 3585 3687 1013 5920 ...
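# Sort by call count so the chart shows the ten busiest neighborhoods;
# rows 2:11 of the unsorted table are only the first ten names alphabetically.
# The blank Neighborhood row (no neighborhood recorded) is dropped.
top10 <- neighborhoodcount %>% filter(Neighborhood != "") %>% arrange(desc(n)) %>% head(10)
ggplot(top10, aes(x=reorder(Neighborhood, -n), y=n))+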
geom_bar(colour="darkblue", fill="lightblue", stat="identity")+
labs(title = "Number of Calls by Neighborhood (Top 10)", x="Neighborhood", y="Call Count")+
theme(plot.title = element_text(hjust = 0.5))
This chart visualizes the distribution of call counts across the hours of the day. Call volume falls through the early morning and reaches its lowest point in the 6:00 hour, with 18,520 calls. It then climbs through the morning, dips in the early afternoon (42,869 calls in the 14:00 hour), and peaks in the 16:00 hour (4 PM) with 59,880 calls before tapering off through the evening. The chart uses a line with markers to connect the data points, with the highest and lowest values highlighted with solid purple markers and labeled.
hours_df <- df %>%
select(callDateTime) %>%
mutate(hour24 = hour(ymd_hms(callDateTime))) %>%
group_by(hour24) %>%
summarise(n=length(callDateTime), .groups = 'keep') %>%
data.frame()
hours_df
## hour24 n
## 1 0 48924
## 2 1 46811
## 3 2 40372
## 4 3 33397
## 5 4 27984
## 6 5 23121
## 7 6 18520
## 8 7 27301
## 9 8 40408
## 10 9 45549
## 11 10 48266
## 12 11 50172
## 13 12 49708
## 14 13 47333
## 15 14 42869
## 16 15 50389
## 17 16 59880
## 18 17 59689
## 19 18 57498
## 20 19 55103
## 21 20 52437
## 22 21 46748
## 23 22 35277
## 24 23 40819
str(hours_df)
## 'data.frame': 24 obs. of 2 variables:
## $ hour24: int 0 1 2 3 4 5 6 7 8 9 ...
## $ n : int 48924 46811 40372 33397 27984 23121 18520 27301 40408 45549 ...
x_axis_labels = min(hours_df$hour24):max(hours_df$hour24)
x_axis_labels
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
hi_lo <- hours_df %>%
filter(n==min(n)|n==max(n)) %>%
data.frame()
hi_lo
## hour24 n
## 1 6 18520
## 2 16 59880
ggplot(hours_df, aes(x = hour24, y = n)) +
geom_line(color = 'black', linewidth = 1) +
geom_point(shape = 21, size = 4, color = 'purple', fill = 'white') +
labs(x = "Hour", y = "Call Count", title = "Calls by Hour", caption = "Source: Baltimore City Website: baltimorecity.gov") +
scale_y_continuous(labels = comma) +
theme_light() +
theme(plot.title = element_text(hjust = 0.5)) +
scale_x_continuous(labels = x_axis_labels, breaks = x_axis_labels, minor_breaks = NULL) +
geom_point(data = hi_lo, aes(x = hour24, y = n), shape = 21, size = 4, fill = 'purple', color = 'purple') +
geom_label_repel(
aes(label = ifelse(hour24 %in% hi_lo$hour24, scales::comma(n), "")),
box.padding = 0.5,
point.padding = 0.3,
size = 4,
color = 'Grey50',
segment.color = 'pink'
)
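The peak and trough can be cross-checked directly from the hourly table. This sketch retypes the counts printed above into a new data frame (the name `hourly` is introduced here for illustration only) and uses base R's `which.max()` and `which.min()`.

```r
# Hourly call counts copied from the hours_df output above (hours 0 to 23).
hourly <- data.frame(
  hour24 = 0:23,
  n = c(48924, 46811, 40372, 33397, 27984, 23121, 18520, 27301,
        40408, 45549, 48266, 50172, 49708, 47333, 42869, 50389,
        59880, 59689, 57498, 55103, 52437, 46748, 35277, 40819)
)

# Hour with the most calls and hour with the fewest calls.
peak_hour  <- hourly$hour24[which.max(hourly$n)]   # 16
quiet_hour <- hourly$hour24[which.min(hourly$n)]   # 6
total_calls <- sum(hourly$n)                       # 1048575

print(c(peak = peak_hour, quiet = quiet_hour, total = total_calls))
```

Each hour value covers the full hour, so 16 means calls received from 16:00 through 16:59.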
This pie chart displays the distribution of 911 call priorities, with each slice representing a different priority level. The largest slice, colored yellow, corresponds to “Non-Emergency” calls (538,542 calls, just over half of the total). “Low,” “Medium,” and “High” make up the other visible slices. “Emergency,” “Out of Service,” and the four calls with no recorded priority each account for far less than 5% of the total, so their slices are too thin to see and are not labeled; they appear only in the legend.
priority_count <- df %>%
group_by(priority) %>%
summarise(count = n()) %>%
arrange(desc(count))
priority_count
## # A tibble: 7 × 2
## priority count
## <chr> <int>
## 1 "Non-Emergency" 538542
## 2 "Low" 263178
## 3 "Medium" 190851
## 4 "High" 55757
## 5 "Emergency" 150
## 6 "Out of Service" 93
## 7 "" 4
priority_count$priority <- factor(priority_count$priority, levels = priority_count$priority)
priority_count
## # A tibble: 7 × 2
## priority count
## <fct> <int>
## 1 "Non-Emergency" 538542
## 2 "Low" 263178
## 3 "Medium" 190851
## 4 "High" 55757
## 5 "Emergency" 150
## 6 "Out of Service" 93
## 7 "" 4
priority_count$priority
## [1] Non-Emergency Low Medium High Emergency
## [6] Out of Service
## Levels: Non-Emergency Low Medium High Emergency Out of Service
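# Recount with data.table for plotting. This replaces the factor-ordered tibble
# built above, so priority is a character column again and ggplot orders the
# legend alphabetically.
priority_count <- df[, .N, by = priority]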
priority_count
## priority N
## <char> <int>
## 1: Non-Emergency 538542
## 2: Low 263178
## 3: Medium 190851
## 4: High 55757
## 5: Out of Service 93
## 6: Emergency 150
## 7: 4
ggplot(priority_count, aes(x = "", y = N, fill = priority)) +
geom_bar(stat = "identity", width = 1) +
coord_polar(theta = "y") +
labs(title = "911 Calls Distribution by Priority") +
theme_minimal() +
theme(axis.text.x = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank()) +
theme(plot.title = element_text(hjust = 0.5))+
scale_fill_brewer(palette = "Set2")
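Because the thin slices are hard to read off a pie chart, it may help to compute each priority's share of the total directly. The sketch below retypes the counts printed above into a named vector (`counts`, `shares`, and the `"(blank)"` label are names introduced here, not part of the original analysis) and uses base R's `prop.table()`.

```r
# Priority counts copied from the priority_count output above.
counts <- c("Non-Emergency" = 538542, "Low" = 263178, "Medium" = 190851,
            "High" = 55757, "Emergency" = 150, "Out of Service" = 93,
            "(blank)" = 4)

# Share of all calls, in percent, rounded to two decimals.
shares <- round(100 * prop.table(counts), 2)
print(shares)

# Categories that fall under the 5% threshold.
small <- names(shares)[shares < 5]
print(small)
```

The counts sum to 1,048,575. That is exactly the number of data rows a spreadsheet such as Excel can hold beneath a header row (1,048,576 rows in total), so it may be worth checking whether the file was truncated on export; nothing in the output shown here confirms that either way.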
This heatmap visualization illustrates the volume of 911 calls across different police districts in Baltimore City, highlighting the predominant call priority in each district. It provides a clear view of where emergency calls are concentrated and allows for quick identification of the most common priorities in each area, offering valuable insights into how resources may be allocated based on call volume and urgency.
district_priority_counts <- df %>%
group_by(PoliceDistrict, priority) %>%
summarise(call_count = n(), .groups = "drop")
district_priority_counts
## # A tibble: 61 × 3
## PoliceDistrict priority call_count
## <chr> <chr> <int>
## 1 "" Emergency 34
## 2 "" High 28
## 3 "" Low 135
## 4 "" Medium 92
## 5 "" Non-Emergency 10142
## 6 "Central" Emergency 18
## 7 "Central" High 6493
## 8 "Central" Low 39880
## 9 "Central" Medium 24173
## 10 "Central" Non-Emergency 47923
## # ℹ 51 more rows
mylevels <- c('Emergency', 'High', 'Medium', 'Low', 'Non-Emergency', 'Out of Service')
district_priority_counts$priority <- factor(district_priority_counts$priority, levels = mylevels)
ggplot(district_priority_counts, aes(x = PoliceDistrict, y = priority, fill = call_count)) +
geom_tile(color = "white")+
geom_text(aes(label = comma(call_count)), size = 3)+
coord_equal(ratio = 1)+
scale_fill_gradient(low = "yellow", high = "red") +
labs(title = "911 Calls by Police District and Priority",
x = "Police District",
y = "Priority Level",
fill = "Number of Calls") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, size = 14),
axis.text.x = element_text(angle = 30, hjust = 1, size = 10))
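The heatmap conveys the predominant priority through color, but it can also be computed. A minimal sketch follows, using only the ten rows printed above, so it covers just the blank district (relabelled `"(blank)"` here) and Central. The names `dpc` and `top_priority` are introduced for illustration, and the approach is base R rather than the dplyr pipeline used in the analysis.

```r
# Rows copied from the district_priority_counts output above; the remaining
# districts are in the part of the tibble that was not printed.
dpc <- data.frame(
  PoliceDistrict = rep(c("(blank)", "Central"), each = 5),
  priority = rep(c("Emergency", "High", "Low", "Medium", "Non-Emergency"), times = 2),
  call_count = c(34, 28, 135, 92, 10142, 18, 6493, 39880, 24173, 47923),
  stringsAsFactors = FALSE
)

# For each district, keep the row with the largest call count.
top_priority <- do.call(
  rbind,
  lapply(split(dpc, dpc$PoliceDistrict), function(d) d[which.max(d$call_count), ])
)
rownames(top_priority) <- NULL
print(top_priority)
```

In both districts shown, Non-Emergency is the most frequent priority.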
This visualization is a stacked bar chart that shows the total number of 911 calls by district and call priority. Each bar represents one value of the district column, the bars are ordered by total call volume, and each bar is split into segments by priority. This allows a direct comparison of both overall volume and the mix of priorities in each district, helping to identify patterns and areas that may need focused emergency response.
district_priority_counts <- df %>%
group_by(district, priority) %>%
summarise(call_count = n(), .groups = "drop")
district_priority_counts
## # A tibble: 70 × 3
## district priority call_count
## <chr> <chr> <int>
## 1 CD Emergency 19
## 2 CD High 6352
## 3 CD Low 34167
## 4 CD Medium 24103
## 5 CD Non-Emergency 42940
## 6 CD Out of Service 22
## 7 CW Emergency 1
## 8 CW High 39
## 9 CW Low 73
## 10 CW Medium 27
## # ℹ 60 more rows
ggplot(district_priority_counts, aes(x= reorder(district, call_count, sum), y=call_count, fill=priority))+
geom_bar(stat = "identity", position = position_stack(reverse=TRUE))+
coord_flip()+
labs(title = "911 Calls by District & Priority", x="", y="Call Count", fill = "Call Priority")+
theme_light()+
theme(plot.title = element_text(hjust = 0.5))+
scale_fill_brewer(palette = "Paired", guide = guide_legend(reverse=TRUE))
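The bar ordering comes from `reorder(district, call_count, sum)`, which sorts the district levels by their summed call counts. The sketch below illustrates that behavior on the ten rows printed above. The names `dc`, `totals`, and `ordered_levels` are introduced here, and the CW total is partial because its remaining rows are in the unprinted part of the tibble.

```r
# Rows copied from the output above. CD is complete as printed; CW is partial.
dc <- data.frame(
  district = c(rep("CD", 6), rep("CW", 4)),
  call_count = c(19, 6352, 34167, 24103, 42940, 22, 1, 39, 73, 27)
)

# Total calls per district.
totals <- tapply(dc$call_count, dc$district, sum)
print(totals)

# reorder() returns a factor whose levels run from smallest to largest total.
ordered_levels <- levels(reorder(dc$district, dc$call_count, FUN = sum))
print(ordered_levels)
```

Because the levels run from the smallest total to the largest, `coord_flip()` places the busiest district at the top of the chart.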
In conclusion, the analysis of the 911 call log dataset from Baltimore City, along with the accompanying visualizations, provides valuable insights into the city’s emergency response system. The visualizations reveal key patterns in call distribution, prioritization, and geographic areas with the highest call volumes. Notably, Non-Emergency calls make up just over half of all incidents, and certain districts, such as the Northeastern district, experience the highest volume of calls, many of which are categorized as lower-priority. Downtown stands out as the neighborhood with the highest call count. Additionally, temporal trends, such as the peak in call volume at 4:00 PM, suggest potential areas for improving staffing and resource allocation during high-traffic times. The stacked bar chart further highlights the distribution of call priorities across districts, while the heatmap provides a clear view of call volumes and priority levels by police district. These findings can serve as a basis for optimizing emergency response strategies and improving resource deployment across the city.