Baltimore has a crime problem. According to WorldAtlas, the city has the third-highest violent crime rate in the United States.
Robbery, assault, and murder have been identified as key problems in the city.
This suggests Baltimore crime data may provide some valuable insight for analysis.
Data visualization is an efficient way to tell a story about Baltimore’s crime.
Baltimore Crime data was accessed from https://opendata.arcgis.com/.
Due to lack of data before 2014 and incomplete data for 2021, only 2014-2020 data was used for analysis.
After removal of redundant ID columns, the dataset contains 15 variables. Only a few of these were utilized for analysis.
Year, month, and hour columns were created using the crime date/time variable.
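The code that derived these columns is not shown, so the following is only a minimal sketch of one way to do it in base R. The toy data frame `crime_demo` and the timestamp format string are assumptions for illustration; the real `CrimeDateTime` values may be formatted differently.

```r
# Toy stand-in for the raw data (hypothetical rows, not from the real dataset).
crime_demo <- data.frame(
  CrimeDateTime = c("2013-12-31 23:10:00",
                    "2019-06-14 18:30:00",
                    "2021-01-02 04:05:00"),
  stringsAsFactors = FALSE
)

# Parse the character timestamps. The format string is an assumption.
dt <- as.POSIXct(crime_demo$CrimeDateTime,
                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Derive the time columns. %b and %a follow the session locale.
crime_demo$year      <- as.integer(format(dt, "%Y"))
crime_demo$month     <- format(dt, "%b")
crime_demo$hour      <- as.integer(format(dt, "%H"))
crime_demo$dayOfWeek <- format(dt, "%a")

# Keep only the complete years used in the analysis (2014-2020),
# then store year as a factor so it behaves as a category in plots.
crime_demo <- crime_demo[crime_demo$year >= 2014 & crime_demo$year <= 2020, ]
crime_demo$year <- factor(crime_demo$year)
```

Storing `year` as a factor matches how it appears in the summary output, where it is tabulated by level rather than summarized as a number.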
Below is a summary of the retained columns. Post, VRIName, GeoLocation, CrimeCode, and Total_Incidents were removed after reading in the data.
#----Data Summary----
summary(crime)
## CrimeDateTime CrimeCode Location Description
## Length:325286 Length:325286 Length:325286 Length:325286
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Inside_Outside Weapon Post District
## Length:325286 Length:325286 Length:325286 Length:325286
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Neighborhood Latitude Longitude GeoLocation
## Length:325286 Min. : 0.00 Min. :-81.53 Length:325286
## Class :character 1st Qu.:39.29 1st Qu.:-76.65 Class :character
## Mode :character Median :39.30 Median :-76.61 Mode :character
## Mean :39.24 Mean :-76.50
## 3rd Qu.:39.33 3rd Qu.:-76.59
## Max. :39.66 Max. : 0.00
##
## Premise VRIName Total_Incidents year
## Length:325286 Length:325286 Min. :1 2014:45308
## Class :character Class :character 1st Qu.:1 2015:48187
## Mode :character Mode :character Median :1 2016:48783
## Mean :1 2017:52174
## 3rd Qu.:1 2018:48496
## Max. :1 2019:46378
## 2020:35960
## month hour dayOfWeek
## Length:325286 Min. : 0.00 Length:325286
## Class :character 1st Qu.: 8.00 Class :character
## Mode :character Median :14.00 Mode :character
## Mean :13.19
## 3rd Qu.:19.00
## Max. :23.00
##
Below is a stacked bar chart that shows crime by year.
Larceny, common assault, and burglary are the most common crimes. Crime counts in 2020 appear to be down across most crime types, likely because of reduced interaction during COVID-19. However, aggravated assault and common assault do not appear to have declined much between 2019 and 2020. Street robbery is much more common than commercial or residence robbery. Arson is the least frequent crime.
# Group crimes by year
crimeByYear = crime %>%
group_by(Description, year) %>%
summarise(n = length(Description), .groups = 'keep') %>%
data.frame()
# Create aggregate total dataframe
crime_aggTotal = crimeByYear %>%
dplyr::select(Description, n) %>%
group_by(Description) %>%
summarise(tot = sum(n), .groups = 'keep') %>%
data.frame()
# Plot
p1 = ggplot(crimeByYear,
aes(x = reorder(Description,n,sum), y = n, fill = year))+
geom_bar(stat = 'identity', position = position_stack(reverse = TRUE))+
coord_flip()+
labs(title='Crime Count by Year (2014-2020)', x = '',
y = 'Crime Count', fill = 'Year')+
theme_light()+
theme(plot.title = element_text(hjust = 0.5))+
scale_fill_brewer(palette = 'Set2',
guide = guide_legend(reverse = TRUE))+
geom_text(data = crime_aggTotal, aes(x = Description, y = tot, label = scales::comma(tot),fill = NULL),
hjust = -.1,size = 3) +
scale_y_continuous(labels = comma,
breaks = seq(0,80000,by = 10000),
limits = c(0,80000))
p1
Below is crime committed by day of the week for each year between 2014-2020.
As the yearly chart suggested, crime in 2020 was down dramatically relative to other years.
Most years between 2014 and 2020 show a similar trend in which crime peaks on Friday and trails off over the weekend. 2019 and 2015 were exceptions: Tuesday and Monday, respectively, accounted for the most crimes.
It would be helpful to see the relationship between day of the week and different crimes. This could offer insight as to when specific crimes are more or less frequently committed.
#----Time and Day---------
# Create day dataframe
days_df = crime %>%
dplyr::select(dayOfWeek, year)%>%
group_by(year, dayOfWeek)%>%
summarise(n = length(dayOfWeek)
, .groups = 'keep') %>%
data.frame()
# Convert dayOfWeek to a factor so the x-axis runs Mon-Sun
# (assumes the column holds three-letter day abbreviations)
days_df$dayOfWeek = factor(days_df$dayOfWeek,
levels = c('Mon','Tue','Wed','Thu','Fri','Sat','Sun'))
# Plot
p2 = ggplot(days_df,
aes(x = dayOfWeek, y = n, group = year)) +
geom_line(aes(color = year), size = 3) +
labs(title = 'Crimes by Day and by Year (2014-2020)',
x = 'Day of Week',
y = 'Crime Count') +
theme_light()+
theme(plot.title = element_text(hjust = 0.5))+
geom_point(shape = 21, size = 3, color = 'black',fill = 'white')+
scale_y_continuous(labels = comma, breaks = seq(5000,8000,by = 500)) +
scale_color_brewer(palette= 'Paired', name = 'Year', guide = guide_legend(reverse = TRUE))
p2
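The breakdown of day of week by crime type suggested above could be sketched roughly as follows. This is an illustration on toy data rather than part of the original analysis: `crime_demo`, `day_crime_df`, and `p_day_crime` are hypothetical names, and the tile layout is only one possible choice.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the crime data (hypothetical rows).
crime_demo <- data.frame(
  Description = c("LARCENY", "LARCENY", "BURGLARY", "LARCENY", "BURGLARY"),
  dayOfWeek   = c("Mon", "Fri", "Fri", "Fri", "Sun"),
  stringsAsFactors = FALSE
)

day_levels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Count incidents for every crime/day combination that occurs.
day_crime_df <- crime_demo %>%
  mutate(dayOfWeek = factor(dayOfWeek, levels = day_levels)) %>%
  count(Description, dayOfWeek, name = "n")

# One tile per crime/day pair; darker tiles mean more incidents.
p_day_crime <- ggplot(day_crime_df,
                      aes(x = dayOfWeek, y = Description, fill = n)) +
  geom_tile(color = "black") +
  scale_x_discrete(drop = FALSE) +   # keep days with no incidents on the axis
  scale_fill_gradient(low = "white", high = "blue") +
  labs(title = "Crimes by Day of Week (toy data)",
       x = "Day of Week", y = "Crime Description", fill = "Crime Count") +
  theme_minimal()
```

Because crime types differ greatly in volume, it may be worth normalizing counts within each crime before plotting so that larceny does not dominate the color scale.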
Below is a heatmap showing the number of each crime committed by district between 2014-2020.
It is apparent that Northeast and Southeast Baltimore have the most crimes.
Eastern Baltimore has the greatest number of shootings and homicides, but has a relatively low number of other crimes. This could be an area with heavy gang activity.
Northeast Baltimore stands out for its relatively high number of common assault and burglary.
#----Crime By District Data -----
# Group by Districts and Crimes
districts_and_crimes = crime %>%
dplyr::select(District, Description)%>%
group_by(District, Description)%>%
summarise(n = length(District),
.groups = 'keep')%>%
data.frame()
# Add unknown district for blanks
districts_and_crimes$District[districts_and_crimes$District == ''] = 'UNKNOWN'
# Create Breaks
breaks = c(seq(0,max(districts_and_crimes$n),by = 2000))
# Plot
p3 = ggplot(districts_and_crimes,
aes(x=District, y = Description, fill = n))+
geom_tile(color='black')+
geom_text(aes(label=round(n,0)),
size = 2.8)+
coord_equal(ratio=.6)+
labs(title='Heatmap: Crimes By District (2014-2020)',
x='District',
y='Crime Description',
fill = 'Crime Count')+
theme_minimal()+
theme(plot.title=element_text(hjust=0.5),
axis.text.x = element_text(angle = 35, hjust = 1))+
scale_fill_continuous(
low = 'white', high = 'blue',
breaks = breaks,
labels = comma)+
guides(fill = guide_legend(reverse = TRUE,
override.aes=list(color='black')))
p3
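One possible extension of this heatmap is to draw one panel per year, so that changes within each district become visible over time. The sketch below uses toy data and hypothetical names (`crime_demo`, `district_year_df`, `p_by_year`). It is not the original analysis, only an illustration of adding `year` to the grouping and faceting on it.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the crime data (hypothetical rows).
crime_demo <- data.frame(
  District    = c("EASTERN", "EASTERN", "NORTHEAST",
                  "EASTERN", "NORTHEAST", "NORTHEAST"),
  Description = c("HOMICIDE", "HOMICIDE", "BURGLARY",
                  "HOMICIDE", "BURGLARY", "BURGLARY"),
  year        = factor(c(2019, 2019, 2019, 2020, 2020, 2020))
)

# Count incidents per year, district, and crime type.
district_year_df <- crime_demo %>%
  count(year, District, Description, name = "n")

# Same tile layout as the single heatmap, repeated once per year.
p_by_year <- ggplot(district_year_df,
                    aes(x = District, y = Description, fill = n)) +
  geom_tile(color = "black") +
  geom_text(aes(label = n), size = 2.8) +
  facet_wrap(~year) +
  scale_fill_gradient(low = "white", high = "blue") +
  labs(title = "Crimes by District and Year (toy data)",
       x = "District", y = "Crime Description", fill = "Crime Count") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 35, hjust = 1))
```

The panels share one fill scale, which keeps colors comparable across years.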
Below is a line plot showing the crime count by hour between 2014-2020.
There is a sharp decline in crime beginning at midnight. The number of crimes was lowest between 4 and 7 A.M.
The number of crimes was highest at 6 P.M. This could be because more people are out of work and out in public.
The number of crimes remained elevated between 5 P.M. and 1 A.M.
#-----Crime By Hour-----
hours_df = crime %>%
dplyr::select(hour)%>%
group_by(hour)%>%
summarise(n = length(hour), .groups = 'keep')%>%
data.frame()
# Plot
p4 = ggplot(hours_df,
aes(x = hour, y= n))+
geom_line(color = 'black',size =1)+
geom_point(shape=21, size = 4,
color = 'red',fill='white')+
labs(x = 'Hour',y = 'Crime Count',
title = 'Crime Count by Hour (2014-2020)')+
scale_y_continuous(labels = comma)+
theme_classic()+
theme(plot.title = element_text(hjust = 0.5))+
scale_x_continuous(labels = 0:23,
breaks = 0:23,
minor_breaks = NULL)+
geom_label_repel(aes(label=scales::comma(n)),
box.padding=.8,
point.padding=.8,
size=3,
fill='cyan',
color='black',
segment.color='black')
p4
Below are heatmaps that show the frequency of crimes in different areas of Baltimore City.
The homicide and rape maps look similar from afar. The larceny map shows some highlighted areas that are not highlighted on the other maps.
Murders appear to occur in clusters on specific blocks, which are worth examining further. These may be gang related.
For example, Garrison Blvd, Clifton Ave, and Dennison St. are all in the same vicinity, and there are 10 murders within a block or two of one another.
#----Murder Map----
murder_df = crime[crime$Description == 'HOMICIDE',]
rape_df = crime[crime$Description=='RAPE',]
larceny_df = crime[crime$Description=='LARCENY',]
# Murder Map Plot
p5 = leaflet()%>%
addProviderTiles(providers$CartoDB.Positron)%>%
setView(lng=-76.6,lat=39.2,zoom=10.3)%>%
addHeatmap(lng = murder_df$Longitude,
lat = murder_df$Latitude,
radius = 8)
p5
There appears to be a hotspot for rape crimes at Bayview.
Carlton Ridge and Montclare have also been locations for a relatively high number of rape crimes.
p6 = leaflet()%>%
addProviderTiles(providers$CartoDB.Positron)%>%
setView(lng=-76.6,lat=39.2,zoom=10.3)%>%
addHeatmap(lng = rape_df$Longitude,
lat = rape_df$Latitude,
radius = 8)
p6
The larceny map shows larceny in areas that did not stand out on the murder or rape maps.
For example, the area surrounding Hawkins Point and Dundalk shows several larceny crimes.
There are also points that appear to fall outside Baltimore City limits. This may suggest an error in data entry for latitude/longitude, or that the dataset includes records that are not for Baltimore City.
p7 = leaflet()%>%
addProviderTiles(providers$CartoDB.Positron)%>%
setView(lng=-76.6,lat=39.2,zoom=10.3)%>%
addHeatmap(lng = larceny_df$Longitude,
lat = larceny_df$Latitude,
radius = 8)
p7
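Points outside the city, along with the 0 values visible in the Latitude and Longitude summary, could be flagged before mapping. The following is a minimal sketch on toy data. The bounding box is an approximate, assumed extent for Baltimore City and should be verified against an official boundary before use. `crime_demo`, `crime_clean`, and `crime_flagged` are hypothetical names.

```r
# Toy stand-in for the larceny records (hypothetical rows).
crime_demo <- data.frame(
  Description = c("LARCENY", "LARCENY", "LARCENY", "LARCENY"),
  Latitude    = c(39.29,  0.00,  39.66,  NA),
  Longitude   = c(-76.61, 0.00, -81.53, -76.60)
)

# Approximate bounding box for Baltimore City (assumed values).
lat_range <- c(39.19, 39.38)
lng_range <- c(-76.72, -76.52)

# TRUE only when both coordinates are present and inside the box.
in_city <- !is.na(crime_demo$Latitude) & !is.na(crime_demo$Longitude) &
  crime_demo$Latitude  >= lat_range[1] & crime_demo$Latitude  <= lat_range[2] &
  crime_demo$Longitude >= lng_range[1] & crime_demo$Longitude <= lng_range[2]

crime_clean   <- crime_demo[in_city, ]   # rows to map
crime_flagged <- crime_demo[!in_city, ]  # rows to inspect for entry errors
```

Keeping the flagged rows in a separate data frame, rather than discarding them, makes it possible to check whether they are data-entry errors or genuine records from outside the city.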
Below are charts that show how frequently different weapons were used for each crime between 2014 and 2020.
The different larceny, robbery, and assault crime types were combined into larceny, robbery, and assault categories, respectively.
Some of the crimes are self-explanatory, such as shooting and arson: the weapons used in these cases were always firearms and fire, respectively. Certain other crimes, like larceny and burglary, stand apart because they are considered nonviolent.
Knives were used less frequently than expected. Only 8.9% of assaults and 7.5% of homicides were committed using a knife.
Robbery was often committed using a firearm, but 33.8% of robberies were committed with no weapon at all.
Although several pie charts suggest weapons were not used at all for some crimes, I’d argue they still have value. For example, one can see burglary differs from robbery in that it does not involve weapons.
#----Pie Chart: Weapon Usage-----
crime2 = crime
crime2$Description[crime2$Description %like% 'LARCENY'] = 'LARCENY'
crime2$Description[crime2$Description %like% 'ROBBERY'] = 'ROBBERY'
crime2$Description[crime2$Description %like% 'ASSAULT'] = 'ASSAULT'
# Group by Crime and Weapon
weapon_df = crime2%>%
dplyr::select(Weapon, Description)%>%
group_by(Description, Weapon)%>%
summarise(n=length(Description),
.groups='keep')%>%
group_by(Description)%>%
mutate(pctOfTotal = round(100*n/sum(n),1))%>%
ungroup()%>%
data.frame()
# Replace "NA" with "NONE"
weapon_df$Weapon[is.na(weapon_df$Weapon)] = 'NONE'
# Weapon Used By Crime Plot
p8 = ggplot(weapon_df,
aes(x='',y=n, fill = Weapon))+
geom_bar(stat='identity',position='fill')+
coord_polar(theta='y', start = 0)+
labs(fill='Weapon Used',
x=NULL,y=NULL,
title='Weapon Used By Crime')+
theme_light()+
theme(plot.title = element_text(hjust=0.5),
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank()) +
facet_wrap(~Description,nrow = 3, ncol = 3)+
scale_fill_brewer(palette = 'Paired')+
geom_text(aes(x=1.7,
label = paste0(pctOfTotal,"%")),
size = 2.1,
position=position_fill(
vjust=0.5))
p8
The analysis of Baltimore crime data helped show several observable trends. The heatmap that showed crime by district was very valuable. Creating multiple heatmaps by year could help visualize changes over time for each district.
I also found value in the heatmaps that showed areas of Baltimore with high frequencies of crimes like rape and homicide. In addition to making people aware of riskier areas, this knowledge can help Baltimore deploy adequate resources to the appropriate areas.