As the COVID-19 pandemic continues, I’ve had much more time to play around with data. Like many people, I was interested in looking at the coronavirus data for myself, especially with regard to my community and the communities of my family and friends.
Once quarantine procedures were put in place, I was also interested in seeing how this affected other areas of life, particularly crime. Baltimore City is unfortunately known to have a high crime rate, and I was curious whether the state-wide quarantine procedures put into effect in March would have any effect on the level of crime reported in the city.
If there was an effect, which types of crimes were affected, and how? And, if there was a change in the crime rate, was there any way to really prove that it was due to COVID-19?
All data used for this project is free and publicly available. Here are my sources:
COVID-19 Data: Location and count data for the number of confirmed COVID-19 cases in the United States were obtained from the Novel Coronavirus (COVID-19) Cases public GitHub repository provided by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). Their repo is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
Baltimore Crime Data: Information about reported victim-based crimes in Baltimore City was obtained from the BPD Part 1 Victim Based Crime Data dataset found at data.baltimorecity.gov.
The following R packages were used in the project:
library(tidyverse)
library(gridExtra)
library(knitr)
library(kableExtra)
I first loaded the data about reported crimes published by the Baltimore Police Department. Here is a sample of the data:
#March crime data from BPD Part 1 Victim Based Crime Data
crime <- read.csv("C:/Users/Morganak/Documents/R/Projects/COVID-19/crime_data/BPD_Part_1_Victim_Based_Crime_Data_mar2020.csv", stringsAsFactors = TRUE)
head(crime)
## ï..CrimeDate CrimeTime CrimeCode Location Description
## 1 03/01/2020 20:00:00 4E 1600 N CHESTER ST COMMON ASSAULT
## 2 03/01/2020 03:00:00 7A 700 MELVILLE AVE AUTO THEFT
## 3 03/01/2020 19:00:00 6E 200 S HILTON ST LARCENY
## 4 03/01/2020 14:00:00 6G 800 GLADE CT LARCENY
## 5 03/01/2020 12:40:00 8FV 3700 ELLERSLIE AVE ARSON
## 6 03/01/2020 14:57:00 4B 500 E 26TH ST AGG. ASSAULT
## Inside.Outside Weapon Post District Neighborhood Longitude
## 1 I <NA> 331 EASTERN BROADWAY EAST -76.58853
## 2 O <NA> 515 NORTHERN WAVERLY -76.60595
## 3 O <NA> 835 SOUTHWEST CARROLL-SOUTH HILTON -76.67207
## 4 I <NA> 913 SOUTHERN BROOKLYN -76.60081
## 5 I FIRE 515 NORTHERN EDNOR GARDENS-LAKESI -76.60505
## 6 I KNIFE 513 NORTHERN BETTER WAVERLY -76.60935
## Latitude Location.1 Premise vri_name1 Total.Incidents
## 1 39.30885 NA ROW/TOWNHOUSE-OCC 1
## 2 39.33158 NA STREET 1
## 3 39.28278 NA STREET 1
## 4 39.22955 NA ROW/TOWNHOUSE-OCC 1
## 5 39.33527 NA ROW/TOWNHOUSE-OCC 1
## 6 39.31896 NA ROW/TOWNHOUSE-VAC 1
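A side note on the `ï..CrimeDate` header in the output above: that prefix is what a UTF-8 byte-order mark (BOM) at the start of a CSV typically turns into when R reads the file without being told about it. The sketch below is not part of the original analysis. It writes a tiny stand-in file (the temporary file and its two columns are hypothetical) and reads it back with `fileEncoding = "UTF-8-BOM"`, assuming the real download is BOM-prefixed UTF-8 as the header suggests.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (bytes EF BB BF).
# This file is a hypothetical stand-in for the real BPD download.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("CrimeDate,Description\n03/01/2020,LARCENY\n"), con)
close(con)

# Telling read.csv about the BOM keeps the first column name clean
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(clean))
```

If that assumption holds for the real file, the same argument could be added to the `read.csv()` call above. Renaming the column by position, as done in the next step, works either way.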
Next, I did some slight reformatting…
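# The first column was read in as "ï..CrimeDate" (see the header above), so rename it by position
names(crime)[1] <- "Date"
# Parse the month/day/year strings into Date objects
crime$Date <- as.Date(crime$Date, format = "%m/%d/%Y")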
…followed by some exploring.
The first thing I wanted to see was the number of total crimes reported during March, 2020:
#summary tables of crimes
crime_count<-crime %>% group_by(Description) %>% tally()
sum(crime_count$n)#total crimes in March
## [1] 2632
That’s a lot of crime!
Next, I was interested in seeing the types of crimes that were being tracked, and how many crimes of each type had been reported:
kable(crime_count, col.names = c("Crime Type", "Count")) %>% kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Crime Type | Count |
|---|---|
| AGG. ASSAULT | 401 |
| ARSON | 8 |
| AUTO THEFT | 225 |
| BURGLARY | 291 |
| COMMON ASSAULT | 565 |
| HOMICIDE | 16 |
| LARCENY | 543 |
| LARCENY FROM AUTO | 209 |
| RAPE | 11 |
| ROBBERY - CARJACKING | 35 |
| ROBBERY - COMMERCIAL | 43 |
| ROBBERY - RESIDENCE | 40 |
| ROBBERY - STREET | 196 |
| SHOOTING | 49 |
Kind of hard to see which types stand out.
Here is the same info in an easier-to-read graph:
#Crime by type
# Columns are referenced by bare name inside aes(); color is mapped there so each crime type gets its own outline
p <- ggplot(crime_count, aes(x = Description, y = n, color = Description)) +
  geom_col(show.legend = FALSE) +
  labs(title = "March 2020", subtitle = "Source: data.baltimorecity.gov", x = NULL, y = "Count") +
  coord_flip()
p
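As you can see, certain crime types really stand out: common assault (565), larceny (543), and aggravated assault (401) were the three most frequently reported.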
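The bars in that plot follow the default (alphabetical) order of the crime types. A possible refinement, sketched here rather than taken from the original analysis, is to sort the bars by count with `reorder()` so the most common types sit together at the top. The data frame `crime_tbl` and the plot name `p_sorted` are names made up for this example; the counts are copied from the March 2020 table above.

```r
library(ggplot2)

# March 2020 counts copied from the table above
crime_tbl <- data.frame(
  Description = c("AGG. ASSAULT", "ARSON", "AUTO THEFT", "BURGLARY",
                  "COMMON ASSAULT", "HOMICIDE", "LARCENY", "LARCENY FROM AUTO",
                  "RAPE", "ROBBERY - CARJACKING", "ROBBERY - COMMERCIAL",
                  "ROBBERY - RESIDENCE", "ROBBERY - STREET", "SHOOTING"),
  n = c(401, 8, 225, 291, 565, 16, 543, 209, 11, 35, 43, 40, 196, 49)
)

# reorder() sorts the factor levels by count (smallest first)
crime_tbl$Description <- reorder(crime_tbl$Description, crime_tbl$n)

# After coord_flip(), the first level is drawn at the bottom,
# so the largest count ends up at the top of the chart
p_sorted <- ggplot(crime_tbl, aes(x = Description, y = n)) +
  geom_col() +
  coord_flip() +
  labs(title = "March 2020", subtitle = "Source: data.baltimorecity.gov",
       x = NULL, y = "Count")
p_sorted
```

With the real data, the same idea would be `aes(x = reorder(Description, n), y = n)` applied to `crime_count`.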
I was also curious to see how many crimes were occurring per day, regardless of type:
#Restructuring data to obtain daily counts
daily_crime<-crime %>% group_by(Date) %>% tally()# number crimes occurring by date
p20<-ggplot(data=daily_crime, aes(x=Date, y=n)) + geom_line(color = "#00AFBB", size = 2)+
ggtitle("Baltimore Crimes Reported for March, 2020") + ylab("Crimes Reported") + xlab(NULL)+
ylim(0,160) +
scale_x_date(breaks=as.Date(c("2020-03-01", "2020-03-05", "2020-03-10", "2020-03-15", "2020-03-20","2020-03-25")), date_labels = "%m/%d")
p20
Maryland’s social distancing measures became stricter in mid-March. Many people began to work from home, and the governor closed schools and state offices.
At a glance, the upward crime trend does seem to stop in mid-March. But what was last year like? Maybe crime always decreases at that time of year.
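One rough way to put a number on that visual impression is to compare the average daily count before and after a cutoff date. The helper below is a minimal sketch and not part of the original analysis: `compare_periods()` is a hypothetical function, and the toy data frame contains made-up values purely to show the call. It assumes a data frame with a `Date` column of class Date and a count column `n`, like `daily_crime` at this point in the post.

```r
# Hypothetical helper: mean daily count before vs. on/after a cutoff date
compare_periods <- function(daily, cutoff) {
  before <- daily$n[daily$Date < cutoff]
  after  <- daily$n[daily$Date >= cutoff]
  c(before = mean(before),
    after  = mean(after),
    change = mean(after) - mean(before))
}

# Made-up counts purely to illustrate the call; NOT the Baltimore data
toy <- data.frame(
  Date = as.Date("2020-03-01") + 0:5,
  n    = c(90, 100, 110, 70, 60, 80)
)

res <- compare_periods(toy, cutoff = as.Date("2020-03-04"))
print(res)
```

On the real data, the call would look like `compare_periods(daily_crime, as.Date("2020-03-15"))`, where the exact cutoff date is an assumption standing in for "mid-March". A difference in means like this describes the data but does not show what caused the change.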
To see what things were like last year, I pulled the same data, but this time filtered by March 2019. Then I ran the same code as above:
crime19<-read.csv("C:/Users/Morganak/Documents/R/Projects/COVID-19/crime_data/BPD_Part_1_Victim_Based_Crime_Data_mar2019.csv", stringsAsFactors = TRUE)
names(crime19)[1]<-"Date"
crime19$Date<-as.Date(crime19$Date, format = "%m/%d/%Y")
#summary tables of crimes in 2019
crime19_count<-crime19 %>% group_by(Description) %>% tally()
sum(crime19_count$n)#total crimes in March 2019
## [1] 3465
I can see already that the March 2019 total (3,465) is well above this year’s 2,632. About 24% fewer crimes were reported in March 2020 than in March 2019.
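To make the size of that gap explicit, here is the year-over-year percent change worked out from the two totals printed above. The variable names are just for this example.

```r
total_2019 <- 3465  # March 2019 total from the output above
total_2020 <- 2632  # March 2020 total from the output above

# Percent change relative to the 2019 baseline
pct_change <- (total_2020 - total_2019) / total_2019 * 100
print(round(pct_change, 1))
```

The result is roughly a 24% drop, measured against March 2019.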
What about other crime stats? I’ll skip right to the bar plot this time.
# Same fix as the 2020 plot: bare column names inside aes(), with color mapped per crime type
p2 <- ggplot(crime19_count, aes(x = Description, y = n, color = Description)) +
  geom_col(show.legend = FALSE) +
  labs(title = "March 2019", subtitle = "Source: data.baltimorecity.gov", x = NULL, y = "Count") +
  coord_flip()
# gridExtra was loaded at the top; 2019 goes on the left, 2020 on the right
grid.arrange(p2, p, ncol = 2)
The 2020 plot doesn’t look much different from 2019. My impression is that the types of crimes being committed haven’t changed much between last year and the present.
So, how do overall crime levels compare between years?
To see that, I made a line graph for March 2019 and put that next to the March 2020 line graph for quick comparison.
daily_crime19<-crime19 %>% group_by(Date) %>% tally()# number crimes occurring by date
#Side-by-Side Line graphs March 2019 and 2020 total crime levels
p19<-ggplot(data=daily_crime19, aes(x=Date, y=n)) + geom_line(color = "#00AFBB", size = 2)+
ggtitle("March, 2019") + ylab("Crimes Reported") + xlab(NULL)+
ylim(0,160) +
scale_x_date(breaks=as.Date(c("2019-03-01", "2019-03-05", "2019-03-10", "2019-03-15", "2019-03-20", "2019-03-25")), date_labels = "%m/%d")
p20<-ggplot(data=daily_crime, aes(x=Date, y=n)) + geom_line(color = "#00AFBB", size = 2)+
ggtitle("March, 2020") + ylab("Crimes Reported") + xlab(NULL)+
ylim(0,160) +
scale_x_date(breaks=as.Date(c("2020-03-01", "2020-03-05", "2020-03-10", "2020-03-15", "2020-03-20","2020-03-25")), date_labels = "%m/%d")
require(gridExtra)
grid.arrange(p19,p20, ncol=2)
There definitely looks to be a difference between the end of March in 2019 and the end of March this year. However, I don’t really like how the x-axis breaks are spread out differently at the end of each month.
March 2019 had crimes reported after March 28th, but March 2020 didn’t, so the x-axis breaks are not spaced the same way in the two panels.
I want people to focus on the content of the graph, not be distracted by the axis breaks, so I decided to try plotting both years on the same graph:
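#new data without the year so differences on same days can be seen
daily_crime$Date <- format(daily_crime$Date, format = "%m/%d")
daily_crime19$Date <- format(daily_crime19$Date, format = "%m/%d")
# full_join keeps every month/day found in either year; a day missing from one year gets NA
all <- daily_crime19 %>% full_join(daily_crime, by = "Date")
names(all)[2] <- "Count2019"
names(all)[3] <- "Count2020"
# as.Date fills in the current year, which is fine here because only month and day are shown
all$Date <- as.Date(all$Date, "%m/%d")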
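To see what the join on month/day keys does, here is a small sketch with made-up counts (not the Baltimore data). It uses `full_join()` as in the code above, but names the count columns through the `suffix` argument instead of renaming them by position afterwards; that is a substitution made for this example. The names `d2019`, `d2020`, and `both` are hypothetical.

```r
library(dplyr)

# Toy daily counts keyed by "month/day" strings (made-up values)
d2019 <- tibble(Date = c("03/01", "03/02", "03/03"), n = c(5L, 7L, 6L))
d2020 <- tibble(Date = c("03/01", "03/02"), n = c(4L, 3L))

# full_join keeps every key from either table; the shared column n
# gets one copy per table, distinguished by the suffixes
both <- full_join(d2019, d2020, by = "Date", suffix = c("_2019", "_2020"))
print(both)
```

The day that exists only in the 2019 toy table comes through with `NA` in the 2020 column, which is why the 2020 line in the combined plot can stop short of the 2019 line.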
Since both lines are on the same plot, it’s a good idea to choose colors that are color-blind friendly, as 1 in 20 people experience some form of colorblindness.
I used the color-picker by David Nichols that can be found at Coloring for Colorblindness.
colors <- c("2019" = "#10A0BA", "2020" = "#D81B60")
# Date is already a Date column, so columns are mapped by bare name inside aes()
p <- ggplot(data = all, aes(x = Date)) +
  geom_line(aes(y = Count2019, color = "2019")) +
  geom_line(aes(y = Count2020, color = "2020")) +
  scale_color_manual(values = colors, name = NULL) +
  labs(title = "Baltimore Crime: March 2019 vs. March 2020", subtitle = "Source: data.baltimorecity.gov", x = NULL, y = "Reported Crimes")
p
The final graph is definitely easier to read! Well, sort of. If you wanted to compare the exact differences between dates, this would not be the graph of choice. It’s a little hard to read at the beginning of the month, when the crime rates are more similar.
But it gets the point across pretty quickly. There is a clear difference in the trend of Baltimore crime last year (blue line) and this year (red line).
Can that change be attributed directly to COVID-19? Probably not directly. But I’d be curious to see how the data compares to earlier years, as well as whether the downward trend continues through the summer.