Introduction

This lab explores 2023 DC crime data in R and RStudio, using ggplot2 (part of the tidyverse) to visualize it.

This dataset covers all crimes reported in DC in 2023. Its variables include the report date, the police shift during which the crime occurred, the method of the crime, the type of offense, and the police district where it happened. The data also has a spatial component (coordinates for each record) that would be good to visualize.

Data Preparation

library(tidyverse)
# the tidyverse includes ggplot2
library(here)
library(skimr)
library(janitor)
library(ggbeeswarm)
library(RColorBrewer)

Import the 2023 DC Crime Data

# Read the csv as crime23
crime23 <- read.csv(here("Lab3", "Crime2023.csv"))

Let's review the data!

glimpse(crime23)
## Rows: 34,218
## Columns: 18
## $ X              <dbl> -77.04559, -76.93632, -76.98733, -77.00132, -77.05789, …
## $ Y              <dbl> 38.91335, 38.89190, 38.86447, 38.89612, 38.92933, 38.91…
## $ CCN            <int> 23121881, 23124238, 23173340, 23420013, 23424043, 23002…
## $ REPORT_DAT     <chr> "2023/07/27 19:36:34+00", "2023/07/31 14:17:54+00", "20…
## $ SHIFT          <chr> "EVENING", "DAY", "MIDNIGHT", "DAY", "DAY", "MIDNIGHT",…
## $ METHOD         <chr> "OTHERS", "OTHERS", "GUN", "OTHERS", "OTHERS", "OTHERS"…
## $ OFFENSE        <chr> "BURGLARY", "ROBBERY", "THEFT/OTHER", "THEFT/OTHER", "T…
## $ BLOCK          <chr> "1700 - 1799 BLOCK OF CONNECTICUT AVENUE NW", "4600 - 4…
## $ XBLOCK         <dbl> 396046.2, 405524.4, 401099.5, 399886.0, 394981.0, 39655…
## $ YBLOCK         <dbl> 138387.6, 136006.7, 132959.9, 136474.0, 140161.8, 13912…
## $ DISTRICT       <int> 2, 6, 7, 1, 2, 3, 5, 1, 3, 3, 4, 3, 3, 7, 1, 2, 3, 3, 2…
## $ LATITUDE       <dbl> 38.91334, 38.89189, 38.86446, 38.89611, 38.92932, 38.91…
## $ LONGITUDE      <dbl> -77.04559, -76.93632, -76.98733, -77.00131, -77.05788, …
## $ BID            <chr> "DUPONT CIRCLE", "", "", "", "", "", "", "CAPITOL HILL"…
## $ START_DATE     <chr> "2023/07/27 13:31:00+00", "2023/07/31 11:22:00+00", "20…
## $ END_DATE       <chr> "2023/07/27 17:32:00+00", "2023/07/31 13:05:00+00", "20…
## $ OBJECTID       <int> 484230789, 484230790, 484230795, 484230804, 484230824, …
## $ OCTO_RECORD_ID <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Tidy up the data!

# do a minor tidying of the data by converting all the column headers to lowercase
newCrime <- rename_with(crime23, tolower)
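
As a small illustration of what the renaming step does, the sketch below applies the same idea to a made-up three-column data frame. The object `toy` and its values are hypothetical and are not taken from the crime file. It uses base R's `names()` and `tolower()`, which should give the same headers as the dplyr call above.

```r
# Hypothetical toy data frame that mimics the upper-case headers in the crime file
toy <- data.frame(
  SHIFT = c("DAY", "EVENING"),
  METHOD = c("OTHERS", "GUN"),
  DISTRICT = c(2L, 6L)
)

# Lowercase every column name; the values themselves are left untouched
toy_lower <- toy
names(toy_lower) <- tolower(names(toy_lower))

names(toy_lower)
```

Only the headers change, so values such as "GUN" stay upper case. That matters later, when the filter compares `method` against "GUN".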

Analysis

Overview of the overall crime offense in DC for each District

The following bar graphs count the number of crimes by type of offense for each district. Each panel represents one of the seven Metropolitan Police Department (MPD) districts in DC. Based on the chart, it is clear that “Theft/Other” is the most dominant crime in DC, and Districts 2 and 3 appear to have the most crimes.

newCrime %>%
  ggplot(aes(x = offense, color = offense, fill = offense)) + # create a ggplot and map offense to the x axis, the outline color, and the fill of the bars
  geom_bar() + # draws a bar chart; the height of each bar is the number of rows
  facet_wrap(~ district) + # draws a separate bar graph for each district
  labs(title = "Crime offense by District", # labs edits specific labels such as the title, caption, and x and y axis
       caption = "Data derived from DC Open Data",
       x = "Offense",
       y = "Crime Count") +
  theme(axis.text.x = element_blank()) # removes unnecessary labels on the x axis
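
The bar heights in the chart are row counts, so the same numbers can be read from a cross-tabulation. The sketch below is a minimal illustration on made-up data: `toy_crime` and its six rows are hypothetical, and only the column names `district` and `offense` match the lab data.

```r
# Hypothetical stand-in for newCrime with only the two columns the chart uses
toy_crime <- data.frame(
  district = c(2, 2, 2, 3, 3, 6),
  offense = c("THEFT/OTHER", "THEFT/OTHER", "BURGLARY",
              "THEFT/OTHER", "ROBBERY", "ROBBERY")
)

# Cross-tabulate offense by district; each cell is the height of one bar
counts <- as.data.frame(table(district = toy_crime$district,
                              offense = toy_crime$offense))

# Drop empty combinations and list the largest counts first
counts <- counts[counts$Freq > 0, ]
counts <- counts[order(-counts$Freq), ]
counts
```

Running the same steps on `newCrime` would be one way to check the chart's reading of which district and offense dominate.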

District crime counts and types, filtered to gun crimes

Here the data is filtered by method of crime so that only crimes involving a gun remain, which changes the results of the graph. The previous bar graph revealed that Districts Two and Three had the most reported crimes. However, Districts Six and Seven have the most gun crimes in DC. Those crimes are primarily robbery and assault with a dangerous weapon.

newCrime %>% 
  # Adds another layer of analysis by keeping only the crimes where the method was a gun
  filter(method == "GUN") %>% 
  ggplot(aes(x = offense, color = offense, fill = offense)) +
  geom_bar() +
  facet_wrap(~ district) +
  labs(title = "Gun crime offense by District",
       caption = "Data derived from DC Open Data",
       x = "Offense",
       y = "Crime Count") +
  theme(axis.text.x = element_blank())
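
To show what the filter step keeps, here is a minimal sketch on made-up data. The object `toy_crime` and its values are hypothetical. Only the column names `district` and `method` and the value "GUN" come from the lab data.

```r
# Hypothetical stand-in for newCrime with a district and a method column
toy_crime <- data.frame(
  district = c(2, 3, 6, 6, 7, 7, 7, 7),
  method = c("OTHERS", "OTHERS", "GUN", "GUN", "GUN", "KNIFE", "GUN", "GUN")
)

# Keep only the rows where the method was a gun (same condition as the filter above)
gun_only <- toy_crime[toy_crime$method == "GUN", ]

# Count the remaining rows per district, largest first
gun_by_district <- sort(table(gun_only$district), decreasing = TRUE)
gun_by_district
```

Districts with no gun rows drop out of the table entirely, which is why a filtered chart can rank districts very differently from the unfiltered one.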

Analyzing total crime in DC by month in 2023

The dates include hours, minutes, and seconds. That level of detail was difficult to plot as a linear time series, so I trimmed the end of each string to keep only the year and month. Tabulating the trimmed strings gives the count of crimes for every month.

crimedate <- table(substr(newCrime$report_dat, 1, 7)) # keeps the first seven characters of report_dat (for example "2023/07") and counts each value, giving a table of months and their frequencies

I then transformed the table to a data frame so I could plot it using ggplot.

dfcrimedays <- as.data.frame(crimedate) # turns the table into a data frame with the columns Var1 (month) and Freq (count)
dfcrimedays %>%
  ggplot(aes(x = Var1, y = Freq, fill = Freq)) +
  geom_col() +
  labs(title = "Crime by month in DC, 2023",
       caption = "Data derived from DC Open Data",
       x = "Month",
       y = "Crime Count") +
  theme(axis.text.x = element_text(angle = 18)) # changes the angle of the x axis text so that the labels don't overlap each other
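
The month labels above come from trimming the timestamp string. An alternative sketch parses the timestamps as dates first and then formats them as year-month labels. It assumes every value follows the `YYYY/MM/DD HH:MM:SS+00` pattern seen in the glimpse output. The first two sample values below are copied from that output, the third is made up, and the names `stamps`, `month_label`, and `month_counts` are hypothetical.

```r
# Sample timestamps in the same pattern as report_dat; the third value is made up
stamps <- c("2023/07/27 19:36:34+00",
            "2023/07/31 14:17:54+00",
            "2023/12/01 08:00:00+00")

# Parse only the date part; as.Date() ignores the trailing time and offset
dates <- as.Date(stamps, format = "%Y/%m/%d")

# Collapse each date to a year-month label and count the occurrences
month_label <- format(dates, "%Y-%m")
month_counts <- as.data.frame(table(month = month_label))
month_counts
```

Parsing to a real date costs one extra step, but a malformed timestamp becomes `NA` instead of silently forming its own category.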

Let's save our ggplot!

ggsave(here("2023totalcrime.png")) # saves the last ggplot!
## Saving 7 x 5 in image

The last ggplot is saved to disk, and the saved image looks identical to the plot above… almost like twins!
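
By default `ggsave()` writes whichever plot was displayed last. A minimal sketch of a more explicit approach is to store the plot in an object and pass it through the `plot` argument, along with a size. The data frame `toy_months`, the object `p`, and the temporary output path are hypothetical stand-ins, not part of the lab.

```r
library(ggplot2)

# Hypothetical monthly counts, standing in for dfcrimedays
toy_months <- data.frame(
  Var1 = c("2023/01", "2023/02", "2023/03"),
  Freq = c(10, 14, 12)
)

# Store the plot in an object so it can be passed to ggsave() by name
p <- ggplot(toy_months, aes(x = Var1, y = Freq)) +
  geom_col() +
  labs(x = "Month", y = "Crime Count")

# Write to a temporary file with an explicit size in inches
out_file <- tempfile(fileext = ".png")
ggsave(out_file, plot = p, width = 7, height = 5)
file.exists(out_file)
```

Naming the plot means the saved file does not depend on which chart happened to be drawn last in the session.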

Conclusion and Findings

We reviewed the 2023 DC crime data, made minor changes, built a table and a data frame, and created several ggplots to visualize how the picture changes with filtered data and specific fields.
Overall, there was a large increase in crime during the summer months and right before Christmas. Furthermore, Districts Two and Three have the most crime overall, but Districts Six and Seven have the most gun violence in DC.

See you next time!

Cody Longbotham - PSU - GEOG588 - Lab 3