Introduction
Crime has been a common topic lately in discussions of Washington, D.C. To understand the phenomenon, this document addresses the following question: “What are the most prevalent crimes in Washington, D.C., how are they distributed across times of day and days of the week, and what methods are used?”
To ensure accurate results, this analysis uses data published through data.gov, an official website of the US government, which can be accessed at the following link: “https://hub.arcgis.com/api/v3/datasets/74d924ddc3374e3b977e6f002478cb9b_7/downloads/data?format=csv&spatialRefId=26985&where=1%3D1/”. The dataset contains 19,311 observations and 25 variables describing crimes recorded from the beginning of 2025 through August 2025. For example, it records the type of offense, which makes it possible to assess the variety of crimes, along with the method used, the time of the offense, and its location.
#importing the dataset and the libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)  # already attached by tidyverse 2.0.0; kept for clarity
crime <- read_csv("Crime_Incidents_in_2025.csv")
## Rows: 19311 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): REPORT_DAT, SHIFT, METHOD, OFFENSE, BLOCK, ANC, NEIGHBORHOOD_CLUST...
## dbl (11): X, Y, CCN, XBLOCK, YBLOCK, WARD, DISTRICT, PSA, LATITUDE, LONGITUD...
## lgl (1): OCTO_RECORD_ID
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Key variables include:
- Offense: the type of crime committed (character)
- Method: the method or weapon used to commit the crime (character)
- Shift: the time frame in which the crime was committed (day/evening/midnight, character)
- Start date: the time the incident began (character)
- End date: the time the incident ended (character)
Analyzing the distribution of crimes
# Drop rows whose start date falls in 2024
# (filter() also drops rows where the start date is missing)
crime_adjusted <- crime |>
  filter(year(START_DATE) != 2024)
#categorizing the offenses and evaluating their proportions
crime_factor<- factor(crime_adjusted$OFFENSE)
prop.table(table(crime_factor))
## crime_factor
## ARSON ASSAULT W/DANGEROUS WEAPON
## 0.0002091285 0.0366497621
## BURGLARY HOMICIDE
## 0.0292257019 0.0055419041
## MOTOR VEHICLE THEFT ROBBERY
## 0.1739948764 0.0562032729
## SEX ABUSE THEFT F/AUTO
## 0.0036597480 0.2293616354
## THEFT/OTHER
## 0.4651539708
At 46.52%, theft/other is the most common crime in the D.C. area, followed by theft f/auto at 22.94% and motor vehicle theft at 17.40%.
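The ranking can be reproduced directly from the printed proportions. The following is a minimal sketch that assumes the values shown in the output above; `offense_share`, `offense_pct`, `theft_categories`, and `theft_pct` are illustrative names that are not part of the original analysis.

```r
# Proportions copied from the prop.table() output above
offense_share <- c(
  "ARSON"                      = 0.0002091285,
  "ASSAULT W/DANGEROUS WEAPON" = 0.0366497621,
  "BURGLARY"                   = 0.0292257019,
  "HOMICIDE"                   = 0.0055419041,
  "MOTOR VEHICLE THEFT"        = 0.1739948764,
  "ROBBERY"                    = 0.0562032729,
  "SEX ABUSE"                  = 0.0036597480,
  "THEFT F/AUTO"               = 0.2293616354,
  "THEFT/OTHER"                = 0.4651539708
)

# Rank offenses from most to least common, expressed as percentages
offense_pct <- round(100 * sort(offense_share, decreasing = TRUE), 2)
print(offense_pct)

# Combined share of the three theft-related categories (illustrative grouping)
theft_categories <- c("THEFT/OTHER", "THEFT F/AUTO", "MOTOR VEHICLE THEFT")
theft_pct <- round(100 * sum(offense_share[theft_categories]), 2)
print(theft_pct)
```

Summing the unrounded proportions avoids the small drift that comes from adding percentages that were already rounded.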
Evaluation of the methods used
#evaluation of the method used to commit these crimes
method<- factor(crime$METHOD)
table(method)
## method
## GUN KNIFE OTHERS
## 1082 308 17921
barplot(table(method))
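Only 308 of the 19,311 recorded crimes involved a knife and 1,082 involved a gun; the remaining 17,921 fall under OTHERS. Note that these counts are computed on the full dataset (crime), not on the filtered crime_adjusted.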
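To put these counts on the same footing as the offense proportions, they can be converted to percentages. This is a small sketch based only on the counts printed above; `method_counts` and `method_pct` are illustrative names.

```r
# Counts copied from the table(method) output above
method_counts <- c(GUN = 1082, KNIFE = 308, OTHERS = 17921)

# Share of each method among all recorded crimes, in percent
method_pct <- round(100 * prop.table(method_counts), 1)
print(method_pct)
```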
Understanding the influence of time of day on crime in the area
# Merge the EVENING and MIDNIGHT shifts into a single NIGHT category
crime_adjusted$SHIFT <- gsub("MIDNIGHT", "NIGHT", crime_adjusted$SHIFT)
crime_adjusted$SHIFT <- gsub("EVENING", "NIGHT", crime_adjusted$SHIFT)
#evaluating the frequency of each shift category
shift_factor<- factor(crime_adjusted$SHIFT)
d_n_freq<-table(shift_factor)
d_n_freq
## shift_factor
## DAY NIGHT
## 7971 11156
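About 58% of the recorded crimes (11,156 of 19,127) fall in the night category. Because NIGHT merges two of the three original shift categories (EVENING and MIDNIGHT), it is being compared against a single DAY shift.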
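The recoding step could also be written as a single exact-match replacement instead of two pattern substitutions. The sketch below uses a short illustrative vector rather than the real column, and then derives the day and night shares from the counts printed above; `shift`, `shift_recoded`, `shift_counts`, and `shift_pct` are hypothetical names.

```r
# Illustrative vector holding the three SHIFT values found in the dataset
shift <- c("DAY", "EVENING", "MIDNIGHT", "DAY")

# Recode both night-time shifts in one vectorized step (exact matching)
shift_recoded <- ifelse(shift %in% c("EVENING", "MIDNIGHT"), "NIGHT", shift)
print(shift_recoded)

# Share of each category, using the counts printed above
shift_counts <- c(DAY = 7971, NIGHT = 11156)
shift_pct <- round(100 * prop.table(shift_counts), 1)
print(shift_pct)
```

Exact matching with `%in%` only touches values that are equal to the listed strings, whereas `gsub()` replaces the pattern anywhere it occurs inside a value.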
Determining whether or not crimes are more likely to be committed on the weekend
crime_adjusted <- crime_adjusted |>
  mutate(DAY = wday(START_DATE, label = TRUE))
table(crime_adjusted$DAY)
##
## Sun Mon Tue Wed Thu Fri Sat
## 2676 2833 2729 2649 2659 2856 2725
ggplot(crime_adjusted, aes(x = DAY)) +
  geom_bar() +
  labs(
    title = "Crime counts across days of the week",
    x = "Day",
    y = "Count"
  ) +
  theme_minimal()
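One way to check whether the weekday counts differ more than chance alone would suggest is a chi-squared goodness-of-fit test against equal frequencies. The sketch below is an assumption about how such a check could be done, not part of the original analysis; it uses the counts printed above, and `day_counts`, `day_test`, and `day_dev_pct` are illustrative names. With roughly 19,000 records, even small differences can register as statistically significant, so the relative deviations are worth reading alongside the test result.

```r
# Counts copied from the table(crime_adjusted$DAY) output above
day_counts <- c(Sun = 2676, Mon = 2833, Tue = 2729, Wed = 2649,
                Thu = 2659, Fri = 2856, Sat = 2725)

# Goodness-of-fit test; for a plain vector, chisq.test() compares the
# counts against equal probabilities for every category by default
day_test <- chisq.test(day_counts)
print(day_test)

# Deviation of each day from the daily average, in percent
day_dev_pct <- round(100 * (day_counts - mean(day_counts)) / mean(day_counts), 1)
print(day_dev_pct)
```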
Conclusion
Between January and August 2025, theft was the most common crime in the D.C. area. It falls into three categories: other theft (theft/other), which accounts for 46.52% of offenses; theft from vehicles (theft f/auto), which accounts for 22.94%; and motor vehicle theft, which accounts for 17.40%. Together, these categories make up about 86.9% of recorded crimes. The remainder includes robbery (5.62%), assault with a dangerous weapon (3.66%), burglary (2.92%), and less frequent offenses. Most of these crimes did not involve a gun or a knife: of the 19,311 records, 17,921 list a method other than a gun or knife.
Additionally, most of the reported offenses fall in the evening and midnight shifts, which suggests that the D.C. area may be safer during the daytime. The day of the week appears to matter much less: the counts are roughly similar across all seven days, with no clear difference between weekdays and the weekend.