my_data <- Crime_Incidents_in_2015
small_data <- my_data[1:100, ]
head(small_data)
## # A tibble: 6 × 25
## X Y CCN REPORT_DAT START_DATE END_DATE BLOCK OFFENSE METHOD SHIFT
## <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 401701 136862 11105… 2015/08/1… 2011/07/2… 2011/07… 1516… SEX AB… KNIFE MIDN…
## 2 396523 137847 13134… 2015/07/2… 2013/09/1… 2013/09… 1700… SEX AB… OTHERS MIDN…
## 3 396535 140772 14174… 2015/01/2… 2014/11/0… 2014/11… 3400… THEFT … OTHERS DAY
## 4 400121 137998 14191… 2015/03/2… 2014/12/1… 2014/12… 1300… SEX AB… OTHERS MIDN…
## 5 397628 140648 14192… 2015/01/1… 2014/12/1… 2014/12… 3500… SEX AB… OTHERS MIDN…
## 6 405483 136075 15000… 2015/01/0… 2015/01/0… 2015/01… 200 … ASSAUL… GUN EVEN…
## # ℹ 15 more variables: WARD <dbl>, ANC <chr>, DISTRICT <dbl>, PSA <dbl>,
## # NEIGHBORHOOD_CLUSTER <chr>, BLOCK_GROUP <chr>, CENSUS_TRACT <chr>,
## # VOTING_PRECINCT <chr>, BID <chr>, XBLOCK <dbl>, YBLOCK <dbl>,
## # LATITUDE <dbl>, LONGITUDE <dbl>, OBJECTID <dbl>, OCTO_RECORD_ID <lgl>
library(knitr)
# Pass the data frame itself, not its name as a string, so the table shows the actual rows
kable(head(my_data, 20), caption = "First 20 Rows of My Data")
library(tidyverse)
small_data %>%
count(OFFENSE) %>%
ggplot(aes(x = reorder(OFFENSE, n), y = n, fill = OFFENSE)) +
geom_bar(stat = "identity") +
coord_flip() + # Makes the long crime names easier to read
labs(title = "Distribution of Crime Incidents by Offense Type",
x = "Offense Type",
y = "Count") +
theme_minimal() +
guides(fill = "none")
Looking at this chart, we see which offenses, such as ‘Theft’ or ‘Sex
Abuse’, appear in this initial sample. Because the sample is simply the
first 100 rows of the file, the distribution shows which types of crimes
are most frequent among those rows, not necessarily across all incidents
reported in 2015.
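The chart above is built from the first 100 rows, which may not be representative of the full table. As a minimal sketch, assuming the full table is available as `my_data`, a reproducible random sample could be drawn instead. The helper name `draw_sample` and the seed value are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper (not in the original analysis): draws a
# reproducible simple random sample of rows without replacement.
draw_sample <- function(data, size = 100, seed = 2015) {
  set.seed(seed)                 # fixed seed so the same rows are drawn each run
  slice_sample(data, n = size)   # random rows instead of the first `size` rows
}

# Usage, assuming my_data is the full incident table loaded earlier:
# random_data <- draw_sample(my_data)
```

Swapping `small_data` for such a sample in the plotting code would leave the rest of the pipeline unchanged.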
small_data %>%
count(METHOD) %>%
ggplot(aes(x = reorder(METHOD, n), y = n)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
labs(title = "Distribution of Crime Incidents by Method",
x = "Method Used",
y = "Count") +
theme_minimal()
This chart shows the methods used to commit the recorded offenses. By
identifying the primary methods, whether they involve weapons or other
means, we can begin to assess the severity and nature of the incidents
in this sample.
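To read the method distribution as proportions rather than raw counts, the tabulation behind the chart can be extended with a share column. This is a minimal sketch. The helper name `method_shares` is hypothetical, and it assumes only that the data has a `METHOD` column as shown in the table preview.

```r
library(dplyr)

# Hypothetical helper: counts incidents per METHOD and adds each
# method's share of all rows in the data passed in.
method_shares <- function(data) {
  data %>%
    count(METHOD, sort = TRUE) %>%   # most common method first
    mutate(share = n / sum(n))       # proportion of rows, between 0 and 1
}

# Usage, assuming small_data from the earlier step:
# method_shares(small_data)
```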
small_data %>%
count(WARD) %>%
ggplot(aes(x = factor(WARD), y = n)) +
geom_bar(stat = "identity", fill = "orange") +
labs(title = "Number of Incidents by City Ward",
x = "Ward Number",
y = "Number of Incidents") +
theme_minimal()
By looking at the distribution across Wards, we can look for geographic
concentration. This sample suggests that some Wards may have higher
reported incident counts than others, which could indicate areas
requiring more public safety resources. These are raw counts from 100
rows and are not adjusted for each Ward's population.
small_data %>%
count(SHIFT) %>%
ggplot(aes(x = SHIFT, y = n, fill = SHIFT)) +
geom_bar(stat = "identity") +
labs(title = "Crime Incidents by Time Shift",
x = "Shift",
y = "Count") +
theme_minimal()
This bar chart breaks down the reported incidents by the shift during
which they were recorded (Day, Evening, or Midnight). By analyzing the
workload across these time frames, we can identify which periods of the
day experience the highest volume of reported activity. This is
essential for understanding the operational demands placed on law
enforcement at different times of the day.
# Monthly Trend of Crime Incidents
small_data %>%
mutate(Month = floor_date(as.Date(REPORT_DAT), "month")) %>%
count(Month) %>%
ggplot(aes(x = Month, y = n)) +
geom_line(color = "darkred", linewidth = 1) + # linewidth replaces size for lines since ggplot2 3.4.0
geom_point(color = "darkred", size = 2) +
labs(title = "Monthly Trend of Crime Incidents (Sample)",
x = "Month of 2015",
y = "Number of Incidents") +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
theme_minimal()
The line graph shows how the number of crime reports in our sample
changes from month to month during 2015. Tracking incidents by report
month can reveal seasonal trends or periods of heightened activity,
although with only 100 rows each monthly count is small and should be
read with caution. A fuller version of this view would help determine
whether crime reporting is stable or fluctuates significantly with the
time of year.
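With a small sample, some months may have no incidents at all. `count()` then omits those months, and the line is drawn straight across the gap. The following minimal sketch fills missing months with zero before plotting. The helper name `monthly_counts` is hypothetical, and the date format `"%Y/%m/%d"` is an assumption based on the truncated `REPORT_DAT` values shown in the preview.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical helper: monthly incident counts with explicit zeros
# for months that have no rows in the data.
monthly_counts <- function(data) {
  data %>%
    # Assumed format: REPORT_DAT strings start with year/month/day
    mutate(Month = floor_date(as.Date(REPORT_DAT, format = "%Y/%m/%d"), "month")) %>%
    filter(!is.na(Month)) %>%          # drop rows whose date could not be parsed
    count(Month) %>%
    complete(Month = seq(min(Month), max(Month), by = "month"),
             fill = list(n = 0L))      # insert missing months with a count of 0
}

# Usage, assuming small_data from the earlier step:
# monthly_counts(small_data)
```

The result can be passed to the same `ggplot()` call used for the line chart in place of the `mutate()` and `count()` steps.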
small_data %>%
count(NEIGHBORHOOD_CLUSTER) %>%
filter(!is.na(NEIGHBORHOOD_CLUSTER)) %>%
slice_max(n, n = 10) %>%
ggplot(aes(x = reorder(NEIGHBORHOOD_CLUSTER, n), y = n, fill = NEIGHBORHOOD_CLUSTER)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Top 10 Neighborhood Clusters by Incident Count",
subtitle = "Based on the first 100 recorded incidents",
x = "Neighborhood Cluster",
y = "Number of Incidents") +
theme_minimal() +
guides(fill = "none")
While the Ward-level analysis provides a broad geographic overview, this
chart offers a more granular look at crime distribution by identifying
the top 10 Neighborhood Clusters with the most reported incidents.
Pinpointing these specific ‘hotspots’ allows for a more localized
understanding of where public safety challenges are most concentrated
within the city’s various communities.
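One detail worth checking in a "top 10" ranking: `slice_max()` keeps ties by default, so when several clusters share the same count the result can contain more than 10 rows. Here is a minimal sketch that always returns at most `k` rows. The helper name `top_clusters` is hypothetical, and `with_ties = FALSE` breaks ties arbitrarily by row order, which may or may not be the desired behaviour.

```r
library(dplyr)

# Hypothetical helper: the k clusters with the most incidents,
# returning at most k rows even when counts are tied.
top_clusters <- function(data, k = 10) {
  data %>%
    filter(!is.na(NEIGHBORHOOD_CLUSTER)) %>%   # ignore rows with no cluster recorded
    count(NEIGHBORHOOD_CLUSTER) %>%
    slice_max(n, n = k, with_ties = FALSE)     # ties beyond the k-th row are dropped
}

# Usage, assuming small_data from the earlier step:
# top_clusters(small_data, k = 10)
```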
small_data %>%
ggplot(aes(x = OFFENSE, fill = SHIFT)) +
geom_bar(position = "stack") +
coord_flip() +
labs(title = "Crime Type Distribution Across Shifts",
x = "Offense Type",
y = "Count of Incidents",
fill = "Shift Time") +
theme_minimal() +
scale_fill_brewer(palette = "Set2")
This visualization examines the intersection between the nature of the
offense and the time of day. By stacking the shifts within each offense
category, we can see whether certain crimes occur disproportionately
during specific shifts, for instance whether ‘Theft’ is more prevalent
during the day than during the midnight shift. This insight can inform
shift-specific prevention and response strategies.
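Stacked counts make it hard to compare shift patterns between a common offense and a rare one. A minimal sketch of the same breakdown as within-offense proportions follows. The helper name `shift_shares` is hypothetical, and it assumes the `OFFENSE` and `SHIFT` columns shown in the preview.

```r
library(dplyr)

# Hypothetical helper: for each offense, the share of its incidents
# that falls in each shift.
shift_shares <- function(data) {
  data %>%
    count(OFFENSE, SHIFT) %>%
    group_by(OFFENSE) %>%
    mutate(share = n / sum(n)) %>%   # shares sum to 1 within each offense
    ungroup()
}

# Usage, assuming small_data from the earlier step:
# shift_shares(small_data)
```

The same proportions can be drawn directly by using `position = "fill"` instead of `position = "stack"` in `geom_bar()`, which rescales every bar to a total of 1.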