Data sources
All property crimes over the past 5 years (type, place, time, …) from the Metropolitan Police Department (https://crimecards.dc.gov/all:property%20crimes/all:weapons/5:years/citywide:point)
Alcohol License Business Locations from Open Data DC (https://opendata.dc.gov/datasets/cabe9dcef0b344518c7fae1a3def7de1_5/explore?location=38.905371%2C-77.018682%2C11.82)
Homeless Service Facilities from Open Data DC (https://opendata.dc.gov/datasets/47be87a68e7a4376a3bdbe15d85de398_6/explore?location=8.913026%2C104.271644%2C0.92)
Temperature in D.C. from Visual Crossing, via Kaggle (https://www.kaggle.com/datasets/taweilo/washington-dc-historical-weather-20158202407)
ggplot(df_summary, aes(x = YEAR, y = incidents, group = offense_grouped)) +
geom_line(aes(color = color_group), linewidth = 1.2) +
geom_point(aes(color = color_group), size = 2.5) +
scale_color_manual(
values = c("theft" = "lightcoral", "other" = "gray70"),
labels = c("theft" = "Theft", "other" = "Other")
) +
scale_x_continuous(breaks = 2020:2024) +
labs(
title = "Yearly Crime Trends (2020~2024) in DC",
subtitle = "Theft highlighted in red, other crimes in gray",
x = "Year", y = "Number of Incidents", color = "Offense Type"
) +
theme_minimal(base_size = 14)
This line graph, created with ggplot2, visualizes yearly crime trends from 2020 to 2024. The data is grouped into two categories: “theft” (highlighted in red) and “other” crimes (shown in gray). Theft incidents are far more numerous than any other type of crime in all five years, with a notable peak in 2023. The steady, relatively low counts of the other crimes contrast with the fluctuation in theft, underscoring both the dominance and the variability of theft-related offenses in the dataset.
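The code that builds `df_summary` is not shown, so the following is only a minimal base-R sketch of how such a table could be prepared. The `incidents` data frame, the `OFFENSE` labels, and the rule "any label containing THEFT counts as theft" are all assumptions for illustration; only the output column names (`YEAR`, `offense_grouped`, `color_group`, `incidents`) come from the plotting code above.

```r
# Hypothetical raw incident table; rows and offense labels are made up.
incidents <- data.frame(
  YEAR    = c(2020, 2020, 2020, 2021, 2021, 2021, 2021),
  OFFENSE = c("THEFT/OTHER", "THEFT F/AUTO", "ROBBERY",
              "THEFT/OTHER", "BURGLARY", "BURGLARY", "ROBBERY"),
  stringsAsFactors = FALSE
)

# Assumed grouping rule: every label containing "THEFT" becomes "theft";
# all other offenses keep their own (lower-cased) label.
incidents$offense_grouped <- ifelse(grepl("THEFT", incidents$OFFENSE),
                                    "theft", tolower(incidents$OFFENSE))

# Two-level key used only for coloring: theft vs. everything else.
incidents$color_group <- ifelse(incidents$offense_grouped == "theft",
                                "theft", "other")

# Count incidents per year and grouped offense (one row per plotted point).
df_summary <- aggregate(
  list(incidents = rep(1L, nrow(incidents))),
  by  = incidents[c("YEAR", "offense_grouped", "color_group")],
  FUN = sum
)
```

Keeping `offense_grouped` separate from `color_group` is what lets the plot draw one gray line per non-theft offense while still using only two colors.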
ggplot(df_counts, aes(x = reorder(offense_label, count), y = count, fill = color_group)) +
geom_col(show.legend = FALSE) +
scale_fill_manual(values = custom_color) +
coord_flip() +
labs(title = "Bar Chart by Offense",
subtitle = "2020~2024 Total Counts",
x = NULL, y = "Number of Incidents") +
theme_minimal(base_size = 14)
This horizontal bar chart shows the total number of incidents by offense type from 2020 to 2024. Theft is highlighted in light coral, while all other crimes are shown in gray. The chart makes it immediately clear that theft overwhelmingly dominates every other crime category, with over 120,000 incidents; offenses such as robbery, assault, and burglary are far less frequent. Theft was the most frequent and persistent crime during the 2020 to 2024 period.
ggplot(merged, aes(x = year_month)) +
geom_line(aes(y = rollmean(crime_count, k = 6, fill = NA), color = "Theft Crimes"), linewidth = 1.2) +
geom_line(aes(y = avg_temp * 60, color = "Temperature"), linewidth = 1) +
scale_y_continuous(
name = "Number of Theft Crimes",
sec.axis = sec_axis(~./60, name = "Temperature (°C)")
) +
scale_x_date(
date_breaks = "1 year",
date_labels = "%Y",
limits = as.Date(c("2020-07-15", "2024-07-31"))
) +
scale_color_manual(values = c("Theft Crimes" = "skyblue", "Temperature" = "lightcoral")) +
labs(title = "Monthly Theft vs Temperature (2020~2024)",
x = "Date", color = "") +
theme_minimal(base_size = 14) +
theme(
axis.title.y = element_text(color = "skyblue"),
axis.title.y.right = element_text(color = "lightcoral"),
legend.position = "right"
)
This dual-axis line graph shows the relationship between monthly theft crimes (sky blue, smoothed with a 6-month rolling mean) and average temperature (light coral) in DC from 2020 to 2024. Theft incidents tend to follow the temperature. This repeating seasonal pattern suggests a positive correlation between temperature and theft activity, with crime counts peaking in hotter periods and dipping in colder ones. By combining crime and weather data, the plot offers insight into how environmental factors may influence criminal behavior.
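The dual axis works because the temperature is multiplied by a constant before plotting and the secondary axis divides by the same constant. The sketch below shows that round trip in base R and derives the constant from the data instead of hard-coding 60. The `monthly` table and its values are invented for illustration, and `roll_mean()` is a hypothetical base-R stand-in for `zoo::rollmean` (shown with a 3-month window, whereas the plot uses 6).

```r
# Hypothetical monthly series; all values are made up for illustration.
monthly <- data.frame(
  year_month  = seq(as.Date("2021-01-01"), by = "month", length.out = 12),
  crime_count = c(900, 880, 950, 1000, 1100, 1200,
                  1300, 1280, 1150, 1050, 950, 900),
  avg_temp    = c(2, 4, 9, 14, 19, 24, 27, 26, 22, 15, 9, 4)
)

# Centered rolling mean; positions without a full window become NA.
roll_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}
smoothed <- roll_mean(monthly$crime_count, k = 3)

# Scale factor that maps the temperature range onto the crime range.
scale_factor <- max(monthly$crime_count) / max(monthly$avg_temp)

# What geom_line() would draw for temperature on the primary axis ...
temp_scaled <- monthly$avg_temp * scale_factor
# ... and what sec_axis(~ . / scale_factor) would print on the right axis.
temp_back <- temp_scaled / scale_factor

# A single number for the visual impression of co-movement.
r <- cor(monthly$crime_count, monthly$avg_temp)
```

Because the two axes are linked only by this arbitrary factor, the apparent closeness of the two lines depends on the chosen scale; the correlation coefficient `r` does not.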
DC in 2022
img <- readPNG("2023crime.png")
grid.raster(img)
This image gives context for why the crime rate increased after the winter of 2022.
ggplot(time_summary, aes(x = factor(hour_group), y = count, fill = fill_color)) +
geom_bar(data = plate, aes(x = hour_group, y = count),
stat = "identity", width = 1,
fill = "gray95", color = "black", linewidth = 0.5, alpha = 0.6,
inherit.aes = FALSE) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar(start = 0, direction = 1) +
scale_fill_manual(values = custom_colors) +
geom_segment(data = peak,
aes(x = factor(hour_group), xend = factor(hour_group),
y = 0, yend = max(count) * 0.8),
inherit.aes = FALSE,
linewidth = 0.7, color = "black",
arrow = arrow(length = unit(10, "pt"), type = "closed")) +
geom_segment(data = mini,
aes(x = factor(hour_group), xend = factor(hour_group),
y = 0, yend = max(count) * 1.0),
inherit.aes = FALSE,
linewidth = 0.7, color = "black",
arrow = arrow(length = unit(10, "pt"), type = "closed")) +
geom_text(aes(label = label, y = max(count) + 6000), size = 4, vjust = 0.5) +
theme_void() +
labs(title = "3-Hour Interval Theft Clock (2020~2024)",
fill = "Theft Frequency Level") +
geom_text(data = peak,
aes(x = factor(hour_group), y = max(count) * 0.9, label = label_arrow),
inherit.aes = FALSE,
size = 4, fontface = "bold", vjust = -0.5) +
geom_text(data = mini,
aes(x = factor(hour_group), y = max(count) * 1.3, label = label_arrow),
inherit.aes = FALSE,
size = 4, fontface = "bold", vjust = -0.5)
This circular plot, created using the ggclock package, shows theft frequencies by 3-hour intervals throughout the day. The plot is designed like a clock, where each colored wedge represents a 3-hour time block. The long black arrow (minute hand) points to the interval with the highest number of thefts (15–18), categorized as “Very High” in red. The shorter arrow (hour hand) indicates the lowest-risk interval (03–06), shown in green as “Low”. The visualization makes it clear that the afternoon hours between 3 and 6 PM carry the highest theft risk, while the early morning hours are the safest.
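The 3-hour wedges imply that each incident's hour of day was assigned to a fixed-width bin before counting. The following base-R sketch illustrates one way to do that; the `hours` vector is invented, and the label format and the way `peak` and `mini` are selected are assumptions, since the original preparation code is not shown.

```r
# Hypothetical hour-of-day values (0-23) extracted from incident timestamps.
hours <- c(0L, 2L, 3L, 5L, 15L, 16L, 17L, 17L, 23L)

# Integer division maps each hour to the start of its 3-hour block.
bin_start  <- (hours %/% 3L) * 3L
hour_group <- sprintf("%02d-%02d", bin_start, bin_start + 3L)

# One row per interval with its theft count.
time_summary <- as.data.frame(table(hour_group),
                              responseName = "count",
                              stringsAsFactors = FALSE)

# Intervals the two clock hands would point at.
peak <- time_summary[which.max(time_summary$count), ]
mini <- time_summary[which.min(time_summary$count), ]
```

In this toy data only four intervals occur; with real data, declaring all eight intervals as factor levels would keep empty wedges on the clock face.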
ggplot(merged, aes(x = Var2, y = Var1, fill = cor_value)) +
geom_tile(color = "white") +
geom_text(aes(label = label), size = 5) +
scale_fill_gradient2(
low = "#b2182b", mid = "white", high = "#2166ac",
midpoint = 0, limits = c(-1, 1), name = "Correlation"
) +
theme_minimal(base_size = 14) +
labs(title = "Correlation Matrix with P-values",
x = NULL, y = NULL) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
This correlation heatmap shows the relationship between three variables: alcohol business count, homeless facility count, and theft incident count. Each tile represents the Pearson correlation coefficient between two variables, with corresponding p-values included to assess statistical significance.
Key Relationships:
Theft vs Alcohol Count
Correlation: 0.71
p-value: 3.6e-33
➤ This shows a strong positive correlation between theft and the number of alcohol-related facilities. The extremely low p-value indicates this result is highly statistically significant.
Theft vs Homeless Count
Correlation: 0.36
p-value: 8.5e-08
➤ This indicates a moderate positive correlation. The p-value is much less than 0.05, so this relationship is also statistically significant.
Homeless vs Alcohol Count
Correlation: 0.22
p-value: 0.0017
➤ This is a weak positive correlation, but the p-value again indicates that it is statistically significant.
Conclusion: Since all p-values are well below 0.05, all three relationships are statistically significant. The strongest association is between theft and alcohol locations, which may suggest that areas with more alcohol-related businesses tend to experience more thefts.
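The coefficients and p-values reported above are the kind of output `cor.test()` produces for each pair of variables. As a rough sketch of how the long-format table behind the heatmap (`Var1`, `Var2`, `cor_value`, `label`) might be assembled, the example below uses simulated data; the `areas` table, its unit of observation, and the label format are assumptions, not the author's actual inputs.

```r
# Simulated counts per area; the real unit of observation is not stated.
set.seed(1)
n <- 50
alcohol  <- rpois(n, 20)
homeless <- rpois(n, 3)
theft    <- 10 * alcohol + 5 * homeless + rnorm(n, sd = 20)
areas    <- data.frame(theft, alcohol, homeless)

# Every unordered pair of variables, one row per pair.
var_pairs <- t(combn(names(areas), 2))
results <- data.frame(Var1 = var_pairs[, 1], Var2 = var_pairs[, 2],
                      stringsAsFactors = FALSE)

# Pearson correlation test for each pair.
tests <- lapply(seq_len(nrow(results)), function(i) {
  cor.test(areas[[results$Var1[i]]], areas[[results$Var2[i]]],
           method = "pearson")
})
results$cor_value <- sapply(tests, function(res) unname(res$estimate))
results$p_value   <- sapply(tests, function(res) res$p.value)

# Text printed inside each heatmap tile.
results$label <- sprintf("%.2f\n(p = %.2g)", results$cor_value, results$p_value)
```

With three pairwise tests, an adjustment such as `p.adjust(results$p_value, method = "holm")` is a common extra step, although p-values as small as those reported above would be unaffected in practice.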
# leaflet
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addMiniMap(toggleDisplay = TRUE) %>%
addControl("<strong>Top 25 and Bottom 25 Homeless Facilities by Theft Count (500m Buffer) (2020~2024)</strong>", position = "topright") %>%
# buffer
addPolygons(data = both_buffers,
fillColor = ~pal(theft_count),
color = "black",
weight = 1,
opacity = 1,
fillOpacity = 0.6,
popup = ~paste0("<strong>Facility:</strong> ", PROGRAM_NA,
"<br><strong>Theft Count:</strong> ", theft_count)) %>%
addCircleMarkers(data = facility_points,
radius = 4,
color = "white",
weight = 1,
fillColor = "black",
fillOpacity = 0.5,
popup = ~paste0("<strong>Facility:</strong> ", PROGRAM_NA)) %>%
addLegend("bottomright",
pal = pal,
values = both_buffers$theft_count,
title = "Theft Count",
opacity = 0.7)
# leaflet
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addMiniMap(toggleDisplay = TRUE) %>%
addControl("<strong>Top 25 and Bottom 25 Alcohol shops by Theft Count (500m Buffer) (2020~2024)</strong>", position = "topright") %>%
# buffer
addPolygons(data = both_alcohol,
fillColor = ~pal(theft_count),
color = "black",
weight = 1,
opacity = 1,
fillOpacity = 0.3,
popup = ~paste0("<strong>Facility:</strong> ", TRADE_NAME,
"<br><strong>Theft Count:</strong> ", theft_count)) %>%
addCircleMarkers(data = alcohol_facility_points,
radius = 4,
color = "white",
weight = 1,
fillColor = "black",
fillOpacity = 0.5,
popup = ~paste0("<strong>Facility:</strong> ", TRADE_NAME)) %>%
addLegend("bottomright",
pal = pal,
values = both_alcohol$theft_count,
title = "Theft Count",
opacity = 0.7)
Map Explanation & Analysis
This set of interactive maps displays theft activity around homeless facilities and alcohol-related businesses in Washington, D.C., using 500-meter buffer zones. For each location, the total number of thefts that occurred within the buffer between 2020 and 2024 is visualized through color intensity:
- Darker shades represent higher theft counts.
- The legend at the bottom right shows the theft count range corresponding to each color.
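Counting thefts inside a 500-meter buffer amounts to counting the incidents whose distance to the facility is at most 500 m. The sketch below shows that idea with a haversine distance in base R. This is a swapped-in technique chosen to keep the example self-contained: the buffers on the maps were presumably built with a spatial package such as sf, and the coordinates, `facilities`, `thefts`, and `haversine_m()` are all hypothetical.

```r
# Great-circle distance in meters between points given in decimal degrees.
haversine_m <- function(lat1, lon1, lat2, lon2) {
  earth_radius <- 6371000
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_radius * asin(pmin(1, sqrt(a)))
}

# Made-up facility and incident coordinates.
facilities <- data.frame(
  name = c("Facility A", "Facility B"),
  lat  = c(38.9000, 38.9500),
  lon  = c(-77.0300, -77.0000)
)
thefts <- data.frame(
  lat = c(38.9001, 38.9030, 38.9100, 38.9502),
  lon = c(-77.0301, -77.0300, -77.0300, -77.0001)
)

# For each facility, count the thefts no farther than the buffer radius.
buffer_radius_m <- 500
facilities$theft_count <- vapply(seq_len(nrow(facilities)), function(i) {
  d <- haversine_m(facilities$lat[i], facilities$lon[i],
                   thefts$lat, thefts$lon)
  sum(d <= buffer_radius_m)
}, integer(1))
```

Each facility is counted independently, so a theft that lies within 500 m of two facilities contributes to both counts.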
Homeless Facilities:
The central downtown D.C. area shows the highest theft activity near homeless facilities.
The facility with the highest theft count is Legal Assistance Project, with 5,364 incidents.
Many facilities are located close to each other, causing their buffers to overlap, which highlights the area as a high-risk cluster.
Alcohol-Related Businesses:
The highest theft counts are concentrated in the U Street NW and 11th Street NW area.
According to Google Maps, this area includes Mama San, a nightclub with the highest count of 5,713 thefts.
This area is known for its nightlife and includes popular venues such as the 9:30 Club, Flash, Black Cat, and Busboys and Poets, which likely contributes to the increased theft incidents in the area.
Advantages of Buffer-Based Visualization:
Clear Spatial Risk Detection → Makes it easy to identify high-theft areas at a glance using visual cues.
Combination of Quantitative and Spatial Analysis → Shows the geographic distribution of theft along with the numeric scale of risk using color and location of the buffer.
Limitations of Buffer Visualization:
Overlapping Buffers → In densely clustered areas, buffers overlap heavily, making it difficult to distinguish which location is responsible for the high theft count.
Proximity Doesn’t Always Mean Causality → Buffers are based on distance only. They don’t explain why theft occurs. Other factors like metro stations, public parks, or nightlife spots might also fall within the buffer.
This visualization clearly highlights that homeless facilities in central D.C. and alcohol-related businesses in the U Street NW nightlife district are surrounded by high levels of theft incidents. The Legal Assistance Project and the Mama San nightclub are the highest-risk locations based on theft counts.