Data sources

Crime by Offense Type

ggplot(df_counts, aes(x = reorder(offense_label, count), y = count, fill = color_group)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = custom_color) +
  coord_flip() +
  labs(title = "Bar Chart by Offense",
       subtitle = "2020~2024 Total Counts",
       x = NULL, y = "Number of Incidents") +
  theme_minimal(base_size = 14)

This horizontal bar chart shows the total number of incidents by offense type from 2020 to 2024. Theft is highlighted in light coral, while all other offenses are shown in gray. The chart makes it immediately clear that theft dominates every other category, with over 120,000 incidents. Offenses such as robbery, assault, and burglary are far lower by comparison. Theft was by far the most frequent offense during the 2020 to 2024 period.
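
The highlight effect depends on two objects that are not defined in the chunk above, `color_group` and `custom_color`. The following is a minimal sketch of how they might be built, assuming a data frame with one row per offense; the counts are hypothetical and the group names `highlight` and `other` are placeholders, not the report's actual values.

```r
# Hypothetical counts for illustration only; not the report's data.
df_counts <- data.frame(
  offense_label = c("Theft", "Robbery", "Assault", "Burglary"),
  count = c(1200, 150, 130, 90)
)

# Flag the category to highlight; everything else shares one neutral group.
df_counts$color_group <- ifelse(df_counts$offense_label == "Theft",
                                "highlight", "other")

# Named vector so scale_fill_manual() maps each group to a fixed color.
custom_color <- c(highlight = "lightcoral", other = "gray70")
```

Using a named vector keeps the color assignment stable even if the row order changes after `reorder()`.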

Theft vs. Temperature (Monthly)

ggplot(merged, aes(x = year_month)) +
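  # 6-month rolling mean (rollmean() from zoo) smooths month-to-month noise in the theft series
  geom_line(aes(y = rollmean(crime_count, k = 6, fill = NA), color = "Theft Crimes"), linewidth = 1.2) +
  # Temperature is multiplied by 60 to share the theft axis; sec_axis() below divides it back
  geom_line(aes(y = avg_temp * 60, color = "Temperature"), linewidth = 1) +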
  scale_y_continuous(
    name = "Number of Theft Crimes",
    sec.axis = sec_axis(~./60, name = "Temperature (°C)")
  ) +
  scale_x_date(
    date_breaks = "1 year",
    date_labels = "%Y",
    limits = as.Date(c("2020-07-15", "2024-07-31"))
  ) +
  scale_color_manual(values = c("Theft Crimes" = "skyblue", "Temperature" = "lightcoral")) +
  labs(title = "Monthly Theft vs Temperature (2020~2024)",
       x = "Date", color = "") +
  theme_minimal(base_size = 14) +
  theme(
    axis.title.y = element_text(color = "skyblue"),
    axis.title.y.right = element_text(color = "lightcoral"),
    legend.position = "right"
  )

This dual-axis line graph visualizes the relationship between monthly theft crimes (sky blue, smoothed with a 6-month rolling mean) and average temperature (light coral) in DC from 2020 to 2024. Theft incidents tend to track temperature. This repeating seasonal pattern suggests a positive correlation between temperature and theft activity, with theft counts peaking in hotter periods and dipping during colder periods. By combining crime and weather data, this plot offers insight into how environmental factors may influence criminal behavior.
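
The visual impression of a positive correlation can be checked numerically. The sketch below uses synthetic monthly data (not the report's `merged` table) to show a Pearson test on the two series, and also shows why the secondary axis divides by the same factor of 60 that the plot multiplies by. All variable names other than `avg_temp` and `crime_count` are assumptions.

```r
set.seed(1)  # synthetic data, for illustration only

# 48 months with a seasonal temperature cycle (deg C) and theft counts
# that rise with temperature plus random noise.
month_index <- 1:48
avg_temp <- 15 + 10 * sin(2 * pi * (month_index - 4) / 12)
crime_count <- 2000 + 40 * avg_temp + rnorm(48, sd = 100)

# Pearson correlation and its p-value between the two monthly series.
temp_test <- cor.test(avg_temp, crime_count, method = "pearson")
temp_test$estimate
temp_test$p.value

# The plot multiplies temperature by 60 to place it on the theft axis;
# sec_axis(~ . / 60) divides by the same factor to label it in deg C.
scale_factor <- 60
temp_on_theft_axis <- avg_temp * scale_factor
temp_recovered <- temp_on_theft_axis / scale_factor
```

The scale factor is a display choice only: it changes where the temperature line sits, not the correlation between the two series.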

DC in 2022

img <- readPNG("2023crime.png")
grid.raster(img)

https://www.washingtonpost.com/dc-md-va/interactive/2024/dc-crime-homicide-victims-shooting-violence/

The Washington Post analysis linked above gives context for why the crime rate rises after the winter of 2022.

Theft Clock Plot

ggplot(time_summary, aes(x = factor(hour_group), y = count, fill = fill_color)) +
  geom_bar(data = plate, aes(x = hour_group, y = count),
           stat = "identity", width = 1,
           fill = "gray95", color = "black", linewidth = 0.5, alpha = 0.6,
           inherit.aes = FALSE) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(start = 0, direction = 1) +
  scale_fill_manual(values = custom_colors) +
  geom_segment(data = peak,
               aes(x = factor(hour_group), xend = factor(hour_group),
                   y = 0, yend = max(count) * 0.8),
               inherit.aes = FALSE,
               linewidth = 0.7, color = "black",
               arrow = arrow(length = unit(10, "pt"), type = "closed")) +
  geom_segment(data = mini,
               aes(x = factor(hour_group), xend = factor(hour_group),
                   y = 0, yend = max(count) * 1.0),
               inherit.aes = FALSE,
               linewidth = 0.7, color = "black",
               arrow = arrow(length = unit(10, "pt"), type = "closed")) +
  geom_text(aes(label = label, y = max(count) + 6000), size = 4, vjust = 0.5) +
  theme_void() +
  labs(title = "3-Hour Interval Theft Clock (2020~2024)",
       fill = "Theft Frequency Level") +
  geom_text(data = peak,
            aes(x = factor(hour_group), y = max(count) * 0.9, label = label_arrow),
            inherit.aes = FALSE,
            size = 4, fontface = "bold", vjust = -0.5) +
  geom_text(data = mini,
            aes(x = factor(hour_group), y = max(count) * 1.3, label = label_arrow),
            inherit.aes = FALSE,
            size = 4, fontface = "bold", vjust = -0.5)

This circular plot, built with ggplot2 in polar coordinates (coord_polar()), shows theft frequencies in 3-hour intervals across the day. The plot is designed like a clock, where each colored wedge represents a 3-hour time block. The long black arrow (minute hand) points to the block with the highest number of thefts (15–18), categorized as “Very High” in red. The shorter arrow (hour hand) points to the block with the fewest thefts (03–06), shown in green as “Low”. The visualization makes it clear that the afternoon hours between 3 and 6 PM see the most thefts, while the early morning hours see the fewest.
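
The chunk above assumes that `time_summary`, `peak`, and `mini` already exist. A minimal sketch of how 3-hour blocks and the two "clock hand" rows might be derived is shown here; the incident hours are hypothetical, and the column names simply mirror those used in the plot code.

```r
# Hypothetical incident hours (0-23); not the report's data.
incident_hour <- c(1, 4, 9, 13, 15, 16, 16, 17, 17, 19, 22, 10, 14, 20)

# Start hour of each 3-hour block: 0, 3, 6, ..., 21.
block_start <- (incident_hour %/% 3) * 3

# Count incidents per block, keeping empty blocks via fixed factor levels.
blocks <- seq(0, 21, by = 3)
counts <- table(factor(block_start, levels = blocks))

time_summary <- data.frame(
  hour_group = blocks,
  label = sprintf("%02d-%02d", blocks, blocks + 3),
  count = as.integer(counts)
)

# Blocks the two clock hands would point to.
peak <- time_summary[which.max(time_summary$count), ]
mini <- time_summary[which.min(time_summary$count), ]
```

Fixing the factor levels matters for a clock layout: a block with zero incidents still needs a wedge, otherwise the remaining wedges would no longer line up with clock positions.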

Correlation

ggplot(merged, aes(x = Var2, y = Var1, fill = cor_value)) +
  geom_tile(color = "white") +
  geom_text(aes(label = label), size = 5) +
  scale_fill_gradient2(
    low = "#b2182b", mid = "white", high = "#2166ac",
    midpoint = 0, limit = c(-1, 1), name = "Correlation"
  ) +
  theme_minimal(base_size = 14) +
  labs(title = "Correlation Matrix with P-values",
       x = NULL, y = NULL) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

This correlation heatmap shows the relationship between three variables: alcohol business count, homeless facility count, and theft incident count. Each tile represents the Pearson correlation coefficient between two variables, with corresponding p-values included to assess statistical significance.

Key Relationships:

Theft vs Alcohol Count

Correlation: 0.71

p-value: 3.6e-33

➤ This shows a strong positive correlation between theft and the number of alcohol-related facilities. The extremely low p-value indicates this result is highly statistically significant.

Theft vs Homeless Count

Correlation: 0.36

p-value: 8.5e-08

➤ This indicates a moderate positive correlation. The p-value is much less than 0.05, so this relationship is also statistically significant.

Homeless vs Alcohol Count

Correlation: 0.22

p-value: 0.0017

➤ This is a weak positive correlation, but the p-value is still below 0.05, so the relationship is statistically significant.

Conclusion: All three p-values are below 0.05, so all three relationships are statistically significant. The strongest association is between theft and alcohol locations, which may suggest that areas with more alcohol-related businesses tend to experience more thefts. Correlation alone does not show that one causes the other.
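
The heatmap expects a long-format table with `Var1`, `Var2`, `cor_value`, and `label` columns. The sketch below shows one way such a table might be assembled with `cor.test()`; the data are synthetic, and the variable names and label format are assumptions rather than the report's own preprocessing.

```r
set.seed(42)  # synthetic data, for illustration only

n <- 200
alcohol_count <- rpois(n, lambda = 5)
homeless_count <- rpois(n, lambda = 2)
theft_count <- 20 * alcohol_count + 10 * homeless_count + rnorm(n, sd = 30)
vars <- data.frame(theft_count, alcohol_count, homeless_count)

# Every ordered pair of variables, one row per heatmap tile.
cor_long <- expand.grid(Var1 = names(vars), Var2 = names(vars),
                        stringsAsFactors = FALSE)

# Pearson r and p-value per pair; the diagonal is fixed at r = 1, no p-value.
pair_stats <- t(mapply(function(a, b) {
  if (a == b) return(c(r = 1, p = NA))
  res <- cor.test(vars[[a]], vars[[b]], method = "pearson")
  c(r = unname(res$estimate), p = res$p.value)
}, cor_long$Var1, cor_long$Var2))

cor_long$cor_value <- unname(pair_stats[, "r"])
cor_long$p_value <- unname(pair_stats[, "p"])

# Tile text: r on the first line, p-value on the second (diagonal shows r only).
cor_long$label <- ifelse(
  is.na(cor_long$p_value),
  sprintf("%.2f", cor_long$cor_value),
  sprintf("%.2f\np = %.2g", cor_long$cor_value, cor_long$p_value)
)
```

The resulting `cor_long` can be passed to `ggplot()` with `geom_tile()` and `geom_text()` in the same way as the chunk above.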

Buffer Map - Homeless Facility vs Theft

# Interactive leaflet map: theft counts within 500 m of homeless facilities
leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  addMiniMap(toggleDisplay = TRUE) %>%
  addControl("<strong>Top 25 and Bottom 25 Homeless Facilities by Theft Count (500m Buffer) (2020~2024)</strong>", position = "topright") %>%
  
  # 500 m buffer polygons, filled by theft count
  addPolygons(data = both_buffers,
              fillColor = ~pal(theft_count),
              color = "black",
              weight = 1,
              opacity = 1,
              fillOpacity = 0.6,
              popup = ~paste0("<strong>Facility:</strong> ", PROGRAM_NA,
                              "<br><strong>Theft Count:</strong> ", theft_count)) %>%
  
  addCircleMarkers(data = facility_points,
                   radius = 4,
                   color = "white",
                   weight = 1,
                   fillColor = "black",
                   fillOpacity = 0.5,
                   popup = ~paste0("<strong>Facility:</strong> ", PROGRAM_NA)) %>%
  
  addLegend("bottomright",
            pal = pal,
            values = both_buffers$theft_count,
            title = "Theft Count",
            opacity = 0.7)

Buffer Map - Alcohol Business vs Theft

# Interactive leaflet map: theft counts within 500 m of alcohol businesses
leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  addMiniMap(toggleDisplay = TRUE) %>%
  addControl("<strong>Top 25 and Bottom 25 Alcohol shops by Theft Count (500m Buffer) (2020~2024)</strong>", position = "topright") %>%
  
  # 500 m buffer polygons, filled by theft count
  addPolygons(data = both_alcohol,
              fillColor = ~pal(theft_count),
              color = "black",
              weight = 1,
              opacity = 1,
              fillOpacity = 0.3,
              popup = ~paste0("<strong>Facility:</strong> ", TRADE_NAME,
                              "<br><strong>Theft Count:</strong> ", theft_count)) %>%
  
  
  addCircleMarkers(data = alcohol_facility_points,
                   radius = 4,
                   color = "white",
                   weight = 1,
                   fillColor = "black",
                   fillOpacity = 0.5,
                   popup = ~paste0("<strong>Facility:</strong> ", TRADE_NAME)) %>%
  
  
  addLegend("bottomright",
            pal = pal,
            values = both_alcohol$theft_count,
            title = "Theft Count",
            opacity = 0.7)

Map Explanation & Analysis

This set of interactive maps displays theft activity around homeless facilities and alcohol-related businesses in Washington, D.C., using 500-meter buffer zones. For each location, the total number of thefts that occurred within the buffer between 2020 and 2024 is visualized through color intensity:

- Darker shades represent higher theft counts.
- The legend in the bottom-right corner shows the theft count ranges corresponding to each color.
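
The per-buffer theft counts are computed outside the chunks shown here. As a rough sketch of the idea, the code below counts points within 500 m of each facility using a haversine distance in base R. This is a stand-in for a true buffer intersection (which the report may have done with a spatial package), and the coordinates, the `name` column, and the `haversine_m()` helper are all hypothetical.

```r
# Great-circle distance in meters between points given in decimal degrees.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical coordinates; not the report's data.
facilities <- data.frame(name = c("A", "B"),
                         lat = c(38.900, 38.950), lon = c(-77.030, -77.030))
thefts <- data.frame(lat = c(38.900, 38.901, 38.910),
                     lon = c(-77.030, -77.030, -77.030))

# Number of thefts within 500 m of each facility.
facilities$theft_count <- vapply(seq_len(nrow(facilities)), function(i) {
  d <- haversine_m(facilities$lat[i], facilities$lon[i], thefts$lat, thefts$lon)
  sum(d <= 500)
}, integer(1))
```

Under this counting rule, a theft that lies within 500 m of two facilities is counted once for each of them, so counts from neighboring buffers are not independent.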

Homeless Facilities:

The central downtown D.C. area shows the highest theft activity near homeless facilities.

The facility with the highest theft count is Legal Assistance Project, with 5,364 incidents.

Many facilities are located close to each other, causing their buffers to overlap, which highlights the area as a high-risk cluster.

Alcohol-Related Businesses:

The highest theft counts are concentrated in the U Street NW and 11th Street NW area.

According to Google Maps, this area includes Mama San, a nightclub whose buffer has the highest count, at 5,713 thefts.

This area is known for nightlife and includes popular venues such as 9:30 Club, Flash, Black Cat, and Busboys and Poets, which likely contributes to the higher number of theft incidents nearby.

Advantages of Buffer-Based Visualization:

Clear Spatial Risk Detection → Makes it easy to identify high-theft areas at a glance using visual cues.

Combination of Quantitative and Spatial Analysis → Shows the geographic distribution of theft along with the numeric scale of risk using color and location of the buffer.

Limitations of Buffer Visualization:

Overlapping Buffers → In densely clustered areas, buffers overlap heavily, making it difficult to distinguish which location is responsible for the high theft count.

Proximity Doesn’t Always Mean Causality → Buffers are based on distance only. They don’t explain why theft occurs. Other factors like metro stations, public parks, or nightlife spots might also fall within the buffer.

This visualization highlights that homeless facilities in central D.C. and alcohol-related businesses in the U Street NW nightlife district are surrounded by high levels of theft incidents. Based on theft counts within their buffers, the Legal Assistance Project and Mama San Nightclub are the highest-risk locations.