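# Load packages for data wrangling, dates, spatial data, and basemap tiles
library(tidyverse)
library(lubridate)
library(sf)
library(ggspatial)
library(janitor)
library(tigris)

# Cache downloaded TIGER/Line shapefiles between sessions
options(tigris_use_cache = TRUE)

crime <- read_csv("your_file_name.csv")

# View number of rows and columns
dim(crime)
names(crime)
print(head(crime, 10))

# Clean column names (lowercase + underscores)
crime <- crime %>% clean_names()
names(crime)

# Count exact duplicate rows
sum(duplicated(crime))

# Tabulate incidents by crime type (column primary_type), most common first
crime %>%
  count(primary_type, sort = TRUE) %>%
  print(n = Inf)

# Chicago city boundary from TIGER/Line places, in WGS84 (EPSG:4326)
chi_boundary <- places(state = "IL") %>%
  filter(NAME == "Chicago") %>%
  st_transform(4326)

# Arson incidents with valid coordinates, converted to sf points
crime_arson_sf <- crime %>%
  filter(
    primary_type == "ARSON",
    !is.na(longitude),
    !is.na(latitude)
  ) %>%
  st_as_sf(
    coords = c("longitude", "latitude"),
    crs = 4326
  )

# Crop the points to the bounding box of the city boundary
bb <- sf::st_bbox(chi_boundary)
crime_crop <- sf::st_crop(crime_arson_sf, bb)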

# Point map: arson incidents over a light basemap
ggplot() +
  annotation_map_tile(type = "cartolight", zoomin = 0) +
  geom_sf(data = chi_boundary, fill = NA, linewidth = 0.3) +
  geom_sf(data = crime_crop, size = 0.3, alpha = 0.6, color = "red") +
  labs(
    title = "Chicago Arson Points 2025–Present"
  ) +
  theme_void()

# comm_areas_sf: community area polygons (sf object, loaded beforehand) with columns:
# "community" "area" "shape_area" "perimeter" "area_num_1" "area_numbe"
# "comarea_id" "comarea" "shape_len" "geometry"

# Count arson incidents per community area
crime_by_ca <- crime %>%
  filter(primary_type == "ARSON", !is.na(community_area)) %>%
  mutate(community_area = as.integer(community_area)) %>%
  count(community_area, name = "crime_n")

# Look for the community area id column in the polygon layer
# (ca_id_col is not used below; the join keys on area_num_1)
if ("area_numbe" %in% names(comm_areas_sf)) {
  ca_id_col <- "area_numbe"
} else {
  candidates <- names(comm_areas_sf) %>%
    keep(~ str_detect(.x, "area") && !str_detect(.x, "name|comm|community|neigh"))
  ca_id_col <- candidates[1]
  if (is.na(ca_id_col)) stop("Could not find a community area id column in comm_areas_sf.")
}

# Join counts to polygons; areas with no recorded arson get a count of 0
comm_areas_choro <- comm_areas_sf %>%
  mutate(community_area = as.integer(area_num_1)) %>%
  left_join(crime_by_ca, by = "community_area") %>%
  mutate(crime_n = tidyr::replace_na(crime_n, 0))

# Choropleth: arson counts by community area
ggplot() +
  annotation_map_tile(type = "cartolight", zoomin = 0) +
  geom_sf(
    data = comm_areas_choro,
    aes(fill = crime_n),
    color = "white",
    linewidth = 0.15,
    alpha = 0.85
  ) +
  geom_sf(
    data = chi_boundary,
    fill = NA,
    color = "grey30",
    linewidth = 0.3
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Arson Incidents by Community Area in Chicago, 2025–Present",
    fill = "Arson"
  ) +
  theme_void()

# Lookup table from community area number to name
area_lookup <- comm_areas_sf %>%
  st_drop_geometry() %>%
  transmute(
    community_area = as.integer(area_num_1),
    area_name = community
  )

# Arson counts with community area names, highest first
crime_by_area <- crime_by_ca %>%
  left_join(area_lookup, by = "community_area") %>%
  select(area_name, crime_n) %>%
  arrange(desc(crime_n))

crime_by_area %>% head()

# Population by community area, chosen interactively
# (the file needs the columns community and population)
population <- read_csv(file.choose())

# Arson rate per 10,000 residents
comm_areas_choro <- comm_areas_choro %>%
  left_join(population, by = "community") %>%
  mutate(rate_per_10k = (crime_n / population) * 10000)

# Rate table, highest rate first, rounded to two decimals
crime_rate_table <- comm_areas_choro %>%
  st_drop_geometry() %>%
  select(
    area_name = community,
    population,
    rate_per_10k
  ) %>%
  arrange(desc(rate_per_10k)) %>%
  mutate(rate_per_10k = round(rate_per_10k, 2))

head(crime_rate_table)

17.2 Do the areas with the highest crime rates form geographic clusters, or are they scattered? What might explain this pattern?
The areas with the highest arson rates tend to form geographic clusters rather than being scattered evenly across Chicago. This clustering may be explained by neighborhood-level factors such as economic disadvantage, housing vacancy rates, population density, and differences in property conditions. Areas with more abandoned buildings or economic instability may experience higher arson rates due to property neglect, insurance fraud, or retaliatory behavior.
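One way to back up a visual impression of clustering is a spatial autocorrelation statistic. The sketch below computes Moran's I in base R on a toy example; it is not part of the original analysis, and the five areas, their rates, and the neighbor structure are all invented for illustration. Values well above the no-autocorrelation expectation of -1/(n - 1) suggest that similar rates sit next to each other, while values below it suggest a scattered, alternating pattern.

```r
# Toy example: five hypothetical areas arranged in a line (1-2-3-4-5).
# The rates are invented; they are not taken from the Chicago data.
rate_clustered <- c(9, 8, 2, 1, 1)  # high values sit next to each other
rate_scattered <- c(9, 1, 8, 1, 2)  # same values, high and low alternate

# Binary contiguity weights: w[i, j] = 1 when areas i and j are neighbors
n <- length(rate_clustered)
w <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  w[i, i + 1] <- 1
  w[i + 1, i] <- 1
}

# Moran's I: (n / sum of weights) * weighted cross-products / total variation
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

clustered <- morans_i(rate_clustered, w)
scattered <- morans_i(rate_scattered, w)

print(c(clustered = clustered, scattered = scattered, expected = -1 / (n - 1)))
```

For the real community areas, the weights would come from which polygons share a border, and a tested implementation from a spatial statistics package would be a safer choice than this hand-rolled function.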
17.3 What factors might explain why your selected crime is more common in certain areas?
Arson may be more common in certain community areas due to socioeconomic and environmental factors. Neighborhoods with higher poverty rates, vacant properties, or aging infrastructure may present more opportunities for arson. Additionally, areas with higher residential turnover or lower community cohesion may experience less informal social control, making criminal activity more likely. Limited resources for property maintenance and policing may also contribute to higher rates in certain neighborhoods.
17.4 What are two limitations of using reported crime data to understand crime patterns?
One limitation of reported crime data is underreporting. Not all crimes are reported to the police, which means official data may underestimate the true level of crime. A second limitation is differences in enforcement and policing practices. Some areas may have more police presence, leading to higher recorded crime rates even if actual crime levels are similar elsewhere. Therefore, reported crime data may reflect patterns of reporting and enforcement rather than actual criminal behavior alone.
17.5 Which map (point map or choropleth) was more useful for understanding the pattern of crime? Why?
The choropleth map was more useful for understanding overall crime patterns because it summarized incidents by community area, making it clear which areas had the most arson. Paired with the rate-per-10,000 table, it also showed which areas ranked highest relative to population size. While the point map displayed the exact locations of individual incidents, it was harder to identify broader patterns or compare areas. The choropleth map allowed for easier identification of high-incidence clusters and made comparisons between neighborhoods more straightforward.
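The difference between raw counts and population-adjusted rates can be seen in a small worked example. The sketch below uses three hypothetical areas (the names, counts, and populations are invented, not taken from the Chicago data) and the same rate formula as the analysis, incidents divided by population times 10,000. The area with the most incidents is not necessarily the area with the highest rate.

```r
# Hypothetical community areas; all values are invented for illustration.
areas <- data.frame(
  area_name  = c("Area A", "Area B", "Area C"),
  crime_n    = c(30, 12, 20),
  population = c(90000, 15000, 50000),
  stringsAsFactors = FALSE
)

# Same formula as the analysis: incidents per 10,000 residents
areas$rate_per_10k <- round(areas$crime_n / areas$population * 10000, 2)

# Which area ranks first by raw count, and which by rate?
top_by_count <- areas$area_name[which.max(areas$crime_n)]
top_by_rate  <- areas$area_name[which.max(areas$rate_per_10k)]

# Table sorted by rate, highest first
print(areas[order(-areas$rate_per_10k), ])
print(c(top_by_count = top_by_count, top_by_rate = top_by_rate))
```

Here Area A has the most incidents, but Area B has the highest rate because its population is much smaller. A map shaded by count and a map shaded by rate can therefore highlight different areas.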