library(tidyverse)
library(sf)
library(ggspatial)
library(janitor)
library(tigris)
options(tigris_use_cache = TRUE)
cat("Select the Chicago Crime CSV first\n")
## Select the Chicago Crime CSV first
crime <- read_csv(file.choose(), show_col_types = FALSE) %>%
  clean_names()
cat("Select the Population CSV second\n")
## Select the Population CSV second
population <- read_csv(file.choose(), show_col_types = FALSE)
if (!"primary_type" %in% names(crime)) {
  stop("You selected the wrong file first. Please choose the Chicago crime dataset.")
}
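The check above covers only the crime file, and it runs after both files have been chosen. A minimal sketch of a reusable check follows. `require_columns` is a hypothetical helper name, the data frames are toy stand-ins rather than the real CSVs, and the required population columns (`community`, `population`) are inferred from how that table is used later in this document.

```r
# Hypothetical helper: stop with a readable message if expected columns are missing.
require_columns <- function(df, required, label) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop(sprintf("%s is missing column(s): %s",
                 label, paste(missing_cols, collapse = ", ")),
         call. = FALSE)
  }
  invisible(df)
}

# Toy stand-ins for the two CSVs (not real data).
crime_demo <- data.frame(primary_type = "THEFT", community_area = 1L)
population_demo <- data.frame(community = "AREA A", population = 1000)

# Both calls pass silently when the expected columns are present.
require_columns(crime_demo, c("primary_type", "community_area"), "Crime file")
require_columns(population_demo, c("community", "population"), "Population file")

# Passing the files in the wrong order fails with a clear message.
wrong_order <- tryCatch(
  require_columns(population_demo, "primary_type", "Crime file"),
  error = function(e) conditionMessage(e)
)
print(wrong_order)
```

Calling the helper immediately after each `read_csv()` would surface a wrong selection before the second file dialog opens.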
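# Chicago city boundary from TIGER/Line places, in WGS 84 (EPSG:4326)
chi_boundary <- places(state = "IL") %>%
  filter(NAME == "Chicago") %>%
  st_transform(4326)
# Theft incidents with usable coordinates, converted to point geometries
crime_theft_sf <- crime %>%
  filter(primary_type == "THEFT",
         !is.na(longitude),
         !is.na(latitude)) %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
# st_crop() keeps points inside the boundary's bounding box (a rectangle),
# not the exact city outline
bb <- st_bbox(chi_boundary)
crime_crop <- st_crop(crime_theft_sf, bb)
# Point map of theft incidents over a basemap
ggplot() +
  annotation_map_tile(type = "cartolight") +
  geom_sf(data = chi_boundary, fill = NA) +
  geom_sf(data = crime_crop, size = 0.3, alpha = 0.6, color = "red") +
  theme_void()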
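The crop above uses the boundary's bounding box, so points that fall inside the rectangle but outside the city outline are kept. A sketch of the difference between a bounding-box crop and a polygon filter is shown here, using a toy triangle and three toy points in planar coordinates (no CRS); none of these objects come from the real data.

```r
library(sf)

# Toy triangle standing in for a city boundary (planar coordinates, no CRS).
boundary_demo <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(4, 0), c(0, 4), c(0, 0)
))))

# Three toy points: inside the triangle, inside only the bounding box,
# and outside both.
points_demo <- st_as_sf(
  data.frame(id = 1:3, x = c(1, 3.5, 5), y = c(1, 3.5, 5)),
  coords = c("x", "y")
)

# Rectangle test: keeps anything inside the bounding box.
in_bbox <- st_crop(points_demo, st_bbox(boundary_demo))

# Polygon test: keeps only points that intersect the polygon itself.
in_polygon <- st_filter(points_demo, boundary_demo)

print(in_bbox$id)
print(in_polygon$id)
```

With the objects in this document, the equivalent call would presumably be `st_filter(crime_theft_sf, chi_boundary)`, since both layers are already in EPSG:4326.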
comm_areas_sf <- st_read(
  "https://raw.githubusercontent.com/RandomFractals/ChicagoCrimes/master/data/chicago-community-areas.geojson",
  quiet = TRUE
) %>%
  st_transform(4326) %>%
  clean_names()
crime_by_ca <- crime %>%
  filter(primary_type == "THEFT", !is.na(community_area)) %>%
  mutate(community_area = as.integer(community_area)) %>%
  count(community_area, name = "crime_n")
comm_areas_choro <- comm_areas_sf %>%
  mutate(community_area = as.integer(area_numbe)) %>%
  left_join(crime_by_ca, by = "community_area") %>%
  mutate(crime_n = replace_na(crime_n, 0))
ggplot() +
  annotation_map_tile(type = "cartolight") +
  geom_sf(data = comm_areas_choro, aes(fill = crime_n), color = "white", linewidth = 0.15) +
  geom_sf(data = chi_boundary, fill = NA, color = "grey30", linewidth = 0.3) +
  labs(title = "Theft Incidents by Community Area in Chicago (2025–Present)", fill = "Theft Count") +
  theme_void()
comm_areas_choro <- comm_areas_choro %>%
  left_join(population, by = "community") %>%
  mutate(rate_per_10k = (crime_n / population) * 10000)
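The join above matches on the text column `community`, so any difference in case or spacing between the two sources would leave `population` as `NA` without raising an error. A hedged sketch of how unmatched keys could be inspected with `anti_join()` is given here. The tables are toy data, and `normalize_key` is a hypothetical helper; whether the real files need this normalization is an assumption.

```r
library(dplyr)

# Toy tables (not real data); keys differ only in case and stray whitespace.
areas_demo <- data.frame(
  community = c("AREA A", "AREA B", "AREA C"),
  crime_n = c(12L, 5L, 0L)
)
population_demo <- data.frame(
  community = c("Area A", "area b ", "AREA C"),
  population = c(1000, 2000, 500)
)

# Rows of the left table that have no exact match on the key.
unmatched_before <- anti_join(areas_demo, population_demo, by = "community")

# Assumed normalization: trim whitespace and upper-case both keys.
normalize_key <- function(x) toupper(trimws(x))
areas_norm <- mutate(areas_demo, community = normalize_key(community))
population_norm <- mutate(population_demo, community = normalize_key(community))

unmatched_after <- anti_join(areas_norm, population_norm, by = "community")

print(unmatched_before$community)
print(nrow(unmatched_after))
```

An empty `anti_join()` result is a quick sign that every community area found a population row.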
crime_rate_table <- comm_areas_choro %>%
  st_drop_geometry() %>%
  select(area_name = community, population, rate_per_10k) %>%
  arrange(desc(rate_per_10k)) %>%
  mutate(rate_per_10k = round(rate_per_10k, 2))
head(crime_rate_table)
17.1
The community area with the highest count of theft offenses was not the area with the highest rate per 10,000 residents. This discrepancy arises because areas with larger populations tend to have higher raw crime counts simply because more people live there. Crime rates, in contrast, account for population size and reveal where theft is proportionally more concentrated. So although one area may have the most incidents overall, another area may experience theft more intensely relative to its population.
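The gap between counts and rates can be illustrated with a small worked example. The numbers are hypothetical and chosen only to show the arithmetic; they are not taken from the Chicago data.

```r
# Hypothetical numbers chosen only to illustrate the arithmetic.
demo <- data.frame(
  area = c("Area A", "Area B"),
  thefts = c(900, 300),
  population = c(90000, 10000)
)

# Same formula as rate_per_10k in the analysis: thefts per 10,000 residents.
demo$rate_per_10k <- demo$thefts / demo$population * 10000

top_by_count <- demo$area[which.max(demo$thefts)]
top_by_rate <- demo$area[which.max(demo$rate_per_10k)]

print(demo)
print(c(count = top_by_count, rate = top_by_rate))
```

Here Area A has three times as many thefts, but Area B has three times the rate (300 versus 100 per 10,000 residents), so the two rankings disagree.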
17.2
The areas with the highest crime rates tend to form geographic clusters rather than being randomly scattered. High-rate areas are often located near commercial corridors, transportation hubs, or densely populated neighborhoods. Clustering may also reflect underlying social and economic similarities between neighboring areas, such as income levels, housing density, or retail concentration. Crime patterns are often influenced by environmental and structural factors that extend across adjacent communities.
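One way the clustering claim could be checked, rather than judged by eye, is a global Moran's I statistic. The sketch below is a minimal base R version on toy data: four hypothetical areas in a row, where each area neighbors the next. `morans_i` is an illustrative helper, not part of the original analysis. In practice the neighbor matrix would have to be derived from the community area polygons, and significance testing is not shown.

```r
# Global Moran's I for values x and a spatial weights matrix w (illustrative helper).
morans_i <- function(x, w) {
  stopifnot(is.matrix(w), nrow(w) == length(x), ncol(w) == length(x))
  z <- x - mean(x)                       # deviations from the mean
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy adjacency: four areas in a row, each bordering the next one.
w_demo <- matrix(0, 4, 4)
w_demo[cbind(1:3, 2:4)] <- 1
w_demo <- w_demo + t(w_demo)             # make the neighbor relation symmetric

clustered <- c(10, 9, 2, 1)              # similar values sit next to each other
alternating <- c(10, 1, 10, 1)           # high and low values alternate

i_clustered <- morans_i(clustered, w_demo)
i_alternating <- morans_i(alternating, w_demo)

print(c(clustered = i_clustered, alternating = i_alternating))
```

Positive values indicate that neighboring areas tend to have similar rates, and negative values indicate that they tend to differ. In this toy case the clustered pattern gives a positive value and the alternating pattern gives -1.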
17.3
Several factors may explain why theft is more common in certain areas. Commercial zones with many stores, restaurants, and shopping centers provide more opportunities for theft. Areas with high foot traffic or public transit access may also attract offenders due to increased anonymity and potential targets. Additionally, socioeconomic conditions such as poverty, unemployment, and residential instability can influence crime levels. Environmental factors like lighting, surveillance, and police presence may also affect where theft occurs.
17.4
Two major limitations of reported crime data are:
Underreporting – Not all crimes are reported to police. Some victims may choose not to report theft, meaning official data may underestimate actual crime levels.
Reporting and recording bias – Crime data reflect police activity as well as actual crime. Areas with heavier police presence may show higher reported crime simply because more incidents are detected and recorded.
These limitations mean reported crime data do not fully capture the true extent or distribution of crime.
17.5
The choropleth map was more useful for understanding overall patterns of crime across community areas because it clearly shows differences in crime counts or rates by geographic boundaries. It makes it easier to compare neighborhoods and identify high-rate areas. However, the point map was helpful for visualizing the precise locations of incidents and spotting clustering within neighborhoods. Overall, the choropleth map was better for identifying broader spatial trends, while the point map was better for detailed location analysis.