library(readr)
Sex_Offenders <- read_csv("C:/Users/sumay/Downloads/Sex_Offenders.csv")
## Rows: 3950 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): LAST, FIRST, BLOCK, GENDER, RACE, BIRTH DATE, VICTIM MINOR
## dbl (2): HEIGHT, WEIGHT
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(Sex_Offenders)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ purrr 1.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
glimpse(Sex_Offenders)
## Rows: 3,950
## Columns: 9
## $ LAST <chr> "WILLIAMS", "CHRISTIANSEN", "SOPHER", "GASCOIGNE", "BAR…
## $ FIRST <chr> "GLYNN", "DWIGHT", "MICHAEL", "DUSTIN", "UNDERWOOD", "E…
## $ BLOCK <chr> "086XX S COLFAX AVE", "004XX N WABASH AVE", "062XX S RO…
## $ GENDER <chr> "MALE", "MALE", "MALE", "MALE", "MALE", "MALE", "MALE",…
## $ RACE <chr> "BLACK", "WHITE", "WHITE", "WHITE", "BLACK", "BLACK", "…
## $ `BIRTH DATE` <chr> "07/03/1962", "09/26/1959", "01/19/1993", "09/17/1987",…
## $ HEIGHT <dbl> 509, 600, 602, 511, 510, 506, 511, 509, 507, 509, 507, …
## $ WEIGHT <dbl> 189, 190, 182, 208, 175, 205, 260, 220, 145, 150, 211, …
## $ `VICTIM MINOR` <chr> "Y", "Y", "Y", "Y", "Y", "Y", "N", "Y", "Y", "Y", "Y", …
Sex_Offenders %>%
  count(GENDER) %>%
  ggplot(aes(x = GENDER, y = n)) +
  geom_col() +
  labs(title = "Distribution of Registered Sex Offenders by Gender",
       x = "Gender",
       y = "Count") +
  theme_minimal()
The chart shows that nearly all registrants are male and only a small
number are female. This is consistent with broader national patterns,
in which most people convicted of sexual offenses are male.
Sex_Offenders %>%
  count(RACE) %>%
  ggplot(aes(x = reorder(RACE, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(title = "Distribution of Registered Sex Offenders by Race",
       x = "Race",
       y = "Count") +
  theme_minimal()
This chart shows the distribution of registered offenders by race. We
can see that some racial groups are represented more frequently in the
registry than others. However, this chart only shows raw counts, not
population-adjusted rates, so it doesn’t tell us whether any group is
overrepresented relative to Chicago’s overall population.
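# Parse the MM/DD/YYYY birth dates, then compute age in years as of today.
# Dividing by 365.25 is an approximation that averages over leap years.
Sex_Offenders <- Sex_Offenders %>%
  mutate(birth_date = as.Date(`BIRTH DATE`, format = "%m/%d/%Y"),
         age = as.numeric(difftime(Sys.Date(), birth_date, units = "days")) / 365.25)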
summary(Sex_Offenders$age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.13 41.05 50.04 50.88 60.41 96.78
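Because the age calculation uses `Sys.Date()`, the summary changes every time the report is re-run. A minimal sketch of a reproducible alternative follows, assuming a fixed reference date. The date `2025-01-01` and the two birth dates are hypothetical placeholders, not values from the registry.

```r
library(lubridate)

# Hypothetical reference date; in practice, use the date the CSV was downloaded.
ref_date <- as.Date("2025-01-01")

# Illustrative birth dates in the registry's MM/DD/YYYY format (not real records).
birth_chr <- c("03/10/2000", "06/15/1975")
birth_date <- mdy(birth_chr)

# Calendar-aware age in years between each birth date and the reference date.
age_years <- time_length(interval(birth_date, ref_date), "years")

# Whole years completed as of the reference date.
age_whole <- floor(age_years)
age_whole
```

On the real data, the same `time_length(interval(...))` expression could replace the `difftime()` calculation inside `mutate()`.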
Sex_Offenders %>%
  ggplot(aes(x = age)) +
  geom_histogram(binwidth = 5) +
  labs(title = "Age Distribution of Registered Sex Offenders",
       x = "Age",
       y = "Frequency") +
  theme_minimal()
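This histogram shows that most registrants are middle-aged. Based on the
summary above, the middle half are between about 41 and 60 years old,
with a median of about 50. The youngest is about 22 and the oldest about
97. Note that this is each person's current age, calculated from today's
date, not their age at the time of the offense.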
Sex_Offenders %>%
  count(GENDER) %>%
  mutate(percent = n / sum(n) * 100)
## # A tibble: 2 × 3
## GENDER n percent
## <chr> <int> <dbl>
## 1 FEMALE 75 1.90
## 2 MALE 3875 98.1
Males account for approximately 98.1% of individuals in the registry, while females represent only about 1.9%. This highlights a substantial gender imbalance within the registered population and aligns with broader national patterns in sexual offense data.
Sex_Offenders %>%
  count(`VICTIM MINOR`) %>%
  ggplot(aes(x = `VICTIM MINOR`, y = n)) +
  geom_col() +
  labs(title = "Offenses Involving a Minor Victim",
       x = "Victim Was a Minor (Y/N)",
       y = "Count") +
  theme_minimal()
This chart shows that most registrants were convicted of an offense
involving a minor victim. The “Y” bar is considerably taller than the
“N” bar, so offenses against minors account for a large share of the
registry.
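The bar chart shows the difference only visually. The same count-and-percent pattern used for gender can put a number on it. The sketch below runs on a small made-up tibble (`toy`, four illustrative rows) so that it is self-contained. It does not reproduce the registry's actual proportions.

```r
library(dplyr)

# Made-up stand-in for the registry's `VICTIM MINOR` column (illustrative only).
toy <- tibble(`VICTIM MINOR` = c("Y", "Y", "Y", "N"))

# Count each category and convert the counts to percentages of all rows.
victim_share <- toy %>%
  count(`VICTIM MINOR`) %>%
  mutate(percent = n / sum(n) * 100)

victim_share
```

Replacing `toy` with `Sex_Offenders` would give the share of registrants in each category.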
names(Sex_Offenders)
## [1] "LAST" "FIRST" "BLOCK" "GENDER" "RACE"
## [6] "BIRTH DATE" "HEIGHT" "WEIGHT" "VICTIM MINOR" "birth_date"
## [11] "age"
# map_data("county") requires the maps package and returns outlines for all US counties.
chicago_map <- map_data("county")
head(chicago_map)
## long lat group order region subregion
## 1 -86.50517 32.34920 1 1 alabama autauga
## 2 -86.53382 32.35493 1 2 alabama autauga
## 3 -86.54527 32.36639 1 3 alabama autauga
## 4 -86.55673 32.37785 1 4 alabama autauga
## 5 -86.57966 32.38357 1 5 alabama autauga
## 6 -86.59111 32.37785 1 6 alabama autauga
# Keep only Cook County, Illinois. The outline is the whole county, not just the city of Chicago.
chicago_map <- chicago_map %>%
  filter(region == "illinois", subregion == "cook")
head(chicago_map)
## long lat group order region subregion
## 1 -88.20686 42.15824 576 22186 illinois cook
## 2 -87.77141 42.15824 576 22187 illinois cook
## 3 -87.74275 42.12386 576 22188 illinois cook
## 4 -87.67400 42.06656 576 22189 illinois cook
## 5 -87.66254 42.04364 576 22190 illinois cook
## 6 -87.63963 41.99781 576 22191 illinois cook
ggplot() +
  geom_polygon(data = chicago_map,
               aes(x = long, y = lat, group = group),
               fill = "gray90",
               color = "white") +
  coord_fixed(1.3) +
  labs(title = "Cook County Base Map") +
  theme_minimal()
count(Sex_Offenders, BLOCK)
## # A tibble: 2,279 × 2
## BLOCK n
## <chr> <int>
## 1 (registered as homeless) 549
## 2 0000X 0 1
## 3 0000X 165TH ST 1
## 4 0000X 28W012 GALUSHA 1
## 5 0000X ANNABLE COURT 1
## 6 0000X ANNABLE CT 1
## 7 0000X BUL 1
## 8 0000X CALLE LLINAS 1
## 9 0000X CARRER GIRONA 25 1
## 10 0000X DAWN LN 1
## # ℹ 2,269 more rows
# Count registrants per block, most common first.
block_counts <- count(Sex_Offenders, BLOCK, sort = TRUE)
head(block_counts, 10)
## # A tibble: 10 × 2
## BLOCK n
## <chr> <int>
## 1 (registered as homeless) 549
## 2 066XX S PERRY AVE 75
## 3 11XXX S STATE ST 41
## 4 12XXX S GREEN ST 33
## 5 087XX S BALTIMORE AVE 31
## 6 042XX W CARROLL AVE 27
## 7 057XX S GREEN ST 26
## 8 008XX W 122ND ST 24
## 9 003XX W 117TH ST 21
## 10 041XX W CARROLL AVE 20
top_blocks <- block_counts[1:10, ]
ggplot(top_blocks, aes(x = reorder(BLOCK, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Blocks with Highest Number of Registrants",
       x = "Block",
       y = "Number of Registrants") +
  theme_minimal()
This chart shows the 10 most common BLOCK values in the registry. The
largest category is not a location at all: 549 of the 3,950 registrants
(about 13.9%) are listed as “(registered as homeless)”. The most common
actual block, 066XX S PERRY AVE, has 75 registrants, so the homeless
category is more than seven times larger than any single block. Because
it is a housing status rather than an address, it should be read
separately from the block counts.
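One way to separate the two ideas is to report the homeless share on its own and rank only real addresses. The sketch below assumes the label is spelled exactly `(registered as homeless)`, as in the output above. It uses a made-up tibble (`toy`) with invented block names so that it runs without the CSV.

```r
library(dplyr)

# Made-up stand-in for the BLOCK column; the street names are invented.
toy <- tibble(BLOCK = c("(registered as homeless)",
                        "(registered as homeless)",
                        "001XX W EXAMPLE ST",
                        "001XX W EXAMPLE ST",
                        "002XX N SAMPLE AVE"))

homeless_label <- "(registered as homeless)"

# Percentage of all rows carrying the homeless label.
homeless_share <- mean(toy$BLOCK == homeless_label) * 100

# Top blocks among rows that have an actual address.
top_blocks_housed <- toy %>%
  filter(BLOCK != homeless_label) %>%
  count(BLOCK, sort = TRUE) %>%
  slice_head(n = 10)

homeless_share
top_blocks_housed
```

With `Sex_Offenders` in place of `toy`, `top_blocks_housed` could be passed to the same bar chart code so the axis is not stretched by the homeless category.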
ggplot(Sex_Offenders, aes(x = age, fill = `VICTIM MINOR`)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5) +
  labs(title = "Age Distribution by Victim Type",
       x = "Age",
       y = "Frequency",
       fill = "Victim Minor") +
  theme_minimal()
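This visualization compares the age distribution of registrants whose
offenses involved minors with the distribution for those whose offenses
did not. It allows us to assess whether age patterns differ by victim
type. Because the histograms show raw counts, the larger group appears
taller at most ages, so the shapes of the two distributions are more
informative than their heights.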
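A per-group summary makes the comparison concrete without depending on bar heights. The sketch below computes the group size, median age and quartiles by victim type. It uses a made-up tibble (`toy`) with invented ages, so the numbers it prints say nothing about the registry itself.

```r
library(dplyr)

# Made-up ages and victim flags (illustrative only).
toy <- tibble(age = c(30, 40, 50, 45, 55, 65),
              `VICTIM MINOR` = c("Y", "Y", "Y", "N", "N", "N"))

# Group size, median and quartiles of age for each victim type.
age_by_victim <- toy %>%
  group_by(`VICTIM MINOR`) %>%
  summarise(n = n(),
            median_age = median(age),
            q1 = unname(quantile(age, 0.25)),
            q3 = unname(quantile(age, 0.75)))

age_by_victim
```

For the plot itself, mapping `y = after_stat(density)` inside `geom_histogram()` is one option for putting groups of different sizes on a common scale.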