library(readr)
Sex_Offenders <- read_csv("C:/Users/sumay/Downloads/Sex_Offenders.csv")
## Rows: 3950 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): LAST, FIRST, BLOCK, GENDER, RACE, BIRTH DATE, VICTIM MINOR
## dbl (2): HEIGHT, WEIGHT
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(Sex_Offenders) 
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ purrr     1.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
glimpse(Sex_Offenders)
## Rows: 3,950
## Columns: 9
## $ LAST           <chr> "WILLIAMS", "CHRISTIANSEN", "SOPHER", "GASCOIGNE", "BAR…
## $ FIRST          <chr> "GLYNN", "DWIGHT", "MICHAEL", "DUSTIN", "UNDERWOOD", "E…
## $ BLOCK          <chr> "086XX S COLFAX AVE", "004XX N WABASH AVE", "062XX S RO…
## $ GENDER         <chr> "MALE", "MALE", "MALE", "MALE", "MALE", "MALE", "MALE",…
## $ RACE           <chr> "BLACK", "WHITE", "WHITE", "WHITE", "BLACK", "BLACK", "…
## $ `BIRTH DATE`   <chr> "07/03/1962", "09/26/1959", "01/19/1993", "09/17/1987",…
## $ HEIGHT         <dbl> 509, 600, 602, 511, 510, 506, 511, 509, 507, 509, 507, …
## $ WEIGHT         <dbl> 189, 190, 182, 208, 175, 205, 260, 220, 145, 150, 211, …
## $ `VICTIM MINOR` <chr> "Y", "Y", "Y", "Y", "Y", "Y", "N", "Y", "Y", "Y", "Y", …
Sex_Offenders %>%
  count(GENDER) %>%
  ggplot(aes(x = GENDER, y = n)) +
  geom_col() +
  labs(title = "Distribution of Registered Sex Offenders by Gender",
       x = "Gender",
       y = "Count") +
  theme_minimal() 

This chart shows that the registry is overwhelmingly male, with very few female registrants. This is consistent with broader national crime patterns, in which most sexual offenses are committed by males.

Sex_Offenders %>%
  count(RACE) %>%
  ggplot(aes(x = reorder(RACE, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(title = "Distribution of Registered Sex Offenders by Race",
       x = "Race",
       y = "Count") +
  theme_minimal() 

This chart shows the distribution of registered offenders by race. We can see that some racial groups are represented more frequently in the registry than others. However, this chart only shows raw counts, not population-adjusted rates, so it doesn’t tell us whether any group is overrepresented relative to Chicago’s overall population.
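To move from raw counts to population-adjusted rates, each group's count would need to be divided by that group's population. The following is a minimal sketch of that step, assuming a population table is available from some outside source. The helper name `rate_per_100k`, the `population` column, the group labels, and every number below are made-up placeholders for illustration, not real registry or Chicago figures.

```r
library(dplyr)

# Hypothetical helper: registrants per 100,000 residents.
rate_per_100k <- function(n, population) {
  n / population * 100000
}

# Placeholder inputs for illustration only (NOT real counts or populations).
race_counts <- tibble(RACE = c("GROUP A", "GROUP B"), n = c(200, 50))
race_population <- tibble(RACE = c("GROUP A", "GROUP B"),
                          population = c(1000000, 500000))

# Join counts to populations, then convert each count to a rate.
race_rates <- race_counts %>%
  left_join(race_population, by = "RACE") %>%
  mutate(rate = rate_per_100k(n, population)) %>%
  arrange(desc(rate))

race_rates
```

With the real data, `race_counts` would come from `count(Sex_Offenders, RACE)`, and the `RACE` labels in the population table would have to match the registry's categories exactly for the join to work.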

Sex_Offenders <- Sex_Offenders %>%
  mutate(birth_date = as.Date(`BIRTH DATE`, format = "%m/%d/%Y"),
         # Age in years as of the day the code is run, so the summary below changes on re-runs
         age = as.numeric(difftime(Sys.Date(), birth_date, units = "days")) / 365.25) 
summary(Sex_Offenders$age) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22.13   41.05   50.04   50.88   60.41   96.78
Sex_Offenders %>%
  ggplot(aes(x = age)) +
  geom_histogram(binwidth = 5) +
  labs(title = "Age Distribution of Registered Sex Offenders",
       x = "Age",
       y = "Frequency") +
  theme_minimal() 

This histogram shows that most registered offenders are middle-aged. The median age is about 50, and the middle half of registrants fall between roughly 41 and 60 years old. There are relatively few younger or very elderly individuals in the registry; ages range from about 22 to 97.
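The ages above are computed as elapsed days divided by 365.25, measured from `Sys.Date()`, so they are fractional and shift each time the report is re-run. One alternative is to count completed years as of a fixed reference date. This is a minimal base R sketch; `age_in_years` is a hypothetical helper and the reference date is an arbitrary assumption, while the two birth dates are the first two values shown in the `glimpse()` output.

```r
# Hypothetical helper: completed years between birth_date and a fixed as_of date.
age_in_years <- function(birth_date, as_of) {
  b <- as.POSIXlt(birth_date)
  a <- as.POSIXlt(as_of)
  years <- a$year - b$year
  # Subtract one year when the birthday has not yet occurred in the as_of year.
  before_birthday <- (a$mon < b$mon) | (a$mon == b$mon & a$mday < b$mday)
  years - as.integer(before_birthday)
}

births <- as.Date(c("07/03/1962", "09/26/1959"), format = "%m/%d/%Y")
age_in_years(births, as.Date("2025-01-01"))  # 62 65
```

Fixing the reference date makes the summary statistics reproducible, at the cost of the ages no longer being current on the day the report is read.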

Sex_Offenders %>%
  count(GENDER) %>%
  mutate(percent = n / sum(n) * 100) 
## # A tibble: 2 × 3
##   GENDER     n percent
##   <chr>  <int>   <dbl>
## 1 FEMALE    75    1.90
## 2 MALE    3875   98.1

Males account for approximately 98.1% of individuals in the registry, while females represent only about 1.9%. This highlights a substantial gender imbalance within the registered population and aligns with broader national patterns in sexual offense data.

Sex_Offenders %>%
  count(`VICTIM MINOR`) %>%
  ggplot(aes(x = `VICTIM MINOR`, y = n)) +
  geom_col() +
  labs(title = "Offenses Involving a Minor Victim",
       x = "Victim Was a Minor (Y/N)",
       y = "Count") +
  theme_minimal() 

This chart shows that a majority of registered offenses involved minor victims. The “Y” bar is substantially taller than the “N” bar, indicating that most individuals in the registry were convicted of offenses involving minors.
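The size of that majority can be stated as a percentage using the same `count()` and `mutate()` pattern applied to gender earlier. The sketch below runs on a four-row toy tibble so that it is self-contained; the toy values are placeholders, not registry data.

```r
library(dplyr)

# Toy stand-in for the registry (placeholder values only).
toy <- tibble(`VICTIM MINOR` = c("Y", "Y", "Y", "N"))

# Count each category and express it as a share of all rows.
victim_share <- toy %>%
  count(`VICTIM MINOR`) %>%
  mutate(percent = n / sum(n) * 100)

victim_share
```

Replacing `toy` with `Sex_Offenders` would give the actual shares for the registry.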

names(Sex_Offenders) 
##  [1] "LAST"         "FIRST"        "BLOCK"        "GENDER"       "RACE"        
##  [6] "BIRTH DATE"   "HEIGHT"       "WEIGHT"       "VICTIM MINOR" "birth_date"  
## [11] "age"
chicago_map <- map_data("county") 
head(chicago_map) 
##        long      lat group order  region subregion
## 1 -86.50517 32.34920     1     1 alabama   autauga
## 2 -86.53382 32.35493     1     2 alabama   autauga
## 3 -86.54527 32.36639     1     3 alabama   autauga
## 4 -86.55673 32.37785     1     4 alabama   autauga
## 5 -86.57966 32.38357     1     5 alabama   autauga
## 6 -86.59111 32.37785     1     6 alabama   autauga
chicago_map <- chicago_map %>%
  filter(region == "illinois", subregion == "cook") 
head(chicago_map) 
##        long      lat group order   region subregion
## 1 -88.20686 42.15824   576 22186 illinois      cook
## 2 -87.77141 42.15824   576 22187 illinois      cook
## 3 -87.74275 42.12386   576 22188 illinois      cook
## 4 -87.67400 42.06656   576 22189 illinois      cook
## 5 -87.66254 42.04364   576 22190 illinois      cook
## 6 -87.63963 41.99781   576 22191 illinois      cook
ggplot() +
  geom_polygon(data = chicago_map,
               aes(x = long, y = lat, group = group),
               fill = "gray90",
               color = "white") +
  coord_fixed(1.3) +
  labs(title = "Cook County Base Map") +
  theme_minimal() 

count(Sex_Offenders, BLOCK) 
## # A tibble: 2,279 × 2
##    BLOCK                        n
##    <chr>                    <int>
##  1 (registered as homeless)   549
##  2 0000X  0                     1
##  3 0000X  165TH ST              1
##  4 0000X  28W012 GALUSHA        1
##  5 0000X  ANNABLE COURT         1
##  6 0000X  ANNABLE CT            1
##  7 0000X  BUL                   1
##  8 0000X  CALLE LLINAS          1
##  9 0000X  CARRER GIRONA 25      1
## 10 0000X  DAWN LN               1
## # ℹ 2,269 more rows
block_counts <- count(Sex_Offenders, BLOCK, sort = TRUE) 
head(block_counts, 10) 
## # A tibble: 10 × 2
##    BLOCK                        n
##    <chr>                    <int>
##  1 (registered as homeless)   549
##  2 066XX S PERRY AVE           75
##  3 11XXX S STATE ST            41
##  4 12XXX S GREEN ST            33
##  5 087XX S BALTIMORE AVE       31
##  6 042XX W CARROLL AVE         27
##  7 057XX S GREEN ST            26
##  8 008XX W 122ND ST            24
##  9 003XX W 117TH ST            21
## 10 041XX W CARROLL AVE         20
top_blocks <- block_counts[1:10, ] 
ggplot(top_blocks, aes(x = reorder(BLOCK, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Blocks with Highest Number of Registrants",
       x = "Block",
       y = "Number of Registrants") +
  theme_minimal() 

This chart shows the 10 most common values of BLOCK. The largest category is not a block at all: 549 registrants, about 14% of the 3,950 in the registry, are listed as “registered as homeless.” The highest count for an actual residential block is 75 (066XX S PERRY AVE), so the homeless category is more than seven times larger than any single block. This points to substantial housing instability among registrants.
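Because the homeless category dwarfs every real block, it can help to rank residential blocks separately. This is a minimal sketch that filters out that one label before counting; the toy tibble reuses three `BLOCK` labels from the output above, but its row counts are placeholders.

```r
library(dplyr)

# Toy stand-in for the registry (labels from the output above, counts are placeholders).
toy <- tibble(BLOCK = c("(registered as homeless)", "(registered as homeless)",
                        "(registered as homeless)", "066XX S PERRY AVE",
                        "066XX S PERRY AVE", "11XXX S STATE ST"))

# Drop the non-address category, then keep the 10 most frequent blocks.
top_residential <- toy %>%
  filter(BLOCK != "(registered as homeless)") %>%
  count(BLOCK, sort = TRUE) %>%
  slice_head(n = 10)

top_residential
```

Applied to `Sex_Offenders`, the result could be passed to the same bar chart code, giving an axis scaled to the residential blocks rather than to the 549 homeless registrations.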

ggplot(Sex_Offenders, aes(x = age, fill = `VICTIM MINOR`)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5) +
  labs(title = "Age Distribution by Victim Type",
       x = "Age",
       y = "Frequency",
       fill = "Victim Minor") +
  theme_minimal() 

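This visualization overlays the age distributions of registrants whose offenses involved a minor victim and those whose offenses did not, which allows us to assess whether age patterns differ by victim type. Because the histogram plots raw counts and the “Y” group is the larger of the two, the comparison is best made on the shape of each distribution rather than on bar height.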
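One way to compare shapes when the groups differ in size is to plot densities instead of counts, so that each group's bars cover a total area of 1. The sketch below uses `after_stat(density)` for this. It runs on simulated data: the `toy` data frame, its `victim_minor` column, and the distribution parameters are assumptions made for illustration and say nothing about the real registry.

```r
library(ggplot2)

# Simulated placeholder data (NOT registry data): a larger "Y" group and a smaller "N" group.
set.seed(1)
toy <- data.frame(
  age = c(rnorm(300, mean = 50, sd = 12), rnorm(100, mean = 48, sd = 12)),
  victim_minor = rep(c("Y", "N"), times = c(300, 100))
)

# after_stat(density) rescales each group's bars so they cover a total area of 1.
p <- ggplot(toy, aes(x = age, fill = victim_minor)) +
  geom_histogram(aes(y = after_stat(density)),
                 position = "identity", alpha = 0.5, binwidth = 5) +
  labs(title = "Age Distribution by Victim Type (Density)",
       x = "Age",
       y = "Density",
       fill = "Victim Minor") +
  theme_minimal()

p
```

With the real data, the same layer would apply to `ggplot(Sex_Offenders, aes(x = age, fill = `VICTIM MINOR`))`. Faceting by victim type with free y scales is another option if overlapping bars are hard to read.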