For this project I will explore the dataset “Sex Offenders: City of Chicago” provided by the city of Chicago state government. The dataset has 10 variables:

LAST: chr - last name of offender FIRST: chr - first name of offender BLOCK: chr - block offender lives on GENDER: chr - gender of offender RACE: chr - race of offender BIRTH.DATE: chr - birth date of offender AGE: num - age of offender HEIGHT: int - heigt of offender WEIGHT: int - weight of offender VICTIM.MINOR: chr - was the offender’s victim(s) a minor Y/N

I will explore a few questions, including:

what are the demographics of the offenders (age, race, gender)? age of the victims? location of the offenders?

City of Chicago, (2021, July 9). Sex Offenders: City of Chicago. Chicago Data Portal. https://data.cityofchicago.org/Public-Safety/Sex-Offenders/vc9r-bqvy.

#bring in data set
sexoff <- read.csv("sex-offenders.csv")
head(sexoff)
##       LAST     FIRST            BLOCK GENDER  RACE BIRTH.DATE AGE HEIGHT WEIGHT
## 1 MCGINNIS      DELL 0000X E 100TH PL   MALE BLACK 07/26/1982  37    505    141
## 2    WHITE   CHARLES 0000X E 100TH ST   MALE BLACK 05/02/1961  58    509    180
## 3    SIMON      GERA 0000X E 110TH PL   MALE BLACK 05/21/1952  67    504    110
## 4     WARD   RICHARD 0000X E 110TH PL   MALE BLACK 07/23/1949  70    506    190
## 5  WORTHON SEBASTIAN 0000X E 119TH PL   MALE BLACK 10/22/1982  37    600    180
## 6    SCOTT   MICHAEL  0000X E 91ST ST   MALE BLACK 05/22/1957  62    506    170
##   VICTIM.MINOR
## 1            Y
## 2            Y
## 3            Y
## 4            Y
## 5            Y
## 6            N
## bring in a few libraries
library(stats, dplyr, tidyr)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)

The data is clean enough to make simple graphs regarding demographics of the offender and age of victims.

## portion of offenders by race
race_table <- table(sexoff$RACE)
prop.table(race_table)
## 
## ASIAN/PACIFIC ISLANDER                  BLACK         BLACK HISPANIC 
##            0.014814815            0.612037037            0.003703704 
##                UNKNOWN                  WHITE         WHITE HISPANIC 
##            0.001851852            0.179629630            0.187962963
## creating a count_race column so I can order the bars
count_race <- sexoff %>%
  group_by(RACE) %>%
  mutate(count_race = n())

##bar graph
## removing the legend for space
ggplot(count_race, aes(x = reorder(RACE, count_race), fill = RACE)) +
  geom_bar(stat="count") +
  scale_x_discrete(guide = guide_axis(n.dodge=2)) +
  xlab("RACE")+
  theme(legend.position = "none") +
  ggtitle("Number of Offenders by Race")

The majority of sex offenders in Chicago are Black, followed by White Hispanic, White, Asian/Pacific Islander, Black Hispanic, and unknown. I’m going to change

## portion of offenders that are male
genders_table <- table(sexoff$GENDER)
prop.table(genders_table)
## 
##     FEMALE       MALE 
## 0.01851852 0.98148148
## bar graph for number of offenders by gender
ggplot(sexoff) +
  stat_count(mapping = aes(x = GENDER, fill = GENDER)) + 
  scale_fill_manual(values=c("#56B4E9", "#E69F00")) +
  xlab(NULL) +
  ggtitle("Number of Offenders by Gender")

Sex offenders are predominantly male.

## portion of victims that are minors
minors <- table(sexoff$VICTIM.MINOR)
prop.table(minors)
## 
##         N         Y 
## 0.2833333 0.7166667
##plotting number of minor victims with bar graph, and using different colors
ggplot(sexoff) +
  stat_count(mapping = aes(x = VICTIM.MINOR, fill = VICTIM.MINOR)) +
  scale_fill_manual(values=c("#9977FF", "#CCFFEE")) +
  xlab("VICTIM IS A MINOR?")+
  theme(legend.title = element_blank()) + 
  ggtitle("Number of Minor Victims")

Victim’s are primarily minors.

library(tibble)
sex_off <- as_tibble(sexoff)
##boxplot to show the median age, removing the legend, changing the color
ggplot(sex_off, aes(x = RACE, y = AGE, fill = RACE)) + 
  geom_boxplot() +
  scale_x_discrete(guide = guide_axis(n.dodge=2)) +
  theme(legend.position="none") +
  ggtitle("Median Age of Offender by Race") +
  scale_fill_brewer(type = "seq",palette = 10,direction = 1, aesthetics = "fill")

Median age of offenders appear to be between 45 and 62 when broken down by race. Looks like there are two outliers. I’ll explore if there are outliers for all of the ages combined as one group.

## I will explore the age variable to do a outlier analysis
summary(sexoff)
##      LAST              FIRST              BLOCK              GENDER         
##  Length:1080        Length:1080        Length:1080        Length:1080       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      RACE            BIRTH.DATE             AGE            HEIGHT     
##  Length:1080        Length:1080        Min.   :19.00   Min.   : 60.0  
##  Class :character   Class :character   1st Qu.:39.00   1st Qu.:507.0  
##  Mode  :character   Mode  :character   Median :49.00   Median :509.0  
##                                        Mean   :49.36   Mean   :531.9  
##                                        3rd Qu.:58.00   3rd Qu.:600.0  
##                                        Max.   :96.00   Max.   :608.0  
##      WEIGHT      VICTIM.MINOR      
##  Min.   :100.0   Length:1080       
##  1st Qu.:160.0   Class :character  
##  Median :185.0   Mode  :character  
##  Mean   :190.2                     
##  3rd Qu.:210.0                     
##  Max.   :607.0

Mediand age for all sex offenders is 49.

## looking at max and min ages
min(sexoff$AGE)
## [1] 19
max(sexoff$AGE)
## [1] 96
#histogram of all ages
hist(sexoff$AGE,
     xlab = "Age",
     main = "Histogram of Age")

So the outlier for White Hispanic is an outlier for that group only, but not for all ages. The outlier for black however, is still an outlier for all ages.

Going to filter to for ages above 80 to pinpoint the two outliers.

## using a fitler to look at the top ages since my histogram and box plot show these as potential outliers
age_outlier <- sexoff %>%
filter(sexoff$AGE > 80)

age_outlier
##       LAST   FIRST                 BLOCK GENDER           RACE BIRTH.DATE AGE
## 1     CORY HARDING       015XX W 68TH ST   MALE          BLACK 09/04/1923  96
## 2 WILLIAMS  DONALD 029XX W BRYN MAWR AVE   MALE          WHITE 03/08/1935  84
## 3   MIRTIA  ELIGHA 041XX S WENTWORTH AVE   MALE          BLACK 03/31/1938  81
## 4    MOODY OSEVELL 060XX S WASHTENAW AVE   MALE          BLACK 04/17/1935  84
## 5   CUEVAS    OMAR 061XX W THORNDALE AVE   MALE WHITE HISPANIC 10/06/1935  84
## 6    SHIPP     SAM        086XX S MAY ST   MALE          BLACK 04/19/1937  82
## 7   BRYANT    MARY   099XX S ABERDEEN ST FEMALE          BLACK 09/21/1938  81
##   HEIGHT WEIGHT VICTIM.MINOR
## 1    508    147            Y
## 2    507    225            Y
## 3    508    160            Y
## 4    507    160            Y
## 5    600    154            Y
## 6    601    235            Y
## 7    505    155            Y
## finding the percentiles
lowerage <- quantile(sexoff$AGE, 0.01)
lowerage
## 1% 
## 27
upperage <- quantile(sexoff$AGE, 0.99)
upperage
## 99% 
##  79

According to the quantile function using interval between 1% and 99% all ages before 27 and above 79 can be considered outliers. Hence almost all registered sex offenders living in Chicago are between the age of 27 and 79.

Exploring if there is a connection between race of offender and whether the victim is a minor.

##creating a new df
sexoff2 <- sexoff%>%
  group_by(RACE, VICTIM.MINOR) %>%
  summarize(MINOR_COUNT=n())
## `summarise()` has grouped output by 'RACE'. You can override using the `.groups` argument.
head(sexoff2)
## # A tibble: 6 x 3
## # Groups:   RACE [3]
##   RACE                   VICTIM.MINOR MINOR_COUNT
##   <chr>                  <chr>              <int>
## 1 ASIAN/PACIFIC ISLANDER N                      1
## 2 ASIAN/PACIFIC ISLANDER Y                     15
## 3 BLACK                  N                    240
## 4 BLACK                  Y                    421
## 5 BLACK HISPANIC         N                      1
## 6 BLACK HISPANIC         Y                      3
## I am trying to use jcolors package without much success so far
devtools::install_github("jaredhuling/jcolors")
## Skipping install of 'jcolors' from a github remote, the SHA1 (f2007550) has not changed since last install.
##   Use `force = TRUE` to force installation
library(jcolors)
jcolors('default')
##        kelly_green rich_electric_blue        maximum_red     majorelle_blue 
##          "#29BF12"          "#00A5CF"          "#DE1A1A"          "#574AE2" 
## fluorescent_orange 
##          "#FFBF00"
ggplot(sexoff2, aes(x=RACE, y = MINOR_COUNT, fill = VICTIM.MINOR)) +
  geom_bar(stat='identity', position = "dodge") +
  scale_x_discrete(guide = guide_axis(n.dodge=2)) +
  scale_y_continuous(trans = 'log2')

This chart shows a higher nubmer of victims were minors when the offender was black. There are high instances for this as well with White and White Hispanice offendors.

## re-reading data in and chaning certain categorical variables to factors.
sexoff3 <- within(read.csv("sex-offenders.csv"), {
  RACE <- as.factor(RACE)
  GENDER <- as.factor(GENDER)
  VICTIM.MINOR <- as.factor(VICTIM.MINOR)
})
##creating table that shows if victim was a minor grouped by the race of offender
table(sexoff3$RACE, sexoff3$VICTIM.MINOR)
##                         
##                            N   Y
##   ASIAN/PACIFIC ISLANDER   1  15
##   BLACK                  240 421
##   BLACK HISPANIC           1   3
##   UNKNOWN                  0   2
##   WHITE                   31 163
##   WHITE HISPANIC          33 170
## chi test for minor and race relationship.
chisq.test(sexoff3$RACE, sexoff3$VICTIM.MINOR, correct = FALSE)
## Warning in chisq.test(sexoff3$RACE, sexoff3$VICTIM.MINOR, correct = FALSE): Chi-
## squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  sexoff3$RACE and sexoff3$VICTIM.MINOR
## X-squared = 54.524, df = 5, p-value = 1.636e-10

From this data set and the low p-value, there is a relationship between whether the victim is a minor and the race of the offender.

## chi test for minor and age relationship.
chisq.test(sexoff3$AGE, sexoff3$VICTIM.MINOR, correct = FALSE)
## Warning in chisq.test(sexoff3$AGE, sexoff3$VICTIM.MINOR, correct = FALSE): Chi-
## squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  sexoff3$AGE and sexoff3$VICTIM.MINOR
## X-squared = 76.102, df = 60, p-value = 0.07847

Given the higher than 0.05 p-value, there is most likely not a relationship between age of the offender and whether the victim is a minor.

Exploring geographic data.

## breaking up block into four variables so I can isolate quadrant
sexoff4 <- sexoff %>% separate(BLOCK, c("HouseNumber", "Quadrant", "Street", "StreetType"), " ", extra = "merge")
##grouping by quandrant
sexoff4 %>%
  group_by(Quadrant) %>%
  summarize(quandrant_count = n())
## # A tibble: 4 x 2
##   Quadrant quandrant_count
##   <chr>              <int>
## 1 E                     71
## 2 N                    179
## 3 S                    538
## 4 W                    292
quadrant_count2 <- table(sexoff4$Quadrant)
prop.table(quadrant_count2)
## 
##          E          N          S          W 
## 0.06574074 0.16574074 0.49814815 0.27037037

There are a lot more sex offenders living in southside Chicago. I don’t actually know if ‘S’ means southside of Chicago or if its just a street naming convention that does not imply quandrant so this data may not tell us much.

Exploring if there are multiple sex offenders living on the same block.

##grouping by BLOCK
sexoff5 <- sexoff3 %>%
  group_by(BLOCK) %>%
  summarize(block_count = n())

sexoff5
## # A tibble: 902 x 2
##    BLOCK             block_count
##    <chr>                   <int>
##  1 0000X E 100TH PL            1
##  2 0000X E 100TH ST            1
##  3 0000X E 110TH PL            2
##  4 0000X E 119TH PL            1
##  5 0000X E 91ST ST             1
##  6 0000X N HOYNE AVE           1
##  7 0000X W 110TH ST            1
##  8 0000X W 112TH PL            3
##  9 0000X W 112TH ST            1
## 10 0000X W 113TH PL            1
## # … with 892 more rows
## summarizing by blocks to show how many sex offenders live at each one.
sexoff5 %>%
  group_by(block_count) %>%
  summarize(BLOCK = n())
## # A tibble: 8 x 2
##   block_count BLOCK
##         <int> <int>
## 1           1   793
## 2           2    75
## 3           3    16
## 4           4     8
## 5           5     6
## 6           6     2
## 7           7     1
## 8           8     1

I’m now going to create a treemap to show which blocks have more than two sex offenders living on it. ’m filtering out two and below to make it more managable. Treemap seems the most appropriate since I do not have time data, other than date of birth.

##without a filter I'd have too many blocks for my treemap so reducing to 2 or more sex offenders.
sexoff5 %>%
  filter(block_count > 1)
## # A tibble: 109 x 2
##    BLOCK                block_count
##    <chr>                      <int>
##  1 0000X E 110TH PL               2
##  2 0000X W 112TH PL               3
##  3 003XX E 120TH PL               3
##  4 004XX S CLARK ST               2
##  5 006XX N DRAKE AVE              2
##  6 007XX W GRAND AVE              2
##  7 008XX W 51ST PL                4
##  8 010XX W BALMORAL AVE           2
##  9 012XX E 71ST PL                2
## 10 012XX E 79TH ST                5
## # … with 99 more rows
##bringing in treemap library
library(treemap)
##creating treemap with which uses block_count for size and color.
treemap(sexoff5, index="BLOCK", vSize="block_count", 
              vColor="block_count", type="value",
              title = "Chicago Registered Sex Offenders",
              title.legend = "Number of Registered Sex Offenders By Block",
              palette="PuOr")

Not very useful looking map and I don’t like the color. I’ll filter out count of 2 and pick a different color.

##bringin viridis colors
library(viridis)
## Loading required package: viridisLite
##filtering out counts of 2
sexoff6 <- sexoff5 %>%
  filter(block_count > 2)

##creating treemap
treemap(sexoff6, index="BLOCK", vSize="block_count", 
              vColor="block_count", type="value",
              title = "Chicago Registered Sex Offenders",
              title.legend = "Number of Registered Sex Offenders",
              palette="Set1")

This is better, but I cannot get the color viridis right and I want more control over the text format so switching to geom_treemap using treemapify library.

##bring in treemapify library
library(treemapify)

##create geom_treemap.  Making the font white, border black, left justifying font, and other font formats.
ggplot(sexoff6, aes(area=block_count, fill = block_count, label = BLOCK)) +
  geom_treemap(colour="black") +
  geom_treemap_text(colour = "white", place = "left", size = 1, grow = TRUE, reflow = TRUE) +
  scale_fill_viridis_c() +
  labs(title="Chicago Registered Sex Offenders by Block", fill="# of Offenders")

So now we have a treemap which shows the blocks in Chicago that have more than two sex offenders living there. This could be useful for a parent who is moving and wants to move to a block with two or less offenders. Conversely, sex offenders who are having trouble finding a place to rent or buy due to their criminal history could also use this map to find potential blocks more receptive to a sex offender moving to that location.

Brief Essay:

One reason I was drawn to this dataset is due to my line of work in homeless services. I work for a city government agency similar in size to Chicago, and one unique challenge we face is housing homeless clients whom are registered sex offenders. A crucial tenet one should have when working in this industry is that everyone deserves a home, even registered sex offenders. In fact, many social workers would attest that the best way to avoid a homeless sex offender is to stabilize their housing. Individuals who are unhoused are much more likely to not take medication or receive counseling, both of which are key factors in reducing instances of relapse among the sex offender community, hence, a housed sex offender is less likely to re-offend. The crux of the challenge lies with finding both a landlord and a neighborhood that will rent a unit to a sex offender, as most are understandably reluctant to do so.

Given that at least 21 states and 400 municipalities have laws restricting where a sex offenders can live, where should they go? (Sandra Norman-Eady) To ensure lower rates of recidivism you want to both keep sex offenders away from potential targets, but also not isolate them to the point of homelessness or retreating to an underground status which makes them difficult to track.

The above treemap represents blocks in Chicago that contain at least three sex offenders as residences. Keeping track of where sex offenders are being allowed to rent or buy in such a way could direct sex offenders to stabilize housing and provide the rest of society needed information that will help inform where they wish to move, or not move, their families to.

After completing this treemap there are two areas that warrant further exploration if more time were available. First, I could not create an interactivity layer for this treemap. Ideally, a tooltip would allow for more blocks to be show on the treemap that would leave little room for text that a tool tip or hoover widget could compensate for. Howver, I have included a link to a tableu treemap visualization I created that does include tooltips (https://public.tableau.com/app/profile/kris.sutton/viz/KSProject2ChicagoSexOffenders/Treemap). Second, the data contained some location information that would lend itself to a GIS map if I were able to find the needed GIS files that could accommodate its creation. Also, next time I would include historical data to explore any associated trends.

Despite the heavy material, I had a lot of fun creating these visualizations and I am happy with the results.

Sandra Norman-Eady, C. A. (n.d.). SEX OFFENDERS’ RESIDENCY RESTRICTIONS. https://www.cga.ct.gov/2007/rpt/2007-r-0380.htm.