Hate Crimes Dataset

This dataset, provided by Corey, counts hate crimes in New York counties by type of hate crime from 2010 to 2016.

library(tidyverse)
## ── Attaching packages ────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
#install.packages("googledrive")
library(googledrive)

Get hateCrimes2010.csv from DATASETS on Google Drive

Use the shareable link to hateCrimes2010.csv to download a copy to the working directory.

# drive_download runs but won't knit
#drive_download(as_id("https://drive.google.com/open?id=1iT6SxAm_gI2srBZuwJK-mEWp7kHCaLdt"),overwrite = TRUE)
hatecrimes<-read_csv("hateCrimes2010.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   County = col_character(),
##   `Crime Type` = col_character()
## )
## See spec(...) for full column specifications.

Clean up the Data: make all headers lowercase and remove spaces

After cleaning up the variable names, look at the structure of the data. Since there are 44 variables in this dataset, you can use “summary” to decide which hate crimes to focus on. In the output of “summary”, look at the min/max values. Some have a max value of only 1.

names(hatecrimes) <- tolower(names(hatecrimes))
names(hatecrimes) <- gsub(" ","",names(hatecrimes))
str(hatecrimes)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 423 obs. of  44 variables:
##  $ county                                  : chr  "Albany" "Albany" "Allegany" "Bronx" ...
##  $ year                                    : num  2016 2016 2016 2016 2016 ...
##  $ crimetype                               : chr  "Crimes Against Persons" "Property Crimes" "Property Crimes" "Crimes Against Persons" ...
##  $ anti-male                               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-female                             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-transgender                        : num  0 0 0 4 0 0 0 0 0 0 ...
##  $ anti-genderidentityexpression           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-age*                               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-white                              : num  0 0 0 1 1 0 0 0 0 0 ...
##  $ anti-black                              : num  1 2 1 0 0 1 0 1 0 2 ...
##  $ anti-americanindian/alaskannative       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-asian                              : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ anti-nativehawaiian/pacificislander     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-multi-racialgroups                 : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-otherrace                          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-jewish                             : num  0 0 0 0 1 0 1 0 0 0 ...
##  $ anti-catholic                           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-protestant                         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-islamic(muslim)                    : num  1 0 0 6 0 0 0 0 1 0 ...
##  $ anti-multi-religiousgroups              : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ anti-atheism/agnosticism                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-religiouspracticegenerally         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-otherreligion                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-buddhist                           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-easternorthodox(greek,russian,etc.): num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-hindu                              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-jehovahswitness                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-mormon                             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-otherchristian                     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-sikh                               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-hispanic                           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-arab                               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-otherethnicity/nationalorigin      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-non-hispanic*                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-gaymale                            : num  1 0 0 8 0 1 0 0 0 0 ...
##  $ anti-gayfemale                          : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ anti-gay(maleandfemale)                 : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-heterosexual                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-bisexual                           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-physicaldisability                 : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-mentaldisability                   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ totalincidents                          : num  3 3 1 20 2 3 1 1 1 2 ...
##  $ totalvictims                            : num  4 3 1 20 2 3 1 1 1 2 ...
##  $ totaloffenders                          : num  3 3 1 25 2 3 1 1 1 2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   County = col_character(),
##   ..   Year = col_double(),
##   ..   `Crime Type` = col_character(),
##   ..   `Anti-Male` = col_double(),
##   ..   `Anti-Female` = col_double(),
##   ..   `Anti-Transgender` = col_double(),
##   ..   `Anti-Gender Identity Expression` = col_double(),
##   ..   `Anti-Age*` = col_double(),
##   ..   `Anti-White` = col_double(),
##   ..   `Anti-Black` = col_double(),
##   ..   `Anti-American Indian/Alaskan Native` = col_double(),
##   ..   `Anti-Asian` = col_double(),
##   ..   `Anti-Native Hawaiian/Pacific Islander` = col_double(),
##   ..   `Anti-Multi-Racial Groups` = col_double(),
##   ..   `Anti-Other Race` = col_double(),
##   ..   `Anti-Jewish` = col_double(),
##   ..   `Anti-Catholic` = col_double(),
##   ..   `Anti-Protestant` = col_double(),
##   ..   `Anti-Islamic (Muslim)` = col_double(),
##   ..   `Anti-Multi-Religious Groups` = col_double(),
##   ..   `Anti-Atheism/Agnosticism` = col_double(),
##   ..   `Anti-Religious Practice Generally` = col_double(),
##   ..   `Anti-Other Religion` = col_double(),
##   ..   `Anti-Buddhist` = col_double(),
##   ..   `Anti-Eastern Orthodox (Greek, Russian, etc.)` = col_double(),
##   ..   `Anti-Hindu` = col_double(),
##   ..   `Anti-Jehovahs Witness` = col_double(),
##   ..   `Anti-Mormon` = col_double(),
##   ..   `Anti-Other Christian` = col_double(),
##   ..   `Anti-Sikh` = col_double(),
##   ..   `Anti-Hispanic` = col_double(),
##   ..   `Anti-Arab` = col_double(),
##   ..   `Anti-Other Ethnicity/National Origin` = col_double(),
##   ..   `Anti-Non-Hispanic*` = col_double(),
##   ..   `Anti-Gay Male` = col_double(),
##   ..   `Anti-Gay Female` = col_double(),
##   ..   `Anti-Gay (Male and Female)` = col_double(),
##   ..   `Anti-Heterosexual` = col_double(),
##   ..   `Anti-Bisexual` = col_double(),
##   ..   `Anti-Physical Disability` = col_double(),
##   ..   `Anti-Mental Disability` = col_double(),
##   ..   `Total Incidents` = col_double(),
##   ..   `Total Victims` = col_double(),
##   ..   `Total Offenders` = col_double()
##   .. )
summary(hatecrimes)
##     county               year       crimetype           anti-male       
##  Length:423         Min.   :2010   Length:423         Min.   :0.000000  
##  Class :character   1st Qu.:2011   Class :character   1st Qu.:0.000000  
##  Mode  :character   Median :2013   Mode  :character   Median :0.000000  
##                     Mean   :2013                      Mean   :0.007092  
##                     3rd Qu.:2015                      3rd Qu.:0.000000  
##                     Max.   :2016                      Max.   :1.000000  
##   anti-female      anti-transgender  anti-genderidentityexpression
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000              
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000              
##  Median :0.00000   Median :0.00000   Median :0.00000              
##  Mean   :0.01655   Mean   :0.04728   Mean   :0.05674              
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000              
##  Max.   :1.00000   Max.   :5.00000   Max.   :3.00000              
##    anti-age*         anti-white        anti-black    
##  Min.   :0.00000   Min.   : 0.0000   Min.   : 0.000  
##  1st Qu.:0.00000   1st Qu.: 0.0000   1st Qu.: 0.000  
##  Median :0.00000   Median : 0.0000   Median : 1.000  
##  Mean   :0.05201   Mean   : 0.3357   Mean   : 1.761  
##  3rd Qu.:0.00000   3rd Qu.: 0.0000   3rd Qu.: 2.000  
##  Max.   :9.00000   Max.   :11.0000   Max.   :18.000  
##  anti-americanindian/alaskannative   anti-asian    
##  Min.   :0.000000                  Min.   :0.0000  
##  1st Qu.:0.000000                  1st Qu.:0.0000  
##  Median :0.000000                  Median :0.0000  
##  Mean   :0.007092                  Mean   :0.1773  
##  3rd Qu.:0.000000                  3rd Qu.:0.0000  
##  Max.   :1.000000                  Max.   :8.0000  
##  anti-nativehawaiian/pacificislander anti-multi-racialgroups
##  Min.   :0                           Min.   :0.00000        
##  1st Qu.:0                           1st Qu.:0.00000        
##  Median :0                           Median :0.00000        
##  Mean   :0                           Mean   :0.08511        
##  3rd Qu.:0                           3rd Qu.:0.00000        
##  Max.   :0                           Max.   :3.00000        
##  anti-otherrace  anti-jewish     anti-catholic     anti-protestant  
##  Min.   :0      Min.   : 0.000   Min.   : 0.0000   Min.   :0.00000  
##  1st Qu.:0      1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.:0.00000  
##  Median :0      Median : 0.000   Median : 0.0000   Median :0.00000  
##  Mean   :0      Mean   : 3.981   Mean   : 0.2695   Mean   :0.02364  
##  3rd Qu.:0      3rd Qu.: 3.000   3rd Qu.: 0.0000   3rd Qu.:0.00000  
##  Max.   :0      Max.   :82.000   Max.   :12.0000   Max.   :1.00000  
##  anti-islamic(muslim) anti-multi-religiousgroups anti-atheism/agnosticism
##  Min.   : 0.0000      Min.   : 0.00000           Min.   :0               
##  1st Qu.: 0.0000      1st Qu.: 0.00000           1st Qu.:0               
##  Median : 0.0000      Median : 0.00000           Median :0               
##  Mean   : 0.4704      Mean   : 0.07565           Mean   :0               
##  3rd Qu.: 0.0000      3rd Qu.: 0.00000           3rd Qu.:0               
##  Max.   :10.0000      Max.   :10.00000           Max.   :0               
##  anti-religiouspracticegenerally anti-otherreligion anti-buddhist
##  Min.   :0.000000                Min.   :0.000      Min.   :0    
##  1st Qu.:0.000000                1st Qu.:0.000      1st Qu.:0    
##  Median :0.000000                Median :0.000      Median :0    
##  Mean   :0.007092                Mean   :0.104      Mean   :0    
##  3rd Qu.:0.000000                3rd Qu.:0.000      3rd Qu.:0    
##  Max.   :2.000000                Max.   :4.000      Max.   :0    
##  anti-easternorthodox(greek,russian,etc.)   anti-hindu      
##  Min.   :0.000000                         Min.   :0.000000  
##  1st Qu.:0.000000                         1st Qu.:0.000000  
##  Median :0.000000                         Median :0.000000  
##  Mean   :0.002364                         Mean   :0.002364  
##  3rd Qu.:0.000000                         3rd Qu.:0.000000  
##  Max.   :1.000000                         Max.   :1.000000  
##  anti-jehovahswitness  anti-mormon anti-otherchristian   anti-sikh
##  Min.   :0            Min.   :0    Min.   :0.00000     Min.   :0  
##  1st Qu.:0            1st Qu.:0    1st Qu.:0.00000     1st Qu.:0  
##  Median :0            Median :0    Median :0.00000     Median :0  
##  Mean   :0            Mean   :0    Mean   :0.01655     Mean   :0  
##  3rd Qu.:0            3rd Qu.:0    3rd Qu.:0.00000     3rd Qu.:0  
##  Max.   :0            Max.   :0    Max.   :3.00000     Max.   :0  
##  anti-hispanic       anti-arab       anti-otherethnicity/nationalorigin
##  Min.   : 0.0000   Min.   :0.00000   Min.   : 0.0000                   
##  1st Qu.: 0.0000   1st Qu.:0.00000   1st Qu.: 0.0000                   
##  Median : 0.0000   Median :0.00000   Median : 0.0000                   
##  Mean   : 0.3735   Mean   :0.06619   Mean   : 0.2837                   
##  3rd Qu.: 0.0000   3rd Qu.:0.00000   3rd Qu.: 0.0000                   
##  Max.   :17.0000   Max.   :2.00000   Max.   :19.0000                   
##  anti-non-hispanic*  anti-gaymale    anti-gayfemale  
##  Min.   :0          Min.   : 0.000   Min.   :0.0000  
##  1st Qu.:0          1st Qu.: 0.000   1st Qu.:0.0000  
##  Median :0          Median : 0.000   Median :0.0000  
##  Mean   :0          Mean   : 1.499   Mean   :0.2411  
##  3rd Qu.:0          3rd Qu.: 1.000   3rd Qu.:0.0000  
##  Max.   :0          Max.   :36.000   Max.   :8.0000  
##  anti-gay(maleandfemale) anti-heterosexual  anti-bisexual     
##  Min.   :0.0000          Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.0000          1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.0000          Median :0.000000   Median :0.000000  
##  Mean   :0.1017          Mean   :0.002364   Mean   :0.004728  
##  3rd Qu.:0.0000          3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :4.0000          Max.   :1.000000   Max.   :1.000000  
##  anti-physicaldisability anti-mentaldisability totalincidents  
##  Min.   :0.00000         Min.   :0.000000      Min.   :  1.00  
##  1st Qu.:0.00000         1st Qu.:0.000000      1st Qu.:  1.00  
##  Median :0.00000         Median :0.000000      Median :  3.00  
##  Mean   :0.01182         Mean   :0.009456      Mean   : 10.09  
##  3rd Qu.:0.00000         3rd Qu.:0.000000      3rd Qu.: 10.00  
##  Max.   :1.00000         Max.   :1.000000      Max.   :101.00  
##   totalvictims    totaloffenders  
##  Min.   :  1.00   Min.   :  1.00  
##  1st Qu.:  1.00   1st Qu.:  1.00  
##  Median :  3.00   Median :  3.00  
##  Mean   : 10.48   Mean   : 11.77  
##  3rd Qu.: 10.00   3rd Qu.: 11.00  
##  Max.   :106.00   Max.   :113.00
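
The cleaned names above still contain characters such as -, *, / and parentheses, which is why the later code has to wrap them in backticks. Here is a minimal sketch, on a few of the original headers, of a stricter cleanup that keeps only letters and digits. The underscore separator and the object names raw_names and clean are assumptions for illustration, not part of the original workflow.

```r
# A few of the original column headers, copied from the column specification
raw_names <- c("County", "Crime Type", "Anti-Age*", "Anti-Islamic (Muslim)")

# Lowercase, then collapse every run of non-alphanumeric characters to "_"
clean <- gsub("[^a-z0-9]+", "_", tolower(raw_names))

# Drop any underscore left at the start or end of a name
clean <- gsub("^_|_$", "", clean)

print(clean)
```

Names cleaned this way are syntactic, so they can be used in select() or aes() without backticks.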

Select only certain hate-crimes

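I decided I would only look at the hate-crime types with a max number of 9 or more. That way I can focus on the most prominent types of hate-crimes. Note that, according to the summary above, anti-multi-religiousgroups (max 10) and anti-otherethnicity/nationalorigin (max 19) also meet this threshold but are not included in the selection that follows.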

hatecrimes2 <- hatecrimes %>% 
  select(county, year, `anti-black`, `anti-white`, `anti-jewish`, `anti-catholic`,
         `anti-age*`, `anti-islamic(muslim)`, `anti-gaymale`, `anti-hispanic`,
         totalincidents, totalvictims, totaloffenders)
head(hatecrimes2)
## # A tibble: 6 x 13
##   county  year `anti-black` `anti-white` `anti-jewish` `anti-catholic`
##   <chr>  <dbl>        <dbl>        <dbl>         <dbl>           <dbl>
## 1 Albany  2016            1            0             0               0
## 2 Albany  2016            2            0             0               0
## 3 Alleg…  2016            1            0             0               0
## 4 Bronx   2016            0            1             0               0
## 5 Bronx   2016            0            1             1               0
## 6 Broome  2016            1            0             0               0
## # … with 7 more variables: `anti-age*` <dbl>,
## #   `anti-islamic(muslim)` <dbl>, `anti-gaymale` <dbl>,
## #   `anti-hispanic` <dbl>, totalincidents <dbl>, totalvictims <dbl>,
## #   totaloffenders <dbl>
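
Picking columns by reading the summary output is easy to get slightly wrong with 44 variables. The following is a small sketch of applying the same "max of 9 or more" rule programmatically in base R. The toy data frame and its values are made up, and the names toy, threshold, maxima, and keep are hypothetical.

```r
# Toy stand-in for the cleaned hatecrimes data (values are invented)
toy <- data.frame(
  county = c("A", "B", "C"),
  year = c(2010, 2011, 2012),
  `anti-black` = c(1, 18, 0),
  `anti-male` = c(0, 1, 0),
  `anti-jewish` = c(82, 3, 0),
  totalincidents = c(90, 25, 1),
  check.names = FALSE,       # keep the hyphenated names as they are
  stringsAsFactors = FALSE
)

threshold <- 9

# Numeric count columns only; year is numeric but is not a count
is_count <- sapply(toy, is.numeric) & names(toy) != "year"

# Largest value in each count column
maxima <- sapply(toy[is_count], max)

# Names of the columns that reach the threshold
keep <- names(maxima)[maxima >= threshold]

toy2 <- toy[c("county", "year", keep)]
print(names(toy2))
```

On the real data this would replace the hand-typed column list, at the cost of making the selection less visible to the reader.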

Check Summary to make sure no missing values

Also check the dimensions to count how many variables remain

dim(hatecrimes2)
## [1] 423  13
summary(hatecrimes2)
##     county               year        anti-black       anti-white     
##  Length:423         Min.   :2010   Min.   : 0.000   Min.   : 0.0000  
##  Class :character   1st Qu.:2011   1st Qu.: 0.000   1st Qu.: 0.0000  
##  Mode  :character   Median :2013   Median : 1.000   Median : 0.0000  
##                     Mean   :2013   Mean   : 1.761   Mean   : 0.3357  
##                     3rd Qu.:2015   3rd Qu.: 2.000   3rd Qu.: 0.0000  
##                     Max.   :2016   Max.   :18.000   Max.   :11.0000  
##   anti-jewish     anti-catholic       anti-age*       anti-islamic(muslim)
##  Min.   : 0.000   Min.   : 0.0000   Min.   :0.00000   Min.   : 0.0000     
##  1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.:0.00000   1st Qu.: 0.0000     
##  Median : 0.000   Median : 0.0000   Median :0.00000   Median : 0.0000     
##  Mean   : 3.981   Mean   : 0.2695   Mean   :0.05201   Mean   : 0.4704     
##  3rd Qu.: 3.000   3rd Qu.: 0.0000   3rd Qu.:0.00000   3rd Qu.: 0.0000     
##  Max.   :82.000   Max.   :12.0000   Max.   :9.00000   Max.   :10.0000     
##   anti-gaymale    anti-hispanic     totalincidents    totalvictims   
##  Min.   : 0.000   Min.   : 0.0000   Min.   :  1.00   Min.   :  1.00  
##  1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.:  1.00   1st Qu.:  1.00  
##  Median : 0.000   Median : 0.0000   Median :  3.00   Median :  3.00  
##  Mean   : 1.499   Mean   : 0.3735   Mean   : 10.09   Mean   : 10.48  
##  3rd Qu.: 1.000   3rd Qu.: 0.0000   3rd Qu.: 10.00   3rd Qu.: 10.00  
##  Max.   :36.000   Max.   :17.0000   Max.   :101.00   Max.   :106.00  
##  totaloffenders  
##  Min.   :  1.00  
##  1st Qu.:  1.00  
##  Median :  3.00  
##  Mean   : 11.77  
##  3rd Qu.: 11.00  
##  Max.   :113.00
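
For numeric columns, summary() adds an NA's row only when values are missing, and for character columns such as county it reports no NA count at all. A more direct check, sketched here on invented data, is to count the missing values per column. The object names are hypothetical.

```r
# Toy data with two deliberately missing values
toy <- data.frame(
  county = c("A", "B", NA),
  `anti-black` = c(1, NA, 0),
  totalincidents = c(3, 2, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Number of missing values in each column, including character columns
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Single TRUE/FALSE answer for the whole data frame
any_missing <- anyNA(toy)
print(any_missing)
```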

Order the crimes in descending order

Order the data, first by total incidents, then by total offenders, then by total victims. It will be interesting to see if counties and years correlate with certain types of crimes.

ordered <- hatecrimes2 %>% 
  # desc() takes a single column, so each sort key gets its own desc()
  arrange(desc(totalincidents), desc(totaloffenders), desc(totalvictims))
head(ordered)
## # A tibble: 6 x 13
##   county  year `anti-black` `anti-white` `anti-jewish` `anti-catholic`
##   <chr>  <dbl>        <dbl>        <dbl>         <dbl>           <dbl>
## 1 Kings   2012            4            1            82               6
## 2 Suffo…  2012           18            0            48               7
## 3 Kings   2010           10            3            34               0
## 4 New Y…  2016            6            5             9               0
## 5 Kings   2015            6            3            35               0
## 6 Kings   2016            4            6            26               1
## # … with 7 more variables: `anti-age*` <dbl>,
## #   `anti-islamic(muslim)` <dbl>, `anti-gaymale` <dbl>,
## #   `anti-hispanic` <dbl>, totalincidents <dbl>, totalvictims <dbl>,
## #   totaloffenders <dbl>
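
Sorting on three keys only matters when earlier keys tie. The sketch below uses a made-up data frame to show how dplyr breaks ties when each key is wrapped in its own desc(). The object names toy and sorted are hypothetical.

```r
library(dplyr)

# Invented rows with ties in totalincidents
toy <- data.frame(
  county = c("A", "B", "C", "D"),
  totalincidents = c(5, 9, 5, 9),
  totaloffenders = c(2, 1, 7, 4),
  totalvictims = c(5, 9, 6, 9),
  stringsAsFactors = FALSE
)

# Sort by incidents first; ties are broken by offenders, then by victims
sorted <- toy %>%
  arrange(desc(totalincidents), desc(totaloffenders), desc(totalvictims))

print(sorted)
```

B and D tie on totalincidents, so D comes first because it has more offenders.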

Use Facet_Wrap

Look at each type of hate crime for each year. Use the package “tidyr” to convert the dataset from wide to long with the command “gather”. It takes the name of each hate-crime column and puts them all into one column called “id”. Each cell count then goes into the new column, “crimecount”. We only do this for the quantitative variables, which are in columns 3 - 13. Note the command facet_wrap requires a tilde (~) before “id”.

# install.packages("reshape2")
library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
hatecrimeslong <- ordered %>% tidyr::gather("id", "crimecount", 3:13) 
hatecrimesplot <-hatecrimeslong %>% 
  ggplot(., aes(year, crimecount))+
  geom_point()+
  aes(color = id)+
  facet_wrap(~id)
hatecrimesplot
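
To make the wide-to-long step concrete, here is a small sketch of gather() on a two-row toy table. The data are invented and the object names toy and long are hypothetical. In tidyr 1.0.0 and later, pivot_longer() is the recommended replacement for gather(), but gather() is kept here because it matches the tidyr version loaded above.

```r
library(tidyr)

# Two rows, two count columns (values are invented)
toy <- data.frame(
  county = c("A", "B"),
  year = c(2010, 2011),
  `anti-black` = c(1, 2),
  `anti-jewish` = c(0, 5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Columns 3 and 4 are stacked: their names go to "id", their values to "crimecount"
long <- gather(toy, "id", "crimecount", 3:4)

print(long)
```

Each original row now appears once per gathered column, so 2 rows and 2 count columns become 4 rows.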

Look deeper into crimes against blacks, gay males, and Jews

From the facet_wrap plot above, the anti-black, anti-gay male, and anti-jewish categories seem to have the highest numbers of offenses reported. Filter for just those 3 crime types.

hatenew <- hatecrimeslong %>%
  filter( id== "anti-black" | id == "anti-jewish" | id == "anti-gaymale")

Plot these three types of hate crimes together

Use the following arguments and commands to finalize your barplot:

- position = “dodge” makes side-by-side bars, rather than stacked bars
- stat = “identity” uses the values in crimecount as the bar heights, instead of counting rows
- ggtitle gives the plot a title
- labs gives a title to the legend

plot2 <- hatenew %>%
  ggplot() +
  geom_bar(aes(x=year, y=crimecount, fill = id),
      position = "dodge", stat = "identity") +
  ggtitle("Hate Crime Type in NY Counties Between 2010-2016") +
  ylab("Number of Hate Crime Incidents") + 
  labs(fill = "Hate Crime Type")
plot2
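
One caveat with this plot: hatenew has several rows for each year and crime type (one per county and crime category). As far as I can tell, with stat = "identity" and position = "dodge" those rows are drawn on top of one another rather than added up, so a bar shows the largest single row, not the yearly total. If yearly totals are what you want, one option is to sum first and then plot. The sketch below assumes that goal and uses invented data; the names toy_long, yearly, and p are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Invented long-format data: two counties in the same year
toy_long <- data.frame(
  county = c("A", "B", "A", "B"),
  year = c(2010, 2010, 2010, 2010),
  id = c("anti-black", "anti-black", "anti-jewish", "anti-jewish"),
  crimecount = c(1, 2, 4, 3),
  stringsAsFactors = FALSE
)

# One row per year and crime type, holding the total across counties
yearly <- toy_long %>%
  group_by(year, id) %>%
  summarise(crimecount = sum(crimecount)) %>%
  ungroup()

# geom_col() is the shorthand for geom_bar(stat = "identity")
p <- ggplot(yearly, aes(x = year, y = crimecount, fill = id)) +
  geom_col(position = "dodge") +
  labs(y = "Number of Hate Crime Incidents", fill = "Hate Crime Type")

print(yearly)
```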

What about the counties?

I have not dealt with the counties, but I think that is the next place to explore. I can make bar graphs by county instead of by year.

plot3 <- hatenew %>%
  ggplot() +
  geom_bar(aes(x=county, y=crimecount, fill = id),
      position = "dodge", stat = "identity") +
  ggtitle("Hate Crime Type in NY Counties Between 2010-2016") +
  ylab("Number of Hate Crime Incidents") + 
  labs(fill = "Hate Crime Type")
plot3

So many counties

There are too many counties for this plot to make sense, but maybe we can just look at the 5 counties with the highest number of incidents.

- use “group_by” to group each row by county
- use summarize to get the total sum of incidents by county
- use arrange(desc) to arrange those sums of total incidents by county in descending order
- use top_n to list the 5 counties with the highest total incidents

counties <- hatenew %>%
  group_by(county)%>%
  summarize(sum = sum(crimecount)) %>%
  arrange(desc(sum)) %>%
  top_n(n=5)
## Selecting by sum

counties
## # A tibble: 5 x 2
##   county     sum
##   <chr>    <dbl>
## 1 Kings      713
## 2 New York   459
## 3 Suffolk    360
## 4 Nassau     298
## 5 Queens     235
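
Keep in mind that these sums cover only the three crime types kept in hatenew, not totalincidents. The same group, sum, sort, and keep-the-top pattern can be sketched on invented data as follows. Here head(5) stands in for top_n(); unlike top_n(), it does not keep extra rows when there are ties. In dplyr 1.0.0 and later, slice_max() supersedes top_n(). The column name total and the object names are hypothetical.

```r
library(dplyr)

# Invented counts; county A appears twice so the sum matters
toy <- data.frame(
  county = c("A", "A", "B", "C", "D", "E", "F", "G"),
  crimecount = c(3, 4, 10, 1, 6, 2, 8, 5),
  stringsAsFactors = FALSE
)

top5 <- toy %>%
  group_by(county) %>%
  summarise(total = sum(crimecount)) %>%  # one row per county
  arrange(desc(total)) %>%                # largest totals first
  head(5)                                 # keep the first five rows

print(top5)
```

Naming the column total rather than sum also avoids shadowing the sum() function inside the pipeline.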

Finally, create the barplot above, but only for the 5 counties with the highest incidents of hate-crimes. The command “labs” is nice, because you can get a title, subtitle, y-axis label, and legend title, all in one command.

plot4 <- hatenew %>%
  filter(county =="Kings" | county =="New York" | county == "Suffolk" | county == "Nassau" | county == "Queens") %>%
  ggplot() +
  geom_bar(aes(x=county, y=crimecount, fill = id),
      position = "dodge", stat = "identity") +
  labs(y = "Number of Hate Crime Incidents",
    title = "5 Counties in NY with Highest Incidents of Hate Crimes",
    subtitle = "Between 2010-2016", 
    fill = "Hate Crime Type")
plot4
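
The chain of == comparisons joined by | works, but it grows quickly. A shorter equivalent, sketched here on invented rows, is to test membership with %in%. The vector name top_counties and the toy data are hypothetical.

```r
library(dplyr)

# The five counties listed in the table above
top_counties <- c("Kings", "New York", "Suffolk", "Nassau", "Queens")

# Invented rows, two of which belong to the top five
toy <- data.frame(
  county = c("Kings", "Albany", "Queens", "Erie"),
  crimecount = c(5, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Keep only rows whose county is in the vector
subset_top <- toy %>% filter(county %in% top_counties)

print(subset_top)
```

On the real data, the vector could also be taken from counties$county so the filter stays in step with the table.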

My take: Two word clouds:

I will produce one word cloud of the categories and frequencies of hate crime victims (2010-2016) and, for the same time interval, another of the NY counties and frequencies where they occur.

The hatecrimes dataset has too many columns for too many categories containing too many cells filled with zeroes to conveniently represent in two dimensions.

We must first wrangle hatecrimes into two distinct tibbles:

  1. hateCrimesSummary, containing county, year, crimetype, and totals
  2. hateCrimes, containing county, year, crimetype, victimCategory, and incidents

hateCrimesSummary <- hatecrimes %>%
  select(county, year, crimetype, totalvictims,totaloffenders,totalincidents)

Transform wide hatecrimes to tall hateCrimes:

hateCrimes <- hatecrimes %>%
  # eliminate the totals already placed in hateCrimesSummary
  select(-totalvictims,-totaloffenders,-totalincidents) %>%
  # combine victim categories column counts to two new columns
  gather("victimCategory","incidents",-county,-year,-crimetype) %>%
  # exclude rows in which there are no victims
  filter(incidents>0)
# what do the victimCategories look like?
unique(hateCrimes$victimCategory)
##  [1] "anti-male"                               
##  [2] "anti-female"                             
##  [3] "anti-transgender"                        
##  [4] "anti-genderidentityexpression"           
##  [5] "anti-age*"                               
##  [6] "anti-white"                              
##  [7] "anti-black"                              
##  [8] "anti-americanindian/alaskannative"       
##  [9] "anti-asian"                              
## [10] "anti-multi-racialgroups"                 
## [11] "anti-jewish"                             
## [12] "anti-catholic"                           
## [13] "anti-protestant"                         
## [14] "anti-islamic(muslim)"                    
## [15] "anti-multi-religiousgroups"              
## [16] "anti-religiouspracticegenerally"         
## [17] "anti-otherreligion"                      
## [18] "anti-easternorthodox(greek,russian,etc.)"
## [19] "anti-hindu"                              
## [20] "anti-otherchristian"                     
## [21] "anti-hispanic"                           
## [22] "anti-arab"                               
## [23] "anti-otherethnicity/nationalorigin"      
## [24] "anti-gaymale"                            
## [25] "anti-gayfemale"                          
## [26] "anti-gay(maleandfemale)"                 
## [27] "anti-heterosexual"                       
## [28] "anti-bisexual"                           
## [29] "anti-physicaldisability"                 
## [30] "anti-mentaldisability"

Describe the victim instead of the offense

Shorter category names can more easily fit on legends and titles. I hope we can do this without offending anyone.

Note the use of camelCase in object names and categories. This eliminates the need for embedded spaces and underscores.

“anti-” -> "" // this is now redundant
“transgender” -> “lgbTq*” // note the uppercase T
“genderidentityexpression” -> “lgbtq*”
“americanindian/alaskannative” -> “nativeAmerican”
“multi-racialgroups” -> “multiRacial”
“islamic.muslim.” -> “islamic”
“multi-religiousgroups” -> “syncretic” //too obscure?
“religiouspracticegenerally” -> “clerical”
“otherreligion” -> “otherReligion”
“easternorthodox.greek.russian.etc..” -> “easternChristian” // per wikipedia
“otherchristian” -> “otherChristian”
“otherethnicity.nationalorigin” -> “ethnicity” // includes nationality
“gaymale” -> “lGbtq*”
“gayfemale” -> “Lgbtq*”
“gay.maleandfemale.” -> “LGbtq*”
“heterosexual” -> “straight” // any objections?
“bisexual” -> “lgBtq*”
“physicaldisability” -> “physicallyDisabled”
“mentaldisability” -> “mentallyDisabled”
# shorten and correct (?) victim categories
hateCrimes <- hateCrimes %>%
  select(county,victimCategory,year,crimetype,incidents) %>%
  mutate(
  victimCategory=str_replace(victimCategory,"anti-","") ,
  victimCategory=str_replace(victimCategory,"transgender","lgbTq*") ,
  victimCategory=str_replace(victimCategory,"genderidentityexpression","lgbtq*") ,
  victimCategory=str_replace(victimCategory,"americanindian/alaskannative","nativeAmerican") ,
  victimCategory=str_replace(victimCategory,"multi-racialgroups","multiRacial") ,
  victimCategory=str_replace(victimCategory,"islamic.muslim.","islamic") ,
  victimCategory=str_replace(victimCategory,"multi-religiousgroups","syncretic") ,
  victimCategory=str_replace(victimCategory,"religiouspracticegenerally","clerical") ,
  victimCategory=str_replace(victimCategory,"otherreligion","otherReligion") ,
  victimCategory=str_replace(victimCategory,"easternorthodox.greek,russian,etc..","easternChristian") ,
  victimCategory=str_replace(victimCategory,"otherchristian","otherChristian") ,
  victimCategory=str_replace(victimCategory,"otherethnicity.nationalorigin","ethnicity") ,
  victimCategory=str_replace(victimCategory,"gaymale","lGbtq*") ,
  victimCategory=str_replace(victimCategory,"gayfemale","Lgbtq*") ,
  victimCategory=str_replace(victimCategory,"gay.maleandfemale.","LGbtq*") ,
  victimCategory=str_replace(victimCategory,"heterosexual","straight") ,
  victimCategory=str_replace(victimCategory,"bisexual","lgBtq*") ,
  victimCategory=str_replace(victimCategory,"physicaldisability","physicallyDisabled") ,
  victimCategory=str_replace(victimCategory,"mentaldisability","mentallyDisabled")
)
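
The patterns above use "." in place of parentheses and slashes because str_replace() treats its pattern as a regular expression, where "(" and ")" are grouping characters and "." matches any single character. A small sketch of two alternatives follows: fixed() for literal matching, and a named lookup vector for whole-name replacements. The lookup table here covers only three categories for illustration, and the object names are hypothetical.

```r
library(stringr)

categories <- c("anti-islamic(muslim)", "anti-gaymale", "anti-black")

# As a regex, the parentheses form a group, so the literal text is not found
literal_found <- str_detect("islamic(muslim)", "islamic(muslim)")
print(literal_found)

# fixed() matches the pattern character for character
with_fixed <- str_replace("islamic(muslim)", fixed("islamic(muslim)"), "islamic")
print(with_fixed)

# Strip the prefix only at the start of the string
stripped <- str_replace(categories, "^anti-", "")

# Whole-name replacements; anything not listed is left unchanged
lookup <- c("islamic(muslim)" = "islamic", "gaymale" = "lGbtq*")
renamed <- unname(ifelse(stripped %in% names(lookup), lookup[stripped], stripped))
print(renamed)
```

A lookup vector replaces whole names only, so one replacement cannot accidentally alter the result of an earlier one.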

Visualizing the data

I tried some variations on treemaps and they all looked pretty ugly with illegible labels and legends. I have seen word clouds used in the Washington Post to represent phrases extracted from text. It seemed dubious how phrases of equal weights could be arbitrarily emphasized by how they were positioned on the page. Nevertheless, they can contain many categories legibly named.

wordCloudArgs <- hateCrimes %>%
  filter(crimetype!="Property Crimes") %>%
  select(victimCategory,incidents,crimetype) %>%
  group_by(victimCategory) %>%
  summarise(incident=sum(incidents)) %>%
  arrange(desc(incident))

# include median to give a better picture of the distribution
wordCloudArgs<- rbind(wordCloudArgs,tibble(
  victimCategory="median",
  incident=median(wordCloudArgs$incident))
)

# tried to see if log looked better: it didn't
wordCloudArgs<-wordCloudArgs[wordCloudArgs$incident!=0,]
wordCloudArgs$log<-log(wordCloudArgs$incident)

  
wordcloud::wordcloud(
  wordCloudArgs$victimCategory,wordCloudArgs$incident,
  min.freq = 1, # default is 3 which hides many rows
  # make it look pretty
  colors=RColorBrewer::brewer.pal(8, "Dark2"),
  random.color = TRUE,
  random.order = FALSE,
  scale=par("fin")*.35 
  # figure dimension in inches. .35 seems to best fill the plotting area
)
title(main=paste("Victims of",
sum(hateCrimes$incidents[hateCrimes$crimetype!="Property Crimes"]), 
"Hate Crimes in New York State"),sub="Crimes Against Persons reported between 2010 and 2016")

print(wordCloudArgs,n=300)
## # A tibble: 29 x 3
##    victimCategory     incident   log
##    <chr>                 <dbl> <dbl>
##  1 lGbtq*                  563 6.33 
##  2 black                   426 6.05 
##  3 jewish                  424 6.05 
##  4 islamic                 139 4.93 
##  5 hispanic                136 4.91 
##  6 white                   121 4.80 
##  7 Lgbtq*                   92 4.52 
##  8 ethnicity                70 4.25 
##  9 asian                    62 4.13 
## 10 LGbtq*                   26 3.26 
## 11 arab                     23 3.14 
## 12 lgbTq*                   19 2.94 
## 13 lgbtq*                   17 2.83 
## 14 catholic                 11 2.40 
## 15 otherReligion            11 2.40 
## 16 multiRacial               9 2.20 
## 17 female                    6 1.79 
## 18 physicallyDisabled        5 1.61 
## 19 syncretic                 5 1.61 
## 20 age*                      3 1.10 
## 21 mentallyDisabled          3 1.10 
## 22 lgBtq*                    2 0.693
## 23 male                      2 0.693
## 24 nativeAmerican            2 0.693
## 25 clerical                  1 0    
## 26 easternChristian          1 0    
## 27 protestant                1 0    
## 28 straight                  1 0    
## 29 median                   11 2.40

Package “wordcloud2” produces more dramatic plots but is full of bugs.

Note the missing victimCategories in the wordcloud2 plot.

# make another wordcloud for counties.  Pretty much the same as above
wordCloudArgs <- hateCrimesSummary %>%
  filter(crimetype!="Property Crimes") %>%
  select(county,totalincidents) %>%
  group_by(county) %>%
  summarise(incident=sum(totalincidents)) %>%
  arrange(desc(incident))

# include median to give a better picture of the distribution
wordCloudArgs <- rbind(wordCloudArgs,tibble(
  county="median",
  incident=median(wordCloudArgs$incident))
)
  
# drop zero counts before taking logs; this cloud is sized by log(incident)
wordCloudArgs<-wordCloudArgs[wordCloudArgs$incident!=0,]
wordCloudArgs$log<-log(wordCloudArgs$incident)

wordcloud::wordcloud(
  wordCloudArgs$county,wordCloudArgs$log,
  min.freq = 1,
  colors=RColorBrewer::brewer.pal(8, "Dark2"),
  random.color = TRUE,
  random.order = FALSE,
  scale=par("fin")*.27
)
title(
  main=paste(
    sum(hateCrimesSummary$totalincidents[hateCrimesSummary$crimetype!="Property Crimes"]), 
             "Hate Crimes located in",
             nrow(wordCloudArgs) - 1, # do not count the added "median" row
             "Counties of New York State"), outer = TRUE,
  sub="Crimes Against Persons reported between 2010 and 2016",
  line = -1
  )

print(wordCloudArgs,n=300)
## # A tibble: 60 x 3
##    county       incident   log
##    <chr>           <dbl> <dbl>
##  1 Kings             483 6.18 
##  2 New York          444 6.10 
##  3 Queens            169 5.13 
##  4 Erie              166 5.11 
##  5 Suffolk           158 5.06 
##  6 Bronx             117 4.76 
##  7 Westchester        87 4.47 
##  8 Nassau             82 4.41 
##  9 Richmond           72 4.28 
## 10 Albany             37 3.61 
## 11 Rockland           34 3.53 
## 12 Dutchess           33 3.50 
## 13 Orange             31 3.43 
## 14 Monroe             30 3.40 
## 15 Niagara            15 2.71 
## 16 Tompkins           14 2.64 
## 17 Broome             13 2.56 
## 18 Otsego             13 2.56 
## 19 Saratoga           12 2.48 
## 20 St. Lawrence       11 2.40 
## 21 Ulster             11 2.40 
## 22 Oswego             10 2.30 
## 23 Clinton             9 2.20 
## 24 Multiple            9 2.20 
## 25 Schenectady         8 2.08 
## 26 Madison             7 1.95 
## 27 Onondaga            7 1.95 
## 28 Cayuga              6 1.79 
## 29 Franklin            6 1.79 
## 30 Livingston          6 1.79 
## 31 Cattaraugus         5 1.61 
## 32 Chautauqua          5 1.61 
## 33 Chenango            5 1.61 
## 34 Cortland            5 1.61 
## 35 Oneida              5 1.61 
## 36 Essex               4 1.39 
## 37 Jefferson           4 1.39 
## 38 Putnam              4 1.39 
## 39 Rensselaer          4 1.39 
## 40 Washington          4 1.39 
## 41 Wayne               4 1.39 
## 42 Chemung             3 1.10 
## 43 Columbia            3 1.10 
## 44 Ontario             3 1.10 
## 45 Orleans             3 1.10 
## 46 Schoharie           3 1.10 
## 47 Steuben             3 1.10 
## 48 Herkimer            2 0.693
## 49 Tioga               2 0.693
## 50 Allegany            1 0    
## 51 Delaware            1 0    
## 52 Fulton              1 0    
## 53 Genesee             1 0    
## 54 Greene              1 0    
## 55 Lewis               1 0    
## 56 Schuyler            1 0    
## 57 Seneca              1 0    
## 58 Sullivan            1 0    
## 59 Warren              1 0    
## 60 median              6 1.79
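
One side effect of sizing the county cloud by the log column is worth checking. The wordcloud() help page says words with a frequency below min.freq are not plotted, and log(1) and log(2) are both below 1. The sketch below, which assumes that documented behaviour and uses a few counts taken from the table above, shows which rows would clear min.freq = 1. The object names are hypothetical.

```r
# A few incident counts from the county table above
incident <- c(Kings = 483, median = 6, Tioga = 2, Warren = 1)

# The values passed to wordcloud() as frequencies in the county cloud
log_size <- log(incident)

# Same threshold as the min.freq argument used above
min_freq <- 1

# TRUE where the value reaches the threshold
shown <- log_size >= min_freq
print(shown)
```

If every county should appear, one option is to pass the raw incident counts, or to lower min.freq to 0.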