Hate crime Assignment
Hate Crimes Dataset
This dataset records hate crimes in New York counties from 2010 to 2016, broken down by the type of hate crime.
My caveat:
Flawed hate crime data collection: we should know how the data was collected (Nathan Yau, FlowingData, Dec 5, 2017)
Data can provide you with important information, but when the collection process is flawed, there’s not much you can do. Ken Schwencke, reporting for ProPublica, researched the tiered system that the FBI relies on to gather hate crime data for the United States:
“Under a federal law passed in 1990, the FBI is required to track and tabulate crimes in which there was ‘manifest evidence of prejudice’ against a host of protected groups, regardless of differences in how state laws define who’s protected. The FBI, in turn, relies on local law enforcement agencies to collect and submit this data, but can’t compel them to do so.”
This is a link to the ProPublica Article: https://www.propublica.org/article/why-america-fails-at-gathering-hate-crime-statistics
Here is a data visualization of where hate crimes do NOT get reported around the country (Ken Schwencke, 2017): https://projects.propublica.org/graphics/hatecrime-map
So now that we know the dataset may be biased, what can we do with it?
library(tidyverse)
#tinytex::install_tinytex()
#library(tinytex)
setwd("C:/Users/rsaidi/Dropbox/Rachel/MontColl/Datasets/Datasets")
hatecrimes <- read_csv("hateCrimes2010.csv")
Clean up the data
Make all headers lowercase and remove spaces. After cleaning up the variable names, look at the structure of the data. Since there are 44 variables in this dataset, you can use summary() to decide which hate crimes to focus on. In the summary() output, look at the min/max values: some categories have a maximum value of only 1, and a few never occur at all (maximum of 0).
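As a quick check of what the two cleaning steps do, here is a minimal base-R sketch on a few made-up header strings (the exact raw headers in hateCrimes2010.csv are an assumption here). It also shows that the cleaned names still contain characters such as "-", "(" and "*", which is why they have to be wrapped in backticks when used in select() and similar functions.

```r
# Hypothetical raw headers, modeled on the cleaned names shown in this section.
raw_names <- c("County", "Year", "Crime Type", "Anti-Male", "Anti-Islamic (Muslim)")

# Step 1: lowercase every header. Step 2: delete every space.
clean_names <- gsub(" ", "", tolower(raw_names))
print(clean_names)

# The names are still not syntactic R names (make.names() would alter them),
# so code has to quote them with backticks, e.g. `anti-islamic(muslim)`.
print(make.names(clean_names) == clean_names)
```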
names(hatecrimes) <- tolower(names(hatecrimes))
names(hatecrimes) <- gsub(" ", "", names(hatecrimes))
head(hatecrimes)
A tibble: 6 × 44
county year crimetype anti-male anti-female anti-transgender anti-genderidentityexpression anti-age* anti-white anti-black anti-americanindian/alaskannative anti-asian anti-nativehawaiian/pacificislander anti-multi-racialgroups anti-otherrace anti-jewish anti-catholic anti-protestant anti-islamic(muslim) anti-multi-religiousgroups
summary(hatecrimes)
county year crimetype anti-male
Length:423 Min. :2010 Length:423 Min. :0.000000
Class :character 1st Qu.:2011 Class :character 1st Qu.:0.000000
Mode :character Median :2013 Mode :character Median :0.000000
Mean :2013 Mean :0.007092
3rd Qu.:2015 3rd Qu.:0.000000
Max. :2016 Max. :1.000000
anti-female anti-transgender anti-genderidentityexpression
Min. :0.00000 Min. :0.00000 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.00000 Median :0.00000 Median :0.00000
Mean :0.01655 Mean :0.04728 Mean :0.05674
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.00000 Max. :5.00000 Max. :3.00000
anti-age* anti-white anti-black
Min. :0.00000 Min. : 0.0000 Min. : 0.000
1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 0.000
Median :0.00000 Median : 0.0000 Median : 1.000
Mean :0.05201 Mean : 0.3357 Mean : 1.761
3rd Qu.:0.00000 3rd Qu.: 0.0000 3rd Qu.: 2.000
Max. :9.00000 Max. :11.0000 Max. :18.000
anti-americanindian/alaskannative anti-asian
Min. :0.000000 Min. :0.0000
1st Qu.:0.000000 1st Qu.:0.0000
Median :0.000000 Median :0.0000
Mean :0.007092 Mean :0.1773
3rd Qu.:0.000000 3rd Qu.:0.0000
Max. :1.000000 Max. :8.0000
anti-nativehawaiian/pacificislander anti-multi-racialgroups anti-otherrace
Min. :0 Min. :0.00000 Min. :0
1st Qu.:0 1st Qu.:0.00000 1st Qu.:0
Median :0 Median :0.00000 Median :0
Mean :0 Mean :0.08511 Mean :0
3rd Qu.:0 3rd Qu.:0.00000 3rd Qu.:0
Max. :0 Max. :3.00000 Max. :0
anti-jewish anti-catholic anti-protestant anti-islamic(muslim)
Min. : 0.000 Min. : 0.0000 Min. :0.00000 Min. : 0.0000
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.:0.00000 1st Qu.: 0.0000
Median : 0.000 Median : 0.0000 Median :0.00000 Median : 0.0000
Mean : 3.981 Mean : 0.2695 Mean :0.02364 Mean : 0.4704
3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.:0.00000 3rd Qu.: 0.0000
Max. :82.000 Max. :12.0000 Max. :1.00000 Max. :10.0000
anti-multi-religiousgroups anti-atheism/agnosticism
Min. : 0.00000 Min. :0
1st Qu.: 0.00000 1st Qu.:0
Median : 0.00000 Median :0
Mean : 0.07565 Mean :0
3rd Qu.: 0.00000 3rd Qu.:0
Max. :10.00000 Max. :0
anti-religiouspracticegenerally anti-otherreligion anti-buddhist
Min. :0.000000 Min. :0.000 Min. :0
1st Qu.:0.000000 1st Qu.:0.000 1st Qu.:0
Median :0.000000 Median :0.000 Median :0
Mean :0.007092 Mean :0.104 Mean :0
3rd Qu.:0.000000 3rd Qu.:0.000 3rd Qu.:0
Max. :2.000000 Max. :4.000 Max. :0
anti-easternorthodox(greek,russian,etc.) anti-hindu
Min. :0.000000 Min. :0.000000
1st Qu.:0.000000 1st Qu.:0.000000
Median :0.000000 Median :0.000000
Mean :0.002364 Mean :0.002364
3rd Qu.:0.000000 3rd Qu.:0.000000
Max. :1.000000 Max. :1.000000
anti-jehovahswitness anti-mormon anti-otherchristian anti-sikh
Min. :0 Min. :0 Min. :0.00000 Min. :0
1st Qu.:0 1st Qu.:0 1st Qu.:0.00000 1st Qu.:0
Median :0 Median :0 Median :0.00000 Median :0
Mean :0 Mean :0 Mean :0.01655 Mean :0
3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.00000 3rd Qu.:0
Max. :0 Max. :0 Max. :3.00000 Max. :0
anti-hispanic anti-arab anti-otherethnicity/nationalorigin
Min. : 0.0000 Min. :0.00000 Min. : 0.0000
1st Qu.: 0.0000 1st Qu.:0.00000 1st Qu.: 0.0000
Median : 0.0000 Median :0.00000 Median : 0.0000
Mean : 0.3735 Mean :0.06619 Mean : 0.2837
3rd Qu.: 0.0000 3rd Qu.:0.00000 3rd Qu.: 0.0000
Max. :17.0000 Max. :2.00000 Max. :19.0000
anti-non-hispanic* anti-gaymale anti-gayfemale anti-gay(maleandfemale)
Min. :0 Min. : 0.000 Min. :0.0000 Min. :0.0000
1st Qu.:0 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0 Median : 0.000 Median :0.0000 Median :0.0000
Mean :0 Mean : 1.499 Mean :0.2411 Mean :0.1017
3rd Qu.:0 3rd Qu.: 1.000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :0 Max. :36.000 Max. :8.0000 Max. :4.0000
anti-heterosexual anti-bisexual anti-physicaldisability
Min. :0.000000 Min. :0.000000 Min. :0.00000
1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000
Median :0.000000 Median :0.000000 Median :0.00000
Mean :0.002364 Mean :0.004728 Mean :0.01182
3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000
Max. :1.000000 Max. :1.000000 Max. :1.00000
anti-mentaldisability totalincidents totalvictims totaloffenders
Min. :0.000000 Min. : 1.00 Min. : 1.00 Min. : 1.00
1st Qu.:0.000000 1st Qu.: 1.00 1st Qu.: 1.00 1st Qu.: 1.00
Median :0.000000 Median : 3.00 Median : 3.00 Median : 3.00
Mean :0.009456 Mean : 10.09 Mean : 10.48 Mean : 11.77
3rd Qu.:0.000000 3rd Qu.: 10.00 3rd Qu.: 10.00 3rd Qu.: 11.00
Max. :1.000000 Max. :101.00 Max. :106.00 Max. :113.00
I decided to look only at the hate-crime types whose maximum count is 9 or more. That way I can focus on the most prominent types of hate crimes.
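The same cut-off can be applied in code instead of by reading the summary table. Below is a minimal base-R sketch on a small made-up data frame (the county labels and counts are invented for illustration). On the real data, the totalincidents, totalvictims and totaloffenders columns would also pass a max >= 9 test, so they would need to be excluded, just as year is excluded here.

```r
# Made-up stand-in for a few columns of `hatecrimes` (illustrative values only).
toy <- data.frame(
  county = c("County A", "County B", "County C"),
  year   = c(2012, 2012, 2012),
  `anti-male`   = c(0, 1, 0),
  `anti-jewish` = c(40, 10, 1),
  `anti-black`  = c(12, 5, 2),
  check.names = FALSE,        # keep the hyphenated names as-is
  stringsAsFactors = FALSE
)

# Candidate count columns: numeric columns other than year.
count_cols <- setdiff(names(toy)[sapply(toy, is.numeric)], "year")

# Maximum of each count column, then keep those whose maximum is 9 or more.
col_max <- sapply(toy[count_cols], max)
keep    <- names(col_max)[col_max >= 9]
print(keep)
```

In tidyverse code, tidyselect's where() helper can express a similar column predicate inside select(); the base-R version just makes each step visible.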
hatecrimes2 <- hatecrimes |>
  select(county, year, `anti-black`, `anti-white`, `anti-jewish`, `anti-catholic`,
         `anti-age*`, `anti-islamic(muslim)`, `anti-multi-religiousgroups`,
         `anti-gaymale`, `anti-hispanic`, `anti-otherethnicity/nationalorigin`) |>
  group_by(county, year)
head(hatecrimes2)
A tibble: 6 × 12
Groups: county, year [4]
county year anti-black anti-white anti-jewish anti-catholic anti-age* anti-islamic(muslim) anti-multi-religiousgroups anti-gaymale anti-hispanic anti-otherethnicity/nationalorigin
dim(hatecrimes2)
[1] 423  12
# There are currently 12 variables with 423 rows.
summary(hatecrimes2)
county year anti-black anti-white
Length:423 Min. :2010 Min. : 0.000 Min. : 0.0000
Class :character 1st Qu.:2011 1st Qu.: 0.000 1st Qu.: 0.0000
Mode :character Median :2013 Median : 1.000 Median : 0.0000
Mean :2013 Mean : 1.761 Mean : 0.3357
3rd Qu.:2015 3rd Qu.: 2.000 3rd Qu.: 0.0000
Max. :2016 Max. :18.000 Max. :11.0000
anti-jewish anti-catholic anti-age* anti-islamic(muslim)
Min. : 0.000 Min. : 0.0000 Min. :0.00000 Min. : 0.0000
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.:0.00000 1st Qu.: 0.0000
Median : 0.000 Median : 0.0000 Median :0.00000 Median : 0.0000
Mean : 3.981 Mean : 0.2695 Mean :0.05201 Mean : 0.4704
3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.:0.00000 3rd Qu.: 0.0000
Max. :82.000 Max. :12.0000 Max. :9.00000 Max. :10.0000
anti-multi-religiousgroups anti-gaymale anti-hispanic
Min. : 0.00000 Min. : 0.000 Min. : 0.0000
1st Qu.: 0.00000 1st Qu.: 0.000 1st Qu.: 0.0000
Median : 0.00000 Median : 0.000 Median : 0.0000
Mean : 0.07565 Mean : 1.499 Mean : 0.3735
3rd Qu.: 0.00000 3rd Qu.: 1.000 3rd Qu.: 0.0000
Max. :10.00000 Max. :36.000 Max. :17.0000
anti-otherethnicity/nationalorigin
Min. : 0.0000
1st Qu.: 0.0000
Median : 0.0000
Mean : 0.2837
3rd Qu.: 0.0000
Max. :19.0000
Convert from wide to long format
Look at each hate-crime type for each year. Convert the dataset from wide to long with the pivot_longer() function: it takes the hate-crime type from each column name and combines them all into one column called "victim_cat", and each cell's count goes into a new column, "crimecount".
We only do this for the quantitative variables, which are in columns 3 through 12. Note that facet_wrap() requires a tilde (~) before victim_cat.
hatelong <- hatecrimes2 |>
  pivot_longer(
    cols = 3:12,
    names_to = "victim_cat",
    values_to = "crimecount")
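To see exactly what pivot_longer() does to the shape of the data, here is a base-R sketch of the same reshaping on a tiny made-up table (two rows, two count columns). Each original row appears once per stacked column, so the long table has rows × columns entries; for hatecrimes2 that is 423 × 10 = 4,230 rows.

```r
# Made-up wide table: one row per county-year, one column per hate-crime type.
wide <- data.frame(
  county = c("County A", "County B"),
  year   = c(2010, 2010),
  `anti-black`  = c(3, 0),
  `anti-jewish` = c(5, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

count_cols <- names(wide)[3:4]   # the columns to stack, like cols = 3:12 above

# Repeat the id columns once per stacked column; the column name becomes
# victim_cat and the cell value becomes crimecount.
long <- data.frame(
  county     = rep(wide$county, times = length(count_cols)),
  year       = rep(wide$year,   times = length(count_cols)),
  victim_cat = rep(count_cols,  each  = nrow(wide)),
  crimecount = unlist(wide[count_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

The row order here is column by column, whereas pivot_longer() keeps each original row's values together; the contents are the same.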
Now use the long format to create a facet plot
hatecrimplot <- hatelong |>
  ggplot(aes(year, crimecount, color = victim_cat)) +
  geom_point() +
  facet_wrap(~victim_cat)
hatecrimplot
Look deeper into crimes against Black people, gay males, and Jews
From the facet_wrap() plot above, the anti-black, anti-gaymale, and anti-jewish categories seem to have the highest counts of reported offenses. Filter the data down to just those 3 categories.
hatenew <- hatelong |>
  filter(victim_cat %in% c("anti-black", "anti-jewish", "anti-gaymale")) |>
  group_by(year, county) |>
  arrange(desc(crimecount))
hatenew
A tibble: 1,269 × 4
Groups: year, county [277]
county year victim_cat crimecount
plot2 <- hatenew |>
  ggplot() +
  geom_bar(aes(x = year, y = crimecount, fill = victim_cat),
           position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2010-2016",
       caption = "Source: NY State Division of Criminal Justice Services")
plot2
We can see that hate crimes against Jews spiked in 2012; the other years were relatively consistent, with a slight upward trend. There was also an upward trend in hate crimes against gay males. Finally, hate crimes against Black people appear to trend downward over this period.
What about the counties?
I have not looked at the counties yet, but I think that is the next place to explore. I can make bar graphs by county instead of by year.
plot3 <- hatenew |>
  ggplot() +
  geom_bar(aes(x = county, y = crimecount, fill = victim_cat),
           position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2010-2016",
       caption = "Source: NY State Division of Criminal Justice Services")
plot3
So many counties
There are too many counties for this plot to make sense, but maybe we can look at just the 5 counties with the highest number of incidents:
- use group_by() to group the rows by county
- use summarize() to get the total number of incidents for each county
- use arrange(desc()) to sort those totals in descending order
counties <- hatenew |>
  group_by(year, county) |>
  summarize(sum = sum(crimecount)) |>
  arrange(desc(sum))
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
counties
A tibble: 277 × 3
Groups: year [7]
year county sum
counties2 <- hatenew |>
  group_by(county) |>
  summarize(sum = sum(crimecount)) |>
  slice_max(order_by = sum, n = 5)
counties2
A tibble: 5 × 2
county sum
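The group-sum-then-top-n idea can also be sketched in base R. The data below are made up, and n_top is set to 2 only because the toy table is small; the steps mirror group_by()/summarize() followed by slice_max().

```r
# Made-up long data in the shape of `hatenew`.
hate_toy <- data.frame(
  county     = c("County A", "County A", "County B", "County C", "County C", "County D"),
  year       = c(2010, 2011, 2010, 2010, 2011, 2011),
  victim_cat = c("anti-black", "anti-jewish", "anti-gaymale",
                 "anti-black", "anti-black", "anti-jewish"),
  crimecount = c(4, 6, 3, 2, 9, 1),
  stringsAsFactors = FALSE
)

# Total per county over all years and categories (like group_by + summarize).
totals <- aggregate(crimecount ~ county, data = hate_toy, FUN = sum)

# Sort descending and keep the top n (like slice_max(order_by = sum, n = n_top)).
n_top <- 2
top <- head(totals[order(-totals$crimecount), ], n_top)
print(top)
```

One difference: slice_max() keeps tied rows by default (with_ties = TRUE), while head() stops at exactly n rows. Reusing the computed list, for example filter(county %in% counties2$county), would also avoid typing the five county names by hand in the next plot.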
plot4 <- hatenew |>
  filter(county %in% c("Kings", "New York", "Suffolk", "Nassau", "Queens")) |>
  ggplot() +
  geom_bar(aes(x = county, y = crimecount, fill = victim_cat),
           position = "dodge", stat = "identity") +
  labs(y = "Number of Hate Crime Incidents",
       title = "5 Counties in NY with Highest Incidents of Hate Crimes",
       subtitle = "Between 2010-2016",
       fill = "Hate Crime Type",
       caption = "Source: NY State Division of Criminal Justice Services")
plot4
How would the picture change if we looked at hate crimes in each county per year relative to population? Bring in census data for the populations of New York counties. These are yearly population estimates for 2010 through 2016, based on the 2010 census.
setwd("C:/Users/rsaidi/Dropbox/Rachel/MontColl/Datasets/Datasets")
nypop <- read_csv("newyorkpopulation.csv")
Rows: 62 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Geography
dbl (7): 2010, 2011, 2012, 2013, 2014, 2015, 2016
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Clean the county name to match the other dataset
Strip "County" and ", New York" from each county name, then rename the variable "Geography" as "county" so that it matches the other dataset.
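# Remove "County" first, so that names like "Albany County, New York"
# become "Albany , New York" and the second pattern then removes the suffix.
nypop$Geography <- gsub("County", "", nypop$Geography)
nypop$Geography <- gsub(" , New York", "", nypop$Geography)
nypoplong <- nypop |>
  rename(county = Geography) |>
  gather("year", "population", 2:8)
nypoplong$year <- as.double(nypoplong$year)
head(nypoplong)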
A tibble: 6 × 3
county year population
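Because the two datasets are joined on the county name, the names must match exactly. A minimal sketch, assuming the raw Geography values look like "Albany County, New York" (an assumption inferred from the patterns used above): anchoring a single pattern to the end of the string with `$` removes the whole suffix in one step and cannot touch the "New York" at the start of "New York County".

```r
# Hypothetical raw Geography values.
geo <- c("Albany County, New York", "New York County, New York",
         "St. Lawrence County, New York")

# Remove the " County, New York" suffix only where it ends the string.
county <- sub(" County, New York$", "", geo)
print(county)
```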
Clean the county variable in nypoplong12 so that it matches the county variable in counties12, by cutting off the ", New York" portion of each county name.
nypoplong12 <- nypoplong |>
  filter(year == 2012) |>
  arrange(desc(population)) |>
  head(10)
nypoplong12$county <- gsub(" , New York", "", nypoplong12$county)
nypoplong12
A tibble: 10 × 3
county year population
Recall the total hate crime counts for 2010-2016 (anti-black, anti-jewish, and anti-gaymale combined):
Kings 713
New York 459
Suffolk 360
Nassau 298
Queens 235
Filter the hate crimes for 2012 as well
counties12 <- counties |>
  filter(year == 2012) |>
  arrange(desc(sum))
counties12
A tibble: 41 × 3
Groups: year [1]
year county sum
A tibble: 41 × 4
Groups: year [1]
year county sum population
A tibble: 41 × 5
Groups: year [1]
year county sum population rate
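The code that joins counties12 with the population data and computes rate is not shown above. The sketch below is one hypothetical way to do it in base R, assuming the rate is incidents per 100,000 residents (the original formula is not shown) and using made-up counts and populations. merge() with all.x = TRUE behaves like dplyr's left_join(counties12, nypoplong12, by = c("county", "year")).

```r
# Made-up 2012 counts and populations, in the shape of counties12 and nypoplong12.
crimes12 <- data.frame(year = 2012,
                       county = c("County A", "County B", "County C"),
                       sum = c(50, 20, 5),
                       stringsAsFactors = FALSE)
pop12 <- data.frame(county = c("County A", "County B", "County C"),
                    year = 2012,
                    population = c(2500000, 400000, 50000),
                    stringsAsFactors = FALSE)

# Join on both keys so each county-year count lines up with its population;
# all.x = TRUE keeps every county from crimes12 (population is NA if unmatched).
joined <- merge(crimes12, pop12, by = c("county", "year"), all.x = TRUE)

# Hypothetical rate definition: incidents per 100,000 residents.
joined$rate <- joined$sum / joined$population * 100000
print(joined[order(-joined$rate), c("county", "rate")])
```

With these invented numbers the smallest county has the fewest incidents but the highest rate, which is the kind of difference a per-capita view can reveal.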
dt <- datajoinrate[, c("county", "rate")]
dt
A tibble: 41 × 2
county rate
Follow Up
Aggregating some of the categories
aggregategroups <- hatecrimes |>
  pivot_longer(
    cols = 4:44,
    names_to = "victim_cat",
    values_to = "crimecount"
  )
unique(aggregategroups$victim_cat)
[1] "anti-male"
[2] "anti-female"
[3] "anti-transgender"
[4] "anti-genderidentityexpression"
[5] "anti-age*"
[6] "anti-white"
[7] "anti-black"
[8] "anti-americanindian/alaskannative"
[9] "anti-asian"
[10] "anti-nativehawaiian/pacificislander"
[11] "anti-multi-racialgroups"
[12] "anti-otherrace"
[13] "anti-jewish"
[14] "anti-catholic"
[15] "anti-protestant"
[16] "anti-islamic(muslim)"
[17] "anti-multi-religiousgroups"
[18] "anti-atheism/agnosticism"
[19] "anti-religiouspracticegenerally"
[20] "anti-otherreligion"
[21] "anti-buddhist"
[22] "anti-easternorthodox(greek,russian,etc.)"
[23] "anti-hindu"
[24] "anti-jehovahswitness"
[25] "anti-mormon"
[26] "anti-otherchristian"
[27] "anti-sikh"
[28] "anti-hispanic"
[29] "anti-arab"
[30] "anti-otherethnicity/nationalorigin"
[31] "anti-non-hispanic*"
[32] "anti-gaymale"
[33] "anti-gayfemale"
[34] "anti-gay(maleandfemale)"
[35] "anti-heterosexual"
[36] "anti-bisexual"
[37] "anti-physicaldisability"
[38] "anti-mentaldisability"
[39] "totalincidents"
[40] "totalvictims"
[41] "totaloffenders"
# Categories not listed below (anti-age*, anti-heterosexual, and the three
# total columns) fall through to "others".
aggregategroups <- aggregategroups |>
  mutate(group = case_when(
    victim_cat %in% c("anti-transgender", "anti-gayfemale", "anti-genderidentityexpression",
                      "anti-gaymale", "anti-gay(maleandfemale)", "anti-bisexual") ~ "anti-lgbtq",
    victim_cat %in% c("anti-jewish", "anti-protestant", "anti-multi-religiousgroups",
                      "anti-religiouspracticegenerally", "anti-buddhist", "anti-hindu",
                      "anti-mormon", "anti-sikh", "anti-catholic", "anti-islamic(muslim)",
                      "anti-atheism/agnosticism", "anti-otherreligion",
                      "anti-easternorthodox(greek,russian,etc.)", "anti-jehovahswitness",
                      "anti-otherchristian") ~ "anti-religion",
    victim_cat %in% c("anti-black", "anti-multi-racialgroups", "anti-asian", "anti-arab",
                      "anti-non-hispanic*", "anti-white", "anti-americanindian/alaskannative",
                      "anti-nativehawaiian/pacificislander", "anti-otherrace", "anti-hispanic",
                      "anti-otherethnicity/nationalorigin") ~ "anti-ethnicity",
    victim_cat %in% c("anti-physicaldisability", "anti-mentaldisability") ~ "anti-disability",
    victim_cat %in% c("anti-female", "anti-male") ~ "anti-gender",
    TRUE ~ "others"))
aggregategroups
A tibble: 17,343 × 6
county year crimetype victim_cat crimecount group
A tibble: 1,692 × 5
county year crimetype victim_cat crimecount
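The case_when() above assigns each detailed category to a broad group, but the group totals are not computed. A base-R sketch of that last step is below, using a hypothetical lookup vector with only a few categories and made-up counts. The setdiff() check lists lookup names that never appear in the data, which is a quick way to catch a misspelled category name. On the real data, the totalincidents, totalvictims and totaloffenders columns land in "others", so it is probably worth dropping them before summing.

```r
# Hypothetical lookup from detailed category to broad group (only a few shown).
lookup <- c(
  "anti-gaymale"   = "anti-lgbtq",
  "anti-gayfemale" = "anti-lgbtq",
  "anti-jewish"    = "anti-religion",
  "anti-catholic"  = "anti-religion",
  "anti-black"     = "anti-ethnicity"
)

# Made-up long data in the shape of `aggregategroups`.
long_toy <- data.frame(
  victim_cat = c("anti-gaymale", "anti-gayfemale", "anti-jewish",
                 "anti-catholic", "anti-black", "totalincidents"),
  crimecount = c(4, 1, 7, 2, 3, 17),
  stringsAsFactors = FALSE
)

# Lookup names that never occur in the data (likely typos); empty here.
unmatched <- setdiff(names(lookup), long_toy$victim_cat)
print(unmatched)

# Categories missing from the lookup get NA; label them "others",
# mirroring the TRUE ~ "others" branch of case_when().
long_toy$group <- unname(lookup[long_toy$victim_cat])
long_toy$group[is.na(long_toy$group)] <- "others"

# Sum the counts within each broad group.
group_totals <- aggregate(crimecount ~ group, data = long_toy, FUN = sum)
print(group_totals)
```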
Write about the positive and negative aspects of this hate crimes data set
This New York hate crime dataset helps increase public awareness of how often hate crimes occur. It clarifies which groups are targeted by bias-motivated incidents. The dataset can also give researchers valuable information for studying the root causes of hate crimes. On the other hand, not all incidents may be reported, which may make the data inaccurate or skewed.
List 2 different paths you would like to (hypothetically) study about this data set.
One path I’d like to study is the demographics of the perpetrators when a crime is reported. This would help show whether certain demographic groups are targeting others. Another path I would like to study is the arrest rate for reported crimes.
Describe 2 things you would do to follow up after seeing the output from the hate crimes tutorial
After seeing this hate crime dataset, I would do my own research to verify the accuracy of some of the figures. Another thing I would do is reach out to officials and law enforcement to try to improve the well-being of the targeted groups and increase the number of arrests of perpetrators.