Essay/Student Response on the hateCrimes2010 data set
The hateCrimes2010 data set does a good job of describing hate crimes in each individual county, but it has its own set of problems. For example, the title “hateCrimes2010” can be misleading because it suggests that the data set only covers hate crimes in 2010. The data set actually covers several years of the 2010s, so a more accurate title would be “hateCrimes2010s”, with an “s” added at the end to signal that it covers the 2010s rather than only the year 2010. Another issue is that the data set does not specify which state the counties belong to. I am assuming that the counties are in the state of New York. The title could be changed once again to “NewYorkhateCrimes2010s” to make this clear to new users.
If I were to study this data set further, I would create a map visualization of hate crimes in New York and investigate temporal trends. I would be interested in using GIS to create a map showing the density of all hate crimes or of individual hate crime types. This would help officials see which areas of New York have the highest rates of hate crimes. I am also interested in using the data set to measure how prevalent hate crimes are in each county in a given year. If I were given the date of each hate crime and could integrate those dates into the data set, I could analyze whether the time of year is correlated with the volume of hate crimes committed against certain groups.
After viewing and tweaking some of the code, I would like to further investigate the statistics of subsets of the data frame. This would mean adapting the last chunk of code so that, instead of LGBTQ, it covers religion, ability, and race. I would like to investigate the volume of each subset over the years. I am also interested in creating a map visualization of the volume of each hate crime subset over time, such as an animated map of New York showing the density of each hate crime type from 2010 to 2016.
I would also want to investigate which counties have the highest rates of hate crimes, and whether any factors correlate with the volume of hate crimes. These factors could include socioeconomic status, age, diversity, etc. This may give insight into whether certain types of hate crimes are correlated with these factors.
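The subset comparison described above (religion, ability, and race by year) might be sketched as follows. This is only an illustration under stated assumptions: the toy values are invented, `hatecrimes_toy` and `subset_lookup` are hypothetical names, and only the `anti-...` column names and the `victim_cat` / `crimecount` names are taken from the rest of this document.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the cleaned hate-crime table. The values are made up;
# only the column names follow the summary output in this document.
hatecrimes_toy <- tibble(
  county = c("Kings", "Kings", "Suffolk", "Suffolk"),
  year = c(2010, 2011, 2010, 2011),
  `anti-black` = c(4, 6, 2, 1),
  `anti-white` = c(1, 0, 0, 2),
  `anti-jewish` = c(10, 12, 3, 5),
  `anti-catholic` = c(0, 1, 0, 0),
  `anti-physicaldisability` = c(0, 0, 1, 0),
  `anti-mentaldisability` = c(0, 1, 0, 0)
)

# Hypothetical lookup that assigns each column to a broader subset.
subset_lookup <- tibble(
  victim_cat = c("anti-black", "anti-white", "anti-jewish", "anti-catholic",
                 "anti-physicaldisability", "anti-mentaldisability"),
  subset = c("race", "race", "religion", "religion",
             "disability", "disability")
)

# Reshape to one row per county, year, and category, then total by subset.
subset_by_year <- hatecrimes_toy |>
  pivot_longer(cols = starts_with("anti-"),
               names_to = "victim_cat", values_to = "crimecount") |>
  left_join(subset_lookup, by = "victim_cat") |>
  group_by(year, subset) |>
  summarise(crimecount = sum(crimecount), .groups = "drop")

subset_by_year
```

The resulting table has one row per year and subset, which is the shape a grouped bar chart or an animated map layer would need.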
Let’s open up the dataset
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 423 Columns: 44
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): County, Crime Type
dbl (42): Year, Anti-Male, Anti-Female, Anti-Transgender, Anti-Gender Identi...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 62 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Geography, 2016
dbl (6): 2010, 2011, 2012, 2013, 2014, 2015
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Let’s make a five number summary (plus the mean) for each variable
summary(hatecrimes)
county year crimetype anti-male
Length:423 Min. :2010 Length:423 Min. :0.000000
Class :character 1st Qu.:2011 Class :character 1st Qu.:0.000000
Mode :character Median :2013 Mode :character Median :0.000000
Mean :2013 Mean :0.007092
3rd Qu.:2015 3rd Qu.:0.000000
Max. :2016 Max. :1.000000
anti-female anti-transgender anti-genderidentityexpression
Min. :0.00000 Min. :0.00000 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.00000 Median :0.00000 Median :0.00000
Mean :0.01655 Mean :0.04728 Mean :0.05674
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.00000 Max. :5.00000 Max. :3.00000
anti-age* anti-white anti-black
Min. :0.00000 Min. : 0.0000 Min. : 0.000
1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 0.000
Median :0.00000 Median : 0.0000 Median : 1.000
Mean :0.05201 Mean : 0.3357 Mean : 1.761
3rd Qu.:0.00000 3rd Qu.: 0.0000 3rd Qu.: 2.000
Max. :9.00000 Max. :11.0000 Max. :18.000
anti-americanindian/alaskannative anti-asian
Min. :0.000000 Min. :0.0000
1st Qu.:0.000000 1st Qu.:0.0000
Median :0.000000 Median :0.0000
Mean :0.007092 Mean :0.1773
3rd Qu.:0.000000 3rd Qu.:0.0000
Max. :1.000000 Max. :8.0000
anti-nativehawaiian/pacificislander anti-multi-racialgroups anti-otherrace
Min. :0 Min. :0.00000 Min. :0
1st Qu.:0 1st Qu.:0.00000 1st Qu.:0
Median :0 Median :0.00000 Median :0
Mean :0 Mean :0.08511 Mean :0
3rd Qu.:0 3rd Qu.:0.00000 3rd Qu.:0
Max. :0 Max. :3.00000 Max. :0
anti-jewish anti-catholic anti-protestant anti-islamic(muslim)
Min. : 0.000 Min. : 0.0000 Min. :0.00000 Min. : 0.0000
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.:0.00000 1st Qu.: 0.0000
Median : 0.000 Median : 0.0000 Median :0.00000 Median : 0.0000
Mean : 3.981 Mean : 0.2695 Mean :0.02364 Mean : 0.4704
3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.:0.00000 3rd Qu.: 0.0000
Max. :82.000 Max. :12.0000 Max. :1.00000 Max. :10.0000
anti-multi-religiousgroups anti-atheism/agnosticism
Min. : 0.00000 Min. :0
1st Qu.: 0.00000 1st Qu.:0
Median : 0.00000 Median :0
Mean : 0.07565 Mean :0
3rd Qu.: 0.00000 3rd Qu.:0
Max. :10.00000 Max. :0
anti-religiouspracticegenerally anti-otherreligion anti-buddhist
Min. :0.000000 Min. :0.000 Min. :0
1st Qu.:0.000000 1st Qu.:0.000 1st Qu.:0
Median :0.000000 Median :0.000 Median :0
Mean :0.007092 Mean :0.104 Mean :0
3rd Qu.:0.000000 3rd Qu.:0.000 3rd Qu.:0
Max. :2.000000 Max. :4.000 Max. :0
anti-easternorthodox(greek,russian,etc.) anti-hindu
Min. :0.000000 Min. :0.000000
1st Qu.:0.000000 1st Qu.:0.000000
Median :0.000000 Median :0.000000
Mean :0.002364 Mean :0.002364
3rd Qu.:0.000000 3rd Qu.:0.000000
Max. :1.000000 Max. :1.000000
anti-jehovahswitness anti-mormon anti-otherchristian anti-sikh
Min. :0 Min. :0 Min. :0.00000 Min. :0
1st Qu.:0 1st Qu.:0 1st Qu.:0.00000 1st Qu.:0
Median :0 Median :0 Median :0.00000 Median :0
Mean :0 Mean :0 Mean :0.01655 Mean :0
3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.00000 3rd Qu.:0
Max. :0 Max. :0 Max. :3.00000 Max. :0
anti-hispanic anti-arab anti-otherethnicity/nationalorigin
Min. : 0.0000 Min. :0.00000 Min. : 0.0000
1st Qu.: 0.0000 1st Qu.:0.00000 1st Qu.: 0.0000
Median : 0.0000 Median :0.00000 Median : 0.0000
Mean : 0.3735 Mean :0.06619 Mean : 0.2837
3rd Qu.: 0.0000 3rd Qu.:0.00000 3rd Qu.: 0.0000
Max. :17.0000 Max. :2.00000 Max. :19.0000
anti-non-hispanic* anti-gaymale anti-gayfemale anti-gay(maleandfemale)
Min. :0 Min. : 0.000 Min. :0.0000 Min. :0.0000
1st Qu.:0 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0 Median : 0.000 Median :0.0000 Median :0.0000
Mean :0 Mean : 1.499 Mean :0.2411 Mean :0.1017
3rd Qu.:0 3rd Qu.: 1.000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :0 Max. :36.000 Max. :8.0000 Max. :4.0000
anti-heterosexual anti-bisexual anti-physicaldisability
Min. :0.000000 Min. :0.000000 Min. :0.00000
1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000
Median :0.000000 Median :0.000000 Median :0.00000
Mean :0.002364 Mean :0.004728 Mean :0.01182
3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000
Max. :1.000000 Max. :1.000000 Max. :1.00000
anti-mentaldisability totalincidents totalvictims totaloffenders
Min. :0.000000 Min. : 1.00 Min. : 1.00 Min. : 1.00
1st Qu.:0.000000 1st Qu.: 1.00 1st Qu.: 1.00 1st Qu.: 1.00
Median :0.000000 Median : 3.00 Median : 3.00 Median : 3.00
Mean :0.009456 Mean : 10.09 Mean : 10.48 Mean : 11.77
3rd Qu.:0.000000 3rd Qu.: 10.00 3rd Qu.: 10.00 3rd Qu.: 11.00
Max. :1.000000 Max. :101.00 Max. :106.00 Max. :113.00
plot2 <- hatenew |>
  ggplot() +
  geom_bar(aes(x = year, y = crimecount, fill = victim_cat),
           position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2010-2016",
       caption = "Source: NY State Division of Criminal Justice Services")
plot2
plot3 <- hatenew |>
  ggplot() +
  geom_bar(aes(x = county, y = crimecount, fill = victim_cat),
           position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2010-2016",
       caption = "Source: NY State Division of Criminal Justice Services")
plot3
# A tibble: 5 × 2
county sum
<chr> <dbl>
1 Kings 570
2 Suffolk 317
3 Nassau 283
4 New York 266
5 Queens 175
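The table above ranks counties by their total count. The code that produced it is not shown here, so the following is only a sketch of one way to obtain such a ranking, assuming a long table with `county` and `crimecount` columns like the `hatenew` object used in the plots. The name `hatenew_toy` and its values are invented for illustration.

```r
library(dplyr)

# Toy long-format data; the counts are invented for illustration only.
hatenew_toy <- tibble(
  county     = c("Kings", "Kings", "Suffolk", "Nassau", "Albany", "Albany"),
  year       = c(2010, 2011, 2010, 2010, 2010, 2011),
  victim_cat = c("anti-jewish", "anti-black", "anti-jewish",
                 "anti-black", "anti-black", "anti-jewish"),
  crimecount = c(30, 12, 20, 15, 1, 2)
)

# Total the counts per county and keep the three largest totals.
top_counties <- hatenew_toy |>
  group_by(county) |>
  summarise(sum = sum(crimecount)) |>
  slice_max(sum, n = 3)

top_counties
```

With the real data, `n = 5` would give a five-row ranking like the one shown.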
plot4 <- hatenew |>
  filter(county %in% c("Kings", "New York", "Suffolk", "Nassau", "Queens")) |>
  ggplot() +
  geom_bar(aes(x = county, y = crimecount, fill = victim_cat),
           position = "dodge", stat = "identity") +
  labs(y = "Number of Hate Crime Incidents",
       title = "5 Counties in NY with Highest Incidents of Hate Crimes",
       subtitle = "Between 2010-2016",
       fill = "Hate Crime Type",
       caption = "Source: NY State Division of Criminal Justice Services")
plot4
# A tibble: 6 × 4
county Year Population year
<chr> <chr> <dbl> <dbl>
1 Albany , New York 2010 304078 2010
2 Allegany , New York 2010 48949 2010
3 Bronx , New York 2010 1388240 2010
4 Broome , New York 2010 200469 2010
5 Cattaraugus , New York 2010 80249 2010
6 Cayuga , New York 2010 79844 2010
nypoplong12 <- nypoplong |>
  filter(Year == 2012) |>
  arrange(desc(Population)) |>
  head(10)
# Strip the " , New York" suffix so county names match the hate crime table
nypoplong12$county <- gsub(" , New York", "", nypoplong12$county)
nypoplong12
# A tibble: 10 × 4
county Year Population year
<chr> <chr> <dbl> <dbl>
1 Kings 2012 2572282 2012
2 Queens 2012 2278024 2012
3 New York 2012 1625121 2012
4 Suffolk 2012 1499382 2012
5 Bronx 2012 1414774 2012
6 Nassau 2012 1350748 2012
7 Westchester 2012 961073 2012
8 Erie 2012 920792 2012
9 Monroe 2012 748947 2012
10 Richmond 2012 470978 2012
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `rate = sum_clean/population_clean * 1e+05`.
Caused by warning in `sum_clean / population_clean`:
! longer object length is not a multiple of shorter object length
datajoinrate
# A tibble: 41 × 6
year county sum Year Population rate
<dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 2012 Suffolk 80 2012 1499382 5.34
2 2012 Kings 124 2012 2572282 4.82
3 2012 Nassau 48 2012 1350748 3.55
4 2012 Richmond 16 2012 470978 3.40
5 2012 New York 54 2012 1625121 3.32
6 2012 Erie 20 2012 920792 2.17
7 2012 Queens 38 2012 2278024 1.67
8 2012 Multiple 14 <NA> NA 1.46
9 2012 Bronx 19 2012 1414774 1.34
10 2012 Dutchess 9 <NA> NA 1.20
# ℹ 31 more rows
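The warning above says that the numerator and denominator of the rate had different lengths, so R recycled the shorter vector. In the printed table, rows with a missing Population (for example Multiple and Dutchess) still show a rate, which is consistent with population values being recycled from other counties. A minimal sketch of one way to avoid this is to compute the rate inside the joined table, so each count is divided by the population on its own row and unmatched counties stay `NA`. The names `counts`, `pop`, and `rates` are hypothetical; the numbers for Suffolk, Kings, and Multiple are copied from the tables in this document.

```r
library(dplyr)

# Counts for 2012, copied from the joined table above.
counts <- tibble(
  year = 2012,
  county = c("Suffolk", "Kings", "Multiple"),
  sum = c(80, 124, 14)
)

# Population rows in the raw naming style ("<county> , New York").
pop <- tibble(
  county = c("Kings , New York", "Suffolk , New York"),
  year = 2012,
  Population = c(2572282, 1499382)
)

rates <- counts |>
  left_join(
    # Clean the county names before joining so the keys match.
    pop |> mutate(county = gsub(" , New York", "", county, fixed = TRUE)),
    by = c("county", "year")
  ) |>
  # Row-wise division: a missing population gives NA, not a recycled value.
  mutate(rate = sum / Population * 1e5) |>
  arrange(desc(rate))

rates
```

Joining against the full population table, rather than only the ten largest counties, would also leave fewer counties without a rate.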
dt <- datajoinrate[, c("county", "rate")]
dt
# A tibble: 41 × 2
county rate
<chr> <dbl>
1 Suffolk 5.34
2 Kings 4.82
3 Nassau 3.55
4 Richmond 3.40
5 New York 3.32
6 Erie 2.17
7 Queens 1.67
8 Multiple 1.46
9 Bronx 1.34
10 Dutchess 1.20
# ℹ 31 more rows