hate crimes

Author

Sata

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
 setwd("C:/Users/satad/Desktop/data110")
  hatecrimes2010 <- read_csv("hateCrimes2010.csv")
Rows: 423 Columns: 44
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, Crime Type
dbl (42): Year, Anti-Male, Anti-Female, Anti-Transgender, Anti-Gender Identi...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
 head(hatecrimes2010)
# A tibble: 6 × 44
  County    Year `Crime Type`       `Anti-Male` `Anti-Female` `Anti-Transgender`
  <chr>    <dbl> <chr>                    <dbl>         <dbl>              <dbl>
1 Albany    2016 Crimes Against Pe…           0             0                  0
2 Albany    2016 Property Crimes              0             0                  0
3 Allegany  2016 Property Crimes              0             0                  0
4 Bronx     2016 Crimes Against Pe…           0             0                  4
5 Bronx     2016 Property Crimes              0             0                  0
6 Broome    2016 Crimes Against Pe…           0             0                  0
# ℹ 38 more variables: `Anti-Gender Identity Expression` <dbl>,
#   `Anti-Age*` <dbl>, `Anti-White` <dbl>, `Anti-Black` <dbl>,
#   `Anti-American Indian/Alaskan Native` <dbl>, `Anti-Asian` <dbl>,
#   `Anti-Native Hawaiian/Pacific Islander` <dbl>,
#   `Anti-Multi-Racial Groups` <dbl>, `Anti-Other Race` <dbl>,
#   `Anti-Jewish` <dbl>, `Anti-Catholic` <dbl>, `Anti-Protestant` <dbl>,
#   `Anti-Islamic (Muslim)` <dbl>, `Anti-Multi-Religious Groups` <dbl>, …

#load tinytex (install it beforehand if needed)

library(tinytex)

#make all column names lowercase and replace spaces with underscores

names(hatecrimes2010) <- tolower(names(hatecrimes2010))
names(hatecrimes2010) <- gsub(" ", "_", names(hatecrimes2010))
head(hatecrimes2010)
# A tibble: 6 × 44
  county    year crime_type         `anti-male` `anti-female` `anti-transgender`
  <chr>    <dbl> <chr>                    <dbl>         <dbl>              <dbl>
1 Albany    2016 Crimes Against Pe…           0             0                  0
2 Albany    2016 Property Crimes              0             0                  0
3 Allegany  2016 Property Crimes              0             0                  0
4 Bronx     2016 Crimes Against Pe…           0             0                  4
5 Bronx     2016 Property Crimes              0             0                  0
6 Broome    2016 Crimes Against Pe…           0             0                  0
# ℹ 38 more variables: `anti-gender_identity_expression` <dbl>,
#   `anti-age*` <dbl>, `anti-white` <dbl>, `anti-black` <dbl>,
#   `anti-american_indian/alaskan_native` <dbl>, `anti-asian` <dbl>,
#   `anti-native_hawaiian/pacific_islander` <dbl>,
#   `anti-multi-racial_groups` <dbl>, `anti-other_race` <dbl>,
#   `anti-jewish` <dbl>, `anti-catholic` <dbl>, `anti-protestant` <dbl>,
#   `anti-islamic_(muslim)` <dbl>, `anti-multi-religious_groups` <dbl>, …
summary(hatecrimes2010)
    county               year       crime_type          anti-male       
 Length:423         Min.   :2010   Length:423         Min.   :0.000000  
 Class :character   1st Qu.:2011   Class :character   1st Qu.:0.000000  
 Mode  :character   Median :2013   Mode  :character   Median :0.000000  
                    Mean   :2013                      Mean   :0.007092  
                    3rd Qu.:2015                      3rd Qu.:0.000000  
                    Max.   :2016                      Max.   :1.000000  
  anti-female      anti-transgender  anti-gender_identity_expression
 Min.   :0.00000   Min.   :0.00000   Min.   :0.00000                
 1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000                
 Median :0.00000   Median :0.00000   Median :0.00000                
 Mean   :0.01655   Mean   :0.04728   Mean   :0.05674                
 3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000                
 Max.   :1.00000   Max.   :5.00000   Max.   :3.00000                
   anti-age*         anti-white        anti-black    
 Min.   :0.00000   Min.   : 0.0000   Min.   : 0.000  
 1st Qu.:0.00000   1st Qu.: 0.0000   1st Qu.: 0.000  
 Median :0.00000   Median : 0.0000   Median : 1.000  
 Mean   :0.05201   Mean   : 0.3357   Mean   : 1.761  
 3rd Qu.:0.00000   3rd Qu.: 0.0000   3rd Qu.: 2.000  
 Max.   :9.00000   Max.   :11.0000   Max.   :18.000  
 anti-american_indian/alaskan_native   anti-asian    
 Min.   :0.000000                    Min.   :0.0000  
 1st Qu.:0.000000                    1st Qu.:0.0000  
 Median :0.000000                    Median :0.0000  
 Mean   :0.007092                    Mean   :0.1773  
 3rd Qu.:0.000000                    3rd Qu.:0.0000  
 Max.   :1.000000                    Max.   :8.0000  
 anti-native_hawaiian/pacific_islander anti-multi-racial_groups anti-other_race
 Min.   :0                             Min.   :0.00000          Min.   :0      
 1st Qu.:0                             1st Qu.:0.00000          1st Qu.:0      
 Median :0                             Median :0.00000          Median :0      
 Mean   :0                             Mean   :0.08511          Mean   :0      
 3rd Qu.:0                             3rd Qu.:0.00000          3rd Qu.:0      
 Max.   :0                             Max.   :3.00000          Max.   :0      
  anti-jewish     anti-catholic     anti-protestant   anti-islamic_(muslim)
 Min.   : 0.000   Min.   : 0.0000   Min.   :0.00000   Min.   : 0.0000      
 1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.:0.00000   1st Qu.: 0.0000      
 Median : 0.000   Median : 0.0000   Median :0.00000   Median : 0.0000      
 Mean   : 3.981   Mean   : 0.2695   Mean   :0.02364   Mean   : 0.4704      
 3rd Qu.: 3.000   3rd Qu.: 0.0000   3rd Qu.:0.00000   3rd Qu.: 0.0000      
 Max.   :82.000   Max.   :12.0000   Max.   :1.00000   Max.   :10.0000      
 anti-multi-religious_groups anti-atheism/agnosticism
 Min.   : 0.00000            Min.   :0               
 1st Qu.: 0.00000            1st Qu.:0               
 Median : 0.00000            Median :0               
 Mean   : 0.07565            Mean   :0               
 3rd Qu.: 0.00000            3rd Qu.:0               
 Max.   :10.00000            Max.   :0               
 anti-religious_practice_generally anti-other_religion anti-buddhist
 Min.   :0.000000                  Min.   :0.000       Min.   :0    
 1st Qu.:0.000000                  1st Qu.:0.000       1st Qu.:0    
 Median :0.000000                  Median :0.000       Median :0    
 Mean   :0.007092                  Mean   :0.104       Mean   :0    
 3rd Qu.:0.000000                  3rd Qu.:0.000       3rd Qu.:0    
 Max.   :2.000000                  Max.   :4.000       Max.   :0    
 anti-eastern_orthodox_(greek,_russian,_etc.)   anti-hindu      
 Min.   :0.000000                             Min.   :0.000000  
 1st Qu.:0.000000                             1st Qu.:0.000000  
 Median :0.000000                             Median :0.000000  
 Mean   :0.002364                             Mean   :0.002364  
 3rd Qu.:0.000000                             3rd Qu.:0.000000  
 Max.   :1.000000                             Max.   :1.000000  
 anti-jehovahs_witness  anti-mormon anti-other_christian   anti-sikh
 Min.   :0             Min.   :0    Min.   :0.00000      Min.   :0  
 1st Qu.:0             1st Qu.:0    1st Qu.:0.00000      1st Qu.:0  
 Median :0             Median :0    Median :0.00000      Median :0  
 Mean   :0             Mean   :0    Mean   :0.01655      Mean   :0  
 3rd Qu.:0             3rd Qu.:0    3rd Qu.:0.00000      3rd Qu.:0  
 Max.   :0             Max.   :0    Max.   :3.00000      Max.   :0  
 anti-hispanic       anti-arab       anti-other_ethnicity/national_origin
 Min.   : 0.0000   Min.   :0.00000   Min.   : 0.0000                     
 1st Qu.: 0.0000   1st Qu.:0.00000   1st Qu.: 0.0000                     
 Median : 0.0000   Median :0.00000   Median : 0.0000                     
 Mean   : 0.3735   Mean   :0.06619   Mean   : 0.2837                     
 3rd Qu.: 0.0000   3rd Qu.:0.00000   3rd Qu.: 0.0000                     
 Max.   :17.0000   Max.   :2.00000   Max.   :19.0000                     
 anti-non-hispanic* anti-gay_male    anti-gay_female 
 Min.   :0          Min.   : 0.000   Min.   :0.0000  
 1st Qu.:0          1st Qu.: 0.000   1st Qu.:0.0000  
 Median :0          Median : 0.000   Median :0.0000  
 Mean   :0          Mean   : 1.499   Mean   :0.2411  
 3rd Qu.:0          3rd Qu.: 1.000   3rd Qu.:0.0000  
 Max.   :0          Max.   :36.000   Max.   :8.0000  
 anti-gay_(male_and_female) anti-heterosexual  anti-bisexual     
 Min.   :0.0000             Min.   :0.000000   Min.   :0.000000  
 1st Qu.:0.0000             1st Qu.:0.000000   1st Qu.:0.000000  
 Median :0.0000             Median :0.000000   Median :0.000000  
 Mean   :0.1017             Mean   :0.002364   Mean   :0.004728  
 3rd Qu.:0.0000             3rd Qu.:0.000000   3rd Qu.:0.000000  
 Max.   :4.0000             Max.   :1.000000   Max.   :1.000000  
 anti-physical_disability anti-mental_disability total_incidents 
 Min.   :0.00000          Min.   :0.000000       Min.   :  1.00  
 1st Qu.:0.00000          1st Qu.:0.000000       1st Qu.:  1.00  
 Median :0.00000          Median :0.000000       Median :  3.00  
 Mean   :0.01182          Mean   :0.009456       Mean   : 10.09  
 3rd Qu.:0.00000          3rd Qu.:0.000000       3rd Qu.: 10.00  
 Max.   :1.00000          Max.   :1.000000       Max.   :101.00  
 total_victims    total_offenders 
 Min.   :  1.00   Min.   :  1.00  
 1st Qu.:  1.00   1st Qu.:  1.00  
 Median :  3.00   Median :  3.00  
 Mean   : 10.48   Mean   : 11.77  
 3rd Qu.: 10.00   3rd Qu.: 11.00  
 Max.   :106.00   Max.   :113.00  
hatecrimes2 <- hatecrimes2010 |>
  select(county, year, 'anti-black', 'anti-transgender', 'anti-jewish',
         'anti-bisexual', 'anti-asian', 'anti-catholic', 'anti-female',
         'anti-male', 'anti-white')
head(hatecrimes2)
# A tibble: 6 × 11
  county    year `anti-black` `anti-transgender` `anti-jewish` `anti-bisexual`
  <chr>    <dbl>        <dbl>              <dbl>         <dbl>           <dbl>
1 Albany    2016            1                  0             0               0
2 Albany    2016            2                  0             0               0
3 Allegany  2016            1                  0             0               0
4 Bronx     2016            0                  4             0               0
5 Bronx     2016            0                  0             1               0
6 Broome    2016            1                  0             0               0
# ℹ 5 more variables: `anti-asian` <dbl>, `anti-catholic` <dbl>,
#   `anti-female` <dbl>, `anti-male` <dbl>, `anti-white` <dbl>

#let’s check the dimensions

dim(hatecrimes2)
[1] 423  11

There are currently 11 variables with 423 rows.

summary(hatecrimes2)
    county               year        anti-black     anti-transgender 
 Length:423         Min.   :2010   Min.   : 0.000   Min.   :0.00000  
 Class :character   1st Qu.:2011   1st Qu.: 0.000   1st Qu.:0.00000  
 Mode  :character   Median :2013   Median : 1.000   Median :0.00000  
                    Mean   :2013   Mean   : 1.761   Mean   :0.04728  
                    3rd Qu.:2015   3rd Qu.: 2.000   3rd Qu.:0.00000  
                    Max.   :2016   Max.   :18.000   Max.   :5.00000  
  anti-jewish     anti-bisexual        anti-asian     anti-catholic    
 Min.   : 0.000   Min.   :0.000000   Min.   :0.0000   Min.   : 0.0000  
 1st Qu.: 0.000   1st Qu.:0.000000   1st Qu.:0.0000   1st Qu.: 0.0000  
 Median : 0.000   Median :0.000000   Median :0.0000   Median : 0.0000  
 Mean   : 3.981   Mean   :0.004728   Mean   :0.1773   Mean   : 0.2695  
 3rd Qu.: 3.000   3rd Qu.:0.000000   3rd Qu.:0.0000   3rd Qu.: 0.0000  
 Max.   :82.000   Max.   :1.000000   Max.   :8.0000   Max.   :12.0000  
  anti-female        anti-male          anti-white     
 Min.   :0.00000   Min.   :0.000000   Min.   : 0.0000  
 1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.: 0.0000  
 Median :0.00000   Median :0.000000   Median : 0.0000  
 Mean   :0.01655   Mean   :0.007092   Mean   : 0.3357  
 3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.: 0.0000  
 Max.   :1.00000   Max.   :1.000000   Max.   :11.0000  

#convert wide format to long format

hatelong <- hatecrimes2 |>
  pivot_longer(
    cols = 3:11,
    names_to = "victim_cat",
    values_to = "crimecount")
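
The reshaping step above picks the count columns by position (3:11), which silently changes meaning if the column order ever changes. A minimal sketch of the same wide-to-long conversion that selects columns by name prefix instead; `toy_wide` and its values are made up for illustration and are not taken from the dataset.

```r
library(tibble)
library(tidyr)

# Toy data with the same shape as hatecrimes2 (values are invented)
toy_wide <- tibble(
  county = c("A", "B"),
  year = c(2015, 2016),
  `anti-black` = c(1, 2),
  `anti-jewish` = c(0, 3)
)

# Select the count columns by name prefix instead of by position
toy_long <- toy_wide |>
  pivot_longer(
    cols = starts_with("anti-"),
    names_to = "victim_cat",
    values_to = "crimecount"
  )

toy_long
```

Each input row becomes one row per selected column, so two rows with two count columns give four rows in long format.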

#create a faceted plot

hatecrimplot <- hatelong |>
  ggplot(aes(year, crimecount, color = victim_cat)) +
  geom_point() +
  facet_wrap(~victim_cat)
hatecrimplot

#let’s focus on ‘anti-jewish’, ‘anti-catholic’, and ‘anti-black’

hatenew <- hatelong |>
  filter( victim_cat %in% c("anti-black", "anti-jewish", "anti-catholic"))|>
  group_by(year, county) |>
  arrange(desc(crimecount))
hatenew
# A tibble: 1,269 × 4
# Groups:   year, county [277]
   county   year victim_cat  crimecount
   <chr>   <dbl> <chr>            <dbl>
 1 Kings    2012 anti-jewish         82
 2 Kings    2016 anti-jewish         51
 3 Suffolk  2014 anti-jewish         48
 4 Suffolk  2012 anti-jewish         48
 5 Kings    2011 anti-jewish         44
 6 Kings    2013 anti-jewish         41
 7 Kings    2010 anti-jewish         39
 8 Nassau   2011 anti-jewish         38
 9 Suffolk  2013 anti-jewish         37
10 Nassau   2016 anti-jewish         36
# ℹ 1,259 more rows

#let’s plot these 3 types of hate crimes together

plot2 <- hatenew |>
  ggplot() +
  geom_bar(aes(x=year, y=crimecount, fill = victim_cat),
      position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2010-2016",
       caption = "Source: NY State Division of Criminal Justice Services")
plot2

#let’s evaluate the counties

plot3 <- hatenew |>
  ggplot() +
  geom_bar(aes(x=county, y=crimecount, fill = victim_cat),
      position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2010-2016",
       caption = "Source: NY State Division of Criminal Justice Services")
plot3

#let’s reduce the number of counties

county <- hatenew |>
  group_by(year, county)|>
  summarize(sum = sum(crimecount)) |>
  arrange(desc(sum))
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
county
# A tibble: 277 × 3
# Groups:   year [7]
    year county    sum
   <dbl> <chr>   <dbl>
 1  2012 Kings     130
 2  2010 Kings      90
 3  2016 Kings      88
 4  2012 Suffolk    87
 5  2014 Kings      75
 6  2013 Kings      72
 7  2015 Kings      71
 8  2011 Kings      69
 9  2013 Suffolk    69
10  2014 Suffolk    66
# ℹ 267 more rows
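
The summary above prints a message because the result is still grouped by year. A small sketch, on invented data, of how the `.groups` argument mentioned in that message can be set so the result comes back ungrouped; `toy` and `toy_totals` are hypothetical names used only here.

```r
library(dplyr)

# Invented counts, two rows for county A in 2014 and one for county B in 2015
toy <- tibble(
  year = c(2014, 2014, 2015),
  county = c("A", "A", "B"),
  crimecount = c(2, 3, 4)
)

# .groups = "drop" returns an ungrouped tibble and silences the message
toy_totals <- toy |>
  group_by(year, county) |>
  summarize(sum = sum(crimecount), .groups = "drop") |>
  arrange(desc(sum))

toy_totals
```

Whether to drop the grouping is a judgment call: keeping the year grouping can be convenient for later per-year steps, while dropping it avoids surprises in later filters and joins.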

#let’s choose the top 5 counties

counties2 <- hatenew |>
  group_by(county)|>
  summarize(sum = sum(crimecount)) |>
  slice_max(order_by = sum, n=5)
counties2
# A tibble: 5 × 2
  county     sum
  <chr>    <dbl>
1 Kings      595
2 Suffolk    342
3 Nassau     289
4 New York   278
5 Queens     187

#let’s create a barplot for the top 5 counties

 plot4 <- hatenew |>
  filter(county %in% c("Kings", "Suffolk", "Nassau","New York", "Queens")) |>
  ggplot() +
  geom_bar(aes(x=county, y=crimecount, fill = victim_cat),
      position = "dodge", stat = "identity") +
  labs(y = "Number of Hate Crime Incidents",
       title = "5 Counties in NY with Highest Incidents of Hate Crimes",
       subtitle = "Between 2010-2016", 
       fill = "Hate Crime Type",
      caption = "Source: NY State Division of Criminal Justice Services")
plot4
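
The filter above lists the five county names by hand. One possible alternative, sketched on invented data, is to pull the names out of the summary so the filter follows the data; `toy_counts`, `top_counties`, and `toy_top` are hypothetical names.

```r
library(dplyr)

# Invented counts for four counties
toy_counts <- tibble(
  county = c("A", "A", "B", "C", "C", "D"),
  crimecount = c(5, 1, 2, 4, 4, 1)
)

# Total per county, keep the two largest, and extract the names as a vector
top_counties <- toy_counts |>
  group_by(county) |>
  summarize(sum = sum(crimecount)) |>
  slice_max(order_by = sum, n = 2) |>
  pull(county)

# Keep only the rows that belong to those counties
toy_top <- toy_counts |>
  filter(county %in% top_counties)

top_counties
```

With the real data, the same pattern with n = 5 should reproduce the hand-typed list, provided there are no ties at the cutoff.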

  setwd("C:/Users/satad/Desktop/data110")
nypop <- read_csv("newyorkpopulation.csv")
Rows: 62 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Geography
dbl (7): 2010, 2011, 2012, 2013, 2014, 2015, 2016

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Clean the “Geography” values and rename the column to “county” so that it matches the hate crimes dataset.

nypop$Geography <- gsub(" , New York", "", nypop$Geography)
nypop$Geography <- gsub("County", "", nypop$Geography)
nypoplong <- nypop |>
  rename(county = Geography) |>
  gather("year", "population", 2:8) 
nypoplong$year <- as.double(nypoplong$year)
head(nypoplong)
# A tibble: 6 × 3
  county                  year population
  <chr>                  <dbl>      <dbl>
1 Albany , New York       2010     304078
2 Allegany , New York     2010      48949
3 Bronx , New York        2010    1388240
4 Broome , New York       2010     200469
5 Cattaraugus , New York  2010      80249
6 Cayuga , New York       2010      79844
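
The county values above still end in " , New York", which suggests the first gsub() pattern did not match the raw labels. A minimal sketch of stripping the suffix in a single step, assuming the raw labels look like "Albany County, New York"; that format is an assumption and the real file may differ.

```r
# Assumed raw labels; the real values in newyorkpopulation.csv may differ
raw_geography <- c(
  "Albany County, New York",
  "New York County, New York",
  "Kings County, New York"
)

# Remove the trailing " County, New York" in one step, then trim stray whitespace
clean_county <- trimws(sub("\\s*County\\s*,\\s*New York$", "", raw_geography))

clean_county
```

Anchoring the pattern at the end of the string with `$` keeps "New York" intact as the name of New York County.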

#Let’s focus on 2014

nypoplong12 <- nypoplong |>
  filter(year == 2014) |>
  arrange(desc(population)) |>
  head(10)
nypoplong12$county<-gsub(" , New York","",nypoplong12$county)
nypoplong12
# A tibble: 10 × 3
   county       year population
   <chr>       <dbl>      <dbl>
 1 Kings        2014    2612544
 2 Queens       2014    2314149
 3 New York     2014    1634468
 4 Suffolk      2014    1500008
 5 Bronx        2014    1437687
 6 Nassau       2014    1357799
 7 Westchester  2014     970255
 8 Erie         2014     923702
 9 Monroe       2014     750089
10 Richmond     2014     473142

#According to this result, Kings was the most populous county in New York in 2014. Even with a lot of hate crimes, many people still live in that area.

#let’s filter the hate crimes to 2014 only.

county14 <- county |>
  filter(year == 2014) |>
  arrange(desc(sum)) 
county14
# A tibble: 36 × 3
# Groups:   year [1]
    year county     sum
   <dbl> <chr>    <dbl>
 1  2014 Kings       75
 2  2014 Suffolk     66
 3  2014 Nassau      36
 4  2014 New York    29
 5  2014 Queens      24
 6  2014 Richmond    24
 7  2014 Multiple    17
 8  2014 Erie        13
 9  2014 Bronx       12
10  2014 Orange       7
# ℹ 26 more rows

#let’s join the NY population and the hate crimes of 2014 together

datajoin <- county14 |>
  full_join(nypoplong12, by=c("county", "year"))
datajoin
# A tibble: 36 × 4
# Groups:   year [1]
    year county     sum population
   <dbl> <chr>    <dbl>      <dbl>
 1  2014 Kings       75    2612544
 2  2014 Suffolk     66    1500008
 3  2014 Nassau      36    1357799
 4  2014 New York    29    1634468
 5  2014 Queens      24    2314149
 6  2014 Richmond    24     473142
 7  2014 Multiple    17         NA
 8  2014 Erie        13     923702
 9  2014 Bronx       12    1437687
10  2014 Orange       7         NA
# ℹ 26 more rows
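
The NA values in the population column appear for rows with no match in nypoplong12, which holds only the ten most populous counties. A hedged sketch of how unmatched rows could be inspected; the three crime totals and two populations are copied from the outputs above, while the `toy_*` names are hypothetical.

```r
library(dplyr)

# A few rows copied from the 2014 outputs above
toy_crimes <- tibble(county = c("Kings", "Orange", "Multiple"), sum = c(75, 7, 17))
toy_pop <- tibble(county = c("Kings", "Queens"), population = c(2612544, 2314149))

# left_join keeps every row of toy_crimes; unmatched counties get NA population
toy_left <- toy_crimes |> left_join(toy_pop, by = "county")

# anti_join lists the rows of toy_crimes that found no match in toy_pop
toy_unmatched <- toy_crimes |> anti_join(toy_pop, by = "county")

toy_unmatched
```

Checking the unmatched rows before computing rates makes it clear which counties drop out of the comparison and why.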

#calculate the rate of incidents per 100,000 residents

datajoinrate <- datajoin |>
  mutate(rate = sum/population*100000) |>
  arrange(desc(rate))
datajoinrate
# A tibble: 36 × 5
# Groups:   year [1]
    year county        sum population  rate
   <dbl> <chr>       <dbl>      <dbl> <dbl>
 1  2014 Richmond       24     473142 5.07 
 2  2014 Suffolk        66    1500008 4.40 
 3  2014 Kings          75    2612544 2.87 
 4  2014 Nassau         36    1357799 2.65 
 5  2014 New York       29    1634468 1.77 
 6  2014 Erie           13     923702 1.41 
 7  2014 Queens         24    2314149 1.04 
 8  2014 Bronx          12    1437687 0.835
 9  2014 Westchester     7     970255 0.721
10  2014 Monroe          0     750089 0    
# ℹ 26 more rows
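
As a quick check of the rate formula, the arithmetic for two counties can be redone by hand. The incident totals and populations below are copied from the table above; the vector names are only for illustration.

```r
# Incident totals and 2014 populations taken from the joined table
sum_incidents <- c(Richmond = 24, Kings = 75)
population <- c(Richmond = 473142, Kings = 2612544)

# Incidents per 100,000 residents
rate <- sum_incidents / population * 100000

round(rate, 2)
```

Richmond has fewer incidents than Kings but a much smaller population, which is why its rate comes out higher.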

#let’s see the highest rate in 2014

dt <- datajoinrate[,c("county","rate")]
dt
# A tibble: 36 × 2
   county       rate
   <chr>       <dbl>
 1 Richmond    5.07 
 2 Suffolk     4.40 
 3 Kings       2.87 
 4 Nassau      2.65 
 5 New York    1.77 
 6 Erie        1.41 
 7 Queens      1.04 
 8 Bronx       0.835
 9 Westchester 0.721
10 Monroe      0    
# ℹ 26 more rows

#Even though Kings has more hate crimes in total, Richmond had the highest rate in 2014 among the counties with population data.

#essay

#This dataset is valuable because it brings together several years, giving us a chance to learn more. It helps us pinpoint the most populous counties in New York, which is useful information. As a data scientist, I can use this dataset to understand how hate crimes are affecting our community. However, like every dataset, it has some negative aspects. It covers a lot of counties, so I could not study them all and had to choose only some of them to work on. I decided to evaluate the most populous ones.

#Hypothetically, I would like to explore the counties that do not have a lot of crime, and also look at a single county to see its hate crime incidents. For example, it would be worthwhile to look at the hate crimes in Kings only, to see which types of hate crime occur there.

#After seeing the output from the hate crimes data, I want to know whether the 2023 data still look the same or whether there has been a change. I want to know if present-day New York is more dangerous than it was in 2010. Finally, the second thing I would like to follow up on is whether the percentage of anti-Jewish incidents went down between 2016 and 2023.