Now that we know there is possible bias in the dataset, what can we do with it?

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0     ✔ purrr   1.0.1
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.3     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

#tinytex::install_tinytex()
library(tinytex)

setwd("D:/data/files")
hatecrimes <- read_csv("hateCrimes2010.csv")

## Rows: 423 Columns: 44
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): County, Crime Type
## dbl (42): Year, Anti-Male, Anti-Female, Anti-Transgender, Anti-Gender Identi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Clean up the data

names(hatecrimes) <- tolower(names(hatecrimes))
names(hatecrimes) <- gsub(" ","",names(hatecrimes))
str(hatecrimes)

## spc_tbl_ [423 × 44] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ county                                  : chr [1:423] "Albany" "Albany" "Allegany" "Bronx" ...
##  $ year                                    : num [1:423] 2016 2016 2016 2016 2016 ...
##  $ crimetype                               : chr [1:423] "Crimes Against Persons" "Property Crimes" "Property Crimes" "Crimes Against Persons" ...
##  $ anti-male                               : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-female                             : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-transgender                        : num [1:423] 0 0 0 4 0 0 0 0 0 0 ...
##  $ anti-genderidentityexpression           : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-age*                               : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-white                              : num [1:423] 0 0 0 1 1 0 0 0 0 0 ...
##  $ anti-black                              : num [1:423] 1 2 1 0 0 1 0 1 0 2 ...
##  $ anti-americanindian/alaskannative       : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-asian                              : num [1:423] 0 0 0 0 0 1 0 0 0 0 ...
##  $ anti-nativehawaiian/pacificislander     : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-multi-racialgroups                 : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-otherrace                          : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-jewish                             : num [1:423] 0 0 0 0 1 0 1 0 0 0 ...
##  $ anti-catholic                           : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-protestant                         : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-islamic(muslim)                    : num [1:423] 1 0 0 6 0 0 0 0 1 0 ...
##  $ anti-multi-religiousgroups              : num [1:423] 0 1 0 0 0 0 0 0 0 0 ...
##  $ anti-atheism/agnosticism                : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-religiouspracticegenerally         : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-otherreligion                      : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-buddhist                           : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-easternorthodox(greek,russian,etc.): num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-hindu                              : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-jehovahswitness                    : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-mormon                             : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-otherchristian                     : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-sikh                               : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-hispanic                           : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-arab                               : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-otherethnicity/nationalorigin      : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-non-hispanic*                      : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-gaymale                            : num [1:423] 1 0 0 8 0 1 0 0 0 0 ...
##  $ anti-gayfemale                          : num [1:423] 0 0 0 1 0 0 0 0 0 0 ...
##  $ anti-gay(maleandfemale)                 : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-heterosexual                       : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-bisexual                           : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-physicaldisability                 : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-mentaldisability                   : num [1:423] 0 0 0 0 0 0 0 0 0 0 ...
##  $ totalincidents                          : num [1:423] 3 3 1 20 2 3 1 1 1 2 ...
##  $ totalvictims                            : num [1:423] 4 3 1 20 2 3 1 1 1 2 ...
##  $ totaloffenders                          : num [1:423] 3 3 1 25 2 3 1 1 1 2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   County = col_character(),
##   ..   Year = col_double(),
##   ..   `Crime Type` = col_character(),
##   ..   `Anti-Male` = col_double(),
##   ..   `Anti-Female` = col_double(),
##   ..   `Anti-Transgender` = col_double(),
##   ..   `Anti-Gender Identity Expression` = col_double(),
##   ..   `Anti-Age*` = col_double(),
##   ..   `Anti-White` = col_double(),
##   ..   `Anti-Black` = col_double(),
##   ..   `Anti-American Indian/Alaskan Native` = col_double(),
##   ..   `Anti-Asian` = col_double(),
##   ..   `Anti-Native Hawaiian/Pacific Islander` = col_double(),
##   ..   `Anti-Multi-Racial Groups` = col_double(),
##   ..   `Anti-Other Race` = col_double(),
##   ..   `Anti-Jewish` = col_double(),
##   ..   `Anti-Catholic` = col_double(),
##   ..   `Anti-Protestant` = col_double(),
##   ..   `Anti-Islamic (Muslim)` = col_double(),
##   ..   `Anti-Multi-Religious Groups` = col_double(),
##   ..   `Anti-Atheism/Agnosticism` = col_double(),
##   ..   `Anti-Religious Practice Generally` = col_double(),
##   ..   `Anti-Other Religion` = col_double(),
##   ..   `Anti-Buddhist` = col_double(),
##   ..   `Anti-Eastern Orthodox (Greek, Russian, etc.)` = col_double(),
##   ..   `Anti-Hindu` = col_double(),
##   ..   `Anti-Jehovahs Witness` = col_double(),
##   ..   `Anti-Mormon` = col_double(),
##   ..   `Anti-Other Christian` = col_double(),
##   ..   `Anti-Sikh` = col_double(),
##   ..   `Anti-Hispanic` = col_double(),
##   ..   `Anti-Arab` = col_double(),
##   ..   `Anti-Other Ethnicity/National Origin` = col_double(),
##   ..   `Anti-Non-Hispanic*` = col_double(),
##   ..   `Anti-Gay Male` = col_double(),
##   ..   `Anti-Gay Female` = col_double(),
##   ..   `Anti-Gay (Male and Female)` = col_double(),
##   ..   `Anti-Heterosexual` = col_double(),
##   ..   `Anti-Bisexual` = col_double(),
##   ..   `Anti-Physical Disability` = col_double(),
##   ..   `Anti-Mental Disability` = col_double(),
##   ..   `Total Incidents` = col_double(),
##   ..   `Total Victims` = col_double(),
##   ..   `Total Offenders` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
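
The cleaned names still contain characters such as `-`, `*`, `/` and parentheses, so they are not syntactic R names and have to be wrapped in backticks or quotes when referenced. A minimal sketch on a made-up data frame (the `toy` object is hypothetical and not part of the dataset) repeats the two cleaning steps and shows how to reference the result:

```r
# Hypothetical toy data frame with names in the same style as the raw CSV
toy <- data.frame(`Anti-Gay Male` = c(1, 0),
                  `Crime Type` = c("Property Crimes", "Crimes Against Persons"),
                  check.names = FALSE)

# Same two steps as above: lowercase, then drop spaces
names(toy) <- tolower(names(toy))
names(toy) <- gsub(" ", "", names(toy))

names(toy)

# Non-syntactic names need backticks when used with `$`
toy$`anti-gaymale`
```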

Select only certain hate-crimes

hatecrimes2 <- hatecrimes %>% 
  select(county, year, 'anti-black', 'anti-white', 'anti-jewish', 'anti-catholic','anti-age*','anti-islamic(muslim)', 'anti-gaymale', 'anti-hispanic') %>%
  group_by(county, year)
head(hatecrimes2)

county <chr>	year <dbl>	anti-black <dbl>	anti-white <dbl>	anti-jewish <dbl>
Albany	2016	1	0	0
Albany	2016	2	0	0
Allegany	2016	1	0	0
Bronx	2016	0	1	0
Bronx	2016	0	1	1
Broome	2016	1	0	0

Check the dimensions and the summary to make sure there are no missing values

dim(hatecrimes2)

## [1] 423  10

# There are currently 10 variables with 423 rows.
summary(hatecrimes2)

##     county               year        anti-black       anti-white     
##  Length:423         Min.   :2010   Min.   : 0.000   Min.   : 0.0000  
##  Class :character   1st Qu.:2011   1st Qu.: 0.000   1st Qu.: 0.0000  
##  Mode  :character   Median :2013   Median : 1.000   Median : 0.0000  
##                     Mean   :2013   Mean   : 1.761   Mean   : 0.3357  
##                     3rd Qu.:2015   3rd Qu.: 2.000   3rd Qu.: 0.0000  
##                     Max.   :2016   Max.   :18.000   Max.   :11.0000  
##   anti-jewish     anti-catholic       anti-age*       anti-islamic(muslim)
##  Min.   : 0.000   Min.   : 0.0000   Min.   :0.00000   Min.   : 0.0000     
##  1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.:0.00000   1st Qu.: 0.0000     
##  Median : 0.000   Median : 0.0000   Median :0.00000   Median : 0.0000     
##  Mean   : 3.981   Mean   : 0.2695   Mean   :0.05201   Mean   : 0.4704     
##  3rd Qu.: 3.000   3rd Qu.: 0.0000   3rd Qu.:0.00000   3rd Qu.: 0.0000     
##  Max.   :82.000   Max.   :12.0000   Max.   :9.00000   Max.   :10.0000     
##   anti-gaymale    anti-hispanic    
##  Min.   : 0.000   Min.   : 0.0000  
##  1st Qu.: 0.000   1st Qu.: 0.0000  
##  Median : 0.000   Median : 0.0000  
##  Mean   : 1.499   Mean   : 0.3735  
##  3rd Qu.: 1.000   3rd Qu.: 0.0000  
##  Max.   :36.000   Max.   :17.0000
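
`summary()` only prints an `NA's` entry for columns that actually contain missing values, so their absence is easy to overlook. A more explicit check counts the missing values per column. The sketch below uses a made-up data frame (`toy` is hypothetical, with one `NA` injected on purpose) to show what a hit looks like:

```r
# Hypothetical toy data; the NA is injected deliberately
toy <- data.frame(county = c("Albany", "Bronx", "Kings"),
                  `anti-black` = c(1, NA, 3),
                  check.names = FALSE)

# Number of missing values per column
na_counts <- colSums(is.na(toy))
na_counts

# TRUE only if the whole table is free of missing values
!anyNA(toy)
```

The same two calls can be applied to `hatecrimes2` directly.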

Use facet_wrap

hatecrimeslong <- hatecrimes2 %>% 
  tidyr::gather("id", "crimecount", 3:10) 

hatecrimesplot <-hatecrimeslong %>% 
  ggplot(., aes(year, crimecount))+
  geom_point()+
  aes(color = id)+
  facet_wrap(~id)
hatecrimesplot
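
`gather()` is superseded in tidyr, and `pivot_longer()` is the recommended replacement for new code. As a rough sketch of the same reshaping step, here it is on a made-up wide table (`toy` and its values are hypothetical):

```r
library(tidyr)

# Hypothetical toy data in the same wide layout as hatecrimes2
toy <- data.frame(county = c("Albany", "Bronx"),
                  year = c(2016, 2016),
                  `anti-black` = c(1, 0),
                  `anti-white` = c(0, 1),
                  check.names = FALSE)

# One row per county, year and bias type
toylong <- pivot_longer(toy,
                        cols = c(`anti-black`, `anti-white`),
                        names_to = "id",
                        values_to = "crimecount")
toylong
```

The rows come out in a different order than with `gather()`, but the content is the same.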

Look deeper into hate crimes against Black people, gay men, and Jewish people

hatenew <- hatecrimeslong %>%
  filter( id== "anti-black" | id == "anti-jewish" | id == "anti-gaymale")%>%
  group_by(year, county) %>%
  arrange(desc(crimecount))
hatenew

county <chr>	year <dbl>	id <chr>	crimecount <dbl>
Kings	2012	anti-jewish	82
Kings	2016	anti-jewish	51
Suffolk	2014	anti-jewish	48
Suffolk	2012	anti-jewish	48
Kings	2011	anti-jewish	44
Kings	2013	anti-jewish	41
Kings	2010	anti-jewish	39
Nassau	2011	anti-jewish	38
Suffolk	2013	anti-jewish	37
Nassau	2016	anti-jewish	36

Plot these three types of hate crimes together

plot2 <- hatenew %>%
  ggplot() +
  geom_bar(aes(x=year, y=crimecount, fill = id),
      position = "dodge", stat = "identity") +
  ggtitle("Hate Crime Type in NY Counties Between 2010-2016") +
  ylab("Number of Hate Crime Incidents") + 
  labs(fill = "Hate Crime Type")
plot2

What about the counties?

plot3 <- hatenew %>%
  ggplot() +
  geom_bar(aes(x=county, y=crimecount, fill = id),
      position = "dodge", stat = "identity") +
  ggtitle("Hate Crime Type in NY Counties Between 2010-2016") +
  ylab("Number of Hate Crime Incidents") + 
  labs(fill = "Hate Crime Type")
plot3

So many counties

counties <- hatenew %>%
  group_by(county, year)%>%
  summarize(sum = sum(crimecount)) %>%
  arrange(desc(sum))

## `summarise()` has grouped output by 'county'. You can override using the
## `.groups` argument.

counties

county <chr>	year <dbl>	sum <dbl>
Kings	2012	136
Kings	2010	110
Kings	2016	101
Kings	2013	96
Kings	2014	94
Kings	2015	90
Kings	2011	86
New York	2016	86
Suffolk	2012	83
New York	2013	75
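
The message printed by `summarise()` appears because only the last grouping level is dropped, so the result is still grouped by `county`. A minimal sketch, on made-up rows (`toy` is hypothetical), of returning an ungrouped table by setting `.groups`:

```r
library(dplyr)

# Hypothetical toy data in the long layout used above
toy <- data.frame(county = c("Kings", "Kings", "Suffolk"),
                  year = c(2012, 2012, 2012),
                  crimecount = c(5, 3, 4))

totals <- toy %>%
  group_by(county, year) %>%
  summarise(sum = sum(crimecount), .groups = "drop") %>%  # ungrouped result, no message
  arrange(desc(sum))
totals
```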

plot4 <- hatenew %>%
  filter(county =="Kings" | county =="New York" | county == "Suffolk" | county == "Nassau" | county == "Queens") %>%
  ggplot() +
  geom_bar(aes(x=county, y=crimecount, fill = id),
      position = "dodge", stat = "identity") +
  labs(y = "Number of Hate Crime Incidents",
    title = "5 Counties in NY with Highest Incidents of Hate Crimes",
    subtitle = "Between 2010-2016", 
    fill = "Hate Crime Type")
plot4

How would calculations be affected by looking at hate crimes in counties per year by population densities?

nypop <- read_csv("newyorkpopulation.csv")

## Rows: 62 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Geography
## dbl (7): 2010, 2011, 2012, 2013, 2014, 2015, 2016
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Clean the county name to match the other dataset

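# Note: the values still contain "County" here, so " , New York" does not match
# and this first call changes nothing
nypop$Geography <- gsub(" , New York", "", nypop$Geography)
# Removing "County" leaves values such as "Albany , New York" (see the output below);
# the remaining suffix is stripped from the 2012 subset later on
nypop$Geography <- gsub("County", "", nypop$Geography)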
nypoplong <- nypop %>%
  rename(county = Geography) %>%
  gather("year", "population", 2:8) 
nypoplong$year <- as.double(nypoplong$year)
head(nypoplong)

county <chr>	year <dbl>	population <dbl>
Albany , New York	2010	304078
Allegany , New York	2010	48949
Bronx , New York	2010	1388240
Broome , New York	2010	200469
Cattaraugus , New York	2010	80249
Cayuga , New York	2010	79844
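
The output above still carries the `, New York` suffix. One way to strip the county label and the state in a single step is an anchored pattern. This is only a sketch: it assumes the raw values look like `"Albany County, New York"`, which is inferred from the output rather than taken from the file, and the example strings are illustrative:

```r
# Assumed raw format (inferred, not read from newyorkpopulation.csv)
raw <- c("Albany County, New York", "New York County, New York")

# Remove the trailing " County, New York" with one anchored pattern
clean <- sub(" County, New York$", "", raw)
clean
```

Anchoring with `$` keeps the leading "New York" in "New York County" intact.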

Focus on 2012

nypoplong12 <- nypoplong %>%
  filter(year == 2012) %>%
  arrange(desc(population)) %>%
  head(10)
nypoplong12$county<-gsub(" , New York","",nypoplong12$county)
nypoplong12

county <chr>	year <dbl>	population <dbl>
Kings	2012	2572282
Queens	2012	2278024
New York	2012	1625121
Suffolk	2012	1499382
Bronx	2012	1414774
Nassau	2012	1350748
Westchester	2012	961073
Erie	2012	920792
Monroe	2012	748947
Richmond	2012	470978

Filter hate crimes just for 2012 as well

counties12 <- counties %>%
  filter(year == 2012) %>%
  arrange(desc(sum)) 
counties12

county <chr>	year <dbl>	sum <dbl>
Kings	2012	136
Suffolk	2012	83
New York	2012	71
Nassau	2012	48
Queens	2012	48
Erie	2012	28
Bronx	2012	23
Richmond	2012	18
Multiple	2012	14
Westchester	2012	13

Join the Hate Crimes data with NY population data for 2012

datajoin <- counties12 %>%
  full_join(nypoplong12, by=c("county", "year"))
datajoin

county <chr>	year <dbl>	sum <dbl>	population <dbl>
Kings	2012	136	2572282
Suffolk	2012	83	1499382
New York	2012	71	1625121
Nassau	2012	48	1350748
Queens	2012	48	2278024
Erie	2012	28	920792
Bronx	2012	23	1414774
Richmond	2012	18	470978
Multiple	2012	14	NA
Westchester	2012	13	961073
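
The `NA` population for "Multiple" comes from a key that exists in the hate crime table but not in the population table. A small sketch on made-up tables (`crimes` and `pop` are hypothetical stand-ins for the two inputs) shows how to list such unmatched keys with `anti_join()` and how `inner_join()` would keep only the counties present in both:

```r
library(dplyr)

# Hypothetical toy tables mirroring the two inputs of the join
crimes <- data.frame(county = c("Kings", "Multiple"), year = 2012, sum = c(10, 2))
pop    <- data.frame(county = c("Kings", "Queens"), year = 2012, population = c(1000, 2000))

# Rows of `crimes` without a population match (these get NA in a full join)
unmatched <- anti_join(crimes, pop, by = c("county", "year"))

# Keep only counties present in both tables
matched <- inner_join(crimes, pop, by = c("county", "year"))

unmatched
matched
```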

Calculate the rate of incidents per 100,000. Then arrange in descending order

datajoinrate <- datajoin %>%
  mutate(rate = sum/population*100000) %>%
  arrange(desc(rate))
datajoinrate

county <chr>	year <dbl>	sum <dbl>	population <dbl>	rate <dbl>
Suffolk	2012	83	1499382	5.535614
Kings	2012	136	2572282	5.287134
New York	2012	71	1625121	4.368905
Richmond	2012	18	470978	3.821835
Nassau	2012	48	1350748	3.553587
Erie	2012	28	920792	3.040860
Queens	2012	48	2278024	2.107089
Bronx	2012	23	1414774	1.625701
Westchester	2012	13	961073	1.352655
Monroe	2012	5	748947	0.667604

dt <- datajoinrate[,c("county","rate")]
dt

county <chr>	rate <dbl>
Suffolk	5.535614
Kings	5.287134
New York	4.368905
Richmond	3.821835
Nassau	3.553587
Erie	3.040860
Queens	2.107089
Bronx	1.625701
Westchester	1.352655
Monroe	0.667604
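
As a check on the arithmetic, the Suffolk rate can be recomputed by hand from the 2012 count and population shown in the joined table:

```r
# Values copied from the joined 2012 table above
incidents  <- 83        # Suffolk, sum of the three selected categories
population <- 1499382   # Suffolk, 2012

# Incidents per 100,000 residents
rate <- incidents / population * 100000
round(rate, 6)
```

Because `sum` covers only the three categories selected earlier (anti-black, anti-jewish, anti-gaymale), this is a rate for those categories and not an overall hate crime rate.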

Follow Up

Aggregating some of the categories

aggregategroups <- hatecrimes %>%
  tidyr::gather("id", "crimecount", 4:44) 
unique(aggregategroups$id)

##  [1] "anti-male"                               
##  [2] "anti-female"                             
##  [3] "anti-transgender"                        
##  [4] "anti-genderidentityexpression"           
##  [5] "anti-age*"                               
##  [6] "anti-white"                              
##  [7] "anti-black"                              
##  [8] "anti-americanindian/alaskannative"       
##  [9] "anti-asian"                              
## [10] "anti-nativehawaiian/pacificislander"     
## [11] "anti-multi-racialgroups"                 
## [12] "anti-otherrace"                          
## [13] "anti-jewish"                             
## [14] "anti-catholic"                           
## [15] "anti-protestant"                         
## [16] "anti-islamic(muslim)"                    
## [17] "anti-multi-religiousgroups"              
## [18] "anti-atheism/agnosticism"                
## [19] "anti-religiouspracticegenerally"         
## [20] "anti-otherreligion"                      
## [21] "anti-buddhist"                           
## [22] "anti-easternorthodox(greek,russian,etc.)"
## [23] "anti-hindu"                              
## [24] "anti-jehovahswitness"                    
## [25] "anti-mormon"                             
## [26] "anti-otherchristian"                     
## [27] "anti-sikh"                               
## [28] "anti-hispanic"                           
## [29] "anti-arab"                               
## [30] "anti-otherethnicity/nationalorigin"      
## [31] "anti-non-hispanic*"                      
## [32] "anti-gaymale"                            
## [33] "anti-gayfemale"                          
## [34] "anti-gay(maleandfemale)"                 
## [35] "anti-heterosexual"                       
## [36] "anti-bisexual"                           
## [37] "anti-physicaldisability"                 
## [38] "anti-mentaldisability"                   
## [39] "totalincidents"                          
## [40] "totalvictims"                            
## [41] "totaloffenders"

aggregategroups <- aggregategroups %>%
  mutate(group = case_when(
    # IDs must match unique(aggregategroups$id) exactly; a misspelled ID silently falls through to "others"
    id %in% c("anti-transgender", "anti-gayfemale", "anti-genderidentityexpression", "anti-gaymale", "anti-gay(maleandfemale)", "anti-bisexual") ~ "anti-lgbtq",
    id %in% c("anti-jewish", "anti-protestant", "anti-multi-religiousgroups", "anti-religiouspracticegenerally", "anti-buddhist", "anti-hindu", "anti-mormon", "anti-sikh", "anti-catholic", "anti-islamic(muslim)", "anti-atheism/agnosticism", "anti-otherreligion", "anti-easternorthodox(greek,russian,etc.)", "anti-jehovahswitness", "anti-otherchristian") ~ "anti-religion", 
    id %in% c("anti-asian", "anti-arab", "anti-non-hispanic*", "anti-white", "anti-black", "anti-multi-racialgroups", "anti-americanindian/alaskannative", "anti-nativehawaiian/pacificislander", "anti-otherrace", "anti-hispanic", "anti-otherethnicity/nationalorigin") ~ "anti-ethnicity",
    id %in% c("anti-physicaldisability", "anti-mentaldisability") ~ "anti-disability",
    id %in% c("anti-female", "anti-male") ~ "anti-gender",
    TRUE ~ "others"))
aggregategroups

county <chr>	year <dbl>	crimetype <chr>	id <chr>	group <chr>
Albany	2016	Crimes Against Persons	anti-male	anti-gender
Albany	2016	Property Crimes	anti-male	anti-gender
Allegany	2016	Property Crimes	anti-male	anti-gender
Bronx	2016	Crimes Against Persons	anti-male	anti-gender
Bronx	2016	Property Crimes	anti-male	anti-gender
Broome	2016	Crimes Against Persons	anti-male	anti-gender
Cayuga	2016	Property Crimes	anti-male	anti-gender
Chemung	2016	Crimes Against Persons	anti-male	anti-gender
Chemung	2016	Property Crimes	anti-male	anti-gender
Chenango	2016	Crimes Against Persons	anti-male	anti-gender

or create subset with just lgbtq

lgbtq <- hatecrimes %>%
  tidyr::gather("id", "crimecount", 4:44) %>%
  filter(id %in% c("anti-transgender", "anti-gayfemale", "anti-genderidentityexpression", "anti-gaymale", "anti-gay(maleandfemale)", "anti-bisexual"))
lgbtq

county <chr>	year <dbl>	crimetype <chr>	id <chr>	crimecount <dbl>
Albany	2016	Crimes Against Persons	anti-transgender	0
Albany	2016	Property Crimes	anti-transgender	0
Allegany	2016	Property Crimes	anti-transgender	0
Bronx	2016	Crimes Against Persons	anti-transgender	4
Bronx	2016	Property Crimes	anti-transgender	0
Broome	2016	Crimes Against Persons	anti-transgender	0
Cayuga	2016	Property Crimes	anti-transgender	0
Chemung	2016	Crimes Against Persons	anti-transgender	0
Chemung	2016	Property Crimes	anti-transgender	0
Chenango	2016	Crimes Against Persons	anti-transgender	0
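
Totals per aggregated group can then be computed with a grouped sum. A minimal sketch on made-up rows (`toy` and its values are hypothetical, using the same `year`, `group` and `crimecount` columns as above):

```r
library(dplyr)

# Hypothetical long-format rows with an aggregated `group` column
toy <- data.frame(year = c(2016, 2016, 2016, 2015),
                  group = c("anti-lgbtq", "anti-lgbtq", "anti-religion", "anti-lgbtq"),
                  crimecount = c(4, 1, 2, 3))

# Total incidents per year and group
grouptotals <- toy %>%
  group_by(year, group) %>%
  summarise(total = sum(crimecount), .groups = "drop")
grouptotals
```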

Essay

Based on a preliminary analysis of the dataset and the R output above, I think its strengths and weaknesses are roughly as follows. On the data side, the dataset is large enough and rich enough in information to be valuable for data mining. On the other hand, it contains a good deal of redundant data and the information is not clean enough, so it takes extra data-wrangling work to refine and normalize it. In terms of practical significance, the social and criminal issues the dataset covers matter for public policy: analyzing it can help assess the occurrence of hate crimes from several perspectives, such as geography and gender, and so help keep the problem from getting worse. In addition, the dataset contains a large number of dummy variables, which makes it harder to build a linear model and can easily lead to problems such as collinearity.

If I were to analyze the dataset further, I would consider two directions. The first is to collect more granular data for areas such as Kings County, in order to explore why hate crimes differ between regions. The second is to examine changes in hate crimes over a longer time span, along with key turning points, to assess whether policies such as the hate crimes act have had a lasting and profound effect on hate crimes over time.

As for the output above, I think two aspects are worth pursuing: the reasons behind the trend in hate crimes against Jewish people, and hate crimes against LGBT groups. The results above suggest that hate crimes against Black people have gradually decreased over time, while hate crimes against Jewish people show a slight upward trend, so it is worth exploring whether there is a hidden cause behind this. On the other hand, as society has become more tolerant, tolerance of LGBT groups has also increased significantly. Which groups are the main perpetrators of hate crimes against LGBT people, and what drives them to commit these crimes? These questions all deserve further exploration.
