The dataset I chose comes from the article “Higher Rates Of Hate Crimes Are Tied To Income Inequality”. My subset covers the state-level statistics for income, education (share of the population with a high school degree), the share of voters who voted for Trump, and average hate crimes.
Here is the article: https://fivethirtyeight.com/features/higher-rates-of-hate-crimes-are-tied-to-income-inequality/
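# Read the hate-crimes data straight from the FiveThirtyEight GitHub repository
HateCrimes <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/hate-crimes/hate_crimes.csv", header = TRUE, sep = ",")
# Keep columns 1, 2, 5, 10 and 12 (state, income, high school degree, Trump vote, average hate crimes)
HateCrimes <- subset(HateCrimes, select = c(1, 2, 5, 10, 12))
# Give the retained columns shorter, descriptive names
colnames(HateCrimes) <- c("State", "Income", "HSDegree", "VotedTrump", "AvgHateCrimes")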
head(HateCrimes)
## State Income HSDegree VotedTrump AvgHateCrimes
## 1 Alabama 42278 0.821 0.63 1.8064105
## 2 Alaska 67629 0.914 0.53 1.6567001
## 3 Arizona 49254 0.842 0.50 3.4139280
## 4 Arkansas 44922 0.824 0.60 0.8692089
## 5 California 60487 0.806 0.33 2.3979859
## 6 Colorado 60940 0.893 0.44 2.8046888
I would like to see more data, including news items over time, in order to check whether there is a pattern in the data that is directly affected by news stories.
summary(HateCrimes)
## State Income HSDegree VotedTrump
## Alabama : 1 Min. :35521 Min. :0.7990 Min. :0.040
## Alaska : 1 1st Qu.:48657 1st Qu.:0.8405 1st Qu.:0.415
## Arizona : 1 Median :54916 Median :0.8740 Median :0.490
## Arkansas : 1 Mean :55224 Mean :0.8691 Mean :0.490
## California: 1 3rd Qu.:60719 3rd Qu.:0.8980 3rd Qu.:0.575
## Colorado : 1 Max. :76165 Max. :0.9180 Max. :0.700
## (Other) :45
## AvgHateCrimes
## Min. : 0.2669
## 1st Qu.: 1.2931
## Median : 1.9871
## Mean : 2.3676
## 3rd Qu.: 3.1843
## Max. :10.9535
## NA's :1
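The summary above reports one missing value (`NA's :1`) in `AvgHateCrimes`. The following is a minimal sketch of one way such a value might be located and handled. It runs on a toy data frame, not the full dataset: the first three rows are copied from the `head()` output, while the fourth row (`ExampleState`, with its income figure) is hypothetical and exists only to mimic the missing value. The names `toy`, `missing_rows`, and `complete` are illustrative.

```r
# Toy data: rows 1-3 come from the head() output above;
# row 4 is HYPOTHETICAL, added only to reproduce a missing value.
toy <- data.frame(
  State         = c("Alabama", "Alaska", "Arizona", "ExampleState"),
  Income        = c(42278, 67629, 49254, 50000),
  AvgHateCrimes = c(1.8064105, 1.6567001, 3.4139280, NA)
)

# Find the row(s) where the hate-crime figure is missing.
missing_rows <- toy[is.na(toy$AvgHateCrimes), ]
print(missing_rows$State)

# mean() propagates NA unless missing values are dropped explicitly.
mean_with_na    <- mean(toy$AvgHateCrimes)
mean_without_na <- mean(toy$AvgHateCrimes, na.rm = TRUE)

# Keep only the rows that have no missing value in any column.
complete <- toy[complete.cases(toy), ]
```

On the real data, the same `is.na()` filter applied to `HateCrimes$AvgHateCrimes` should show which state lacks a value. Whether to drop that row or keep it and pass `na.rm = TRUE` depends on the analysis that follows.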