Research Question
In this project proposal, I intend to use the “hate_crimes” dataset from the fivethirtyeight package, which is described below. The research question I would like to answer is: are there any significant relationships between hate crime rates in the US and other variables in the dataset, such as unemployment, median household income, and race?
library(fivethirtyeight)
library(DT)
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
colnames(hate_crimes)
## [1] "state" "state_abbrev"
## [3] "median_house_inc" "share_unemp_seas"
## [5] "share_pop_metro" "share_pop_hs"
## [7] "share_non_citizen" "share_white_poverty"
## [9] "gini_index" "share_non_white"
## [11] "share_vote_trump" "hate_crimes_per_100k_splc"
## [13] "avg_hatecrimes_per_100k_fbi"
What are the cases and how many are there?
length(hate_crimes$avg_hatecrimes_per_100k_fbi)
## [1] 51
There are 51 cases: one for each of the 50 US states and the District of Columbia.
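Note that length() applied to a single column counts missing values too, so it gives the number of rows rather than the number of usable observations. The sketch below is one way to separate the two; the names n_cases, n_missing, and n_fbi are illustrative and not part of the original analysis.

```r
library(fivethirtyeight)

# Number of cases (rows): one per state plus the District of Columbia
n_cases <- nrow(hate_crimes)

# Missing values per column; only columns with at least one NA are printed
n_missing <- colSums(is.na(hate_crimes))
n_missing[n_missing > 0]

# Cases that actually have a recorded FBI hate crime rate
n_fbi <- sum(!is.na(hate_crimes$avg_hatecrimes_per_100k_fbi))
n_fbi
```

Any analysis that uses a column with missing values will run on fewer than 51 cases, which is worth stating when results are reported.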
Describe the method of data collection
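No new data were collected for this project. The data come from the hate_crimes dataset in the fivethirtyeight R package, which summarizes average hate crime rates per 100,000 residents by state alongside state-level demographic and economic measures.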
datatable(hate_crimes)
What type of study is this?
This is an observational study: we are not running an experiment or assigning any treatment. We are using data that were already recorded and collected over time, so our downstream analyses can show associations but cannot establish causation.
Data source: FiveThirtyEight
https://github.com/fivethirtyeight/data
Response: what is the response variable, and what type is it (numerical/categorical)?
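The response variable is the hate crime rate per 100,000 residents, which is numerical. The dataset records it in two columns: avg_hatecrimes_per_100k_fbi and hate_crimes_per_100k_splc.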
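The variable type can be checked directly rather than asserted. This is a minimal sketch; col_types, response_cols, and all_numeric are illustrative names introduced here.

```r
library(fivethirtyeight)

# Storage class of each column (numeric or integer columns are numerical)
col_types <- sapply(hate_crimes, function(x) class(x)[1])
col_types

# Confirm that both hate crime rate columns are numerical
response_cols <- c("avg_hatecrimes_per_100k_fbi", "hate_crimes_per_100k_splc")
all_numeric <- all(sapply(hate_crimes[response_cols], is.numeric))
all_numeric
```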
The explanatory variables are median household income, white poverty, non-citizen share, population, and presidential vote.
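A first look at the research question could compare each explanatory variable with the response. The sketch below rests on two assumptions that the proposal does not state: that the FBI rate (avg_hatecrimes_per_100k_fbi) is the response, and that "population" refers to share_pop_metro. Both choices are placeholders and should be replaced if the intended columns differ.

```r
library(fivethirtyeight)

# Assumed response column (the proposal does not name one explicitly)
response <- "avg_hatecrimes_per_100k_fbi"

# Assumed mapping of the explanatory variables to columns;
# share_pop_metro standing in for "population" is a guess
predictors <- c("median_house_inc", "share_white_poverty",
                "share_non_citizen", "share_pop_metro", "share_vote_trump")

# Pearson correlation of each predictor with the response,
# using only the states where both values are present
cors <- sapply(predictors, function(p) {
  cor(hate_crimes[[p]], hate_crimes[[response]], use = "complete.obs")
})
round(sort(cors), 2)

# Test whether a single correlation differs from zero
ct <- cor.test(hate_crimes$median_house_inc, hate_crimes[[response]])
ct$p.value

# Optional scatterplot matrix with the GGally package loaded earlier:
# GGally::ggpairs(hate_crimes[, c(response, predictors)])
```

With only about 50 usable cases and several predictors tested at once, small p-values here are suggestive at best and would need a correction for multiple comparisons before being called significant.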
Summary statistics for the dataset are shown below:
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
describe(hate_crimes)
## hate_crimes
##
## 13 Variables 51 Observations
## --------------------------------------------------------------------------------
## state
## n missing distinct
## 51 0 51
##
## lowest : Alabama Alaska Arizona Arkansas California
## highest: Virginia Washington West Virginia Wisconsin Wyoming
## --------------------------------------------------------------------------------
## state_abbrev
## n missing distinct
## 51 0 51
##
## lowest : AK AL AR AZ CA, highest: VT WA WI WV WY
## --------------------------------------------------------------------------------
## median_house_inc
## n missing distinct Info Mean Gmd .05 .10
## 51 0 51 1 55224 10575 42342 43716
## .25 .50 .75 .90 .95
## 48657 54916 60719 67629 70692
##
## lowest : 35521 39552 42278 42406 42786, highest: 68277 70161 71223 73397 76165
## --------------------------------------------------------------------------------
## share_unemp_seas
## n missing distinct Info Mean Gmd .05 .10
## 51 0 32 0.999 0.04957 0.01235 0.0340 0.0360
## .25 .50 .75 .90 .95
## 0.0420 0.0510 0.0575 0.0630 0.0670
##
## lowest : 0.028 0.029 0.034 0.035 0.036, highest: 0.063 0.064 0.067 0.068 0.073
## --------------------------------------------------------------------------------
## share_pop_metro
## n missing distinct Info Mean Gmd .05 .10
## 51 0 31 0.998 0.7502 0.2059 0.400 0.510
## .25 .50 .75 .90 .95
## 0.630 0.790 0.895 0.970 0.985
##
## lowest : 0.31 0.34 0.35 0.45 0.50, highest: 0.92 0.94 0.96 0.97 1.00
## --------------------------------------------------------------------------------
## share_pop_hs
## n missing distinct Info Mean Gmd .05 .10
## 51 0 40 1 0.8691 0.03925 0.8115 0.8220
## .25 .50 .75 .90 .95
## 0.8405 0.8740 0.8980 0.9100 0.9140
##
## lowest : 0.799 0.804 0.806 0.817 0.821, highest: 0.910 0.913 0.914 0.915 0.918
## --------------------------------------------------------------------------------
## share_non_citizen
## n missing distinct Info Mean Gmd .05 .10
## 48 3 12 0.985 0.05458 0.03516 0.0135 0.0200
## .25 .50 .75 .90 .95
## 0.0300 0.0450 0.0800 0.1000 0.1100
##
## lowest : 0.01 0.02 0.03 0.04 0.05, highest: 0.08 0.09 0.10 0.11 0.13
##
## Value 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11
## Frequency 3 4 9 8 4 4 2 5 2 3 3
## Proportion 0.062 0.083 0.188 0.167 0.083 0.083 0.042 0.104 0.042 0.062 0.062
##
## Value 0.13
## Frequency 1
## Proportion 0.021
## --------------------------------------------------------------------------------
## share_white_poverty
## n missing distinct Info Mean Gmd .05 .10
## 51 0 12 0.98 0.09176 0.02729 0.060 0.060
## .25 .50 .75 .90 .95
## 0.075 0.090 0.100 0.120 0.135
##
## lowest : 0.04 0.05 0.06 0.07 0.08, highest: 0.11 0.12 0.13 0.14 0.17
##
## Value 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
## Frequency 1 1 4 7 7 11 8 3 5 1 2
## Proportion 0.020 0.020 0.078 0.137 0.137 0.216 0.157 0.059 0.098 0.020 0.039
##
## Value 0.17
## Frequency 1
## Proportion 0.020
## --------------------------------------------------------------------------------
## gini_index
## n missing distinct Info Mean Gmd .05 .10
## 51 0 39 0.999 0.4538 0.02278 0.4240 0.4300
## .25 .50 .75 .90 .95
## 0.4400 0.4540 0.4665 0.4740 0.4805
##
## lowest : 0.419 0.422 0.423 0.425 0.427, highest: 0.474 0.475 0.486 0.499 0.532
## --------------------------------------------------------------------------------
## share_non_white
## n missing distinct Info Mean Gmd .05 .10
## 51 0 34 0.999 0.3157 0.186 0.090 0.150
## .25 .50 .75 .90 .95
## 0.195 0.280 0.420 0.500 0.615
##
## lowest : 0.06 0.07 0.09 0.10 0.15, highest: 0.56 0.61 0.62 0.63 0.81
## --------------------------------------------------------------------------------
## share_vote_trump
## n missing distinct Info Mean Gmd .05 .10
## 51 0 33 0.999 0.49 0.1303 0.330 0.350
## .25 .50 .75 .90 .95
## 0.415 0.490 0.575 0.630 0.645
##
## lowest : 0.04 0.30 0.33 0.34 0.35, highest: 0.63 0.64 0.65 0.69 0.70
## --------------------------------------------------------------------------------
## hate_crimes_per_100k_splc
## n missing distinct Info Mean Gmd .05 .10
## 47 4 47 1 0.3041 0.2355 0.08343 0.10790
## .25 .50 .75 .90 .95
## 0.14271 0.22620 0.35693 0.62034 0.66348
##
## lowest : 0.06744680 0.06906077 0.07830591 0.09540164 0.10515247
## highest: 0.62747993 0.63081059 0.67748765 0.83284961 1.52230172
## --------------------------------------------------------------------------------
## avg_hatecrimes_per_100k_fbi
## n missing distinct Info Mean Gmd .05 .10
## 50 1 50 1 2.368 1.671 0.4896 0.6905
## .25 .50 .75 .90 .95
## 1.2931 1.9871 3.1843 3.8568 4.5935
##
## lowest : 0.2669408 0.4120118 0.4309276 0.5613956 0.6227460
## highest: 4.2078896 4.4132026 4.7410699 4.8018993 10.9534797
## --------------------------------------------------------------------------------