Research Question
In this project proposal, I intend to use the “hate_crimes” dataset from the fivethirtyeight package, which is described below. The research question I would like to answer is: are there any significant relationships between hate crimes in the US and the other variables in the dataset, such as unemployment, median household income, and race?
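A first pass at this question could test each candidate variable's Pearson correlation with the hate crime rate. The sketch below is one possible approach, not a method the proposal commits to. It assumes avg_hatecrimes_per_100k_fbi as the response and uses share_non_white as the stand-in for race; both choices are assumptions for illustration.

```r
library(fivethirtyeight)

# Assumed response: the FBI-based hate crime rate
response <- hate_crimes$avg_hatecrimes_per_100k_fbi

# Assumed mapping of the research question's variables to columns:
# unemployment, median household income, race (share_non_white)
predictors <- c("share_unemp_seas", "median_house_inc", "share_non_white")

# Pearson correlation test for each predictor; cor.test() drops incomplete pairs
results <- t(sapply(predictors, function(v) {
  ct <- cor.test(hate_crimes[[v]], response)
  c(r = unname(ct$estimate), p_value = ct$p.value)
}))

results
```

With only 51 cases and several tests run at once, the chance of a false positive grows, so p-values close to 0.05 deserve caution.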

library(fivethirtyeight)
library(DT)
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
colnames(hate_crimes)
##  [1] "state"                       "state_abbrev"               
##  [3] "median_house_inc"            "share_unemp_seas"           
##  [5] "share_pop_metro"             "share_pop_hs"               
##  [7] "share_non_citizen"           "share_white_poverty"        
##  [9] "gini_index"                  "share_non_white"            
## [11] "share_vote_trump"            "hate_crimes_per_100k_splc"  
## [13] "avg_hatecrimes_per_100k_fbi"

What are the cases and how many are there?

nrow(hate_crimes)
## [1] 51

There are 51 cases: one for each of the 50 US states and one for the District of Columbia.
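The count of 51 includes states with missing values (the summary statistics further down show NAs in share_non_citizen and in both hate crime rate columns), so an analysis that uses every variable would work with fewer cases. A minimal sketch for checking this with base R, beyond loading the package:

```r
library(fivethirtyeight)

# Total number of cases (rows): 50 states plus DC
n_cases <- nrow(hate_crimes)

# Cases with no missing value in any column
n_complete <- sum(complete.cases(hate_crimes))

# Missing values per column, to see which variables cause the gap
na_per_column <- colSums(is.na(hate_crimes))

c(cases = n_cases, complete = n_complete)
na_per_column[na_per_column > 0]
```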

Describe the method of data collection
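The data were not collected for this project. I am using an existing dataset from the fivethirtyeight R package, which reports state-level hate crime rates from two sources (the FBI and the Southern Poverty Law Center) alongside economic, demographic, and voting variables for each state.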


datatable(hate_crimes)

What type of study is this?
This is an observational study: the variables are recorded as they occurred in each state, and nothing is manipulated or randomly assigned. I am using data that have already been collected for the downstream analysis, so any relationships found are associations, not evidence of causation.

Data source: FiveThirtyEight
https://github.com/fivethirtyeight/data

Response: what is the response variable, and what type is it (numerical/categorical)?
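The response variable is numerical: the number of hate crimes per 100,000 population. The dataset provides two versions of it, avg_hatecrimes_per_100k_fbi and hate_crimes_per_100k_splc.
The explanatory variables are also numerical and include median household income, the share of white residents living in poverty, the share of non-citizens, population, and the share of the 2016 presidential vote for Trump.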
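One way to relate the response to these explanatory variables is a multiple linear regression. The sketch below is an assumption, not the method the proposal commits to: it uses avg_hatecrimes_per_100k_fbi as the response and reads "population" as share_pop_metro (the share of the population living in metropolitan areas).

```r
library(fivethirtyeight)

# Multiple linear regression of the FBI hate crime rate on the explanatory
# variables listed above; treating "population" as share_pop_metro is an assumption.
# lm() drops any row with an NA in these columns (default na.action = na.omit).
fit <- lm(avg_hatecrimes_per_100k_fbi ~ median_house_inc + share_white_poverty +
            share_non_citizen + share_pop_metro + share_vote_trump,
          data = hate_crimes)

summary(fit)

# Number of states that actually entered the fit after dropping NAs
nobs(fit)
```

Because lm() silently drops incomplete rows, nobs(fit) is worth reporting next to the coefficient table.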

Summary statistics for the dataset are shown below:

library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
describe(hate_crimes)
## hate_crimes 
## 
##  13  Variables      51  Observations
## --------------------------------------------------------------------------------
## state 
##        n  missing distinct 
##       51        0       51 
## 
## lowest : Alabama       Alaska        Arizona       Arkansas      California   
## highest: Virginia      Washington    West Virginia Wisconsin     Wyoming      
## --------------------------------------------------------------------------------
## state_abbrev 
##        n  missing distinct 
##       51        0       51 
## 
## lowest : AK AL AR AZ CA, highest: VT WA WI WV WY
## --------------------------------------------------------------------------------
## median_house_inc 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       51        1    55224    10575    42342    43716 
##      .25      .50      .75      .90      .95 
##    48657    54916    60719    67629    70692 
## 
## lowest : 35521 39552 42278 42406 42786, highest: 68277 70161 71223 73397 76165
## --------------------------------------------------------------------------------
## share_unemp_seas 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       32    0.999  0.04957  0.01235   0.0340   0.0360 
##      .25      .50      .75      .90      .95 
##   0.0420   0.0510   0.0575   0.0630   0.0670 
## 
## lowest : 0.028 0.029 0.034 0.035 0.036, highest: 0.063 0.064 0.067 0.068 0.073
## --------------------------------------------------------------------------------
## share_pop_metro 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       31    0.998   0.7502   0.2059    0.400    0.510 
##      .25      .50      .75      .90      .95 
##    0.630    0.790    0.895    0.970    0.985 
## 
## lowest : 0.31 0.34 0.35 0.45 0.50, highest: 0.92 0.94 0.96 0.97 1.00
## --------------------------------------------------------------------------------
## share_pop_hs 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       40        1   0.8691  0.03925   0.8115   0.8220 
##      .25      .50      .75      .90      .95 
##   0.8405   0.8740   0.8980   0.9100   0.9140 
## 
## lowest : 0.799 0.804 0.806 0.817 0.821, highest: 0.910 0.913 0.914 0.915 0.918
## --------------------------------------------------------------------------------
## share_non_citizen 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        3       12    0.985  0.05458  0.03516   0.0135   0.0200 
##      .25      .50      .75      .90      .95 
##   0.0300   0.0450   0.0800   0.1000   0.1100 
## 
## lowest : 0.01 0.02 0.03 0.04 0.05, highest: 0.08 0.09 0.10 0.11 0.13
##                                                                             
## Value       0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09  0.10  0.11
## Frequency      3     4     9     8     4     4     2     5     2     3     3
## Proportion 0.062 0.083 0.188 0.167 0.083 0.083 0.042 0.104 0.042 0.062 0.062
##                 
## Value       0.13
## Frequency      1
## Proportion 0.021
## --------------------------------------------------------------------------------
## share_white_poverty 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       12     0.98  0.09176  0.02729    0.060    0.060 
##      .25      .50      .75      .90      .95 
##    0.075    0.090    0.100    0.120    0.135 
## 
## lowest : 0.04 0.05 0.06 0.07 0.08, highest: 0.11 0.12 0.13 0.14 0.17
##                                                                             
## Value       0.04  0.05  0.06  0.07  0.08  0.09  0.10  0.11  0.12  0.13  0.14
## Frequency      1     1     4     7     7    11     8     3     5     1     2
## Proportion 0.020 0.020 0.078 0.137 0.137 0.216 0.157 0.059 0.098 0.020 0.039
##                 
## Value       0.17
## Frequency      1
## Proportion 0.020
## --------------------------------------------------------------------------------
## gini_index 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       39    0.999   0.4538  0.02278   0.4240   0.4300 
##      .25      .50      .75      .90      .95 
##   0.4400   0.4540   0.4665   0.4740   0.4805 
## 
## lowest : 0.419 0.422 0.423 0.425 0.427, highest: 0.474 0.475 0.486 0.499 0.532
## --------------------------------------------------------------------------------
## share_non_white 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       34    0.999   0.3157    0.186    0.090    0.150 
##      .25      .50      .75      .90      .95 
##    0.195    0.280    0.420    0.500    0.615 
## 
## lowest : 0.06 0.07 0.09 0.10 0.15, highest: 0.56 0.61 0.62 0.63 0.81
## --------------------------------------------------------------------------------
## share_vote_trump 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       33    0.999     0.49   0.1303    0.330    0.350 
##      .25      .50      .75      .90      .95 
##    0.415    0.490    0.575    0.630    0.645 
## 
## lowest : 0.04 0.30 0.33 0.34 0.35, highest: 0.63 0.64 0.65 0.69 0.70
## --------------------------------------------------------------------------------
## hate_crimes_per_100k_splc 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       47        4       47        1   0.3041   0.2355  0.08343  0.10790 
##      .25      .50      .75      .90      .95 
##  0.14271  0.22620  0.35693  0.62034  0.66348 
## 
## lowest : 0.06744680 0.06906077 0.07830591 0.09540164 0.10515247
## highest: 0.62747993 0.63081059 0.67748765 0.83284961 1.52230172
## --------------------------------------------------------------------------------
## avg_hatecrimes_per_100k_fbi 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       50        1       50        1    2.368    1.671   0.4896   0.6905 
##      .25      .50      .75      .90      .95 
##   1.2931   1.9871   3.1843   3.8568   4.5935 
## 
## lowest :  0.2669408  0.4120118  0.4309276  0.5613956  0.6227460
## highest:  4.2078896  4.4132026  4.7410699  4.8018993 10.9534797
## --------------------------------------------------------------------------------
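The summary for avg_hatecrimes_per_100k_fbi shows a highest value (10.95) more than twice the next highest (4.80), which could have a large influence on a correlation or regression. A sketch of flagging such values with the common 1.5 * IQR rule follows; the rule and its multiplier are a conventional choice assumed here, not part of the original analysis.

```r
library(fivethirtyeight)

fbi <- hate_crimes$avg_hatecrimes_per_100k_fbi

# Quartiles of the FBI rate, ignoring the state with a missing value
q <- quantile(fbi, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- unname(q[2] - q[1])

# Tukey fences: 1.5 * IQR beyond the quartiles (an assumed, conventional cutoff)
lower <- unname(q[1]) - 1.5 * iqr
upper <- unname(q[2]) + 1.5 * iqr

# Rows whose rate falls outside the fences; NA rates are excluded
is_outlier <- !is.na(fbi) & (fbi < lower | fbi > upper)
outliers <- hate_crimes[is_outlier, c("state", "avg_hatecrimes_per_100k_fbi")]
outliers
```

Refitting a model with and without the flagged row is a simple way to see how much it drives the results.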