Part 1 - Introduction

Research Question

In this project I use the dataset from fivethirtyeight called “hate_crimes”. The dataset is described below. The research question I would like to answer is: are there any significant relationships between hate crime rates in the US and other parameters in the dataset, such as unemployment, median household income, race, etc.?

A “hate crime” is defined as a crime that is based on a particular bias or prejudice. Examples include African-Americans being policed differently than others, potentially in fatal ways; crimes against Muslims or other Islamic religious groups because of their faith; and crimes against people because of their sexual orientation.
This is an important question because in recent years there has been an uptick in recorded hate crimes. It is unclear whether crime rates are increasing over time or whether more light is simply being shed on these situations due to the abundance of technology in today’s society compared with previous decades. Nonetheless, it is important to understand the relationship between hate crimes and other potential factors in order to take measures, both politically and socially, to mitigate the issue.

Part 2 - Data

library(fivethirtyeight)
library(DT)
library(GGally)
library(Hmisc)
library(tidyverse)
library(knitr)
library(RColorBrewer)
library(broom)
colnames(hate_crimes)

##  [1] "state"                       "state_abbrev"               
##  [3] "median_house_inc"            "share_unemp_seas"           
##  [5] "share_pop_metro"             "share_pop_hs"               
##  [7] "share_non_citizen"           "share_white_poverty"        
##  [9] "gini_index"                  "share_non_white"            
## [11] "share_vote_trump"            "hate_crimes_per_100k_splc"  
## [13] "avg_hatecrimes_per_100k_fbi"

hatecrimes<-hate_crimes

There are 51 cases in this dataset (the 50 states plus the District of Columbia). The subject of the dataset is hate crimes, which are captured by 2 metrics:
hate_crimes_per_100k_splc - Hate crimes per 100,000 people, as recorded by the Southern Poverty Law Center (SPLC)
avg_hatecrimes_per_100k_fbi - Average annual hate crimes per 100,000 people, as recorded by the FBI

It’s important to note that these are aggregated observations: the data is not granular enough to provide each individual hate crime, with information about it, as an observation.

The variables of interest are most of the other variables in the dataset. These will be used as our predictor variables, while hate crimes will be our response variable (hate_crimes_per_100k_splc in the regressions below):
median_house_inc - Median household income for the year of 2016
share_unemp_seas - Share of the population that is unemployed
share_pop_metro - Share of the population that lives in a metropolitan area for the year of 2015
share_pop_hs - Share of the population with a high school degree
share_non_citizen - Share of the population that are not U.S. Citizens as of 2015
share_white_poverty - Share of white residents who live in poverty for 2015
gini_index - a measure of the distribution of income across income percentiles in a population
share_non_white - Share of the population that is not white for 2015
share_vote_trump - Share of 2016 U.S. presidential voters who voted for Donald Trump

This is an observational study because we are collecting historical data and evaluating our hypotheses based on that. There will be no experimental design with placebo, control, and experimental groups. The scope of our inference will be generalized to the US population, since the data covers every individual state. Because this is not a randomized controlled trial, we will not use these data to infer causality.

Part 3 - Exploratory data analysis

The data table below allows the user to look through the raw dataset from fivethirtyeight.

datatable(hate_crimes)

Visualizing the dataset and Conditions for inference

To do some exploratory analysis, we will employ tools that help us understand the distribution of our data, including summary statistics across all of the columns. We have a wide version of the dataset, but we will create a long version as well to make it easier to look at the different parameters in our dataset.

Cycle through the tabs below to view the distribution and normality of our parameters in the dataset.

View Histogram of Data

The histograms below show all of our potential predictor variables as well as our response variables (the two hate crime metrics). Most of the predictor variables follow a normal or close to normal distribution. There is some skewness in a few variables due to outliers: in gini_index, avg_hatecrimes_per_100k_fbi, and hate_crimes_per_100k_splc, outliers make the data appear right skewed. Similarly, outliers in share_vote_trump cause left skewness in that distribution.

This suggests that our linear models will likely:

1) Have near normal residuals
2) Have constant variability

We will check for linearity and variability in the residuals plots during downstream analysis.

hatecrimes_long<-hatecrimes %>% 
  pivot_longer(cols = 3:length(hatecrimes), names_to = "Parameter")


hatecrimes_long %>% ggplot(mapping = aes(x = value, fill = Parameter))+
  geom_histogram(alpha = 0.4)+
  facet_wrap(Parameter~.,scales = 'free', ncol = 3)+
  #geom_density(fill = NA, linetype = 2, na.rm=T)+
  theme(panel.background = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA),
        panel.grid = element_blank(),
        legend.position = "top",
        legend.title = element_blank(),
        strip.background =element_blank())

View Density Plot of Data

The density plots below show the same parameters with a smoothed curve instead of a histogram, to better see the skewness, peaks, and distributions of our parameters. As stated in the histogram tab, most of the predictor variables follow a normal or close to normal distribution. See the QQ plot tab to see how far these parameters deviate from the Gaussian distribution.

This suggests that our linear models will likely:

1) Have near normal residuals
2) Have constant variability

We will check for linearity and variability in the residuals plots during downstream analysis.

hatecrimes_long %>% ggplot(mapping = aes(x = value, fill = Parameter))+
  #geom_histogram(alpha = 0.4)+
  facet_wrap(Parameter~.,scales = 'free', ncol = 3)+
  geom_density(fill = NA, linetype = 2, na.rm=T)+
  theme(panel.background = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA),
        panel.grid = element_blank(),
        legend.position = "top",
        legend.title = element_blank(),
        strip.background =element_blank())

View QQ Plot of Data

The QQ plots below show the same parameters with the normal reference line overlaid on the data, to show where and how far each parameter deviates from a normal distribution. As stated in the histogram tab, most of the predictor variables follow a normal or close to normal distribution. For the analyses downstream, we will assume that these parameters satisfy the conditions for normality in our regressions.

This suggests that our linear models will likely:

1) Have near normal residuals
2) Have constant variability

We will check for linearity and variability in the residuals plots during downstream analysis.
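The near-normal condition in a regression applies to the residuals rather than to each variable on its own, so one way to check it directly is to test the residuals of a fitted model. The sketch below is only an illustration on simulated stand-in data (not the hate_crimes data); the seed, sample size, and coefficients are arbitrary assumptions.

```r
set.seed(606)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in data: a predictor that is right skewed by construction,
# with normally distributed errors around a straight line.
n <- 50
x <- rexp(n, rate = 1)
y <- 0.3 + 0.5 * x + rnorm(n, sd = 0.2)

fit <- lm(y ~ x)
res <- residuals(fit)

# Shapiro-Wilk p-values for the predictor and for the residuals.
# The regression conditions refer to the second of these.
p_predictor <- shapiro.test(x)$p.value
p_residuals <- shapiro.test(res)$p.value
c(predictor = p_predictor, residuals = p_residuals)
```

The same two calls could be applied to the residuals of any of the models fitted in Part 4, and qqnorm(res) followed by qqline(res) would draw the corresponding QQ plot.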

hatecrimes_long %>% ggplot(mapping = aes(sample = value))+
  #geom_histogram(alpha = 0.4)+
  stat_qq()+stat_qq_line()+
  facet_wrap(Parameter~.,scales = 'free', ncol = 3)+
  #geom_density(fill = NA, linetype = 2, na.rm=T)+
  theme(panel.background = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA),
        panel.grid = element_blank(),
        legend.position = "top",
        legend.title = element_blank(),
        strip.background =element_blank())

Summary Statistics for Dataset

Summary statistics for the dataset are shown below:

describe(hatecrimes)

## hatecrimes 
## 
##  13  Variables      51  Observations
## --------------------------------------------------------------------------------
## state 
##        n  missing distinct 
##       51        0       51 
## 
## lowest : Alabama       Alaska        Arizona       Arkansas      California   
## highest: Virginia      Washington    West Virginia Wisconsin     Wyoming      
## --------------------------------------------------------------------------------
## state_abbrev 
##        n  missing distinct 
##       51        0       51 
## 
## lowest : AK AL AR AZ CA, highest: VT WA WI WV WY
## --------------------------------------------------------------------------------
## median_house_inc 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       51        1    55224    10575    42342    43716 
##      .25      .50      .75      .90      .95 
##    48657    54916    60719    67629    70692 
## 
## lowest : 35521 39552 42278 42406 42786, highest: 68277 70161 71223 73397 76165
## --------------------------------------------------------------------------------
## share_unemp_seas 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       32    0.999  0.04957  0.01235   0.0340   0.0360 
##      .25      .50      .75      .90      .95 
##   0.0420   0.0510   0.0575   0.0630   0.0670 
## 
## lowest : 0.028 0.029 0.034 0.035 0.036, highest: 0.063 0.064 0.067 0.068 0.073
## --------------------------------------------------------------------------------
## share_pop_metro 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       31    0.998   0.7502   0.2059    0.400    0.510 
##      .25      .50      .75      .90      .95 
##    0.630    0.790    0.895    0.970    0.985 
## 
## lowest : 0.31 0.34 0.35 0.45 0.50, highest: 0.92 0.94 0.96 0.97 1.00
## --------------------------------------------------------------------------------
## share_pop_hs 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       40        1   0.8691  0.03925   0.8115   0.8220 
##      .25      .50      .75      .90      .95 
##   0.8405   0.8740   0.8980   0.9100   0.9140 
## 
## lowest : 0.799 0.804 0.806 0.817 0.821, highest: 0.910 0.913 0.914 0.915 0.918
## --------------------------------------------------------------------------------
## share_non_citizen 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        3       12    0.985  0.05458  0.03516   0.0135   0.0200 
##      .25      .50      .75      .90      .95 
##   0.0300   0.0450   0.0800   0.1000   0.1100 
## 
## lowest : 0.01 0.02 0.03 0.04 0.05, highest: 0.08 0.09 0.10 0.11 0.13
##                                                                             
## Value       0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09  0.10  0.11
## Frequency      3     4     9     8     4     4     2     5     2     3     3
## Proportion 0.062 0.083 0.188 0.167 0.083 0.083 0.042 0.104 0.042 0.062 0.062
##                 
## Value       0.13
## Frequency      1
## Proportion 0.021
## --------------------------------------------------------------------------------
## share_white_poverty 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       12     0.98  0.09176  0.02729    0.060    0.060 
##      .25      .50      .75      .90      .95 
##    0.075    0.090    0.100    0.120    0.135 
## 
## lowest : 0.04 0.05 0.06 0.07 0.08, highest: 0.11 0.12 0.13 0.14 0.17
##                                                                             
## Value       0.04  0.05  0.06  0.07  0.08  0.09  0.10  0.11  0.12  0.13  0.14
## Frequency      1     1     4     7     7    11     8     3     5     1     2
## Proportion 0.020 0.020 0.078 0.137 0.137 0.216 0.157 0.059 0.098 0.020 0.039
##                 
## Value       0.17
## Frequency      1
## Proportion 0.020
## --------------------------------------------------------------------------------
## gini_index 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       39    0.999   0.4538  0.02278   0.4240   0.4300 
##      .25      .50      .75      .90      .95 
##   0.4400   0.4540   0.4665   0.4740   0.4805 
## 
## lowest : 0.419 0.422 0.423 0.425 0.427, highest: 0.474 0.475 0.486 0.499 0.532
## --------------------------------------------------------------------------------
## share_non_white 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       34    0.999   0.3157    0.186    0.090    0.150 
##      .25      .50      .75      .90      .95 
##    0.195    0.280    0.420    0.500    0.615 
## 
## lowest : 0.06 0.07 0.09 0.10 0.15, highest: 0.56 0.61 0.62 0.63 0.81
## --------------------------------------------------------------------------------
## share_vote_trump 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       51        0       33    0.999     0.49   0.1303    0.330    0.350 
##      .25      .50      .75      .90      .95 
##    0.415    0.490    0.575    0.630    0.645 
## 
## lowest : 0.04 0.30 0.33 0.34 0.35, highest: 0.63 0.64 0.65 0.69 0.70
## --------------------------------------------------------------------------------
## hate_crimes_per_100k_splc 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       47        4       47        1   0.3041   0.2355  0.08343  0.10790 
##      .25      .50      .75      .90      .95 
##  0.14271  0.22620  0.35693  0.62034  0.66348 
## 
## lowest : 0.06744680 0.06906077 0.07830591 0.09540164 0.10515247
## highest: 0.62747993 0.63081059 0.67748765 0.83284961 1.52230172
## --------------------------------------------------------------------------------
## avg_hatecrimes_per_100k_fbi 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       50        1       50        1    2.368    1.671   0.4896   0.6905 
##      .25      .50      .75      .90      .95 
##   1.2931   1.9871   3.1843   3.8568   4.5935 
## 
## lowest :  0.2669408  0.4120118  0.4309276  0.5613956  0.6227460
## highest:  4.2078896  4.4132026  4.7410699  4.8018993 10.9534797
## --------------------------------------------------------------------------------

Removing the one outlier from our response variable:

hatecrimes<-hatecrimes %>% 
  filter(state!= "District of Columbia")
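As a rough check on why one observation stands out, the sketch below applies the common 1.5 x IQR rule to the quartiles and largest values of avg_hatecrimes_per_100k_fbi printed in the describe() output above. The choice of this rule is an assumption for illustration; the original analysis does not state how the outlier was identified.

```r
# Quartiles and the five largest values of avg_hatecrimes_per_100k_fbi,
# copied from the describe() output above.
q1 <- 1.2931
q3 <- 3.1843
highest <- c(4.2078896, 4.4132026, 4.7410699, 4.8018993, 10.9534797)

# 1.5 x IQR rule: flag values more than 1.5 * IQR above the third quartile.
upper_fence <- q3 + 1.5 * (q3 - q1)
flagged <- highest[highest > upper_fence]

upper_fence
flagged
```

Only the largest value, about 10.95, lies above the fence of roughly 6.02. This is presumably the District of Columbia row that the filter removes.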

Part 4 - Inference And Analysis

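# Parameter names follow the column order of hatecrimes, so element 10 is
# hate_crimes_per_100k_splc and elements 1 to 9 are the predictors.
responsevariable<- unique(hatecrimes_long$Parameter)[10]

predictorvariables<-unique(hatecrimes_long$Parameter)[1:9]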

plottheme<-theme(panel.background = element_blank(),
          panel.border = element_blank(),
          panel.grid = element_blank())

# Fit a simple linear regression of y on x (column names given as strings),
# then plot the fit, plot the residuals, and print the model summary.
hatecrimeslm<-function(data,x,y){
  linearM<-lm(formula(paste(y,"~",x)), data)
  
  intercept<-round(linearM$coefficients[[1]],2)
  slope<-round(linearM$coefficients[[2]],4)
  # Multiple (unadjusted) R-squared, shown in the plot subtitle
  r2<-round(summary(linearM)$r.squared,2)
  # p-value of the slope coefficient
  p_value<-round(summary(linearM)$coefficients[,4][[2]],3)
  
  
  p1<-ggplot(data = data,mapping = aes_string(x, y))+
    geom_point(pch = 21, color = "black", fill ="skyblue",alpha = 0.7,size =3 )+
    geom_smooth(method = "lm")+
    plottheme+
    labs(subtitle = paste0("Y = ", intercept,"+",slope,"x",
                          "\nR^2 = ", r2,
                          "\nP-Value = ", p_value),
         title = "Linear Model Plot")
  
  resplot<- augment(linearM)
  
  p2<-ggplot(resplot,aes(x = .fitted, y = .resid))+
    geom_point(pch = 21, color = "black", fill ="skyblue",alpha = 0.7,size =3 )+
    geom_segment(aes(x = .fitted,
                     xend =.fitted,
                     y = .resid,
                     yend =0),
                 linetype = 2,
                 color = "red")+
    geom_hline(yintercept = 0)+
    plottheme+
    labs(title = "Residuals Plot")
  
  print(p1)
  print(p2)
  print(summary(linearM))
}

Linear Regression

Median Household Income

hatecrimeslm(hatecrimes,predictorvariables[1],responsevariable)

## 
## Call:
## lm(formula = formula(paste(y, "~", x)), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25922 -0.10473 -0.02883  0.07081  0.53087 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      -2.668e-02  1.553e-01  -0.172   0.8643  
## median_house_inc  5.582e-06  2.810e-06   1.987   0.0532 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1722 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.08232,    Adjusted R-squared:  0.06147 
## F-statistic: 3.947 on 1 and 44 DF,  p-value: 0.0532

Null Hypothesis: There is no relationship between median household income and hate crimes per 100k people
Alternate Hypothesis: There is a relationship between median household income and hate crimes per 100k people

Based on the above, we do notice a slight trend but a low R^2 of 0.08. The p-value is also greater than 0.05, indicating that there is insufficient evidence to reject the null hypothesis, so we cannot conclude dependence.
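The 0.08 quoted above is the Multiple R-squared from the model summary. For a regression with a single predictor, this equals the square of the Pearson correlation coefficient, so the correlation itself can be recovered from R^2 and the sign of the slope. A minimal sketch using the numbers printed in the summary:

```r
# Values copied from the summary output for median_house_inc.
r_squared <- 0.08232   # Multiple R-squared
slope <- 5.582e-06     # estimated slope

# With one predictor, r has the sign of the slope and magnitude sqrt(R^2).
r <- sign(slope) * sqrt(r_squared)
round(r, 2)
```

This gives a correlation of about 0.29, which is still weak but is not the same number as the 0.08 share of variance explained.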

Share of Unemployed

hatecrimeslm(hatecrimes,predictorvariables[2],responsevariable)

## 
## Call:
## lm(formula = formula(paste(y, "~", x)), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20504 -0.13690 -0.04921  0.07256  0.58248 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)        0.3975     0.1387   2.865  0.00637 **
## share_unemp_seas  -2.3729     2.6966  -0.880  0.38367   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1782 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.01729,    Adjusted R-squared:  -0.005041 
## F-statistic: 0.7743 on 1 and 44 DF,  p-value: 0.3837

Null Hypothesis: There is no relationship between the share of unemployed and hate crimes per 100k people
Alternate Hypothesis: There is a relationship between the share of unemployed and hate crimes per 100k people

Based on the above, there is no visible trend and, as expected, a low R^2 of 0.02. The p-value is also much greater than 0.05, indicating that there is insufficient evidence to reject the null hypothesis, so we cannot conclude dependence.
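To make the link between the coefficient table and the p-value explicit, the sketch below recomputes the t statistic and two-sided p-value for share_unemp_seas from the estimate, standard error, and residual degrees of freedom printed in the summary. Because the inputs are rounded, the results agree with the summary only to about three decimal places.

```r
# Values copied from the summary output for share_unemp_seas.
estimate <- -2.3729
std_error <- 2.6966
df_resid <- 44

# t statistic for H0: slope = 0, and its two-sided p-value.
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

round(c(t = t_value, p = p_value), 3)
```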

Share of Population in Metro-area

hatecrimeslm(hatecrimes,predictorvariables[3],responsevariable)

## 
## Call:
## lm(formula = formula(paste(y, "~", x)), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20663 -0.13658 -0.05263  0.06568  0.55253 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      0.25794    0.12514   2.061   0.0452 *
## share_pop_metro  0.02572    0.15992   0.161   0.8730  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1797 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.0005875,  Adjusted R-squared:  -0.02213 
## F-statistic: 0.02587 on 1 and 44 DF,  p-value: 0.873

Null Hypothesis: There is no relationship between the share of the population living in a metro area and hate crimes per 100k people
Alternate Hypothesis: There is a relationship between the share of the population living in a metro area and hate crimes per 100k people

Based on the above, there is no visible trend and, as expected, an R^2 of approximately 0. The p-value is also much greater than 0.05, indicating that there is insufficient evidence to reject the null hypothesis, so we cannot conclude dependence.

Share Population with HS degree

hatecrimeslm(hatecrimes,predictorvariables[4],responsevariable)

## 
## Call:
## lm(formula = formula(paste(y, "~", x)), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24092 -0.10631 -0.01158  0.09437  0.49999 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   -1.6294     0.6190  -2.632  0.01166 * 
## share_pop_hs   2.2023     0.7144   3.083  0.00353 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.163 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.1776, Adjusted R-squared:  0.159 
## F-statistic: 9.505 on 1 and 44 DF,  p-value: 0.003532

Null Hypothesis: There is no relationship between the share of the population with a HS degree and hate crimes per 100k people
Alternate Hypothesis: There is a relationship between the share of the population with a HS degree and hate crimes per 100k people

Based on the above, we do notice a trend, with an R^2 of 0.18. The p-value is less than 0.05, indicating that we have sufficient evidence to reject the null hypothesis, although the relationship is not very strong.
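Because share_pop_hs is stored as a proportion, the slope of 2.2023 describes a change from 0 to 100 percent, which is hard to read directly. The sketch below rescales it to a one percentage point change and evaluates the fitted line at an illustrative share of 0.874 (the median reported by describe() above). The helper name predict_rate is hypothetical.

```r
# Coefficients copied from the summary output for share_pop_hs.
intercept <- -1.6294
slope <- 2.2023

# Hypothetical helper: fitted hate crimes per 100k for a given HS share.
predict_rate <- function(share_hs) intercept + slope * share_hs

# Predicted change for a one percentage point (0.01) increase in the share.
effect_one_point <- slope * 0.01

# Fitted value at an illustrative share of 87.4 percent.
pred_at_median <- predict_rate(0.874)

round(c(per_point = effect_one_point, fitted = pred_at_median), 3)
```

Under this model, each additional percentage point of high school graduates is associated with about 0.022 more hate crimes per 100k people.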

Share Population Non-Citizens

hatecrimeslm(hatecrimes,predictorvariables[5],responsevariable)

## 
## Call:
## lm(formula = formula(paste(y, "~", x)), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21881 -0.12712 -0.05224  0.06825  0.55177 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.25302    0.05362   4.719 2.64e-05 ***
## share_non_citizen  0.40084    0.86619   0.463    0.646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1729 on 42 degrees of freedom
##   (6 observations deleted due to missingness)
## Multiple R-squared:  0.005073,   Adjusted R-squared:  -0.01862 
## F-statistic: 0.2141 on 1 and 42 DF,  p-value: 0.6459

Null Hypothesis: There is no relationship between the share of the population who are non-citizens and hate crimes per 100k people
Alternate Hypothesis: There is a relationship between the share of the population who are non-citizens and hate crimes per 100k people

Based on the above, there is no visible trend and, as expected, a low R^2 of 0.01. The p-value is also much greater than 0.05, indicating that there is insufficient evidence to reject the null hypothesis, so we cannot conclude dependence.

Share of Population with White Poverty

hatecrimeslm(hatecrimes,predictorvariables[6],responsevariable)

## 
## Call:
## lm(formula = formula(paste(y, "~", x)), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21628 -0.14116 -0.05286  0.07532  0.55976 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           0.3447     0.1052   3.277  0.00205 **
## share_white_poverty  -0.7163     1.0870  -0.659  0.51337   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1789 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.009772,   Adjusted R-squared:  -0.01273 
## F-statistic: 0.4342 on 1 and 44 DF,  p-value: 0.5134

Null Hypothesis: There is no relationship between the share of white residents living in poverty and hate crimes per 100k people
Alternate Hypothesis: There is a relationship between the share of white residents living in poverty and hate crimes per 100k people

Based on the above, there is no visible trend and, as expected, a low R^2 of 0.01. The p-value is also much greater than 0.05, indicating that there is insufficient evidence to reject the null hypothesis, so we cannot conclude dependence.

Gini Index

hatecrimeslm(hatecrimes,predictorvariables[7],responsevariable)

## 
## Call:
## lm(formula = formula(paste(y, "~", x)), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20381 -0.14320 -0.04881  0.08871  0.54960 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.8009     0.6917   1.158    0.253
## gini_index   -1.1528     1.5227  -0.757    0.453
## 
## Residual standard error: 0.1786 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.01286,    Adjusted R-squared:  -0.009577 
## F-statistic: 0.5731 on 1 and 44 DF,  p-value: 0.4531

Null Hypothesis: There is no relationship between the Gini index and hate crimes per 100k people
Alternate Hypothesis: There is a relationship between the Gini index and hate crimes per 100k people

Based on the above, there is no visible trend and, as expected, a low R^2 of 0.01. The p-value is also much greater than 0.05, indicating that there is insufficient evidence to reject the null hypothesis, so we cannot conclude dependence.

Share Population Non-White

hatecrimeslm(hatecrimes,predictorvariables[8],responsevariable)

## 
## Call:
## lm(formula = formula(paste(y, "~", x)), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.22449 -0.12993 -0.03772  0.10778  0.53929 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.37907    0.06078   6.236 1.52e-07 ***
## share_non_white -0.32890    0.17881  -1.839   0.0726 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1732 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.07141,    Adjusted R-squared:  0.0503 
## F-statistic: 3.383 on 1 and 44 DF,  p-value: 0.07261

Null Hypothesis: There is no relationship between the share of the population that is non-white and hate crimes per 100k people
Alternate Hypothesis: There is a relationship between the share of the population that is non-white and hate crimes per 100k people

Based on the above, there is a very slight visible trend and a low R^2 of 0.07. The p-value is also slightly greater than 0.05, indicating that there is insufficient evidence to reject the null hypothesis, so we cannot conclude dependence.

Share Population Voted Trump

hatecrimeslm(hatecrimes,predictorvariables[9],responsevariable)

## 
## Call:
## lm(formula = formula(paste(y, "~", x)), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25833 -0.10270 -0.03169  0.04589  0.48816 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.6750     0.1325   5.093 7.08e-06 ***
## share_vote_trump  -0.8057     0.2642  -3.049  0.00387 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1633 on 44 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.1745, Adjusted R-squared:  0.1557 
## F-statistic: 9.299 on 1 and 44 DF,  p-value: 0.003873

Null Hypothesis: There is no relationship between the share of voters who voted for Trump and hate crimes per 100k people
Alternate Hypothesis: There is a relationship between the share of voters who voted for Trump and hate crimes per 100k people

Based on the above, there is a visible negative trend and an R^2 of 0.17. The p-value is also less than 0.05, indicating that there is sufficient evidence to reject the null hypothesis in favor of the alternative, although the model explains only about 17% of the variability in the response.
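A confidence interval for the slope gives a sense of how precisely the relationship is estimated. The sketch below builds a 95% interval for the share_vote_trump slope from the estimate, standard error, and residual degrees of freedom printed in the summary. The 95% level is an assumption chosen to match the 0.05 threshold used throughout.

```r
# Values copied from the summary output for share_vote_trump.
estimate <- -0.8057
std_error <- 0.2642
df_resid <- 44

# Critical t value for a two-sided 95% interval.
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus t_crit standard errors.
ci <- estimate + c(-1, 1) * t_crit * std_error
round(ci, 2)
```

The whole interval lies below zero, which is consistent with rejecting the null hypothesis at the 0.05 level.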

Part 5 - Conclusion

In conclusion, looking at this from a bivariate analysis standpoint (only one response and one predictor variable), the strongest relationship we obtain has an R^2 of 0.18, indicating that as the share of the population with a HS degree increases, the number of hate crimes per 100k people tends to increase. Similarly, and interestingly, we obtain our second strongest R^2 of 0.17 for the share of Trump voters: as the share of voters who voted for Trump within a state increases, the hate crime rate decreases. To me this suggests that states with more partisan divisiveness may have higher rates of hate crimes, so that all red or all blue states would likely have lower hate crime rates, although the straight-line models fitted here do not test this directly.
Both regression slopes had p < 0.05 and are statistically significant. The conditions for inference were also met: linearity, near-normal residuals (after outlier removal), and scattered residuals.
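Since nine separate regressions were tested against the same response, one optional extra check is to adjust the p-values for multiple comparisons. The sketch below applies a Bonferroni correction to the slope p-values printed in the summaries above. This adjustment is not part of the original analysis and is shown only as a sensitivity check.

```r
# Slope p-values copied from the nine model summaries above.
p_values <- c(
  median_house_inc    = 0.0532,
  share_unemp_seas    = 0.3837,
  share_pop_metro     = 0.873,
  share_pop_hs        = 0.003532,
  share_non_citizen   = 0.6459,
  share_white_poverty = 0.5134,
  gini_index          = 0.4531,
  share_non_white     = 0.07261,
  share_vote_trump    = 0.003873
)

# Bonferroni: multiply each p-value by the number of tests (capped at 1).
adjusted <- p.adjust(p_values, method = "bonferroni")

# Predictors that remain significant at the 0.05 level after adjustment.
significant <- names(adjusted)[adjusted < 0.05]
significant
```

Under this adjustment, share_pop_hs and share_vote_trump both stay below 0.05 (about 0.032 and 0.035), so the two findings highlighted above hold up.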

References:

Openintro Statistics, Fourth Edition, David Diez

Data606 Project

Joshua Registe

3/29/2020
