# load data
hate_url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/hate-crimes/hate_crimes.csv"
hate_url <- read.csv(hate_url)
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Is unemployment associated with hate crimes as well? More specifically, is there a relationship between a state's unemployment rate and its hate crime rate?
What are the cases, and how many are there?
Each case represents a state in the United States. There are 51 observations in the data set; only 47 were used because rows with NA values were excluded.
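The row counts above can be checked directly. The following is a minimal sketch on a small made-up data frame (the `toy` object and its values are hypothetical, not taken from the dataset). It assumes only base R and shows how `complete.cases()` identifies the rows that get dropped.

```r
# Hypothetical toy data standing in for the real table; all values are made up.
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  share_unemployed_seasonal = c(0.04, 0.05, 0.06, 0.07),
  hate_crimes_per_100k_splc = c(0.10, NA, 0.30, 0.20),
  stringsAsFactors = FALSE
)

n_total    <- nrow(toy)            # number of rows before filtering
keep       <- complete.cases(toy)  # TRUE where a row has no NA in any column
n_complete <- sum(keep)            # number of rows that survive the filter
dropped    <- toy$state[!keep]     # which cases were excluded

cat("total:", n_total, "complete:", n_complete, "dropped:", dropped, "\n")
```

Applied to the real data frame, the same three quantities would show how many cases were removed and which ones they were.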
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
library(extraoperators)
library(psych)
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
# step 1: keep only the columns needed for the analysis
hate_url <- hate_url %>%
  select("state", "median_household_income", "share_unemployed_seasonal", "hate_crimes_per_100k_splc", "avg_hatecrimes_per_100k_fbi")
# step 2: exclude rows that contain NA values
hate_url <- hate_url[complete.cases(hate_url), ]
# step 3: add a new column that sums the FBI and Southern Poverty Law Center hate crime rates
hate_url$hate_crimes_combine <- hate_url$hate_crimes_per_100k_splc + hate_url$avg_hatecrimes_per_100k_fbi
# step 4: get the median of share_unemployed_seasonal, which is 0.05200
summary(hate_url)
##     state           median_household_income share_unemployed_seasonal
##  Length:47          Min.   :35521           Min.   :0.02900
##  Class :character   1st Qu.:47630           1st Qu.:0.04350
##  Mode  :character   Median :54310           Median :0.05200
##                     Mean   :54802           Mean   :0.05087
##                     3rd Qu.:60598           3rd Qu.:0.05800
##                     Max.   :76165           Max.   :0.07300
##  hate_crimes_per_100k_splc avg_hatecrimes_per_100k_fbi hate_crimes_combine
##  Min.   :0.06745           Min.   : 0.412              Min.   : 0.5324
##  1st Qu.:0.14271           1st Qu.: 1.304              1st Qu.: 1.4788
##  Median :0.22620           Median : 1.937              Median : 2.2272
##  Mean   :0.30409           Mean   : 2.342              Mean   : 2.6460
##  3rd Qu.:0.35693           3rd Qu.: 3.119              3rd Qu.: 3.4408
##  Max.   :1.52230           Max.   :10.953              Max.   :12.4758
# step 5: flag states whose unemployment share is above the median (0.05200)
hate_url$high_unemployed <- hate_url$share_unemployed_seasonal %g% 0.05200
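Step 5 hard-codes the median read off the summary output. As a hedged alternative, the cut point could be computed from the data so that it stays correct if the rows change. The sketch below uses a made-up vector `x` (hypothetical values) and base R's `>`, which is assumed here to behave like the strict greater-than comparison `%g%`. Values equal to the median therefore fall in the `FALSE` group.

```r
# Hypothetical unemployment shares; not taken from the dataset.
x <- c(0.03, 0.04, 0.05, 0.05, 0.06, 0.07)

cut_point <- median(x)    # computed from the data instead of typed in by hand
high      <- x > cut_point  # strictly above the median -> TRUE

table(high)               # counts of FALSE and TRUE
```

With the real data this would correspond to `hate_url$share_unemployed_seasonal > median(hate_url$share_unemployed_seasonal)`, assuming the NA rows have already been removed.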
Describe the method of data collection.
The data come from the FBI and the Southern Poverty Law Center.
The FBI Uniform Crime Reporting (UCR) Program collects hate crime data from law enforcement agencies. The UCR Program collects data only on prosecutable hate crimes, which make up a fraction of hate incidents (hate incidents also include non-prosecutable offenses, such as the circulation of white nationalist recruitment materials on college campuses).
The Southern Poverty Law Center uses media accounts and people’s self-reports to assess the situation.
What type of study is this (observational/experiment)?
This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
https://github.com/fivethirtyeight/data/tree/master/hate-crimes
What is the response variable? Is it quantitative or qualitative?
The response variable is hate_crimes_combine; it is quantitative (numeric).
You should have two independent variables, one quantitative and one qualitative.
high_unemployed is the qualitative variable, and share_unemployed_seasonal is the quantitative variable.
high_unemployed is TRUE if share_unemployed_seasonal is higher than the median, and FALSE otherwise.
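One way the response could later be related to these two explanatory variables is with simple linear models. This is only a sketch under assumptions: the data frame `toy` and its values are hypothetical, and the proposal itself does not commit to this model. With a logical predictor, the slope from `lm()` equals the difference between the two group means.

```r
# Hypothetical data; column names mirror the ones used in this proposal.
toy <- data.frame(
  share_unemployed_seasonal = c(0.03, 0.04, 0.045, 0.05, 0.055, 0.06, 0.065, 0.07),
  hate_crimes_combine       = c(1.2, 2.5, 1.8, 3.1, 2.0, 2.9, 1.5, 3.4)
)
toy$high_unemployed <- toy$share_unemployed_seasonal >
  median(toy$share_unemployed_seasonal)

# Quantitative predictor: slope per unit change in unemployment share.
fit_quant <- lm(hate_crimes_combine ~ share_unemployed_seasonal, data = toy)

# Qualitative predictor: coefficient is the TRUE-minus-FALSE difference in means.
fit_qual <- lm(hate_crimes_combine ~ high_unemployed, data = toy)
group_means <- tapply(toy$hate_crimes_combine, toy$high_unemployed, mean)

coef(fit_quant)
coef(fit_qual)
```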
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
table(hate_url$high_unemployed, useNA='ifany')
##
## FALSE TRUE
## 27 20
describe(hate_url$hate_crimes_combine)
##    vars  n mean  sd median trimmed  mad  min   max range skew kurtosis   se
## X1    1 47 2.65 1.9   2.23    2.43 1.46 0.53 12.48 11.94 2.93    12.61 0.28
describe(hate_url$share_unemployed_seasonal)
##    vars  n mean   sd median trimmed  mad  min  max range skew kurtosis se
## X1    1 47 0.05 0.01   0.05    0.05 0.01 0.03 0.07  0.04 0.03    -0.71  0
describeBy(hate_url$hate_crimes_combine,
           group = hate_url$high_unemployed, mat = TRUE)
##     item group1 vars  n     mean       sd   median  trimmed      mad       min
## X11    1  FALSE    1 27 2.638884 1.262283 2.257538 2.585261 1.545220 0.8855916
## X12    2   TRUE    1 20 2.655671 2.560312 2.120228 2.182719 1.046404 0.5324321
##          max     range     skew   kurtosis        se
## X11  5.43271  4.547118 0.341475 -0.9504367 0.2429265
## X12 12.47578 11.943349 2.753245  7.9135465 0.5725033
ggplot(hate_url, aes(x=hate_crimes_combine)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(hate_url, aes(x=share_unemployed_seasonal)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
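The histograms above use the default of 30 bins, which is what triggers the `stat_bin()` message. The sketch below shows how an explicit `binwidth` could be set, together with the scatter plot and boxplot that the prompt mentions. The data frame `toy` is hypothetical, and the bin width of 0.5 is an arbitrary assumption that would need tuning for the real data.

```r
library(ggplot2)

# Hypothetical data; column names mirror the ones used in this proposal.
toy <- data.frame(
  share_unemployed_seasonal = c(0.03, 0.04, 0.045, 0.05, 0.055, 0.06, 0.065, 0.07),
  hate_crimes_combine       = c(1.2, 2.5, 1.8, 3.1, 2.0, 2.9, 1.5, 3.4)
)
toy$high_unemployed <- toy$share_unemployed_seasonal >
  median(toy$share_unemployed_seasonal)

# Histogram with an explicit bin width (0.5 is an assumed value).
p_hist <- ggplot(toy, aes(x = hate_crimes_combine)) +
  geom_histogram(binwidth = 0.5)

# Scatter plot: response against the quantitative predictor.
p_scatter <- ggplot(toy, aes(x = share_unemployed_seasonal, y = hate_crimes_combine)) +
  geom_point()

# Boxplot: response split by the qualitative predictor.
p_box <- ggplot(toy, aes(x = high_unemployed, y = hate_crimes_combine)) +
  geom_boxplot()
```

Printing `p_hist`, `p_scatter`, or `p_box` renders the corresponding plot.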