Siddharth Vangari s3894864
Last updated: 27 May, 2023
A hate crime is any criminal offence that is perceived, by the victim or any other person, to be motivated by hostility or prejudice towards someone based on a personal characteristic.
The police monitor five strands of hate crime: race or ethnicity, religion or belief, sexual orientation, disability, and transgender identity.
Collecting data on these offences helps the police improve their response to them.
Although the data covers all five strands, this report addresses only hate crimes motivated by race and religion.
A two-sample t-test will examine whether the mean number of offences differs significantly between these two motivating factors.
The motivating factors Race and Religion will be compared on the Number_of_offences variable.
Missing values will be counted in the summary statistics, and outliers will be identified using the 1.5 × IQR rule and boxplot visualisations.
Normality will be checked visually using QQ plots.
This data set is available here: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1023568/prc-hate-crime-open-data-121021.ods
It involves variables:
Financial Year: [2011/12, 2012/13, 2013/14, 2014/15, 2015/16, 2016/17, 2017/18, 2018/19, 2019/20, 2020/21] (FACTOR)
Force Name: English and Welsh police forces [Avon and Somerset, Bedfordshire, British Transport Police, Cambridgeshire, …] (FACTOR)
Motivating Factor: The type of hate crime committed [Disability, Race, Religion, Sexual orientation, Transgender identity] (FACTOR)
Number of Offences: Hate crimes reported (NUMERIC)
The data set was imported into R with the read.csv() function, and its first ten rows were displayed with the head() function.
Hatecrime <- read.csv("C:/Users/vanga/OneDrive/Desktop/rmit/Sem 3/Applied Analytics/Final Assignment/hate_crime_data.csv")
knitr::kable(head(Hatecrime, 10))
Financial.Year | Force.Name | Motivating.factor | Number.of.offences |
---|---|---|---|
2011/12 | Avon and Somerset | Disability | 113 |
2011/12 | Bedfordshire | Disability | 6 |
2011/12 | British Transport Police | Disability | 25 |
2011/12 | Cambridgeshire | Disability | 6 |
2011/12 | Cheshire | Disability | 7 |
2011/12 | Cleveland | Disability | 15 |
2011/12 | Cumbria | Disability | 17 |
2011/12 | Derbyshire | Disability | 12 |
2011/12 | Devon and Cornwall | Disability | 7 |
2011/12 | Dorset | Disability | 9 |
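The step that produces Hatecrimefiltered is not shown in the report. A minimal sketch of how it could be done in base R follows, assuming the dots in the column names are replaced with underscores and only the Race and Religion rows are kept. The toy data frame stands in for the imported file, and its Religion counts are made up for illustration.

```r
# Toy stand-in for the imported data; the Religion counts are hypothetical.
Hatecrime <- data.frame(
  Financial.Year     = rep("2011/12", 6),
  Force.Name         = rep(c("Avon and Somerset", "Bedfordshire"), times = 3),
  Motivating.factor  = rep(c("Disability", "Race", "Religion"), each = 2),
  Number.of.offences = c(113, 6, 1241, 266, 40, 12)
)

# Replace the dots that read.csv() puts in column names with underscores.
names(Hatecrime) <- gsub(".", "_", names(Hatecrime), fixed = TRUE)

# Keep only the two motivating factors under study.
keep <- Hatecrime$Motivating_factor %in% c("Race", "Religion")
Hatecrimefiltered <- Hatecrime[keep, ]

# Store the grouping variable as a factor with exactly two levels.
Hatecrimefiltered$Motivating_factor <- factor(Hatecrimefiltered$Motivating_factor)

str(Hatecrimefiltered)
```

Base subsetting keeps the original row numbers, which would be consistent with the filtered table starting at row 45.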
Row | Financial_Year | Force_Name | Motivating_factor | Number_of_offences |
---|---|---|---|---|
45 | 2011/12 | Avon and Somerset | Race | 1241 |
46 | 2011/12 | Bedfordshire | Race | 266 |
47 | 2011/12 | British Transport Police | Race | 1349 |
48 | 2011/12 | Cambridgeshire | Race | 338 |
49 | 2011/12 | Cheshire | Race | 293 |
50 | 2011/12 | Cleveland | Race | 307 |
51 | 2011/12 | Cumbria | Race | 194 |
52 | 2011/12 | Derbyshire | Race | 440 |
53 | 2011/12 | Devon and Cornwall | Race | 737 |
54 | 2011/12 | Dorset | Race | 226 |
On average, far more offences are motivated by race (mean 1433.9) than by religion (mean 120.0).
Both distributions are positively skewed: in each group the mean is well above the median (Race: 1433.9 vs 744; Religion: 120.0 vs 43).
library(dplyr)  # provides %>%, group_by() and summarise()
Hatecrimefiltered %>% group_by(Motivating_factor) %>% summarise(Min = min(Number_of_offences,na.rm = TRUE),
Q1 = quantile(Number_of_offences,probs = .25,na.rm = TRUE),
Median = median(Number_of_offences, na.rm = TRUE),
Q3 = round(quantile(Number_of_offences,probs = .75,na.rm = TRUE),1),
Max = max(Number_of_offences,na.rm = TRUE),
Mean = round(mean(Number_of_offences, na.rm = TRUE),1),
SD = round(sd(Number_of_offences, na.rm = TRUE),1),
n = n(),
Missing = sum(is.na(Number_of_offences))) -> crime_by_Motivating_Factor
knitr::kable(crime_by_Motivating_Factor)
Motivating_factor | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
---|---|---|---|---|---|---|---|---|---|
Race | 51 | 418.5 | 744 | 1437.0 | 21938 | 1433.9 | 2435.3 | 483 | 0 |
Religion | 0 | 19.0 | 43 | 105.5 | 2506 | 120.0 | 290.4 | 483 | 0 |
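The skewness reading of the summary table above can be made concrete with a small arithmetic check. This sketch simply copies the Mean and Median columns from that table and computes their ratio; the helper data frame summary_tbl is introduced here for illustration only.

```r
# Values copied from the summary table above.
summary_tbl <- data.frame(
  Motivating_factor = c("Race", "Religion"),
  Median = c(744, 43),
  Mean   = c(1433.9, 120.0)
)

# A mean well above the median (ratio > 1) points to a long right tail.
summary_tbl$Mean_to_median <- round(summary_tbl$Mean / summary_tbl$Median, 2)
summary_tbl
```

In both groups the mean is roughly two to three times the median, which is what a long right tail produces.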
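library(ggplot2)    # provides ggplot() and geom_histogram()
library(gridExtra)  # provides grid.arrange()
# Create histogram plots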
plot_Race <- ggplot(Hatecrimefiltered[Hatecrimefiltered$Motivating_factor == "Race", ], aes(x = Number_of_offences)) +
geom_histogram(fill = "blue", color = "white", bins = 10) +
labs(x = "Number of Offences", y = "Frequency") +
ggtitle("Histogram of Crimes Due to Race")
plot_Religion <- ggplot(Hatecrimefiltered[Hatecrimefiltered$Motivating_factor == "Religion", ], aes(x = Number_of_offences)) +
geom_histogram(fill = "red", color = "white", bins = 10) +
labs(x = "Number of Offences", y = "Frequency") +
ggtitle("Histogram of Crimes Due to Religion")
# Arrange the histograms side by side
grid.arrange(plot_Race, plot_Religion, nrow = 1)
# Calculate the quartiles and IQR (pooled across both motivating factors)
Q1 <- quantile(Hatecrimefiltered$Number_of_offences, 0.25)
Q3 <- quantile(Hatecrimefiltered$Number_of_offences, 0.75)
iqr_value <- Q3 - Q1  # named iqr_value so the IQR() function is not masked
# Define the lower and upper bounds for outliers
lower_bound <- Q1 - 1.5 * iqr_value
upper_bound <- Q3 + 1.5 * iqr_value
# Remove outliers from the data
HatecrimeCleaned <- Hatecrimefiltered[Hatecrimefiltered$Number_of_offences >= lower_bound & Hatecrimefiltered$Number_of_offences <= upper_bound, ]
# Show the first rows of the cleaned data
head(HatecrimeCleaned)
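# Flag values outside the 1.5 * IQR fences, taking the quartiles from quantile()
is.outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  (x < q[1] - 1.5 * IQR(x)) | (x > q[2] + 1.5 * IQR(x))
}
# Count the values still flagged when the fences are recomputed on the cleaned data
sum(is.outlier(HatecrimeCleaned$Number_of_offences))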
## [1] 45
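The count above is not zero because the quartiles are recomputed on the trimmed data, so the fences move inward and values that were previously inside them can fall outside. A minimal sketch of that effect, using a made-up right-skewed vector and a hypothetical helper flag_outliers():

```r
# Flag values outside the 1.5 * IQR fences (Tukey's rule).
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Made-up right-skewed counts, for illustration only.
x <- c(1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 20, 40, 100)

first_pass <- flag_outliers(x)
trimmed <- x[!first_pass]

sum(first_pass)              # values flagged on the original data
sum(flag_outliers(trimmed))  # values newly flagged after trimming
```

With strongly skewed data, repeating the rule keeps flagging new points, so a single pass is the usual convention.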
boxplot(Number_of_offences~Motivating_factor, data = HatecrimeCleaned, xlab = "Motivating Factor",
ylab = "Number of Offences", main = "Number of Offences Motivated by Race vs Religion", col=c("blue", "pink"))
H0: The mean number of offences motivated by race equals the mean number of offences motivated by religion.
\[H_0: \mu_1 = \mu_2 \]
HA: The mean number of offences motivated by race differs from the mean number of offences motivated by religion.
\[H_A: \mu_1 \ne \mu_2\]
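For reference, a sketch of the statistic behind these hypotheses, assuming the unequal-variance (Welch) form that var.equal = FALSE requests in the t.test() call used later in the report. Subscript 1 is taken to be Race and subscript 2 Religion (an assumption about labelling), with \(\bar{x}_i\), \(s_i\) and \(n_i\) the sample mean, standard deviation and size of group \(i\):
\[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]
\[\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}\]
Under \(H_0\), \(t\) is compared with a t distribution on \(\nu\) degrees of freedom. Because \(\nu\) is an approximation, it is generally not a whole number.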
# Filter the data for offences motivated by race
raceData <- HatecrimeCleaned$Number_of_offences[HatecrimeCleaned$Motivating_factor == "Race"]
qqnorm(raceData)
qqline(raceData)
# Filter the data for offences motivated by religion
religionData <- HatecrimeCleaned$Number_of_offences[HatecrimeCleaned$Motivating_factor == "Religion"]
qqnorm(religionData)
qqline(religionData)
library(car)  # provides leveneTest()
# Levene's Test of Equal Variance
leveneTest(Number_of_offences~Motivating_factor, data = HatecrimeCleaned) %>% as.data.frame()
Levene’s test of equal variance for the number of offences motivated by race and by religion gave p = 2.512278e-63.
Because the p value is below 0.05, the result is statistically significant and the two variances cannot be assumed to be equal.
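To show what the test measures, here is a minimal base R sketch of the median-centred form of Levene's test, which is the default for leveneTest() in the car package: a one-way ANOVA on each value's absolute deviation from its own group median. The data are simulated stand-ins, and the names toy, abs_dev and levene_p are introduced here for illustration only.

```r
set.seed(42)

# Simulated stand-in data: two groups with clearly different spread.
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 100)),
  value = c(rnorm(100, mean = 50, sd = 5), rnorm(100, mean = 50, sd = 25))
)

# Absolute deviation of each value from its own group median.
toy$abs_dev <- abs(toy$value - ave(toy$value, toy$group, FUN = median))

# One-way ANOVA on the deviations; the F test for group is the Levene statistic.
levene_fit <- anova(lm(abs_dev ~ group, data = toy))
levene_p <- levene_fit[["Pr(>F)"]][1]
levene_p
```

A small p value here means the typical deviation from the median differs between groups, that is, the spreads are unequal.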
t.test(
Number_of_offences~Motivating_factor,
data = HatecrimeCleaned,
var.equal = FALSE,
alternative = "two.sided"
)
##
## Welch Two Sample t-test
##
## data: Number_of_offences by Motivating_factor
## t = 25.194, df = 512.64, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Race and group Religion is not equal to 0
## 95 percent confidence interval:
## 574.9778 672.2357
## sample estimates:
## mean in group Race mean in group Religion
## 721.18204 97.57531
## [1] 623.6067
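The confidence interval in the output above can be reproduced from the other printed figures, which is a useful check when reporting the result. The sketch below copies the group means, t statistic and degrees of freedom from that output. The variable names are introduced here for illustration, and small rounding differences are to be expected.

```r
# Figures copied from the t.test() output above.
mean_race     <- 721.18204
mean_religion <- 97.57531
t_stat        <- 25.194
df_welch      <- 512.64

diff_means <- mean_race - mean_religion   # estimated difference in means
std_error  <- diff_means / t_stat         # standard error implied by t
margin     <- qt(0.975, df_welch) * std_error

ci <- c(lower = diff_means - margin, upper = diff_means + margin)
round(ci, 2)
```

The interval is centred on the difference in means (about 623.61). The two group means themselves are not its endpoints.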
A two-sample t-test was used to compare the mean number of hate crimes motivated by race with the mean number motivated by religion.
Although both distributions are non-normal, the samples are large (n = 483 per group before outlier removal), so the central limit theorem supports the use of the two-sample t-test.
Levene’s test for homogeneity of variance indicated that the two groups have unequal variances.
The results of the two-sample t-test not assuming equal variances (Welch) were:
The mean number of hate crimes differed statistically significantly between race and religion.
t(df = 512.64) = 25.19, p < 2.2e-16, estimated difference in means 623.61, 95% CI for the difference in means [574.98, 672.24].
According to the analysis, race-motivated hate crimes are recorded significantly more often than religion-motivated hate crimes.
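The central limit theorem argument in the summary can be illustrated with a short simulation. This is a sketch under stated assumptions: a log-normal distribution stands in for the skewed offence counts (it is not fitted to the data), the sample size of 483 is taken from the summary table, and the skewness() helper is defined here for illustration only.

```r
set.seed(123)

# Sample skewness: third standardised moment.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# One right-skewed sample of the same size as each group in the summary table.
n <- 483
raw_sample <- rlnorm(n, meanlog = 5, sdlog = 1)

# Means of many such samples approximate the sampling distribution of the mean.
sample_means <- replicate(2000, mean(rlnorm(n, meanlog = 5, sdlog = 1)))

c(raw = skewness(raw_sample), means = skewness(sample_means))
```

The sample means are far less skewed than the raw values, which is why the t-test remains reasonable for large samples even when the underlying data are not normal.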