Belle Arcus s3951382
Last updated: 26 May, 2022
A hate crime is defined as any criminal offence that is perceived, by the victim or anyone else, to be motivated by hostility or prejudice towards a personal characteristic
The police force primarily focuses on five strands of hate crime: race or ethnicity, religion or beliefs, sexual orientation, disability, and transgender identity
Data on these offences can improve police response
Although the data focuses on five strands of hate crime, only sexual orientation and transgender identity will be analysed in this report
Therefore, rather than focusing on hate crime as a whole, I will be focusing on hate crime within the LGBTQ+ community
In this report, I will use a two-sample t-test to test whether there is a statistically significant difference between the average number of offences motivated by sexual orientation and the average number motivated by transgender identity
Summary statistics will be computed for the Number_of_offences variable, comparing the motivating factors sexual orientation and transgender identity
Outliers will be found using boxplot visualisations and missing values will be removed
QQ plots will be used to visually check for normality
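# fileEncoding = 'UTF-8-BOM' strips the byte-order mark that otherwise corrupts the first column name
hatecrime = read.csv('C:\\Users\\barcus\\OneDrive - RMIT University\\Desktop\\Applied Analytics\\HateCrime.csv', fileEncoding = 'UTF-8-BOM')
knitr::kable(head(hatecrime,5))

| Financial.Year | Force.Name | Motivating.factor | Number.of.offences |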
|---|---|---|---|
| 2011/12 | Avon and Somerset | Disability | 113 |
| 2011/12 | Bedfordshire | Disability | 6 |
| 2011/12 | British Transport Police | Disability | 25 |
| 2011/12 | Cambridgeshire | Disability | 6 |
| 2011/12 | Cheshire | Disability | 7 |
| Financial_Year | Force_Name | Motivating_factor | Number_of_offences |
|---|---|---|---|
| 2011/12 | Avon and Somerset | Transgender identity | 16 |
| 2011/12 | Bedfordshire | Transgender identity | 1 |
| 2011/12 | British Transport Police | Transgender identity | 5 |
| 2011/12 | Cambridgeshire | Transgender identity | 1 |
| 2011/12 | Cheshire | Transgender identity | 5 |
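The code that renames the columns and restricts the data to the two motivating factors is not shown above. The following is a minimal sketch of one way to do it in base R, not necessarily the method used for this report. It builds a small stand-in data frame (`hatecrime_demo`, with made-up rows) instead of reading the CSV, and it assumes that the objects used later in the report (`hatecrimefiltered`, `hatecrimecomplete`, `hatecrime_sexuality`, `hatecrime_trans`) are created roughly like this.

```r
# Stand-in for the raw data: made-up rows, for illustration only
hatecrime_demo <- data.frame(
  Financial.Year     = rep("2011/12", 6),
  Force.Name         = rep(c("Force A", "Force B"), each = 3),
  Motivating.factor  = rep(c("Disability", "Sexual orientation",
                             "Transgender identity"), times = 2),
  Number.of.offences = c(10, 40, NA, 3, 25, 2)
)

# Rename columns to the underscore style used in the rest of the report
names(hatecrime_demo) <- c("Financial_Year", "Force_Name",
                           "Motivating_factor", "Number_of_offences")

# Keep only the two strands analysed here
keep <- c("Sexual orientation", "Transgender identity")
hatecrimefiltered <- hatecrime_demo[hatecrime_demo$Motivating_factor %in% keep, ]

# Drop rows with a missing offence count
hatecrimecomplete <- hatecrimefiltered[!is.na(hatecrimefiltered$Number_of_offences), ]

# One data frame per group, as used by the histograms and QQ plots
hatecrime_sexuality <- subset(hatecrimecomplete, Motivating_factor == "Sexual orientation")
hatecrime_trans     <- subset(hatecrimecomplete, Motivating_factor == "Transgender identity")
```

Keeping the filtered data (with missing values) separate from the complete data lets the summary table report the number of missing values while the plots and tests use complete cases only.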
The average number of offences motivated by sexual orientation (218.1) is higher than the average number of offences motivated by transgender identity (30.3)
In both groups the mean is well above the median, which indicates that the data are positively skewed
hatecrimefiltered %>% group_by(Motivating_factor) %>% summarise(Min = min(Number_of_offences,na.rm = TRUE),
Q1 = quantile(Number_of_offences,probs = .25,na.rm = TRUE),
Median = median(Number_of_offences, na.rm = TRUE),
Q3 = round(quantile(Number_of_offences,probs = .75,na.rm = TRUE),1),
Max = max(Number_of_offences,na.rm = TRUE),
Mean = round(mean(Number_of_offences, na.rm = TRUE),1),
SD = round(sd(Number_of_offences, na.rm = TRUE),1),
n = n(),
Missing = sum(is.na(Number_of_offences))) -> CrimeByMotivatingFactor
knitr::kable(CrimeByMotivatingFactor)

| Motivating_factor | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| Sexual orientation | 5 | 56 | 121 | 240 | 3035 | 218.1 | 341.8 | 440 | 1 |
| Transgender identity | 0 | 6 | 17 | 37 | 292 | 30.3 | 39.8 | 440 | 1 |
plotNormalHistogram(hatecrime_sexuality$Number_of_offences, col = 'red', main = 'Number of Offences Motivated By Sexuality - Distribution', length = 500)
plotNormalHistogram(hatecrime_trans$Number_of_offences, col = 'red', main = 'Number of Offences Motivated By Transgender Identity - Distribution', length = 500)
Note: Missing values (1 of 440 observations in each group) were removed as they made up less than 5% of the data
# Flag values more than 1.5 * IQR below Q1 or above Q3
is.outlier = function(x){(x < summary(x)[2] - 1.5*IQR(x))|(x > summary(x)[5] + 1.5*IQR(x))}
sum(is.outlier(hatecrimecomplete$Number_of_offences))
## [1] 92
boxplot(Number_of_offences~Motivating_factor, data = hatecrimecomplete, xlab = "Motivating Factor",
ylab = "Number of Offences", main = "Number Of Offences Motivated By Sexuality Compared To Transgender Identity", col=c("blue", "pink"))
H0: There is no difference between the mean number of offences motivated by sexual orientation and the mean number motivated by transgender identity
\[H_0: \mu_1 - \mu_2 = 0 \]
HA: There is a difference between the mean number of offences motivated by sexual orientation and the mean number motivated by transgender identity
\[H_A: \mu_1 - \mu_2 \ne 0 \]
Here \(\mu_1\) is the population mean for sexual orientation and \(\mu_2\) is the population mean for transgender identity
hatecrime_sexuality$Number_of_offences %>% qqPlot(dist="norm", main = 'Number Of Offences Motivated By Sexuality')
## [1] 377 421
hatecrime_trans$Number_of_offences %>% qqPlot(dist="norm", main = 'Number Of Offences Motivated By Transgender Identity')
## [1] 377 421
Normality
Neither QQ plot is consistent with a normal distribution: many data points fall outside the 95% confidence envelope, mostly in the right tail, which indicates positive skew
However, because the samples are large (439 non-missing observations out of 440 in each group), I can continue with my test
Central Limit Theorem
Because each sample size is far greater than 30, I can treat the sampling distribution of each group mean as approximately normal, even though the underlying population distribution is not
Therefore, I can still perform a two-sample t-test
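The central limit theorem argument above can be illustrated with a short simulation. This is a sketch on simulated data, not on the hate crime data: the log-normal `population` and its parameters are assumptions chosen only to give a strongly right-skewed variable, and `skew()` is a helper defined here for illustration.

```r
set.seed(1)

# Hypothetical right-skewed population (log-normal), for illustration only
population <- rlnorm(100000, meanlog = 4.5, sdlog = 1.2)

# Means of 2000 repeated samples of 439 observations each
sample_means <- replicate(2000, mean(sample(population, 439)))

# Simple moment-based skewness; values near 0 indicate symmetry
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

c(raw_data = skew(population), sample_means = skew(sample_means))

# Histogram of the sample means, to compare by eye with a normal curve
hist(sample_means, breaks = 40, main = "Simulated sampling distribution of the mean")
```

The skewness of the sample means should come out much smaller than that of the raw values, although it need not be exactly zero: with heavily skewed data the normal approximation is good rather than perfect, even at this sample size.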
\[H_0: \sigma^2_1 = \sigma^2_2 \]
\[H_A: \sigma^2_1 \ne \sigma^2_2 \]
The p value for the Levene’s test of equal variance for the number of offences motivated by sexual orientation compared to transgender identity was p=1.44e-18
Since the p value is less than 0.05, we reject the null hypothesis of equal variances, so equal variance cannot be assumed
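The call that produced this p value is not shown. In R, Levene's test is commonly run with `car::leveneTest()`, but whether that function was used here is an assumption. The sketch below implements the median-centred form of the test (also called the Brown-Forsythe test) in base R, so the mechanics are visible. The helper `levene_median()` and the `offences` values are made up for illustration and are not taken from the hate crime data.

```r
# Hypothetical helper: median-centred Levene (Brown-Forsythe) test
levene_median <- function(y, g) {
  g <- factor(g)
  # Absolute deviation of each value from its own group median
  z <- abs(y - ave(y, g, FUN = median))
  # One-way ANOVA on the deviations; its F test is the Levene test
  fit <- anova(lm(z ~ g))
  c(F = fit[["F value"]][1], p = fit[["Pr(>F)"]][1])
}

# Made-up counts, for illustration only
offences <- c(10, 50, 120, 250, 900, 30, 400, 75,
              1, 5, 17, 30, 60, 8, 12, 40)
factor_label <- rep(c("Sexual orientation", "Transgender identity"), each = 8)

res <- levene_median(offences, factor_label)
res
```

A small p value from this test is evidence that the group variances differ, which is the situation where `var.equal = FALSE` (Welch's t-test) is the appropriate setting in `t.test()`.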
t.test(
Number_of_offences~Motivating_factor,
data = hatecrimecomplete,
var.equal = FALSE,
alternative = "two.sided"
)
##
## Welch Two Sample t-test
##
## data: Number_of_offences by Motivating_factor
## t = 11.432, df = 449.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Sexual orientation and group Transgender identity is not equal to 0
## 95 percent confidence interval:
## 155.4816 220.0355
## sample estimates:
## mean in group Sexual orientation mean in group Transgender identity
## 218.10478 30.34624
## [1] 187.7585
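As a check on the output above, the Welch statistic can be recomputed by hand from the summary statistics. With group means \(\bar{x}_1, \bar{x}_2\), standard deviations \(s_1, s_2\) and sample sizes \(n_1, n_2\), the statistic and its approximate degrees of freedom are

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \]

The sketch below plugs in the means from the t-test output and the rounded standard deviations from the summary table. It assumes 439 usable observations per group (440 minus one missing value), and the variable names are chosen here for illustration. Because the standard deviations are rounded to one decimal place, the results should agree with `t.test()` only to about two decimal places.

```r
# Means from the t-test output, rounded SDs from the summary table
m1 <- 218.10478; s1 <- 341.8; n1 <- 439   # sexual orientation
m2 <- 30.34624;  s2 <- 39.8;  n2 <- 439   # transgender identity

# Squared standard error contributed by each group
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (m1 - m2) / se
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# 95% confidence interval for the difference in means
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df_welch) * se

round(c(t = t_stat, df = df_welch, lower = ci[1], upper = ci[2]), 2)
```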
A two-sample t-test was used to test for a significant difference between the average number of offences motivated by sexual orientation and the average number of offences motivated by transgender identity. While the distribution of the number of offences does not appear normal in either group, the central limit theorem justifies proceeding with a two-sample t-test because both samples are large (439 non-missing observations out of 440 in each group). The Levene’s test of homogeneity of variance indicated that equal variance could not be assumed. The two-sample t-test, not assuming equal variance, found a statistically significant difference between the mean number of offences motivated by sexual orientation and transgender identity, t(df = 449.9) = 11.43, p < 2.2e-16, 95% CI for the difference in means [155.48, 220.04]. The results suggest that the mean number of offences motivated by sexual orientation is significantly higher than the mean number motivated by transgender identity
Between 2011 and 2020, the mean number of offences per police force per financial year motivated by sexual orientation was 218.10, whereas the mean number motivated by transgender identity was 30.35. The difference is large: on average, there were 187.76 more offences motivated by sexual orientation than by transgender identity.
This was supported by a two-sample t-test which produced a p value less than 0.05, allowing us to reject the null hypothesis that there is no difference between the mean number of offences
One strength was the sample size (440 observations for each group): the larger the sample, the more precise the estimates of the population means
A limitation is that the sample might not reflect the population because of reporting bias. Members of minority groups may be less likely to report crimes to the police, so the data only reflect reported cases of hate crime
Future work could expand the data to include not only hate crimes reported to the police but also incidents reported online, for example on Reddit. This could provide a more complete picture of crime motivated by prejudice.
A further analysis into race and gender within these two groups might provide more interesting results
In conclusion, while there are significantly more hate crimes motivated by sexuality than by trans identity, the LGBTQ+ community continually faces prejudice in society