knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
To answer this question, I used the Fatal Police Shootings dataset from openintro.org, a subset of the Washington Post database that records every fatal police shooting by an on-duty officer since January 1, 2015. The dataset contains 6421 cases (rows), each representing one person killed, and 12 columns, including manner_of_death, race, and city. To answer the question, I will look at only two columns: signs_of_mental_illness and armed.
fatal <- read.csv("fatal_police_shootings.csv")
head(fatal)
## date manner_of_death armed age gender race city state
## 1 2015-01-02 shot gun 53 M A Shelton WA
## 2 2015-01-02 shot gun 47 M W Aloha OR
## 3 2015-01-03 shot and Tasered unarmed 23 M H Wichita KS
## 4 2015-01-04 shot toy weapon 32 M W San Francisco CA
## 5 2015-01-04 shot nail gun 39 M H Evans CO
## 6 2015-01-04 shot gun 18 M W Guthrie OK
## signs_of_mental_illness threat_level flee body_camera
## 1 True attack Not fleeing False
## 2 False attack Not fleeing False
## 3 False other Not fleeing False
## 4 True attack Not fleeing False
## 5 False attack Not fleeing False
## 6 False attack Not fleeing False
summary(fatal)
## date manner_of_death armed age
## Length:6421 Length:6421 Length:6421 Min. : 6.00
## Class :character Class :character Class :character 1st Qu.:27.00
## Mode :character Mode :character Mode :character Median :35.00
## Mean :37.09
## 3rd Qu.:45.00
## Max. :91.00
## NA's :285
## gender race city state
## Length:6421 Length:6421 Length:6421 Length:6421
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## signs_of_mental_illness threat_level flee
## Length:6421 Length:6421 Length:6421
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## body_camera
## Length:6421
## Class :character
## Mode :character
##
##
##
##
To analyze the dataset, I first selected the signs_of_mental_illness and armed variables with the select function. I then removed any missing values from these two variables using the filter function. Finally, I created a new variable named armed_status with the mutate function, which simplifies the armed variable into two categories: Unarmed when armed is "unarmed", and Armed for every other value.
shoot <- fatal |>
  # keep only the two variables of interest
  select(signs_of_mental_illness, armed) |>
  # drop rows where either variable is NA
  filter(!is.na(signs_of_mental_illness), !is.na(armed)) |>
  # collapse armed into two categories
  mutate(armed_status = ifelse(armed == "unarmed", "Unarmed", "Armed"))
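One caveat about the missing-value step: the counts in the contingency table further down add up to 6421, the same as the raw row count, so the filter did not remove any rows. By default, read.csv keeps blank fields in character columns as empty strings rather than NA, so is.na() would not catch them. The output above does not confirm whether the real file contains blank entries in armed, so the following is only a sketch on a made-up three-row CSV (the id column and the rows are hypothetical) showing how the behavior could be checked:

```r
# Toy CSV text (hypothetical rows, not taken from the real file).
csv_text <- "id,armed
1,gun
2,
3,unarmed"

# Default import: the blank field is read as "" and is not NA.
raw <- read.csv(text = csv_text)
sum(is.na(raw$armed))   # values that filter(!is.na(armed)) would drop
sum(raw$armed == "")    # blank strings that would remain in the data

# Alternative import that treats blanks as missing.
fixed <- read.csv(text = csv_text, na.strings = c("", "NA"))
sum(is.na(fixed$armed))

# A blank string is not "unarmed", so the recode labels it "Armed".
recoded <- ifelse(raw$armed == "unarmed", "Unarmed", "Armed")
recoded
```

If the real data does contain blanks, running table(fatal$armed == "") would show how many cases are affected. Reading the file with na.strings = c("", "NA") would change the counts reported below.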
Since I’ll be looking at two categorical variables, signs_of_mental_illness and armed_status, I’ll be doing a Chi-Squared Test of Independence. This test will help determine whether there’s a relationship between a person showing signs of mental illness and whether they were armed during a fatal police shooting. I created a contingency table showing the counts for each combination of mental illness and armed status. For visualization, I made a side-by-side bar plot to compare the counts of armed and unarmed individuals based on whether signs of mental illness were present (either True or False).
\(H_0\) : There is no association between signs of mental illness and whether a person was armed.
\(H_a\) : There is an association between signs of mental illness and whether a person was armed.
table(shoot$signs_of_mental_illness, shoot$armed_status)
##
## Armed Unarmed
## False 4617 331
## True 1391 82
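Because the two mental-illness groups differ a lot in size, raw counts are hard to compare directly. As a small sketch, the counts from the table above can be typed into a matrix (the object names counts and row_props are my own) and converted to row proportions with prop.table:

```r
# Counts copied from the contingency table above (filled column by column).
counts <- matrix(
  c(4617, 1391, 331, 82),
  nrow = 2,
  dimnames = list(
    signs_of_mental_illness = c("False", "True"),
    armed_status = c("Armed", "Unarmed")
  )
)

# Share of each mental-illness group that was armed or unarmed.
# margin = 1 makes each row sum to 1.
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

Roughly 6.7% of people with no signs of mental illness were unarmed (331 of 4948), compared with about 5.6% of those with signs of mental illness (82 of 1473).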
barplot(
  table(shoot$signs_of_mental_illness, shoot$armed_status),
  beside = TRUE,
  col = c("green", "red"),
  legend.text = c("No Signs of Mental Illness", "Signs of Mental Illness"),
  # the bar groups on the x-axis are the table columns (Armed, Unarmed)
  xlab = "Armed Status",
  ylab = "Count",
  main = "Mental Illness Status vs Armed Status"
)
# In the bar plot, the red bars represent cases where signs of mental illness were present (True), while the green bars represent cases with no signs of mental illness (False).
chisq.test(table(shoot$signs_of_mental_illness, shoot$armed_status))
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(shoot$signs_of_mental_illness, shoot$armed_status)
## X-squared = 2.1944, df = 1, p-value = 0.1385
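The chi-squared approximation is usually considered reasonable when every expected cell count is at least 5. This check was not part of the output above, so the following is only a sketch that rebuilds the table from the printed counts (the names counts and res are my own) and inspects the expected element returned by chisq.test:

```r
# Counts copied from the contingency table above (filled column by column).
counts <- matrix(
  c(4617, 1391, 331, 82),
  nrow = 2,
  dimnames = list(
    signs_of_mental_illness = c("False", "True"),
    armed_status = c("Armed", "Unarmed")
  )
)

# Same test as above; Yates' correction is applied by default for 2x2 tables.
res <- chisq.test(counts)

# Expected counts under independence: (row total * column total) / N.
res$expected

# Rule-of-thumb check that no expected count falls below 5.
all(res$expected >= 5)

# Statistic and p-value, for comparison with the output above.
res$statistic
res$p.value
```

The smallest expected count is for the True/Unarmed cell (1473 × 413 / 6421, a little under 95), so the rule of thumb is comfortably met.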
After conducting the Chi-Squared Test of Independence, the results show:
a chi-square statistic of 2.19,
1 degree of freedom,
a p-value of 0.1385.
Since the p-value is greater than the significance level of α = 0.05, we fail to reject the null hypothesis. In the context of the question, this means there is not sufficient evidence of an association between signs of mental illness and whether a person was armed during a fatal police shooting.
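A p-value alone does not say how strong an association is. As an optional supplement that was not part of the original analysis, the phi coefficient (which equals Cramér's V for a 2x2 table) can be computed from the uncorrected chi-square statistic. This is a sketch using the printed counts; the names counts, res_uncorrected, and phi are my own:

```r
# Counts copied from the contingency table above (filled column by column).
counts <- matrix(
  c(4617, 1391, 331, 82),
  nrow = 2,
  dimnames = list(
    signs_of_mental_illness = c("False", "True"),
    armed_status = c("Armed", "Unarmed")
  )
)

# Pearson chi-square without Yates' correction, as used in the phi formula.
res_uncorrected <- chisq.test(counts, correct = FALSE)

# phi = sqrt(chi-square / N), where N is the total number of cases.
phi <- sqrt(unname(res_uncorrected$statistic) / sum(counts))
phi
```

The value comes out at about 0.02, which is very close to zero and consistent with failing to reject the null hypothesis.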
This analysis examined whether there is an association between signs of mental illness and whether a person was armed during a fatal police shooting. The Chi-Squared Test of Independence did not find a statistically significant relationship between the two variables (signs_of_mental_illness and armed_status), as the p-value of 0.1385 was greater than the significance level of α = 0.05.
The results do not provide evidence that signs of mental illness are associated with a person’s armed status during a fatal police shooting. Because this analysis was limited to signs of mental illness, future research could explore other variables in the dataset, such as flee (whether the person fled), threat_level (how much of a threat the person posed), and body_camera (whether a body camera was in use), to see whether they are related to armed status. Including a person’s age may also provide additional insight, as it could help examine whether age is related to showing signs of mental illness and, in turn, to armed status.