Is there a connection between a person’s race and whether they were armed? I am using the “Fatal Police Shootings” data set from the OpenIntro website. It contains records of every fatal police shooting by an on-duty officer since January 1, 2015, and is a data frame with 6421 rows and 12 variables. I focus on only two variables, race and armed, because they are the only ones needed to answer the question.
Load the libraries
library(tidyverse)
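The step that loads the data is not shown in this write-up. A minimal sketch, assuming the data set is the `fatal_police_shootings` object from the openintro R package and that it is stored under the name `shootings` used in the later code:

```r
# Assumption: the data ships with the openintro package as
# `fatal_police_shootings`; `shootings` is the name used later on.
if (requireNamespace("openintro", quietly = TRUE)) {
  shootings <- openintro::fatal_police_shootings
  # Rows and columns of the loaded data frame
  print(dim(shootings))
}
```

The `requireNamespace()` guard only keeps the sketch from failing when the package is not installed.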
[1] "gun" "unarmed"
[3] "toy weapon" "nail gun"
[5] "knife" ""
[7] "shovel" "vehicle"
[9] "hammer" "hatchet"
[11] "sword" "machete"
[13] "box cutter" "undetermined"
[15] "metal object" "screwdriver"
[17] "lawn mower blade" "flagpole"
[19] "guns and explosives" "cordless drill"
[21] "crossbow" "BB gun"
[23] "metal pole" "Taser"
[25] "metal pipe" "metal hand tool"
[27] "blunt object" "metal stick"
[29] "sharp object" "meat cleaver"
[31] "carjack" "chain"
[33] "contractor's level" "railroad spikes"
[35] "stapler" "beer bottle"
[37] "unknown weapon" "binoculars"
[39] "bean-bag gun" "baseball bat and fireplace poker"
[41] "straight edge razor" "gun and knife"
[43] "ax" "brick"
[45] "baseball bat" "hand torch"
[47] "chain saw" "garden tool"
[49] "scissors" "pole"
[51] "pick-axe" "flashlight"
[53] "baton" "spear"
[55] "chair" "pitchfork"
[57] "hatchet and gun" "rock"
[59] "piece of wood" "pipe"
[61] "glass shard" "motorcycle"
[63] "pepper spray" "metal rake"
[65] "crowbar" "oar"
[67] "machete and gun" "tire iron"
[69] "air conditioner" "pole and knife"
[71] "baseball bat and bottle" "fireworks"
[73] "pen" "chainsaw"
[75] "gun and sword" "gun and car"
[77] "pellet gun" "claimed to be armed"
[79] "incendiary device" "samurai sword"
[81] "bow and arrow" "gun and vehicle"
[83] "vehicle and gun" "wrench"
[85] "walking stick" "barstool"
[87] "grenade" "BB gun and vehicle"
[89] "wasp spray" "air pistol"
[91] "Airsoft pistol" "baseball bat and knife"
[93] "vehicle and machete" "ice pick"
[95] "car, knife and mace" "bottle"
[97] "gun and machete" "microphone"
[99] "knife and vehicle"
Calling unique() on the armed variable lets me see every distinct value it contains. One of them is “”, a blank entry in the data set, so I need to remove it.
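How `unique()` exposes a blank category can be seen on a small made-up vector. The values below are illustrative only and are not taken from the data set:

```r
# Toy stand-in for shootings$armed; the values are illustrative only
armed <- c("gun", "unarmed", "", "knife", "gun", "")

# unique() returns each distinct value once, in order of first appearance
distinct_values <- unique(armed)
print(distinct_values)

# The empty string shows up as its own category
has_blank <- "" %in% distinct_values
print(has_blank)
```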
My first attempt didn’t work: instead of getting rid of the “”, it replaced every space inside the values with NA, which produced things like airNApistol and knifeNAandNAvehicle.
My second attempt turned every value into NA and nothing more.
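The code for these attempts is not shown. One way the first symptom can arise (an assumption, not necessarily what happened here) is replacing every space, rather than the empty string, with the text "NA". The sketch below contrasts that with targeting only the entries that are exactly empty:

```r
# Toy values, illustrative only
armed <- c("air pistol", "", "knife and vehicle")

# Replacing every space with the text "NA" mangles multi-word values
# and leaves the empty string untouched
mangled <- gsub(" ", "NA", armed, fixed = TRUE)
print(mangled)

# Targeting only entries that are exactly "" turns just those into NA
cleaned <- armed
cleaned[cleaned == ""] <- NA
print(cleaned)
```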
Removing the “”
shootings_clean <- shootings |>
  select(race, armed) |>
  # trimws() comes from the website linked at the bottom
  mutate(armed = trimws(armed)) |>
  filter(!is.na(armed), armed != "")
trimws() strips leading and trailing whitespace, so any whitespace-only entries become “”. The filter() step then drops the “” entries, along with any NA values that might be in the variable.
unique(shootings_clean$armed)
[1] "gun" "unarmed"
[3] "toy weapon" "nail gun"
[5] "knife" "shovel"
[7] "vehicle" "hammer"
[9] "hatchet" "sword"
[11] "machete" "box cutter"
[13] "undetermined" "metal object"
[15] "screwdriver" "lawn mower blade"
[17] "flagpole" "guns and explosives"
[19] "cordless drill" "crossbow"
[21] "BB gun" "metal pole"
[23] "Taser" "metal pipe"
[25] "metal hand tool" "blunt object"
[27] "metal stick" "sharp object"
[29] "meat cleaver" "carjack"
[31] "chain" "contractor's level"
[33] "railroad spikes" "stapler"
[35] "beer bottle" "unknown weapon"
[37] "binoculars" "bean-bag gun"
[39] "baseball bat and fireplace poker" "straight edge razor"
[41] "gun and knife" "ax"
[43] "brick" "baseball bat"
[45] "hand torch" "chain saw"
[47] "garden tool" "scissors"
[49] "pole" "pick-axe"
[51] "flashlight" "baton"
[53] "spear" "chair"
[55] "pitchfork" "hatchet and gun"
[57] "rock" "piece of wood"
[59] "pipe" "glass shard"
[61] "motorcycle" "pepper spray"
[63] "metal rake" "crowbar"
[65] "oar" "machete and gun"
[67] "tire iron" "air conditioner"
[69] "pole and knife" "baseball bat and bottle"
[71] "fireworks" "pen"
[73] "chainsaw" "gun and sword"
[75] "gun and car" "pellet gun"
[77] "claimed to be armed" "incendiary device"
[79] "samurai sword" "bow and arrow"
[81] "gun and vehicle" "vehicle and gun"
[83] "wrench" "walking stick"
[85] "barstool" "grenade"
[87] "BB gun and vehicle" "wasp spray"
[89] "air pistol" "Airsoft pistol"
[91] "baseball bat and knife" "vehicle and machete"
[93] "ice pick" "car, knife and mace"
[95] "bottle" "gun and machete"
[97] "microphone" "knife and vehicle"
I reran unique() to check that the blank was removed. The “” no longer appears in the output, so the removal succeeded.
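With nearly a hundred distinct values, scanning the output by eye is easy to get wrong. A small sketch of a programmatic check, shown on a made-up vector standing in for `shootings_clean$armed`:

```r
# Toy stand-in for shootings_clean$armed after cleaning
armed_clean <- c("gun", "unarmed", "knife")

# TRUE only if no empty strings and no NA values remain
no_blanks <- !any(is.na(armed_clean) | armed_clean == "")
print(no_blanks)
```

On the real data, the same expression would be applied to `shootings_clean$armed`.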
summary(shootings_clean)
     race              armed
 Length:6213        Length:6213
 Class :character   Class :character
 Mode  :character   Mode  :character
This gives a summary of the two variables I am using: both are character columns, and 6213 rows remain after removing the blanks.
This step recodes armed into a yes/no variable, is_armed, recording whether the person was armed or not. Without that recoding, I wasn’t able to make the graph.
This gave me a count of every weapon recorded in the data set, which made it easier to see how often a certain weapon was at the scene.
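The recoding code itself is not shown. A minimal sketch of one way to build `is_armed`, assuming that only the literal value "unarmed" counts as not armed (the toy data frame is made up for illustration):

```r
library(dplyr)

# Toy stand-in for shootings_clean; the values are illustrative only
shootings_clean <- data.frame(
  race  = c("W", "B", "H", "W"),
  armed = c("gun", "unarmed", "knife", "unarmed")
)

# Assumption: only the literal value "unarmed" counts as not armed
shootings_clean <- shootings_clean |>
  mutate(is_armed = if_else(armed == "unarmed", "No", "Yes"))

print(shootings_clean)
```

Under this rule, values such as "undetermined", "claimed to be armed" and "toy weapon" would be labeled "Yes", so the choice of rule affects the graph.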
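The counting code is also not shown. One common way to get such a count is `dplyr::count()`, sketched here on made-up values:

```r
library(dplyr)

# Toy stand-in for shootings_clean; the values are illustrative only
shootings_clean <- data.frame(
  race  = c("W", "B", "H", "W", "B"),
  armed = c("gun", "knife", "gun", "unarmed", "gun")
)

# One row per distinct weapon, most frequent first; counts go in column n
weapon_counts <- shootings_clean |>
  count(armed, sort = TRUE)

print(weapon_counts)
```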
ggplot(shootings_clean, aes(x = race, fill = is_armed)) +
  geom_bar() +
  labs(
    title = "Distribution of Armed vs Unarmed by Race",
    x = "Race",
    y = "Count"
  ) +
  theme_minimal()
Finally, this is the graph that I created. It will allow me to answer my question.
Conclusion
The largest number of fatal shootings involved individuals identified as White (W), followed by Black (B) and Hispanic (H) individuals. Fatal shootings of individuals identified as Asian (A), Native American (N), Other (O), or Unknown (not labeled on the x-axis) occur at much lower frequencies in this dataset.
The graph also gives an answer to whether there is a connection between a person’s race and whether they were armed. Blue in the graph means yes, the person was armed, while red means they were unarmed. Most of the people who died were armed, with a few being unarmed, and the graph reflects that. In the graph, Native American and Other individuals were always armed, and Asian individuals were almost always armed. A noticeable number of White and Black individuals were unarmed. This suggests that yes, there is a connection between a person’s race and whether they were armed.
The dataset has some limitations that could affect the answer. Certain groups may be “over-represented,” or there may be bias towards certain groups. Where these people were shot could affect the data as well. In the future, I need to use more than just two variables. I didn’t use the variables flee and signs of mental illness, which could help answer the question, and I need to watch out for geographic trends as well.
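Because the groups differ greatly in size, raw counts make it hard to compare how often each group was armed. A sketch of computing the share armed within each race, using made-up toy values rather than the real data:

```r
# Toy values, illustrative only; not taken from the data set
race     <- c("W", "W", "W", "W", "B", "B", "B", "A")
is_armed <- c("Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes")

# Cross-tabulate counts, then convert each row (race) to proportions
counts <- table(race, is_armed)
shares <- prop.table(counts, margin = 1)

print(round(shares, 2))
```

In the graph itself, `geom_bar(position = "fill")` would draw the same within-group proportions instead of counts.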