This assignment analyzes the Philadelphia Shooting Victims dataset, which is collected and maintained by the Philadelphia Police Department (PPD). The data is published as part of the Open Government and Police Data Initiatives to illustrate the level and context of shooting incidents across Philadelphia. It covers 1/1/15 to the present day and is updated daily by the Statistics Unit of the PPD. I obtained the dataset from opendataphilly.org.
URL to dataset: https://opendataphilly.org/datasets/shooting-victims/
Codebook for the dataset: https://metadata.phila.gov/#home/datasetdetails/5719551277d6389f3005a610/representationdetails/5719551277d6389f3005a614/
The codebook provides general information about the dataset, such as a description, the development process, the date range, and the coordinate system. It also documents each field (column) in the dataset: the field name, its alias, a description, and the data type. For example, the field “DIST” records the district in which each shooting incident took place, and its data type is numeric.
The dataset contained some missing values and required cleaning. To clean it, I deleted only the rows that had missing values in the columns used in this analysis, rather than every row with a missing value anywhere. This preserved as much of the data as possible and ensured that I wasn’t removing anything of importance. I did this with: filter(complete.cases(sex, age, year, race, lat, lng, wound, inside, outside, latino))
Subsetting: Since the dataset ranges from 2015 to 2024, I subset it into a smaller dataset containing only data from 2023 and 2024. I did this mainly to reduce the overplotting that was occurring in the interactive maps. I did this with: filter(shootings_clean, year %in% c(2023, 2024))
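The cleaning and subsetting steps described above can be illustrated on a small made-up data frame. This is only a sketch in base R: the `toy` data and its values are hypothetical, and it uses `complete.cases()` on selected columns directly rather than inside `dplyr::filter()`. It shows why a column left out of the check (here `code`) can still contain NA values after cleaning, which is consistent with the `NA's :5` reported for `code` in the summary output.

```r
# Hypothetical toy data standing in for the real shootings file
toy <- data.frame(
  sex  = c("M", NA, "F", "M"),
  age  = c(25, 31, NA, 40),
  year = c(2022, 2023, 2024, 2023),
  code = c(100, 400, 400, NA)  # deliberately NOT part of the completeness check
)

# Count missing values per column before cleaning
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Keep rows that are complete in the chosen columns only
keep <- complete.cases(toy[, c("sex", "age", "year")])
toy_clean <- toy[keep, ]

# Subset to the years of interest
toy_23_24 <- toy_clean[toy_clean$year %in% c(2023, 2024), ]
print(toy_23_24)
```

Rows 2 and 3 are dropped because `sex` or `age` is missing, while the last row is kept even though its `code` value is NA.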
## the_geom the_geom_webmercator objectid year
## Length:2637 Length:2637 Min. :16236437 Min. :2023
## Class :character Class :character 1st Qu.:16237130 1st Qu.:2023
## Mode :character Mode :character Median :16237796 Median :2023
## Mean :16237795 Mean :2023
## 3rd Qu.:16238459 3rd Qu.:2024
## Max. :16239125 Max. :2024
##
## dc_key code date_ time
## Min. :2.023e+11 Min. : 100.0 Length:2637 Length:2637
## 1st Qu.:2.023e+11 1st Qu.: 100.0 Class :character Class :character
## Median :2.023e+11 Median : 400.0 Mode :character Mode :character
## Mean :2.024e+11 Mean : 405.9
## 3rd Qu.:2.024e+11 3rd Qu.: 400.0
## Max. :2.024e+11 Max. :3700.0
## NA's :5
## race sex age wound
## Length:2637 Length:2637 Min. : 0.00 Length:2637
## Class :character Class :character 1st Qu.:21.00 Class :character
## Mode :character Mode :character Median :29.00 Mode :character
## Mean :30.93
## 3rd Qu.:37.00
## Max. :88.00
##
## officer_involved offender_injured offender_deceased location
## Length:2637 Length:2637 Length:2637 Length:2637
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## latino point_x point_y dist
## Min. :0.000 Min. :-75.27 Min. :39.88 Min. : 1.00
## 1st Qu.:0.000 1st Qu.:-75.19 1st Qu.:39.97 1st Qu.:15.00
## Median :0.000 Median :-75.15 Median :39.99 Median :22.00
## Mean :0.149 Mean :-75.16 Mean :39.99 Mean :20.55
## 3rd Qu.:0.000 3rd Qu.:-75.13 3rd Qu.:40.02 3rd Qu.:25.00
## Max. :1.000 Max. :-74.96 Max. :40.13 Max. :77.00
##
## inside outside fatal lat
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :39.88
## 1st Qu.:0.00000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:39.97
## Median :0.00000 Median :1.0000 Median :0.0000 Median :39.99
## Mean :0.08343 Mean :0.9166 Mean :0.2181 Mean :39.99
## 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:40.02
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :40.13
##
## lng
## Min. :-75.27
## 1st Qu.:-75.19
## Median :-75.15
## Mean :-75.16
## 3rd Qu.:-75.13
## Max. :-74.96
##
The above information is a summary of the dataset after cleaning and subsetting.
The above bar graph depicts the number of shooting victims of each race included in the dataset: Asian, Black, and White. There were significantly more Black victims than Asian or White victims. There were 2,072 Black victims, 544 White victims, and 21 Asian victims.
The above bar graph depicts the number of shooting victims who were and were not of Latino ethnicity. Fewer victims were Latino than not: 393 victims were Latino, while 2,244 were not. Latino is recorded in its own field most likely because it is an ethnicity rather than a race. Victims of any race in the previous graph could also be of Latino ethnicity, which is why a separate category is needed. This data is important for showing how the Latino community in Philadelphia may be impacted by shootings and gun violence.
The bar graph above depicts the numbers of male and female shooting victims. There are significantly more male victims than female victims: 354 female victims and 2,283 male victims. This could possibly be attributed to more males being involved in gang violence and crime; however, more analysis would need to be conducted to conclude this.
The above bar graph depicts the number of shooting victims reported in 2023 and 2024. Interestingly, 2023 had a greater number of shooting victims than 2024: 1,641 in 2023 compared with 996 so far in 2024. Some of this difference may be because the 2024 data was still incomplete at the time of this analysis.
The above bar graph depicts the number of shooting victims who were shot inside or outside of a building/home. Far more victims were shot outside than inside: 220 victims were shot inside a building/home, while 2,417 were shot outside.
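The counts behind bar graphs like the ones described above can be tabulated with `table()` and drawn with a bar chart. The following is a minimal sketch in base R on hypothetical toy data. The original graphs may well have been produced with ggplot2, so this is an illustration of the idea rather than the code used for the report.

```r
# Hypothetical toy data; the real analysis uses the cleaned 2023-2024 subset
toy <- data.frame(
  race   = c("B", "B", "W", "A", "B", "W"),
  inside = c(0, 0, 1, 0, 0, 1)
)

# Number of victims in each category (these are the bar heights)
race_counts   <- table(toy$race)
inside_counts <- table(toy$inside)
print(race_counts)
print(inside_counts)

# Simple bar chart of victims by race
barplot(race_counts,
        xlab = "Race",
        ylab = "Number of victims",
        main = "Shooting victims by race (toy data)")
```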
The interactive map above shows all Philadelphia shooting victims from 2023-2024. Each red point represents an individual shooting victim. By clicking on a point, viewers can see the address of the shooting, the date of the shooting, the race of the victim, and whether or not the shooting was fatal (0 = No, 1 = Yes). Viewers can also see clusters of points marking locations where shooting incidents may be more common. For example, points cluster along both the Market and Broad Street subway lines, and clustering is denser in the North and West portions of the city.
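A map with clickable points like the one described above could be sketched as follows. This assumes the leaflet package, which is my assumption and may not be the package used for the original map. The two rows of `toy` data, including the addresses, are hypothetical. The column names (`location`, `date_`, `race`, `fatal`, `lat`, `lng`) match those shown in the dataset summary.

```r
# Hypothetical toy rows using the dataset's column names
toy <- data.frame(
  location = c("100 BLOCK EXAMPLE ST", "200 BLOCK SAMPLE AVE"),
  date_    = c("2023-01-15", "2024-03-02"),
  race     = c("B", "W"),
  fatal    = c(0, 1),
  lat      = c(39.99, 39.95),
  lng      = c(-75.15, -75.20)
)

# Build the HTML text shown when a point is clicked
toy$popup <- paste0(
  "Address: ", toy$location,
  "<br>Date: ", toy$date_,
  "<br>Race: ", toy$race,
  "<br>Fatal (0 = No, 1 = Yes): ", toy$fatal
)

# Only build the map if leaflet is installed (assumed package)
if (requireNamespace("leaflet", quietly = TRUE)) {
  m <- leaflet::leaflet(toy)
  m <- leaflet::addTiles(m)  # default OpenStreetMap background
  m <- leaflet::addCircleMarkers(m, lng = ~lng, lat = ~lat,
                                 popup = ~popup, color = "red", radius = 4)
}
```

Printing `m` in an interactive session renders the map. With thousands of points, marker clustering or a smaller radius may help with the overplotting mentioned earlier.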
Hypothesis 1: There is a significant difference in the proportion of fatal incidents between male and female victims. Null Hypothesis: There is no relationship between incident fatality and victim sex.
Hypothesis 2: There is a significant difference in the proportion of fatal incidents between victims of different races. Null Hypothesis: There is no relationship between incident fatality and victim race.
Hypothesis 3: There is a significant relationship between incident fatality and whether the shooting occurred inside or outside. Null Hypothesis: There is no relationship between incident fatality and whether the incident occurred inside or outside.
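Hypothesis 1 Results: The p-value (0.009724) is less than 0.05, so the null hypothesis is rejected: there is a statistically significant relationship between fatality and the sex of the shooting victim. In the contingency table, 58 of 354 female victims (16.4%) and 517 of 2,283 male victims (22.6%) were fatally shot.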
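library(dplyr)
# Load the raw shooting victims data
shootings <- read.csv("shootings.csv")
# Drop rows with missing values in the columns used in this analysis
shootings_clean <- shootings %>%
  filter(complete.cases(sex, age, year, race, lat, lng, wound, inside, outside, latino))
# Keep only incidents from 2023 and 2024
shootings23_24 <- filter(shootings_clean, year %in% c(2023, 2024))
# Remove any rows missing either test variable
HTshootings <- shootings23_24 %>% filter(!is.na(fatal) & !is.na(sex))
# Cross-tabulate fatality (rows) by sex (columns)
contingency_table <- table(HTshootings$fatal, HTshootings$sex)
print(contingency_table)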
##
## F M
## 0 296 1766
## 1 58 517
chi_squared_test <- chisq.test(contingency_table)
print(chi_squared_test)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: contingency_table
## X-squared = 6.6847, df = 1, p-value = 0.009724
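To make the Hypothesis 1 result easier to interpret, the test can be reproduced directly from the counts printed in the contingency table above, together with the share of fatal shootings for each sex. This is a sketch in base R. The matrix is typed in by hand from the printed table, and the object names are my own.

```r
# Counts copied from the printed contingency table (rows = fatal, columns = sex)
sex_table <- matrix(c(296, 58, 1766, 517), nrow = 2,
                    dimnames = list(fatal = c("0", "1"), sex = c("F", "M")))

# Share of victims in each sex who were fatally shot (column proportions)
fatal_share <- prop.table(sex_table, margin = 2)["1", ]
print(round(fatal_share, 3))

# Chi-squared test with Yates' continuity correction (the default for 2x2 tables)
sex_test <- chisq.test(sex_table)
print(sex_test$expected)  # counts expected if fatality and sex were unrelated
print(sex_test)
```

The column proportions show the direction of the relationship, which the p-value alone does not: the fatal share is higher for male victims than for female victims.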
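Hypothesis 2 Results: The p-value (0.5881) is greater than 0.05, so the null hypothesis is not rejected: there is no statistically significant evidence of a relationship between race and incident fatality. Fisher's exact test was used here instead of a chi-squared test because the Asian category is small (21 victims, 3 of them fatal), which makes the chi-squared approximation less reliable.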
# Reload and clean the data as for Hypothesis 1
shootings <- read.csv("shootings.csv")
shootings_clean <- shootings %>%
  filter(complete.cases(sex, age, year, race, lat, lng, wound, inside, outside, latino))
shootings23_24 <- filter(shootings_clean, year %in% c(2023, 2024))
# Remove any rows missing either test variable
HTshootings2 <- shootings23_24 %>% filter(!is.na(fatal) & !is.na(race))
# Cross-tabulate fatality (rows) by race (columns)
contingency_table2 <- table(HTshootings2$fatal, HTshootings2$race)
print(contingency_table2)
##
## A B W
## 0 18 1625 419
## 1 3 447 125
fisher_test <- fisher.test(contingency_table2)
print(fisher_test)
##
## Fisher's Exact Test for Count Data
##
## data: contingency_table2
## p-value = 0.5881
## alternative hypothesis: two.sided
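The choice of Fisher's exact test for race can be checked by looking at the expected cell counts. A common rule of thumb is that the chi-squared approximation is questionable when any expected count is below 5. The sketch below, in base R, rebuilds the table from the printed counts and applies that check. The matrix and object names are my own, and the rule of thumb is a guideline rather than a strict requirement.

```r
# Counts copied from the printed contingency table (rows = fatal, columns = race)
race_table <- matrix(c(18, 3, 1625, 447, 419, 125), nrow = 2,
                     dimnames = list(fatal = c("0", "1"), race = c("A", "B", "W")))

# Expected counts under independence; warnings are suppressed because
# chisq.test() itself flags the small expected count
expected <- suppressWarnings(chisq.test(race_table))$expected
print(round(expected, 2))

# TRUE if any cell falls below the usual threshold of 5
small_cells <- any(expected < 5)
print(small_cells)

# Fisher's exact test does not rely on the large-sample approximation
race_test <- fisher.test(race_table)
print(race_test$p.value)
```

The expected number of fatal shootings among Asian victims is 21 × 575 / 2637, or about 4.58, which is just under the threshold.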
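Hypothesis 3 Results: The p-value (0.001578) is less than 0.05, so the null hypothesis is rejected: there is a statistically significant relationship between the incident environment (inside/outside) and fatality. In the contingency table, 67 of 220 victims shot inside (30.5%) were fatally shot, compared with 508 of 2,417 victims shot outside (21.0%).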
# Reload and clean the data as for Hypothesis 1
shootings <- read.csv("shootings.csv")
shootings_clean <- shootings %>%
  filter(complete.cases(sex, age, year, race, lat, lng, wound, inside, outside, latino))
shootings23_24 <- filter(shootings_clean, year %in% c(2023, 2024))
# Remove any rows missing either test variable
HTshootings3 <- shootings23_24 %>% filter(!is.na(fatal) & !is.na(inside))
# Cross-tabulate fatality (rows) by inside indicator (columns: 0 = outside, 1 = inside)
contingency_table3 <- table(HTshootings3$fatal, HTshootings3$inside)
print(contingency_table3)
##
## 0 1
## 0 1909 153
## 1 508 67
# Use a separate name so the Hypothesis 1 result is not overwritten
chi_squared_test3 <- chisq.test(contingency_table3)
print(chi_squared_test3)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: contingency_table3
## X-squared = 9.9855, df = 1, p-value = 0.001578
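A significant p-value says that a relationship exists but not how strong it is. One simple way to describe the size of the inside/outside difference is the sample odds ratio, sketched below in base R from the printed counts. The matrix and object names are my own, and the odds ratio is an addition for illustration, not part of the original analysis.

```r
# Counts copied from the printed contingency table
# (rows = fatal, columns = inside; 0 = outside, 1 = inside)
place_table <- matrix(c(1909, 508, 153, 67), nrow = 2,
                      dimnames = list(fatal = c("0", "1"), inside = c("0", "1")))

# Odds of a fatal outcome for shootings inside and outside
odds_inside  <- place_table["1", "1"] / place_table["0", "1"]
odds_outside <- place_table["1", "0"] / place_table["0", "0"]

# Sample odds ratio: values above 1 mean higher odds of fatality inside
odds_ratio <- odds_inside / odds_outside
print(round(odds_ratio, 2))

# The chi-squared test on the same table, for reference
place_test <- chisq.test(place_table)
print(place_test)
```

An odds ratio of roughly 1.65 means the odds of a shooting being fatal were about 65% higher for victims shot inside than for victims shot outside in this sample.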
Overall, this dataset on Philadelphia shooting victims was really revealing. With further expertise, there is much more that could still be learned and depicted. From what I was able to complete, I learned that in 2023 and 2024 there were more Black shooting victims than victims of any other race, and more male victims than female victims. There were more shooting victims in 2023 than there have been in 2024, and more people were shot outside than inside a building or home.
The mapping I was able to complete also revealed a lot about the possible shooting hotspots throughout the city of Philadelphia. As mentioned earlier, we see clustering around the subway lines and within more socioeconomically disadvantaged areas. There is definitely further analysis that can be conducted and depicted through graphing and mapping. For example, a map could be created with differently colored points that correspond to the race of each shooting victim. There are almost endless possibilities for what can be done.
Some of the ethical concerns that may result from the use of this data for analysis have to do with the validity of the hypothesis tests. There is a possibility that I conducted them incorrectly or left missing values within the data that could have swayed the results. There may also be issues with how the data is entered by the PPD. Human error exists, so it's possible that some entries contain incorrect information or inconsistent formatting.
Fogarty, B. (2023). Quantitative Social Science Data with R (2nd ed.). SAGE Publications, Ltd. (UK). https://bookshelf.vitalsource.com/books/9781529614237
Philadelphia Police Department. (2015, January 1). Shooting victims. OpenDataPhilly. https://opendataphilly.org/datasets/shooting-victims/