Overview

This assignment analyzes the Philadelphia Shooting Victims dataset, which is collected and published by the Philadelphia Police Department (PPD). The data is released as part of the Open Government and Police Data Initiatives and is intended to illustrate the level and context of shooting incidents across Philadelphia. It covers 1/1/15 through the present day and is updated daily by the Statistics Unit of the PPD. I obtained the dataset from opendataphilly.org.

URL to dataset: https://opendataphilly.org/datasets/shooting-victims/

Codebook for the dataset: https://metadata.phila.gov/#home/datasetdetails/5719551277d6389f3005a610/representationdetails/5719551277d6389f3005a614/

The codebook provides general information about the dataset, such as a description, the development process, the date range, and the coordinate system. It also documents each field (column) in the dataset: the field name, its alias, a description, and the data type. For example, the field “DIST” identifies the district in which each shooting incident took place, and its data type is numeric.

Preparing the Dataset

Cleaning and Subsetting

The dataset contained some missing values and required cleaning. To clean it, I deleted only the rows that had missing values in certain columns, rather than every row with any missing value. I did this to preserve as much data as possible and to ensure that I wasn’t removing anything of importance. I did this with: filter(complete.cases(sex, age, year, race, lat, lng, wound, inside, outside, latino))

Subsetting: Since the dataset ranges from 2015-2024, I decided to subset it into a smaller dataset containing only data from 2023 and 2024. I did this mainly to reduce the overclustering of points that was occurring in the interactive maps. I did this with: filter(shootings_clean, year %in% c(2023, 2024))
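The two steps can be sketched on a small made-up data frame. This is only an illustration: `toy` and every value in it are hypothetical (not rows from the PPD dataset), and only a few of the real columns are included.

```r
library(dplyr)

# Hypothetical toy data, NOT real records; only a few of the dataset's columns.
toy <- data.frame(
  year = c(2021, 2023, 2024, 2023, 2024),
  sex  = c("M", "F", NA, "M", "M"),
  age  = c(30, 25, 41, NA, 19),
  race = c("B", "W", "B", "A", "B")
)

# Step 1: keep rows that have no missing value in the selected columns.
toy_clean <- toy %>%
  filter(complete.cases(sex, age, race))

# Step 2: keep only incidents from 2023 and 2024.
toy23_24 <- filter(toy_clean, year %in% c(2023, 2024))

print(toy23_24)
```

Rows with a missing `sex` or `age` are dropped in the first step, and the 2021 row is dropped in the second, so only rows that are both complete and recent remain.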

Overview of the Dataset

##    the_geom         the_geom_webmercator    objectid             year     
##  Length:2637        Length:2637          Min.   :16236437   Min.   :2023  
##  Class :character   Class :character     1st Qu.:16237130   1st Qu.:2023  
##  Mode  :character   Mode  :character     Median :16237796   Median :2023  
##                                          Mean   :16237795   Mean   :2023  
##                                          3rd Qu.:16238459   3rd Qu.:2024  
##                                          Max.   :16239125   Max.   :2024  
##                                                                           
##      dc_key               code           date_               time          
##  Min.   :2.023e+11   Min.   : 100.0   Length:2637        Length:2637       
##  1st Qu.:2.023e+11   1st Qu.: 100.0   Class :character   Class :character  
##  Median :2.023e+11   Median : 400.0   Mode  :character   Mode  :character  
##  Mean   :2.024e+11   Mean   : 405.9                                        
##  3rd Qu.:2.024e+11   3rd Qu.: 400.0                                        
##  Max.   :2.024e+11   Max.   :3700.0                                        
##                      NA's   :5                                             
##      race               sex                 age           wound          
##  Length:2637        Length:2637        Min.   : 0.00   Length:2637       
##  Class :character   Class :character   1st Qu.:21.00   Class :character  
##  Mode  :character   Mode  :character   Median :29.00   Mode  :character  
##                                        Mean   :30.93                     
##                                        3rd Qu.:37.00                     
##                                        Max.   :88.00                     
##                                                                          
##  officer_involved   offender_injured   offender_deceased    location        
##  Length:2637        Length:2637        Length:2637        Length:2637       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      latino         point_x          point_y           dist      
##  Min.   :0.000   Min.   :-75.27   Min.   :39.88   Min.   : 1.00  
##  1st Qu.:0.000   1st Qu.:-75.19   1st Qu.:39.97   1st Qu.:15.00  
##  Median :0.000   Median :-75.15   Median :39.99   Median :22.00  
##  Mean   :0.149   Mean   :-75.16   Mean   :39.99   Mean   :20.55  
##  3rd Qu.:0.000   3rd Qu.:-75.13   3rd Qu.:40.02   3rd Qu.:25.00  
##  Max.   :1.000   Max.   :-74.96   Max.   :40.13   Max.   :77.00  
##                                                                  
##      inside           outside           fatal             lat       
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :39.88  
##  1st Qu.:0.00000   1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:39.97  
##  Median :0.00000   Median :1.0000   Median :0.0000   Median :39.99  
##  Mean   :0.08343   Mean   :0.9166   Mean   :0.2181   Mean   :39.99  
##  3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:40.02  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :40.13  
##                                                                     
##       lng        
##  Min.   :-75.27  
##  1st Qu.:-75.19  
##  Median :-75.15  
##  Mean   :-75.16  
##  3rd Qu.:-75.13  
##  Max.   :-74.96  
## 

The above information is a summary of the dataset after cleaning and subsetting.

The above bar graph depicts the number of shooting victims of each race included in the dataset: Asian, Black, and White. We see that there are far more Black victims than Asian or White victims. There were 2,072 Black victims, 544 White victims, and 21 Asian victims.
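To put these counts on a common scale, they can be converted to percentages of all victims in the subset. The sketch below only re-uses the three counts reported above; the object names are my own.

```r
# Counts of victims by race, as reported in the text above.
race_counts <- c(Asian = 21, Black = 2072, White = 544)

# Convert counts to percentages of all victims in the 2023-2024 subset.
race_pct <- round(100 * prop.table(race_counts), 1)

print(sum(race_counts))  # total number of victims in the subset
print(race_pct)
```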

The above bar graph depicts the number of shooting victims who were and were not of Latino ethnicity. We see that fewer victims were Latino than were not: about 393 victims were Latino, while about 2,244 were not. Latino is recorded as its own category most likely because it is an ethnicity rather than a race. Victims of any of the races in the previous graph could also be of Latino ethnicity, which is why a separate category is needed. This data is important for depicting how the Latino community in Philadelphia may be impacted by shootings and gun violence.

The bar graph above depicts the numbers of male and female shooting victims. We clearly see that there are far more male victims than female victims: 2,283 male victims compared with 354 female victims. This could possibly be attributed to more males being involved in gang violence and crime; however, more analysis would need to be conducted to conclude this.

The above bar graph depicts the number of shooting victims reported in 2023 and 2024. Interestingly, we see that 2023 had a greater number of shooting victims than 2024. 2023 had a total of 1,641 shooting victims, while 2024 has had a total of 996 so far. Some of this difference may be due to the fact that 2024 is not over yet.

The above bar graph depicts the number of shooting victims who were shot inside or outside of a building/home. We clearly see that more victims were shot outside than inside: 220 victims were shot inside a building/home, while 2,417 were shot outside.

The interactive map of all Philadelphia shooting victims from 2023-2024 is depicted above. Each red point represents an individual shooting victim. By clicking on a point, viewers can see the address of the shooting, the date of the shooting, the race of the victim, and whether or not the shooting was fatal (0=No, 1=Yes). Viewers can also see where points cluster, indicating locations where shooting incidents may be more common. For example, there are points clustered along both the Market and Broad Street subway lines. There is also denser clustering in the North and West portions of the city.
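A map of this kind can be built with the leaflet package. The following is a minimal sketch rather than the exact code behind the map above: the two rows in `toy_points` are hypothetical placeholders (not real records), and I am assuming the popup fields come from the `location`, `date_`, `race`, and `fatal` columns listed in the dataset summary.

```r
library(leaflet)

# Hypothetical example rows (NOT real records). Column names follow the
# dataset summary above: location, date_, race, fatal, lat, lng.
toy_points <- data.frame(
  location = c("Example Block A", "Example Block B"),
  date_    = c("2023-01-01", "2024-01-01"),
  race     = c("B", "W"),
  fatal    = c(0, 1),
  lat      = c(39.99, 39.97),
  lng      = c(-75.15, -75.19)
)

# One red circle marker per victim, with a popup built from the row's fields.
shooting_map <- leaflet(toy_points) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    color = "red", radius = 3,
    popup = ~paste0(
      "Address: ", location, "<br>",
      "Date: ", date_, "<br>",
      "Race: ", race, "<br>",
      "Fatal (0=No, 1=Yes): ", fatal
    )
  )

# Print `shooting_map` in an interactive session (or a knitted document)
# to display the map.
```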

Hypotheses

  1. Hypothesis: There is a significant difference in the proportion of fatal incidents between males and females. Null Hypothesis: There is no relationship between incident fatality and victim gender.

  2. Hypothesis: There is a significant difference in the proportion of fatal incidents between victims of different races. Null Hypothesis: There is no relationship between incident fatality and victim race.

  3. Hypothesis: There is a significant relationship between incident fatality and whether the shooting occurred inside or outside. Null Hypothesis: There is no relationship between incident fatality and whether the incident occurred inside or outside.
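To make "no relationship" concrete, the sketch below computes the cell counts we would expect under the null hypothesis for Hypothesis 1. It uses the fatal-by-sex counts reported in the Hypothesis Testing section; the object names are my own and are only for illustration.

```r
# Observed fatal-by-sex counts, copied from the contingency table reported
# in the Hypothesis Testing section (rows: fatal 0/1, columns: F/M).
observed <- matrix(
  c(296, 58, 1766, 517),
  nrow = 2,
  dimnames = list(fatal = c("0", "1"), sex = c("F", "M"))
)

# Under the null hypothesis (no relationship), each expected cell count is
# (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

print(round(expected, 1))

# Positive values mean more victims were observed than the null predicts.
print(round(observed - expected, 1))
```

The chi-squared test used below measures how far the observed counts are from these expected counts.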

Hypothesis Testing

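Hypothesis 1 Results: The p-value (0.0097) is less than 0.05, so we reject the null hypothesis: there is a statistically significant relationship between fatality and the gender of the shooting victim. In the contingency table below, 517 of 2,283 male victims (22.6%) were fatally shot, compared with 58 of 354 female victims (16.4%).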

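library(dplyr)  # provides filter() and the %>% pipe
# Load the raw data and drop rows with missing values in the key columns.
shootings <- read.csv("shootings.csv")
shootings_clean <- shootings %>%
  filter(complete.cases(sex, age, year, race, lat, lng, wound, inside, outside, latino))
# Keep only incidents from 2023 and 2024.
shootings23_24 <- filter(shootings_clean, year %in% c(2023, 2024))
# Cross-tabulate fatality (rows: 0 = non-fatal, 1 = fatal) by victim sex (columns).
HTshootings <- shootings23_24 %>% filter(!is.na(fatal) & !is.na(sex))
contingency_table <- table(HTshootings$fatal, HTshootings$sex)
print(contingency_table)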
##    
##        F    M
##   0  296 1766
##   1   58  517
chi_squared_test <- chisq.test(contingency_table)
print(chi_squared_test)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  contingency_table
## X-squared = 6.6847, df = 1, p-value = 0.009724
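The test statistic alone does not show the direction of the difference. As a rough sketch, the table can be rebuilt from the counts printed above (so the example runs without the CSV file) and converted to within-sex proportions; `fatal_by_sex` is a name I introduce here only for illustration.

```r
# Rebuild the fatal-by-sex table from the counts printed above
# (rows: fatal 0/1, columns: F/M).
fatal_by_sex <- matrix(
  c(296, 58, 1766, 517),
  nrow = 2,
  dimnames = list(fatal = c("0", "1"), sex = c("F", "M"))
)

# Share of non-fatal and fatal incidents within each sex (columns sum to 1).
col_props <- prop.table(fatal_by_sex, margin = 2)
print(round(col_props, 3))

# Re-running the test on the rebuilt table should reproduce the result above.
result <- chisq.test(fatal_by_sex)
print(result)
```

The row labelled 1 in `col_props` gives the share of victims of each sex whose shooting was fatal.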

Hypothesis 2 Results: The p-value (0.5881) is greater than 0.05, so we fail to reject the null hypothesis: the data show no statistically significant relationship between race and incident fatality.

# Same loading, cleaning, and subsetting steps as for Hypothesis 1.
shootings <- read.csv("shootings.csv")
shootings_clean <- shootings %>%
  filter(complete.cases(sex, age, year, race, lat, lng, wound, inside, outside, latino))
shootings23_24 <- filter(shootings_clean, year %in% c(2023, 2024))
# Cross-tabulate fatality (rows) by victim race (columns: A = Asian, B = Black, W = White).
HTshootings2 <- shootings23_24 %>% filter(!is.na(fatal) & !is.na(race))
contingency_table2 <- table(HTshootings2$fatal, HTshootings2$race)
print(contingency_table2)
##    
##        A    B    W
##   0   18 1625  419
##   1    3  447  125
fisher_test <- fisher.test(contingency_table2)
print(fisher_test)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  contingency_table2
## p-value = 0.5881
## alternative hypothesis: two.sided
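A common reason to prefer Fisher's exact test over the chi-squared test is that some expected cell counts are small (a usual rule of thumb is below 5). The sketch below checks whether that applies here by rebuilding the table from the counts printed above; `fatal_by_race` is an illustrative name, not an object from the original analysis.

```r
# Rebuild the fatal-by-race table from the counts printed above
# (rows: fatal 0/1, columns: A/B/W).
fatal_by_race <- matrix(
  c(18, 3, 1625, 447, 419, 125),
  nrow = 2,
  dimnames = list(fatal = c("0", "1"), race = c("A", "B", "W"))
)

# chisq.test() warns when expected counts are small; the warning is
# suppressed here because only the expected counts are of interest.
expected <- suppressWarnings(chisq.test(fatal_by_race))$expected
print(round(expected, 2))

# TRUE if at least one cell falls below the rule-of-thumb threshold of 5.
print(any(expected < 5))

# Fisher's exact test does not rely on the large-sample approximation.
print(fisher.test(fatal_by_race)$p.value)
```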

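Hypothesis 3 Results: The p-value (0.0016) is less than 0.05, so we reject the null hypothesis: there is a statistically significant relationship between the incident environment (inside/outside) and fatality. In the contingency table below, 67 of 220 victims shot inside (30.5%) were fatally shot, compared with 508 of 2,417 victims shot outside (21.0%).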

# Same loading, cleaning, and subsetting steps as for Hypothesis 1.
shootings <- read.csv("shootings.csv")
shootings_clean <- shootings %>%
  filter(complete.cases(sex, age, year, race, lat, lng, wound, inside, outside, latino))
shootings23_24 <- filter(shootings_clean, year %in% c(2023, 2024))
# Cross-tabulate fatality (rows) by whether the shooting was inside (columns: 0 = outside, 1 = inside).
HTshootings3 <- shootings23_24 %>% filter(!is.na(fatal) & !is.na(inside))
contingency_table3 <- table(HTshootings3$fatal, HTshootings3$inside)
print(contingency_table3)
##    
##        0    1
##   0 1909  153
##   1  508   67
chi_squared_test <- chisq.test(contingency_table3)
print(chi_squared_test)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  contingency_table3
## X-squared = 9.9855, df = 1, p-value = 0.001578
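One way to summarize the size of this difference is an odds ratio. The sketch below rebuilds the table from the counts printed above and computes it by hand; this is an additional illustration (the original analysis only ran the chi-squared test), and the object names are my own.

```r
# Rebuild the fatal-by-inside table from the counts printed above
# (rows: fatal 0/1, columns: inside 0 = outside, 1 = inside).
fatal_by_inside <- matrix(
  c(1909, 508, 153, 67),
  nrow = 2,
  dimnames = list(fatal = c("0", "1"), inside = c("0", "1"))
)

# Fatality rate for shootings outside ("0") and inside ("1").
fatal_rate <- fatal_by_inside["1", ] / colSums(fatal_by_inside)
print(round(fatal_rate, 3))

# Odds of a fatal outcome inside, relative to the odds outside.
odds_inside  <- fatal_by_inside["1", "1"] / fatal_by_inside["0", "1"]
odds_outside <- fatal_by_inside["1", "0"] / fatal_by_inside["0", "0"]
odds_ratio   <- odds_inside / odds_outside
print(round(odds_ratio, 2))
```

An odds ratio above 1 means the odds of a fatal outcome were higher for shootings inside than for shootings outside.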

Conclusion

Overall, this dataset on Philadelphia shooting victims was really revealing. With further expertise, there is much more that could still be learned and depicted. However, from what I was able to complete, I learned that in 2023 and 2024 there were more Black shooting victims than victims of any other race. There were more male victims than female victims. There were more shooting victims in 2023 than there have been in 2024, and more people were shot outside than inside a building or home.

The mapping I was able to complete also revealed a lot about the possible shooting hotspots throughout the city of Philadelphia. As I mentioned before, we see clustering around the subway lines, as well as within more socioeconomically disadvantaged areas. There is definitely further analysis that can be conducted and depicted through graphing and mapping. For example, a map could be created with different colored points that correspond to the race of each shooting victim. There are almost endless possibilities for what can be done.

Some of the ethical concerns that may result from the use of this data for analysis have to do with the validity of the hypothesis tests. There is a possibility that I conducted them incorrectly or left missing values within the data that could have swayed the results. There may also be issues with the data entry done by the PPD. Human error exists, so it's possible that there is some misinformation or bad formatting.

References

Fogarty, B. (2023). Quantitative Social Science Data with R (2nd ed.). SAGE Publications, Ltd. (UK). https://bookshelf.vitalsource.com/books/9781529614237

Philadelphia Police Department. (2015, January 1). Shooting victims. OpenDataPhilly. https://opendataphilly.org/datasets/shooting-victims/