Mass shootings are a problem largely unique to the United States among developed countries, mainly because of the gun rights embedded in the Second Amendment (“The Right of the people to keep and bear Arms”) of the US Constitution. Other developed countries do not experience mass shootings on the scale seen in the US. In this project, we analyzed verifiable mass shooting incidents (incidents with at least four victims) obtained from the Gun Violence Archive (GVA) to determine:
The data used in this analysis were obtained from the Gun Violence Archive (GVA). GVA is a non-profit organization that collects and aggregates all verifiable gun violence incidents in the United States and makes them freely accessible on its website. GVA defines a mass shooting as an incident with at least 4 victims.
We used the mass shooting data from 2014 to 2021 available on the GVA website. Data for the current year (2021) are incomplete, so we used a cutoff date of November 20th, 2021 for that year.
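The cutoff can be applied as a simple date filter. The sketch below uses a made-up data frame (the IDs and dates are illustrative, not GVA records) and assumes the cutoff is inclusive, which the text does not state explicitly.

```r
# Toy incidents; the IDs and dates are made up for illustration only
incidents <- data.frame(
  Incident_ID   = c(1, 2, 3, 4),
  Incident_date = as.Date(c("2021-03-14", "2021-11-20", "2021-11-21", "2021-12-05"))
)

# Cutoff date stated in the text (assumed inclusive)
cutoff <- as.Date("2021-11-20")

# Keep only incidents on or before the cutoff
incidents_kept <- incidents[incidents$Incident_date <= cutoff, ]
nrow(incidents_kept)
```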
After downloading the mass shooting data for each year from 2014 to 2021 separately from the GVA website, we hosted the files in a GitHub repository, read them individually into R, combined them into a single dataset called mass_shooting, and stored that dataset on GitHub as well. The original data source did not include the longitude and latitude of each incident, which are needed to build the leaflet map, so we used the Google API through the ggmap package to obtain the coordinates of every incident location by passing the complete address (“Geo_Address”) to the mutate_geocode() function (note: the function requires a Google API key to work). The data preparation code can be found in another document located here
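A minimal sketch of the preparation steps described above, using two small made-up tables in place of the yearly GVA downloads. The object names `ms_a`, `ms_b`, and `mass_shooting_demo` are hypothetical, and the address format (Address, City, State separated by commas) is inferred from the Geo_Address values shown in the output further down. The geocoding call is left commented out because it needs a Google API key.

```r
library(dplyr)

# Hypothetical stand-ins for two yearly downloads (addresses are made up)
ms_a <- tibble(Incident_ID = c(1, 2),
               State = c("Ohio", "Texas"),
               City = c("Dayton", "Odessa"),
               Address = c("100 Main St", "200 Oak Ave"))
ms_b <- tibble(Incident_ID = 3,
               State = "Nevada",
               City = "Las Vegas",
               Address = "300 Pine Rd")

# Stack the yearly tables and build the full address used for geocoding
mass_shooting_demo <- bind_rows(ms_a, ms_b) %>%
  mutate(Geo_Address = paste(Address, City, State, sep = ", "))

# With a registered key, the geocoding step would look like this:
# ggmap::register_google(key = "<your API key>")
# mass_shooting_demo <- ggmap::mutate_geocode(mass_shooting_demo, Geo_Address)
```

mutate_geocode() appends `lon` and `lat` columns to the data frame, which is why the final dataset ends with those two variables.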
The dataset has 10 variables and 3,335 observations. Each observation is one incident, so the total is the number of mass shooting incidents in the United States from 2014 through November 20th, 2021.
The variables in the dataset are:
Required Libraries
# Load Libraries
library(tidyverse)
library(lubridate)
library(stringr)
library(leaflet)
library(sp)
library(ggmap)
library(leaflet.extras)
library(htmltools)
library(plotly)
library(gridExtra)
library(scales)
library(ggrepel)
library(rgdal)
library(xml2)
library(rvest)
library(stringi)
library(reactable)
library(syuzhet)
library(tm)
library(wordcloud)
Load the data: read the data from a GitHub repo:
# read the file
url <- "https://raw.githubusercontent.com/chinedu2301/data607-project-gun-violence/main/Data/mass_shooting.csv"
mass_shooting <- read_csv(url)
mass_shooting <- mass_shooting %>% select(Incident_ID:lat)
Check the head
# Look at the head of the data
head(mass_shooting)
## # A tibble: 6 x 10
##   Incident_ID Incident_date State      City  Address No_killed No_injured Geo_Address
##         <dbl> <date>        <chr>      <chr> <chr>       <dbl>      <dbl> <chr>
## 1      271363 2014-12-29    Louisiana  New ~ Poydra~         0          4 Poydras an~
## 2      269679 2014-12-27    California Los ~ 8800 b~         1          3 8800 block~
## 3      270036 2014-12-27    California Sacr~ 4000 b~         0          4 4000 block~
## 4      269167 2014-12-26    Illinois   East~ 2500 b~         1          3 2500 block~
## 5      268598 2014-12-24    Missouri   Sain~ 18th a~         1          3 18th and P~
## 6      267792 2014-12-23    Kentucky   Winc~ 260 Ox~         1          3 260 Oxford~
## # ... with 2 more variables: lon <dbl>, lat <dbl>
Use glimpse to check the column types
glimpse(mass_shooting)
## Rows: 3,335
## Columns: 10
## $ Incident_ID <dbl> 271363, 269679, 270036, 269167, 268598, 267792, 268282, ~
## $ Incident_date <date> 2014-12-29, 2014-12-27, 2014-12-27, 2014-12-26, 2014-12~
## $ State <chr> "Louisiana", "California", "California", "Illinois", "Mi~
## $ City <chr> "New Orleans", "Los Angeles", "Sacramento", "East St. Lo~
## $ Address <chr> "Poydras and Bolivar", "8800 block of South Figueroa Str~
## $ No_killed <dbl> 0, 1, 0, 1, 1, 1, 1, 4, 0, 2, 1, 1, 4, 0, 7, 1, 0, 0, 0,~
## $ No_injured <dbl> 4, 3, 4, 3, 3, 3, 3, 2, 5, 2, 4, 8, 0, 4, 0, 4, 5, 4, 4,~
## $ Geo_Address <chr> "Poydras and Bolivar, New Orleans, Louisiana", "8800 blo~
## $ lon <dbl> -90.08391, -118.28271, -121.44327, -90.12542, -90.20594,~
## $ lat <dbl> 29.95430, 33.95737, 38.64043, 38.61962, 38.63066, 38.003~
# extract year from date using the lubridate year() function
mass_shooting$year <- year(mass_shooting$Incident_date)
# plot a bar chart to show the distribution of mass shooting incidents by year
ms_year <- mass_shooting %>%
  ggplot(aes(x = as.factor(year))) +
  geom_bar(stat = 'count', fill = 'purple') +
  geom_label(stat = "count", aes(label = ..count.., y = ..count..)) +
  scale_y_continuous(labels = comma) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),
        panel.background = element_rect(fill = "floralwhite"),
        plot.background = element_rect(fill = "cornsilk")) +
  labs(x = 'Year', y = 'Total Incidents',
       title = "Mass Shooting Incidents by Year",
       caption = "Mass shootings increased from 2018")
# display the chart
ms_year
# extract quarters from date
mass_shooting$quarter <- quarter(mass_shooting$Incident_date)
# plot a bar chart to show the distribution of mass shooting incidents by quarter
ms_quarter <- mass_shooting %>%
  filter(year != 2013) %>%
  select(year, quarter) %>%
  group_by(year) %>%
  count(quarter) %>%
  ggplot(aes(x = as.factor(quarter), y = n, fill = quarter)) +
  geom_bar(stat = 'identity') +
  scale_y_continuous(labels = comma) +
  facet_grid(. ~ year) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),
        panel.background = element_rect(fill = "beige"),
        plot.background = element_rect(fill = "cornsilk", colour = "coral4")) +
  labs(x = 'Quarter', y = 'Total Incidents',
       title = "Mass Shooting Incidents by Quarter",
       caption = "The third quarter has the most mass shooting incidents except for 2019")
# display the chart
ms_quarter
# extract month from date using the lubridate month() function
mass_shooting$month <- month(mass_shooting$Incident_date, label=TRUE)
# plot a chart to show the distribution of mass shooting incidents by month
ms_month <- mass_shooting %>% count(month) %>%
ggplot(aes(x=month, y=n)) + geom_bar(stat='identity', fill='purple') +
scale_y_continuous(labels=comma) +
labs(x='month', y='Total Incidents', title='Incidents by month') + theme_bw() +
theme(plot.title = element_text(hjust = 0.5),
panel.background = element_rect(fill = "beige"),
plot.background = element_rect(fill = "cornsilk")) +
labs(title = "Mass Shooting Incidents by Month",
x = "Month", caption = "July has the most incidents, followed by June, across all incidents from 2014 to 2021")
# display the chart
ms_month
# extract day of the week from date using the lubridate wday() function
mass_shooting$weekday <- wday(mass_shooting$Incident_date, label=TRUE)
# Plot a chart to see the distribution by weekday
ms_wday <- mass_shooting %>% count(weekday) %>%
ggplot(aes(x=weekday, y=n)) + geom_bar(stat='identity', fill=rainbow(n=7)) +
scale_y_continuous(labels=comma) +
labs(x='Weekday', y='Number of incidents', title='Incidents by Weekday') + theme_bw() +
theme(plot.title = element_text(hjust = 0.5),
panel.background = element_rect(fill = "floralwhite"),
plot.background = element_rect(fill = "cornsilk")) +
labs(title = "Mass Shooting Incidents by Weekday",
y = "Total Incidents", caption = "Most mass shooting incidents occur on Saturdays and Sundays (weekends)")
# display the chart
ms_wday
# filter the datasets for each year
ms_2014 <- mass_shooting %>% filter(year(Incident_date) == 2014)
ms_2015 <- mass_shooting %>% filter(year(Incident_date) == 2015)
ms_2016 <- mass_shooting %>% filter(year(Incident_date) == 2016)
ms_2017 <- mass_shooting %>% filter(year(Incident_date) == 2017)
ms_2018 <- mass_shooting %>% filter(year(Incident_date) == 2018)
ms_2019 <- mass_shooting %>% filter(year(Incident_date) == 2019)
ms_2020 <- mass_shooting %>% filter(year(Incident_date) == 2020)
ms_2021 <- mass_shooting %>% filter(year(Incident_date) == 2021)
# Instantiate a leaflet map and plot map
ms_map <- leaflet() %>% addTiles() %>%
setView(lng = -95.7129, lat = 37.0902 , zoom = 4 ) %>%
addCircleMarkers(data = ms_2014, lng = ~lon, lat = ~lat, radius = 1,
popup = ~paste0("Incident Date: ", Incident_date, "<br>",
"Address: ", Geo_Address, "<br>",
"Number Killed: " , No_killed, "<br>",
"Number Injured: ", No_injured),
color = "magenta",
group = "2014") %>%
addCircleMarkers(data = ms_2015, lng = ~lon, lat = ~lat, radius = 1,
popup = ~paste0("Incident Date: ", Incident_date, "<br>",
"Address: ", Geo_Address, "<br>",
"Number Killed: " , No_killed, "<br>",
"Number Injured: ", No_injured),
color = "blue",
group = "2015") %>%
addCircleMarkers(data = ms_2016, lng = ~lon, lat = ~lat, radius = 1,
popup = ~paste0("Incident Date: ", Incident_date, "<br>",
"Address: ", Geo_Address, "<br>",
"Number Killed: " , No_killed, "<br>",
"Number Injured: ", No_injured),
color = "green",
group = "2016") %>%
addCircleMarkers(data = ms_2017, lng = ~lon, lat = ~lat, radius = 1,
popup = ~paste0("Incident Date: ", Incident_date, "<br>",
"Address: ", Geo_Address, "<br>",
"Number Killed: " , No_killed, "<br>",
"Number Injured: ", No_injured),
color = "brown",
group = "2017") %>%
addCircleMarkers(data = ms_2018, lng = ~lon, lat = ~lat, radius = 1,
popup = ~paste0("Incident Date: ", Incident_date, "<br>",
"Address: ", Geo_Address, "<br>",
"Number Killed: " , No_killed, "<br>",
"Number Injured: ", No_injured),
color = "purple",
group = "2018") %>%
addCircleMarkers(data = ms_2019, lng = ~lon, lat = ~lat, radius = 1,
popup = ~paste0("Incident Date: ", Incident_date, "<br>",
"Address: ", Geo_Address, "<br>",
"Number Killed: " , No_killed, "<br>",
"Number Injured: ", No_injured),
color = "black",
group = "2019") %>%
addCircleMarkers(data = ms_2020, lng = ~lon, lat = ~lat, radius = 1,
popup = ~paste0("Incident Date: ", Incident_date, "<br>",
"Address: ", Geo_Address, "<br>",
"Number Killed: " , No_killed, "<br>",
"Number Injured: ", No_injured),
color = "orange",
group = "2020") %>%
addCircleMarkers(data = ms_2021, lng = ~lon, lat = ~lat, radius = 1,
popup = ~paste0("Incident Date: ", Incident_date, "<br>",
"Address: ", Geo_Address, "<br>",
"Number Killed: " , No_killed, "<br>",
"Number Injured: ", No_injured),
color = "red",
group = "2021") %>%
addLayersControl(baseGroups = c("2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021"))
# display the chart
ms_map
# plot the mass shooting incidents by state
ms_state <- plotly:: ggplotly(mass_shooting %>% count(State) %>%
ggplot(aes(x=reorder(State, n), y=n, fill=n, text=State)) +
geom_bar(stat='identity', fill='red') + coord_flip() +
labs(x='', y='Number of incidents') +
theme(plot.title = element_text(hjust = 0.5),
panel.background = element_rect(fill = "lightyellow"),
plot.background = element_rect(fill = "cornsilk")) +
labs(title = "Mass Shootings by State",
x = NULL, y = "Total Incidents", caption = "Illinois, California, Texas, and Florida have the most shootings") +
theme(axis.line = element_line(size = 0.5)))
# display the chart
ms_state
The Las Vegas mass shooting of October 1st, 2017 has the highest number of victims, with 500 recorded victims.
# Create a column for total victims
mass_shooting$victims <- mass_shooting$No_killed + mass_shooting$No_injured
# Subset the data for the top 10 victims
ms_top_10 <- mass_shooting %>%
select(Incident_date, State, City, No_killed, No_injured, victims) %>%
arrange(desc(victims)) %>% top_n(n=10, wt=victims)
# display the table
ms_top_10
## # A tibble: 10 x 6
##    Incident_date State      City                     No_killed No_injured victims
##    <date>        <chr>      <chr>                        <dbl>      <dbl>   <dbl>
##  1 2017-10-01    Nevada     Las Vegas                       59        441     500
##  2 2016-06-12    Florida    Orlando                         50         53     103
##  3 2017-11-05    Texas      Sutherland Springs              27         20      47
##  4 2019-08-03    Texas      El Paso                         23         23      46
##  5 2015-12-02    California San Bernardino                  16         19      35
##  6 2018-02-14    Florida    Pompano Beach (Parkland)        17         17      34
##  7 2019-08-31    Texas      Odessa                           8         23      31
##  8 2015-05-17    Texas      Waco                             9         18      27
##  9 2019-08-04    Ohio       Dayton                          10         17      27
## 10 2017-07-01    Arkansas   Little Rock                      0         25      25
The State of Illinois has the highest number of victims
# Create mass shooting victims columns
mass_shooting$victims <- mass_shooting$No_killed + mass_shooting$No_injured
# summarize victims, injuries, deaths, and incidents by state
ms_victims_by_state <- mass_shooting %>%
  group_by(State) %>%
  summarize(sumVic = sum(victims),
            sumInj = sum(No_injured),
            sumDeath = sum(No_killed),
            PercDeath = round(sumDeath / sumVic, 2),
            sumIncidents = n(),
            vicPerInc = round(sumVic / sumIncidents, 1)) %>%
  arrange(desc(sumVic))
# display the data
ms_victims_by_state
## # A tibble: 49 x 7
##    State        sumVic sumInj sumDeath PercDeath sumIncidents vicPerInc
##    <chr>         <dbl>  <dbl>    <dbl>     <dbl>        <int>     <dbl>
##  1 Illinois       1759   1518      241      0.14          355       5
##  2 California     1615   1260      355      0.22          317       5.1
##  3 Texas          1252    889      363      0.29          217       5.8
##  4 Florida        1180    905      275      0.23          203       5.8
##  5 New York        767    683       84      0.11          157       4.9
##  6 Louisiana       758    633      125      0.16          149       5.1
##  7 Pennsylvania    733    603      130      0.18          152       4.8
##  8 Ohio            718    576      142      0.2           135       5.3
##  9 Georgia         631    495      136      0.22          127       5
## 10 Nevada          619    535       84      0.14           26      23.8
## # ... with 39 more rows
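Nevada's victims-per-incident figure stands out from every other state in the table. A quick arithmetic check, using only the numbers reported in the two tables above, shows how much of that is due to the single Las Vegas incident. The variable names are illustrative.

```r
# Figures taken from the tables above
nevada_victims    <- 619   # sumVic for Nevada
nevada_incidents  <- 26    # sumIncidents for Nevada
las_vegas_victims <- 500   # victims in the 2017-10-01 Las Vegas incident

# Victims per incident with and without the Las Vegas incident
with_lv    <- round(nevada_victims / nevada_incidents, 1)
without_lv <- round((nevada_victims - las_vegas_victims) / (nevada_incidents - 1), 1)
c(with_lv = with_lv, without_lv = without_lv)
```

Without that one incident, Nevada's remaining 25 incidents average about 4.8 victims each, in line with the other states.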
# plot the chart for victims per incident by state
ms_victims_state_map <- ms_victims_by_state %>% filter(vicPerInc > 5) %>%
ggplot(aes(x=reorder(State, -vicPerInc), y=vicPerInc)) + geom_bar(stat='identity', fill='red') +
labs(x='State', y='Victims per incident') + geom_text(aes(label = vicPerInc), vjust = 0) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme_bw() + coord_flip() +
theme(plot.title = element_text(hjust = 0.5),
panel.background = element_rect(fill = "beige"),
plot.background = element_rect(fill = "cornsilk")) +
labs(title = "Victims per Incident by State",
caption = "Nevada has the most victims per incident because of the single shooting in Las Vegas in 2017 that had about 500 victims")
# display the chart
ms_victims_state_map
Evolving Trend - Unserialized and Untraceable Firearms
What are Ghost Guns?
Ghost guns are unserialized and untraceable firearms that are constructed by individuals using unfinished frames or receivers. They are available for purchase by anyone, including prohibited purchasers, domestic abusers, and gun traffickers – without a background check.
Current gun laws do not require a background check to purchase a ghost gun kit or parts because the receivers and frames are unfinished. An individual can legally purchase a receiver or frame that is 80% complete or less, then finish it with a drill press or power drill, a router jig, and other components to produce an effective and reliable firearm.
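We web-scraped three newspaper articles on ghost guns using the xml2 and rvest packages, with the SelectorGadget Chrome extension to pick the CSS selectors. The articles are from The Federalist, the Chicago Tribune, and The Guardian.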
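As a small offline illustration of the scraping pattern used for each article, the sketch below parses a made-up HTML string instead of a live page. The HTML content and the `.lead` class are invented for the example; only `read_html()`, `html_nodes()`, and `html_text()` are the rvest calls the report itself relies on.

```r
library(rvest)

# Made-up HTML standing in for a downloaded article page
page <- read_html('
  <html><body>
    <h2>Section title</h2>
    <p class="lead"> First paragraph. </p>
    <p>Second paragraph.</p>
  </body></html>')

# Select every <p> node and extract its trimmed text
paragraphs <- page %>% html_nodes("p") %>% html_text(trim = TRUE)

# A class selector, like the ones SelectorGadget suggests, narrows the match
lead <- page %>% html_nodes(".lead") %>% html_text(trim = TRUE)
```

Site-specific selectors such as `.dcr-o5gy41` tend to change when a publisher updates its page templates, so they may need to be re-checked with SelectorGadget before re-running the scrape.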
Web Scrape The Federalist Article
# webscrape the federalist article on ghost guns
url_federalist <- "https://thefederalist.com/2018/08/02/3-d-printed-guns-reminder-gun-control-america-futile/"
federalist_root_html <- read_html(url_federalist)
federalist_body <- federalist_root_html %>%
html_nodes("p") %>% # indicates node name
html_text(trim = TRUE)
names(federalist_body) <- c("newspaper") # insert new column for newspaper name
federalist_body <- as_tibble(federalist_body)
federalist_body$newspaper <- "Federalist"
Web Scrape The Tribune Article
# webscrape the tribune article on ghost guns
url_tribune <- "https://www.chicagotribune.com/nation-world/ct-aud-nw-nyt-cb-ghost-guns-20210409-v7osb6gxvfgdneo6z7v5dupdt4-story.html"
tribune_root_html <- read_html(url_tribune)
tribune_body<- tribune_root_html %>%
html_nodes(".heavy-text , .crd--cnt p") %>% # indicates nodes name
html_text(trim = TRUE)
names(tribune_body) <- c("newspaper")
tribune_body <- as_tibble(tribune_body)
tribune_body$newspaper <- "Tribune"
Web Scrape The Guardian Article
# webscrape the guardian article on ghost guns
url_guardian <- "https://www.theguardian.com/us-news/2021/may/18/california-ghost-guns-deadly-toll"
guardian_root_html <- read_html(url_guardian)
guardian_body <- guardian_root_html %>%
html_nodes("h2 , .dcr-o5gy41") %>% # indicates nodes name
html_text(trim = TRUE)
names(guardian_body) <- c("newspaper")
guardian_body <- as_tibble(guardian_body)
guardian_body$newspaper <- "Guardian"
The corpus stores the body text of the newspaper articles for text analysis.
# combine all three web-scraped articles into a corpus
corpus <- rbind(federalist_body, guardian_body, tribune_body)
# create corpus
corpus <- Corpus(VectorSource(corpus))
#inspect(corpus)
Transform/Clean the Corpus
# clean the corpus
corpus_text <- tm_map(corpus[1], tolower)
corpus_text <- tm_map(corpus_text, removePunctuation)
corpus_text <- tm_map(corpus_text, removeNumbers)
cleanset <- tm_map(corpus_text, removeWords, stopwords('english'))
Create a Term-document matrix
# text mining
tdm <- TermDocumentMatrix(cleanset)
tdm <- as.matrix(tdm)
words <- sort(rowSums(tdm), decreasing = TRUE)
df <- data.frame(word = names(words), freq = words)
head(df, 5)
##              word freq
## gun           gun   77
## guns         guns   64
## ghost       ghost   55
## said         said   23
## violence violence   20
The term-document matrix (tdm) records how often each term occurs in each document. We summed the counts for each term and sorted the totals in decreasing order. The five most frequent words in the selected ghost gun articles are gun, guns, ghost, said, and violence.
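To make the structure of a term-document matrix concrete, here is a base R sketch that builds one by hand for two tiny made-up documents. It is a stand-in for illustration and does not use the tm package or the scraped articles.

```r
# Two tiny made-up documents (not taken from the scraped articles)
docs <- c(doc1 = "ghost guns are guns", doc2 = "gun violence and ghost guns")

# Split each document into lowercase words
tokens <- strsplit(tolower(docs), "\\s+")

# Rows are terms, columns are documents, cells are counts
terms <- sort(unique(unlist(tokens)))
tdm_demo <- sapply(tokens, function(words) {
  tabulate(match(words, terms), nbins = length(terms))
})
rownames(tdm_demo) <- terms

# Total frequency of each term across documents, most frequent first
sort(rowSums(tdm_demo), decreasing = TRUE)
```

Note that "gun" and "guns" are counted as separate terms here, just as they are in the report's output, because no stemming is applied.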
Visualization of the most common words
most_word <- sort(rowSums(tdm), decreasing = TRUE)
set.seed(222)
wordcloud(words = names(most_word),
freq = most_word,
max.words = 500,
random.order = F,
min.freq = 3,
colors = brewer.pal(8, 'Dark2'),
scale = c(6, 0.4),
rot.per = 0.7)
From the word cloud, we can see that “gun”, “guns”, “ghost”, “violence”, and “said” are the five most frequent words in the corpus.
For the sentiment analysis we used the NRC lexicon through the syuzhet package's get_nrc_sentiment() function, which scores the eight basic emotions as well as positive and negative sentiment in the selected newspaper articles. The expanding accessibility of ghost guns (also known as homemade weapons or do-it-yourself guns) in our communities may create a disparity in which negative and positive sentiments are equally influential on emotions. The three highest-scoring emotions (fear, anger, and trust) also indicate the intensity of feeling associated with ghost guns as firearms.
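Lexicon-based scoring amounts to looking each word up in a table of emotion flags and summing the columns. The sketch below illustrates the idea with a hypothetical three-word lexicon; the entries are made up and are not the NRC lexicon that syuzhet ships.

```r
# Hypothetical mini-lexicon: NOT the NRC lexicon, just made-up entries
mini_lexicon <- data.frame(
  word  = c("gun", "violence", "safe"),
  fear  = c(1, 1, 0),
  anger = c(1, 1, 0),
  trust = c(0, 0, 1)
)

words <- c("gun", "ghost", "violence", "safe", "said")

# Look up each word; words missing from the lexicon score zero everywhere
idx <- match(words, mini_lexicon$word)
scores <- mini_lexicon[idx, c("fear", "anger", "trust")]
scores[is.na(scores)] <- 0
rownames(scores) <- words

# Column totals correspond to the bars in a sentiment bar plot
colSums(scores)
```

Because each word is scored in isolation, this kind of lookup ignores context such as negation, which is worth keeping in mind when reading the scores.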
# get sentiment lexicon
sentiment <- get_nrc_sentiment(df$word)
# display the table of sentiments
head(sentiment, 5)
##   anger anticipation disgust fear joy sadness surprise trust negative positive
## 1     1            0       0    1   0       0        0     0        1        0
## 2     0            0       0    0   0       0        0     0        0        0
## 3     0            0       0    1   0       0        0     0        0        0
## 4     0            0       0    0   0       0        0     0        0        0
## 5     1            0       0    1   0       1        0     0        1        0
# Barplot
barplot(colSums(sentiment),
las = 2,
col = rainbow(10),
ylab = 'Count',
main = 'Sentiment Scores Issue')
From the analysis of the GVA dataset, we found an overall upward trend in mass shooting incidents in the United States from 2014 to the present, with a major increase from 2019 to 2020. July and June have the most mass shooting incidents, and more mass shootings occur on Sundays and Saturdays (weekends) than on other days of the week. Illinois (Chicago), California, Texas, Florida, and New York are the states with the most mass shooting incidents, and there is an average of 5 to 6 victims per incident. From the sentiment analysis of the three political views toward ghost guns, the sentiment scores show a high intensity of fear, which signals that society recognizes the role ghost guns play in mass shootings as undetectable, unregulated firearms. Although the federal Bureau of Alcohol, Tobacco, Firearms and Explosives (ATF) and state agencies enforce gun laws, ghost guns are not defined as "a firearm".