DATA607 Project - Mass Shootings

INTRODUCTION

Mass Shooting has been a problem unique to the United States mainly because of its unique laws embedded in the Second Amendment (“The Right of the people to keep and bear Arms”) of the US constitution. Other developed countries do not have the extent of mass shooting in the US. In this project, we analyzed verifiable mass shooting incidents (incidents with at least four victims) obtained from the Gun Violence Archive (GVA) to determine:

Trend in mass shootings from 2014 - Present (November 20th, 2021)

Quarter, month of the year, and day of the week with the most mass shootings incidents

Locations (State and Cities) in the United States with the most mass shooting incidents

Total Number of victims per mass shooting incident

Understand Sentiments in the new trends of unregulated/serialized gun ownership in the US

DATA

Data Source:

The data used in this analysis was obtained from Gun Violence Archive (GVA). GVA is a non-profit organization that collects and aggregates all verifiable gun violence incidents in the United States for easy access for anyone on their website. According to GVA, mass shootings are defined as mass shooting incidents where there are at least 4 victims.

Data Collection:

We used the mass shootings data from 2014 to 2021 available on the GVA website. The current year(2021) does not contain full data, so we used a cut off date of November 20th, 2021 for the current year.

csv: We downloaded the csv files for mass shootings from 2014 - 2021

API: We used google API to get the longitude and latitude of all locations of mass shooting incidents in the US according to 2014 - 2021 data from GVA.

Web: We web scrapped 3 News articles to perform sentiment analysis on opinions of Ghost Guns

Data Preparation:

After downloading the data on mass shooting from 2014 - 2021 separately from the GVA website, we hosted them on the github repository, then read those files individually into “R”, and combined the whole datasets into one(1) dataset called mass_shooting and stored that in github as well. The original data source did not have the longitude and latitude of the locations of incidents that was necessary in order to develop the leaflet map, so we used google API from the ggmap package to obtain the longitude and latitude of all locations of mass_shootings incidents by passing the complete address (“Geo_Address”) as argument to the mutate_geocode() function (Note: The function requires an API key from google to work) from ggmap package. The data preparation codes can be found in another document located here

Variables:

There are 10 variables in this dataset with 3,335 observations. The total observations indicate the total number of mass shooting incidents in the United States from 2014 to 2021 (November 20th, 2021).

The variables in the dataset are:

Incident_ID: Unique ID for each mass shooting incident

Incident_date: Actual date that the incident occurred

State: State where the incident occurred

City: City where the incident occurred

Address: Address where the incident occurred.

No_injured: Number of individuals that were injured in a given mass shooting incident

No_killed: Number of individuals killed in a mass shooting incident

Geo_Address: Geo_address where the incident occurred. Gotten from combining Address, City, and State

lon: Longitude of the location of incident gotten from google API by passing the Geo_Address as argument

lat: Latitude of the location of incident gotten from google API by passing the Geo_Address as argument

ANALYSIS

Required Libraries

# Load Libraries
library(tidyverse)
library(lubridate)
library(stringr)
library(leaflet)
library(sp)
library(ggmap)
library(leaflet.extras)
library(htmltools)
library(plotly)
library(gridExtra)
library(scales)
library(ggrepel)
library(rgdal)
library(xml2)
library(rvest)
library(stringr)
library(stringi)
library(reactable) 
library(syuzhet)
library(tm)
library(wordcloud)

Load the data: The the data from a github repo:

# read the file
url <- "https://raw.githubusercontent.com/chinedu2301/data607-project-gun-violence/main/Data/mass_shooting.csv"
mass_shooting <- read_csv(url)
mass_shooting <- mass_shooting %>% select(Incident_ID:lat)

Check the head

# Look at the head of the data
head(mass_shooting)

## # A tibble: 6 x 10
##   Incident_ID Incident_date State      City  Address No_killed No_injured Geo_Address
##         <dbl> <date>        <chr>      <chr> <chr>       <dbl>      <dbl> <chr>      
## 1      271363 2014-12-29    Louisiana  New ~ Poydra~         0          4 Poydras an~
## 2      269679 2014-12-27    California Los ~ 8800 b~         1          3 8800 block~
## 3      270036 2014-12-27    California Sacr~ 4000 b~         0          4 4000 block~
## 4      269167 2014-12-26    Illinois   East~ 2500 b~         1          3 2500 block~
## 5      268598 2014-12-24    Missouri   Sain~ 18th a~         1          3 18th and P~
## 6      267792 2014-12-23    Kentucky   Winc~ 260 Ox~         1          3 260 Oxford~
## # ... with 2 more variables: lon <dbl>, lat <dbl>

Use glimpse to check the column types

glimpse(mass_shooting)

## Rows: 3,335
## Columns: 10
## $ Incident_ID   <dbl> 271363, 269679, 270036, 269167, 268598, 267792, 268282, ~
## $ Incident_date <date> 2014-12-29, 2014-12-27, 2014-12-27, 2014-12-26, 2014-12~
## $ State         <chr> "Louisiana", "California", "California", "Illinois", "Mi~
## $ City          <chr> "New Orleans", "Los Angeles", "Sacramento", "East St. Lo~
## $ Address       <chr> "Poydras and Bolivar", "8800 block of South Figueroa Str~
## $ No_killed     <dbl> 0, 1, 0, 1, 1, 1, 1, 4, 0, 2, 1, 1, 4, 0, 7, 1, 0, 0, 0,~
## $ No_injured    <dbl> 4, 3, 4, 3, 3, 3, 3, 2, 5, 2, 4, 8, 0, 4, 0, 4, 5, 4, 4,~
## $ Geo_Address   <chr> "Poydras and Bolivar, New Orleans, Louisiana", "8800 blo~
## $ lon           <dbl> -90.08391, -118.28271, -121.44327, -90.12542, -90.20594,~
## $ lat           <dbl> 29.95430, 33.95737, 38.64043, 38.61962, 38.63066, 38.003~

Mass Shootings Analysis - DateTime

Mass Shootings by Year

# extract year from date using the lubridate year() function
mass_shooting$year <- year(mass_shooting$Incident_date) 

# plot a bar chart to show the distribution of mass shooting incidents by year
ms_year <- mass_shooting %>%
        ggplot(aes(x=as.factor(year))) + geom_bar(stat='count', fill='purple') +
        scale_y_continuous(labels=comma) + labs(x='Year', y='Total Incidents', title='Incidents by year') + 
        geom_label(stat = "count", aes(label = ..count.., y = ..count..)) + 
        theme_bw() + theme(plot.title = element_text(hjust = 0.5)) + 
        labs(title = "Mass Shooting Incidents by Year", 
             subtitle = "Mass shootings increased from 2018") + 
        theme(panel.background = element_rect(fill = "floralwhite"),
            plot.background = element_rect(fill = "cornsilk")) +
        labs(subtitle = NULL, caption = "Mass shootings increased from 2018")

# display the chart
ms_year

Mass Shootings by Quarter

# extract Quarters from date
mass_shooting$quarter <- quarter(mass_shooting$Incident_date) 

# plot a bar chart to show the distribution of mass shooting incidents by quarter
ms_quarter <- mass_shooting %>% filter(year!=2013) %>% select(year, quarter) %>% group_by(year) %>%
  count(quarter) %>%
  ggplot(aes(x=as.factor(quarter), y=n, fill = quarter)) + geom_bar(stat='identity') + 
        scale_y_continuous(labels=comma) + facet_grid(.~year) + 
  labs(x='Quarter', y='Total Incidents', title='Incidents by Quarter') + 
  theme(plot.title =    element_text(hjust = 0.5),
        plot.background = element_rect(fill = "cornsilk")) + 
  labs(title = "Mass Shooting Incidents by Quarter",
        subtitle = "The third quarter has the most incidents except for 2019") + 
  theme_bw() + theme(plot.title = element_text(hjust = 0.5),
        panel.background = element_rect(fill = "beige"),
        plot.background = element_rect(colour = "coral4")) +
  labs(subtitle = NULL, caption = "The third quarter has the most mass shooting incidents except for 2019") +   theme(plot.background = element_rect(fill = "cornsilk"))

# display the chart
ms_quarter

Mass Shootings by Month

# extract month from date using the lubridate year() function
mass_shooting$month <- month(mass_shooting$Incident_date, label=TRUE)

# plot a chart to show the distribution of mass shooting incidents by month
ms_month <- mass_shooting %>% count(month) %>%
        ggplot(aes(x=month, y=n)) + geom_bar(stat='identity', fill='purple') +
        scale_y_continuous(labels=comma) +
        labs(x='month', y='Total Incidents', title='Incidents by month') + theme_bw() + 
        theme(plot.title = element_text(hjust = 0.5),
            panel.background = element_rect(fill = "beige"),
            plot.background = element_rect(fill = "cornsilk")) +
        labs(title = "Mass Shooting Incidents by Month",
            x = "Month", caption = "July has the most number of incidents followed by June considering all incidents from 2014 - 2021")

# display the chart
ms_month

Mass Shootings by Weekday

# extract day from date using lubridate wday() function
mass_shooting$weekday <- wday(mass_shooting$Incident_date, label=TRUE)

# Plot a chart to see the distribution by weekday
ms_wday <- mass_shooting %>% count(weekday) %>%
        ggplot(aes(x=weekday, y=n)) + geom_bar(stat='identity', fill=rainbow(n=7)) +
            scale_y_continuous(labels=comma) +
            labs(x='Weekday', y='Number of incidents', title='Incidents by Weekday') + theme_bw() + theme(plot.title = element_text(hjust = 0.5),
    panel.background = element_rect(fill = "floralwhite"),
    plot.background = element_rect(fill = "cornsilk")) +labs(title = "Mass Shooting Incidents by Weekday",
    y = "Total Incidents", caption = "Most mass shooting incidents occur on Saturday and Sundays (weekends)")

# display the chart
ms_wday

Mass Shootings Analysis - Location

Mass Shootings by State - Map

# filter the datasets for each year
ms_2014 <- mass_shooting %>% filter(year(Incident_date) == 2014)
ms_2015 <- mass_shooting %>% filter(year(Incident_date) == 2015)
ms_2016 <- mass_shooting %>% filter(year(Incident_date) == 2016)
ms_2017 <- mass_shooting %>% filter(year(Incident_date) == 2017)
ms_2018 <- mass_shooting %>% filter(year(Incident_date) == 2018)
ms_2019 <- mass_shooting %>% filter(year(Incident_date) == 2019)
ms_2020 <- mass_shooting %>% filter(year(Incident_date) == 2020)
ms_2021 <- mass_shooting %>% filter(year(Incident_date) == 2021)

# Instantiate a leaflet map and plot map
ms_map <- leaflet() %>% addTiles() %>% 
  setView(lng = -95.7129, lat = 37.0902 , zoom = 4 ) %>%
  addCircleMarkers(data = ms_2014, lng = ~lon, lat = ~lat, radius = 1,
                   popup = ~paste0("Incident Date: ", Incident_date, "<br>",
                                   "Address: ", Geo_Address, "<br>",
                                   "Number Killed: " , No_killed, "<br>",
                                   "Number Injured: ", No_injured),
                   color = "magenta",
                   group = "2014") %>%
  addCircleMarkers(data = ms_2015, lng = ~lon, lat = ~lat, radius = 1,
                   popup = ~paste0("Incident Date: ", Incident_date, "<br>",
                                   "Address: ", Geo_Address, "<br>",
                                   "Number Killed: " , No_killed, "<br>",
                                   "Number Injured: ", No_injured),
                   color = "blue",
                   group = "2015") %>% 
  addCircleMarkers(data = ms_2016, lng = ~lon, lat = ~lat, radius = 1,
                   popup = ~paste0("Incident Date: ", Incident_date, "<br>",
                                   "Address: ", Geo_Address, "<br>",
                                   "Number Killed: " , No_killed, "<br>",
                                   "Number Injured: ", No_injured),
                   color = "green",
                   group = "2016") %>% 
  addCircleMarkers(data = ms_2017, lng = ~lon, lat = ~lat, radius = 1,
                   popup = ~paste0("Incident Date: ", Incident_date, "<br>",
                                   "Address: ", Geo_Address, "<br>",
                                   "Number Killed: " , No_killed, "<br>",
                                   "Number Injured: ", No_injured),
                   color = "brown",
                   group = "2017") %>% 
  addCircleMarkers(data = ms_2017, lng = ~lon, lat = ~lat, radius = 1,
                   popup = ~paste0("Incident Date: ", Incident_date, "<br>",
                                   "Address: ", Geo_Address, "<br>",
                                   "Number Killed: " , No_killed, "<br>",
                                   "Number Injured: ", No_injured),
                   color = "purple",
                   group = "2018") %>% 
  addCircleMarkers(data = ms_2016, lng = ~lon, lat = ~lat, radius = 1,
                   popup = ~paste0("Incident Date: ", Incident_date, "<br>",
                                   "Address: ", Geo_Address, "<br>",
                                   "Number Killed: " , No_killed, "<br>",
                                   "Number Injured: ", No_injured),
                   color = "black",
                   group = "2019") %>% 
  addCircleMarkers(data = ms_2020, lng = ~lon, lat = ~lat, radius = 1,
                   popup = ~paste0("Incident Date: ", Incident_date, "<br>",
                                   "Address: ", Geo_Address, "<br>",
                                   "Number Killed: " , No_killed, "<br>",
                                   "Number Injured: ", No_injured),
                   color = "orange",
                   group = "2020") %>% 
  addCircleMarkers(data = ms_2021, lng = ~lon, lat = ~lat, radius = 1,
                   popup = ~paste0("Incident Date: ", Incident_date, "<br>",
                                   "Address: ", Geo_Address, "<br>",
                                   "Number Killed: " , No_killed, "<br>",
                                   "Number Injured: ", No_injured),
                   color = "red",
                   group = "2021") %>% 
  addLayersControl(baseGroups = c("2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021")) 

# display the chart
ms_map

Mass Shootings by States - Bar Chart

# plot the mass shooting incidents by state 
ms_state <- plotly:: ggplotly(mass_shooting %>% count(State) %>%
        ggplot(aes(x=reorder(State, n), y=n, fill=n, text=State)) +
        geom_bar(stat='identity', fill='red') + coord_flip() +
            labs(x='', y='Number of incidents') + theme(plot.title = element_text(hjust = 0.5),
            panel.background = element_rect(fill = "lightyellow"),
            plot.background = element_rect(fill = "cornsilk")) +labs(title = "Mass Shootings by State",
    x = NULL, y = "Total Incidents", caption = "Illinois, California, New York, and Pennsylvania has the most shootings") + theme(axis.line = element_line(size = 0.5)))

# display the chart
ms_state

Mass Shooting Analysis - Victims

Top Ten(10) Incidents by Number of Victims

Las Vegas mass shooting of October 1st, 2017 has the highest number of victims with 500 recorded victims

# Create a column for total victims
mass_shooting$victims <- mass_shooting$No_killed + mass_shooting$No_injured

# Subset the data for the top 10 victims
ms_top_10 <- mass_shooting %>% 
  select(Incident_date, State, City, No_killed, No_injured, victims) %>% 
  arrange(desc(victims)) %>% top_n(n=10, wt=victims)

# display the table
ms_top_10

## # A tibble: 10 x 6
##    Incident_date State      City                     No_killed No_injured victims
##    <date>        <chr>      <chr>                        <dbl>      <dbl>   <dbl>
##  1 2017-10-01    Nevada     Las Vegas                       59        441     500
##  2 2016-06-12    Florida    Orlando                         50         53     103
##  3 2017-11-05    Texas      Sutherland Springs              27         20      47
##  4 2019-08-03    Texas      El Paso                         23         23      46
##  5 2015-12-02    California San Bernardino                  16         19      35
##  6 2018-02-14    Florida    Pompano Beach (Parkland)        17         17      34
##  7 2019-08-31    Texas      Odessa                           8         23      31
##  8 2015-05-17    Texas      Waco                             9         18      27
##  9 2019-08-04    Ohio       Dayton                          10         17      27
## 10 2017-07-01    Arkansas   Little Rock                      0         25      25

Victims per Incident

The State of Illinois has the highest number of victims

# Create mass shooting victims columns
mass_shooting$victims <- mass_shooting$No_killed + mass_shooting$No_injured

# 
ms_victims_by_state <- mass_shooting %>% group_by(State)  %>%   summarize(sumVic=sum(victims), sumInj=sum(No_injured), sumDeath=sum(No_killed), PercDeath=round(sumDeath/sumVic,2), sumIncidents=n(), vicPerInc=round(sumVic/sumIncidents,1)) %>% arrange(desc(sumVic))

# display the data
ms_victims_by_state

## # A tibble: 49 x 7
##    State        sumVic sumInj sumDeath PercDeath sumIncidents vicPerInc
##    <chr>         <dbl>  <dbl>    <dbl>     <dbl>        <int>     <dbl>
##  1 Illinois       1759   1518      241      0.14          355       5  
##  2 California     1615   1260      355      0.22          317       5.1
##  3 Texas          1252    889      363      0.29          217       5.8
##  4 Florida        1180    905      275      0.23          203       5.8
##  5 New York        767    683       84      0.11          157       4.9
##  6 Louisiana       758    633      125      0.16          149       5.1
##  7 Pennsylvania    733    603      130      0.18          152       4.8
##  8 Ohio            718    576      142      0.2           135       5.3
##  9 Georgia         631    495      136      0.22          127       5  
## 10 Nevada          619    535       84      0.14           26      23.8
## # ... with 39 more rows

Victims per Incident - Chart

# plot the chart for victims per incident by state
ms_victims_state_map <- ms_victims_by_state %>% filter(vicPerInc > 5) %>%
        ggplot(aes(x=reorder(State, -vicPerInc), y=vicPerInc)) + geom_bar(stat='identity', fill='red') +
        labs(x='State', y='Victims per incidents') + geom_text(aes(label = vicPerInc), vjust = 0) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme_bw() + coord_flip() +                        theme(plot.title = element_text(hjust = 0.5),
            panel.background = element_rect(fill = "beige"),
            plot.background = element_rect(fill = "cornsilk")) +labs(title = "Victims per Incidents by State",
            caption = "Nevada has the most victims per incident because of the single shooting in Las Vegas in 2017 that had about 500 victims")

# display the chart
ms_victims_state_map

New Trends: Ghost Guns

Evolving Trend - Unserialized and Untraceable FireArms Science | HowStuffWorks

What are Ghost Guns?:
Ghost guns are unserialized and untraceable firearms that are constructed by individuals using unfinished frames or receivers. They are available for purchase by anyone, including prohibited purchasers, domestic abusers, and gun traffickers – without a background check.

The current gun laws do not require a background check to purchase a ghost gun kit or parts because the receivers and frames are unfinished. An individual can legally purchase a receiver or frame that is 80% complete or less, then modify the parts with a drill press or power drill, router jig, and other components into an effective and reliable firearm.

Sentiments on Ghost Guns

Ghost Guns are becoming more common, so we decided to do a sentiment analysis on what the public think about ghost guns using articles from three(3) different media outlets. So we web-scrapped Newspaper articles with xml2 and rvest package and making use of the SelectorGadget chromes extension. The articles are:

The Federalist - 3-D Printed Guns Are A Reminder That Gun Control In America Is Futile

The Guardian - Ordered online, assembled at home: the deadly toll of California’s ‘ghost guns’

Chicago Tribune - Ghost guns: What they are, and why they are an issue now

Web Scrapping News Article

Web Scrape The Federalist Article

# webscrape the federalist article on ghost guns
url_federalist <- "https://thefederalist.com/2018/08/02/3-d-printed-guns-reminder-gun-control-america-futile/"
federalist_root_html <- read_html(url_federalist)
federalist_body <- federalist_root_html %>%
  html_nodes("p") %>%  # indicates node name
  html_text(trim = TRUE) 

names(federalist_body) <- c("newspaper") # insert new column for newspaper name

federalist_body <- as_tibble(federalist_body)

federalist_body$newspaper <- "Federalist"

Web Scrape The Tribune Article

# webscrape the tribune article on ghost guns
url_tribune <- "https://www.chicagotribune.com/nation-world/ct-aud-nw-nyt-cb-ghost-guns-20210409-v7osb6gxvfgdneo6z7v5dupdt4-story.html"
tribune_root_html <- read_html(url_tribune)
tribune_body<- tribune_root_html %>%
  html_nodes(".heavy-text , .crd--cnt p") %>% # indicates nodes name
  html_text(trim = TRUE) 

names(tribune_body) <- c("newspaper")

tribune_body <- as_tibble(tribune_body)

tribune_body$newspaper <- "Tribune"

Web Scrape The Guardian Article

# webscrape the guardian article on ghost guns
url_guardian <- "https://www.theguardian.com/us-news/2021/may/18/california-ghost-guns-deadly-toll"
guardian_root_html <- read_html(url_guardian)
guardian_body <- guardian_root_html %>%
  html_nodes("h2 , .dcr-o5gy41") %>%  # indicates nodes name
  html_text(trim = TRUE)

names(guardian_body) <- c("newspaper")

guardian_body <- as_tibble(guardian_body)

guardian_body$newspaper <- "Guardian"

Create Corpus

The corpus will store the newspapers textual data, body, for text analysis

# combine all three web scrapped articles into a corpus
corpus <- rbind(federalist_body, guardian_body, tribune_body)

# create corpus
corpus <- Corpus(VectorSource(corpus))

#inspect(corpus)

Transform/Clean the Corpus
Transform and clean the corpus

# clean the corpus
corpus_text <- tm_map(corpus[1], tolower)
corpus_text <- tm_map(corpus_text, removePunctuation)
corpus_text <- tm_map(corpus_text, removeNumbers)
cleanset <- tm_map(corpus_text, removeWords, stopwords('english'))

Create a Term-document matrix

# text mining
tdm <- TermDocumentMatrix(cleanset)
tdm <- as.matrix(tdm)

words <- sort(rowSums(tdm), decreasing = TRUE)

df <- data.frame(word = names(words), freq = words)

head(df, 5)

##              word freq
## gun           gun   77
## guns         guns   64
## ghost       ghost   55
## said         said   23
## violence violence   20

The term-document matrix (tdm) will compare all the terms or words across each document. The tdm results was sorted and summarized in a decreasing order. The selected Ghost Guns Articles top 5 frequent words pertain to gun, guns, ghost, said, and violence.

Word Cloud

Visualization of the most common words

most_word <- sort(rowSums(tdm), decreasing = TRUE)
set.seed(222)
wordcloud(words = names(most_word),
          freq = most_word,
          max.words = 500,
          random.order = F,
          min.freq = 3,
          colors = brewer.pal(8, 'Dark2'),
          scale = c(6, 0.4),
          rot.per = 0.7)

From the word cloud, we can see that “gun”, “guns”, “ghost”, “violence”, and “said” are the five(5) most important words in the corpus.

Sentiment Analysis

The sentiment lexicon used is the syuzhet package which will pull out the eight basic emotions as well as positive and negative sentiment from the selected newspaper articles. The expanding accessibility of ghost guns (aka: homemade weapons, or do-it-yourself guns) in our communities may create disparity where negative and positive sentiments are equally influential on emotions. The highest three emotions: fear, anger, and trust, also indicates the level of intensity a ghost gun as a firearm.

# get sentiment lexicon
sentiment <- get_nrc_sentiment(df$word)

# display the table of sentiments
head(sentiment, 5)

##   anger anticipation disgust fear joy sadness surprise trust negative positive
## 1     1            0       0    1   0       0        0     0        1        0
## 2     0            0       0    0   0       0        0     0        0        0
## 3     0            0       0    1   0       0        0     0        0        0
## 4     0            0       0    0   0       0        0     0        0        0
## 5     1            0       0    1   0       1        0     0        1        0

# Barplot
barplot(colSums(sentiment),
        las = 2,
        col = rainbow(10),
        ylab = 'Count',
        main = 'Sentiment Scores Issue')

FINAL CONCLUSIONS ON MASS SHOOTINGS

From the analysis using the dataset from GVA, we found that there has been an overall upward trend in mass shootings incidents in the United States from 2014 to present with a major increase from 2019 to 2020. Also, we found that the months of July and June have the most incidents of mass shootings. Furthermore, there are more mass shootings on Sundays and Saturdays (weekends) than other weekdays. In addition, Illinois (Chicago), California, Texas, Florida, and New York are the top states with the most incidents of mass shootings, and there is an average of 5 - 6 victims per mass shooting incident. Also from the sentiment analysis of the three political views towards ghost guns, the sentiment scores shows a high intensity of fear and it signals that society recognizes the influence ghost guns play in mass shootings, as an undetectable, unregulated, firearm. Although the federal Bureau of Alcohol, Tobacco, and Firearms (ATF) and state agencies enforces gun laws, ghost guns are not defined as " a firearm".