1 Introduction

The dataset is US Mass Shootings from Kaggle.
Mass shootings are a serious problem in the U.S. These days I see shooting-related stories across the media, largely because of the shooting in Las Vegas. To be honest, I had no idea how big this problem was until I heard about Las Vegas, because I grew up in a society where weapons such as guns are not part of everyday life.
I felt embarrassed by my ignorance. So, for myself and for others who are not familiar with this social problem, I would like to analyse this dataset and understand a little more about what is actually going on in the U.S. I hope this article helps you understand gun problems and take an interest in them.

2 Data Source

  1. Mass shootings in the United States by year
  2. Mother Jones
  3. USA Today
  4. Stanford: Mass Shootings in America

3 Motivation of this analysis

  1. Visualize the dataset so that readers can understand it more easily.
  2. Find out which cities and states are more prone to attacks.
  3. Look for correlations with dates.
  4. Combine other datasets with this one for further understanding and discoveries.
  5. Gain a better understanding of gun problems in the U.S.

4 Preparation

library(tidyverse) #for data cleaning and better visualizations
library(data.table) # to import and handle datasets faster
library(maps) # for map visualization
library(stringr) # to manipulate character type data variables
library(plotly) # for interactive visualizations
library(DT) # interactive data table

shoot <- fread("Mass Shootings Dataset Ver 5.csv", data.table = F)
shoot <- shoot[,-1]
shoot$Date <- as.Date(shoot$Date, "%m/%d/%Y")
## Warning in strptime(x, format, tz = "GMT"): unknown timezone 'zone/tz/
## 2018c.1.0/zoneinfo/Australia/Sydney'
names(shoot)[11] <- "Total_victims"
names(shoot)[16] <- "Mental_Health_Issues"
shoot$Age <- as.numeric(shoot$Age)
## Warning: NAs introduced by coercion
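The coercion warning above means some Age entries could not be parsed as numbers. A minimal sketch of how one might inspect such entries before discarding them, using a made-up vector (the values below are hypothetical, not taken from the dataset):

```r
# Hypothetical stand-in for the raw Age column, which is read as text
age_raw <- c("26", "47", "", "15,16", "64", NA)

# suppressWarnings() hides the "NAs introduced by coercion" message
age_num <- suppressWarnings(as.numeric(age_raw))

# Entries that held some text but still failed to parse
failed <- !is.na(age_raw) & age_raw != "" & is.na(age_num)

age_raw[failed]      # the raw strings that need manual attention
sum(is.na(age_num))  # total number of missing ages after coercion
```

Looking at the failed strings first makes it easier to decide whether they are truly missing or simply in an unexpected format.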

5 Data Content

shoot %>% head()
##                                 Title               Location       Date
## 1          Texas church mass shooting Sutherland Springs, TX 2017-11-05
## 2 Walmart shooting in suburban Denver           Thornton, CO 2017-11-01
## 3     Edgewood businees park shooting           Edgewood, MD 2017-10-18
## 4       Las Vegas Strip mass shooting          Las Vegas, NV 2017-10-01
## 5          San Francisco UPS shooting      San Francisco, CA 2017-06-14
## 6   Pennsylvania supermarket shooting        Tunkhannock, PA 2017-06-07
##                                 Incident Area Open/Close Location
## 1                                      Church               Close
## 2                                    Wal-Mart                Open
## 3                            Remodeling Store               Close
## 4 Las Vegas Strip Concert outside Mandala Bay                Open
## 5                                UPS facility               Close
## 6                                Weis grocery               Close
##      Target     Cause
## 1    random   unknown
## 2    random   unknown
## 3 coworkers   unknown
## 4    random   unknown
## 5 coworkers          
## 6 coworkers terrorism
##                                                                                                                                                                                                                                                                                                                                         Summary
## 1                                                                                                                                                                                    Devin Patrick Kelley, 26, an ex-air force officer, shot and killed 26 people and wounded 20 at a church in Texas. He was found dead later in his vehicle. 
## 2                  Scott Allen Ostrem, 47, walked into a Walmart in a suburb north of Denver and fatally shot two men and a woman, then left the store and drove away. After an all-night manhunt, Ostrem, who had financial problems but no serious criminal history, was captured by police after being spotted near his apartment in Denver.
## 3 Radee Labeeb Prince, 37, fatally shot three people and wounded two others around 9am at Advance Granite Solutions, a home remodeling business where he worked near Baltimore. Hours later he shot and wounded a sixth person at a car dealership in Wilmington, Delaware. He was apprehended that evening following a manhunt by authorities.
## 4                                                                                                                                     Stephen Craig Paddock, opened fire from the 32nd floor of Manadalay Bay hotel at Last Vegas concert goers for no obvious reason. He shot himself and died on arrival of law enforcement agents. He was 64
## 5                                                                                                                                                             Jimmy Lam, 38, fatally shot three coworkers and wounded two others inside a UPS facility in San Francisco. Lam killed himself as law enforcement officers responded to the scene.
## 6                                                                             Randy Stair, a 24-year-old worker at Weis grocery fatally shot three of his fellow employees. He reportedly fired 59 rounds with a pair of shotguns before turning the gun on himself as another co-worker fled the scene for help and law enforcement responded.
##   Fatalities Injured Total_victims Policeman Killed Age Employeed (Y/N)
## 1         26      20            46                0  26              NA
## 2          3       0             3                0  47              NA
## 3          3       3             6                0  37              NA
## 4         59     527           585                1  64              NA
## 5          3       2             5                0  38               1
## 6          3       0             3               NA  24               1
##             Employed at Mental_Health_Issues  Race Gender Latitude
## 1                                         No White      M       NA
## 2                                         No White      M       NA
## 3 Advance Granite Store                   No Black      M       NA
## 4                                    Unclear White      M 36.18127
## 5                                        Yes Asian      M       NA
## 6          Weis grocery              Unclear White      M       NA
##   Longitude
## 1        NA
## 2        NA
## 3        NA
## 4 -115.1341
## 5        NA
## 6        NA

The dataset has detailed information about 328 mass shootings that happened from 1966 to 2017. The most recent incident in this dataset is the “Texas church mass shooting” on November 5, 2017. The oldest is “University of Texas at Austin” on August 1, 1966.

Here are some of the variables that describe each incident:
  * Location (Latitude/Longitude): Where exactly each incident happened.
  * Open/Close Location: Open means that the incident happened in an open space. Close means that it happened inside a building.
  * Mental Health Issues: Indicates whether the shooter had mental health issues or not.

6 Analysis

6.1 Geospatial Analysis

6.1.1 Map Visualization

world.map <- map_data("state")
ggplot() + 
  geom_map(data=world.map, map=world.map,
           aes(x=long, y=lat, group=group, map_id=region),
           fill="white", colour="black") + 
  geom_point(data = shoot, 
             aes(x = Longitude, y = Latitude, size = Fatalities), 
             colour = "red", alpha = .6) +
  xlim(-130, -65) + ylim(25,50) +
  labs(title = "Mass Shootings that happened from 1966 to 2017")
## Warning: Ignoring unknown aesthetics: x, y
## Warning: Removed 22 rows containing missing values (geom_point).

6.1.2 Which states are more prone to those incidents?

shoot <- shoot %>% 
  separate(Location, into = c("City", "State"), sep = ", ")
## Warning: Expected 2 pieces. Additional pieces discarded in 4 rows [147,
## 176, 225, 241].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 46 rows [16,
## 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 30, 35, 36, 37, 38, 39, 41, 43,
## 44, ...].
# A named vector pairs each abbreviation with its own full name.
# \\b restricts each match to a whole word; LA is the postal code for Louisiana.
state_lookup <- c("\\bTX\\b" = "Texas", "\\bCO\\b" = "Colorado", "\\bMD\\b" = "Maryland",
                  "\\bNV\\b" = "Nevada", "\\bCA\\b" = "California", "\\bPA\\b" = "Pennsylvania",
                  "\\bWA\\b" = "Washington", "\\bLA\\b" = "Louisiana")
shoot$State <- shoot$State %>% 
  str_replace_all(state_lookup)
shoot$State <- shoot$State %>% str_replace_all(c("Texas " = "Texas", " Virginia" = "Virginia"))
shoot$State <- as.factor(shoot$State)
state_bar <- as.data.frame(table(shoot$State)) %>% 
  ggplot(aes(x = reorder(Var1, -Freq), y= Freq, fill = Var1)) + 
  geom_bar(stat = "identity", show.legend=F) + 
  labs(x = "State", y = "Count") + 
  theme(axis.text.x = element_text(angle = 90))
ggplotly(state_bar)
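As an alternative to typing the abbreviations by hand, base R ships the vectors `state.abb` and `state.name` in matching order. The following is only a sketch on hypothetical values, not the cleaning used in this report. Note that `state.abb` covers the 50 states only, so an entry such as "DC" would be left unchanged.

```r
# Hypothetical State values mixing abbreviations, full names and stray spaces
state_raw <- c("TX", "Texas ", " Virginia", "LA", "California", NA)

# Remove leading and trailing whitespace first
state_clean <- trimws(state_raw)

# Position of each value in state.abb (NA when it is not an abbreviation)
idx <- match(state_clean, state.abb)

# Use the full name where an abbreviation matched, otherwise keep the value
state_full <- ifelse(is.na(idx), state_clean, state.name[idx])
state_full
```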

6.2 Time series analysis

shoot <- shoot %>% 
  mutate(Year = year(Date), 
         Month = month(Date),
         Weekday = weekdays(Date),
         Week = week(Date),
         WeekMonth = 1+Week-min(Week),
         weekdaynumber = 
           case_when(Weekday == "Monday" ~ 1, Weekday == "Tuesday" ~ 2, Weekday == "Wednesday" ~ 3,
                     Weekday == "Thursday" ~ 4, Weekday == "Friday" ~ 5, Weekday == "Saturday" ~ 6,
                     Weekday == "Sunday" ~ 7),
         MonthChar = 
           case_when(Month == 1 ~ "Jan", Month == 2 ~ "Feb", Month == 3 ~ "Mar", Month == 4 ~ "Apr", Month == 5 ~ "May", Month == 6 ~ "Jun",  
                     Month == 7 ~ "Jul", Month == 8 ~ "Aug", Month == 9 ~ "Sep", Month == 10 ~ "Oct", Month == 11 ~ "Nov", Month == 12 ~ "Dec"))

6.2.1 Calendar Plot 1

shoot %>% 
  plot_ly(x = ~Month, y = ~Year, z = ~Total_victims, 
          type = "heatmap", 
          hoverinfo = 'text',
          text = ~paste(Year, "/",MonthChar,
                        ' ', Total_victims, 'victims', 
                        '"', Title, '"'))

The only thing you can see is that the Las Vegas shooting in October 2017 stands out dramatically. Because of that, we cannot tell anything else from this plot.
So, what if we plot every incident apart from that one?
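To see why one extreme value hides everything else, it helps to look at where each value lands on a continuous color scale. This is a rough sketch with made-up counts, in which 585 stands in for a single extreme incident. `rescale01` is a helper defined here for illustration only.

```r
# Made-up victim counts; 585 stands in for a single extreme incident
victims <- c(3, 5, 6, 46, 82, 585)

# Rescale values to [0, 1], similar to how a color scale positions them
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

linear_pos <- rescale01(victims)
log_pos    <- rescale01(log10(victims))

# On the linear scale all but the last value fall in the lowest ~14%
round(rbind(linear = linear_pos, log10 = log_pos), 2)
```

A log transform spreads the smaller values out, so it could be an alternative to removing the outlier. It would need care if any total is zero, since `log10(0)` is `-Inf`.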

6.2.2 Calendar Plot 2

shoot %>% 
  filter(Title != "Las Vegas Strip mass shooting") %>%  
  plot_ly(x = ~Month, y = ~Year, z = ~Total_victims, 
          type = "heatmap", 
          hoverinfo = 'text',
          text = ~paste(Year, "/",MonthChar,
                        ' ', Total_victims, 'victims', 
                        '"', Title, '"'))

Apart from the Las Vegas shooting, this plot shows many other shootings that deserve attention, such as the Orlando shooting in 2016. The second largest was the “Aurora theater shooting” with 82 victims.
As I mentioned before, the Las Vegas shooting was by far the most shocking one and alarmed the entire world. However, these other incidents also deserve our attention. I hope this second calendar plot helps people who want to understand the history of mass shootings in the U.S.
Let’s take a closer look at those incidents. Next, I look at the 5 incidents that ended with the most victims.

6.2.3 Data Table of Historical Shootings

shoot %>% 
  filter(Title != "Las Vegas Strip mass shooting") %>%
  arrange(desc(Total_victims)) %>% 
  select(Summary, Title:Date, -City, Age) %>% 
  head(5) %>% 
  datatable(options = 
              list(pageLength = 5,
                   lengthMenu = c(1,5)))

One thing these 5 incidents have in common is that all of the shooters were in their 20s. They were so young.

These are the Wikipedia links for each shooting.
  1. Orlando nightclub massacre
  2. Aurora theater shooting
  3. Virginia Tech massacre
  4. University of Texas at Austin
  5. Texas church mass shooting

It is heartbreaking and deeply shocking to read all of them. I encourage you to read them as well, if you are ready to learn what actually happened behind the data.

Datasets are important for understanding things, but they are not enough to know what actually happened in our history. I really appreciate the provider of this dataset for giving me the opportunity to take an interest in this huge social problem in one of the biggest nations in the world.

6.3 Age Distribution

distribution <- shoot %>% 
  ggplot(aes(x = Age)) + 
  geom_histogram(col = "red", fill = "pink") + 
  labs(title = "Age Distribution of Criminals")
ggplotly(distribution)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 149 rows containing non-finite values (stat_bin).

Here is the distribution of the shooters’ ages. One surprising thing I can tell from this plot is that there are so many teenage shooters.
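A histogram with the default 30 bins makes it hard to count the teenagers exactly. One way to check this is to cut the ages into explicit bands with `cut()`. The ages and the band limits below are made up for illustration.

```r
# Made-up ages, including one missing value
ages <- c(15, 17, 19, 23, 26, 38, 47, 64, NA)

# right = FALSE makes each band include its lower limit: [0, 20), [20, 30), ...
age_band <- cut(ages,
                breaks = c(0, 20, 30, 40, 50, Inf),
                right  = FALSE,
                labels = c("<20", "20-29", "30-39", "40-49", "50+"))

# useNA = "ifany" keeps the missing ages visible in the count
band_counts <- table(age_band, useNA = "ifany")
band_counts
```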

shoot <- shoot %>% 
  mutate(Decade = (case_when(Year >= 1960 & Year < 1970 ~ "1960s", 
                             Year >= 1970 & Year < 1980 ~ "1970s", 
                             Year >= 1980 & Year < 1990 ~ "1980s", 
                             Year >= 1990 & Year < 2000 ~ "1990s", 
                             Year >= 2000 & Year < 2010 ~ "2000s", 
                             Year >= 2010 & Year < 2020 ~ "2010s")))
decade_boxplot <- shoot %>% 
  ggplot(aes(x = Decade, y = Age, fill = Decade)) +
  geom_boxplot() + 
  labs(x = "Each Decade", title = "Age Distribution of Each Decade")
ggplotly(decade_boxplot)
## Warning: Removed 149 rows containing non-finite values (stat_boxplot).

The spread of the shooters’ ages has grown over time. This may indicate that guns have become easier to access across the country, although the number of recorded incidents per decade may also affect the observed range.
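The boxplot impression can be put into numbers by computing the interquartile range per decade next to the number of incidents. The rows below are made up for illustration. The decade label is derived arithmetically here rather than with `case_when()`.

```r
# Made-up rows standing in for the Year and Age columns
toy <- data.frame(
  Year = c(1966, 1984, 1989, 1999, 1999, 2007, 2012, 2016, 2017, 2017),
  Age  = c(25, 41, 26, 18, 17, 23, 24, 29, 64, 26)
)

# 1966 -> 1960 -> "1960s"
toy$Decade <- paste0(floor(toy$Year / 10) * 10, "s")

# Number of incidents and interquartile range of Age in each decade
n_by_decade   <- tapply(toy$Age, toy$Decade, length)
iqr_by_decade <- tapply(toy$Age, toy$Decade, IQR)

data.frame(n = n_by_decade, iqr = iqr_by_decade)
```

Reading the spread together with `n` matters: a decade with only one or two incidents will show a small spread no matter what.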

# A named vector pairs each pattern with its own replacement
race_lookup <- c("black" = "Black", "Some other race" = "Other", "white" = "White")
shoot$Race <- shoot$Race %>% 
  str_replace_all(race_lookup)

6.4 Analysis of Criminals’ Profile

transition <- shoot %>% 
  ggplot(aes(x = Year, fill = Race)) + 
  geom_bar() +
  labs(title = "Transition of the races of criminals")
ggplotly(transition) %>% layout(showlegend = FALSE)
## Warning: position_stack requires non-overlapping x intervals
decade_transition_race <- shoot %>% 
  ggplot(aes(x = Decade, fill = Race)) + 
  geom_bar(show.legend = F, 
           position = "fill") + 
  labs(title = "Transition of the Ratio of the Races of Criminals by Each Decade")
ggplotly(decade_transition_race) %>% layout(showlegend = FALSE)

In the 20th century, most shooters were White Americans or European Americans. In the 21st century, however, the number of shooters of other races increased dramatically.
Immigration may be one factor behind this change, though this dataset alone cannot confirm it.
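The shares behind a filled bar chart can also be computed directly, which makes the change between decades easier to quote. A small sketch with made-up rows (the categories and counts are hypothetical):

```r
# Made-up rows standing in for the Decade and Race columns
toy <- data.frame(
  Decade = c("1990s", "1990s", "1990s", "2010s", "2010s", "2010s", "2010s"),
  Race   = c("White", "White", "Black", "White", "Black", "Asian", "Unknown")
)

# Cross-tabulate, then convert counts to within-decade shares
counts <- table(toy$Decade, toy$Race)
shares <- prop.table(counts, margin = 1)  # margin = 1: each row sums to 1

round(shares, 2)
```

Keeping the raw `counts` next to the shares helps to judge whether a change in share rests on only a handful of incidents.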

6.5 What were Mental Health Issues?

as.data.frame(table(shoot$Mental_Health_Issues))[-1,] %>%
  plot_ly(labels = ~Var1, values = ~Freq, type = 'pie') %>%
  layout(title = 'What were the mental health issues?',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

Unfortunately, this dataset does not describe which mental health issues the shooters had. For more than half of the incidents, the mental health status is unknown or unclear.
I tried to find a dataset on the shooters’ mental health issues, but I couldn’t.
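The share of unknown or unclear entries can be checked with a few lines. The sketch below uses made-up values and assumes that spelling variants such as "unknown" and "Unknown" should count as the same category.

```r
# Made-up stand-in for the Mental_Health_Issues column
mh <- c("Yes", "No", "Unknown", "Unclear", "unknown", "Yes", "Unknown", "")

# Normalise case and whitespace; treat empty strings as missing
mh_clean <- tolower(trimws(mh))
mh_clean[mh_clean == ""] <- NA

# TRUE where the status could not be determined
is_undetermined <- mh_clean %in% c("unknown", "unclear")

# Share among the non-missing entries
share <- mean(is_undetermined[!is.na(mh_clean)])
share
```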

7 Conclusion

First of all, my condolences and prayers go to all the victims, their families and loved ones. People who were involved in these crimes may not even want to see or read this article. However, through this analysis I gained a better understanding of what has been going on in the U.S., and I would be really happy if reading my report draws more people’s attention to it.
Obviously, it is not an easy problem to solve, but I believe that a better understanding will broaden the possibilities of success and reduce the risks.
Thank you for taking the time to read my report.

Koki