
School Shootings: DATA 110 Final Project

Source: Business Insider

Introduction

The United States is infamous for school shootings. When Americans hear the names “Sandy Hook” or “Columbine”, the first thing they think of is a school shooting. Since the Columbine shooting in 1999, the Washington Post has been tracking school shootings along with statistics about the schools where they took place. I chose this dataset because a shooting nearly occurred locally, and I wanted to investigate the statistics of school shootings that have occurred in the past.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(highcharter)
Warning: package 'highcharter' was built under R version 4.3.3
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
library(leaflet)
Warning: package 'leaflet' was built under R version 4.3.3
dirty_shootings <- read_csv("C:/Users/omyue/Downloads/school-shootins-since-columbine.csv")
Rows: 216 Columns: 50
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (26): nces_school_id, school_name, nces_district_id, district_name, dat...
dbl  (21): uid, year, killed, injured, casualties, age_shooter1, shooter_dec...
num   (2): enrollment, white
time  (1): time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
shootings <- dirty_shootings |>
  select(school_name, district_name, date, year, day_of_week, state, county, city, enrollment, killed, injured, casualties, age_shooter1, age_shooter2, lat, long)

Linear Regression Analysis

Plot 1: School Shootings since Columbine, 1999

Have injuries and deaths from school shootings increased or decreased since Columbine? Let’s create a visualization to answer that. The visualization will be a stacked bar graph showing the total casualties from school shootings each year, split into victims killed and victims injured.

Pt 1A: Cleaning

We’re going to group the data by year for the visualization. Using the “group_by()” and “summarise()” functions, we total the number of casualties, deaths, and injuries for each year. We then store the result in the data frame “shootingsp1”, which will be used specifically for this visualization.

shootingsp1 <- shootings |>
  group_by(year) |>
  summarise(casualties_yr = sum(casualties), deaths_yr = sum(killed), injured_yr = sum(injured))
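
To make the grouping step concrete, here is a minimal sketch of the same “group_by()” and “summarise()” pattern on a tiny made-up table. The rows in “toy” are invented for illustration and are not values from the dataset. The “na.rm = TRUE” argument is an assumption added here: it only matters if the real columns contain missing values, which has not been checked.

```r
library(dplyr)
library(tibble)

# Made-up rows for illustration only; these are not values from the dataset.
toy <- tibble(
  year    = c(1999, 1999, 2000),
  killed  = c(1, 0, 2),
  injured = c(3, NA, 1)
)

# One row per year, with missing values skipped rather than turning the sum into NA.
toy_by_year <- toy |>
  group_by(year) |>
  summarise(
    deaths_yr  = sum(killed, na.rm = TRUE),
    injured_yr = sum(injured, na.rm = TRUE),
    .groups = "drop"
  )

toy_by_year
```

Without “na.rm = TRUE”, a single missing value in a year would make that year’s total “NA”, so the bar for that year would be missing from the chart.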

Pt 1B: Plotting the Graph

After cleaning and organizing our data, we can start plotting the graph.
The code generates a stacked bar chart using the highcharter package to depict school shooting casualties from 1999 to 2018. It begins by initializing the chart and specifying its type as a bar chart. The title, axis labels, and plot options are set next. Then, two series are added to represent the number of injured and killed individuals, with corresponding data extracted from “shootingsp1”. A theme is applied to the chart, and the tooltip is configured to display shared information with a specified border color. Finally, the resulting chart is rendered.

# Create the stacked bar chart
plot1 <- highchart() |> # puts the graph into the variable
  hc_chart(type = "bar") |> # sets the plot type (horizontal bars)
  hc_title(text = "School Shooting Casualties from 1999-2018") |> # sets the title label
  hc_xAxis(categories = shootingsp1$year) |> # uses the years as the x-axis categories
  hc_yAxis(title = list(text = "Number of Casualties")) |> # sets label for y-axis
  hc_plotOptions(series = list(stacking = "normal")) |> # makes the stacked bars
  hc_add_series(name = "Injured", data = shootingsp1$injured_yr, color = "grey") |> # labels and sets color
  hc_add_series(name = "Killed", data = shootingsp1$deaths_yr, color = "#d62828") |> # labels and sets color
  hc_add_theme(hc_theme_538()) |> # sets the theme
  hc_tooltip(shared = TRUE, # shows the killed and injured counts together
             borderColor = "black")
plot1 # prints the graph

The bar graph shows the total casualties caused by school shootings in each year. Because the chart type is “bar”, the bars run horizontally: the “x-axis” on the side is the year, and the “y-axis” along the bottom is the number of casualties that year. Each bar is split to show how many of the casualties were injuries and how many were deaths.
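
Since the stacked segments are read as adding up to total casualties, a quick consistency check can be useful. The sketch below assumes that casualties are meant to equal killed plus injured, which has not been verified against the dataset. The table “toy_yearly” is made up for illustration and only mimics the column names of “shootingsp1”.

```r
library(dplyr)
library(tibble)

# Made-up yearly totals for illustration only; not values from the dataset.
toy_yearly <- tibble(
  year          = c(1999, 2000, 2001),
  deaths_yr     = c(2, 0, 1),
  injured_yr    = c(5, 3, 4),
  casualties_yr = c(7, 3, 6)
)

# Flag years where killed + injured does not match the reported total.
mismatches <- toy_yearly |>
  mutate(stack_total = deaths_yr + injured_yr) |>
  filter(stack_total != casualties_yr)

mismatches
```

Any year that shows up in “mismatches” would have a bar whose length disagrees with its casualties total.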

Plot 2: Occurrence of Shootings by U.S. Region

Now, let’s explore the number of shootings that occur in each region over the years of 1999-2018.

First, we’re going to create a separate data frame for the graph. We need to group the states by U.S. region, so we will use the “mutate()” function to add a “region” column. After that, we’re going to group the data by region and year and summarise the variables we want for the graph. All of this goes into the data frame “shootingsp2”.

# groups the states by region
shootingsp2 <- shootings %>%
  mutate(region = ifelse(state %in% c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", "Vermont", "New Jersey", "New York", "Pennsylvania"), "Northeast",
                ifelse(state %in% c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin", "Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota"), "Midwest",
                ifelse(state %in% c("Delaware", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "District of Columbia", "West Virginia", "Alabama", "Kentucky", "Mississippi", "Tennessee", "Arkansas", "Louisiana", "Oklahoma", "Texas", "Virgin Islands", "Puerto Rico"), "South",
                   ifelse(state %in% c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah", "Wyoming", "Alaska", "California", "Hawaii", "Oregon", "Washington", "Guam"), "West", "Other")))))
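
As an alternative to the nested “ifelse()” calls, the region labels can also be attached with a lookup table and a join. The sketch below is only one possible approach, not the method used in this project. It relies on R’s built-in “state.name” and “state.region” vectors, which cover the 50 states only and call the Midwest “North Central”. The rows in “toy” are made up for illustration.

```r
library(dplyr)
library(tibble)

# Lookup table built from R's built-in state.name / state.region vectors.
region_lookup <- tibble(
  state  = state.name,
  region = as.character(state.region)
) |>
  mutate(region = if_else(region == "North Central", "Midwest", region))

# Hypothetical rows for illustration only; not taken from the dataset.
toy <- tibble(state = c("Colorado", "Florida", "Ohio", "District of Columbia"))

# States missing from the lookup (for example D.C. and territories) fall back to "Other".
toy_regions <- toy |>
  left_join(region_lookup, by = "state") |>
  mutate(region = coalesce(region, "Other"))

toy_regions
```

Note one difference from the code above: the built-in vectors do not include the District of Columbia or the territories, so those rows would need to be added to “region_lookup” by hand to be counted in the South or West.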

# creates the data set for the second plot
shootingsp2 <- shootingsp2 |> # overwrites the data frame with the summary
  group_by(region, year) |> # groups the rows by the region and the year
  summarise(num_shootings = n(), killed = sum(killed), injured = sum(injured), casualties = sum(casualties), .groups = "drop") |> # one row per region and year
  arrange(region, year) # organizes the rows by region and year
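
A stacked chart is easiest to read when every region has a value for every year. If a region had no shootings in a given year, that region-year pair is simply absent from the summary, which may leave gaps in its series. The sketch below shows one way to add explicit zeros with “tidyr::complete()”. The table “toy_counts” is made up for illustration, and whether the real data has such gaps has not been checked.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up counts for illustration only; not values from the dataset.
toy_counts <- tibble(
  region        = c("South", "South", "West"),
  year          = c(1999, 2000, 1999),
  num_shootings = c(2L, 1L, 3L)
)

# Add an explicit zero row for every region-year pair that has no shootings.
toy_filled <- toy_counts |>
  complete(region, year, fill = list(num_shootings = 0L)) |>
  arrange(region, year)

toy_filled
```

Here the West has no row for 2000 in “toy_counts”, so “complete()” adds one with a count of zero.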

Let’s start plotting the graph. First, we assign the colors that will represent each region to a vector called “cols”. Next, we set the highcharter series type to “streamgraph”, map the x-axis to “year” and the y-axis to “num_shootings”, and group the data by region. Each stream is colored using the vector “cols”. After setting up the basics, we set the theme to “hc_theme_538” and make the outline of each stream white.

After setting up the visual aspects, we label the title, x-axis, and y-axis of the graph. Then we move the legend to the bottom right of the graph with the function “hc_legend()”. Finally, we use “hc_tooltip()” with a shared tooltip so that the values for every region in a given year appear together.

library(highcharter)

cols <- c("#d62828", "#003049", "#f77f00", "#fcbf49")


highchart() %>%
  hc_add_series(data = shootingsp2, # sets the dataset
                type = "streamgraph", # sets graph type
                hcaes(x = year, y = num_shootings, group = region), # sets the data
                color = cols) %>% # sets color for each stream
  hc_add_theme(hc_theme_538()) %>% # sets the theme
  hc_plotOptions(streamgraph = list(
    stacking = "normal",
    marker = list(enabled = FALSE),
    lineWidth = 0.5,
    lineColor = "white"
  )) %>%
  hc_xAxis(title = list(text = "Year")) %>%
  hc_yAxis(title = list(text = "Number of Shootings")) %>%
  hc_title(text = "Occurrence of School Shootings in each U.S Region 1999-2018") %>%
  hc_legend(align = "right", verticalAlign = "bottom", layout = "vertical") %>%
  hc_tooltip(shared = TRUE, # shows all regions per year
             borderColor = "black",
             pointFormat = "{series.name}: {point.y:.2f}<br>") # shows all y-axis values at the specific x-value

This streamgraph compares the number of shootings per U.S. region per year. The x-axis is the year in which the shootings occurred, and the y-axis is the count of shootings in each region. Each “stream” represents a distinct U.S. region over the period 1999-2018. The interactive legend also lets the user isolate specific regions.

For example, the Northeast has had the fewest shootings of any U.S. region, so its stream is difficult to see beside the others. To see just the Northeast, we can hide the other regions by clicking on their colors in the legend. After that, only the stream for the Northeast remains visible.

Plot 3: Map of School Shootings Across the Nation

Now, where are these school shootings occurring? Is there a place where they usually occur? Let’s create a GIS (Geographic Information System) map to visualize it.

Creating a tooltip

First, let’s set up a tooltip that will show the user information about each shooting on the map. The tooltip will show the school name, the location of the shooting (city and state), the number killed, the number injured, and the total number of casualties.

popupshootings <- paste0( # creates the tooltip for the map
      "<b>School Name: </b>", shootings$school_name, "<br>", # prints info on school name
      "<b>City: </b>", shootings$city, "<br>", # prints info on city
      "<b>State: </b>", shootings$state, "<br>", # prints info on the state
      "<b>Killed: </b>", shootings$killed, "<br>", # prints number of kills
      "<b>Injured: </b>", shootings$injured, "<br>", # prints number of injuries
      "<b>Total Casualties: </b>", shootings$casualties, "<br>" # prints number of casualties
    )
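
Because the tooltip is assembled as raw HTML, a school name containing characters such as “&” or “<” could be misread as markup. The sketch below shows one optional safeguard using “htmlEscape()” from the htmltools package. The school name is hypothetical, and it is not known whether any name in the dataset actually needs escaping.

```r
library(htmltools)

# Hypothetical school name containing characters that are special in HTML.
school <- "Smith & Jones <Academy>"

# Escape the text before pasting it between the HTML tags.
popup_line <- paste0("<b>School Name: </b>", htmlEscape(school), "<br>")

popup_line
```

The same wrapper could be applied to the city and state fields before they are pasted into the tooltip.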

Graphing the GIS

Now we can build the map. First, we load the library “leaflet”, which is used to draw interactive maps in R. We want the default view to cover the United States, so we set the view to a latitude and longitude near the center of the country.

library(leaflet)
leaflet() |> # calls for leaflet
  setView(lng = -95.71, lat =  37.09, zoom = 4) |> # sets default view for the map
  addProviderTiles("CartoDB.DarkMatter") |> # sets the visual for the map
  addCircleMarkers( # adds the points on the map
    data = shootings, # sets the data
    radius = shootings$casualties, # sets the size of the map points
    color = "red",
    popup = popupshootings # creates the tooltip
    ) 
Assuming "long" and "lat" are longitude and latitude, respectively
Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with either
missing or invalid lat/lon values and will be ignored
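
The message and warning above come from leaflet guessing the coordinate columns and skipping a row without valid coordinates. The sketch below shows one way to make both steps explicit: filter out rows with missing coordinates first, then name the “long” and “lat” columns directly. The rows in “toy” are made up for illustration. The square-root radius formula is an assumption added here so that markers with zero casualties stay visible; it is not what the map above uses.

```r
library(dplyr)
library(tibble)
library(leaflet)

# Made-up rows for illustration only; not values from the dataset.
toy <- tibble(
  school_name = c("School A", "School B", "School C"),
  lat         = c(39.6, NA, 26.3),
  long        = c(-105.1, -80.2, -80.3),
  casualties  = c(4, 1, 0)
)

# Keep only rows with usable coordinates, and precompute a marker radius.
toy_map <- toy |>
  filter(!is.na(lat), !is.na(long)) |>
  mutate(radius = 3 + 2 * sqrt(casualties))

# Name the coordinate columns explicitly instead of letting leaflet guess them.
leaflet(toy_map) |>
  addProviderTiles("CartoDB.DarkMatter") |>
  addCircleMarkers(lng = ~long, lat = ~lat, radius = ~radius,
                   color = "red", popup = ~school_name)
```

If rows are filtered this way on the real data, the tooltip vector should be built from the same filtered data frame so that each popup still lines up with its point.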

This visualization is a map showing the locations of school shootings in the U.S., marked by red points. Some of the points are larger than others because the radius of each point grows with the number of casualties. The largest points include infamous shootings such as Sandy Hook, Columbine, and Parkland. Clicking a point opens a tooltip with information about the shooting: the school name, the location, the number of victims killed and injured, and the total number of casualties.

Tableau Dashboard

School Shootings Tableau Dashboard - Yuengling

The Tableau dashboard has multiple visualizations about school shootings, so let’s take a deep dive into each one.

Left: Shootings Map

The map on the left is similar to the GIS plot: each shooting has its own mark, and the radius is determined by the number of casualties at that shooting. But the Tableau map has more features. States that have had shootings are shaded according to the number of shootings that have occurred within the state: the darker the state, the more shootings. Some of the darker states include Colorado, California, and Florida.

There are also tooltips for each mark and state, showing the number of casualties (killed and injured) and the location of the shooting. I have also added annotations for the largest points on the map: the Columbine, Sandy Hook, Red Lake, and Parkland shootings.

Upper Right: Killed and Injured Packed Bubbles

This visualization was mostly for fun, but it is informative too.

In both visualizations, each bubble represents a state, and its size represents the number of victims killed or injured in school shootings within that state. The light blue bubbles represent the number of victims killed in each state; the states with the most deaths from school shootings are Connecticut, Florida, and Colorado. The dark blue bubbles represent the number of victims injured in each state; the states with the most injuries are California, Florida, and Colorado.

Lower Right: Stacked Bar Chart of Shootings by Region

The last visualization on the dashboard is similar to the first and second visualizations done in R. It is similar to the first because it is a stacked bar chart, and similar to the second because it shows the number of shootings in each U.S. region per year. The x-axis is the year, and the y-axis is the number of shootings that occurred. Each shade of blue represents a U.S. region, and each regional section is subdivided into smaller sections representing the states within that region.

If you hover over a section, a tooltip will appear showing the name of the state, the region the state falls into, the number of shootings that occurred in the state, and the year.

Closing Statement

Everyone can agree that school shootings are tragic. According to one study, there were 1453 school shootings from 1997 to 2022 (Rapa et al.). The same source reports that shootings have become more deadly in recent years. Shootings such as Columbine and Sandy Hook have shaken the country with their high number of casualties, Sandy Hook especially because of the young age of its victims. The topic should not be taken lightly; it is truly a tragedy.

After analyzing the data and creating several visualizations, I noticed several patterns. The South tends to have the most shootings, but that may be because it contains the most states of any regional group. The states with the most shootings are California (specifically southern California) and Florida. The Sandy Hook, Parkland, and Columbine shootings also heavily influence most of the casualty-related visualizations.

Regarding the visualizations, I am proud of each one. I had planned to give the second graph a more intricate theme, but I ran out of time. Out of all of the visualizations, I am most proud of the dashboard: I wasn’t very familiar with Tableau, but I eventually got the hang of it, and it turned out nicely. Overall, the project was fun (the topic was not), and I pushed myself to my limits in data visualization.

Sources Used:

https://publications.aap.org/pediatrics/article/153/4/e2023064311/196816/School-Shootings-in-the-United-States-1997-2022?autologincheck=redirected (Closing Statement)

https://github.com/washingtonpost/data-school-shootings/blob/master/school-shootings-data.csv (Dataset)

https://www.businessinsider.com/school-shootings-experts-say-policy-changes-help-prevent-tragedy-2021-3 (Image)