US School Shooting Casualties

Author

Dormowa Sherman

Published

October 17, 2023

Wikimedia Commons: Students protest for gun laws outside the White House

Intro

For this project I will use a dataset of school shootings in America from 1999 (Columbine High School) to 2023. The dataset is from The Washington Post and was last updated in August 2023. It includes 387 observations of 50 variables. The variables I plan to use for my analysis are school name, state, day of week, year, number of injured people, number of killed people, and casualties (injured + killed). I want to look at the school shooting incidents in the US states with the highest prevalence of school shootings. I am also interested in exploring whether there are one or two days of the week on which the majority of school shootings are perpetrated.

loading packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)

importing school shootings dataset and naming it schools

schools <- read.csv("schoolshootings.csv")

taking a glimpse of the schools dataset

glimpse(schools)
Rows: 387
Columns: 50
$ uid                              <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12…
$ nces_school_id                   <chr> "80480000707", "220054000422", "13044…
$ school_name                      <chr> "Columbine High School", "Scotlandvil…
$ nces_district_id                 <chr> "804800", "2200540", "1304410", "4218…
$ district_name                    <chr> "Jefferson County R-1", "East Baton R…
$ date                             <chr> "4/20/1999", "4/22/1999", "5/20/1999"…
$ school_year                      <chr> "1998-1999", "1998-1999", "1998-1999"…
$ year                             <int> 1999, 1999, 1999, 1999, 1999, 1999, 1…
$ time                             <chr> "11:19 AM", "12:30 PM", "8:03 AM", "1…
$ day_of_week                      <chr> "Tuesday", "Thursday", "Thursday", "M…
$ city                             <chr> "Littleton", "Baton Rouge", "Conyers"…
$ state                            <chr> "Colorado", "Louisiana", "Georgia", "…
$ school_type                      <chr> "public", "public", "public", "public…
$ enrollment                       <chr> "1965", "588", "1369", "3147", "1116"…
$ killed                           <int> 13, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, …
$ injured                          <int> 21, 1, 6, 1, 1, 0, 5, 0, 0, 1, 0, 0, …
$ casualties                       <int> 34, 1, 6, 1, 1, 1, 5, 1, 0, 1, 0, 0, …
$ shooting_type                    <chr> "indiscriminate", "targeted", "indisc…
$ age_shooter1                     <int> 18, 14, 15, 17, NA, 12, 13, 16, 13, 1…
$ gender_shooter1                  <chr> "m", "m", "m", "m", "m", "m", "m", "m…
$ race_ethnicity_shooter1          <chr> "w", "", "w", "", "", "h", "ai", "w",…
$ shooter_relationship1            <chr> "student", "former student (expelled)…
$ shooter_deceased1                <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ deceased_notes1                  <chr> "suicide", "", "", "", "", "", "", ""…
$ age_shooter2                     <int> 17, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gender_shooter2                  <chr> "m", "", "", "", "", "", "", "", "", …
$ race_ethnicity_shooter2          <chr> "w", "", "", "", "", "", "", "", "", …
$ shooter_relationship2            <chr> "student", "", "", "", "", "", "", ""…
$ shooter_deceased2                <int> 1, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ deceased_notes2                  <chr> "suicide", "", "", "", "", "", "", ""…
$ white                            <int> 1783, 5, 1189, 209, 40, 160, 239, 169…
$ black                            <int> 16, 583, 136, 2736, 755, 6, 3, 28, 40…
$ hispanic                         <chr> "112", "0", "28", "27", "287", "583",…
$ asian                            <int> 42, 0, 15, 170, 29, 2, 0, 26, 222, 0,…
$ american_indian_alaska_native    <int> 12, 0, 1, 5, 5, 2, 153, 5, 1, 0, 1, 1…
$ hawaiian_native_pacific_islander <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ two_or_more                      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ resource_officer                 <int> 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0…
$ weapon                           <chr> "12-gauge Savage-Springfield 67H pump…
$ weapon_source                    <chr> "purchased from friends", "", "", "pu…
$ lat                              <dbl> 39.60391, 30.52996, 33.62692, 39.9215…
$ long                             <dbl> -105.07500, -91.16997, -84.04796, -75…
$ staffing                         <dbl> 89.600, 39.000, 84.000, 41.000, NA, 4…
$ low_grade                        <chr> "9", "6", "9", "9", "9", "6", "6", "9…
$ high_grade                       <chr> "12", "8", "12", "12", "12", "7", "8"…
$ lunch                            <chr> "41", "495", "125", "2007", "543", "5…
$ county                           <chr> "Jefferson County", "East Baton Rouge…
$ state_fips                       <int> 8, 22, 13, 42, 25, 35, 40, 12, 6, 17,…
$ county_fips                      <int> 8059, 22033, 13247, 42101, 25025, 350…
$ ulocale                          <int> 21, 12, 21, 11, 11, 33, 32, 21, 13, 1…

In the chunk below, I selected the variables that are most relevant to my analysis. Then, I filtered the data by year to include the most recent six years, 2018-2023. I included six years instead of five because in 2020 many school districts were operating under distance learning due to the COVID-19 pandemic, and I worried there might not be enough data. I also filtered the data by state to include only California, Texas, Illinois, Florida, Michigan, Pennsylvania, New York, North Carolina, and Maryland.

schools |>
  select(school_name, year, day_of_week, state, killed, injured, casualties) |>
  filter(year %in% 2018:2023,
         state %in% c("California", "Texas", "Illinois", "Florida", "Michigan",
                      "Pennsylvania", "New York", "North Carolina", "Maryland"))
                                               school_name year day_of_week
1                                        Italy High School 2018      Monday
2                         Salvador B. Castro Middle School 2018    Thursday
3                     Marjory Stoneman Douglas High School 2018   Wednesday
4                                      Seaside High School 2018     Tuesday
5                                  Great Mills High School 2018     Tuesday
6                                      Jackson High School 2018    Thursday
7                                       Forest High School 2018      Friday
8                                     Highland High School 2018      Friday
9                                        Dixon High School 2018   Wednesday
10                                    Santa Fe High School 2018      Friday
11                         Villa Heights Elementary School 2018    Thursday
12                          Lawrence Orr Elementary School 2018      Monday
13                                    Battle Creek Academy 2018      Friday
14                                      Butler High School 2018      Monday
15                          Frederick Douglass High School 2019      Friday
16            New Joseph Bonnheim Community Charter School 2019   Wednesday
17                               Saint Clair Evans Academy 2019   Wednesday
18                                        Flex High School 2019   Wednesday
19                                     Menta Academy North 2019      Monday
20                                Hollenbeck Middle School 2019     Tuesday
21                                     Ridgway High School 2019     Tuesday
22                                     Achievement Academy 2019      Monday
23                              Esteban Torres High School 2019   Wednesday
24                                      Saugus High School 2019    Thursday
25                                    Bellaire High School 2020     Tuesday
26                             McAuliffe Elementary School 2020    Thursday
27                                     Antioch High School 2020   Wednesday
28                                         Sagemont School 2020    Thursday
29                                     Ribault High School 2020      Friday
30                               Lincoln Elementary School 2020      Friday
31                            Hendersonville Middle School 2020     Tuesday
32                             Wayne Central Middle School 2021      Monday
33                          Urban Dove Team Charter School 2021    Thursday
34                                North Forest High School 2021   Wednesday
35                                 New Hanover High School 2021      Monday
36                                 Mount Tabor High School 2021   Wednesday
37                             Tri-County Education Center 2021   Wednesday
38                            YES Prep Southwest Secondary 2021      Friday
39                                  Timberview High School 2021   Wednesday
40                    Wendell Phillips Academy High School 2021     Tuesday
41                           James McDade Classical School 2021   Wednesday
42                                Poughkeepsie High School 2021      Monday
43                                                 P.S. 44 2021     Tuesday
44                           Thornton Township High School 2021   Wednesday
45                                      Oxford High School 2021     Tuesday
46                                 Sam Rayburn High School 2021   Wednesday
47                               Great Oaks Charter School 2021    Thursday
48                             Jesse C. Carson High School 2021    Thursday
49                              West Charlotte High School 2021      Monday
50                                      Auburn High School 2022     Tuesday
51                                    Seminole High School 2022   Wednesday
52                                 Oliver Citywide Academy 2022   Wednesday
53                                    Magruder High School 2022      Friday
54                                Mount Vernon High School 2022     Tuesday
55                               North Gardens High School 2022   Wednesday
56                                     De Anza High School 2022      Friday
57 West Philadelphia Achievement Charter Elementary School 2022    Thursday
58                                        Erie High School 2022     Tuesday
59                           Aspen Ridge Elementary School 2022     Tuesday
60                                     Heights High School 2022    Thursday
61                    Alexander W. Dreyfoos School of Arts 2022      Friday
62                               Walt Disney Magnet School 2022     Tuesday
63                                       Mexia High School 2022      Monday
64                                  Robb Elementary School 2022     Tuesday
65                            Ulysses S. Grant High School 2022   Wednesday
66                                 John Finney High School 2022   Wednesday
67                                    Madison Park Academy 2022      Monday
68           Mergenthaler Vocational Technical High School 2022      Friday
69                         Treasure Cost Classical Academy 2022      Monday
70                                    Rudsdale High School 2022   Wednesday
71                                    Suitland High School 2022    Thursday
72                             Fuquay-Varina Middle School 2022    Thursday
73                           Benjamin Franklin High School 2023    Thursday
74                                                   PS 78 2023      Monday
75                                     Dalhart ISD schools 2023      Monday
76                                    Westinghouse Academy 2023     Tuesday
77                                   Palo Duro High School 2023      Monday
78                                       Lamar High School 2023      Monday
79                      International Academy of Flint K12 2023     Tuesday
80                  E. Washington Rhodes Elementary School 2023      Monday
81                                 Oliver Citywide Academy 2023   Wednesday
82                         Michigan Collegiate High School 2023      Monday
            state killed injured casualties
1           Texas      0       1          1
2      California      0       5          5
3         Florida     17      17         34
4      California      0       3          3
5        Maryland      1       1          2
6        Michigan      0       0          0
7         Florida      0       1          1
8      California      0       1          1
9        Illinois      0       0          0
10          Texas     10      13         23
11 North Carolina      0       0          0
12 North Carolina      0       0          0
13       Michigan      0       0          0
14 North Carolina      1       0          1
15       Maryland      0       1          1
16     California      0       0          0
17        Florida      0       0          0
18       Michigan      0       0          0
19       Illinois      0       0          0
20     California      0       1          1
21     California      0       1          1
22       Maryland      0       1          1
23     California      1       0          1
24     California      2       3          5
25          Texas      1       0          1
26     California      0       1          1
27     California      0       0          0
28        Florida      0       1          1
29        Florida      0       1          1
30 North Carolina      0       1          1
31 North Carolina      0       1          1
32       New York      0       1          1
33       New York      1       0          1
34          Texas      0       1          1
35 North Carolina      0       1          1
36 North Carolina      1       0          1
37       Michigan      0       0          0
38          Texas      0       1          1
39          Texas      0       4          4
40       Illinois      0       2          2
41       Illinois      0       0          0
42       New York      0       0          0
43       New York      0       0          0
44       Illinois      0       0          0
45       Michigan      4       7         11
46          Texas      0       0          0
47       New York      0       0          0
48 North Carolina      0       0          0
49 North Carolina      0       0          0
50       Illinois      0       2          2
51        Florida      0       1          1
52   Pennsylvania      1       0          1
53       Maryland      0       1          1
54       New York      0       0          0
55        Florida      0       3          3
56     California      0       1          1
57   Pennsylvania      0       0          0
58   Pennsylvania      0       1          1
59       Michigan      0       0          0
60          Texas      0       1          1
61        Florida      1       0          0
62       Illinois      0       1          1
63          Texas      0       0          0
64          Texas     21      12         33
65     California      0       1          2
66     California      0       0          0
67     California      0       1          1
68       Maryland      1       0          1
69        Florida      0       0          0
70     California      0       6          6
71       Maryland      0       1          1
72 North Carolina      0       0          0
73       New York      0       0          0
74       New York      0       0          0
75          Texas      1       0          1
76   Pennsylvania      0       4          4
77          Texas      0       1          1
78          Texas      1       1          2
79       Michigan      0       1          1
80   Pennsylvania      0       0          0
81   Pennsylvania      1       0          1
82       Michigan      0       0          0

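In the chunk below I attempted to create a scatterplot that would reflect the selections and filters I applied above. This didn’t work, as shown in the preliminary visualization: the result of the pipeline above was only printed, never saved, so the plot still draws on the full schools dataset.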

ggplot (schools, aes(x = year, y = state, color = day_of_week)) +
  labs(x = "Year", y = "State", title = "US School Shootings by State from 2018-2023") +
  theme_minimal() +
  geom_point()

In the chunk below I am creating a data subset called “schools1” that will filter for only the years and states I want to analyze in my visualizations. This was to remedy the mistake I made above.

schools1 <- 
  filter(schools,
         year %in% 2018:2023,
         state %in% c("California", "Texas", "Illinois", "Florida", "Michigan",
                      "Pennsylvania", "New York", "North Carolina", "Maryland"))

In the chunk below I created a data subset called “schools2” to build upon “schools1” and select only the variables I want to explore in my visualizations. I could not figure out how to create a data subset that would filter the values and select the variables simultaneously (in one chunk). So, I performed the steps separately. Now, I hope I have the final subset of the school shootings dataset that I will use for my preliminary and final visualizations.

schools2 <-
  select(schools1, school_name, year, day_of_week, state, killed, injured, casualties)
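
The two steps above can likely be combined into one chunk by piping the filter into the select and assigning the result to a name. The sketch below is only an illustration: `schools_demo` is a made-up stand-in for the real `schools` data frame, and `target_states` and `schools_subset` are hypothetical names that do not appear elsewhere in this project.

```r
library(dplyr)

# Made-up stand-in for `schools` (illustration only, not real data)
schools_demo <- data.frame(
  school_name = c("School A", "School B", "School C", "School D"),
  year        = c(2017L, 2018L, 2021L, 2023L),
  day_of_week = c("Monday", "Tuesday", "Friday", "Monday"),
  state       = c("Texas", "Ohio", "Florida", "New York"),
  killed      = c(0L, 1L, 0L, 0L),
  injured     = c(1L, 0L, 2L, 0L),
  casualties  = c(1L, 1L, 2L, 0L),
  city        = c("a", "b", "c", "d")
)

# The nine states used in this analysis
target_states <- c("California", "Texas", "Illinois", "Florida", "Michigan",
                   "Pennsylvania", "New York", "North Carolina", "Maryland")

# Filter rows, then keep only the columns of interest, and save the result
schools_subset <- schools_demo |>
  filter(year %in% 2018:2023, state %in% target_states) |>
  select(school_name, year, day_of_week, state, killed, injured, casualties)

schools_subset
```

Swapping `schools_demo` for `schools` should give the same rows and columns as `schools2`, assuming the column names match those shown by glimpse.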

Below is another preliminary visualization. I set the data to “schools2” and created a scatterplot with lines to try to explore the relationship between school shootings and the day of the week on which they occur. I set the theme to minimal. This graph tells me very little, as did the scatterplot above.

ggplot (schools2, aes(x = year, y = state, color = day_of_week)) +
  labs(x = "Year", y = "State", title = "US School Shootings by State from 2018-2023") +
  theme_minimal() +
  geom_point() +
  geom_line()

With this next preliminary visualization I am still trying to explore anything that might stand out in relation to school shootings and days of the week. This time I chose a bar graph with x = state, and flipped it for better visibility. The fill and legend are set to reflect days of the week. I see that Mondays and Wednesdays stand out somewhat, but only for some of the states. While this is a much better visualization than the others, it doesn’t show a strong enough relationship to continue moving in this direction.

ggplot(schools2, aes(x = state, fill = day_of_week)) +
  geom_bar(position = "dodge") +
  coord_flip()
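
One way to put numbers on the day-of-week question is to count incidents per day and convert the counts to shares of the total. This is a minimal sketch, not part of the original analysis: `day_shares()` is a hypothetical helper, and `demo` is made-up data that only mimics the `day_of_week` column.

```r
library(dplyr)

# Hypothetical helper: share of incidents falling on each day of the week
day_shares <- function(df) {
  df |>
    count(day_of_week, name = "n_shootings") |>
    mutate(share = n_shootings / sum(n_shootings)) |>
    arrange(desc(share))
}

# Made-up rows for illustration only
demo <- data.frame(
  day_of_week = c("Monday", "Monday", "Wednesday", "Friday")
)

res <- day_shares(demo)
res
```

Calling `day_shares(schools2)` would give the same table for the filtered data. If no single day comes close to half of the incidents, that would be consistent with the weak pattern in the bar graph.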

In the preliminary visualization below I decided to stop grasping at straws with the day-of-week angle and move on to casualties (the sum of killed and injured people) in the selected states during the six-year time period. Still using schools2, I grouped by state, calculated the average number of casualties per incident in each state during the time period, and used geom_col with “state” and “casualties” on the x and y axes respectively.

schools2 |> 
  group_by(state) |> 
  summarize(casualties = mean(casualties, na.rm = TRUE)) |> 
  ggplot(aes(x = state, y = casualties)) +
  geom_col() + 
  coord_flip() +
  labs(x = "State", y = "Average Casualties", title = "US School Shootings by State from 2018-2023", caption = "Source: https://github.com/washingtonpost/data-school-shootings/blob/master/school-shootings-data.csv")

At this point I decided that restricting the time period to 2018-2023 left too few incidents to show a clear pattern. So, I took the original data and created a subset that filters only for the states I designated earlier and includes all years from 1999-2023.

schools4 <- 
  filter(schools,
         state %in% c("California", "Texas", "Illinois", "Florida", "Michigan",
                      "Pennsylvania", "New York", "North Carolina", "Maryland"))

In this chunk of the final visualization, I set the data to schools4 and used mutate with fct_relevel to set the order of the states by hand, so that they appear from highest to lowest number of casualties. I used ggplot, geom_col, and coord_flip to create a flipped bar graph with state mapped to x and casualties mapped to y, so after the flip the states run down the side and the casualties run along the bottom.

schools4 |> 
  mutate(state = fct_relevel(state, 
            "New York", "Maryland", "Illinois", 
            "North Carolina", "Michigan", "Pennsylvania", "Florida", 
            "Texas", "California")) |>
  ggplot(aes(x = state, y = casualties, fill = year)) +
  geom_col() + 
  coord_flip() +
  labs(x = "States w/ High Incidents", y = "Casualties (Injured and Deaths)", title = "US School Shooting Casualties from 1999-2023", 
       caption = "Source: https://github.com/washingtonpost/data-school-shootings/blob/master/school-shootings-data.csv") +
  theme_minimal()
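
The state order in the chunk above is typed in by hand. An alternative is to let forcats derive the order from the data with fct_reorder, which sorts the factor levels by a summary of another variable. The sketch below assumes total casualties per state is the right ordering measure and uses a made-up `demo` data frame (illustration only).

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Made-up rows for illustration only
demo <- data.frame(
  state      = c("Texas", "Texas", "Florida", "Maryland"),
  year       = c(1999L, 2022L, 2018L, 2018L),
  casualties = c(3L, 33L, 34L, 2L)
)

# Order the states by their total casualties (lowest level first)
ordered_demo <- demo |>
  mutate(state = fct_reorder(factor(state), casualties, .fun = sum))

levels(ordered_demo$state)

# After coord_flip(), the last level (highest total) is drawn at the top
p <- ggplot(ordered_demo, aes(x = state, y = casualties, fill = year)) +
  geom_col() +
  coord_flip() +
  theme_minimal()
p
```

With `schools4` in place of `demo`, the bars would be ordered automatically, so the order stays correct if the data are updated.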

In Closing

Not much cleaning was necessary for the school shootings dataset I used. The variables are categorized correctly, and there are no duplicates or missing data in the variables I needed for my analysis. The bit of cleaning that I did perform was to select the variables and filter for the observations that I wanted to explore.
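
A quick way to back up these statements is to count missing values and duplicate rows, and to confirm that casualties equals killed plus injured. In the printed subset above, rows 61 and 65 appear to show a casualties value that differs from killed + injured, which may reflect how the source counts casualties, so a check like this could be worth running. The sketch below is only an assumption about how such a check might look: `check_subset()` is a hypothetical helper and `demo` is made-up data.

```r
library(dplyr)

# Hypothetical helper: basic data-quality checks on the analysis columns
check_subset <- function(df) {
  list(
    missing_per_column = colSums(is.na(df)),      # NA count in each column
    duplicate_rows     = sum(duplicated(df)),     # fully duplicated rows
    mismatched_totals  = df |>
      filter(casualties != killed + injured)      # rows where the sum is off
  )
}

# Made-up rows for illustration only
demo <- data.frame(
  school_name = c("School A", "School B", "School C"),
  killed      = c(0L, 1L, 0L),
  injured     = c(1L, 0L, 1L),
  casualties  = c(1L, 0L, 2L)
)

result <- check_subset(demo)
result
```

Running `check_subset(schools2)` would list any rows in the filtered data where the totals do not add up.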

The final visualization represents the number of school shooting casualties (people killed plus people injured) in select US states from 1999-2023. The states were selected by their frequency of school shooting incidents. The fill is set to year, so the color gradient within each bar shows which years contributed to that state’s casualties.

The final visualization is far from what I set out to explore. I had a theory that there might be one or two school days on which most shootings occurred, but this wasn’t true in any significant way. Next, I set out to explore the number of people injured, the number of people killed, and their proportions of the total casualties. I did this to some extent, but not in the way I envisioned. If I had the technical acumen, what I really wanted was a y-axis with a percentage scale labeled 0%, 25%, 50%, 75%, and 100%, and the state names along the x-axis. Then I would use coord_flip so the percentages are along the bottom of the chart. Each state would have a bar going all the way across from 0% to 100%, and each bar would be two-toned. One color would represent the percentage of casualties that are deaths and the other color the percentage of casualties that are injuries. After trying for an entire morning and afternoon, I conceded that I have quite a few knowledge gaps to fill before I can get there.
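
The two-toned percentage chart described above might be reachable by reshaping the killed and injured columns into long format and using position = "fill", which scales every bar to 100%. This is a rough sketch under stated assumptions, not a finished solution: `demo` is made-up data, and `shares`, `outcome`, and `people` are names introduced only for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up rows for illustration only
demo <- data.frame(
  state   = c("Texas", "Texas", "Florida"),
  killed  = c(10L, 21L, 17L),
  injured = c(13L, 12L, 17L)
)

# Total killed and injured per state, then one row per state and outcome
shares <- demo |>
  group_by(state) |>
  summarize(killed = sum(killed), injured = sum(injured)) |>
  pivot_longer(c(killed, injured), names_to = "outcome", values_to = "people")

# position = "fill" stretches each bar from 0 to 1; label_percent() shows 0%-100%
p <- ggplot(shares, aes(x = state, y = people, fill = outcome)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::label_percent()) +
  coord_flip() +
  labs(x = "State", y = "Share of casualties", fill = "Outcome") +
  theme_minimal()
p
```

Replacing `demo` with `schools4` should produce one full-width bar per state, split between deaths and injuries.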