Introduction

Terrorism within the United States has been a continued national security concern, and government agencies constantly seek to understand, combat, and prevent terrorist incidents. By studying observational data, we hope to describe the history of terrorism and better understand its perpetrators, motives, and methodologies in order to resist it more effectively. This analysis uses data collected on terrorist incidents occurring within the United States since 1970 to gain a better understanding of the history and trends of terrorism, in an effort to correct misconceptions, answer questions, and provoke new ones that may lead to a safer nation and globe.

Note: Readers wishing to follow along with the technical code can find the required packages used in this analysis below:

# Load packages
library(data.table)
library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)
library(leaflet)
library(RColorBrewer)

The Data

Throughout this analysis, we will reference data supplied by researchers from the National Consortium for the Study of Terrorism and Responses to Terrorism, headquartered at the University of Maryland. This organization publishes an annual update to its Global Terrorism Database (GTD), which tracks information on terrorist attacks throughout the world, beginning in 1970. The database version used for this analysis was pulled from Kaggle and contains over 100 variables describing the locations, tactics, perpetrators, targets, and much more for over 180,000 attempted terrorist attacks from 1970 to 2017. Detailed documentation for the database exists in its officially maintained codebook, which does an excellent job of describing the definitions and methodology of the dataset.

Though the dataset contains a vast amount of useful information on terrorism throughout the globe, we will limit this analysis to terrorism occurring within the United States. Readers wishing to follow along programmatically can download a copy of the dataset from Kaggle and mimic the ingestion steps outlined below.

# Load the dataset from our local directory
df = fread('data/globalterrorismdb_0718dist.csv')

# Select the subset of the relevant columns
# Many other columns exist that are out of
# scope for this analysis
df = df %>%
    select(iyear, country_txt, provstate, latitude, longitude, attacktype1_txt, 
           gname, nkill, summary) %>%
    data.frame()

# Filter down to only include records within the U.S.
df = df %>%
    filter(country_txt == 'United States')

Terrorist Incidents and Fatalities Over Time

A natural starting point to evaluate the impact of terrorism within the United States is to view terrorist incidents over time. We'll begin by considering two primary ways of measuring the impact of terrorism in any given year:

  1. The number of terrorist incidents that occurred within that year
  2. The number of confirmed fatalities that occurred due to terrorist incidents within that year

The line graph below displays both measurements at once. Time is shown on the x-axis, and each measurement has its own y-axis.

# Aggregate our data to the desired format
# Number of incidents and fatalities by year
df_1 = df %>%
    select(iyear, nkill) %>%
    group_by(iyear) %>%
    summarise(n_inc = length(nkill), 
              n_deaths = sum(nkill, na.rm = TRUE), 
              .groups = 'keep') %>%
    data.frame()

# Produce the line plot
ggplot(data = df_1, aes(x = iyear, y = n_inc)) + 
    geom_line(colour = 'black', size = 1.5) +
    geom_line(aes(colour = 'Incidents'), size = 1) + 
    geom_point(size = 3, fill = 'white', shape = 21) + 
    geom_line(inherit.aes = FALSE, 
              aes(x = iyear, y = 100 * log10(1 + n_deaths)), 
              colour = 'black', size = 1.5) +
    geom_line(inherit.aes = FALSE, 
              aes(x = iyear, y = 100 * log10(1 + n_deaths), colour = 'Fatalities'), 
              size = 1) + 
    geom_point(inherit.aes = FALSE, 
               aes(x = iyear, y = 100 * log10(1 + n_deaths)),
               size = 3, fill = 'white', shape = 21) +
    theme_dark() + 
    theme(plot.title = element_text(hjust = 0.5),
          panel.background = element_rect(fill = 'grey80'),
          panel.border = element_rect(colour = 'black', fill = NA),
          legend.key = element_rect(fill = "grey80", color = NA)) + 
    scale_y_continuous(name = 'Number of Incidents', 
                       labels = comma,
                       sec.axis = dup_axis(name = 'Number of Confirmed Fatalities',
                                           breaks = c(100*log10(1 + 0), 
                                                      100*log10(1 + 10), 
                                                      100*log10(1 + 100), 
                                                      100*log10(1 + 1000), 
                                                      100*log10(1 + 10000), 
                                                      100*log10(1 + 100000)),
                                           labels = comma(c(0, 10, 100, 1000, 
                                                            10000, 100000)))) +
    labs(x = 'Year of Incident', 
         title = paste0('Number of Terrorist Incidents and Confirmed Fatalities by Year',
                        '\nUnited States, ', min(df_1$iyear), '-', max(df_1$iyear), '*'),
         # The argument is `caption`; `captions` is silently ignored by labs()
         caption = '*The Global Terrorism Database does not supply data for the year 1993.',
         colour = 'Metric')

We can see that the number of incidents, highlighted in blue and measured on the left-side y-axis, was at an extreme high at the beginning of the 1970's. Though there was some variation from year to year, the number of incidents tended to decline until about 2010. Concerningly, the years from 2011 to 2017 have marked a notable reversal of this trend. The 65 terrorist incidents occurring in 2017, for example, were the most in any year since 1982.

We highlight the number of confirmed fatalities in red, measured on the right-side y-axis, which uses a log scale. We chose the log scale because of the extremity of the September 11th attacks in 2001, by far the deadliest terrorist attack on United States soil in the dataset. Fatalities vary more from year to year than incidents do, in part because of unusually deadly events such as September 11th in 2001 and the Oklahoma City bombing in 1995. We do, however, continue to observe a sharp increase in the number of fatalities per year from 2011 to 2017.
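The mapping behind the right-side axis can be isolated into a pair of helper functions. The sketch below is illustrative only: `to_axis` and `from_axis` are hypothetical names that do not appear in the original analysis. It shows how a fatality count is placed on the incident axis using 100 * log10(1 + n), and how that placement can be inverted to recover the count.

```r
# Hypothetical helpers (not part of the original analysis) mirroring the
# transform used for the fatalities line: y = 100 * log10(1 + n)
to_axis = function(n) 100 * log10(1 + n)

# Inverse transform: recover the fatality count from an axis position
from_axis = function(y) 10^(y / 100) - 1

# Fatality counts to label on the secondary axis
deaths = c(0, 10, 100, 1000)

# Positions of those labels on the primary (incident) axis
positions = to_axis(deaths)
print(round(positions, 1))

# The round trip should return the original counts
print(round(from_axis(positions)))
```

Adding 1 before taking the logarithm keeps years with zero fatalities at zero on the axis rather than at negative infinity.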

Attack Methodologies By Decade

As this dataset spans almost 50 years, the technologies, motivations, and methodologies that terrorists leverage in their schemes have shifted significantly over time. The GTD categorizes each attack into one of 8 methodological classes, plus an unknown bucket, describing the primary attack methodology used during the incident. The graphic below plots the methodologies used as a percentage of total incidents within each decade in the dataset.

# Aggregate our dataset to the desired format
df_2 = df %>%
    mutate(decade = paste0(floor(iyear/10)*10,"'s")) %>%
    select(attacktype1_txt, decade, nkill) %>%
    group_by(attacktype1_txt, decade) %>%
    summarise(n_inc = length(attacktype1_txt), 
              n_deaths = sum(nkill, na.rm = TRUE), 
              .groups = 'keep') %>%
    group_by(decade) %>%
    mutate(perc = round(n_inc/sum(n_inc), 2)) %>%
    data.frame()

ggplot(data = df_2, aes(x = attacktype1_txt, y = perc, fill = decade)) + 
    geom_bar(stat = 'identity', color = 'black') + 
    coord_flip() +
    labs(x = 'Type of Attack', 
         y = 'Percentage of Incidents', 
         fill = 'Decade',
         title = 'Percentage of Incidents by Attack Type within Decade') + 
    theme_dark() + 
    theme(plot.title = element_text(hjust = 0.5),
          panel.background = element_rect(fill = 'grey80'),
          panel.border = element_rect(colour = 'black', fill = NA),
          legend.position = c(0.85, 0.25),
          legend.background = element_rect(fill="grey80",
                                           size=0.5, linetype="solid", 
                                           colour ="black")) + 
    facet_wrap(~decade, ncol = 3) + 
    scale_y_continuous(labels = percent, breaks = seq(0.0, 0.6, .1))

We can see that bombings/explosions were by far the most common attack methodology in the 1970's, when over 60% of all incidents occurred in this manner. Over time, this methodology became less common, and it represents less than 15% of all incidents in the 2010's.

In contrast, facility/infrastructure attacks were relatively less common in the 1970's, but became much more common in the 1990's, 2000's, and 2010's. Additionally, the 2010's have seen a sharp increase in the number of armed assaults, with over 30% of incidents falling into that category. No other decade surpassed 10% of incidents classified as armed assaults.
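The decade bucketing and within-decade shares discussed above can be checked on a small scale. The following is a minimal sketch using invented toy data (not GTD records) and base R; the column name `attack` and the attack labels are placeholders chosen for illustration.

```r
# Toy data (invented for illustration; not GTD records)
toy = data.frame(
    iyear = c(1971, 1975, 1979, 2011, 2015, 2016, 2017),
    attack = c('Bombing', 'Bombing', 'Assault', 'Assault',
               'Assault', 'Bombing', 'Arson')
)

# Bucket each year into its decade, e.g. 1975 -> "1970's"
toy$decade = paste0(floor(toy$iyear / 10) * 10, "'s")

# Count incidents per attack type (rows) and decade (columns)
counts = table(toy$attack, toy$decade)

# Convert counts to shares within each decade (each column sums to 1)
shares = prop.table(counts, margin = 2)
print(round(shares, 2))
```

Because the shares are rounded to two decimal places before plotting, the displayed percentages within a decade may not sum to exactly 100%.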

Modern Terrorist Organizations

With the observed increase of terrorist activities throughout the 2010's, we may be interested in further understanding the individuals and organizations that are driving the increased threat to national security. The below graphic displays a heatmap of the 34 terrorist group names that attempted an attack in the United States from 2010 to 2017.

# Aggregate our dataset to the desired format
df_3 = df %>%
    select(gname, iyear) %>%
    filter(iyear >= 2010) %>%
    filter(gname != 'Unknown') %>%
    group_by(gname, iyear) %>%
    summarise(n = length(gname), .groups = 'keep') %>%
    data.frame() %>%
    complete(iyear, gname, fill = list(n = 0)) %>%
    data.frame()

# Produce the plot
ggplot(data = df_3, aes(x = iyear, y = gname, fill = n)) + 
    geom_tile(color = 'black') + 
    geom_text(aes(label = n)) +
    labs(title = 'Incidents Incited by Terrorist Organization Since 2010', 
         x = 'Year',
         y = 'Organization Name',
         fill = 'Number of Incidents Incited') + 
    theme_minimal() + 
    theme(plot.title = element_text(hjust = 0.5),
          panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank()) + 
    scale_fill_continuous(low = 'grey88', high = 'firebrick', breaks = 0:9) +
    scale_x_continuous(breaks = 2010:2017) +
    guides(fill = guide_legend(reverse = TRUE, 
                               override.aes = list(colour = 'black')))

Throughout the 2010's, anti-Muslim extremists and Jihadi-inspired extremists incited the most incidents, with 26 and 25, respectively. In 2017, white extremists attempted 9 attacks, the most by any group in a single year.
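The zeros shown in the heatmap come from expanding the aggregated counts to every combination of year and group. The sketch below illustrates that step with `tidyr::complete()` on invented toy counts (the group names are placeholders, not GTD values).

```r
library(tidyr)

# Toy counts (invented for illustration; not GTD records)
toy = data.frame(
    gname = c('Group A', 'Group A', 'Group B'),
    iyear = c(2010, 2012, 2011),
    n = c(2, 1, 3)
)

# Expand to every observed year/group pair, filling missing counts with 0
full = complete(toy, iyear, gname, fill = list(n = 0))
print(as.data.frame(full))
```

Note that `complete()` only combines values that appear in the data, so a year in which no group had a recorded incident would still be absent from the expanded table.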

States Most Impacted by Terrorist Activities

Another natural curiosity to address is the location of the attacks. Certain states and territories throughout the United States experience a much larger portion of the attacks than others, as communicated in the pie chart below, which considers all data from 1970 to 2017.

# Find states with most attacks
top_targets = df %>%
    group_by(provstate) %>%
    count() %>%
    mutate(perc = round(100*n/nrow(df),2)) %>%
    arrange(desc(n)) %>%
    data.frame()

# Group all states w/ fewer than 2.5% of the total attacks
# into an 'other' bucket
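# Bind a one-row data frame (rather than a character vector) so that
# the n and perc columns are not coerced to character
df_4 = rbind(
    top_targets[top_targets$perc >= 2.5,], 
    data.frame(provstate = 'Other', 
               n = sum(top_targets[top_targets$perc < 2.5, 'n']), 
               perc = sum(top_targets[top_targets$perc < 2.5, 'perc'])))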

# Reformat fields 
df_4$n = as.numeric(df_4$n)
df_4$provstate = factor(df_4$provstate, 
                        ordered = TRUE, 
                        levels = rev(df_4$provstate))

# Produce the pie chart
ggplot(data = df_4, aes(x = '', y = n, fill = provstate)) + 
    geom_bar(stat = 'identity', position = 'fill', color = 'black') +
    coord_polar(theta = 'y', start = 0) +
    labs(title = 'Percentage of Incidents by State', 
         x = NULL, y = NULL, fill = 'State/Territory') + 
    scale_fill_brewer(palette = 'Spectral') + 
    geom_text(aes(x = 1.6, label = paste0(perc, '%')),
              size = 4,
              position = position_fill(vjust = 0.5)) +
    theme_void() + 
    theme(plot.title = element_text(hjust = 0.5),
          axis.text = element_blank(),
          axis.ticks = element_blank(),
          panel.grid = element_blank()) + 
    guides(fill = guide_legend(reverse = TRUE))

California is the most heavily attacked state, accounting for over 1/5th of all attacks. New York is not far behind, with 18.6% of the attacks. Together these two states account for close to 40% of all attacks, with another ~25% occurring in Puerto Rico, Florida, Illinois, Washington, or the District of Columbia. All other states make up the remaining 36.14% of attacks.
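The lumping of small states into a single 'Other' slice can be sketched on toy data. The example below is illustrative only: the state names, counts, and the `cutoff` value are invented, and the analysis itself uses a 2.5% threshold.

```r
# Toy counts (invented for illustration; not GTD records)
toy = data.frame(
    provstate = c('State A', 'State B', 'State C', 'State D'),
    n = c(60, 30, 8, 2)
)
toy$perc = round(100 * toy$n / sum(toy$n), 2)

# Hypothetical threshold (in percent) below which states are lumped together
cutoff = 10

# Keep the large states and append one summary row for the rest
small = toy$perc < cutoff
lumped = rbind(
    toy[!small, ],
    data.frame(provstate = 'Other',
               n = sum(toy$n[small]),
               perc = sum(toy$perc[small]))
)
print(lumped)
```

Binding a one-row data frame, rather than a character vector, keeps the numeric columns numeric, so no conversion step is needed afterwards.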

Interactive Map of Terrorist Incidents

We can further investigate the location data for these terrorist incidents with an interactive map produced in leaflet. The map below displays over 2,800 terrorist incidents throughout the United States, color coded by attack type. The size of each bubble is determined by the number of confirmed fatalities from that incident. Hovering over a bubble will display the attack methodology and fatality details, and clicking on a bubble will display a detailed summary of the incident, as provided by the GTD, if available.

# Initialize the map
map = leaflet() %>%
    addProviderTiles(providers$Stamen.Toner) %>%
    setView(lng = -103, lat = 42, zoom = 4) # Approx center of continental US

# We will color code the markers by attack type
# Get list of attack types
attack_types = unique(df$attacktype1_txt)
# Assign colors to each attack type
colors = brewer.pal(n = length(attack_types), name = 'Set1')

# Add markers for each attack type
for(i in seq_along(attack_types)){
    # Subset once per attack type rather than once per argument
    df_i = subset(df, attacktype1_txt == attack_types[i])
    map = map %>%
        addCircles(
            lng = df_i$longitude,
            lat = df_i$latitude,
            opacity = 1, # Opacity must lie between 0 and 1
            color = colors[i],
            popup = ifelse(df_i$summary == '',
                           'No Summary Available',
                           df_i$summary),
            label = paste0('# of Confirmed Fatalities: ', 
                           df_i$nkill, 
                           '\nAttack Type: ', 
                           df_i$attacktype1_txt),
            # log10(1 + nkill) keeps the radius finite when nkill is 0
            radius = 10000*log10(1 + df_i$nkill)
        )
}

# Display the map
map

Conclusion

Throughout this analysis, we have reviewed the impact of terrorism over time, the methodologies used by terrorists and how they have trended by decade, the terrorist organizations that are most active in the modern era, and the states that have been most impacted by terrorism since 1970. We concluded with an interactive map that allows users to further inspect location data and view the tragic details of individual incidents.

Though brief, this analysis reinforces the importance of effective counterterrorism policy, emphasized by the uptick in terrorist incidents over the past decade. We encourage interested readers to further interact with the Global Terrorism Database as a resource to ask questions, gain insights, and bolster efforts to create a safe and peaceful nation.

Citations

National Consortium for the Study of Terrorism and Responses to Terrorism (START), University of Maryland. (2019). The Global Terrorism Database (GTD) [Data file]. Retrieved from https://www.start.umd.edu/gtd