Executive summary

My main questions are:

My final graphic uses five different plots, with annotations added in Illustrator to guide the reader through the takeaways. I believe the graphic answers all of the questions above and tells a story about global terrorism. The individual stories that contribute to the larger picture are annotated below.

Data background

The Global Terrorism Database (https://www.kaggle.com/datasets/START-UMD/gtd) consists of attack-level details for incidents of terrorism across the world from 1970 through 2017. Researchers use news stories covering terrorist attacks to collect information about each attack. Each attack must meet a set of criteria in order to be added to the database. According to the codebook, the incident must include “the threatened or actual use of illegal force and violence by a nonstate actor to attain a political, economic, religious, or social goal through fear, coercion, or intimidation.”

This database (referred to as GTD) is managed by the University of Maryland’s National Consortium for the Study of Terrorism and Responses to Terrorism (START). Since 1970, various groups have been involved in the data collection process, but the START team has made significant effort to maintain consistent reporting practices during the entire period of data collection.

There are a few issues that limit the ‘completeness’ of this dataset. Given the limitations of news media and the uncertainty surrounding acts of violence, it cannot be used as a comprehensive list of incidents, but it can certainly illuminate trends in terrorism, especially terrorism that rises to the level of news coverage. Notably, issues with data collection in 1993 leave the records for that year only 15% complete.

Data cleaning

Libraries

First, I’m bringing in some libraries:

library(tidyverse)
library(sf)
library(scales)
library(patchwork)
library(viridis)

Import Data

Then, I’m going to bring in a few datasets. I definitely need the csv file with the terrorism records. I also intend to try out some maps, so I’ll bring in shapefiles for a world map and a US map.

gtd_raw <- read_csv("data/globalterrorismdb_0718dist.csv")
world_map <- read_sf("data/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp")
us_map <- read_sf("data/cb_2022_us_state_500k/cb_2022_us_state_500k.shp")

Filter and Preliminary Plots

I browsed through the GTD and I have some exploring to do! I’m starting with some preliminary plots and the ones I think tell the most compelling story will be the ones I’ll clean up, annotate, and enhance for the final draft. There are a lot of variables here, but many of them have null values for most of the observations. So, I’ll probably look for the columns with the fewest missing values.
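One way to find the columns with the fewest missing values is to compute the share of NA values per column. The sketch below uses a small made-up tibble (`toy_gtd`) as a stand-in for `gtd_raw`; the column names follow the GTD, but the values are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for gtd_raw; column names follow the GTD, values are made up
toy_gtd <- tibble(
  iyear     = c(1970, 1971, 1972, 1973),
  nkill     = c(1, NA, 0, 2),
  motive    = c(NA, NA, NA, "Unknown"),
  provstate = c("Ohio", "Texas", NA, "Iowa")
)

# Share of missing values per column, sorted from most to least complete
missing_share <- toy_gtd %>%
  summarize(across(everything(), ~ mean(is.na(.x)))) %>%
  pivot_longer(everything(), names_to = "column", values_to = "share_missing") %>%
  arrange(share_missing)

missing_share
```

Swapping `toy_gtd` for `gtd_raw` should give the same kind of table for the full dataset, with the most complete columns at the top.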

For consistency, I’ll try to use colors from the same palette. I’m picking a viridis color scale (magma) that I think is appropriate for the topic. I’ll plan on using discrete values from this scale when needed, and the full scale for non-categorical values.
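Discrete hex values can be pulled from the scale programmatically instead of being copied by hand. This is a minimal sketch using `magma()` from viridisLite (the package that viridis builds on), matching the magma option passed to viridis in the plotting code; the name `region_colors` and the `end = 0.9` cutoff are my own assumptions, chosen to trim the palest yellow.

```r
library(viridisLite)

# Nine discrete colors sampled evenly across the magma scale;
# end = 0.9 is an assumed cutoff that drops the lightest end of the scale
region_colors <- magma(9, end = 0.9)

region_colors
```

A vector like this could be passed to `scale_color_manual(values = ...)`, optionally named with the category labels.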

Attacks by Year

First, I want to see how many attacks happen from 1970 through 2017 (the most recent year in the csv file). I’m starting with a worldwide perspective, without specifying a subset of time or geographical area (knowing I could do that later).

gtd_year <- gtd_raw %>% 
  group_by(iyear) %>% 
  summarize(attacks = n())
plot_attacks_global <- ggplot() +
  geom_line(data = gtd_year, aes(x = iyear, y = attacks), color = "#000004", linewidth = 1.5) +
  theme_minimal() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Global Terrorist Attacks, 1970-2017", x = "Year", y = "Number of Attacks")
plot_attacks_global

This is interesting! There is a slow rise until the early 90s, a drop in the early 2000s, and then a huge spike after 2010, followed by another drop.
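Because the 1993 records are largely incomplete (see Data background), it may be worth making that gap explicit rather than letting the line interpolate across it. The sketch below assumes the year has no rows at all in the yearly counts, which is worth checking with `count(gtd_raw, iyear)`; the toy numbers are made up.

```r
library(dplyr)
library(tidyr)

# Toy yearly counts with 1993 absent; the numbers are made up
toy_year <- tibble(
  iyear   = c(1991, 1992, 1994, 1995),
  attacks = c(40, 50, 35, 30)
)

# Add a row for every year in the range; years with no records get NA
toy_year_complete <- toy_year %>%
  complete(iyear = full_seq(iyear, period = 1))

toy_year_complete
```

With an NA for 1993, `geom_line()` should leave a visible break at that year instead of drawing a straight segment from 1992 to 1994.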

I’m curious to delve a bit more into the ‘9/11 effect’ I predicted, so I’m going to do a US-only one.

gtd_us <- gtd_raw %>% 
  group_by(iyear, country_txt) %>% 
  summarize(attacks = n()) %>% 
  filter(country_txt == "United States")
plot_attacks_us <- ggplot() +
  geom_line(data = gtd_us, aes(x = iyear, y = attacks), color = "#51127C", linewidth = 1.5) +
  theme_minimal() +
  labs(title = "Terrorist Attacks in the US, 1970-2017", x = "Year", y = "Number of Attacks")
plot_attacks_us

I can’t make any causal claims here, but it looks like there is a little dip in the mid 2000s, possibly because law enforcement was on higher alert after 2001. But the bigger story here is the huge drop from the beginning of the timeframe to the early 1970s. It is almost a mirror image of the global story, which is worth noting! I think I’ll combine these together for the final plot.

plot_combined_total_attacks <- plot_attacks_global + plot_attacks_us
plot_combined_total_attacks

Attacks by Region

I also want to look at which regions contribute to the overall attack numbers.

gtd_region <- gtd_raw %>% 
  group_by(region_txt) %>% 
  summarize(attacks = n())

Trying a slopegraph

region_top4_na <- gtd_raw %>% 
  group_by(region_txt, iyear) %>% 
  filter(iyear %in% c(2012, 2017),
         region_txt %in% c("Middle East & North Africa", "South Asia", "South America",
                           "Sub-Saharan Africa", "North America")) %>% 
  summarize(attacks = n()) 
ggplot(region_top4_na, aes(x = iyear, y = attacks, group = region_txt, color = region_txt)) +
  geom_line(linewidth = 1.5)

I tried a few things to make this work, but I’m not loving it. I think actually small multiples could be useful here.

Trying Small Multiples for Regions

gtd_region_year <- gtd_raw %>% 
  group_by(iyear, region_txt) %>% 
  summarize(attacks = n()) %>% 
  filter(!region_txt %in% c("Australasia & Oceania", "Central Asia", "East Asia"))
plot_sm_mult_region <- ggplot(data = gtd_region_year, aes(x = iyear, 
                                                          y = attacks, 
                                                          color = region_txt)) +
  geom_line(linewidth = 1.5) +
  facet_wrap(vars(region_txt)) +
  guides(color = "none") +
  theme_minimal() +
  theme(panel.grid.major = element_blank()) +
  scale_color_manual(values = c("Central America & Caribbean" = "#000004",
                                "Eastern Europe" = "#180F3D",
                                "Middle East & North Africa" = "#440F76",
                                "North America" = "#721F81",
                                "South America" = "#9E2F7F",
                                "South Asia" = "#CD4071",
                                "Southeast Asia" = "#F1605D",
                                "Sub-Saharan Africa" = "#FD9668",
                                "Western Europe" = "#FECA8D")) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Attacks in Selected Regions, 1970-2017", x = "Year", 
       y = "Number of Attacks")
plot_sm_mult_region

Again, we can see that North America had a bump at the beginning that has since tapered off. Some regions, such as Central America & Caribbean, South America, and Western Europe, saw a rise in attacks followed by a drop to quite low levels. This suggests that countries in those regions may have found ways to limit attacks.

I think it would be interesting to try a world map, but I’m realizing that it would be difficult to join the attack counts to the shapefile based on country name. I still want to try a map, so I’ll try the US map in a bit. (I can’t get this one plot to align when knitting to PDF, but the full image is available in the final graphic.)
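To gauge how hard the country-name join would be, an `anti_join()` can list the names that find no match. This is only a sketch: the toy tibbles stand in for the attack counts and the world map attributes, `ADMIN` is my assumption for the shapefile's name column, and the example names are illustrative rather than a checked list of mismatches.

```r
library(dplyr)

# Toy stand-ins; names are illustrative, not a verified list of mismatches
toy_gtd_countries <- tibble(
  country_txt = c("United States", "Iraq", "West Germany (FRG)"),
  attacks     = c(10, 20, 5)
)
toy_map <- tibble(ADMIN = c("United States of America", "Iraq", "Germany"))

# Rows in the attack counts that find no partner in the map's name column
unmatched <- toy_gtd_countries %>%
  anti_join(toy_map, by = c("country_txt" = "ADMIN"))

unmatched
```

If the unmatched list is short, a small `recode()` lookup before the join might be enough; a long list would suggest joining on a country code instead.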

Attacks by Type

Next, I see that there is a column that details the type of attack, so I’m interested to see how common certain types of attacks are.

gtd_type <- gtd_raw %>% 
  group_by(attacktype1_txt) %>% 
  summarize(attacks = n()) %>% 
  rename(attack_type = attacktype1_txt) %>% 
  arrange(desc(attacks)) %>% 
  mutate(attack_type = recode(attack_type,
                              "Bombing/Explosion" = "Bombing/ Explosion",
                              "Facility/Infrastructure Attack" = "Facility/ Infrastructure Attack"))
plot_attack_types <- ggplot() +
  geom_col(data = gtd_type, aes(x = reorder(attack_type, -attacks), y = attacks, fill = attack_type),
                                color = "grey20") +
  guides(fill = "none", color = "none") +
  scale_x_discrete(labels = label_wrap(10)) +
  scale_y_continuous(labels = label_comma()) +
  scale_fill_viridis(option="magma", discrete = TRUE) +
  theme_minimal() +
  labs(title = "Types of Terrorist Attacks, 1970-2017", x = "Attack Type", y = "Number of Attacks")
plot_attack_types

For the entire time period, I can see that by far, the most common type of attack is Bombing/Explosion. I’m surprised that Hijacking is the least common, but it does take a significant amount of planning and resources. I’m interested to see if bombing has always been the most popular or if it has risen to prominence over time. Normally, I think I would remove ‘Unknown’ but it’s significantly higher than three other types of attack. I think it speaks to the nature of terrorism as chaotic and difficult to understand.

gtd_type_year <- gtd_raw %>% 
  group_by(iyear, attacktype1_txt) %>% 
  summarize(attacks = n()) %>% 
  rename(attack_type = attacktype1_txt) %>% 
  mutate(attack_type = recode(attack_type,
                              "Bombing/Explosion" = "Bombing/ Explosion",
                              "Facility/Infrastructure Attack" = "Facility/ Infrastructure Attack"))
gtd_type_top5 <- gtd_type_year %>% 
  filter(attack_type %in% c("Facility/ Infrastructure Attack", "Hostage Taking (Kidnapping)",
                            "Assassination", "Armed Assault", "Bombing/ Explosion"))
ggplot() +
  geom_line(data = gtd_type_top5, aes(x = iyear, y = attacks, color = attack_type), linewidth = 1.5) +
  scale_color_manual(values = c("Bombing/ Explosion" = "#000004",
                                "Armed Assault" = "#3B0F70",
                                "Assassination" = "#8C2981",
                                "Hostage Taking (Kidnapping)" = "#DE4968",
                                "Facility/ Infrastructure Attack" = "#FE9F6D")) +
  theme_minimal() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Attack Types Over Time", x = "Year", y = "Attacks by Type")

There are a lot of attack types, and many of them have very few instances so I isolated the top 5. I used the viridis magma scale, but one of the colors was very light so I decided to select a different hex color in the scale. The issue here is that I can’t reorder the values in the legend. Another issue is that I’m not convinced this tells us anything new - if bombs were unpopular then switched places, that might be more compelling but that’s not the case here. This sort of reiterates the same takeaways from the bar plot so I’m not sure I’ll include this in the final draft.
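On the legend order: one option that may help is the `breaks` argument of `scale_color_manual()`, which sets which keys appear in the legend and in what order. The sketch below uses made-up counts for three attack types; `toy_types` and `legend_order` are names I introduced for illustration.

```r
library(ggplot2)

# Toy counts for three attack types; the numbers are made up
toy_types <- data.frame(
  iyear       = rep(c(2015, 2016, 2017), times = 3),
  attack_type = rep(c("Armed Assault", "Assassination", "Bombing/ Explosion"), each = 3),
  attacks     = c(30, 32, 28, 10, 9, 11, 60, 55, 50)
)

# Desired legend order, most common type first
legend_order <- c("Bombing/ Explosion", "Armed Assault", "Assassination")

plot_types_ordered <- ggplot(toy_types, aes(x = iyear, y = attacks, color = attack_type)) +
  geom_line(linewidth = 1.5) +
  scale_color_manual(
    values = c("Bombing/ Explosion" = "#000004",
               "Armed Assault" = "#3B0F70",
               "Assassination" = "#8C2981"),
    breaks = legend_order  # breaks controls which keys appear and in what order
  )

plot_types_ordered
```

An alternative with the same effect would be converting `attack_type` to a factor with the desired level order before plotting.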

Attacks by US State

gtd_us_states <- gtd_raw %>% 
  filter(country_txt == "United States") %>% 
  group_by(provstate) %>% 
  summarize(attacks = n()) %>% 
  filter(!provstate %in% c("Alaska", "Hawaii", "US Virgin Islands", "Unknown")) %>% 
  rename(NAME = provstate)
gtd_us_states_year <- gtd_raw %>% 
  filter(country_txt == "United States") %>% 
  group_by(iyear, provstate) %>% 
  summarize(attacks = n()) %>% 
  filter(!provstate %in% c("Alaska", "Hawaii", "US Virgin Islands", "Unknown")) %>% 
  rename(NAME = provstate)
us_map_clean <- us_map %>% 
  filter(!STUSPS %in% c("AK", "AS", "HI", "MP", "GU", "VI"))
ggplot() + 
  geom_sf(data = us_map_clean)

It took a bit of tweaking - the initial map had all of the US territories, which are spread widely across the world. Some of the states (Alaska for example) took up a lot of geographical space but were (mercifully) underrepresented in the number of terrorist attacks. I do want to add Puerto Rico.

us_map_with_gtd <- us_map_clean %>% 
  left_join(gtd_us_states, by = "NAME")
plot_us_map <- ggplot() +
  geom_sf(data = us_map_with_gtd,
          aes(fill = attacks)) +
  scale_fill_viridis_c(option = "magma") +
  labs(title = "Terrorist Attacks in the Contiguous United States and Puerto Rico, 1970-2017") +
  theme_void() +
  theme(legend.position="left")
plot_us_map

Individual figures

Figure 1

plot_attacks_global

plot_combined_total_attacks

This one is simple but striking - it shows that terrorist attacks have increased during the time period covered by the GTD. The principles of CRAP will come up more in the larger handout, so for the individual plots I am focusing on Alberto Cairo’s principles. In this case, the plot is functional because it sets the tone for the entire final document (a handout): it shows that terrorism has increased significantly, which allows the viewer to delve into the details behind global terrorism trends.

I see some things I can do to improve this in Illustrator: some of the repeated axis labels and the titles could be adjusted. I may actually adjust these separately, because including them together like this could have the same effect as double y axes since the scales are so different. Now that I have completed the map of the US, it might actually make more sense to move the second plot next to the US map to add a sense of time to the map (which also solves the problem of drastically different scales on the y axes here).

ggsave(plot_attacks_global, filename = "output/plot_attacks_global.png")

Figure 2

plot_sm_mult_region

ggsave(plot_sm_mult_region, filename = "output/plot_sm_mult_region.png")

I like the repetition in this one. I also used 9 plots here because I particularly appreciate the alignment of the plots in three rows and three columns. I think this is where we start to establish the color scheme for the rest of the plots. I tweaked the line size and gridlines to keep it cleaner and to allow the lines to contrast against the background. This also adds a new level of insight into the regions that contribute to the overall trend.

Figure 3

plot_attack_types

ggsave(plot_attack_types, filename = "output/plot_attack_types.png")

The goal of this plot was to show which methods were most often used in terrorist attacks. Again, I used the same colors for repetition with the other figures. I think this is particularly illuminating because it shows that hijacking (something Americans may be particularly attuned to) is used relatively infrequently compared with bombings and armed assaults. I did add borders around the bars because the bar for ‘Unknown’ blended in a bit with the background.

Figure 4

plot_us_map

plot_attacks_us

These are two plots for the US. The line shows that there is actually contrast in the data: the US has had almost the inverse experience of the global community. I hope this is somewhat comforting to American viewers, even though there is a slight uptick at the end of the study period.

The map is interesting, and I’m glad I was able to figure out how to add all the states. I think this also shows contrast in the data: California and New York host the most attacks. This makes sense in that these are two of the most populous states in the nation. I also include Puerto Rico to demonstrate that it’s in the midrange for number of attacks, and may warrant more support from Homeland Security and law enforcement.

ggsave(plot_us_map, filename = "output/plot_us_map.png")
ggsave(plot_attacks_us, filename = "output/plot_attacks_us.png")

Final figure

For contrast, I used complementary colors but the entirety of the viridis magma palette. I think this softens it a bit from strictly black and white. I also used contrasting (serif vs sans serif) fonts to make the written portions more appealing. I used the same magma palette so that each plot had repeated colors. I also used the lightest color in the magma palette as a background to add yet more repetition. To align everything, I used a rectangle around the whole page. I added a smaller rectangle with a summary of the main findings in the plots, then added commentary to individual plots by center aligning it, either below or to the side of a figure. Proximity was tough: I wanted the reader to be able to read left to right, with one row for global data and a second row below with details on the United States.

Ultimately, I think this is a useful figure and illuminates the nuance of terrorism across the world, and which regions have had success mitigating the impact of terrorism. I managed to get a lot of stories in one place, which I believe makes this a pretty functional document!

Final Figure - Handout