STAT545A hw06: Analysis of the Global Terrorism Database

Rebecca Johnston

21/10/2013

I have chosen to complete my final Exploratory Data Analysis assignment using the Global Terrorism Database (GTD), available for download here. The GTD includes over 100 000 incidents of terrorism from 1970 to 2011 (N.B. incidents from 1993 are not present in the GTD, as the records for that year were lost).

Here I will display the figures generated by my analysis pipeline using R and the graphics package ggplot2. For my full code, and access to the data used herein, please visit my GitHub repo.

Since there are over 100 variables in the GTD, I immediately restricted my analyses to 15 variables for the purposes of this assignment. In addition, I shortened some region names for ease of graphing, so here “MENA” means Middle East and North Africa.
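As a minimal sketch of this kind of restriction and renaming, the snippet below works on a small made-up data frame rather than the real GTD. Only region_txt and nkill appear in this report; the other column names (iyear, nwound, unused_col), the variable list, and the exact spelling of the long region label are assumptions for illustration and may differ from the raw file.

```r
# Toy stand-in for the raw GTD (made-up values; column names partly assumed)
gtd_toy <- data.frame(
  iyear      = c(1975, 1990),
  region_txt = c("Middle East and North Africa", "Oceania"),
  nkill      = c(3, 0),
  nwound     = c(5, 1),
  unused_col = c("a", "b"),
  stringsAsFactors = FALSE
)

# Keep only the variables of interest (the report keeps 15; four are shown here)
keep <- c("iyear", "region_txt", "nkill", "nwound")
terr <- gtd_toy[, keep]

# Shorten a long region name for easier graphing
terr$region_txt[terr$region_txt == "Middle East and North Africa"] <- "MENA"
```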


Total fatalities and total wounded per incident over time?

First, I want to explore the total number of fatalities per incident over time using a scatter plot:

plot of chunk unnamed-chunk-2

N.B. I have deliberately kept extreme outliers out of view by manually specifying the y-axis limits. The extreme outliers here are the September 11 attacks (2001), which the GTD counts as two separate incidents, each with 1381.5 fatalities.

The majority of terrorist incidents have a low number of fatalities, and I have added transparency to the points (which represent incidents) to convey this.
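A plot of this kind could be sketched as follows, using made-up data. The column name iyear, the alpha value, and the y-axis limit of 500 are assumptions, not the values used for the figure above. The sketch uses coord_cartesian(), which zooms in without discarding points; ylim() would give the same picture for a scatter plot but drops the out-of-range rows with a warning.

```r
library(ggplot2)

# Made-up incidents: mostly low counts plus two extreme outliers in 2001
set.seed(1)
terr_toy <- data.frame(
  iyear = c(sample(1970:2011, 198, replace = TRUE), 2001, 2001),
  nkill = c(rpois(198, lambda = 2), 1381.5, 1381.5)
)

p <- ggplot(terr_toy, aes(x = iyear, y = nkill)) +
  geom_point(alpha = 0.2) +             # transparency reveals overplotting
  coord_cartesian(ylim = c(0, 500)) +   # hypothetical limit; hides the outliers
  labs(x = "Year", y = "Fatalities per incident")

print(p)
```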

At first glance these data might seem suited to a boxplot, but the inter-quartile range is only 1, so a boxplot would not represent the spread of the data well.

summary(terr$nkill)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0       0       0       2       1    1380    6907
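The inter-quartile range can also be checked directly with IQR(). The vector below is made up and merely mimics the shape of nkill (many zeros, a few large values, some missing entries); it is a sketch, not the GTD data.

```r
# Made-up fatality counts with a missing value
nkill_toy <- c(0, 0, 0, 0, 0, 1, 1, 1, 3, 20, NA)

# Five-number summary plus mean and NA count
print(summary(nkill_toy))

# Inter-quartile range, ignoring missing values
iqr <- IQR(nkill_toy, na.rm = TRUE)
print(iqr)
```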

Since there are so many incidents to convey on a single plot, we can use the ggplot2 function facet_wrap to separate the fatalities by region:

plot of chunk unnamed-chunk-4
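A faceted plot like the one above could be sketched along these lines. The data are made up, and the column name iyear and the alpha value are assumptions; facet_wrap() draws one panel per level of region_txt.

```r
library(ggplot2)

# Made-up incidents in three regions
terr_toy <- data.frame(
  iyear      = rep(c(1975, 1990, 2005), times = 3),
  nkill      = c(0, 1, 0, 2, 15, 40, 1, 8, 120),
  region_txt = rep(c("Oceania", "MENA", "Sub-Saharan Africa"), each = 3)
)

p <- ggplot(terr_toy, aes(x = iyear, y = nkill)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~ region_txt) +   # one panel per region
  labs(x = "Year", y = "Fatalities per incident")

print(p)
```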

Now we can observe that some regions have very low total numbers of fatalities per terrorist incident (e.g. Oceania and Central Asia), whilst other regions have quite a spread in the number of fatalities per incident (e.g. Sub-Saharan Africa and MENA). Let's compare this result to the number of individuals wounded per incident over time by region:

plot of chunk unnamed-chunk-5

The differences between the number of fatalities and the number of individuals wounded by region appear to be subtle, but let's compare the two variables properly by plotting the data on the same graph. To do this, I used data aggregation to find the total number of fatalities and the total number of individuals wounded by year and region. I then reshaped the data into tall format to allow grouping in ggplot2:

plot of chunk unnamed-chunk-7
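The aggregate-then-reshape step could look roughly like this in base R. This is a sketch on made-up data, not necessarily the approach or packages used in the actual pipeline; the column names iyear and nwound, and the names variable and value in the tall table, are assumptions.

```r
# Made-up incidents with some missing counts
terr_toy <- data.frame(
  iyear      = c(1980, 1980, 1980, 1981, 1981),
  region_txt = c("Central America", "Central America", "MENA", "MENA", "MENA"),
  nkill      = c(10, NA, 3, 0, 5),
  nwound     = c(2, 4, NA, 1, 7)
)

# Total fatalities and wounded per year and region.
# na.pass keeps rows with a missing count; na.rm = TRUE is passed on to sum().
totals <- aggregate(
  cbind(nkill, nwound) ~ iyear + region_txt,
  data = terr_toy, FUN = sum, na.rm = TRUE, na.action = na.pass
)

# Reshape from wide (one column per measure) to tall (one row per measure)
tall <- data.frame(
  iyear      = rep(totals$iyear, times = 2),
  region_txt = rep(totals$region_txt, times = 2),
  variable   = rep(c("nkill", "nwound"), each = nrow(totals)),
  value      = c(totals$nkill, totals$nwound)
)

print(tall)
```

In tall format, the variable column can be mapped to an aesthetic such as colour, so that both measures share one set of axes in each regional panel.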

So yes, for the most part, the total number of fatalities and the total number of individuals wounded follow a similar trend over time by region. However, one striking deviation from this trend occurred during 1980-1985 in Central America, where there was a maximum of ~5000 fatalities but nowhere near as many individuals wounded.


Maximum number of fatalities per region for any one incident?

What about the observed outliers? What was the maximum number of fatalities per region for any one incident?

region_txt          maxKill
Oceania               17.00
Central Asia          23.00
Southeast Asia       116.00
Eastern Europe       180.00
East Asia            184.00
Western Europe       270.00
South America        275.00
Central America      300.00
Russia               344.00
MENA                 422.00
South Asia           518.00
Sub-Saharan Africa  1180.00
North America       1381.50

plot of chunk unnamed-chunk-10
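A per-region maximum of this kind can be computed with aggregate(). The sketch below uses a made-up data frame; apart from two maxima borrowed from the table above, the values are invented for illustration.

```r
# Made-up incidents; 17 and 422 are borrowed from the table above
terr_toy <- data.frame(
  region_txt = c("Oceania", "Oceania", "MENA", "MENA", "MENA"),
  nkill      = c(17, 2, NA, 422, 30)
)

# Maximum fatalities in any one incident, per region.
# The formula interface drops rows with a missing nkill by default.
max_kill <- aggregate(nkill ~ region_txt, data = terr_toy, FUN = max)
names(max_kill)[2] <- "maxKill"

# Sort from the smallest to the largest maximum
max_kill <- max_kill[order(max_kill$maxKill), ]
print(max_kill)
```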

The terrorist incident with the highest number of fatalities occurred in North America (this was 9/11). Which terrorist groups were behind the attacks with the most fatalities per region?

Region              Max killed  Terrorist group name
Oceania                  17.00  Kanak Separatists
Central Asia             23.00  Unknown
Southeast Asia          116.00  Abu Sayyaf Group (ASG)
Eastern Europe          180.00  Serbian Militants
East Asia               184.00  Unknown
Western Europe          270.00  Libyan
South America           275.00  Revolutionary Armed Forces of Colombia (FARC)
Central America         300.00  Unknown
Russia                  344.00  Riyadus-Salikhin Reconnaissance and Sabotage Battalion of Chechen Martyrs
MENA                    422.00  Mujahedin-e Khalq (MEK)
South Asia              518.00  Communist Party of Nepal- Maoist (CPN-M)
Sub-Saharan Africa     1180.00  Hutus
North America          1381.50  Al-Qaida
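One way to attach a group name to each regional maximum is to pick, within each region, the row with the largest nkill. The sketch below uses made-up data; the column name gname and the group labels are assumptions for illustration. Note that which.max() returns only the first row when several incidents tie for the maximum.

```r
# Made-up incidents with hypothetical group names
terr_toy <- data.frame(
  region_txt = c("Oceania", "Oceania", "MENA", "MENA"),
  nkill      = c(17, 2, 30, 422),
  gname      = c("Group A", "Group B", "Group C", "Group D"),
  stringsAsFactors = FALSE
)

# For each region, keep the single deadliest incident
by_region <- split(terr_toy, terr_toy$region_txt)
deadliest <- do.call(
  rbind,
  lapply(by_region, function(g) g[which.max(g$nkill), ])
)

# Sort by fatalities and tidy up the row names
deadliest <- deadliest[order(deadliest$nkill), c("region_txt", "nkill", "gname")]
rownames(deadliest) <- NULL
print(deadliest)
```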


How many fatalities are attributed to terrorist attacks each year?

plot of chunk unnamed-chunk-14


Total number of fatalities per attack type from 1970-2011?

plot of chunk unnamed-chunk-16