Introduction

This report contains an Exploratory Data Analysis of the Global Terrorism Database.

I chose this dataset because my father is a retired Marine veteran who spent
a great deal of my life protecting our country from the kinds of events
described in this dataset.

Throughout this report I will refer to the dataset as GTD.

The GTD consists of 135 variables (columns) and 170,350 records.

In this Exploratory Data Analysis I will not use all 135 of the
variables available in the dataset.

Instead, I will explain each variable I select, since each one
was chosen for its value to the EDA process.

Univariate Plots Section

Univariate Summaries

colnames(gtData)
##   [1] "eventid"            "iyear"              "imonth"            
##   [4] "iday"               "approxdate"         "extended"          
##   [7] "resolution"         "country"            "country_txt"       
##  [10] "region"             "region_txt"         "provstate"         
##  [13] "city"               "latitude"           "longitude"         
##  [16] "specificity"        "vicinity"           "location"          
##  [19] "summary"            "crit1"              "crit2"             
##  [22] "crit3"              "doubtterr"          "alternative"       
##  [25] "alternative_txt"    "multiple"           "success"           
##  [28] "suicide"            "attacktype1"        "attacktype1_txt"   
##  [31] "attacktype2"        "attacktype2_txt"    "attacktype3"       
##  [34] "attacktype3_txt"    "targtype1"          "targtype1_txt"     
##  [37] "targsubtype1"       "targsubtype1_txt"   "corp1"             
##  [40] "target1"            "natlty1"            "natlty1_txt"       
##  [43] "targtype2"          "targtype2_txt"      "targsubtype2"      
##  [46] "targsubtype2_txt"   "corp2"              "target2"           
##  [49] "natlty2"            "natlty2_txt"        "targtype3"         
##  [52] "targtype3_txt"      "targsubtype3"       "targsubtype3_txt"  
##  [55] "corp3"              "target3"            "natlty3"           
##  [58] "natlty3_txt"        "gname"              "gsubname"          
##  [61] "gname2"             "gsubname2"          "gname3"            
##  [64] "gsubname3"          "motive"             "guncertain1"       
##  [67] "guncertain2"        "guncertain3"        "individual"        
##  [70] "nperps"             "nperpcap"           "claimed"           
##  [73] "claimmode"          "claimmode_txt"      "claim2"            
##  [76] "claimmode2"         "claimmode2_txt"     "claim3"            
##  [79] "claimmode3"         "claimmode3_txt"     "compclaim"         
##  [82] "weaptype1"          "weaptype1_txt"      "weapsubtype1"      
##  [85] "weapsubtype1_txt"   "weaptype2"          "weaptype2_txt"     
##  [88] "weapsubtype2"       "weapsubtype2_txt"   "weaptype3"         
##  [91] "weaptype3_txt"      "weapsubtype3"       "weapsubtype3_txt"  
##  [94] "weaptype4"          "weaptype4_txt"      "weapsubtype4"      
##  [97] "weapsubtype4_txt"   "weapdetail"         "nkill"             
## [100] "nkillus"            "nkillter"           "nwound"            
## [103] "nwoundus"           "nwoundte"           "property"          
## [106] "propextent"         "propextent_txt"     "propvalue"         
## [109] "propcomment"        "ishostkid"          "nhostkid"          
## [112] "nhostkidus"         "nhours"             "ndays"             
## [115] "divert"             "kidhijcountry"      "ransom"            
## [118] "ransomamt"          "ransomamtus"        "ransompaid"        
## [121] "ransompaidus"       "ransomnote"         "hostkidoutcome"    
## [124] "hostkidoutcome_txt" "nreleased"          "addnotes"          
## [127] "scite1"             "scite2"             "scite3"            
## [130] "dbsource"           "INT_LOG"            "INT_IDEO"          
## [133] "INT_MISC"           "INT_ANY"            "related"

Now that I've had a chance to review the column names, I have identified a few variables that appear to have significant value.

summary(gtData$success)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  1.0000  1.0000  0.8964  1.0000  1.0000

Success Summary: About 90% of terrorist attacks were denoted as a success.
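Because success is coded 0/1, its mean is the share of attacks flagged as successful, which is where the roughly 90% figure comes from. A minimal sketch of that reading, using a made-up vector (the values are illustrative and not taken from the GTD):

```r
# Hypothetical 0/1 indicator standing in for gtData$success
success <- c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1)

# The mean of a 0/1 vector equals the proportion of 1s
share_success <- mean(success)

# The same figure derived from a frequency table
share_from_table <- as.numeric(prop.table(table(success))["1"])

print(share_success)
print(share_from_table)
```

The same reading applies to any other 0/1 column in the dataset, such as suicide.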

summary(gtData$suicide)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00000 0.00000 0.03387 0.00000 1.00000

Suicide Summary: About 3.4% of the total attacks were denoted as suicide attacks.

summary(gtData$nkill)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##    0.000    0.000    0.000    2.387    2.000 1500.000     9682

NKill Summary: The mean is 2.387 deaths per attack, with a peak of 1,500; 9,682 records have no value (NA).
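The NA's column matters here: summary() computes the mean over the recorded values only. A small sketch of the same calculation done by hand, on a made-up vector rather than the real nkill column:

```r
# Hypothetical fatality counts with missing values, standing in for gtData$nkill
nkill <- c(0, 0, 2, NA, 5, 1, NA, 0)

n_missing  <- sum(is.na(nkill))          # how many records lack a count
mean_kills <- mean(nkill, na.rm = TRUE)  # mean over the recorded values only
max_kills  <- max(nkill, na.rm = TRUE)   # largest recorded value

print(c(missing = n_missing, mean = mean_kills, max = max_kills))
```

Without na.rm = TRUE, both mean() and max() would return NA for this vector.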

summary(gtData$city)
##                Unknown                Baghdad                Karachi 
##                   9162                   7206                   2609 
##                   Lima                Belfast                  Mosul 
##                   2358                   2140                   1775 
##               Santiago           San Salvador              Mogadishu 
##                   1618                   1547                   1351 
##               Istanbul                 Athens                 Bogota 
##                   1037                    987                    981 
##                 Beirut                 Kirkuk               Medellin 
##                    919                    887                    846 
##               Peshawar               Benghazi         Guatemala City 
##                    798                    791                    754 
##                 Quetta                Baqubah                  Kabul 
##                    752                    725                    644 
##               Srinagar              Jerusalem                  Paris 
##                    637                    605                    598 
##               Fallujah                  Dhaka                   Rome 
##                    561                    549                    548 
##                Tripoli                 Ramadi                 Manila 
##                    537                    482                    481 
##                 Aleppo           Buenos Aires               Ayacucho 
##                    469                    460                    459 
##          New York City                  Sanaa                        
##                    449                    447                    446 
##                 Madrid                Algiers                  Arish 
##                    414                    406                    400 
##                 Tikrit                 Imphal                 London 
##                    398                    392                    383 
##              Maiduguri            Londonderry               Damascus 
##                    376                    349                    344 
##               Kandahar                 Bilbao                   Gaza 
##                    343                    335                    333 
##                Colombo                   Cali                Ajaccio 
##                    321                    311                    305 
##             Abu Ghraib                 Ankara                 Tehran 
##                    290                    289                    276 
##                   Aden Donostia-San Sebastian           Tuz Khormato 
##                    275                    272                    268 
##                Samarra           Johannesburg                  Baiji 
##                    259                    255                    240 
##                  Rafah              Jalalabad                 Madain 
##                    240                    236                    231 
##          Sheikh Zuweid                  Taizz              Bujumbura 
##                    231                    229                    227 
##                 Bastia                 Grozny              Barcelona 
##                    225                    222                    221 
##                 Lahore                Bangkok             Muqdadiyah 
##                    218                    212                    211 
##             Mahmudiyah                  Cairo                  Milan 
##                    205                    203                    203 
##                Managua                 Hebron                Kismayo 
##                    201                    200                    194 
##            Makhachkala                  Sidon               Tarmiyah 
##                    194                    193                    193 
##                Donetsk                  Basra              Santa Ana 
##                    192                    191                    191 
##                 Jaffna             San Miguel                   Taji 
##                    189                    182                    182 
##               Huancayo                 La Paz                  Sirte 
##                    181                    180                    175 
##            Lashkar Gah                  Tokyo                  Khost 
##                    174                    172                    171 
##                   Bara                 Nablus             Batticaloa 
##                    170                    166                    165 
##            Tegucigalpa                 Ghazni                 Jamrud 
##                    164                    162                    157 
##                (Other) 
##                 107248

City Summary: Setting aside Unknown, the top 3 cities are Baghdad, Karachi, and Lima. 1

summary(gtData$natlty1_txt)
##                             Iraq                         Pakistan 
##                            21625                            13168 
##                            India                      Afghanistan 
##                            11110                             9669 
##                         Colombia                      Philippines 
##                             7783                             5997 
##                             Peru                      El Salvador 
##                             5832                             5212 
##                    United States                           Turkey 
##                             4976                             4436 
##                           Israel                         Thailand 
##                             3940                             3623 
##                 Northern Ireland                          Nigeria 
##                             3288                             3270 
##                            Spain                            Yemen 
##                             3091                             2908 
##                           France                        Sri Lanka 
##                             2867                             2811 
##                          Algeria                          Somalia 
##                             2648                             2636 
##                    International                           Russia 
##                             2428                             2282 
##                            Chile                            Egypt 
##                             2227                             2209 
##                    Great Britain                     South Africa 
##                             2104                             2040 
##                        Nicaragua                            Syria 
##                             1960                             1954 
##                            Libya                        Guatemala 
##                             1952                             1893 
##                          Ukraine                       Bangladesh 
##                             1605                             1593 
##                            Italy                                  
##                             1458                             1394 
##                          Lebanon                            Nepal 
##                             1381                              950 
##                           Greece                          Germany 
##                              917                              898 
##         West Bank and Gaza Strip                             Iran 
##                              889                              812 
##                            Sudan                        Indonesia 
##                              750                              697 
##                        Argentina Democratic Republic of the Congo 
##                              641                              569 
##                          Burundi                            Kenya 
##                              556                              554 
##                           Mexico                            Japan 
##                              462                              457 
##                          Myanmar                           Angola 
##                              440                              423 
##                            China                     Saudi Arabia 
##                              369                              360 
##                           Uganda                          Ireland 
##                              354                              336 
##                       Mozambique                    Multinational 
##                              310                              289 
##                          Bolivia                             Mali 
##                              283                              279 
##                        Venezuela                Serbia-Montenegro 
##                              271                              256 
##                         Honduras                           Brazil 
##                              253                              244 
##                         Cameroon                         Cambodia 
##                              220                              203 
##                            Haiti                     Soviet Union 
##                              200                              199 
##                          Georgia                          Ecuador 
##                              187                              182 
##                          Bahrain         Central African Republic 
##                              170                              168 
##                         Ethiopia                      Netherlands 
##                              167                              150 
##                      Switzerland                           Rwanda 
##                              143                              141 
##                         Portugal                           Canada 
##                              140                              136 
##                       Tajikistan                      South Sudan 
##                              135                              134 
##                          Namibia                        Australia 
##                              131                              130 
##                       Yugoslavia                            Niger 
##                              125                              124 
##                          Tunisia                          Senegal 
##                              117                              116 
##                           Sweden                           Panama 
##                              115                              113 
##                          Belgium                           Jordan 
##                              110                              109 
##               Bosnia-Herzegovina                        Macedonia 
##                              106                              106 
##                         Paraguay               West Germany (FRG) 
##                              105                              104 
##                          Albania                             Cuba 
##                               96                               94 
##                         Zimbabwe                           Cyprus 
##                               90                               89 
##               Dominican Republic                         Malaysia 
##                               86                               86 
##                          Austria                          (Other) 
##                               82                             2452

Nationality Summary: The top 5 are Iraq, Pakistan, India, Afghanistan, and Colombia.

summary(gtData$weaptype1_txt)
##                                                                  Biological 
##                                                                          35 
##                                                                    Chemical 
##                                                                         293 
##                                                   Explosives/Bombs/Dynamite 
##                                                                       86704 
##                                                                Fake Weapons 
##                                                                          33 
##                                                                    Firearms 
##                                                                       55273 
##                                                                  Incendiary 
##                                                                       10459 
##                                                                       Melee 
##                                                                        3338 
##                                                                       Other 
##                                                                         104 
##                                                                Radiological 
##                                                                          13 
##                                                          Sabotage Equipment 
##                                                                         130 
##                                                                     Unknown 
##                                                                       13852 
## Vehicle (not to include vehicle-borne explosives, i.e., car or truck bombs) 
##                                                                         116

Weapon Summary: Explosives/Bombs/Dynamite and Firearms are the top weapons of choice.

summary(gtData$iyear)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1970    1990    2007    2002    2014    2016

Year Summary: Dates range from 1970 to 2016, nearly 50 years of data.

summary(gtData$imonth)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   4.000   6.000   6.474   9.000  12.000

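Month Summary: The median month is 6 (June) and the mean is 6.474, so attacks are centered around mid-year. The minimum of 0 is not a calendar month and most likely marks records where the month is unknown.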
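Since month is a category rather than a quantity, a count per month is more informative than a mean. The sketch below assumes that a code of 0 means the month is unknown (my reading of the minimum shown above) and uses made-up values:

```r
# Hypothetical month codes standing in for gtData$imonth
# Assumption: 0 marks a record whose month is unknown
imonth <- c(0, 1, 5, 5, 6, 7, 5, 12, 0, 3)

# Drop the unknown-month records before counting
known_months <- imonth[imonth > 0]

# Count attacks per calendar month, keeping months with zero attacks
month_counts <- table(factor(known_months, levels = 1:12))

# Month with the most attacks in this toy sample
busiest_month <- as.integer(names(month_counts)[which.max(month_counts)])

print(month_counts)
print(busiest_month)
```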

summary(gtData$region_txt)
##       Australasia & Oceania Central America & Caribbean 
##                         264                       10340 
##                Central Asia                   East Asia 
##                         554                         794 
##              Eastern Europe  Middle East & North Africa 
##                        5031                       46511 
##               North America               South America 
##                        3346                       18762 
##                  South Asia              Southeast Asia 
##                       41497                       11453 
##          Sub-Saharan Africa              Western Europe 
##                       15491                       16307

Region Summary: Middle East & North Africa (46,511) and South Asia (41,497) have the most attacks.

Given the above summaries, approximately 90% of the 170,350 total attacks were recorded as a success.

Univariate Plots

Note: Analysis of each plot is provided in the subtitle of the plot.

hchart(factor(gtData$success), name = "Success") %>% 
  hc_title(text = "Success Univ. Plot") %>% 
  hc_subtitle(text = "Majority of the attacks are denoted as success.")
hchart(factor(gtData$suicide), name = "Suicide") %>%
  hc_title(text = "Suicide Univ. Plot") %>% 
  hc_subtitle(text = "Only a small amount of the attacks were suicide attacks.")
hchart(factor(gtData$iyear), name = "Year") %>%
  hc_title(text = "Year Univ. Plot") %>% 
  hc_subtitle(text = "Shows attacks by year.")
uni_weap <- gtData %>% 
  group_by(weaptype1_txt) %>% 
  summarise(count = n())
## Warning in grouped_df_impl(data, unname(vars), drop): '.Random.seed' is not
## an integer vector but of type 'NULL', so ignored
hchart(uni_weap$count, name  = "Weapon Type 1") %>% 
  hc_title(text = "Weapon Type 1 Univ. Plot") %>% 
  hc_subtitle(text = "Histogram of the attack counts per Weapon Type 1 category")
uni_country <- gtData %>% 
  group_by(country_txt) %>% 
  summarise(count = n(), unique = length(unique(country_txt))) %>% 
  arrange(-count, -unique)

hchart(uni_country, "treemap",
       hcaes(x = country_txt, value = count, color = unique)) %>% 
  hc_title(text = "Country Univ. Plot") %>% 
  hc_subtitle(text = "Attacks by Country")
uni_nkill <- gtData %>% 
  filter(!is.na(nkill), !is.na(nkillus)) %>%   # keep records where both counts are present
  group_by(nkill) %>% 
  summarise(t_nkill = sum(nkill + nkillus))    # combined total for each distinct nkill value

hcboxplot(uni_nkill$t_nkill, name = "Nkill") %>%
  hc_title(text = "Number of Kills Univ. Plot") %>% 
  hc_subtitle(text = "Boxplot Showing Attacks by Number of Kills")
hchart(factor(gtData$imonth), name = "Month") %>%
  hc_title(text = "Month Univ. Plot") %>% 
  hc_subtitle(text = "Shows attacks by month")
hchart(factor(gtData$iday), name = "Day") %>%
  hc_title(text = "Day Univ. Plot") %>% 
  hc_subtitle(text = "Shows attacks by Day")
data_hours <- gtData %>% 
  filter(!is.na(nhours), nhours > 0) %>%   # keep records with a positive number of hours
  group_by(nhours) %>% 
  summarise(count = n())

hchart(data_hours$count, name = "Hours") %>%
  hc_title(text = "Hours Univ. Plot") %>% 
  hc_subtitle(text = "Histogram of the record counts per nhours value")
hchart(factor(gtData$region_txt), name = "Region") %>%
  hc_title(text = "Region Univ. Plot")
hchart(factor(gtData$attacktype1_txt), name = "Attack Type 1") %>%
  hc_title(text = "Attack Type 1 Univ. Plot")

Univariate Analysis

What is the structure of your dataset?

The data contains 170,350 records of attacks with a total of 135 variables.

What is/are the main feature(s) of interest in your dataset?

The dataset contains 135 variables with a total of 170,350 terrorist attacks.

The dataset also contains geographic coordinates for each attack.

One important point of interest in the dataset is the weapon details.

The dataset records the type of weapon used, which can be analyzed to determine the weapons chosen for an attack.

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

Variables such as hostage, resolution, year, month, day, vicinity, and summary
are all great points of interest in this dataset.

These additional variables would be extremely critical if we were to take this
analysis a step further into predictive analysis.

The dataset could then be used to cross-examine potential threats and correlate them
with a potential outcome based on historical attack data.

Did you create any new variables from existing variables in the dataset?

In later exploration I will introduce new variables such as count.

The count variable will be used in statistical comparisons and plots.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

I have used the dplyr functions summarise, group_by, mutate, and filter to subset, organize and prepare the data.
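As a rough illustration of how these four verbs fit together, here is a sketch on a tiny made-up data frame. The column names match the GTD, but the rows and the derived columns fatal, total_kills and fatal_attacks are hypothetical:

```r
library(dplyr)

# Hypothetical miniature of gtData with the same column names
toy <- data.frame(
  iyear         = c(1970, 1970, 1971, 1971, 1971),
  weaptype1_txt = c("Firearms", "Explosives", "Firearms", "Firearms", "Unknown"),
  nkill         = c(1, NA, 0, 3, 2)
)

weapon_summary <- toy %>%
  filter(!is.na(nkill)) %>%        # drop records with no fatality count
  mutate(fatal = nkill > 0) %>%    # derive a new logical variable
  group_by(weaptype1_txt) %>%      # one group per weapon type
  summarise(
    count         = n(),           # number of attacks in the group
    total_kills   = sum(nkill),    # fatalities in the group
    fatal_attacks = sum(fatal)     # attacks with at least one fatality
  )

print(weapon_summary)
```

Filtering before grouping keeps the NA rows from turning every sum into NA.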

Bivariate Plots Section

# plot weapons by year
weaponByYear <- gtData %>% 
  group_by(iyear, weaptype1_txt) %>% 
  summarise(count = n())

bp1 <- hchart(weaponByYear, "scatter",
              hcaes(x = iyear, y = count, group = weaptype1_txt)) %>% 
  hc_title(text = "Bivariate Weapon By Year") %>% 
  hc_subtitle(text = "The chart shows that Firearms are used consistently,
              however Explosives/Bombs/Dynamite have been used in an
              increasing number of attacks as time goes on.")

bp1
# plot attack count by country (top 20)
bivarCountry <- gtData %>% 
  group_by(country_txt) %>% 
  summarise(count = n()) %>% 
  arrange(desc(.data$count)) %>% 
  head(20)

bp2 <- hchart(bivarCountry, "column",
              hcaes(x = country_txt, y = count, group = country_txt)) %>% 
  hc_title(text = "Count By Country") %>% 
  hc_subtitle(text = "Shows the top 20 countries by total number of attacks" )

bp2
# plot weapons by n of kills
bivarWeapXNKills <- gtData %>% 
  filter(!is.na(nkill)) %>%              # keep records with a fatality count
  group_by(weaptype1_txt) %>% 
  summarise(nkill = sum(nkill)) %>%      # total recorded kills per weapon type
  arrange(desc(nkill))

hchart(bivarWeapXNKills, "bar",
       hcaes(x = weaptype1_txt, y = nkill, group = weaptype1_txt)) %>% 
  hc_title(text = "Weapons by Kills") %>% 
  hc_subtitle(text = "Shows weapons by the number of recorded kills. Firearms
              and Explosives appear to be the primary choice of weapons for
              terrorist attacks.")

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

I found that different weapon types are associated with different numbers of
deaths.

I also analyzed this data by year, paying particular attention to the period
of the Global War on Terrorism.

Explosives have become increasingly popular over time.

This may be common knowledge to anyone who follows the news, but this
analysis allows us to put factual statistics behind the topic.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

The relationship between the nkill and weaptype1_txt variables.

Higher-impact weapons such as explosives tend to have a larger impact on the
nkill variable, whereas lower-impact weapons tend to have a lower impact on
nkill.
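One way to put a number on that impact is the mean number of fatalities per attack for each weapon type, which separates how lethal a weapon is from how often it is used. A minimal sketch on made-up records (the values are illustrative only):

```r
# Hypothetical records; values are illustrative and not taken from the GTD
attacks <- data.frame(
  weaptype1_txt = c("Explosives", "Explosives", "Firearms",
                    "Firearms", "Melee", "Melee"),
  nkill         = c(10, 4, 3, 1, 0, 1)
)

# Mean fatalities per attack for each weapon type, ignoring missing counts
mean_kills <- tapply(attacks$nkill, attacks$weaptype1_txt, mean, na.rm = TRUE)

# Most lethal weapon types first
print(sort(mean_kills, decreasing = TRUE))
```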

What was the strongest relationship you found?

Weapon type based on the year of the attack.

This observation shows the slight evolution of weapon choice over time.

Multivariate Plots Section

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

I observed the relationship between the country and the weapon type used
compared to the number of attacks for the given nation and weapon.

Were there any interesting or surprising interactions between features?

I found it interesting that Iraq, in the pie chart, contained a higher
percentage of the attacks, along with a higher count of explosives.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.

No. My efforts were focused on analyzing the relationship between the weapon
type used and the number of attacks by country.


Final Plots and Summary

Plot One

Description One

The Bivariate Weapon By Year shows the usage of a weapon over time.

This plot showed Explosives/Bombs/Dynamite to be an increasingly
popular weapon of choice over the years.

Plot Two

Description Two

The Count By Country plot shows a breakdown of the top 20 countries
based on the total number of attacks.

After analyzing this plot I was able to identify three countries,
Iraq, Afghanistan, and Pakistan, as the top countries in the dataset where
terrorist attacks have taken place.

Plot Three

Description Three

The plot compares weapon, count, and country.

I found a notable relationship between the type of weapon used
and the country where the attack took place.

Iraq led in both the use of Explosives and the number of attacks.
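The comparison between country and weapon can be laid out as a cross-tabulation. The sketch below uses made-up records, so the counts are illustrative only; on the real data, a test such as chisq.test() on the same table could be one way to check whether the pattern is more than chance.

```r
# Hypothetical records; countries and weapons are illustrative only
attacks <- data.frame(
  country_txt   = c("Iraq", "Iraq", "Iraq",
                    "Pakistan", "Pakistan", "Afghanistan"),
  weaptype1_txt = c("Explosives", "Explosives", "Firearms",
                    "Explosives", "Firearms", "Firearms")
)

# Attack counts for every country/weapon pair
counts <- table(attacks$country_txt, attacks$weaptype1_txt)

# Share of each weapon type within a country (each row sums to 1)
shares <- prop.table(counts, margin = 1)

print(counts)
print(round(shares, 2))
```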


Reflection

Exploring the Global Terrorism Database was very informative and challenging.

While exploring this dataset I experienced several different challenges, largely
due to the size of the dataset: over 170,000 records.

I learned that cleaning and subsetting the data, so that only the points of
interest are explored, is extremely important in data exploration.

I also learned that quick explorations, used to determine which path to take
for deeper exploration, require knowing when to filter the data
along with knowing which data points to filter.

For example, I started out by subsetting the data to 100 records. I quickly realized
that the results of my exploration were not consistent with what I was expecting.

After taking a closer look I realized that I had already grouped my data by year.

As a result, when I subset the data I only saw records for the year 1970,
which would have limited my exploration to a very narrow piece of the data.
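The pitfall can be reproduced on a small made-up data frame. This sketch assumes the subset was taken from the top of year-ordered data, and the random sample is shown only as one possible alternative, not as the approach used in this report:

```r
# Hypothetical data ordered by year, mimicking the ordering of gtData
set.seed(42)  # arbitrary seed so the random sample is reproducible
toy <- data.frame(iyear = rep(1970:1979, each = 200))

# Taking the first 100 rows of year-ordered data returns only the earliest year
first_100 <- head(toy, 100)

# A random sample of 100 rows draws from the full range of years
random_100 <- toy[sample(nrow(toy), 100), , drop = FALSE]

print(unique(first_100$iyear))
print(sort(unique(random_100$iyear)))
```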

This dataset opens a world of exploration opportunities.

By recording the variables of a threat and cross-referencing them with the data collected in this dataset, it may be possible to estimate the potential outcome of a terrorist threat and possibly prevent it from taking place.

Bonus Plot

While exploring this dataset I have learned a great deal about the Global War on Terrorism, along with the geographic impact terrorism can have on the world.

I decided to render a map of the terrorist attacks.

d1 <- gtData %>% 
  mutate(name = country_txt) %>% 
  group_by(name) %>% 
  summarise(z = n())

hcmap("custom/world", data = d1, value = "z", borderColor = "#FAFAFA",
      borderWidth = 0.1) %>% 
  hc_mapNavigation(enabled = TRUE, enableDoubleClickZoomTo = TRUE) %>% 
  hc_title(text = "Global Terrorism Attacks World Map")

  1. Tripoli is one of the historical battlegrounds of the United
    States Marine Corps.