Major Data & Design Challenges

  • Missing data when performing inner_join() on country between the country shapefile and the terrorism dataset, caused by mismatched country names in the two sources.
    Solution: Check which country names do not match, then check whether each of those countries exists in the shapefile. If it does, modify the country name in the terrorism dataset to match the shapefile.

  • Some packages do not offer interactive visualisations.
    Solution: Search for companion packages that can make the visualisations interactive. For example, the treemap package was used together with d3treeR to create an interactive treemap.

  • Unable to aggregate counts automatically, unlike in Tableau.
    Solution: Have a clear plan of what I want to visualise and create subsets of the original dataset, aggregated across different variables.

  • The dataset is complicated, with many variables and possible levels of analysis. I had difficulty capturing information for all levels of analysis (Region, Country, Provstate, City) without creating a large number of visualisations and filters. E.g. the terrorism data captures the region, country, province/state and city of each attack, as well as the weapon, target and attack types, each with its own subtypes.
    Solution: Focus on just one level of analysis to reduce the complexity of the visualisation process. In this case, I opted to visualise terrorist attacks at the regional level, specifically the Middle East & North Africa region, as it has the highest number of attacks and thus provides more data for visualisation.
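
The pre-aggregation idea from the third challenge can be sketched with dplyr's count(), which is one possible alternative to the aggregate() calls used later in this write-up. The data frame `attacks` and its values are made up purely for illustration; only the column names come from the terrorism dataset.

```r
library(dplyr)

# Toy stand-in for the terrorism data (values are invented for illustration)
attacks <- data.frame(
  country_txt     = c("Iraq", "Iraq", "Iraq", "Yemen", "Yemen"),
  attacktype1_txt = c("Bombing/Explosion", "Bombing/Explosion", "Armed Assault",
                      "Armed Assault", "Bombing/Explosion")
)

# One pre-aggregated subset per question: attacks per country ...
by_country <- attacks %>%
  count(country_txt, name = "Frequency")

# ... and attacks per country and attack type
by_country_type <- attacks %>%
  count(country_txt, attacktype1_txt, name = "Frequency")

print(by_country)
print(by_country_type)
```

Each subset then feeds exactly one visualisation, which stands in for the automatic aggregation that Tableau performs.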

Sketch of Proposed Visualisations

Step-by-step Data Visualisation Process

1. Installing Required Packages

packages = c('sf', 'tmap', 'tidyverse', 'ggplot2', 'plotly', 'treemap','readxl')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

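# devtools is not covered by the loop above, so install it first if it is missing
if(!require('devtools')){
  install.packages('devtools')
}
library('devtools')
install_github("timelyportfolio/d3treeR")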

2. Loading Data

Assign the shapefile to world. As the original terrorism dataset is an xlsx file, read it with the read_excel() function from the readxl package. Keep the records from 2015 onwards and select the columns shown below, then save the result as a csv file with the write.csv() function. Load the csv file and assign it to terror.

world <- st_read(dsn = "World_map", layer = "99bfd9e7-bb42-4728-87b5-07f8c8ac631c2020328-1-1vef4ev.lu5nk")
## Reading layer `99bfd9e7-bb42-4728-87b5-07f8c8ac631c2020328-1-1vef4ev.lu5nk' from data source `C:\Users\Bangkit\Documents\IS428 Visual Analytics for Business Intelligence\Assignment 5\World_map' using driver `ESRI Shapefile'
## Simple feature collection with 251 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.6236
## Geodetic CRS:  WGS 84
original <- readxl::read_excel("globalterrorismdb_0221dist.xlsx")
filter_original <- original %>%
  filter(iyear >= 2015) %>%
  select(iyear,imonth,iday,approxdate,country_txt,region_txt,
         provstate,city,attacktype1_txt,weaptype1_txt,targtype1_txt,
         gname,suicide,success,nkill,nkillus,nwound,nwoundus,
         latitude,longitude)
write.csv(filter_original, file = "Terrorism 2015 to 2019.csv", row.names = F)
terror <- read.csv("Terrorism 2015 to 2019.csv")

As I am focusing on the Middle East and North Africa region, filter terror for the ME & North Africa region and assign to terror_ME. For attacks that took place in West Bank and Gaza Strip, I renamed the country_txt to their respective location (West Bank or Gaza Strip) according to the provstate variable as these two areas are not grouped as a single country in the shapefile.

terror_ME <- terror %>% filter(region_txt == "Middle East & North Africa")

terror_ME$country_txt <- as.character(terror_ME$country_txt)
terror_ME[terror_ME$provstate == "West Bank",]$country_txt <- "West Bank"
terror_ME[terror_ME$provstate == "Gaza Strip",]$country_txt <- "Gaza Strip"
terror_ME$country_txt <- as.factor(terror_ME$country_txt)
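
As an alternative sketch of the same renaming step, dplyr's case_when() can do the recode in one pass. It treats an NA comparison as "no match", whereas a logical row index containing NA may raise an error in a base R data-frame assignment. The data frame `toy` below is made up for illustration and is not the real terror_ME.

```r
library(dplyr)

# Toy rows; the NA in provstate mimics a record with no province recorded
toy <- data.frame(
  country_txt = c("West Bank and Gaza Strip", "West Bank and Gaza Strip",
                  "Iraq", "Israel"),
  provstate   = c("West Bank", "Gaza Strip", "Baghdad", NA),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  mutate(country_txt = case_when(
    provstate == "West Bank"  ~ "West Bank",
    provstate == "Gaza Strip" ~ "Gaza Strip",
    TRUE                      ~ country_txt   # everything else stays unchanged
  ))

print(toy)
```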

3. Building Visualisations

Interactive Map

The goal of the map is to visualise the extent to which each country in the region is affected by terrorist attacks as well as where these attacks usually take place in each country.

Generate the data required to produce the map visualisation. Aggregate total number of attacks from year 2015 to 2019 by country and assign to terror_ME_summarised.

terror_ME_summarised <- aggregate(data.frame(Frequency = terror_ME$country_txt), by = list(Country = terror_ME$country_txt), FUN = length)

As terror_ME_summarised would eventually be inner-joined with world on country to generate a choropleth map, check if there are any country names in terror_ME_summarised that are missing from world.

terror_ME_summarised[!(terror_ME_summarised$Country %in% world$CNTRY_NAME), "Country"]
## factor(0)
## 20 Levels: Algeria Bahrain Egypt Gaza Strip Iran Iraq Israel ... Yemen

The result is empty, so there are no mismatched country names. This is because the country name “West Bank and Gaza Strip” was split earlier in terror_ME into the two respective places instead of being treated as one country. Perform an inner join of world and terror_ME_summarised and assign the result to ME_terrormap, which is used to generate the choropleth map.

ME_terrormap <- inner_join(world,terror_ME_summarised, by = c("CNTRY_NAME"= "Country"))
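
Had the check above returned any names, they would need to be renamed before the join. A minimal sketch of that fix is shown here with made-up name vectors. The mismatch "Syrian Arab Republic" versus "Syria" is hypothetical and does not occur in this dataset.

```r
# Hypothetical names, for illustration only
shapefile_names <- c("Iran", "Iraq", "Syria")
dataset_names   <- c("Iran", "Iraq", "Syrian Arab Republic")

# Names in the dataset that have no counterpart in the shapefile
unmatched <- setdiff(dataset_names, shapefile_names)
print(unmatched)

# Lookup of dataset name -> shapefile name, written by hand after inspecting `unmatched`
fix <- c("Syrian Arab Republic" = "Syria")

# Replace only the names that appear in the lookup
fixed_names <- ifelse(dataset_names %in% names(fix),
                      unname(fix[dataset_names]),
                      dataset_names)

# After the fix, nothing should be left unmatched
print(setdiff(fixed_names, shapefile_names))
```

Keeping the renames in a named vector makes every manual change visible in one place, which helps when the shapefile or the dataset is updated.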

To plot the individual attacks as dots on the map, convert the longitude and latitude in terror_ME to coordinates. As some coordinates are missing in the original dataset, use !is.na() to keep only the rows with valid coordinates. Assign the result to terror_ME_coord.

terror_ME_coord <- st_as_sf(terror_ME[!is.na(terror_ME$latitude) & !is.na(terror_ME$longitude),], coords = c("longitude", "latitude"), crs = 4326)

Execute the code below to generate the interactive map.

tmap_mode("view")
tm_shape(ME_terrormap)+
  tm_fill("Frequency",
          style = "pretty",
          palette = "YlOrRd",
          id = "CNTRY_NAME")+
  tm_layout(title = "Map of Terrorist Attacks in Middle East & North Africa from 2015 to 2019",
            legend.outside = TRUE,
            frame = FALSE)+
  tm_borders(alpha = 0.5)+
  tm_shape(terror_ME_coord)+
  tm_dots(title = "Attack Type",
          col = "attacktype1_txt",
          id="country_txt",
          popup.vars = c("Year" = "iyear", "Country" = "country_txt", 
                         "Province/State" = "provstate", "City" = "city", 
                         "Attack Type"="attacktype1_txt", "Weapon"="weaptype1_txt",
                         "Target"="targtype1_txt", "Terrorist Group" = "gname",
                         "Killed"="nkill", "Wounded"="nwound"),
                         size = 0.01)

Terrorist Group Bump Chart

The aim of the bump chart is to visualise which terrorist groups are major threats in the region, as well as which groups have a growing presence and should be watched, in terms of the number of attacks attributed to them.

First, manipulate the terror_ME data to generate a dataset ranking terrorist groups in terms of the number of attacks in the region for each year and assign it to terror_groups.

terror_groups <- aggregate(data.frame(Attacks = terror_ME$gname), by = list(Year = terror_ME$iyear, `Terrorist Group` = terror_ME$gname), FUN = length)
terror_groups <- terror_groups %>%
  group_by(Year) %>%
  arrange(Year, desc(Attacks), `Terrorist Group`) %>%
  mutate(Rank = row_number()) %>%
  ungroup()
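
To make the ranking logic concrete, the same steps are sketched below on a made-up data frame, using dplyr's count() in place of aggregate(). The groups "A" and "B" and their counts are invented for illustration.

```r
library(dplyr)

# Toy attack records: one row per attack
toy <- data.frame(
  iyear = c(2015, 2015, 2015, 2016, 2016, 2016),
  gname = c("A", "A", "B", "B", "B", "A")
)

ranked <- toy %>%
  count(Year = iyear, `Terrorist Group` = gname, name = "Attacks") %>%
  group_by(Year) %>%
  arrange(Year, desc(Attacks), `Terrorist Group`) %>%
  mutate(Rank = row_number()) %>%   # rank restarts at 1 within each year
  ungroup()

print(ranked)
```

Note that row_number() gives tied groups different ranks (here the tie would be broken alphabetically by the arrange() call). If tied groups should share a rank, min_rank(desc(Attacks)) is one option.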

Execute the code below to generate the bump chart of the top 10 terrorist groups in the region for each year.

Terror_group_bump <- ggplot(terror_groups, aes(x = Year, y = Rank, Attacks = Attacks)) +
  geom_line(data = subset(terror_groups, Rank <= 10),
            aes(color = `Terrorist Group`), size = 2, alpha = 0.5) +
  geom_point(data = subset(terror_groups, Rank <= 10),
             aes(color = `Terrorist Group`), size = 4) +
  scale_y_reverse(breaks = c(1:10)) +
  ggtitle("Top 10 Terrorist Groups by Year")

ggplotly(Terror_group_bump, tooltip = c("x","Terrorist Group", "Attacks", "y"))

Bar Charts showing Attack Types and Target Types

The bar charts show which target types are most vulnerable in the region and which attack types are most common.

Generate the data required.

target <- terror_ME[, c("targtype1_txt")]
target <- aggregate(data.frame(Count = target), by = list(Target = target), FUN = length)

# Use the regional subset terror_ME (not the global terror data) so the chart matches its title
attack <- terror_ME[, c("attacktype1_txt")]
attack <- aggregate(data.frame(Count = attack), by = list(Attack = attack), FUN = length)

Plot bar chart for attack type.

attack_plot <- ggplot(attack, aes(x = reorder(Attack, Count), y = Count)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  labs(y = "Count", x = "Attack Type") +
  coord_flip() +
  ggtitle("Count of Attack Types in ME & North Africa<br>2015-2019")

ggplotly(attack_plot, tooltip = c("Count"))

Plot bar chart for target type.

target_plot <- ggplot(target, aes(x = reorder(Target, Count), y = Count)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  labs(y = "Count", x = "Target Type") +
  coord_flip() +
  ggtitle("Count of Target Types in ME & North Africa<br>2015-2019")

ggplotly(target_plot, tooltip = c("Count"))

Tree Map for Number of Attacks in Country and Province/State

The goal is to visualise the cumulative number of attacks in each country located in the region within the selected time frame of the data, which will be further broken down to the province/state level as an attempt to visualise more granular details in one visualisation.

Aggregate total number of attacks in each province/state for each country and assign it to terror_provstate.

terror_provstate <- aggregate(data.frame(Frequency = terror_ME$provstate), by = list(Country = terror_ME$country_txt, ProvinceState = terror_ME$provstate), FUN = length)

Execute the code below to generate the interactive treemap.

provstate_tree <- treemap(terror_provstate,
        index = c("Country", "ProvinceState"),
        vSize = "Frequency",
        type = "index",
        bg.labels=c("white"),
        align.labels=list(c("center", "center"),c("center", "center")))
d3treeR::d3tree2(provstate_tree, rootname = "No. of Attacks in Country and Province/State from 2015 to 2019")

Final Visualisations and Observations

Observations (interactive map):

  • Iraq is the country worst affected by terrorist attacks in the Middle East & North Africa region from 2015 to 2019, with a total of 10,700 attacks.

  • Terrorist attacks in Turkey and Syria seem to be concentrated along their shared border.

  • The western side of Yemen is more badly affected by terrorism than the eastern side.

Observations (terrorist group bump chart):

  • From 2015 to 2019, “Unknown” ranked first among all groups in terms of number of attacks. This means that the largest share of attacks each year could not be attributed to a known terrorist group. Some of these may have been carried out by unaffiliated individuals, which would make tracking and targeting the perpetrators harder as there might be no connection between them.

  • ISIL, Houthi Extremists and Kurdistan Workers’ Party remain a significant threat in the region, as these three groups consistently occupied the second to fourth spots across the years.

Observations (attack type bar chart):

  • Bombing/Explosion is the most common attack type in the region, followed by armed assault and kidnapping, based on the cumulative number of attacks using these methods from 2015 to 2019.

Observations (target type bar chart):

  • Private Citizens & Property is the target type most susceptible to attack in the region, followed by Military and Police. This is seen from the cumulative number of attacks targeting these groups from 2015 to 2019.

Observations (tree map):
The most attacked province/state in each country is:
Iraq: Baghdad
Yemen: Taizz
Syria: Aleppo
Libya: Benghazi
Egypt: North Sinai
Turkey: Diyarbakir
Saudi Arabia: Jazan
Israel: Southern
Lebanon: Beqaa
Tunisia: Kasserine
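
A list like the one above can also be derived in code instead of being read off the treemap. The sketch below assumes a data frame shaped like terror_provstate (columns Country, ProvinceState, Frequency). The name `toy_provstate` and the counts in it are invented for illustration.

```r
library(dplyr)

# Toy stand-in for terror_provstate (counts are made up)
toy_provstate <- data.frame(
  Country       = c("Iraq", "Iraq", "Yemen", "Yemen"),
  ProvinceState = c("Baghdad", "Nineveh", "Taizz", "Sanaa"),
  Frequency     = c(50, 30, 20, 10)
)

# Keep the single province/state with the highest count in each country
top_provstate <- toy_provstate %>%
  group_by(Country) %>%
  slice_max(Frequency, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_provstate)
```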