Missing data when performing inner_join() on country between the country shapefile and the terrorism dataset, caused by country names in the terrorism dataset that do not match those in the shapefile.
Solution: Identify the country names that do not match, then check whether each of those countries exists in the shapefile under a different name. If it does, rename the country in the terrorism dataset to match the shapefile.
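A minimal sketch of this check-and-rename step in base R follows. The country names are hypothetical and are not taken from the actual terrorism dataset or shapefile. The lookup table `name_fix` is likewise an illustrative assumption that would have to be written by hand for the real mismatches.

```r
# Hypothetical names: one set from the attack data, one from the shapefile
data_names  <- c("Iraq", "Syria", "Ivory Coast")
shape_names <- c("Iraq", "Syria", "Cote d'Ivoire", "Turkey")

# 1. Names in the attack data with no exact match in the shapefile
unmatched <- setdiff(data_names, shape_names)
print(unmatched)

# 2. Hand-written lookup (assumed): attack-data spelling -> shapefile spelling
name_fix <- c("Ivory Coast" = "Cote d'Ivoire")

# 3. Replace only the names listed in the lookup table
fixed_names <- data_names
hit <- fixed_names %in% names(name_fix)
fixed_names[hit] <- name_fix[fixed_names[hit]]

# After the fix, every name should be found in the shapefile
stopifnot(all(fixed_names %in% shape_names))
```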
Some packages do not offer interactive visualisations.
Solution: Search for other packages that work with the original package to make its visualisations interactive. For example, the treemap package was used together with d3treeR to create an interactive treemap.
Counts are not aggregated automatically, unlike in Tableau.
Solution: Plan clearly what to visualise beforehand, then create subsets of the original dataset aggregated across the relevant variables.
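The sketch below shows one way to pre-aggregate counts across two variables at once in base R. It uses `table()` followed by `as.data.frame()`. The data frame is a made-up stand-in using the same column names as the terrorism data (`iyear`, `attacktype1_txt`).

```r
# Made-up records standing in for the attack data
attacks <- data.frame(
  iyear = c(2015, 2015, 2016, 2016, 2016),
  attacktype1_txt = c("Bombing/Explosion", "Armed Assault",
                      "Bombing/Explosion", "Bombing/Explosion", "Kidnapping")
)

# Count attacks for every year x attack type combination
by_year_type <- as.data.frame(
  table(Year = attacks$iyear, Attack = attacks$attacktype1_txt),
  responseName = "Count"
)

# table() also lists combinations that never occur; drop those zero rows
by_year_type <- by_year_type[by_year_type$Count > 0, ]
print(by_year_type)
```

The result is a small summary table that can be passed straight to a plotting function, much like the counts Tableau produces automatically.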
The dataset is complex, with many variables and several possible levels of analysis (region, country, province/state and city). It was difficult to capture the information for all levels of analysis without creating a large number of visualisations and filters. For example, the terrorism data records the region, country, province/state and city of each attack, as well as its weapon, target and attack types, each with their own subtypes.
Solution: Focus on just one level of analysis to reduce the complexity of the visualisation process. In this case, I chose to visualise terrorist attacks at the regional level, specifically the Middle East & North Africa region. It has the largest number of attacks and therefore provides the most data to visualise.
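One way to confirm which region has the most attacks is to tabulate the region column and sort the counts. The vector below is a toy stand-in for `terror$region_txt`, and its counts are made up.

```r
# Toy region labels standing in for terror$region_txt (counts are made up)
region_txt <- c("Middle East & North Africa", "South Asia",
                "Middle East & North Africa", "Sub-Saharan Africa",
                "Middle East & North Africa", "South Asia")

# Tally attacks per region, largest first
region_counts <- sort(table(region_txt), decreasing = TRUE)
print(region_counts)

# The region with the most attacks is the first entry of the sorted table
top_region <- names(region_counts)[1]
```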
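# Install any missing packages, then load them
packages <- c('sf', 'tmap', 'tidyverse', 'ggplot2', 'plotly', 'treemap', 'readxl', 'devtools')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
# d3treeR is installed from GitHub (via devtools), only if it is not already installed
if (!requireNamespace("d3treeR", quietly = TRUE)) {
  install_github("timelyportfolio/d3treeR")
}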
Assign the shapefile to world. As the original terrorism dataset is an xlsx file, read it with the read_excel() function from the readxl package. Keep only the attacks from 2015 onwards and the columns selected below, then save the result as a csv file with the write.csv() function. Finally, load the csv file and assign it to terror.
world <- st_read(dsn = "World_map", layer = "99bfd9e7-bb42-4728-87b5-07f8c8ac631c2020328-1-1vef4ev.lu5nk")
## Reading layer `99bfd9e7-bb42-4728-87b5-07f8c8ac631c2020328-1-1vef4ev.lu5nk' from data source `C:\Users\Bangkit\Documents\IS428 Visual Analytics for Business Intelligence\Assignment 5\World_map' using driver `ESRI Shapefile'
## Simple feature collection with 251 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.6236
## Geodetic CRS: WGS 84
original <- readxl::read_excel("globalterrorismdb_0221dist.xlsx")
# Keep attacks from 2015 onwards and only the columns needed for the visualisations
filter_original <- original %>%
  filter(iyear >= 2015) %>%
  select(iyear, imonth, iday, approxdate, country_txt, region_txt,
         provstate, city, attacktype1_txt, weaptype1_txt, targtype1_txt,
         gname, suicide, success, nkill, nkillus, nwound, nwoundus,
         latitude, longitude)
write.csv(filter_original, file = "Terrorism 2015 to 2019.csv", row.names = FALSE)
terror <- read.csv("Terrorism 2015 to 2019.csv")
As I am focusing on the Middle East & North Africa region, filter terror for that region and assign the result to terror_ME. The terrorism dataset records attacks in the West Bank and Gaza Strip under a single country name, “West Bank and Gaza Strip”. The shapefile, however, does not group these two areas as a single country. I therefore renamed country_txt for these attacks to their respective location (West Bank or Gaza Strip) according to the provstate variable.
terror_ME <- terror %>% filter(region_txt == "Middle East & North Africa")
terror_ME$country_txt <- as.character(terror_ME$country_txt)
# which() drops NA comparisons, so rows with a missing provstate are skipped
# instead of triggering a subscripted-assignment error on the data frame
terror_ME$country_txt[which(terror_ME$provstate == "West Bank")] <- "West Bank"
terror_ME$country_txt[which(terror_ME$provstate == "Gaza Strip")] <- "Gaza Strip"
terror_ME$country_txt <- as.factor(terror_ME$country_txt)
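The toy example below demonstrates the relabelling step on made-up rows, one of which has a missing `provstate`. It uses `which()` to turn the comparison into row numbers. A data-frame row-subset assignment such as `df[df$provstate == "West Bank", ]$country_txt <- ...` can stop with an error when the logical test contains `NA`.

```r
# Made-up rows: the third one has no recorded province/state
toy <- data.frame(
  country_txt = c("West Bank and Gaza Strip", "West Bank and Gaza Strip",
                  "West Bank and Gaza Strip", "Israel"),
  provstate   = c("West Bank", "Gaza Strip", NA, "Jerusalem"),
  stringsAsFactors = FALSE
)

# which() keeps only TRUE positions, so the NA row keeps its original label
toy$country_txt[which(toy$provstate == "West Bank")]  <- "West Bank"
toy$country_txt[which(toy$provstate == "Gaza Strip")] <- "Gaza Strip"
print(toy)
```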
The goal of the map is to visualise the extent to which each country in the region is affected by terrorist attacks as well as where these attacks usually take place in each country.
Generate the data required to produce the map visualisation: aggregate the total number of attacks from 2015 to 2019 by country and assign the result to terror_ME_summarised.
terror_ME_summarised <- aggregate(data.frame(Frequency = terror_ME$country_txt), by = list(Country = terror_ME$country_txt), FUN = length)
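The aggregate() call counts rows per country by applying `FUN = length` to each group. The toy run below uses a made-up country column to show the shape of the result. It adds a simple sanity check that no attack is lost in the aggregation.

```r
# Made-up country column standing in for terror_ME$country_txt
country_txt <- factor(c("Iraq", "Iraq", "Yemen", "Syria", "Iraq", "Yemen"))

# Same pattern: one row per country, Frequency = number of attacks
summarised <- aggregate(data.frame(Frequency = country_txt),
                        by = list(Country = country_txt), FUN = length)
print(summarised)

# Sanity check: the per-country counts add back up to the number of attacks
stopifnot(sum(summarised$Frequency) == length(country_txt))
```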
As terror_ME_summarised will later be inner-joined with world on country to generate a choropleth map, check whether any country names in terror_ME_summarised are missing from world.
terror_ME_summarised[!(terror_ME_summarised$Country %in% world$CNTRY_NAME), "Country"]
## factor(0)
## 20 Levels: Algeria Bahrain Egypt Gaza Strip Iran Iraq Israel ... Yemen
The result is empty (factor(0)), so there is no mismatch in country names. This is because the country name “West Bank and Gaza Strip” was already split in terror_ME into its two separate places, instead of being treated as one country. Perform an inner join of world and terror_ME_summarised and assign the result to ME_terrormap, which will be used to generate the choropleth map.
ME_terrormap <- inner_join(world,terror_ME_summarised, by = c("CNTRY_NAME"= "Country"))
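An inner join keeps only countries present in both tables. Any country in the shapefile with no recorded attacks is therefore dropped and will not be drawn on the choropleth. The sketch below illustrates this with base R merge() on made-up data frames, used here in place of dplyr's inner_join()/left_join() so that no extra packages are needed. It also shows the left-join alternative, which keeps every shape and gives countries without attacks an `NA` count.

```r
# Made-up stand-ins for the shapefile attributes and the attack summary
shapes         <- data.frame(CNTRY_NAME = c("Iraq", "Syria", "Oman"))
attack_summary <- data.frame(Country = c("Iraq", "Syria"), Frequency = c(10, 4))

# Inner join: only countries present in both tables survive
inner <- merge(shapes, attack_summary, by.x = "CNTRY_NAME", by.y = "Country")

# Left join: every shape survives; countries without attacks get NA
left <- merge(shapes, attack_summary, by.x = "CNTRY_NAME", by.y = "Country",
              all.x = TRUE)
print(inner)
print(left)
```

Which one fits depends on whether countries with no attacks should still appear (for example in grey) on the map.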
To generate the density map, convert the longitude and latitude in terror_ME into point coordinates. As some coordinates are missing in the original dataset, use !is.na() to keep only the rows with valid coordinates. Assign the result to terror_ME_coord.
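# Keep rows where both latitude and longitude are present before building points
terror_ME_coord <- st_as_sf(terror_ME[!is.na(terror_ME$latitude) & !is.na(terror_ME$longitude), ],
                            coords = c("longitude", "latitude"), crs = 4326)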
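Filtering on only one coordinate column can let through rows that are missing the other one. By default, st_as_sf() does not accept missing coordinates. The sketch below uses complete.cases() on both columns at once. The data are made up.

```r
# Made-up points: row B lacks latitude, row C lacks longitude
pts <- data.frame(
  city      = c("A", "B", "C", "D"),
  latitude  = c(33.3, NA, 36.2, 15.4),
  longitude = c(44.4, 43.1, NA, 44.2),
  stringsAsFactors = FALSE
)

# TRUE only where neither coordinate is missing
keep <- complete.cases(pts[, c("longitude", "latitude")])
pts_valid <- pts[keep, ]
print(pts_valid)
```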
Run the code below to generate the interactive map.
tmap_mode("view")
tm_shape(ME_terrormap)+
tm_fill("Frequency",
style = "pretty",
palette = "YlOrRd",
id = "CNTRY_NAME")+
tm_layout(title= "Map of Terrorist Attacks in Middle East & North Africa from 2015 to 2019",
legend.outside = TRUE,
frame = FALSE)+
tm_borders(alpha = 0.5)+
tm_shape(terror_ME_coord)+
tm_dots(title = "Attack Type",
col = "attacktype1_txt",
id="country_txt",
popup.vars = c("Year" = "iyear", "Country" = "country_txt",
"Province/State" = "provstate", "City" = "city",
"Attack Type"="attacktype1_txt", "Weapon"="weaptype1_txt",
"Target"="targtype1_txt", "Terrorist Group" = "gname",
"Killed"="nkill", "Wounded"="nwound"),
size = 0.01)
The aim of the bump chart is to visualise which terrorist groups are major threats in the region, and which groups to look out for because of their growing presence. Both are measured by the number of attacks attributed to each group.
First, transform terror_ME into a dataset that ranks terrorist groups by the number of attacks in the region for each year, and assign it to terror_groups.
terror_groups <- aggregate(data.frame(Attacks = terror_ME$gname), by = list(Year = terror_ME$iyear, `Terrorist Group` = terror_ME$gname), FUN = length)
terror_groups <- terror_groups %>%
group_by(Year) %>%
arrange(Year, desc(Attacks), `Terrorist Group`) %>%
mutate(Rank = row_number()) %>%
ungroup()
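The ranking logic above sorts within each year by attacks (descending) and then by group name, and numbers the rows 1, 2, 3, and so on. The base R sketch below reproduces that logic on made-up counts. Because row numbers are used, two groups with equal attacks get different ranks, and the tie is broken alphabetically. dplyr's min_rank() would instead give tied groups the same rank.

```r
# Made-up yearly attack counts per group
counts <- data.frame(
  Year    = c(2015, 2015, 2015, 2016, 2016),
  Group   = c("Group B", "Group A", "Group C", "Group A", "Group B"),
  Attacks = c(5, 5, 2, 7, 9),
  stringsAsFactors = FALSE
)

# Sort by year, then attacks (descending), then name to break ties
counts <- counts[order(counts$Year, -counts$Attacks, counts$Group), ]

# Number rows 1, 2, 3, ... within each year (like row_number() after group_by)
counts$Rank <- ave(counts$Attacks, counts$Year, FUN = seq_along)
print(counts)
```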
Run the code below to generate a bump chart of the top 10 terrorist groups for each year in the region.
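# Bump chart of the top 10 groups per year (rank 1 = most attacks).
# Attacks is not a standard ggplot2 aesthetic; it is mapped only so that
# ggplotly() can show it in the tooltip.
Terror_group_bump <- ggplot(terror_groups, aes(x = Year, y = Rank, Attacks = Attacks)) +
  geom_line(data = subset(terror_groups, Rank <= 10),
            aes(color = `Terrorist Group`), size = 2, alpha = 0.5) +
  geom_point(data = subset(terror_groups, Rank <= 10),
             aes(color = `Terrorist Group`), size = 4) +
  scale_y_reverse(breaks = 1:10) +
  ggtitle("Top 10 Terrorist Groups by Year")
# tooltip takes aesthetic names; the group is mapped to the colour aesthetic
ggplotly(Terror_group_bump, tooltip = c("x", "colour", "Attacks", "y"))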
The bar charts show which target types are most vulnerable in the region, as well as which types of attack are most common there.
Generate the data required.
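# Count attacks by target type and by attack type within the region.
# Both use terror_ME, since the charts are titled for ME & North Africa only.
target <- terror_ME[, c("targtype1_txt")]
target <- aggregate(data.frame(Count = target), by = list(Target = target), FUN = length)
attack <- terror_ME[, c("attacktype1_txt")]
attack <- aggregate(data.frame(Count = attack), by = list(Attack = attack), FUN = length)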
Plot bar chart for attack type.
attack_plot <- ggplot(attack, aes(x = reorder(Attack, Count), y = Count)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  labs(y = "Count", x = "Attack Type") +
  coord_flip() +
  ggtitle("Count of Attack Types in ME & North Africa<br>2015-2019")
# Count is mapped to the y aesthetic, so "y" selects it for the tooltip
ggplotly(attack_plot, tooltip = c("y"))
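The bars are sorted by reorder(), which sets the factor levels in ascending order of Count (the mean per level, by default). After coord_flip(), the first level is drawn at the bottom, so the most frequent category ends up at the top. A quick check on made-up counts:

```r
# Made-up attack-type counts
toy_attack <- data.frame(
  Attack = c("Armed Assault", "Bombing/Explosion", "Kidnapping"),
  Count  = c(120, 300, 45),
  stringsAsFactors = FALSE
)

# Levels ordered from smallest to largest Count
ordered_levels <- levels(reorder(toy_attack$Attack, toy_attack$Count))
print(ordered_levels)
```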
Plot bar chart for target type.
target_plot <- ggplot(target, aes(x = reorder(Target, Count), y = Count)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  labs(y = "Count", x = "Target Type") +
  coord_flip() +
  ggtitle("Count of Target Types in ME & North Africa<br>2015-2019")
# Count is mapped to the y aesthetic, so "y" selects it for the tooltip
ggplotly(target_plot, tooltip = c("y"))
The goal of the treemap is to visualise the cumulative number of attacks in each country in the region over the selected time frame (2015 to 2019). Each country is further broken down to the province/state level, so that more granular detail is captured in a single visualisation.
Aggregate total number of attacks in each province/state for each country and assign it to terror_provstate.
terror_provstate <- aggregate(data.frame(Frequency = terror_ME$provstate), by = list(Country = terror_ME$country_txt, ProvinceState = terror_ME$provstate), FUN = length)
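Note that aggregate() omits rows whose grouping values are missing. Any attack with an `NA` province/state therefore does not contribute to the treemap. The sketch below demonstrates this on made-up rows. It shows one possible workaround, which relabels `NA` before aggregating; the placeholder label "Unknown" is an assumption chosen for illustration.

```r
# Made-up rows; one attack has no recorded province/state
toy_prov <- data.frame(
  country_txt = c("Iraq", "Iraq", "Iraq", "Yemen"),
  provstate   = c("Baghdad", "Baghdad", NA, "Aden"),
  stringsAsFactors = FALSE
)

# Rows with NA in any grouping variable are dropped from the result
dropped <- aggregate(data.frame(Frequency = toy_prov$provstate),
                     by = list(Country = toy_prov$country_txt,
                               ProvinceState = toy_prov$provstate),
                     FUN = length)

# Relabelling NA first keeps those attacks in the totals
toy_prov$provstate[is.na(toy_prov$provstate)] <- "Unknown"
kept <- aggregate(data.frame(Frequency = toy_prov$provstate),
                  by = list(Country = toy_prov$country_txt,
                            ProvinceState = toy_prov$provstate),
                  FUN = length)
print(dropped)
print(kept)
```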
Run the code below to generate the interactive treemap.
provstate_tree <- treemap(terror_provstate,
index = c("Country", "ProvinceState"),
vSize = "Frequency",
type = "index",
bg.labels=c("white"),
align.labels=list(c("center", "center"),c("center", "center")))
d3treeR::d3tree2(provstate_tree, rootname = "No. of Attacks in Country and Province/State from 2015 to 2019")