Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: How Geo-Mapping Helps Identify Trends in Global Terrorism (2020).


Objective

The main objective of the data visualization is to show which regions and countries around the world have the highest rates of terrorist attacks, and so to identify the biggest regional contributors. It is one part of a larger story that the author wants to tell. The story starts from the United Nations estimate that nearly 70% of the world's population will be living in urban areas by 2050, and the author wants to explore the safety disadvantages of living in, or moving to, an urban area.

Targeted audience

The target audience is anyone who already lives in an urban area or plans to move to one in their own country in the near future. The author wants the audience to think about how safe that urban area is.

Issues

The visualisation chosen had the following three main issues:

1. Choice of plot:

The author has done a good job of identifying the regions and countries with the highest rates of terrorist attacks. However, the choice of plot is poor. Doughnut charts encode values as arc lengths and areas, which readers often misjudge. This type of chart is only suitable for comparing at most two or three values of a variable. With this many segments, the audience may find it difficult to compare the rate of attacks across regions and countries. For example, it can be seen that Pakistan (South Asia) and Iraq (Middle East & North Africa) have the highest rates of terrorist attacks. But it is very difficult to draw conclusions about the other countries and regions, because the comparison is not clear from the area each one covers in the doughnut chart.

2. No count/percentage labels on the segments to support the analysis:

The second issue is that none of the values in the doughnut chart are labelled with counts or percentages. The area of each segment alone is not an adequate presentation of the data. A visualization should carry labelled values that support and validate what it shows. The visualization is therefore incomplete, as no statistics back up the chart.

3. Missing colour scale and confusing region maps:

The author has done a good job of creating region maps around the doughnut chart. However, their presence confuses the audience, as it is hard to tell whether each map is positioned next to the countries located in that region or placed arbitrarily. Even if we assume the former, the interpretation can still be wrong. For example, Pakistan and Afghanistan belong to the South Asia region, but in the visualization they appear to be part of the Middle East & North Africa. This misrepresents the original information.

The quality of a visualization depends not on how beautifully it shows the data but on how clear and understandable it is to the target audience. All that was needed was a proper colour scale for the regions, which could then be used to present the countries in their respective regions. Colours have been assigned to each region (there are coloured areas for three different groups of countries), but the legend showing which colour belongs to which region is missing. This again creates confusion, as the audience won't be able to identify the region of each country shown.
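As a rough illustration of the colour-scale point, the sketch below maps each region to one fixed colour and lets ggplot2 draw the matching legend. The three region names are inferred from the countries discussed here, and the colours and counts are made up for the example; none of them come from the original chart.

```r
library(ggplot2)

# Hypothetical palette: the colours used in the original chart are not known.
region_colours <- c(
  "Middle East & North Africa" = "#D55E00",
  "South Asia"                 = "#0072B2",
  "South America"              = "#009E73"
)

# Made-up counts, only so that the sketch runs on its own.
toy <- data.frame(
  Country = c("Iraq", "Pakistan", "Colombia"),
  Region  = c("Middle East & North Africa", "South Asia", "South America"),
  Attacks = c(30, 20, 10)
)

# One bar per country, filled by region; scale_fill_manual() fixes the
# region-to-colour mapping and titles the legend.
p <- ggplot(toy, aes(x = Country, y = Attacks, fill = Region)) +
  geom_col() +
  scale_fill_manual(values = region_colours, name = "Region")
```

Printing `p` draws the bars with a "Region" legend, so a reader can tell which colour stands for which region without guessing.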

Reference

Code

The following code was used to fix the issues identified in the original.

# Load all packages required

library(readr)
library(dplyr)
library(ggplot2)

# Read the dataset and store it in the terror variable using read_csv

terror <- read_csv("terror.csv")

# Rename columns by position so they are easier to refer to in the code

colnames(terror)[30] <- "Type_of_Attack"
colnames(terror)[11] <- "Region"
colnames(terror)[9] <- "Country"

# Keep only the countries with the highest rates of terror attacks

top_countries <- c("Sri Lanka", "Colombia", "Peru", "Chile", "Iraq", "Pakistan",
                   "Afghanistan", "Turkey", "Yemen", "Algeria", "Egypt", "India")

# Store the matching rows in the most_terror variable

most_terror <- filter(terror, Country %in% top_countries)

# Plot the bar graph (geom_bar counts the rows per country by default)

t <- ggplot(most_terror, aes(x = Country)) + geom_bar(aes(fill = Region), position = position_dodge(width = 0.9))

# Put the y-axis label in a horizontal position for better readability

t <- t + theme(axis.title.y = element_text(angle = 0, vjust=0.5))

# Decrease the size of the labels to avoid overlapping

t <- t + theme(axis.text = element_text(size = 8.5))

# Provide axis labels and main title

t <- t + labs(y = "Count of Attacks") + labs(x= "Attacked Country") + ggtitle("Countries and Regions with the Most Terrorist Attacks, from 1970-2017")

# Put the main title in centre of graph

t <- t + theme(plot.title = element_text(hjust = 0.5))

# Add percentage labels showing each country's share of the attacks (after_stat() needs ggplot2 3.3.0 or later)

t <- t + geom_text(aes(label = scales::percent(after_stat(prop)), group = 1), stat = "count", vjust = -0.5, position = position_dodge(0.9))

# Set background colours for the plot and the panel

t <- t + theme(plot.background = element_rect(fill = "lightyellow"),
               panel.background = element_rect(fill = "lightgrey"))
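
The percentage labels in the plot can be cross-checked without ggplot2. Because the label layer puts every row into a single group, each label is that country's row count divided by the total number of rows for the selected countries. The sketch below reproduces that arithmetic in base R on a toy data frame; the rows are invented for illustration and are not taken from terror.csv.

```r
# Toy stand-in for most_terror: one row per attack (made-up rows).
toy <- data.frame(
  Country = c("Iraq", "Iraq", "Iraq", "Pakistan", "Pakistan", "India"),
  stringsAsFactors = FALSE
)

# Number of attacks per country, then each country's share of all rows.
attack_counts <- table(toy$Country)
attack_share  <- prop.table(attack_counts)

# Format the shares as percentages with one decimal place.
share_labels <- sprintf("%.1f%%", 100 * as.numeric(attack_share))
names(share_labels) <- names(attack_share)

print(share_labels)
```

Running the same three steps on `most_terror$Country` should give values that agree with the labels drawn above the bars, up to rounding.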

Data Reference

Reconstruction

The following plot fixes the main issues in the original.