Objective
The original data visualisation shows where the world's refugees were residing, by country, in 2019. Its objective is to show that half of the total refugee population resides in only six countries, so the visualisation focuses on the countries where refugees are most concentrated.
The visualisation is useful for social scientists, specialists in international affairs, researchers, charity organisations such as the Red Cross, global organisations, foreign high commissions and embassies, the relevant authorities of any country, and anyone who wants to understand the refugee demographics of the world.
Issues
The visualisation chosen had the following three main issues:
Issue 1:
The visualisation has data integrity problems: the World Bank visualisation does not match its referenced data source. The World Bank sourced the data from UNHCR and then adapted it in its data catalog (The World Bank | Data Catalog). The discrepancies are:
The visualisation shows Jordan hosting the second-highest number of refugees, but UNHCR ranks Jordan tenth. According to UNHCR, 692,700 refugees were residing in Jordan in 2019 (UNHCR Global Trends 2019, p. 22), whereas the World Bank figure for Jordan in 2019 is 2,967,046 (DataBank | World Development Indicators), which is more than four times the UNHCR figure.
The heading of the visualisation is not correct. It says that six countries host half of the refugee population, whereas UNHCR data show that seven countries do. A further cause of the discrepancy is that the World Bank treated ‘Gaza and West Bank’ as a separate country, although Palestine is still not a member state of the United Nations (United Nations, Member States). Hence UNHCR, the main source of the data, did not include Gaza and West Bank in its data set as a country.
So the visualisation fails to show three countries that should be among those hosting the top 50% of refugees. Because more than 40% of these countries are missing from the visualisation when compared with the original data source, UNHCR data were used instead of World Bank data to build a new visualisation showing correct data.
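The discrepancies above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original analysis: the Jordan figures are the two quoted above, and the host-country shares are the rounded percentages printed in this report's code output, so the cumulative total is approximate. All variable names are illustrative.

```r
# Jordan, 2019: the two figures quoted in the text.
unhcr_jordan      <- 692700    # UNHCR Global Trends 2019, p. 22
world_bank_jordan <- 2967046   # World Bank, World Development Indicators

# How many times larger the World Bank figure is than the UNHCR figure.
jordan_ratio <- world_bank_jordan / unhcr_jordan
round(jordan_ratio, 2)

# Rounded UNHCR host-country shares (% of world refugees), largest first,
# copied from the "refugee" table printed in this report.
host_share <- c(Turkey = 17.5, Pakistan = 6.94, Uganda = 6.65, Germany = 5.61,
                Sudan = 5.16, Iran = 4.79, Lebanon = 4.48)

# Number of countries needed before the running total reaches 50%.
cumulative_share <- cumsum(host_share)
countries_needed <- which(cumulative_share >= 50)[1]
cumulative_share
unname(countries_needed)
```

With these rounded inputs the running total only passes 50% at the seventh country, which is consistent with the corrected heading.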
Issue 2:
The visualisation is a pie chart, which uses area and angle to represent proportions. The values for Pakistan, Lebanon and Uganda are very close to each other, so without reading the numbers it is very difficult to see whether the percentages for those countries are equal or not. Angle and area, as used in a pie chart, are judged less accurately than position. For that reason, a filled (stacked) bar chart is a better option for this data. Moreover, a percentage for ‘Others’ is unnecessary, because the audience can easily understand that the remainder is made up of all other countries.
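As a rough illustration of why angle is hard to read, the sketch below converts two nearby percentage shares into pie-slice angles. The shares are the rounded UNHCR host-country percentages for Pakistan and Uganda printed in this report's code output, not the World Bank values behind the original pie chart, and the variable names are illustrative.

```r
# Two nearby shares (% of world refugees), rounded, from this report's output.
share <- c(Pakistan = 6.94, Uganda = 6.65)

# A pie chart maps 100% to 360 degrees, so 1 percentage point = 3.6 degrees.
slice_angle <- share * 360 / 100

# Difference the reader would have to judge by eye, in degrees.
angle_gap <- unname(slice_angle["Pakistan"] - slice_angle["Uganda"])
slice_angle
angle_gap
```

A gap of about one degree between two slices is very hard to judge by eye, whereas the same 0.29 percentage-point difference can be read directly against a common axis in a bar chart.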
Issue 3:
The visualisation cannot be readily interpreted by people with colour blindness. Around 8% of men worldwide (Baglin, 2020, p. 97) have some form of colour blindness. Because this data is presented by a highly responsible global authority, the utmost care in presentation is expected, including colour-blind-friendly design. In this particular visualisation, a person with tritanomaly (reduced blue sensitivity) can hardly tell ‘Others’ from ‘Pakistan’, because the altered perception of blue is very similar to the colour used for ‘Others’. Similarly, people with deuteranopia will perceive the green exactly as grey (Wickline, as cited in Baglin, 2020). A colour-blind-friendly colour scheme will solve this issue.
From the discussion above, it is clear that the visualisation suffers from a lack of data integrity, a chart type that does not represent the data accurately, and colour problems, so there is scope for improvement on each of these. Moreover, the refugee population by country of origin is not shown, which is another important consideration for the target audience of this visualisation. Country of origin can therefore also be included in the reconstructed visualisation.
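One way to check a candidate colour scheme before using it is to look it up in the `RColorBrewer` package, whose `brewer.pal.info` table flags which ColorBrewer palettes are considered colour-blind friendly. This is a sketch under assumptions: the report picks its hex codes by hand from the ColorBrewer web tool rather than through this package, and treating the chosen scheme as ColorBrewer's `RdYlBu` is an inference from those hex codes, not something the report states.

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette and a logical "colorblind" column.
colourblind_safe <- rownames(brewer.pal.info[brewer.pal.info$colorblind, ])

# Assumption: the reconstruction's palette corresponds to the "RdYlBu" scheme.
candidate <- "RdYlBu"
is_safe <- candidate %in% colourblind_safe
is_safe

# Optional visual check of all flagged palettes:
# display.brewer.all(colorblindFriendly = TRUE)
```

If the lookup returns `FALSE` for a scheme, a different palette from `colourblind_safe` can be substituted before plotting.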
References
Baglin, J. (2020). Data visualisation: From theory to practice.
ColorBrewer 2.0. https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3
The World Bank. Data catalog. Half of the world’s refugee population resides in only six countries. Retrieved September 16, 2020, from ‘The World Bank’ website: https://datacatalog.worldbank.org/search/visualizations?search_api_views_fulltext_op=AND&query=Half%20Of%20The%20World%27s%20Refugee%20Population%20Resides%20In%20Only%20Six%20Countries%20&f%5B2%5D=type%3Aresource&sort_by=search_api_relevance
The World Bank. DataBank | World Development Indicators. https://databank.worldbank.org/reports.aspx?source=2&series=SM.POP.REFG
UNHCR. Global trends: Forced displacement in 2019. https://www.unhcr.org/en-au/statistics/unhcrstats/5ee200e37/unhcr-global-trends-2019.html
UNHCR. Refugee data finder. https://www.unhcr.org/refugee-statistics/download/?url=N6nC
United Nations. Member states. https://www.un.org/en/member-states/#gotoP
The following code was used to fix the issues identified in the original.
# Required packages are loaded. UNHCR data for source and host country is downloaded to working directory.
setwd("C:/Users/User/OneDrive/Desktop/assign_2/UNHCR_2")
library(readr)
library(dplyr)
library(tidyr)
library(knitr)
library(magrittr)
library(ggplot2)
# Source data files named "population_assylum" and "population_origin" are imported from the working directory using the readr package.
# Relevant variables are selected and renamed, creating two objects that hold the selected variables.
population_assylum<-read_csv("population_assylum.csv", skip=14)
assylum<-population_assylum[,c(4,6)]
colnames(assylum)<-c( "country","refugee_assylum")
head(assylum)
## # A tibble: 6 x 2
##   country     refugee_assylum
##   <chr>                 <dbl>
## 1 Afghanistan           72227
## 2 Albania                 120
## 3 Algeria               98599
## 4 Angola                25793
## 5 Egypt                258391
## 6 Argentina              3857
Population_origin <-read_csv("population_origin.csv", skip=14)
origin<-Population_origin[,c(2,6)]
colnames(origin)<-c("country","refugee_origin")
head(origin)
## # A tibble: 6 x 2
##   country             refugee_origin
##   <chr>                        <dbl>
## 1 Afghanistan                2728853
## 2 Albania                      15027
## 3 Algeria                       4514
## 4 Angola                        8178
## 5 Antigua and Barbuda            117
## 6 Egypt                        27506
# Create two tables holding each country's percentage share of the world's refugees,
# one for host (asylum) countries and one for countries of origin.
# Keep only the top seven host countries and
# the top two countries of origin.
assylum_percentage <- assylum %>%
  mutate(current_refugee_residence = refugee_assylum * 100 / sum(refugee_assylum, na.rm = TRUE)) %>%
  arrange(desc(current_refugee_residence)) %>%
  head(n = 7) %>%
  select(-refugee_assylum)
origin_percentage <- origin %>%
  mutate(origin_of_refugee = (refugee_origin * 100) / sum(refugee_origin, na.rm = TRUE)) %>%
  arrange(desc(origin_of_refugee)) %>%
  head(n = 2) %>%
  select(-refugee_origin)
# Build a tidy table named "refugee" for the bar plot by joining the two percentage tables.
# Keep complete cases only, then convert the country variable to an ordered factor.
refugee <- assylum_percentage %>%
  full_join(origin_percentage, by = "country") %>%
  gather(refugee_location, percentage, 2:3) %>%
  na.omit()
refugee$country <-
  factor(refugee$country, levels =
           c("Lebanon", "Iran (Islamic Rep. of)", "Sudan",
             "Germany", "Uganda", "Pakistan", "Turkey",
             "Afghanistan", "Syrian Arab Rep."), ordered = TRUE)
refugee
## # A tibble: 9 x 3
##   country                refugee_location          percentage
##   <ord>                  <chr>                          <dbl>
## 1 Turkey                 current_refugee_residence      17.5
## 2 Pakistan               current_refugee_residence       6.94
## 3 Uganda                 current_refugee_residence       6.65
## 4 Germany                current_refugee_residence       5.61
## 5 Sudan                  current_refugee_residence       5.16
## 6 Iran (Islamic Rep. of) current_refugee_residence       4.79
## 7 Lebanon                current_refugee_residence       4.48
## 8 Syrian Arab Rep.       origin_of_refugee              32.4
## 9 Afghanistan            origin_of_refugee              13.3
# Choosing a colour-blind-friendly colour scheme manually from the ColorBrewer web tool.
palette <- c('#d73027','#f46d43','#fdae61','#fee090','#ffffbf','#e0f3f8',
'#abd9e9','#74add1','#4575b4','#313695')
# Generating the bar plot with the ggplot2 package, building it up layer by layer.
p1 <- ggplot(data = refugee, aes(x = refugee_location, y = percentage, fill = country))
p <- p1 +
  # One stacked bar per refugee_location; each segment is a country's share.
  geom_bar(stat = 'identity', colour = "blue") +
  labs(title =
"Half of the world's refugees reside in only 7 countries;
45% of all refugees are displaced from just 2 countries.",
       y = "% of total refugees in the world",
       x = "Refugee percentages by host country and country of origin") +
  # Label each segment with its rounded percentage.
  geom_text(aes(label = round(percentage)), size = 3.0, hjust = 0.5, vjust = 2,
            position = "stack", colour = "black") +
  scale_fill_manual(values = palette)
Data Reference
UNHCR. Refugee Data Finder. Retrieved September 16, 2020, from ‘UNHCR’ website: https://www.unhcr.org/refugee-statistics/download/?url=N6nC
UNHCR. Refugee Data Finder. Retrieved September 16, 2020, from ‘UNHCR’ website: https://www.unhcr.org/refugee-statistics/download/?url=92Nv
The following plot fixes the main issues in the original. Seven countries, not six, host 50% of the world's refugee population. Germany, Sudan and Iran are included in the visualisation in place of Jordan and West Bank and Gaza, so data integrity is ensured. The filled bar plot solves the problems of the pie chart. A colour-blind-friendly scheme chosen from the ColorBrewer 2.0 website solves the colour issues. Information on country of origin is included, showing that only two countries account for 45% of the world's refugee population.