Click the Original, Code and Reconstruction tabs to see the visualisation selected, read about the issues and see how they were fixed.
I have selected the top visualisation in this infographic (Figure 1) for analysis and improvement. It shows the average voter turnout for elections across 10 countries in Southeast Asia.
Objective
I believe the intention of the original visualisation was to show how engagement with the political process varies across Southeast Asia. The image was shared by SEA Globe (a regional news agency) in 2014. As there is no specific attribution, I believe this organisation is the primary source; a Google reverse image search did not find any other copies of the graphic. SEA Globe (Southeast Asia Globe) is an independent long-form news source. The publication “covers the topics Power, Money, Life and Earth around the region. These are stories about current affairs, business, social issues and climate and the environment.” (Southeast Asia Globe, 2020). As the articles are written in English, which tends to be an additional language within Southeast Asia (rather than an official language in these countries), it is likely that the primary audience for the article is western immigrants living within the region, with the global population with an interest in regional politics as a secondary audience.
The specific visualisation has the following three main issues:
Details:
Data Integrity - The only data source referenced on the original image is the CIA World Factbook. This site, maintained by the US government, is unlikely to be an impartial source. The CIA World Factbook provides a high-level summary of each country and is written in a highly accessible (almanac-style) manner, which suggests it was created to assist novice researchers (Central Intelligence Agency, 2020). The lack of rigour in referencing, together with the partiality of the single cited source, leaves questions about the validity of the data used in the other graphics unanswered. Upon investigation, the World Factbook does not include any information about voter participation. While it is likely an accurate source for the years in which the various countries declared independence or for the number of recognised political parties (see the bottom elements of the infographic, which are outside the scope of this assignment), no source is provided for the average voter turnout. Without knowing the source, the viewer cannot determine or verify the provenance or legitimacy of the data.
To reconstruct the visualisation, I have sourced data on voter turnout from the International Institute for Democracy and Electoral Assistance. This organisation, based in Sweden, is an internationally recognised source on global democratic processes.
Deceptive Methods - Image. Using a graphic in place of a bar ignores a convention. The graphic (a finger dipped in ink, a method sometimes used to prevent the same person voting multiple times) makes it much harder for the viewer to compare the data than a standard bar would. It makes the visualisation more interesting and attracts attention, but ultimately does a disservice to the viewer, as the information presented becomes more difficult to interpret. The inclusion of numeric values for each country reduces the need to interpret the graphics, but these labels also over-emphasise the countries with higher voter turnout.
Deceptive Methods - Area/Size. Not only does this visualisation break convention by using a graphic (as noted above), it also fails to present information in a consistent manner. The scaling of the graphics to represent different values does not follow any discernible pattern. Overall, there is a 33 percent difference between Thailand and Laos (the countries with the highest and lowest voter turnout). The ratio between the voter turnout in Laos and Thailand is approximately 1:1.5, whereas the ratio between the areas of the corresponding images is closer to 1:4.5, which overstates the difference in voter turnout between the two countries. Figure 2 (below) gives a rough visual indication of how the proportions have been distorted. Using an image that is much larger (and that also has a larger font) misleads the viewer into believing the difference in voter turnout between these countries is bigger than it really is.
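The arithmetic behind this distortion can be sketched in a few lines of R. The two ratios are the approximate figures quoted above; the variable names and the simple "overstatement" measure (area ratio divided by data ratio) are my own illustration rather than a standard named metric.

```r
# Approximate figures quoted in the text above:
# turnout ratio (Laos : Thailand) is about 1 : 1.5,
# image area ratio in the original graphic is about 1 : 4.5.
data_ratio <- 1.5
area_ratio <- 4.5

# How many times the drawn area exaggerates the underlying difference
overstatement <- area_ratio / data_ratio

# Scale factor to apply to BOTH height and width so that the drawn area
# is proportional to the data
fair_linear_scale <- sqrt(data_ratio)

# Area ratio that results if both height and width are scaled by the data ratio
area_if_both_dimensions_scaled <- data_ratio^2

print(overstatement)
print(round(fair_linear_scale, 2))
print(area_if_both_dimensions_scaled)
```

Under these assumed figures, even scaling both the height and the width of the image by the data ratio would not account for the area ratio seen in the original, which is consistent with the scaling following no discernible pattern.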
Reference
@SEA_GLOBE. (2014). #politics in #SoutheastAsia is a mixed bag and no mistake [Tweet]. Twitter. Retrieved 4 September 2020, from https://twitter.com/SEA_GLOBE/status/487067035558363137
Central Intelligence Agency. (2020). The World Factbook. https://www.cia.gov/library/publications/the-world-factbook/
Southeast Asia Globe. (2020). Policies. Southeast Asia Globe. Retrieved 6 September 2020, from https://southeastasiaglobe.com/policies/
The following code was used to fix the issues identified in the original.
#libraries used
library(readr)
library(magrittr)
library(tidyr)
library(dplyr)
library(ggplot2)
library(ggflags) #found out about library from https://github.com/rensa/ggflags
library(stringr) #found library via 'cheat sheets' from applied analytics
library(rmarkdown)
#Import Election data sourced from the International Institute for Democracy and Electoral Assistance (2020).
sea_vote <- read_csv(
  "sea_vote.csv",
  col_types = cols(
    Year = col_integer(),
    Compulsory_voting = col_factor(levels = c("Yes", "No")),
    Country = col_factor(levels = c(
      "Cambodia",
      "Indonesia",
      "Lao People's Dem. Republic",
      "Malaysia",
      "Myanmar",
      "Philippines",
      "Singapore",
      "Thailand",
      "Timor-Leste",
      "Viet Nam",
      "Indonesia2"
    )),
    #levels passed as a named argument (the original extra parentheses
    #also created a stray variable called levels)
    Election_type = col_factor(levels = c("Parliamentary", "Presidential"))
  )
)
#import country code data (International Organization for Standardization [ISO], 2020)
country_codes <- read_csv("country codes.csv")
# #Confirmation of data quality -----
#
# #check to make sure data has imported correctly
# head(sea_vote)
# head (country_codes)
# #confirm table structures
# str(sea_vote)
# str(country_codes)
#
#
# #check for missing data
# sum(is.na(country_codes))
# sum(is.na(sea_vote))
# #there are 72 missing data points in the sea_vote dataset. Where are they?
#
# colSums(is.na(sea_vote))
# #missing data is mostly in invalid votes.
# #The amount of missing data is slightly concerning, but may not be relevant.
# #Process data to identify the most recent election, and then see how much data is missing
#find the most recent election year per country and election type
Recent <- sea_vote %>%
  group_by(Country, Election_type) %>%
  summarise(Year = max(Year), .groups = "drop")
#select only the parliamentary elections
Recent <- Recent %>% filter(Election_type == "Parliamentary")
#use mutating join to subset data and get most recent data for all Parliamentary elections (1 per country)
toplot <- sea_vote %>%
  inner_join(Recent, by = c("Country", "Election_type", "Year"))
#add the ISO country codes (joins on the columns the two tables have in common)
toplot <- toplot %>% inner_join(country_codes)
# #find out how much data is missing
# sum(is.na(toplot))
# #three missing values seems ok, let's see where they are
# colSums(is.na(toplot))
# #all the missing data is invalid votes. This column is not critical, thus missing data should not
# #prevent a valid analysis.
#Sort country order, based on Voter Turnout
toplot$Country <- toplot$Country %>% factor(levels = toplot$Country[order(toplot$Voter_Turnout)])
#change the country codes to lower case so they will work with ggflags
toplot$code <- str_to_lower(toplot$`country code`)
#declare the colour palette
colour_scheme3 <- c("#c6f2f7", "#E7D4E8")
#Construct graph
theme_set(theme_classic())
p1 <-
ggplot(toplot,
aes(Country, Voter_Turnout, country = code, fill = Compulsory_voting)) +
geom_col() +
labs(
title = "High voter turnout in Southeast Asia, even with optional voting",
subtitle = "Voter turnout per country for the most recent parliamentary elections
(election year indicated on country bar)",
x = NULL,
y = "Voter turnout (as a percent of registered voters)
",
caption = "Source: International Institute for Democracy and Electoral Assistance (2020)"
) +
geom_flag() +
coord_flip() +
scale_fill_manual(values = colour_scheme3) +
geom_text (aes(x = Country, y = 6, label = Year), hjust = 0) +
geom_text(aes(
x = Country,
y = 52,
label = round(Voter_Turnout, 1),
hjust = 0
)) +
annotate(geom = "text", x= "Malaysia", y= 110, label ="Voting is
compulsory in
Singapore and
Thailand", hjust="right", size =4, colour = "grey50")+
theme(
legend.position = "none",
axis.ticks.x = element_blank(),
axis.text.x = element_blank(),
axis.line = element_line(colour = "grey50"),
axis.text.y = element_text(colour = "grey50"),
title = element_text(colour = "grey50")
)
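The "most recent parliamentary election per country" step in the code above can also be sketched on a small example. The data frame below is hypothetical toy data (not the IDEA dataset), and `slice_max()` is an alternative to the summarise-then-join approach used above, assuming dplyr 1.0.0 or later is installed.

```r
library(dplyr)

# Hypothetical toy data for illustration only; not the IDEA dataset
toy_vote <- data.frame(
  Country = c("A", "A", "A", "B", "B"),
  Election_type = c("Parliamentary", "Parliamentary", "Presidential",
                    "Parliamentary", "Parliamentary"),
  Year = c(2013L, 2018L, 2019L, 2011L, 2016L),
  Voter_Turnout = c(70.1, 82.3, 75.0, 60.5, 64.2)
)

# Keep parliamentary elections only, then the latest row per country
latest <- toy_vote %>%
  filter(Election_type == "Parliamentary") %>%
  group_by(Country) %>%
  slice_max(Year, n = 1, with_ties = FALSE) %>%
  ungroup()

print(latest)
```

Filtering before taking the maximum keeps a more recent presidential election (country A in the toy data) from affecting which parliamentary row is kept, and the result has exactly one row per country, which is what the bar chart expects.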
Data Reference
Voting data:
International Institute for Democracy and Electoral Assistance. (2020). Voter Turnout Database | International IDEA. Retrieved 4 September 2020, from https://www.idea.int/data-tools/data/voter-turnout
Country codes:
International Organization for Standardization [ISO]. (2020). The International Standard for country codes and codes for their subdivisions (ISO 3166). International Organization for Standardization. https://www.iso.org/obp/ui/#iso:std:iso:3166:-1:ed-4:v1:en
The following plot fixes the main issues in the original.