Click the Original, Code and Reconstruction tabs to see the visualisation selected, read about the issues and see how they were fixed.
The specific visualisation I have selected to analyse and improve is the top visualisation from this infographic: the average voter turnout in the 10 countries that hold parliamentary elections.
Objective
I believe the intention of this original visualisation was to communicate political engagement across South East Asia. It appears SEA Global (a regional news agency) produced this graphic in 2014, for the general adult population of the local region (South East Asia).
This graphic was also shared by the organisation on Twitter, extending the intended audience to include the politically interested global population and the migrant population from this region.
The specific visualisation I have selected had the following three main issues:
More specifically
Data Integrity - The only data source referenced on the original image is the CIA World Factbook. This site does not include any information about voter participation. It is likely an accurate data source for when the various countries declared independence (the bottom element of the infographic, which is not being addressed as part of this assignment). No source is provided for where SEA Global derived their data for the average voter turnout. Without knowing the source, the viewer cannot determine or verify the provenance or legitimacy of the data.
Deceptive Methods - Image. Using a graphic in place of a bar in this visualisation ignores a convention. Using a graphic (a finger dipped in ink, which is sometimes used to prevent the same person voting multiple times) in place of a standard bar makes drawing comparisons much harder. Although the graphic makes the visualisation more interesting and attracts attention, it does a disservice to the viewer, as it makes the information being presented more difficult to interpret. The inclusion of numeric values for each country reduces the need to interpret the graphics, but these labels are also scaled to over-emphasise the countries with the higher voter turnout.
Deceptive Methods - Area/Size. Not only does this visualisation break convention by using a graphic (as noted above), it also fails to present information in a consistent manner. The scaling of the graphics to represent different values does not follow any discernible pattern. Overall there is a difference of 33 percentage points between Thailand and Laos (the countries with the lowest and highest voter turnout, respectively). The ratio between the voter turnout in Thailand and Laos is approximately 1:1.5. The ratio between the areas of the two images in the visualisation is closer to 1:4.5, which overstates the difference in voter turnout between the two countries. Figure 2 below gives a rough visual indication of how the proportions have been distorted. Because one image is much larger (and also has a larger font), the viewer is misled into believing there is a bigger difference in voter turnout between these countries than there really is.
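The size of this distortion can be put into numbers. The following is a minimal sketch in base R that uses only the two approximate ratios quoted above (1:1.5 for the data, 1:4.5 for the drawn area); the variable names are my own and the figures are the rough estimates from the text, not measurements.

```r
# Approximate ratios quoted in the text above (estimates, not measurements)
value_ratio <- 1.5   # turnout of the higher country relative to the lower one
area_ratio  <- 4.5   # drawn image area of the larger graphic relative to the smaller

# How many times larger the drawn difference is than the difference in the data
overstatement <- area_ratio / value_ratio

# Linear (height/width) scale factors implied by each area ratio
honest_linear_scale <- sqrt(value_ratio)  # scale needed if area matched the data
actual_linear_scale <- sqrt(area_ratio)   # scale the graphic appears to use

round(c(overstatement = overstatement,
        honest_linear_scale = honest_linear_scale,
        actual_linear_scale = actual_linear_scale), 2)
```

Under these assumptions the larger image covers about three times the area that a proportional encoding would give it. A bar chart avoids the problem because bar length alone carries the value.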
Reference
@SEA_GLOBE. (2014). #politics in #SoutheastAsia is a mixed bag and no mistake [Tweet]. Twitter. Retrieved 4 September 2020, from https://twitter.com/SEA_GLOBE/status/487067035558363137
The following code was used to fix the issues identified in the original.
#libraries used
library(readr)
library(magrittr)
library(tidyr)
library(dplyr)
library(ggplot2)
library(ggflags) #found out about library from https://github.com/rensa/ggflags
library(stringr) #found library via 'cheat sheets' from applied analytics
#Import Election data (International Institute for Democracy and Electoral Assistance, 2020)
sea_vote <- read_csv("sea_vote.csv",
    col_types = cols(
        Year = col_integer(),
        Compulsory_voting = col_factor(levels = c("Yes", "No")),
        Country = col_factor(levels = c("Cambodia",
                                        "Indonesia",
                                        "Lao People's Dem. Republic",
                                        "Malaysia",
                                        "Myanmar",
                                        "Philippines",
                                        "Singapore",
                                        "Thailand",
                                        "Timor-Leste",
                                        "Viet Nam",
                                        "Indonesia2")),
        Election_type = col_factor(levels = c("Parliamentary", "Presidential"))))
#import country code data (International Organization for Standardization [ISO], 2020)
country_codes <- read_csv("country codes.csv")
#check to make sure data has imported correctly
head(sea_vote)
## # A tibble: 6 x 11
## Country Election_type Year Voter_Turnout Total_vote Registration VAP_Turnout
## <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Cambod~ Parliamentary 2018 83.0 6956900 8380217 66.3
## 2 Cambod~ Parliamentary 2013 68.5 6627159 9675453 69.8
## 3 Cambod~ Parliamentary 2008 75.2 6111210 8125529 75.9
## 4 Cambod~ Parliamentary 2003 83.2 5277494 6341834 77.9
## 5 Cambod~ Parliamentary 1998 93.7 5057679 5395595 92.2
## 6 Cambod~ Parliamentary 1993 86.8 4134631 4764430 88.8
## # ... with 4 more variables: Voting_age_population <dbl>, Population <dbl>,
## # Invalid_votes <dbl>, Compulsory_voting <fct>
head(country_codes)
## # A tibble: 6 x 2
## Country `country code`
## <chr> <chr>
## 1 Cambodia KH
## 2 Indonesia ID
## 3 Lao People's Dem. Republic LA
## 4 Malaysia MY
## 5 Myanmar MM
## 6 Philippines PH
#confirm table structures
str(sea_vote)
## tibble [112 x 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Country : Factor w/ 11 levels "Cambodia","Indonesia",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ Election_type : Factor w/ 2 levels "Parliamentary",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Year : int [1:112] 2018 2013 2008 2003 1998 1993 2019 2014 2009 2004 ...
## $ Voter_Turnout : num [1:112] 83 68.5 75.2 83.2 93.7 ...
## $ Total_vote : num [1:112] 6956900 6627159 6111210 5277494 5057679 ...
## $ Registration : num [1:112] 8380217 9675453 8125529 6341834 5395595 ...
## $ VAP_Turnout : num [1:112] 66.3 69.8 75.9 77.9 92.2 ...
## $ Voting_age_population: num [1:112] 10499588 9500301 8051099 6772724 5488029 ...
## $ Population : num [1:112] 16449519 15205539 14241640 13607069 10192000 ...
## $ Invalid_votes : num [1:112] 8.55 NA 1.7 2.1 3.1 ...
## $ Compulsory_voting : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
## - attr(*, "spec")=
## .. cols(
## .. Country = col_factor(levels = c("Cambodia", "Indonesia", "Lao People's Dem. Republic", "Malaysia",
## .. "Myanmar", "Philippines", "Singapore", "Thailand", "Timor-Leste",
## .. "Viet Nam", "Indonesia2"), ordered = FALSE, include_na = FALSE),
## .. Election_type = col_factor(levels = c("Parliamentary", "Presidential"), ordered = FALSE, include_na = FALSE),
## .. Year = col_integer(),
## .. Voter_Turnout = col_double(),
## .. Total_vote = col_double(),
## .. Registration = col_double(),
## .. VAP_Turnout = col_double(),
## .. Voting_age_population = col_double(),
## .. Population = col_double(),
## .. Invalid_votes = col_double(),
## .. Compulsory_voting = col_factor(levels = c("Yes", "No"), ordered = FALSE, include_na = FALSE)
## .. )
str(country_codes)
## tibble [10 x 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Country : chr [1:10] "Cambodia" "Indonesia" "Lao People's Dem. Republic" "Malaysia" ...
## $ country code: chr [1:10] "KH" "ID" "LA" "MY" ...
## - attr(*, "spec")=
## .. cols(
## .. Country = col_character(),
## .. `country code` = col_character()
## .. )
#check for missing data
sum(is.na(sea_vote))
## [1] 72
sum(is.na(country_codes))
## [1] 0
#there are 72 missing data points, where are they?
colSums(is.na(sea_vote))
## Country Election_type Year
## 0 0 0
## Voter_Turnout Total_vote Registration
## 4 2 4
## VAP_Turnout Voting_age_population Population
## 5 4 2
## Invalid_votes Compulsory_voting
## 49 2
#missing data is mostly in invalid votes.
#The amount of missing data is slightly concerning, but may not be relevant.
#Process data to identify the most recent election, and then see how much data is missing
#restrict data to the most recent election per country and election type
Recent <- sea_vote %>% group_by(Country, Election_type) %>% summarise(Year = max(Year))
#select only the parliamentary elections
Recent <- Recent %>% filter(Election_type == "Parliamentary")
#use a mutating join to subset the data to the most recent parliamentary election (1 per country)
toplot <- sea_vote %>% inner_join(Recent, by = c("Country", "Election_type", "Year"))
#add the ISO country codes
toplot <- toplot %>% inner_join(country_codes, by = "Country")
#find out how much data is missing
sum(is.na(toplot))
## [1] 3
#three missing values seems ok, let's see where they are
colSums(is.na(toplot))
## Country Election_type Year
## 0 0 0
## Voter_Turnout Total_vote Registration
## 0 0 0
## VAP_Turnout Voting_age_population Population
## 0 0 0
## Invalid_votes Compulsory_voting country code
## 3 0 0
#all the missing data is invalid votes. This column is not critical, thus missing data should not
#prevent a valid analysis.
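The "most recent parliamentary election per country" step above can be checked on a small table before trusting it on the full data. The following is a minimal sketch using base R only (`aggregate()` and `merge()` stand in for the dplyr calls); the rows are made up for illustration and are not taken from the IDEA database.

```r
# Toy data: made-up rows, not taken from the IDEA database
votes <- data.frame(
  Country = c("A", "A", "A", "B", "B"),
  Election_type = c("Parliamentary", "Parliamentary", "Presidential",
                    "Parliamentary", "Parliamentary"),
  Year = c(2013L, 2018L, 2019L, 2011L, 2016L),
  Voter_Turnout = c(68.5, 83.0, 70.0, 91.2, 88.4),
  stringsAsFactors = FALSE
)

# Keep parliamentary elections only
parl <- votes[votes$Election_type == "Parliamentary", ]

# Most recent parliamentary election year per country
latest <- aggregate(Year ~ Country, data = parl, FUN = max)

# Join back on country and year to recover the full row for that election
latest_rows <- merge(parl, latest, by = c("Country", "Year"))

# Each country should now appear exactly once
stopifnot(!any(duplicated(latest_rows$Country)))
latest_rows
```

The later presidential election for country A is ignored because the filter runs before the maximum year is taken. The same one-row-per-country check could be applied to `toplot`.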
#Sorting country order, based on Voter Turnout
toplot$Country <- toplot$Country %>% factor(levels = toplot$Country[order(toplot$Voter_Turnout)])
#change the country codes to lower case so they will work with ggflags
toplot$code <- str_to_lower(toplot$`country code`)
#declare the colour palette
colour_scheme3 <- c("#c6f2f7", "#E7D4E8")
#Construct graph
p1 <- ggplot(toplot, aes(Country, Voter_Turnout, country = code, fill = Compulsory_voting)) +
  geom_col() +
  labs(title = "Voter turnout is high in South East Asia, even when optional",
       subtitle = "Voter turnout per country for the most recent parliamentary elections",
       x = NULL,
       y = "Voter turnout (as a percent of registered voters)\n",
       caption = "Source: International Institute for Democracy and Electoral Assistance (2020)") +
  geom_flag() +
  coord_flip() +
  scale_fill_manual(values = colour_scheme3) +
  #year of the election, printed near the base of each bar
  geom_text(aes(x = Country, y = 6, label = Year), hjust = 0) +
  #turnout value, printed part-way along each bar
  geom_text(aes(x = Country, y = 52, label = round(Voter_Turnout, 1)), hjust = 0) +
  #add theme_classic() directly; theme_set() changes the global default and returns the previous theme
  theme_classic() +
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        axis.line = element_line(colour = "grey50"),
        axis.text.y = element_text(colour = "grey50"),
        title = element_text(colour = "grey50"))
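The two preparation steps used before the plot (ordering the countries by turnout and lower-casing the country codes) can be tried on a toy table. This is a minimal sketch in base R; the country names, values and codes are placeholders I made up, and `tolower()` is used here in place of `str_to_lower()`.

```r
# Toy table: placeholder names, values and codes (not real data)
toy <- data.frame(
  Country = c("A", "B", "C"),
  Voter_Turnout = c(80, 60, 95),
  iso = c("AA", "BB", "CC"),
  stringsAsFactors = FALSE
)

# Set the factor levels in order of increasing turnout, so that bars are
# drawn sorted by value rather than alphabetically
toy$Country <- factor(toy$Country,
                      levels = toy$Country[order(toy$Voter_Turnout)])

# Lower-case the two-letter codes, as done for the real codes above
toy$code <- tolower(toy$iso)

levels(toy$Country)
toy$code
```

With `coord_flip()`, the first factor level is drawn at the bottom, so the country with the highest turnout ends up at the top of the chart.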
Data Reference
Voting data:
International Institute for Democracy and Electoral Assistance. (2020). Voter Turnout Database | International IDEA. Retrieved 4 September 2020, from https://www.idea.int/data-tools/data/voter-turnout
Country codes:
International Organization for Standardization [ISO]. (2020). The International Standard for country codes and codes for their subdivisions (ISO 3166). International Organization for Standardization. https://www.iso.org/obp/ui/#iso:std:iso:3166:-1:ed-4:v1:en
The following plot fixes the main issues in the original.