Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


reddit.com/r/dataisbeautiful - Successful space missions since 2000. by u/stagOverflow


Objective

The objective of the original visualization is to show the audience of reddit.com/r/dataisbeautiful where in the world successful space flight launches have occurred since the year 2000. This subreddit is a general data visualization forum where users post visualizations and infographics. Since anyone can post, there is a heavy emphasis on interesting and aesthetically pleasing visualizations, but the overall quality is mixed and many users are untrained in spotting issues. The audience in this context is therefore quite general: a mix of data visualization practitioners, enthusiasts who care mostly about aesthetics, and general reddit users who may find the content interesting because of its aesthetics or the implications of its message. The author has chosen a proportional symbols map to communicate ‘Space Flights Launches since 2000’.

The visualization chosen had the following three main issues:

  • The author's choice of a proportional symbols map allows the data to be visualized succinctly, but it fails to communicate precisely where a launch has occurred and makes it hard to compare how many launches have occurred at each location. Since the objective is to communicate where launches have occurred, this can only be communicated in an extremely general sense due to the size of the circles and the lack of any labeling or legends. The data points give the viewer no indication of where a launch took place and rely purely on the viewer to deduce the location which, along with the following mistakes, diminishes how useful this visualization is.

    • There is a lot of overlap between circles, which makes it hard to spot where launches have occurred, especially when a nearby location has a similar number of launches. For example, there appear to be two locations in Eastern Russia that just overlap (Figure 1), and it is very hard to spot that there are two data points here. This raises the question: why are these even separated? Are they truly different launch locations?
      Figure 1. Hard to spot datapoints
    • The scale appears to be incorrect. Also visible in Figure 1 is a circle that is smaller than the legend scale of 1; another example can be seen in the Pacific Ocean west of the American continent (Figure 2). As the visualization shows successful launches, and whether a launch happens or not is binary, there cannot be partial launches or fewer than 1 launch at a plotted location, so either the scale or the data is incorrect. Furthermore, it is extremely hard to spot similar locations with such small data points.
      Figure 2. A point in the Pacific Ocean smaller than 1?
  • There are a number of apparent data errors in the original visualization, which make it misleading or incorrect and so again fail to communicate the location of successful space flight launches. In addition to the errors of scale detailed in the above point, data appears to be missing. For example, Rocket Lab has carried out a number of successful space launches from New Zealand which are not shown in the original visualization, and one successful launch into space from Alcântara, Brazil is also missing. This is disrespectful to Rocket Lab and the Brazilian Space Agency and is misleading to the general audience of reddit.

    • It also appears that there might have been an error in classifying locations, as there are circles which appear to be exactly in the middle of a bigger circle. This can be seen in launches centered around Florida, USA (Figure 3) and the south of Japan.
      Figure 3. Are these the same location?
    • If it is the case that these are indeed different locations, this only supports the above point about overlapping circles as a weakness of proportional circle maps.
  • The visualization lacks a clear objective. The title of the visualization is ‘Space Flights Launches since 2000’, however what appears to actually be shown is the location of successful space flight launches since 2000. There is a disconnect between what is shown in the visualization and what is proposed by the title of the author's post. Also, due to the inaccuracy of the scales and the circles that appear to overlap perfectly, it is unclear what is defined as a location. Surely if two launches are from the same facility with different launchpads, or from the same geographical region, these could be classed as the same place. This results in failing the trifecta checkup, as it is unclear what the data is trying to say and what question is being answered.
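The point about the scale can be checked with a little arithmetic. The sketch below assumes area-proportional scaling, a common convention for proportional symbol maps; how the original author actually sized the circles is not known, and `symbol_radius` and `radius_for_one` are hypothetical names introduced here for illustration.

```r
# Hypothetical area-proportional scaling: the circle area grows linearly
# with the launch count, so the radius grows with the square root of it.
symbol_radius <- function(count, radius_for_one = 1) {
  radius_for_one * sqrt(count)
}

# Illustrative whole-number launch counts (not taken from the dataset)
counts <- c(1, 4, 25, 100)
radii <- symbol_radius(counts)

# A launch count is a whole number of at least 1, so no circle should be
# smaller than the legend circle for a single launch.
smallest_possible <- symbol_radius(1)
print(all(radii >= smallest_possible))
```

Under this assumption, any circle drawn smaller than the legend circle for 1 would imply a count below 1, which points to a problem with either the scale or the data.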

Reference

Code

The following code was used to fix the issues identified in the original.

library(tidyr)
library(dplyr)
library(readr)
library(knitr)
library(ggplot2)
library(forcats)
space <- read_csv("Space_Corrected.csv")

#Separate date data to obtain years
space <- space %>% separate(Datum, into = c("Date", "Dates"), sep = ",", remove = FALSE)
space <- space %>% separate(Dates, into = c("dummy", "Year", "Time", "Zone"), sep = " ", remove = TRUE)

#Separate location into different columns
space <- space %>% separate(Location, into = c("PadB", "Facility", "CountryB", "Country2"), sep = ",", remove = FALSE)

space$Year <- as.numeric(space$Year)

#Filter for launches from the year 2000 onwards
spaceFilt <- space %>% filter(Year >= 2000)

#Refactor Facilities and rename to more succinct locations
#(Xichang, Taiyuan and Jiuquan use the same short names assigned further below so each site is counted once)
spaceFilt <- spaceFilt %>% mutate (Facility2  = case_when(
  PadB == "Xichang Satellite Launch Center" ~ "Xichang",
  PadB == "Svobodny Cosmodrome" ~ " Svobodny",
  PadB == "Taiyuan Satellite Launch Center" ~ "Taiyuan",
  PadB == "Tai Rui Barge" ~ "Yellow Sea",
  PadB == "Uchinoura Space Center" ~ "Uchinoura",
  PadB == "Jiuquan Satellite Launch Center" ~ "Jiuquan",
  CountryB == " New Zealand" ~ " Mahia, New Zealand",
  Country2 == " Brazil" ~ " Alcântara, Brazil",
  CountryB == " French Guiana" ~ " Guiana, French Guiana",
  CountryB == " Israel" ~ " Palmachim, Israel",
  Facility == " Shahrud Missile Test Site" ~ "Shahrud, Iran",
  Facility == " Semnan Space Center" ~ " Semnan, Iran"
  ))

#Refactor Countries and rename to more succinct regions

spaceFilt <- spaceFilt %>% mutate (Country2  = case_when(
  Location == " Xichang Satellite Launch Center" ~ " China",
  Location == " Taiyuan Satellite Launch Center" ~ " China",
  Location == " Svobodny Cosmodrome" ~ " Russia",
  
  Facility == " China" ~ " China",
  Facility == " Japan" ~ " Japan",
  Facility == " Russia" ~ " Eastern Europe",
  Facility == " Shahrud Missile Test Site" ~ " Middle East",
  Facility == " Semnan Space Center" ~ " Middle East",
  Facility == " Yellow Sea" ~ " Offshore",
  Facility == " Ronald Reagan Ballistic Missile Defense Test Site" ~ " Pacific",
  
  Country2 == " USA" ~ " N.America",
  Country2 == " Brazil" ~ " S.America",
  
  CountryB == " China" ~ " China",
  CountryB == " Japan" ~ " Japan",
  CountryB == " Algeria" ~ " Algeria",
  CountryB == " Kazakhstan" ~ " Eastern Europe",
  CountryB == " New Zealand" ~ " Pacific",
  CountryB == " Russia" ~ " Eastern Europe",
  CountryB == " French Guiana" ~ " S.America",
  CountryB == " Iran" ~ " Middle East",
  CountryB == " India" ~ " India",
  CountryB == " Israel" ~ " Middle East",
  CountryB == " Australia" ~ " Australia",
  CountryB == " New Mexico" ~ " N.America",
  CountryB == " Kenya" ~ " Kenya",
  CountryB == " Gran Canaria" ~ " Gran Canaria",
  CountryB == " Pacific Missile Range Facility" ~ " Pacific",
  CountryB == " Barents Sea"  ~ " Offshore" ,
  CountryB == " Maranh?œo"  ~ " Maranh?œo" ,
  CountryB == " North Korea"  ~ " Korea" ,
  CountryB == " Pacific Ocean"  ~ " Pacific" ,
  CountryB == " South Korea"   ~ " Korea"  ,
  CountryB == " Texas"  ~ " USA" ,
  CountryB == " Virginia"  ~ " USA" ,
  CountryB == " California"  ~ " USA" ,
  CountryB == " Marshall Islands" ~ " Pacific"
  ))


#Use the renamed facility where one was assigned, otherwise keep the original facility
spaceFilt <- spaceFilt %>% mutate(Facility3 = coalesce(Facility2, Facility))

# Further Refactoring for succinctness
spaceFilt$Facility3[spaceFilt$Facility3 == ' Sohae Satellite Launching Station'] <- ' Sohae, North Korea'
spaceFilt$Facility3[spaceFilt$Facility3 == ' Tonghae Satellite Launching Ground'] <- ' Tonghae, North Korea'
spaceFilt$Facility3[spaceFilt$Facility3 == ' Naro Space Center'] <- 'Naro, South Korea'
spaceFilt$Facility3[spaceFilt$Facility3 == ' Barents Sea Launch Area'] <- 'Barents Sea'
spaceFilt$Facility3[spaceFilt$Facility3 == ' Kiritimati Launch Area'] <- 'Kiritimati, Kiribati'
spaceFilt$Facility3[spaceFilt$Facility3 == ' Kauai'] <- 'Kauai, Hawaii'
spaceFilt$Facility3[spaceFilt$Facility3 == " Ronald Reagan Ballistic Missile Defense Test Site"] <- " Marshall Islands"

spaceFilt$Facility3[spaceFilt$Facility3 == " M?\u0081hia Peninsula"] <- " Mahia, New Zealand"
spaceFilt$Facility3[spaceFilt$Country2 == " Brazil"] <- " Alcântara, Brazil"
spaceFilt$Facility2[spaceFilt$Country2 == " Brazil"] <- " Alcântara, Brazil"

spaceFilt$Facility3[spaceFilt$Facility == " Yasny Cosmodrome"] <- "Yasny, Russia"
spaceFilt$Facility3[spaceFilt$Facility == " Plesetsk Cosmodrome"] <- "Plesetsk, Russia"
spaceFilt$Facility3[spaceFilt$Facility3 == " Svobodny"] <- "Svobodny, Russia"
spaceFilt$Facility3[spaceFilt$Facility == " Vostochny Cosmodrome"] <- "Vostochny, Russia"
spaceFilt$Facility3[spaceFilt$Facility == " Baikonur Cosmodrome"] <- "Baikonur, Kazakhstan"

spaceFilt$Facility3[spaceFilt$Facility3 == " Taiyuan Satellite Launch Center"] <- "Taiyuan"
spaceFilt$Facility3[spaceFilt$Facility3 == " Wenchang Satellite Launch Center"] <- "Wenchang"
spaceFilt$Facility3[spaceFilt$Facility3 == " Xichang Satellite Launch Center"] <- "Xichang"
spaceFilt$Facility3[spaceFilt$Facility3 == " Jiuquan Satellite Launch Center"] <- "Jiuquan"

spaceFilt$Facility3[spaceFilt$Facility == " Tanegashima Space Center"] <- "Tanegashima"
spaceFilt$Facility3[spaceFilt$Facility == " Uchinoura Space Center"] <- "Uchinoura"

spaceFilt$Facility3[spaceFilt$Facility == " Satish Dhawan Space Centre"] <- "Satish Dhawan"

spaceFilt$Facility3[spaceFilt$Facility3 == " Cape Canaveral"] <- "Cape Canaveral"
spaceFilt$Facility3[spaceFilt$Facility3 == " Pacific Spaceport Complex"] <- "Pacific Spaceport"
spaceFilt$Facility3[spaceFilt$Facility3 == " Wallops Flight Facility"] <- "Wallops"
spaceFilt$Facility3[spaceFilt$Facility3 == " Mojave Air and Space Port"] <- "Mojave"

# Obtain classes for faceting
countryClasses <- spaceFilt %>% distinct(Facility3,.keep_all = TRUE)

# Get count of all launches by location
facilityCount <- as.data.frame(table(spaceFilt$Facility3))
facilityCount <- facilityCount %>% arrange(Freq)

facilityCount <- facilityCount %>% mutate(Facility3 = fct_reorder(Var1, desc(Freq)))


facilityCount <- left_join(facilityCount, countryClasses, by = 'Facility3' )

facilityCount <- facilityCount %>% mutate(Facility = fct_reorder(Var1, Freq))


facilityCount <- facilityCount %>% arrange(Freq)

# Factorise Classes for location and country for better faceting
countryCount <- as.data.frame(table(spaceFilt$Country2))
countryCount <- countryCount %>% arrange(Freq)

countryCount <- countryCount %>% mutate(Country = fct_reorder(Var1, desc(Freq)))

countryClasses <- (as.list(levels(countryCount$Country)))

facilityCount$Country2 <- factor(facilityCount$Country2, levels = countryClasses, ordered = TRUE)

facilityCount <- facilityCount %>% mutate(Country3 = as.factor(Country2))

t <-ggplot(data = facilityCount, aes(y =Facility, x=Freq))  + 
  facet_grid(Country2 ~ ., switch = 'y', scales = "free", space = "free") + 
  geom_bar(stat = 'identity', width = 0.7, fill= 'skyblue4') + 
  theme(strip.placement = "outside", 
        axis.text.x = element_text(angle = 0, vjust = 0.5, hjust=1),
        strip.text.y.left = element_text(angle = 0, size = 10, colour = 'white'),
        panel.spacing = unit(0.5, "lines"),
        plot.title = element_text(hjust = 0.5, size = 15, face = 'bold'),
        axis.title.x=element_text(size=11,colour="black"),
        axis.title.y=element_text(size=13,colour="black"),
        panel.grid.major.x = element_line(colour = "grey50", linetype = "dashed"),
        panel.background = element_rect(fill = NA),
         # axis.text.y=element_text(size=11, colour="black"),
         # axis.text.x=element_text(size=11,colour="black")
        strip.text.y = element_text(
        size = 20, color = "Black", face = "bold"),
        strip.background = element_rect( color = 'white',fill='skyblue3', size=0.1),
        strip.text = element_text(vjust=0.95)
      ) + 
  coord_cartesian(xlim = c(10, max(facilityCount$Freq))) +
  xlab("\n Number of Successful Space Launches")+
  ylab("Location of Launches")+
  ggtitle("Number of Successful Space Launches since 2000 from different Locations grouped by Region\n")
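
To make the string splitting at the start of the script easier to follow, here is a minimal sketch on three made-up rows. It assumes that `Datum` looks like "Sat May 30, 2020 19:22 UTC" and that the CSV has a `Status Mission` column whose value is "Success" for successful launches; both are assumptions about `Space_Corrected.csv` rather than facts confirmed by the script, and the `toy` data frame is invented for illustration. The final filter on `Status Mission` is an addition that does not appear in the script above.

```r
library(dplyr)
library(tidyr)

# Made-up rows mimicking the assumed layout of the CSV (not real records)
toy <- tibble(
  Location = c(
    "LC-39A, Kennedy Space Center, Florida, USA",
    "Site 200/39, Baikonur Cosmodrome, Kazakhstan",
    "Uchinoura Space Center, Japan"
  ),
  Datum = c(
    "Sat May 30, 2020 19:22 UTC",
    "Thu Jul 23, 2020 14:26 UTC",
    "Mon Feb 01, 1999 08:00 UTC"
  ),
  `Status Mission` = c("Success", "Failure", "Success")
)

parsed <- toy %>%
  # "Sat May 30, 2020 19:22 UTC" -> Date = "Sat May 30", Dates = " 2020 19:22 UTC"
  separate(Datum, into = c("Date", "Dates"), sep = ",", remove = FALSE) %>%
  # The leading space in Dates yields an empty first piece, hence "dummy"
  separate(Dates, into = c("dummy", "Year", "Time", "Zone"), sep = " ") %>%
  # Locations with fewer than four parts are padded with NA on the right
  separate(Location, into = c("PadB", "Facility", "CountryB", "Country2"),
           sep = ",", remove = FALSE, fill = "right") %>%
  mutate(Year = as.numeric(Year))

# Assumed extra step: keep only successful launches from 2000 onwards
successful <- parsed %>%
  filter(Year >= 2000, `Status Mission` == "Success")

print(parsed %>% select(PadB, Facility, CountryB, Country2, Year))
print(nrow(successful))
```

Every piece after the first keeps its leading space, which is why the script compares against strings such as " China". Locations without a pad name shift one column to the left, so the facility ends up in `PadB` and the country in `Facility`, which is the case the `PadB == ...` and `Facility == " Japan"` rules handle.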

Data Reference

Reconstruction

The following plot fixes the main issues in the original.