Original


Source: Main source of Sexual Education in the United Kingdom. Stadafa.com. (2021)


Objective

The original data visualisation aims to show the main sources of sexual education for males and females aged 16-24 in the United Kingdom. The data come from interviews conducted between 2010 and 2012 for the National Survey of Sexual Attitudes and Lifestyles (Natsal-3). The target audience is researchers who study human sexual health and related diseases.

The visualisation had the following three main issues:

  • Visual bombardment: It is hard to compare the sources of sexual education between males and females. With so many categories, the audience cannot easily tell which sources were common or dominant for each gender. The chart type is also a poor choice for representing percentages and does not tell a clear story.

  • Data quality: The percentages reported for each category are not accurate. Although the differences in the values may be slight, the results can still mislead the audience.

  • Poor aesthetics: The graph does not follow the grammar of data visualisation. In particular, the scale on the y-axis does not match the data labels (24% appears much higher than the scale suggests), and the category labels are not aligned on the same level. The colours are distinct, but the colour choice distracts the audience from the story.

Reference

Code

The following code was used to fix the issues identified in the original.

library(readr)
library(plyr)    # load plyr before dplyr so that plyr does not mask dplyr verbs such as summarise()
library(dplyr)
library(ggplot2)

# read survey data
natsal_survey <- read_csv('data_natsal.csv')

# keep only the 16-24 age group
age_grp <- natsal_survey %>% filter(agrp == 1.0)

# select the two variables used in the visualisation by name rather than by position
filtered_df <- age_grp %>% dplyr::select(rsex, lernmost)

# remove unwanted values (N/A)
clean_df <- filtered_df %>% filter(lernmost != 99.0 & lernmost != -1)

# replace numeric source codes with labels (several codes share a label)
clean_df$lernmost <- mapvalues(
  clean_df$lernmost,
  from = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19),
  to   = c("Doctor/nurse/clinic", "Pornography", "Lessons at school",
           "Internet(excl pornography)", "Media", "Friends of about own age",
           "Media", "Mother", "Father", "1st sexual partner", "Pornography",
           "Pornography", "Siblings", "Other", "Other", "Other", "Other",
           "Other")
)


# replace sex codes with labels
clean_df$rsex <- mapvalues(clean_df$rsex, from = c(1, 2), to = c("Male", "Female"))

# count respondents for each combination of sex and source category
grouped <- clean_df %>%
  group_by(rsex, lernmost) %>%
  dplyr::summarise(count = n(), .groups = "drop")

# convert counts to percentages within each sex, largest first
df <- grouped %>%
  group_by(rsex) %>%
  transmute(lernmost, percent = count / sum(count) * 100) %>%
  dplyr::arrange(dplyr::desc(percent))

# plot a faceted horizontal bar chart, one panel per sex

p <- ggplot(df, aes(x = reorder(lernmost, percent), y = percent, fill = rsex)) +
    geom_col(position = "dodge", width = 0.8) +
    geom_text(aes(label = round(percent)), vjust = 0.2, hjust = 1.4, size = 3.5, color = "white") +
    scale_fill_brewer(palette = "Dark2") +
    coord_flip() +
    facet_wrap(~rsex) +
    labs(title = "How are young people in the United Kingdom educated about sex?",
         subtitle = "Data collected from males and females aged 16-24",
         caption = "\n\nData source: National Survey of Sexual Attitudes and Lifestyles, Natsal-3, 2010-2012",
         x = "Main Source of Sex Education", y = "Percentage (%)", fill = "Gender")
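
The recoding and percentage steps can be checked on a small example. The sketch below is illustrative only: the `toy` data frame and its values are made up and are not Natsal-3 results, and it uses base R lookup vectors and `ave()` in place of `mapvalues()` and the dplyr pipeline so that it runs without extra packages. It shows that codes sharing a label collapse into one category and that the percentages are computed within each sex.

```r
# Made-up responses for illustration only; these are NOT Natsal-3 values.
toy <- data.frame(
  rsex     = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  lernmost = c(3, 3, 6, 5, 3, 8, 8, 7, 6)
)

# Named lookup vectors: several codes may share one label (5 and 7 -> "Media").
source_labels <- c("3" = "Lessons at school", "5" = "Media",
                   "6" = "Friends of about own age", "7" = "Media",
                   "8" = "Mother")
sex_labels <- c("1" = "Male", "2" = "Female")

toy$source <- unname(source_labels[as.character(toy$lernmost)])
toy$sex    <- unname(sex_labels[as.character(toy$rsex)])

# Count respondents per sex and source, dropping combinations that never occur.
counts <- as.data.frame(table(sex = toy$sex, source = toy$source),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]

# Convert counts to percentages within each sex.
counts$percent <- counts$Freq / ave(counts$Freq, counts$sex, FUN = sum) * 100

# Each sex should account for 100% of its own respondents.
totals <- tapply(counts$percent, counts$sex, sum)

print(counts[order(counts$sex, -counts$percent), ])
print(totals)
```

Because the denominator is the number of respondents of the same sex, the bars in each facet sum to 100%. This makes the two panels directly comparable even when the numbers of male and female respondents differ.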

Data Reference

  • University College London, Centre for Sexual Health and HIV Research, Johnson, A. (2018). National Survey of Sexual Attitudes and Lifestyles, 2010-2012. [data collection]. 2nd Edition. UK Data Service. SN: 7799, http://doi.org/10.5255/UKDA-SN-7799-2 [Retrieved on: 26 April 2021]

Reconstruction

The following plot fixes the main issues in the original.