Pictorial Representation of Data

Author

Dr Andrew Dalby

Background

It is often said that you can tell lies with statistics. The reality is that there are just alternative visions, versions and narratives that can be used to present data. Even when statistics is taught, decisions are made by those who set the curriculum and the assessment as to which version of statistics is going to be the correct one for the examinations.

What I wanted to do here is cover some of the techniques used for pictorially representing data in the UK A-level curriculum of the 1980s. That was when I studied for my A-levels. I also wanted to reflect on these techniques after nearly 20 years teaching data handling at undergraduate and post-graduate level to biologists and medical students.

These examples are taken from Letts Study Aids - A-level Mathematics Course Companion by Duncan Graham, Christine Graham and Allan Whitcombe (Charles Letts, 1984).

The first set of data I am going to create is for a man’s annual expenditure.

Item <- c("House","Food", "Fuel", "Travel", "Others")
Expenditure <- c(3000,2000,800,800,1100)
annual <- data.frame(Item,Expenditure)
annual
    Item Expenditure
1  House        3000
2   Food        2000
3   Fuel         800
4 Travel         800
5 Others        1100

This can be plotted in several different ways. The first two that I will consider are bar charts and pie charts.

First, the bar chart in both the vertical and horizontal orientations.

barplot(Expenditure, names.arg = Item, main="A man's annual expenditure")

barplot(Expenditure, names.arg = Item, main="A man's annual expenditure", horiz=TRUE)

If you are an Excel user then the vertical version is called a column chart and the horizontal version a bar chart. In most other software they are both called bar charts. I prefer the vertical orientation, as scanning left to right seems more natural than scanning up and down, but it is a matter of personal preference.

More controversial is the pie chart. You will sometimes see people argue that pie charts should never be used, and they do have limitations, especially when there are a large number of categories. It is hard to read quantities from them, but for simple graphical impact they are sometimes useful.

pie(Expenditure, labels=Item)
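
One way to make a pie chart easier to quantify is to put the percentage share into each label. This is only a minimal sketch using base R; the names `share` and `lab` are my own, and rounded shares need not add up to exactly 100.

```r
# Expenditure figures from the example above (in pounds)
Item <- c("House", "Food", "Fuel", "Travel", "Others")
Expenditure <- c(3000, 2000, 800, 800, 1100)

# Percentage share of each item, rounded to the nearest whole number
share <- round(100 * Expenditure / sum(Expenditure))

# Build labels such as "House (39%)" and pass them to pie()
lab <- paste0(Item, " (", share, "%)")
pie(Expenditure, labels = lab, main = "A man's annual expenditure")
```

Even with labels, the reader is now getting the numbers from the text rather than from the slices, which is rather the point of the criticism.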

Finally, there are plots that use pictograms to represent data. These are commonly used in infographics. In reality they are based on waffle diagrams.

These are worse than pie charts for quantification and they are only a visual aid. They should not be used for presenting technical data where decisions are going to be made.

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(waffle)
Loading required package: ggplot2
tibble(
  Item = factor(
    c("House","Food", "Fuel", "Travel", "Others"),
    levels=c("House","Food", "Fuel", "Travel", "Others")
  ),
  Expenditure = c(3000,2000,800,800,1100)/100
) -> annual

waffle(
  annual
)

As a waffle plot it is not too bad. This is a square version of the pie chart and you can count the number of squares to find the amount in each category. In this case the base unit is £100, so each square represents £100 of spending.
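
The number of squares per category follows directly from the base unit. Here is a small sketch of that arithmetic, assuming the £100 unit used above; the names `unit` and `squares` are my own.

```r
# Expenditure figures from the example above (in pounds)
Item <- c("House", "Food", "Fuel", "Travel", "Others")
Expenditure <- c(3000, 2000, 800, 800, 1100)

# One square stands for 100 pounds
unit <- 100
squares <- Expenditure / unit

# Squares per category, then the total number of squares in the plot
setNames(squares, Item)
sum(squares)
```

If the values were not exact multiples of the unit you would have to decide how to round them, which is one more reason these plots are only a visual aid.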

However, a pictogram version is much harder to read.

library(echarts4r)
library(echarts4r.assets)
annual %>% 
  e_charts(Item) %>% 
  e_pictorial(Expenditure, symbol = ea_icons("tag"), 
              symbolRepeat = TRUE, z = -1,
              symbolSize = c(20, 20)) %>% 
  e_theme("westeros") %>%
  e_title("A Man's Annual Expenditure") %>% 
  e_flip_coords() %>%
  # Hide Legend
  e_legend(show = FALSE) %>%
  # Remove Gridlines
  e_x_axis(splitLine=list(show = FALSE)) %>%
  e_y_axis(splitLine=list(show = FALSE))

These are not plots that you want to use for anything serious or important.

The syllabus wanted these small data sets to be plotted as pie charts rather than bar charts. The reason is that a pie chart gives the examiners an extra step to assess: you have to work out the angle of each segment by calculating the total and then dividing 360 degrees in proportion to each value. This is much more about creating something that can be assessed than about actually using data presentation methods appropriately.
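
The angle calculation can be sketched in R using the expenditure figures from the first example. The names `total` and `angle` are my own and do not come from the textbook.

```r
# Expenditure figures from the first example (in pounds)
Item <- c("House", "Food", "Fuel", "Travel", "Others")
Expenditure <- c(3000, 2000, 800, 800, 1100)

# Each segment's angle is its share of the total, scaled to 360 degrees
total <- sum(Expenditure)
angle <- Expenditure / total * 360

# Tabulate the angles to one decimal place
data.frame(Item, Expenditure, Angle = round(angle, 1))
```

The unrounded angles should sum to 360 degrees, which is a quick check on the arithmetic.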

This is often an issue in education. We have things that we can measure, but we often cannot measure the things that are important. Critical thinking is a prime example, especially when IQ tests are used to measure it. They test your ability to create logical sequences, but is that intelligence? You are just as likely to be conned by an internet scam or by someone selling you something that you don’t want. Does a high score help you work out when a politician is lying to you, or what is the best way to vote in an election? The real world is a lot more complex than that.

Here are some more examples represented as bar charts and pie charts.

UK Government Expenditure on Education

Sector <- c("Primary","Secondary","Special","Adult","Teacher Training","Universities")
Spending <- c(525,608,46,260,57,258)
Education <- data.frame(Sector,Spending)
barplot(Spending, names.arg = Sector, main="UK Spending on Education in the 1980s")

pie(Spending, labels=Sector, main="UK Spending on Education in the 1980s")

Preventable Deaths

library(RColorBrewer)
Cause <- c("Smoking","Traffic Accidents","Alcohol","Falls","Other")
Deaths <- c(100000,6000,5000,4000,2000)
Preventable <- data.frame(Cause,Deaths)
barplot(Deaths, names.arg = Cause, main="Preventable Deaths in Wales in 1975")

pal1 <- brewer.pal(5, "BuPu")
pie(Deaths, labels=Cause, col=pal1, main ="Preventable Deaths in Wales in 1975")