The following data visualisation was presented at the end of an opinion article on cost of living increases in the online version of The Sun newspaper in the United Kingdom. It was published on 3 January 2022.

Click the Original, Code and Reconstruction tabs to read about the issues with the data visualisation and how they have been addressed to create a new reconstructed version.

Original


Source: The Sun, ‘Ordinary folk are set to be £1200 worse off next year – Boris must level up by getting bills down now’, (Jan 2022).


Objective and audience

The target audience for the data visualisation is likely to be The Sun’s general readership. As Nelson (2021) notes, The Sun is the second most read newspaper in the UK and is a tabloid known for its conservative politics and sensational headlines. Kirk (2015) reported that The Sun was the most popular publication among ‘C2DE’ readers, a market research demographic classification covering skilled working class to non-working people (NRS 2022). The Sun’s ‘About us’ webpage states, “The Sun stands for ordinary working people looking to get on, building better lives for themselves and their families, regardless of where they grow up or which school they went to”.

More specifically, the intended audience for the visualisation is likely to be people who are on lower incomes and who will be more significantly impacted by cost of living increases and may be concerned about their finances.

The objective of the visualisation is explanatory in nature. It is being used to convey that the cost of living in the UK is increasing by highlighting how much selected products and materials are increasing in cost.

Critique

The visualisation has the following three main issues:

  • Issue 1 - The visualisation includes many items and a percentage associated with each item. However, it is not clear where the data comes from or what these percentages represent. The preamble text in the article suggests the chart may show consumer price increases (that customers are already paying) and producer price increases (that may affect what consumers pay in the future). This is confusing because it is unclear whether these figures represent price rises that have already occurred or rises that may occur when items are bought in retail stores. If it is both, it is very difficult to know which item is associated with which type of increase. The ‘source’ information in the bottom right-hand corner of the visualisation does not provide clear information about the source of the data other than to indicate that some comes from the ONS (the Office for National Statistics) and some from trade organisations. Lack of clarity about the source of the data and what it represents may be misleading and lead to questions about the integrity of the data and the visualisation overall.

  • Issue 2 - The layout and scales used for the elements of the visualisation are problematic. All the percentage figures are positive numbers, broadly indicating that the cost of every item is increasing. However, different graph styles are used within the visualisation: some items are shown as bar graphs, some as bubbles and some as numbers alone. It is very busy and there is no logical approach indicating what order to look at the different elements. Where the items are not presented in order of size, as with the bubbles, they are not in alphabetical order either. The fact that the different elements do not appear to bear any relation to each other in terms of scale is even more problematic. It is very difficult to quickly see which item’s cost is increasing the most. For example, the cooking oil bubble is larger than the metals bubble even though its percentage figure is smaller.

  • Issue 3 - The titles and labelling could be better used to explain what the visualisation is displaying. There is a blank space in the top left corner that should have been used for a title and sub-title. The article text preceding the visualisation provides some explanation (although, as noted above, this is somewhat confusing), but there is no heading or explanatory text in the data visualisation itself. The caption underneath the visualisation, “The cost of living in the UK is set to rise even further”, is unlikely to be read until after the audience has spent time reviewing the various elements.
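The scale problem described in Issue 2 can be made concrete with a little arithmetic. The sketch below is illustrative only: the two percentages are hypothetical (the original chart's underlying values are not available here), and it simply assumes that a reader judges a bubble by its area. Under that assumption the radius should grow with the square root of the value; scaling the radius directly with the value exaggerates the difference.

```r
# Hypothetical percentage increases, chosen only for illustration
increases <- c(item_a = 10, item_b = 40)

# Area-proportional sizing: the radius grows with the square root of the value
radius_by_area <- sqrt(increases / pi)
area_fair <- pi * radius_by_area^2
fair_ratio <- area_fair[["item_b"]] / area_fair[["item_a"]]

# Naive sizing: the radius is set equal to the value, so the area grows with its square
area_naive <- pi * increases^2
naive_ratio <- area_naive[["item_b"]] / area_naive[["item_a"]]

# item_b is 4 times item_a, but the naive bubble has 16 times the area
print(c(fair = fair_ratio, naive = naive_ratio))
```

A bar chart with a single shared axis avoids this choice altogether, because bar length maps directly to the value.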

References

Code

The following code was used to fix the issues identified in the original.

# loading the following packages to pre-process the data and create the visualisation
library(readxl)
library(dplyr)
library(forcats)
library(ggplot2)
# importing the data from local file. Original source is from this webpage - https://www.ons.gov.uk/economy/inflationandpriceindices/datasets/consumerpriceinflation/current - accessed on 1 April 2021.
# the specific url for the file download is https://www.ons.gov.uk/file?uri=/economy/inflationandpriceindices/datasets/consumerpriceinflation/current/previous/v82/consumerpriceinflationdetailedreferencetables.xls
cpi_data_v1 <- read_excel("data/consumerpriceinflationdetailedreferencetables_December2021.xls", sheet = 9, range = "C7:Z370")
head(cpi_data_v1)
## # A tibble: 6 x 24
##   ...1    ...2   Nov...3   ...4 Dec     ...6 Jan    ...8 Feb   ...10 Mar   ...12
##   <chr>   <chr>  <chr>    <dbl> <chr>  <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
## 1 CPIH (~ 1      0.59999~    NA 0.800~    NA 0.90~    NA 0.69~    NA 1        NA
## 2 01    ~ 0.699~ -0.5        NA -1.39~    NA -0.6~    NA -0.5~    NA -1.3~    NA
## 3 02    ~ 2.100~ 2           NA 3.5       NA 3.20~    NA 2.89~    NA 2.39~    NA
## 4 03    ~ -1.39~ -3.6000~    NA -1.7      NA -3.2~    NA -5.5~    NA -3.7~    NA
## 5 04    ~ 0.900~ 0.5         NA 0.599~    NA 0.59~    NA 0.80~    NA 0.80~    NA
## 6 05    ~ -      -0.2000~    NA -0.59~    NA 1        NA 0.80~    NA 1.5      NA
## # ... with 12 more variables: Apr <chr>, ...14 <dbl>, May <chr>, ...16 <dbl>,
## #   Jun <chr>, ...18 <dbl>, Jul <chr>, ...20 <dbl>, Aug <chr>, Sep <chr>,
## #   Oct <chr>, Nov...24 <chr>
# removing columns not required to generate the visualisation
cpi_data_v2 <- select(cpi_data_v1, 1, 24)
colnames(cpi_data_v2) <- c('Item','Nov2021')
head(cpi_data_v2)
## # A tibble: 6 x 2
##   Item                                                   Nov2021           
##   <chr>                                                  <chr>             
## 1 CPIH (overall index)                                   4.5999999999999996
## 2 01    Food and non-alcoholic beverages                 2.5               
## 3 02    Alcoholic beverages and tobacco                  4.7999999999999998
## 4 03    Clothing and footwear                            3.5               
## 5 04    Housing, water, electricity, gas and other fuels 3.8999999999999999
## 6 05    Furniture, household equipment and maintenance   6.2000000000000002
# checking the attributes
str(cpi_data_v2)
## tibble [363 x 2] (S3: tbl_df/tbl/data.frame)
##  $ Item   : chr [1:363] "CPIH (overall index)" "01    Food and non-alcoholic beverages" "02    Alcoholic beverages and tobacco" "03    Clothing and footwear" ...
##  $ Nov2021: chr [1:363] "4.5999999999999996" "2.5" "4.7999999999999998" "3.5" ...
# converting data type for percentage figures from character to numeric
cpi_data_v2$Nov2021 <- as.numeric(cpi_data_v2$Nov2021)
str(cpi_data_v2)
## tibble [363 x 2] (S3: tbl_df/tbl/data.frame)
##  $ Item   : chr [1:363] "CPIH (overall index)" "01    Food and non-alcoholic beverages" "02    Alcoholic beverages and tobacco" "03    Clothing and footwear" ...
##  $ Nov2021: num [1:363] 4.6 2.5 4.8 3.5 3.9 6.2 1.5 12.5 1.3 3.3 ...
# creating new dataset by filtering the following row references identified from review of Excel file
cpi_data_v3 <- cpi_data_v2 %>% filter(row_number() %in% c(27, 39, 47, 48, 50, 59, 69, 74, 75, 82, 92, 97, 118, 131, 132, 142, 157, 158, 201, 211, 212, 222, 228, 242, 267, 274, 286, 293, 312, 323, 334))
str(cpi_data_v3)
## tibble [31 x 2] (S3: tbl_df/tbl/data.frame)
##  $ Item   : chr [1:31] "01.1.2 Meat" "01.1.4 Milk, cheese and eggs" "01.1.5.1 Butter" "01.1.5.2 Margarine and other vegetable fats" ...
##  $ Nov2021: num [1:31] 1.6 3.4 8.9 14.5 4.5 7.3 5.1 4.5 4.1 2.7 ...
# updating Item values to plain English
cpi_data_v3$Item [cpi_data_v3$Item == "01.1.2 Meat"] <- "Meat"
cpi_data_v3$Item [cpi_data_v3$Item == "01.1.4 Milk, cheese and eggs"] <- "Milk, cheese and eggs"
cpi_data_v3$Item [cpi_data_v3$Item == "01.1.5.1 Butter"] <- "Butter"
cpi_data_v3$Item [cpi_data_v3$Item == "01.1.5.2 Margarine and other vegetable fats"] <- "Margarine"
cpi_data_v3$Item [cpi_data_v3$Item == "01.1.6 Fruit"] <- "Fruit"
cpi_data_v3$Item [cpi_data_v3$Item == "01.1.7.5 Crisps"] <- "Crisps"
cpi_data_v3$Item [cpi_data_v3$Item == "01.1.9.4 Ready-made meals"] <- "Ready-made meals"
cpi_data_v3$Item [cpi_data_v3$Item == "01.2.1.1 Coffee"] <- "Coffee"
cpi_data_v3$Item [cpi_data_v3$Item == "01.2.1.2 Tea"] <- "Tea"
cpi_data_v3$Item [cpi_data_v3$Item == "02.1 Alcoholic beverages"] <- "Alcoholic drinks"
cpi_data_v3$Item [cpi_data_v3$Item == "02.2 Tobacco"] <- "Tobacco and cigarettes"
cpi_data_v3$Item [cpi_data_v3$Item == "03.1 Clothing"] <- "Clothing"
cpi_data_v3$Item [cpi_data_v3$Item == "04.3 Regular maintenance and repair of the dwelling"] <- "House repairs"
cpi_data_v3$Item [cpi_data_v3$Item == "04.5.1 Electricity"] <- "Electricity"
cpi_data_v3$Item [cpi_data_v3$Item == "04.5.2 Gas"] <- "Gas"
cpi_data_v3$Item [cpi_data_v3$Item == "05.1.1.1 Household furniture"] <- "Furniture"
cpi_data_v3$Item [cpi_data_v3$Item == "05.3.1.1 Refrigerators, freezers and fridge-freezers"] <- "Fridges and freezers"
cpi_data_v3$Item [cpi_data_v3$Item == "05.3.1.2 Clothes washing machines, clothes drying machines and dish washing machines"] <- "Washing machines"
cpi_data_v3$Item [cpi_data_v3$Item == "07.1.1B Second-hand cars"] <- "Used cars"
cpi_data_v3$Item [cpi_data_v3$Item == "07.2.2.1 Diesel"] <- "Diesel"
cpi_data_v3$Item [cpi_data_v3$Item == "07.2.2.2 Petrol"] <- "Petrol"
cpi_data_v3$Item [cpi_data_v3$Item == "07.3.1.1 Passenger transport by train"] <- "Train travel"
cpi_data_v3$Item [cpi_data_v3$Item == "07.3.3 Passenger transport by air"] <- "Air travel"
cpi_data_v3$Item [cpi_data_v3$Item == "09.1.1 Reception and reproduction of sound and pictures"] <- "TVs"
cpi_data_v3$Item [cpi_data_v3$Item == "09.3.1 Games, toys and hobbies"] <- "Games and toys"
cpi_data_v3$Item [cpi_data_v3$Item == "09.3.3.1 Garden products"] <- "Garden products"
cpi_data_v3$Item [cpi_data_v3$Item == "09.4.2.1 Cinemas, theatres, concerts"] <- "Cinemas, theatres & concerts"
cpi_data_v3$Item [cpi_data_v3$Item == "09.5.1 Books"] <- "Books"
cpi_data_v3$Item [cpi_data_v3$Item == "11.1.1 Restaurants & cafes"] <- "Restaurants & cafes"
cpi_data_v3$Item [cpi_data_v3$Item == "12.1.1 Hairdressing and personal grooming establishments"] <- "Hairdressing"
cpi_data_v3$Item [cpi_data_v3$Item == "12.3.1.1 Jewellery"] <- "Jewellery"
# converting Item data type to factor
cpi_data_v3$Item <- as.factor(cpi_data_v3$Item)
str(cpi_data_v3)
## tibble [31 x 2] (S3: tbl_df/tbl/data.frame)
##  $ Item   : Factor w/ 31 levels "Air travel","Alcoholic drinks",..: 21 22 4 20 12 8 24 7 26 2 ...
##  $ Nov2021: num [1:31] 1.6 3.4 8.9 14.5 4.5 7.3 5.1 4.5 4.1 2.7 ...
# adding a new variable to categorise the Item values
cpi_data_v3$Category <- with(cpi_data_v3, 
                         ifelse(Item == 'Meat', 'Food and drink',
                         ifelse(Item == 'Milk, cheese and eggs', 'Food and drink',
                         ifelse(Item == 'Butter', 'Food and drink',
                         ifelse(Item == 'Margarine', 'Food and drink',
                         ifelse(Item == 'Fruit', 'Food and drink',
                         ifelse(Item == 'Crisps', 'Food and drink',
                         ifelse(Item == 'Ready-made meals', 'Food and drink',
                         ifelse(Item == 'Coffee', 'Food and drink',
                         ifelse(Item == 'Tea', 'Food and drink',
                         ifelse(Item == 'Alcoholic drinks', 'Food and drink',
                         ifelse(Item == 'Tobacco and cigarettes', 'Food and drink',
                         ifelse(Item == 'Clothing', 'Shopping',
                         ifelse(Item == 'House repairs', 'Home',
                         ifelse(Item == 'Electricity', 'Home',
                         ifelse(Item == 'Gas', 'Home',
                         ifelse(Item == 'Furniture', 'Home',
                         ifelse(Item == 'Fridges and freezers', 'Home',
                         ifelse(Item == 'Washing machines', 'Home',
                         ifelse(Item == 'Used cars', 'Travel',
                         ifelse(Item == 'Diesel', 'Travel',
                         ifelse(Item == 'Petrol', 'Travel',
                         ifelse(Item == 'Train travel', 'Travel',
                         ifelse(Item == 'Air travel', 'Travel',
                         ifelse(Item == 'TVs', 'Shopping',
                         ifelse(Item == 'Games and toys', 'Shopping',
                         ifelse(Item == 'Garden products', 'Home',
                         ifelse(Item == 'Cinemas, theatres & concerts', 'Entertainment',
                         ifelse(Item == 'Books', 'Shopping',
                         ifelse(Item == 'Restaurants & cafes', 'Entertainment',
                         ifelse(Item == 'Hairdressing', 'Shopping',
                         ifelse(Item == 'Jewellery', 'Shopping',
                                'ERROR'))))))))))))))))))))))))))))))))
# converting Category variable to factor
cpi_data_v3$Category <- as.factor(cpi_data_v3$Category)
# plotting the visualisation
Newplot <- cpi_data_v3 %>% 
  mutate(
    Category = fct_relevel(Category, "Travel", "Home", "Food and drink", "Entertainment", "Shopping"),
    Item = fct_reorder(Item, Nov2021)
  ) %>% 
  ggplot(aes(x = Item, y = Nov2021, fill = Category)) +
  geom_col(alpha = 0.75, width = 0.7, position = "dodge") +
  scale_fill_brewer(name = "", palette = "Dark2") +
  scale_y_continuous(expand = c(0, 0.1)) +
  coord_flip() +
  facet_grid(rows = vars(Category), scales = "free_y", switch = "y", space = "free_y") +
  labs(
    title = "Prices have risen in many areas of daily life",
    subtitle = "Percentage increases to cost of living for November 2021 (vs November 2020)",
    caption = "Source: Office for National Statistics (ONS) data for consumer price inflation for November 2021"
  ) +
  theme_minimal(base_family = "Helvetica") +
  theme(
    plot.margin = margin(0.3, 0.3, 0.3, 0.3, unit = "cm"),
    plot.title = element_text(size = 16, face = "bold", hjust=0),
    plot.subtitle = element_text(size = 14, hjust=0),
    strip.text.y = element_blank(),
    strip.placement = "outside",
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.title.y = element_blank(),
    axis.text = element_text(size = 12),
    legend.position = "top",
    legend.text=element_text(size=12),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
    ) +
  geom_text(aes(label = paste(Nov2021, "%")),
            hjust = 1.2, 
            size = 3, 
            colour = "#FFFFFF", 
            fontface = "bold", 
            family = "Helvetica")
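The renaming and categorising steps above use one assignment per item followed by a 31-level nested ifelse(). A possible alternative, sketched here rather than taken from the original analysis, is to hold the mapping in a small lookup table and join it on. The names item_lookup, cpi_sample and cpi_labelled are hypothetical, and only three of the 31 mappings are shown.

```r
library(dplyr)

# Lookup table: ONS label, plain-English label and category
# (three of the 31 mappings used above, for illustration)
item_lookup <- data.frame(
  Item     = c("01.1.2 Meat", "04.5.2 Gas", "07.2.2.2 Petrol"),
  Label    = c("Meat", "Gas", "Petrol"),
  Category = c("Food and drink", "Home", "Travel"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the filtered data; only the Item column matters here
cpi_sample <- data.frame(
  Item = c("01.1.2 Meat", "04.5.2 Gas", "07.2.2.2 Petrol"),
  stringsAsFactors = FALSE
)

# left_join() keeps every row of cpi_sample and adds Label and Category
cpi_labelled <- cpi_sample %>%
  left_join(item_lookup, by = "Item")

# An item missing from the lookup would show up as NA, so fail early
# instead of carrying an 'ERROR' string into the plot
stopifnot(!anyNA(cpi_labelled$Category))

print(cpi_labelled)
```

Keeping the mapping in one table means a new item needs a single new row, and an unmatched label is caught before plotting.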

Data Reference

Reconstruction

The following new data visualisation addresses the main issues in the original data visualisation.

The following actions were taken to address the previously identified issues:

  • The data for the visualisation has all been sourced from the UK’s Office for National Statistics website and is limited to Consumer Price Inflation data for November 2021. It details only price increases that have occurred.

  • The visualisation has been simplified and uses one consistent format, style and scale.

  • A title, subtitle and data labels have been added to clarify what the visualisation displays.
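For readers who do not have the ONS spreadsheet to hand, the single-format, single-scale approach can be tried on a small sample. The sketch below is a reduced version of the plot, not the full reconstruction: it uses only the first four values printed in the str() output in the Code section, and the object names plot_data and sample_plot, the title text and the label placement are illustrative choices.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# First four items and values as printed in the str() output of cpi_data_v3
plot_data <- data.frame(
  Item    = c("Meat", "Milk, cheese and eggs", "Butter", "Margarine"),
  Nov2021 = c(1.6, 3.4, 8.9, 14.5),
  stringsAsFactors = FALSE
) %>%
  # Order the bars by value so the largest increase is easy to find
  mutate(Item = fct_reorder(Item, Nov2021))

# One chart type and one shared axis for every item
sample_plot <- ggplot(plot_data, aes(x = Item, y = Nov2021)) +
  geom_col(width = 0.7) +
  geom_text(aes(label = paste0(Nov2021, "%")), hjust = -0.2, size = 3) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  coord_flip() +
  labs(
    title = "Sample of November 2021 price increases",
    x = NULL,
    y = NULL
  ) +
  theme_minimal()

print(sample_plot)
```

Because every bar shares the same axis, the longest bar is always the largest increase, which is the property the original mix of bars, bubbles and bare numbers lacked.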