
Original


Source: Canada Population Estimates (2016).


Objective

The data visualisation of Canada's population distribution across its 10 largest cities, created by TC18271851 (2021) and posted at https://www.reddit.com/r/dataisbeautiful/comments/mzqt5l/oc_population_distribution_of_canada_across_10/, depicts population estimates based on the 2016 census and compares them across cities.

The objective of this data visualisation is to provide detailed data on the population of Canada. Beyond that, the insights behind it, such as population growth and population distribution, could support decision-making on economic projections and immigration policy in Canada.

In general, the target audience of this data visualisation is the Canadian public. More specifically, population growth and population distribution have a large impact on economic development, so decision-makers in every economic sector are assumed to be interested in this data visualisation.

The chosen visualisation had three main issues:

  • Colour: the chart uses 10 different colours, including both red and green, to distinguish the 10 largest cities, which makes it difficult to read for viewers with red-green colour blindness. For such viewers, the colour for Montreal is hard to tell apart from the colour for Quebec City, and the colour for Calgary looks very similar to the colour for Kitchener-Cambridge-Waterloo. This may convey the wrong message about the population distribution of these 10 cities.

  • Graph: a pie chart does not support accurate comparison because it encodes values as angles and areas, which are hard to judge, especially when proportions are similar. Here the pie chart shows each of the 10 largest cities' share of Canada's population. However, Calgary and Edmonton, Winnipeg and Quebec City, and Hamilton and Kitchener-Cambridge-Waterloo are labelled with the same rounded percentages (5%, 3% and 2% respectively), so the audience cannot tell how the populations within each pair differ.

  • Data integrity: the visualisation does not state its data source or include any notes. For instance, the legend gives no details about the “Non 10 largest Cities” category, which may reduce the reliability of the visualisation.
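A minimal R sketch of one way to address the colour and graph issues, assuming a data frame with hypothetical columns `city` and `population`: a sorted bar chart replaces angle comparisons with length comparisons, a single fill colour removes the red-green problem, and labels with one decimal place separate shares that round to the same whole percentage. The helper `share_labels()` and the values in `toy` are placeholders for illustration, not census figures.

```r
library(ggplot2)

# Hypothetical helper: each value's share of the total, as a percentage with one decimal
share_labels <- function(population) {
  share <- 100 * population / sum(population)
  sprintf("%.1f%%", share)
}

# Placeholder values for illustration only (NOT census figures)
toy <- data.frame(
  city = c("City A", "City B", "City C"),
  population = c(52, 46, 902)
)

# Rounded to whole percentages, City A and City B both show 5%
whole <- round(100 * toy$population / sum(toy$population))

# One decimal place keeps them apart: 5.2% vs 4.6%
toy$label <- share_labels(toy$population)

# Bar chart ordered by population; a single fill colour means no hues need to be told apart
p_bar <- ggplot(toy, aes(x = reorder(city, population), y = population)) +
  geom_col(fill = "grey40") +
  geom_text(aes(label = label), hjust = -0.1) +
  coord_flip() +
  labs(x = NULL, y = "Population")
```

With real data, `toy` would be replaced by the city-level estimates, and `p_bar` printed to show the chart.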

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(dplyr)
library(readr)

# Load the population estimates and keep the year, area and value columns
population1 <- read_csv("Assignment2LoadingData.csv")
population2 <- population1[, c("REF_DATE", "GEO", "VALUE")]

# Keep only the ten largest census metropolitan areas (CMAs)
largest_cmas <- c("Toronto (CMA), Ontario",
                  "Montréal (CMA), Quebec",
                  "Vancouver (CMA), British Columbia",
                  "Calgary (CMA), Alberta",
                  "Edmonton (CMA), Alberta",
                  "Ottawa - Gatineau (CMA), Ontario/Quebec",
                  "Winnipeg (CMA), Manitoba",
                  "Québec (CMA), Quebec",
                  "Hamilton (CMA), Ontario",
                  "Kitchener - Cambridge - Waterloo (CMA), Ontario")
population3 <- population2 %>% filter(GEO %in% largest_cmas)
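# read_csv() reads GEO as character, so levels(population3$GEO) returns NULL.
# Convert GEO to a factor with the cities in rank order and short display labels.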
population3$GEO <- factor(population3$GEO, levels = 
                            c("Toronto (CMA), Ontario",
                              "Montréal (CMA), Quebec",
                              "Vancouver (CMA), British Columbia",
                              "Calgary (CMA), Alberta",
                              "Edmonton (CMA), Alberta",
                              "Ottawa - Gatineau (CMA), Ontario/Quebec",
                              "Winnipeg (CMA), Manitoba",
                              "Québec (CMA), Quebec",
                              "Hamilton (CMA), Ontario",
                              "Kitchener - Cambridge - Waterloo (CMA), Ontario"),
                          labels = c("Toronto",
                                     "Montréal",
                                     "Vancouver",
                                     "Calgary",
                                     "Edmonton",
                                     "Ottawa-Gatineau",
                                     "Winnipeg",
                                     "Québec City",
                                     "Hamilton",
                                     "Kitchener-Cambridge-Waterloo"))
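# Express population in millions
population3$VALUE <- population3$VALUE / 1000000

# Refer to columns by name inside aes() rather than with population3$...
p1 <- ggplot(data = population3, aes(x = REF_DATE, y = VALUE, colour = GEO))

# With only five yearly points per city, the default loess smoother in
# geom_smooth() can emit fitting warnings, so the points are joined with lines
p2 <- p1 + geom_line() + geom_point() +
  labs(x = "Year", y = "Population (million)",
       title = "Population Changes of Ten Largest Cities in Canada from 2016 to 2020",
       caption = "Source: Canada Population Estimates (2016) - https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710013501") +
  scale_colour_discrete(name = "")

# Display the plot
p2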
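The code above keeps `scale_colour_discrete()`, whose default hues are not designed with colour-blind viewers in mind. A minimal sketch of one alternative, assuming ggplot2 3.0.0 or later (which provides `scale_colour_viridis_d()`): the viridis palette varies in lightness as well as hue, which helps neighbouring lines stay distinguishable. The data frame `demo` is a synthetic stand-in for `population3` with made-up values and only three placeholder cities.

```r
library(ggplot2)

# Synthetic stand-in for population3: 3 placeholder cities, 2016-2020 (made-up values)
demo <- expand.grid(REF_DATE = 2016:2020, GEO = c("City A", "City B", "City C"))
# expand.grid varies REF_DATE fastest, so each city occupies five consecutive rows
demo$VALUE <- rep(c(1.0, 1.5, 2.0), each = 5) + 0.05 * (demo$REF_DATE - 2016)

# Same mapping as the reconstruction, but with a viridis palette;
# end = 0.9 drops the palest yellow, which is hard to see on a light background
p_cb <- ggplot(demo, aes(x = REF_DATE, y = VALUE, colour = GEO)) +
  geom_line() +
  geom_point() +
  scale_colour_viridis_d(name = "", end = 0.9) +
  labs(x = "Year", y = "Population (million)")
```

For the real chart, the same `scale_colour_viridis_d(name = "", end = 0.9)` line could take the place of `scale_colour_discrete(name = "")`; with ten cities, direct labels or point shapes may still be needed because adjacent viridis colours become close.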

Data Reference

Reconstruction

The following plot fixes the main issues in the original.