Description of this Data Visualisation

This page lays out the challenges faced, and their possible solutions, in visualising the demographic structure of Singapore's population by age group and planning area in 2019.

The data used come from the Singapore Residents by Planning Area/Subzone, Age Group, Sex and Type of Dwelling, June 2011-2019 data series.

It is available digitally at https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data

Three major data and design challenges faced

  1. Data wrangling was a challenge. On top of knowing which code to use for each data manipulation step, we also need to understand how the data table should look in order to construct the visualisation we want. For example, in this DataViz Makeover, should “Males” and “Females” be identified under a single column “Sex”, or should they be separated into two different columns?

  2. The next challenge I faced was the need to construct the visualisation programmatically using code in R instead of interactively, as with Tableau’s Show Me. In Tableau, the Show Me function suggests, with just a click, the different chart types I can use for a given dataset. This lets me explore different visualisations to see if there are better alternatives to my initial one. In R, however, every change of chart type, e.g. from a bar chart to a line graph, needs new code.

  3. Another challenge I faced was the restriction on the size of the visualisation. For example, in this DataViz Makeover, I find the Age Group labels on the y-axis too cluttered, but I was unable to enlarge the plot other than by changing the number of rows and columns.
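
The single-column versus two-column question in the first challenge can be sketched on made-up numbers. This is only an illustration: the counts below are invented, and `long_df`, `wide_df` and `long_again` are hypothetical names. It assumes the tidyverse packages are installed.

```r
library(tidyverse)

# Made-up counts in "long" form: one row per age group and sex.
long_df <- tibble(
  AG  = c("0_to_4", "0_to_4", "5_to_9", "5_to_9"),
  Sex = c("Males", "Females", "Males", "Females"),
  Pop = c(120, 110, 130, 125)
)

# "Wide" form: one column per sex, one row per age group.
wide_df <- pivot_wider(long_df, names_from = Sex, values_from = Pop)

# Back to long form, which suits aes(fill = Sex) in ggplot2.
long_again <- pivot_longer(wide_df, cols = c(Males, Females),
                           names_to = "Sex", values_to = "Pop")
```

For a pyramid drawn with a single bar layer and `fill = Sex`, the long form with one “Sex” column is the more convenient shape.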

Ways to overcome the challenges

  1. For data wrangling, one way is to first think about the visualisation that you want to create. Then work backwards: identify all the data required and think through how the code will process the data.

  2. To get a better idea of which graphs or charts to use, we first need to understand what information we want to highlight or show. Then go through the tidyverse ggplot2 guide to see the full list of charts it can produce, and work towards the final visualisation. This way, you do not have to keep changing code.

  3. The output size of the visualisation can be changed programmatically, but finding the right size is ultimately trial and error: the code has to be re-run multiple times to see how each change looks. In Tableau, we can adjust the size interactively by dragging the corners of the worksheets in dashboards. One possible workaround is to remove unnecessary charts that are empty or have a small data count.
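
One way the size could be set in code is sketched below, on made-up data. The width, height and font size are arbitrary starting points to adjust by trial and error, and `toy` and `out_file` are hypothetical names. If the page is knitted from R Markdown, the chunk options `fig.width` and `fig.height` play a similar role.

```r
library(tidyverse)

# Made-up pyramid data for two areas (males stored as negative values).
toy <- tibble(
  PA      = rep(c("Area A", "Area B"), each = 4),
  AG      = rep(c("00_to_04", "05_to_09"), times = 4),
  Sex     = rep(rep(c("Males", "Females"), each = 2), times = 2),
  new_Pop = c(-3, -4, 3, 5, -2, -1, 2, 2)
)

p <- ggplot(toy, aes(x = AG, y = new_Pop, fill = Sex)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~PA) +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 6))  # smaller age labels

# Save at an explicit size; the file extension selects the output format.
out_file <- file.path(tempdir(), "pyramid_sketch.pdf")
ggsave(out_file, plot = p, width = 12, height = 8, units = "in")
```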

Sketch

“DataViz Makeover 8”

Step-by-Step Guide

Installing and Launching R Packages

This code chunk installs the tidyverse packages if they are not already installed and loads them into the RStudio environment.

packages = c('tidyverse')

# Install each package if it is missing, then load it
for(p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

Importing Data

The data set has been downloaded and saved in the data sub-folder of this DataViz Makeover folder. Its filename is Demographic_2011to2019.csv and it is in csv file format. We read the csv file using read_csv.

demographic_data <- read_csv("data/Demographic_2011to2019.csv")
## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )
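
The column specification printed above can also be passed to read_csv explicitly, which suppresses the message and documents the expected types. A minimal sketch, using a temporary file with two made-up rows rather than the real data (`csv_path` and `toy_data` are hypothetical names):

```r
library(tidyverse)

# Two made-up rows with the same column names as the real file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "PA,SZ,AG,Sex,TOD,Pop,Time",
  "Area A,Subzone 1,0_to_4,Males,HDB,50,2019",
  "Area A,Subzone 1,0_to_4,Females,HDB,40,2019"
), csv_path)

# Read everything as text except the two numeric columns.
toy_data <- read_csv(
  csv_path,
  col_types = cols(
    .default = col_character(),
    Pop  = col_double(),
    Time = col_double()
  )
)
```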

Data Wrangling

The following code does two things:

  1. Converts the male population to negative values, so that the male bars extend to the left of zero in the pyramid diagram later.

  2. Divides the population count by 1,000, as the x-axis looked too cluttered with long numbers like 100,000 and 200,000.

# Inside with(), the columns can be referred to by name directly
demographic_data$new_Pop <- with(demographic_data, ifelse(Sex == "Males", -Pop/1000, Pop/1000))
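
The same step could also be written with dplyr's mutate and if_else. A small sketch on made-up numbers, where `toy` is a hypothetical stand-in for the real table:

```r
library(tidyverse)

# Made-up counts: 1,500 males and 2,500 females.
toy <- tibble(Sex = c("Males", "Females"), Pop = c(1500, 2500))

# Males become negative; both values are expressed in thousands.
toy <- toy %>%
  mutate(new_Pop = if_else(Sex == "Males", -Pop / 1000, Pop / 1000))
```

Here the male row ends up as -1.5 and the female row as 2.5.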

Creating the Age-Sex Pyramid

Now, we create the age-sex diagram for the year 2019. We use ggplot with a bar chart to construct it.

# Keep only the 2019 rows; the file covers 2011 to 2019
ggplot(filter(demographic_data, Time == 2019), aes(x = AG, y = new_Pop, fill = Sex)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  labs(y = "Population ('000)", x = "Age Group", title = "Population of Singapore Residents by Age Group in 2019") +
  scale_fill_manual(values = c("pink", "blue")) +
  scale_y_continuous(labels = abs)

Rectifying the incorrect order of Age Group in Y-axis

As we can see from the age-sex diagram above, the Y-axis is not sorted by age from young to old. Because AG is a character column, its values are sorted as text rather than by age, so “5_to_9” was placed between 45_to_49 and 50_to_54.

To solve this problem, we create a new column named “new_AG” and recode “5_to_9” to “05_to_09” using an if-else condition.

# Pad the single-digit label so that it sorts with the other age groups
demographic_data$new_AG <- with(demographic_data, ifelse(AG == "5_to_9", "05_to_09", AG))
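
An alternative to recoding the label is to turn the age group into a factor with explicit levels, which does not rely on how text is sorted (text ordering can vary with the locale). A hedged sketch on a handful of labels, assuming every label starts with a number followed by an underscore; `toy`, `age_labels` and `age_levels` are hypothetical names:

```r
library(tidyverse)

toy <- tibble(
  AG = c("50_to_54", "5_to_9", "45_to_49", "0_to_4", "90_and_over", "10_to_14")
)

# Order the labels by the number before the first underscore.
age_labels <- unique(toy$AG)
age_levels <- age_labels[order(as.numeric(sub("_.*", "", age_labels)))]

# ggplot2 draws discrete axes in factor-level order.
toy <- toy %>% mutate(AG = factor(AG, levels = age_levels))
levels(toy$AG)
```

With this approach the original labels stay unchanged on the axis.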

The Y-axis is now in the correct order:

# Keep only the 2019 rows; the file covers 2011 to 2019
ggplot(filter(demographic_data, Time == 2019), aes(x = new_AG, y = new_Pop, fill = Sex)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  labs(y = "Population ('000)", x = "Age Group", title = "Population of Singapore Residents by Age Group in 2019") +
  scale_fill_manual(values = c("pink", "blue")) +
  scale_y_continuous(labels = abs)

Visualising by Planning Area using facet_wrap in ggplot2

Using the same code, we can use facet_wrap in ggplot2 to split the diagram into one panel per Planning Area.

# Keep only the 2019 rows; the file covers 2011 to 2019
f <- ggplot(filter(demographic_data, Time == 2019), aes(x = new_AG, y = new_Pop, fill = Sex)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  labs(y = "Population ('000)", x = "Age Group", title = "Population of Singapore Residents by Age Group in 2019") +
  scale_fill_manual(values = c("pink", "blue")) +
  scale_y_continuous(labels = abs)

f + facet_wrap(~PA)
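
Before faceting, it can help to count how many panels will be drawn and to set the number of columns explicitly. A minimal sketch on made-up data with three areas; `toy`, `n_panels` and `g` are hypothetical names, and the `ncol` value is only an example:

```r
library(tidyverse)

# Made-up data: three areas, one age group each.
toy <- tibble(
  PA      = rep(c("Area A", "Area B", "Area C"), each = 2),
  Sex     = rep(c("Males", "Females"), times = 3),
  new_AG  = "00_to_04",
  new_Pop = c(-1, 1.2, -2, 2.1, -0.5, 0.4)
)

# facet_wrap(~PA) draws one panel per distinct value of PA.
n_panels <- n_distinct(toy$PA)

g <- ggplot(toy, aes(x = new_AG, y = new_Pop, fill = Sex)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~PA, ncol = 3)  # lay the panels out in three columns
```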

Remove empty or small population count plots

The facet wrap above has a total of 71 charts. Each chart is too small, and many of the plots are empty and therefore meaningless. Some also have too small a population count to make a meaningful visualisation.

As such, we will filter out planning areas that are empty or have a population count of less than 500.

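# Keep the 2019 rows, then keep only Planning Areas with more than 500 residents.
# new_Pop is in thousands and negative for males, so sum the absolute values.
demographic_filter <- demographic_data %>%
  select(PA, Sex, new_AG, new_Pop, Time) %>%
  filter(Time == 2019) %>%
  group_by(PA) %>%
  filter(sum(abs(new_Pop)) > 0.5) %>%
  ungroup()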
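
Because new_Pop stores males as negative values, a plain sum within a planning area gives females minus males rather than the headcount. The sketch below, on made-up numbers in thousands, compares the signed sum with the sum of absolute values; `toy`, `totals` and `kept` are hypothetical names.

```r
library(tidyverse)

# Made-up areas: A has about 20,200 residents, B has about 300.
toy <- tibble(
  PA      = c("Area A", "Area A", "Area B", "Area B"),
  Sex     = c("Males", "Females", "Males", "Females"),
  new_Pop = c(-10, 10.2, -0.1, 0.2)
)

totals <- toy %>%
  group_by(PA) %>%
  summarise(
    signed_sum = sum(new_Pop),       # females minus males
    total      = sum(abs(new_Pop)),  # actual headcount in thousands
    .groups    = "drop"
  )

# Only the headcount reflects the "more than 500 residents" rule.
kept <- totals %>% filter(total > 0.5)
```

In this toy case Area A has a signed sum of only about 0.2 despite its large population, so a threshold applied to the signed sum would drop it.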

The Final Visualisation

p <- ggplot(demographic_filter, aes(x = new_AG, y = new_Pop, fill = Sex)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  labs(y = "Population ('000)", x = "Age Group", title = "Population of Singapore Residents by Age Group in 2019") +
  scale_fill_manual(values = c("pink", "blue")) +
  scale_y_continuous(labels = abs)

p + facet_wrap(~PA)

This visualisation represents the demographic structure of Singapore residents in the various Planning Areas in 2019. Each age-sex pyramid shows the distribution of males and females in one of the planning areas that remain after filtering. The Age Group on the Y-axis is sorted from youngest at the bottom to oldest at the top.

Four useful observations

  1. Bedok, Choa Chu Kang, Hougang, Sengkang and Tampines have more residents than the other planning areas, while Tanglin, River Valley, Novena and Marine Parade have fewer residents.

  2. Ang Mo Kio, Bedok, Bukit Batok, Bukit Merah and Toa Payoh are some of the mature estates, as the top halves of their pyramids are fatter than the bottom halves.

  3. Sengkang and Punggol are the two areas with more young children than elderly residents.

  4. There is not much difference between the number of males and females in each planning area.

Reflection: three major advantages of building the data visualisation in R as compared to using Tableau

  1. The visualisations are easy to replicate, tweak or customise once the initial code for visualisation is done.

  2. Doing a facet wrap in Tableau involves some complicated steps and creativity, while in R we can do it with a single line of code.

  3. In R, you are in control of the data wrangling, calculations and aggregates. In Tableau, by contrast, there is a hierarchy of aggregate and level of detail expressions to follow.
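
As an illustration of the first point, the plotting code could be wrapped in a function so that the same pyramid is redrawn for another year with a single call. This is a sketch under assumptions, not the code used for the charts above: `plot_pyramid` and `toy` are hypothetical names and the data are made up.

```r
library(tidyverse)

# Hypothetical helper: draw an age-sex pyramid for one year of data.
plot_pyramid <- function(data, year) {
  data %>%
    filter(Time == year) %>%
    ggplot(aes(x = AG, y = new_Pop, fill = Sex)) +
    geom_col() +
    coord_flip() +
    theme_minimal() +
    labs(y = "Population ('000)", x = "Age Group",
         title = paste("Population by Age Group in", year)) +
    scale_y_continuous(labels = abs)  # show male values as positive
}

# Made-up data covering two years.
toy <- tibble(
  AG      = rep("00_to_04", 4),
  Sex     = rep(c("Males", "Females"), times = 2),
  new_Pop = c(-1.0, 1.1, -1.2, 1.3),
  Time    = c(2018, 2018, 2019, 2019)
)

p_2018 <- plot_pyramid(toy, 2018)
p_2019 <- plot_pyramid(toy, 2019)
```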