1. Overview

The purpose of this data visualisation to reveal the demographic structure of the Singapore population by age cohort and by planning area in 2019. Data (Singapore Residents by Planning AreaSubzone, Age Group, Sex and Type of Dwelling, June 2011-2020) is selected from Department Of Statistics Singapore. The data contains the following columns: Planning Areas, Subzones, Age Group, Sex, Type of Dwelling, Population and Year

2. Major Data and Design Challenges

The data consists of records from 2011 to 2020. Therefore, it is hard to use all the data in the CSV file as we only want to focus on 2019 data. As the objective of the visualisation to reveal the demographic structure of the Singapore population by age cohort and by planning area, columns are such as sex, type of dwelling may not be useful to the visualisation.

There are a total of 19 age groups starting from 0 - 4 years old to 90 years old and above. By seeing the age group alone, it is hard to get clear insights from the age group alone or from data visualisations which produced from those age groups. The data is desaturated by the age groups that make it hard for viewers to compare values between categories.

It is also a challenging task to design a visualisation that capable of taking in 40 over planning areas and 19 age groups and ensuring useful insights are produced at the same time. Plots such as bar plot or pie chart may not give clear visualisations which provide clear insights.

3. Solutions to overcome the Challenges

Data preparation is required to produce useful and detail insights. The data is filtered to 2019 data and a new column - age group category (AGCategory) is created to generalise the age groups. To ensure viewer can compare data easily, the age groups are generalised to 4 groups: children (0 - 14 years old), Young Adult (15 - 29 years old), Adult (30 - 59 years old) and elderly (60 and above). The data then is grouped by planning area and age group category. Population are summarised according to the grouped data. After the actions above carried out, the data is now only consist of planning areas, age group category and population which means columns such as sex and type of dwelling are processed to fit the new data structure.

To provide enough, clear and detail insights to the viewer, a combinition of heatmap and barcharts is used like the sketch below:

sketch

4. Step-by-step Data Visualization Guide

packages = c('ggplot2','tidyverse','gridExtra','grid')

for(p in packages){
  if(!require(p,character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

data <- read_csv("data/data.csv")

data<-data%>%
    filter(YEAR==2019)

data$AGCategory <- ifelse(data$AG %in% c('0_to_4','5_to_9','10_to_14'), "Children", 
                    ifelse(data$AG %in% c('15_to_19','20_to_24','25_to_29'), "Young Adult",
                     ifelse(data$AG %in% c('30_to_34', '35_to_39','40_to_44','45_to_49',"50_to_54","55_to_59"), "Adult",  "Elderly")))


data$AGCategory <- factor(data$AGCategory,levels = c("Children","Young Adult","Adult","Elderly"))

data<-data%>%
    group_by(PA,AGCategory) %>%
  summarise(POP = sum(POP))


addUnits <- function(n) {
  labels <- ifelse(n < 1000, n,  # less than thousands
                   ifelse(n < 1e6, paste0(round(n/1e3,3), 'K'),  # in thousands
                          ifelse(n < 1e9, paste0(round(n/1e6,3), 'M'),  # in millions
                                 ifelse(n < 1e12, paste0(round(n/1e9,3), 'B'), # in billions
                                        ifelse(n < 1e15, paste0(round(n/1e12,3), 'T'), # in trillions
                                               'too big!'
                                        )))))
  return(labels)
}



hm<-ggplot(data, aes(AGCategory,PA))+ 
    geom_tile(aes(fill= POP)) +
    geom_text(aes(label = addUnits(POP)))+
    scale_fill_gradient(low="#19547b", high="#ffd89b",name = "Population", labels = addUnits)+
     theme(panel.background = element_blank(),
          plot.background = element_blank(),
          legend.position = "bottom", 
          legend.direction = "horizontal",
          legend.title = element_text(size = 15), 
          legend.key.size = unit(1,"cm"),
          legend.text = element_text(size = 7)) 

tmp <- ggplot_gtable(ggplot_build(hm))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]

hm.clean <- hm +
    theme(axis.title.y = element_blank(), 
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank(), 
          axis.title.x = element_blank(),
          axis.text.x = element_blank(), 
          axis.ticks.x = element_blank(),
          legend.position="none")

bp.x <- ggplot(data, aes(x = factor(AGCategory), y = POP,label = POP)) + 
    geom_bar(stat = "identity") + 
    geom_text(aes(label = addUnits(stat(y)), group = AGCategory),stat = 'summary', fun = sum, vjust = -0.5, size=3)+
    theme_gray() +
    scale_x_discrete(position = "top") +
    theme(
          panel.background = element_blank(),
          plot.background = element_blank(),
          axis.title.y = element_blank(), 
          axis.text.y = element_blank(), 
          axis.ticks.y = element_blank(), 
          axis.text.x = element_text(size = 10), 
          axis.title.x = element_text(size = 15),
          axis.text.x.top = element_text(margin = margin( t = 10)),
          legend.position = "none") +
          labs(x = "Age Group Category")

bp.y <- ggplot(data, aes(x = PA, y = POP)) + 
        geom_bar(stat = "identity") + 
        geom_text(aes(label = addUnits(stat(y)), group = PA),stat = 'summary', fun = sum, hjust = -0.2 , size=3)+
        expand_limits(y = 350000) + 
        coord_flip() + 
        theme_gray() +
        scale_x_discrete(position = "top") +
        theme(
              panel.background = element_blank(),
              plot.background = element_blank(),
              axis.title.x = element_blank(), 
              axis.text.x = element_blank(),
              axis.ticks.x = element_blank(), 
              axis.text.y = element_text(size = 10), 
              axis.title.y = element_text(size = 15),
              axis.text.y.left = element_text(margin = margin(20)),
              legend.position="none") +
              labs(x = "Planning Area")


grob.title <- textGrob(expression(bold(underline("Demographic Structure Of Singapore Population by Age Group Category by Planning Area"))),gp = gpar(fontsize = 15))

grob.caption <- textGrob("Data source: www.singstat.gov.sg",gp = gpar(fontsize = 10))

grid.arrange(bp.x, legend, hm.clean, bp.y, nrow = 2, ncol = 2, widths = c(40, 40), heights = c(10, 30),top = grob.title, bottom = grob.caption)

4.1 Load packages

packages = c('ggplot2','tidyverse','gridExtra','grid')

for(p in packages){
  if(!require(p,character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

4.2 Load data file

data <- read_csv("data/data.csv")

4.3 Filter 2019 data

data<-data%>%
    filter(YEAR==2019)

4.4 Create new column AGCategory to generalise the age group

data$AGCategory <- ifelse(data$AG %in% c('0_to_4','5_to_9','10_to_14'), "Children", 
                    ifelse(data$AG %in% c('15_to_19','20_to_24','25_to_29'), "Young Adult",
                     ifelse(data$AG %in% c('30_to_34', '35_to_39','40_to_44','45_to_49',"50_to_54","55_to_59"), "Adult",  "Elderly")))

4.5 Rearrange the order for AGCategory for barplot

data$AGCategory <- factor(data$AGCategory,levels = c("Children","Young Adult","Adult","Elderly"))

4.6 Group the column value and summarise the population

data<-data%>%
    group_by(PA,AGCategory) %>%
  summarise(POP = sum(POP))

4.7 Create function to add units

To make the visualisation more readable, instead of having many zeros, a new function is created to add units.

addUnits <- function(n) {
  labels <- ifelse(n < 1000, n,  # less than thousands
                   ifelse(n < 1e6, paste0(round(n/1e3,3), 'K'),  # in thousands
                          ifelse(n < 1e9, paste0(round(n/1e6,3), 'M'),  # in millions
                                 ifelse(n < 1e12, paste0(round(n/1e9,3), 'B'), # in billions
                                        ifelse(n < 1e15, paste0(round(n/1e12,3), 'T'), # in trillions
                                               'too big!'
                                        )))))
  return(labels)
}

4.8 Create heatmap

hm<-ggplot(data, aes(AGCategory,PA))+ 
    geom_tile(aes(fill= POP)) +
    geom_text(aes(label = addUnits(POP)))+
    scale_fill_gradient(low="#19547b", high="#ffd89b",name = "Population", labels = addUnits)+
     theme(panel.background = element_blank(),
          plot.background = element_blank(),
          legend.position = "bottom", 
          legend.direction = "horizontal",
          legend.title = element_text(size = 15), 
          legend.key.size = unit(1,"cm"),
          legend.text = element_text(size = 7)) 

4.9 Substract legend for heatmap for later use

tmp <- ggplot_gtable(ggplot_build(hm))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]

4.10 Substract axis and axis title from the heatmap

hm.clean <- hm +
    theme(axis.title.y = element_blank(), 
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank(), 
          axis.title.x = element_blank(),
          axis.text.x = element_blank(), 
          axis.ticks.x = element_blank(),
          legend.position="none")

4.11 Create bar plot to show population by age group category

bp.x <- ggplot(data, aes(x = factor(AGCategory), y = POP,label = POP)) + 
    geom_bar(stat = "identity") + 
    geom_text(aes(label = addUnits(stat(y)), group = AGCategory),stat = 'summary', fun = sum, vjust = -0.5, size=3)+
    theme_gray() +
    scale_x_discrete(position = "top") +
    theme(
          panel.background = element_blank(),
          plot.background = element_blank(),
          axis.title.y = element_blank(), 
          axis.text.y = element_blank(), 
          axis.ticks.y = element_blank(), 
          axis.text.x = element_text(size = 10), 
          axis.title.x = element_text(size = 15),
          axis.text.x.top = element_text(margin = margin( t = 10)),
          legend.position = "none") +
          labs(x = "Age Group Category")

4.12 Create bar plot to show population by planning area

bp.y <- ggplot(data, aes(x = PA, y = POP)) + 
        geom_bar(stat = "identity") + 
        geom_text(aes(label = addUnits(stat(y)), group = PA),stat = 'summary', fun = sum, hjust = -0.2 , size=3)+
        expand_limits(y = 350000) + 
        coord_flip() + 
        theme_gray() +
        scale_x_discrete(position = "top") +
        theme(
              panel.background = element_blank(),
              plot.background = element_blank(),
              axis.title.x = element_blank(), 
              axis.text.x = element_blank(),
              axis.ticks.x = element_blank(), 
              axis.text.y = element_text(size = 10), 
              axis.title.y = element_text(size = 15),
              axis.text.y.left = element_text(margin = margin(20)),
              legend.position="none") +
              labs(x = "Planning Area")

4.13 Puzzle heatmap, bar plots and legend to one visualisation.

grob.title <- textGrob(expression(bold(underline("Demographic Structure Of Singapore Population by Age Group Category by Planning Area"))),gp = gpar(fontsize = 15))

grob.caption <- textGrob("Data source: www.singstat.gov.sg",gp = gpar(fontsize = 10))

grid.arrange(bp.x, legend, hm.clean, bp.y, nrow = 2, ncol = 2, widths = c(40, 40), heights = c(10, 30),top = grob.title, bottom = grob.caption)

5. Final data visualization

This visualisation is a combination of heatmap and bar plot. The reader can see the population of each age group category and by each planning area. The different colour representation allows the viewer to make comparisons easily.

This visualisation is also capable of showing the total population of each age group category and planning area. More than seeing the numbers only, bar plot make it easy for the reader to spot planning areas/ age group categories that have more people.

The bar plot is not sorted for a purpose. They are arranged in alphabetical order, so it is easier for the reader to find a specific planning area in the visualisation.

6. Insights

  1. Insight 1: Bedok has the highest population compared to the rest of the planning area (279.97 thousand).

  2. Insight 2: A large proportion of the population are adults (1.818 million).

  3. Insight 3: Adults tend to live at planning areas such as Jurong West, Hougang, SengKang, Bedok, Tampines, Woodlands and Yishun which are transport convenient or close to office space.

  4. Insight 4: Even though more people are living in Bedok, there are more adults are living in Jurong West (121.73 thousand) than Bedok (121.03 thousand).