1.0 Overview

This Data Visualisation (DataViz) is a static visualisation that reveals the demographic structure of Singapore’s population by age cohort and by planning area in 2019. The data used in this DataViz are obtained from www.singstat.gov.sg: the Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling, June 2011-2019 data series (https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data).


2.0 Data and Design Challenges

Given the large number of planning areas, it would be difficult to visualise the demographic structure of every planning area in one single chart. On the other hand, presenting the data in 55 separate standalone charts (e.g. 55 standalone population pyramid charts) does not help the reader to quickly obtain a global picture of the demographic structures across the 55 planning areas. To address these challenges, it is proposed to design multiple trellis plots where each plot contains a handful of small population pyramids. Each trellis plot represents one of the 5 regions in Singapore (North, North East, West, East and Central regions) and each population pyramid within the trellis plot shows the demographic structure of a planning area belonging to that region.


2.1 Sketch of Proposed DataViz

An example of a trellis plot for the North region of Singapore is shown in the sketch below.


3.0 DataViz Step-by-Step

3.1 Install and Load R packages

Install (if not already installed) and load the tidyverse and ggplot2 R packages using the code shown below.

packages <- c('tidyverse', 'ggplot2')

for (p in packages) {
  #install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

3.2 Load Data

pop_data <- read_csv('respopagesextod2011to2019.csv')

3.3 Data Pre-processing

Use the code below to examine the structure and characteristics of the data frame before pre-processing. The required pre-processing steps are:

  1. Delete unwanted columns.
  2. Rename the columns.
  3. Reverse the polarity of the population counts for females.
  4. Create levels in the Age Group factor.
  5. Create subsets of the data frame.
#show DataFrame structure
str(pop_data)
## tibble [883,728 x 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ PA  : chr [1:883728] "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" ...
##  $ SZ  : chr [1:883728] "Ang Mo Kio Town Centre" "Ang Mo Kio Town Centre" "Ang Mo Kio Town Centre" "Ang Mo Kio Town Centre" ...
##  $ AG  : chr [1:883728] "0_to_4" "0_to_4" "0_to_4" "0_to_4" ...
##  $ Sex : chr [1:883728] "Males" "Males" "Males" "Males" ...
##  $ TOD : chr [1:883728] "HDB 1- and 2-Room Flats" "HDB 3-Room Flats" "HDB 4-Room Flats" "HDB 5-Room and Executive Flats" ...
##  $ Pop : num [1:883728] 0 10 30 50 0 0 40 0 0 10 ...
##  $ Time: num [1:883728] 2011 2011 2011 2011 2011 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   PA = col_character(),
##   ..   SZ = col_character(),
##   ..   AG = col_character(),
##   ..   Sex = col_character(),
##   ..   TOD = col_character(),
##   ..   Pop = col_double(),
##   ..   Time = col_double()
##   .. )
#delete unwanted columns (column 2 = SZ, column 5 = TOD)
pop_data <- pop_data[,-c(2,5)]
str(pop_data)
## tibble [883,728 x 5] (S3: tbl_df/tbl/data.frame)
##  $ PA  : chr [1:883728] "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" ...
##  $ AG  : chr [1:883728] "0_to_4" "0_to_4" "0_to_4" "0_to_4" ...
##  $ Sex : chr [1:883728] "Males" "Males" "Males" "Males" ...
##  $ Pop : num [1:883728] 0 10 30 50 0 0 40 0 0 10 ...
##  $ Time: num [1:883728] 2011 2011 2011 2011 2011 ...

Since we only want to visualise the population demographics in 2019, filter the data frame to keep only the 2019 data, then drop the Time column, which is no longer needed.

pop_data <- pop_data %>%
  filter(Time == 2019)
#drop the Time column (column 5), which now only contains 2019
pop_data <- pop_data[,-c(5)]
str(pop_data)
## tibble [98,192 x 4] (S3: tbl_df/tbl/data.frame)
##  $ PA : chr [1:98192] "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" ...
##  $ AG : chr [1:98192] "0_to_4" "0_to_4" "0_to_4" "0_to_4" ...
##  $ Sex: chr [1:98192] "Males" "Males" "Males" "Males" ...
##  $ Pop: num [1:98192] 0 10 10 20 0 0 50 0 0 10 ...

Rename the columns. This improves the readability of the subsequent code.

names(pop_data) <- c('PlanningArea', 'AgeGroup', 'Gender', 'Population')
str(pop_data)
## tibble [98,192 x 4] (S3: tbl_df/tbl/data.frame)
##  $ PlanningArea: chr [1:98192] "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" ...
##  $ AgeGroup    : chr [1:98192] "0_to_4" "0_to_4" "0_to_4" "0_to_4" ...
##  $ Gender      : chr [1:98192] "Males" "Males" "Males" "Males" ...
##  $ Population  : num [1:98192] 0 10 10 20 0 0 50 0 0 10 ...
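With the subzone and dwelling-type columns removed, several rows now share the same planning area, age group and gender (the first rows above are all Ang Mo Kio, 0_to_4, Males). The plotting step later relies on the bars being stacked, which adds these rows up implicitly. As a hedged sketch only, the same totals could be made explicit with base R's aggregate(); the toy data frame and its values below are invented for illustration and are not taken from the real dataset.

```r
# Toy stand-in for the renamed pop_data; the counts are made up.
toy <- data.frame(
  PlanningArea = c("Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio"),
  AgeGroup     = c("0_to_4", "0_to_4", "0_to_4", "0_to_4"),
  Gender       = c("Males", "Males", "Females", "Females"),
  Population   = c(10, 20, 30, 5)
)

# Collapse to one row per planning area, age group and gender.
toy_totals <- aggregate(
  Population ~ PlanningArea + AgeGroup + Gender,
  data = toy,
  FUN  = sum
)

print(toy_totals)
```

Aggregating first is optional here; it mainly makes the totals easier to inspect before plotting.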

In the population pyramid, the population counts for females and males are split at the origin (x=0). For the bars to extend in opposite directions during plotting, the females’ population counts have to be negative. Thus, let’s reverse the polarity of the females’ population counts using the code below.

pop_data$Population <- ifelse(pop_data$Gender == 'Females', -1 * pop_data$Population, pop_data$Population)
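To see what the sign flip does, here is a minimal sketch on a toy data frame (the name `toy` and its counts are hypothetical). It also shows a variant, not used in the original code, whose result does not change if the statement is accidentally run twice, because the sign is derived from Gender alone.

```r
# Toy stand-in for pop_data; the counts are invented for illustration.
toy <- data.frame(
  Gender     = c("Males", "Females", "Males", "Females"),
  Population = c(120, 110, 0, 40)
)

# Same expression as in the text: negate the female counts only.
flipped <- ifelse(toy$Gender == "Females", -1 * toy$Population, toy$Population)

# Variant that is safe to re-run: the sign depends only on Gender.
flipped_safe <- ifelse(toy$Gender == "Females",
                       -abs(toy$Population),
                       abs(toy$Population))

print(flipped)
print(flipped_safe)
```

Applying the original expression a second time would turn the female counts positive again, so it is worth running that line exactly once per session.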

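Assign levels to the Age Group factor to dictate the order in which the age groups are displayed along the age axis of the plots. Using unique() keeps the levels in their order of first appearance in the data, which is already sorted from youngest to oldest.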

pop_data$AgeGroup <- factor(pop_data$AgeGroup, levels = unique(pop_data$AgeGroup))
str(pop_data)
## tibble [98,192 x 4] (S3: tbl_df/tbl/data.frame)
##  $ PlanningArea: chr [1:98192] "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" "Ang Mo Kio" ...
##  $ AgeGroup    : Factor w/ 19 levels "0_to_4","5_to_9",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Gender      : chr [1:98192] "Males" "Males" "Males" "Males" ...
##  $ Population  : num [1:98192] 0 10 10 20 0 0 50 0 0 -10 ...

Check that the order is correct.

levels(pop_data$AgeGroup)
##  [1] "0_to_4"      "5_to_9"      "10_to_14"    "15_to_19"    "20_to_24"   
##  [6] "25_to_29"    "30_to_34"    "35_to_39"    "40_to_44"    "45_to_49"   
## [11] "50_to_54"    "55_to_59"    "60_to_64"    "65_to_69"    "70_to_74"   
## [16] "75_to_79"    "80_to_84"    "85_to_89"    "90_and_over"
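The reason for passing the levels explicitly can be illustrated with a small sketch. Without the levels argument, factor() sorts the labels as text, which places "10_to_14" before "5_to_9". The vector below is a made-up sample of age group labels, not the real column.

```r
# A small, made-up sample of age group labels in youngest-to-oldest order.
age_groups <- c("0_to_4", "0_to_4", "5_to_9", "10_to_14", "5_to_9")

# Default behaviour: levels are sorted as character strings.
default_order <- factor(age_groups)

# Behaviour used in the text: levels follow their first appearance.
data_order <- factor(age_groups, levels = unique(age_groups))

print(levels(default_order))
print(levels(data_order))
```

This approach assumes the rows arrive sorted by age; if they did not, the levels would need to be listed by hand.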

Create 5 dataframes, each representing one planning region in Singapore, according to the planning area information available from Wikipedia (https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore). The code used to create the dataframe for the North region is shown below: the rows belonging to the planning areas in the specified region are selected and copied to a new dataframe named ‘North’. Similar code is used to create the remaining 4 dataframes.

#For North region
North <- pop_data[pop_data$PlanningArea %in% c("Central Water Catchment", "Lim Chu Kang", "Mandai", "Sembawang",
                                               "Simpang", "Sungei Kadut", "Woodlands", "Yishun"), ]

# reuse the above chunk of code for the remaining regions, changing the dataframe name and the planning areas in the region.
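Since the same selection is repeated for every region, one possible way to avoid copying the code five times is a small helper function. The sketch below is only an illustration: `subset_region` and `toy` are hypothetical names, the counts are made up, and only the North planning areas listed in the code above are used.

```r
# Planning areas of the North region, as listed in the code above.
north_areas <- c("Central Water Catchment", "Lim Chu Kang", "Mandai",
                 "Sembawang", "Simpang", "Sungei Kadut", "Woodlands", "Yishun")

# Hypothetical helper: keep only the rows whose planning area is in `areas`.
subset_region <- function(data, areas) {
  data[data$PlanningArea %in% areas, , drop = FALSE]
}

# Toy stand-in for pop_data; the counts are invented for illustration.
toy <- data.frame(
  PlanningArea = c("Yishun", "Bedok", "Woodlands", "Yishun"),
  Population   = c(10, 20, 30, -40)
)

north_toy <- subset_region(toy, north_areas)
print(north_toy)
```

With the real data, the call would be along the lines of `North <- subset_region(pop_data, north_areas)`, with one vector of planning areas per region.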

3.4 Creating the DataViz

With the data prepared, the population pyramids for the North region are created using the following code. By default, facet_wrap omits panels for factor levels that do not occur in the data, so the drop argument is set to FALSE with the aim of keeping a panel for every planning area, including those with zero population. Similar code is used to create the population pyramids for the remaining regions.

#Create the pyramids for the North region.
#To create the pyramids for other regions, replace 'North' with the region's dataframe name, e.g. 'NorthEast'.

ggplot() +
  #females: negative counts, drawn on one side of the origin
  geom_bar(data = subset(North, Gender == 'Females'), aes(x = AgeGroup, y = Population), stat = 'identity', fill = 'red') +
  #males: positive counts, drawn on the other side of the origin
  geom_bar(data = subset(North, Gender == 'Males'), aes(x = AgeGroup, y = Population), stat = 'identity', fill = 'blue') +
  #label the population axis in thousands, without a negative sign on the female side
  scale_y_continuous(breaks = seq(-25000, 25000, 5000),
                     labels = as.character(c(seq(25, 0, -5), seq(5, 25, 5)))) +
  coord_flip() +
  facet_wrap(~PlanningArea, drop = FALSE, ncol = 3) +
  ggtitle("Singapore Population Pyramids by Age Cohort by Planning Area, 2019 (North Region)\n Red:Females  Blue:Males ") +
  xlab("Age Group") +
  ylab('Population in Thousands') +
  theme(legend.position = 'none')
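Because the same plot is repeated for each region, the plotting code could be wrapped in a function instead of being edited by hand. The following is a sketch under stated assumptions, not the code used to produce the final charts: `plot_region_pyramids`, its `limit` and `step` parameters and the toy data are hypothetical, and it maps Gender to the fill colour in a single layer (which adds a legend) rather than using two separate layers.

```r
library(ggplot2)

# Hypothetical helper: draws the trellis of population pyramids for one region.
# `region_data` needs the columns PlanningArea, AgeGroup, Gender and Population,
# with female counts already negated.
plot_region_pyramids <- function(region_data, region_name,
                                 limit = 25000, step = 5000) {
  breaks <- seq(-limit, limit, step)
  ggplot(region_data, aes(x = AgeGroup, y = Population, fill = Gender)) +
    geom_col() +
    scale_fill_manual(values = c(Females = "red", Males = "blue")) +
    # Show magnitudes in thousands on both sides of the origin.
    scale_y_continuous(breaks = breaks,
                       labels = as.character(abs(breaks) / 1000)) +
    coord_flip() +
    facet_wrap(~PlanningArea, drop = FALSE, ncol = 3) +
    labs(title = paste0("Singapore Population Pyramids by Age Cohort ",
                        "by Planning Area, 2019 (", region_name, " Region)"),
         x = "Age Group", y = "Population in Thousands")
}

# Toy data with made-up counts, only to show the call.
toy <- data.frame(
  PlanningArea = rep(c("Woodlands", "Yishun"), each = 4),
  AgeGroup     = factor(rep(c("0_to_4", "5_to_9"), times = 4),
                        levels = c("0_to_4", "5_to_9")),
  Gender       = rep(c("Males", "Males", "Females", "Females"), times = 2),
  Population   = c(5000, 6000, -4800, -5900, 4000, 4500, -3900, -4400)
)

p <- plot_region_pyramids(toy, "North")
# print(p) would render the chart.
```

With the real data, the call would be along the lines of `plot_region_pyramids(North, "North")`, repeated once per region dataframe.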

4.0 Final Visualisation

The DataViz below comprises 5 trellis plots. Each represents one region of Singapore and contains the population pyramids of all the planning areas belonging to that region. The scales of the axes are fixed across the population pyramids to facilitate easy and accurate visual comparisons.

At first glance, one can easily tell that some planning areas have no population pyramid. These are non-residential areas. The second insight is that some planning areas have very similar demographic structures. For example, Punggol’s and Sengkang’s structures are similar, with a large proportion in the 30 to 45 and below 10 age groups, suggesting that these areas contain a high proportion of families with young children. On the other hand, Ang Mo Kio’s and Bishan’s are similar, with a larger proportion in the 50 to 70 age groups and a shrinking proportion as the age group gets younger. If this trend continues for the next 10-20 years, these two areas will require more infrastructure and amenities to support the aged residents.

4.1 North region of Singapore

4.2 North East region of Singapore

4.3 East region of Singapore

4.4 West region of Singapore

4.5 Central region of Singapore