1. Overview

Using the 2019 Singapore population data published by SingStat, this visualisation aims to reveal the demographic structure of Singapore's population by age cohort and by planning area, and to draw insights from it.

1.1 About the data

The data consist of Age, Planning Area (PA), Subzone, Gender, Type of Dwelling and Population size. Planning areas refer to areas demarcated in the Urban Redevelopment Authority’s Master Plan 2004, and the population numbers are rounded to the nearest 10.

2. Proposed Design

To visualise the demographic structure, an age-gender pyramid chart and a population ternary plot are proposed:

  • Age-gender pyramid
    • A classic chart commonly used to show the age structure of a population, as it plots gender against age cohort. With the planning area data, a pyramid can be plotted for each individual area.
  • Ternary Plot
    • Compared with a normal two-axis plot, a ternary plot can show three variables. The data are grouped into Young, Active and Old to create the three variables.
    • A static ternary plot can be difficult to read, so an interactive chart is proposed. With tooltips, hovering over a point displays its values.

2.1 Sketch Design

With the idea in hand, a sketch is created as follows:

3. Step by Step Detail

3.1 Install R Packages

To create the age-gender pyramid and the ternary plot, the following packages are required:

  • tidyverse: Package for data manipulation and exploration
  • ggplot2: Create Elegant Data Visualisation using the grammar of graphics
  • plotly: Create interactive web graphics
  • reshape2: Flexibly Reshape Data
  • ggtern: Extension to ggplot2 for creation of Ternary Diagrams

Important point: ggtern is not compatible with the current version of ggplot2. The ggtern package is therefore not installed at this initial stage; it will be installed when the ternary chart is created.

packages <- c('tidyverse', 'ggplot2','plotly','reshape2')

for (p in packages){
  if (!require(p,character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}
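
Before installing anything, it can help to see which packages are actually missing. The following is a minimal sketch using base R only, assuming the same package vector as above; `is_installed` and `missing_pkgs` are illustrative names, not part of the original code.

```r
packages <- c("tidyverse", "ggplot2", "plotly", "reshape2")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package is not installed.
is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- packages[!is_installed]

# Report only; installation is left as an explicit, separate step.
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Reporting first makes it easier to spot an unexpected package, such as ggtern, before it is installed alongside ggplot2.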

3.2 Load the Data

Load the data using the read_csv() function. Preparation and cleaning of the data were done in Excel. After the data is loaded, we can proceed to create the age-gender pyramid chart.

#Reading the data into R environment 
pop<-read_csv('pop2019.csv')
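
A few quick checks after loading can confirm that the file has the expected shape. The sketch below uses a toy data frame in place of pop2019.csv, since the file itself is not shown here. The column names are taken from the code later in this post, while the row values and the dwelling label are hypothetical.

```r
# Toy rows standing in for pop2019.csv (values are made up for illustration).
pop_toy <- data.frame(
  PA = c("Ang Mo Kio", "Ang Mo Kio"),
  Subzone = c("Cheng San", "Cheng San"),
  Age = c("0_to_4", "5_to_9"),
  Gender = c("Males", "Females"),
  Dwelling_Type = c("HDB", "HDB"),  # hypothetical label
  Population = c(120, 130)
)

# Columns that the later plotting and reshaping code relies on.
required <- c("PA", "Subzone", "Age", "Gender", "Dwelling_Type", "Population")
has_columns <- all(required %in% names(pop_toy))

# The source states that population numbers are rounded to the nearest 10.
rounded_ok <- all(pop_toy$Population %% 10 == 0)
non_negative <- all(pop_toy$Population >= 0)
```

The same three checks can be run on `pop` once the real file is loaded.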

3.3 Plot Age-Gender Pyramid Chart

When the data is loaded, Age is read as text, so R cannot tell the intended order of the age cohorts. The cohorts are therefore first sorted manually in ascending order. Age is then plotted against Gender to show the population structure of Singapore. To obtain a pyramid chart for each planning area, the same code is used with the facet_wrap() function added.
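
The ordering problem can be seen with a toy vector. This is a small base R sketch, assuming age labels in the same style as the data set; it uses only four labels to keep the example short.

```r
# Toy age labels in the same style as the data set.
age_labels <- c("10_to_14", "0_to_4", "5_to_9", "90_and_over")

# Plain character sorting compares text, so "10_to_14" lands before "5_to_9".
sort(age_labels)

# Declaring the levels explicitly gives the intended cohort order.
age_levels <- c("0_to_4", "5_to_9", "10_to_14", "90_and_over")
age_factor <- factor(age_labels, levels = age_levels)
sorted_ages <- as.character(sort(age_factor))
sorted_ages
```

Because ggplot2 draws a factor in level order, setting the levels once is enough to fix the axis order in every later chart.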

#Turn off scientific notation; an alternative is scale_y_continuous(labels = scales::comma)
options(scipen = 999)
pop$Age<-factor(pop$Age,levels=c('0_to_4',
                                 '5_to_9',
                                 '10_to_14',
                                 '15_to_19',
                                 '20_to_24',
                                 '25_to_29',
                                 '30_to_34',
                                 '35_to_39',
                                 '40_to_44',
                                 '45_to_49',
                                 '50_to_54',
                                 '55_to_59',
                                 '60_to_64',
                                 '65_to_69',
                                 '70_to_74',
                                 '75_to_79',
                                 '80_to_84',
                                 '85_to_89',
                                 '90_and_over'))

# to set the x axis label 
brks<-seq(-400000,400000,50000)
lbls<-paste0(as.character(c(seq(400,0,-50),seq(50,400,50))),'K')

ggplot(data=pop,aes(x=Age,fill=Gender,y=ifelse(Gender=='Males',-Population,Population)))+
  geom_bar(stat='identity')+
  scale_y_continuous(breaks=brks,
                     labels=lbls)+
  coord_flip()+
  labs(y='Population', title='Age-Gender Pyramid Chart 2019')+
  scale_fill_brewer(palette = 'Set1')+
  theme_bw()
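
The axis labels above are typed out as two sequences that mirror the breaks. As a possible alternative, the labels can be derived from the breaks themselves, so the two cannot drift apart if the range changes. This is a sketch in base R, not the original approach.

```r
# Breaks run from -400,000 to 400,000 in steps of 50,000.
brks <- seq(-400000, 400000, 50000)

# abs() removes the sign used to mirror one gender, and dividing by 1000
# expresses the value in thousands.
lbls <- paste0(abs(brks) / 1000, "K")
```

The resulting vector is the same as the one built with the two seq() calls.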

#Set the figure(Chart) size, to avoid having chart too small to be read. 

brks<-seq(-400000,400000,50000)
lbls<-paste0(as.character(c(seq(400,0,-50),seq(50,400,50))),'K')

#To create a diverging bar chart instead of a stacked one, make one gender's population negative (here, Females).
ggplot(data=pop,aes(x=Age,fill=Gender,y=ifelse(Gender=='Males',Population,-Population)))+
  geom_bar(stat='identity')+
  facet_wrap(~PA)+
  scale_y_continuous(breaks=brks,
                     labels=lbls)+
  coord_flip()+
  labs(y='Population', title='Age-Gender Pyramid Chart by Planning Area 2019')+
  scale_fill_brewer(palette = 'Set1')+
  theme_bw()

3.4 Population Ternary Plot

To plot a ternary chart, three axes are required. Data manipulation is needed to derive the Young, Active and Old groups.
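
A ternary plot places each point by the relative shares of its three parts, not by their absolute sizes. The sketch below shows one common way to map a three-part composition to a position inside a triangle. The function name `ternary_to_xy` and the vertex layout are assumptions for illustration and are not taken from ggtern or plotly.

```r
# Convert a three-part composition to x/y positions inside a unit triangle.
# Assumed layout: `left` vertex at (0, 0), `right` vertex at (1, 0),
# `top` vertex at (0.5, sqrt(3) / 2).
ternary_to_xy <- function(top, left, right) {
  total <- top + left + right
  data.frame(
    x = (right + 0.5 * top) / total,
    y = (sqrt(3) / 2) * top / total
  )
}

# Hypothetical subzone: 200 Young, 600 Active, 200 Old.
xy <- ternary_to_xy(top = 600, left = 200, right = 200)
xy
```

Scaling all three counts by the same factor leaves the position unchanged, which is why subzones of very different sizes can share one chart.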

3.4.1 Data Manipulation/Preparation

The loaded data has Age in long format. The dcast() function is used to change it to wide format, and the mutate() function to derive the Young (age 0-24), Active (25-64) and Old (65 and above) groups.

#Convert the data to wide format with dcast(), then derive additional variables.
#Columns 1-4 are the identifiers; columns 5-23 are the 19 age cohorts in factor order.
agepop_mutated<-dcast(pop,PA+Subzone+Dwelling_Type+Gender~Age,value.var='Population')%>%
  mutate(Young=rowSums(.[5:9]))%>%      #0 to 24
  mutate(Active=rowSums(.[10:17]))%>%   #25 to 64
  mutate(Old=rowSums(.[18:23]))%>%      #65 and above
  mutate(total=rowSums(.[24:26]))%>%    #Young + Active + Old
  filter(total>0)
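
Selecting columns by position works only while the column order stays fixed. As a possible alternative, the groups can be chosen by name. The following is a base R sketch on a toy wide table, assuming the cohort columns are named as in the Age levels above; only a few cohorts are included and the values are made up.

```r
# Toy wide-format data; column names follow the Age labels used in this post.
wide <- data.frame(
  PA = c("A", "B"),
  `0_to_4` = c(10, 20), `20_to_24` = c(30, 40),
  `25_to_29` = c(50, 60), `60_to_64` = c(70, 80),
  `65_to_69` = c(5, 0), `90_and_over` = c(5, 10),
  check.names = FALSE
)

# Lower bound of each cohort, parsed from the text before the first underscore.
age_cols <- setdiff(names(wide), "PA")
lower <- as.numeric(sub("_.*$", "", age_cols))

# Assign each cohort column to a group by its lower bound.
young_cols  <- age_cols[lower < 25]
active_cols <- age_cols[lower >= 25 & lower < 65]
old_cols    <- age_cols[lower >= 65]

wide$Young  <- rowSums(wide[young_cols])
wide$Active <- rowSums(wide[active_cols])
wide$Old    <- rowSums(wide[old_cols])
wide$total  <- wide$Young + wide$Active + wide$Old
```

With this approach an extra identifier column does not shift the groups, provided it is also excluded from `age_cols`.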

3.4.2 Plotting the Ternary Plot

As mentioned previously, due to the compatibility issue, ggtern is installed at this stage. After installing, the ternary plot is created using the ggtern() function and the manipulated data.

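#Install ggtern if it is not yet available, then load it and build the ternary plot with ggtern()
if (!require('ggtern',character.only = T)){
  install.packages('ggtern')
}
library(ggtern)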
ggtern(data=agepop_mutated,aes(x=Young,y=Active,z=Old))+
  geom_point()+
  labs(title="Population Structure 2019") +
  theme_rgbw()

3.4.3 Interactive Ternary Plot

Reading values off a static ternary plot tends to be tedious. An interactive plot with tooltips is therefore created with plotly, with the points also coloured by planning area.

#Formatting the axis 
axis <- function(txt) {
  list(
    title = txt, tickformat = ".0%", tickfont = list(size = 10)
  )
}

ternaryAxes = list(
  aaxis = axis("Active"), 
  baxis = axis("Young"), 
  caxis = axis("Old")
)


title_detail = list(
  size = 14,
  color = 'black')


#use plot_ly() function to build ternary plot 
plot_ly(
  agepop_mutated, 
  a = ~Active, 
  b = ~Young, 
  c = ~Old,
  color = ~PA,
  size = ~total,
  marker = list(
           line = list(color = 'rgb(152, 0, 0)',
           width = 0.5)),
  type = "scatterternary"
) %>%
  layout(
    ternary = ternaryAxes,
    title = list(text= '<b>Population Structure 2019</b>',
                 y= 10, x= 0.000005, 
                 font= title_detail)
  )

4. Insights

From the age-gender pyramid, the demographic structure resembles a bell shape. This indicates a low birth rate, a low death rate and high life expectancy. The height of the pyramid (the Age axis) reflects life expectancy, and the base of the pyramid (age 0-4) reflects the birth rate. The pyramid chart reveals that females tend to have a higher life expectancy than males, as indicated by the longer bars for females than for males in the age cohorts 75 and above. Singapore is experiencing a low birth rate, which is represented by the narrow base: the bar at age 0-4 is the shortest among the cohorts from 0 to 24.

Dividing the population into three groups, Young (0-24), Active (25-64) and Old (65 and above), the low birth rate indicates an ageing economy. Fewer people will move from the Young group into the Active group, and more people will move from the Active group into the Old group over the coming years.

Plotting by planning area reveals the demographic structure of each individual area. From the age-gender pyramid by planning area chart, non-residential areas such as Tuas can be identified, where the panel is empty instead of showing a pyramid.

Mature planning areas such as Choa Chu Kang, Tampines, Bedok and Woodlands have a higher old-age population and larger age cohorts of 25 to 29 and 50 to 60. New planning areas such as Seng Kang and Punggol have a higher young population, with larger age cohorts of 0-9 and 30 to 39. This suggests that the marriage age in Singapore is around 30 and that, after marrying, couples move out of the mature planning areas to the new ones. The new planning areas need more childcare facilities to cater to the growing young population, while the mature planning areas require old-age facilities.

In the pyramid chart, reading the percentages of the Active, Young and Old groups is not user friendly, so the ternary plot is used to reveal them. Based on the concentration of the dots in the ternary plot, the Active population (age 25-64) is the largest group in Singapore at around 60%, with Young and Old at around 20% each.

The interactive plot adds the planning area: the user can hover over a point in the ternary plot to read off its values. The legend lists the planning areas, and the bigger the circle, the larger the population it represents. The user can double-click a planning area in the legend to view it on its own. For instance, double-click Seng Kang and then click Ang Mo Kio. Comparing the two planning areas, Seng Kang has a younger population, based on the concentration of its dots on the ternary plot.

5. Challenges Faced

With a design in mind, one tends to rush into plotting the chart in R. However, it is good to read the R Markdown cheat sheet, summary sheet and lecture slides before starting, to understand the syntax required and the guidelines on the code.

One challenge was writing the code correctly. Most of the time, online research was needed to get the code right. The error messages from R do help to identify basic errors such as a missing comma or an extra bracket. Once the graph is generated, its appearance needs to be tuned, for example the scientific notation and the size of the figure. This was realised only after the code had been written and the output did not meet the requirement.

Other challenges included packages that conflict with each other. Warning messages during package installation are usually ignored, and time was then spent checking the code to find the error. For instance, ggtern and ggplot2 conflict over the coordinate system. It is advisable to read the warning messages, identify the conflicting packages, and look out for phrases such as "function masked by" in the warning messages.
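
Masking can also be checked after the fact instead of relying on the load-time messages. The sketch below uses base R only and simulates masking with a user-defined `filter()`, rather than loading ggtern; the same calls are expected to apply once the real packages are attached.

```r
# Simulate masking: a user-defined filter() now shadows stats::filter().
filter <- function(x) x

# conflicts() lists names that exist in more than one place on the search path.
masked <- conflicts(detail = FALSE)
"filter" %in% masked

# Qualifying the call with its namespace side-steps the masking.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
```

Writing `package::function()` for any name reported by conflicts() removes the dependence on the order in which packages were loaded.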