1 Introduction

This visualization aims to reveal the demographic structure of Singapore's population by age cohort and by planning area in 2019.

1.1 Data Sources

  • Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling, June 2011-2019. This data set is available from the Singapore Department of Statistics.
  • URA Master Plan 2014 Planning Subzone GIS data. This data set is available at data.gov.sg.

1.2 Challenges

| Type of Challenge | Description |
|---|---|
| Design Challenge | There are too many age groups to fit into one chart, which would make the plot very cluttered. |
| Design Challenge | Both the age group and the planning area attributes need to be presented in one visualization. |
| Data Challenge | The age groups are stored as text, so they are ordered as text rather than by age: the group “5_to_9” is placed after group “45_to_49” instead of after “0_to_4”. |

1.3 Plans to Address Challenges

| Challenge | Solution |
|---|---|
| There are too many age groups to fit into one chart, which would make the plot very cluttered. | Regroup the ages into 3 groups. |
| Both the age group and the planning area attributes need to be presented in one visualization. | Use a ternary plot with one age group on each axis. Size each circle by total population and colour it by region name. |
| The age groups are stored as text, so the group “5_to_9” is placed after group “45_to_49” instead of after “0_to_4”. | Hard-code the column indices to specify which columns to use when summing up the population. |
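The ordering problem described above can be reproduced with a few labels. The snippet below is a minimal sketch using only base R; the vector age_groups is a hypothetical subset of the labels, and ordering by the leading number is shown only as an illustration, not as the approach taken in this document (which hard-codes column indices).

```r
# Hypothetical subset of the age-group labels (assumed naming pattern)
age_groups <- c("0_to_4", "5_to_9", "10_to_14", "45_to_49", "50_to_54", "90_and_over")

# Sorting as text compares characters, not numbers, so "5_to_9" does not
# land directly after "0_to_4" (the exact text order can vary with the locale)
sorted_as_text <- sort(age_groups)
print(sorted_as_text)

# Extract the leading number of each label and order by it instead
lower_bound <- as.integer(sub("_.*$", "", age_groups))
sorted_by_age <- age_groups[order(lower_bound)]
print(sorted_by_age)
```

Ordering by the numeric lower bound keeps the labels in age order regardless of how the text happens to sort.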

1.4 Proposed Sketch Design

2 Input

2.1 Loading libraries

The code chunk below checks whether each R package in the packages list has been installed and, if not, installs it. After installation, it loads the package into the R session.

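# Packages used in this document; DT is listed because DT::datatable() is called below
packages <- c('rgdal', 'spdep', 'tmap', 'tidyverse', 'prettydoc', 'sf', 'magick', 'plotly', 'DT')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  # Attach the package to the session
  library(p, character.only = TRUE)
}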
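Before running an install loop, it can be useful to see which packages are missing without installing anything. The following is a minimal sketch using base R only; the shortened packages vector and the name missing_pkgs are illustrative assumptions.

```r
# Illustrative subset of the packages used in this document
packages <- c("tidyverse", "sf", "plotly", "DT")

# Names of every package currently installed in the library paths
installed <- rownames(installed.packages())

# Packages in the list that are not installed yet
missing_pkgs <- setdiff(packages, installed)

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```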

2.2 Reading data

  • Storing the population data in the pop object
  • Storing the Planning Subzone GIS data in the mpsz object
pop <- read_csv("../data/aspatial/respopagesextod2011to2020.csv")
mpsz <- st_read(dsn = "../data/geospatial",
                layer = "MP14_SUBZONE_WEB_PL")

2.3 Data Preparation

The code chunks below are used for:

  • Extracting columns “REGION_N”, “PLN_AREA_N” and “geometry” to object mpsz_pa_sf
  • Setting the CRS of object mpsz_pa_sf to EPSG:3414
  • Checking whether the data contains NA values
  • Repairing any invalid geometries in object mpsz_pa_sf
  • Filtering the population data to year 2019
  • Changing columns “PA” and “SZ” to upper case so that they match “PLN_AREA_N” when joining the data later
mpsz_pa_sf <- st_as_sf(mpsz[c("REGION_N", "PLN_AREA_N")])
mpsz_pa_sf <- st_set_crs(mpsz_pa_sf, 3414)

mpsz_pa_sf[rowSums(is.na(mpsz_pa_sf))!=0,]
## Simple feature collection with 0 features and 2 fields
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS:  SVY21 / Singapore TM
## [1] REGION_N   PLN_AREA_N geometry  
## <0 rows> (or 0-length row.names)
mpsz_pa_sf <- st_make_valid(mpsz_pa_sf)

# Keep the 2019 records and upper-case PA and SZ for the later join
pop_2019 <- pop %>%
  filter(Time == 2019) %>%
  mutate(across(c(PA, SZ), toupper))

DT::datatable(
  head(pop_2019), extensions = 'FixedColumns',
  options = 
    list(dom = 't',
         columnDefs = list(list(width = '100px', targets = c(1, ncol(pop_2019)))),
         scrollX = TRUE,
         scrollCollapse = TRUE)
)
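The reason for upper-casing the join keys can be illustrated with a toy example. This is a sketch only: pop_toy and area_toy are made-up tables standing in for the population and planning-area data, and the values are hypothetical.

```r
library(dplyr)

# Made-up stand-ins for the population and planning-area tables
pop_toy <- tibble(PA = c("Ang Mo Kio", "Bedok"), Pop = c(100, 200))
area_toy <- tibble(
  PLN_AREA_N = c("ANG MO KIO", "BEDOK"),
  REGION_N   = c("NORTH-EAST REGION", "EAST REGION")
)

# Joining on mixed-case keys finds no matches, so REGION_N is NA
unmatched <- left_join(pop_toy, area_toy, by = c("PA" = "PLN_AREA_N"))

# Upper-casing the key first lets every row find its region
matched <- pop_toy %>%
  mutate(PA = toupper(PA)) %>%
  left_join(area_toy, by = c("PA" = "PLN_AREA_N"))

print(matched)
```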
  • Using the spread function to turn each age group into its own column, with the population as the cell values.

  • Mutating new columns Young, Active, Old, and Total by summing up the values in specific columns.

    | Category | Age Group |
    |---|---|
    | Young | 0 - 24 years old |
    | Active | 25 - 64 years old |
    | Old | 65 years old and above |
  • Saving the result to pop_2019_age

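# Column positions are hard-coded because the age-group columns are ordered
# as text after spread(): 5_to_9 sits at position 15, after 45_to_49
pop_2019_sub <- pop_2019 %>%
  spread(AG, Pop) %>%
  mutate(Young = rowSums(.[6:9]) + rowSums(.[15])) %>%        # 0 to 24
  mutate(Active = rowSums(.[10:14]) + rowSums(.[16:18])) %>%  # 25 to 64
  mutate(Old = rowSums(.[19:24])) %>%                         # 65 and above
  mutate(Total = rowSums(.[25:27]))                           # Young + Active + Old
pop_2019_age <- data.frame(pop_2019_sub)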
DT::datatable(
  head(pop_2019_age), extensions = 'FixedColumns',
  options = 
    list(dom = 't',
         columnDefs = list(list(width = '100px', targets = c(1, ncol(pop_2019_age)))),
         scrollX = TRUE,
         scrollCollapse = TRUE)
)
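Hard-coded column positions break silently if the column order changes. As an alternative sketch (not the method used above), the categories can be assigned by age-group name and summed per planning area. The toy data, the full lists of age-group labels, and the object names below are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Toy long-format data with hypothetical values
pop_toy <- tibble(
  PA  = rep(c("BEDOK", "TAMPINES"), each = 4),
  AG  = rep(c("0_to_4", "5_to_9", "45_to_49", "65_to_69"), times = 2),
  Pop = c(10, 20, 30, 40, 5, 15, 25, 35)
)

# Assumed label sets for the Young and Old categories; all others are Active
young_groups <- c("0_to_4", "5_to_9", "10_to_14", "15_to_19", "20_to_24")
old_groups   <- c("65_to_69", "70_to_74", "75_to_79", "80_to_84",
                  "85_to_89", "90_and_over")

pop_by_category <- pop_toy %>%
  mutate(Category = case_when(
    AG %in% young_groups ~ "Young",
    AG %in% old_groups   ~ "Old",
    TRUE                 ~ "Active"
  )) %>%
  # One total per planning area and category
  group_by(PA, Category) %>%
  summarise(Pop = sum(Pop), .groups = "drop") %>%
  # One column per category, one row per planning area
  pivot_wider(names_from = Category, values_from = Pop, values_fill = 0) %>%
  mutate(Total = Young + Active + Old)

print(pop_by_category)
```

Because the grouping is by name, the result does not depend on where “5_to_9” falls in the column order, and each planning area ends up as a single row.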

Create another object, mpsz_pop_2019_age, by joining pop_2019_age to mpsz_pa_sf on the planning area (PA matched to PLN_AREA_N).

mpsz_pop_2019_age <- left_join(pop_2019_age, mpsz_pa_sf, by= c("PA"="PLN_AREA_N"))
DT::datatable(
  head(mpsz_pop_2019_age), extensions = 'FixedColumns',
  options = 
    list(dom = 't',
         columnDefs = list(list(width = '100px', targets = c(1, ncol(mpsz_pop_2019_age)))),
         scrollX = TRUE,
         scrollCollapse = TRUE)
)
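When the lookup table has several rows per planning area (for example, one per subzone), a join repeats each population row once per match. The sketch below uses made-up tables to show the effect and one possible way to avoid it with distinct(); the object names are hypothetical, and whether de-duplication is wanted depends on the level at which the plot should be drawn.

```r
library(dplyr)

# Made-up planning-area totals and a subzone-level lookup table
pop_pa <- tibble(PA = c("BEDOK", "TAMPINES"), Total = c(100, 80))
subzones <- tibble(
  PLN_AREA_N = c("BEDOK", "BEDOK", "TAMPINES"),
  REGION_N   = c("EAST REGION", "EAST REGION", "EAST REGION")
)

# Joining directly repeats BEDOK once per subzone row
joined_raw <- left_join(pop_pa, subzones, by = c("PA" = "PLN_AREA_N"))

# Reducing the lookup to one row per planning area keeps the row count stable
regions <- distinct(subzones, PLN_AREA_N, REGION_N)
joined_pa <- left_join(pop_pa, regions, by = c("PA" = "PLN_AREA_N"))

print(nrow(joined_raw))
print(nrow(joined_pa))
```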

3 Singapore Population in Ternary Plot

Create a function to format the values in each ternary axis.

axis <- function(txt) {
  list(
    title = txt, tickformat = ".0%", tickfont = list(size = 10)
  )
}

Create a list called ternary_axes to store the settings for the three ternary axes.

ternary_axes = list(
  aaxis = axis("Active"), 
  baxis = axis("Young"), 
  caxis = axis("Old")
)

Create the ternary chart.

  • Setting the three ternary axes (a, b, c) to Active, Young, and Old
  • Setting the text in bubbles to PA (Planning Area)
  • Setting the size of bubbles to Total (Total Population)
  • Setting the outline of markers to color rgba(0, 0, 0, .5) and width to 0.5
  • Setting the type to scatterternary
ternary_chart <- plot_ly(mpsz_pop_2019_age, 
  a = ~Active, 
  b = ~Young, 
  c = ~Old,
  color = ~REGION_N,
  text = ~PA,
  size = ~Total,
  marker = list(
           line = list(color = 'rgba(0, 0, 0, .5)',
           width = 0.5)),
  type = "scatterternary"
)

Lay out the ternary chart by setting the title, ternary axes, and margins.

  • Setting the title to Demographic Structure of Singapore by Age Group and Planning Area in 2019
  • Setting the ternary axes to ternary_axes, the list created above
  • Setting left and right margins to 0, bottom margin to 50, and top margin to 130
fig <- ternary_chart %>% layout(
    title = list(text = "Demographic Structure of Singapore \nby Age Group and Planning Area in 2019"),
    ternary = ternary_axes,
    margin = list(l = 0, r = 0, b = 50, t = 130)
    )

4 Final Plot and Findings

4.1 Static Plot

4.2 Interactive Plot

Use the interactive plot to explore the findings further.

fig

In this ternary plot, the age groups are divided into Young, Active and Old on the three axes, and the three values for each point add up to 100%. The size of each point represents the total population of the planning area. The planning areas are grouped into five regions (Central, East, North, North-East and West), each shown in a different colour.
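The way three counts become a position that sums to 100% can be worked through by hand. The following base R sketch uses hypothetical counts; plotly performs an equivalent rescaling itself when a, b and c are all supplied, so this step is only for illustration.

```r
# Hypothetical population counts for two planning areas
counts <- data.frame(
  PA     = c("BEDOK", "TAMPINES"),
  Young  = c(30, 20),
  Active = c(30, 25),
  Old    = c(40, 35)
)

# Total per row, then each category as a share of that total
parts  <- counts[c("Young", "Active", "Old")]
total  <- rowSums(parts)
shares <- parts / total

# Each row of shares sums to 1, i.e. 100%
print(cbind(PA = counts$PA, round(shares, 3)))
print(rowSums(shares))
```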

The 3 main insights gathered from the ternary plot:

  1. The Active age group makes up the largest share of the population in most planning areas, which shows that the majority of Singapore's population is between 25 and 64 years old.
  2. More planning areas have a larger share of residents aged 65 and above than of residents aged 24 and below, since more points fall toward the Old axis than toward the Young axis.
  3. The points for the West region cluster in the top-left corner of the plot, which suggests that the age distribution differs little across most planning areas in the West region.