1 Introduction

This interactive visualization aims to reveal the demographic structure of Singapore's population by age cohort and by planning area in 2019.

1.1 Data Sources

  • Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling, June 2011-2019. This data set is available from the Singapore Department of Statistics.
  • URA Master Plan 2014 Planning Subzone GIS data. This data set is available from data.gov.sg.

1.2 Challenges

Type of Challenge | Description
Design Challenge | There are so many age groups that putting all of them into one visualization would make the plot very cluttered.
Design Challenge | Both the age group and the planning area attributes have to be presented in one map visualization.
Data Challenge | Because the age groups are stored as character strings, they are sorted alphabetically, so the group “5_to_9” is placed after group “45_to_49” instead of after “0_to_4”.

1.3 Plans to Address Challenges

Challenge | Solution
There are so many age groups that putting all of them into one visualization would make the plot very cluttered. | Regroup the ages into 3 main groups (Young, Economy Active, Aged).
Both the age group and the planning area attributes have to be presented in one map visualization. | Use the leaflet library to place the age groups on 3 separate layers. With the leaflet map, users can switch between age groups by checking the checkboxes at the left side of the map.
Because the age groups are stored as character strings, the group “5_to_9” is placed after group “45_to_49” instead of after “0_to_4”. | Hard-code the column indices that specify which columns to use when summing up the population.
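
The ordering problem can be reproduced without the population file. The sketch below is only an illustration and is not the approach taken in this document, which hard-codes column positions. It orders a hypothetical, shortened vector of age-group labels by their leading number instead of alphabetically.

```r
# Hypothetical subset of the age-group labels, in arbitrary order.
age_groups <- c("45_to_49", "5_to_9", "0_to_4", "90_and_over", "10_to_14", "50_to_54")

# Character sorting compares labels character by character, so "10_to_14"
# comes before "5_to_9" (the exact position of "5_to_9" can vary with the locale).
alphabetical <- sort(age_groups)

# Extract the leading number of each label and order by it instead.
lower_bound <- as.integer(sub("_.*", "", age_groups))
natural <- age_groups[order(lower_bound)]

print(alphabetical)
print(natural)
```

A factor built with levels = natural would keep this order through later sorting and reshaping steps.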

1.4 Proposed Sketch Design

2 Input

2.1 Loading libraries

The code chunk below checks whether each R package in the packages list has been installed and installs any that are missing. It then loads each package into the R session.

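packages <- c('rgdal', 'spdep',  'tmap', 'tidyverse', 'prettydoc', 'sf', 'magick', 'plotly', 'leaflet', 'RColorBrewer')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  # Attach the package (has no further effect if require() already attached it)
  library(p, character.only = TRUE)
}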
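
As a variation on the loop above, the sketch below lists which packages are missing before anything is installed. It uses only base R. The vector wanted is a shortened, illustrative copy of the package list, and the installation call is left commented out so that the block has no side effects beyond loading namespaces that are already installed.

```r
# Illustrative subset of the package list used in this document.
wanted <- c("sf", "tmap", "tidyverse", "leaflet")

# requireNamespace() returns FALSE, rather than failing, when a package is absent.
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- wanted[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install
}
```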

2.2 Reading data

  • Storing the population data in the pop object
  • Storing the Planning Subzone GIS data in the mpsz object
pop <- read_csv("../data/aspatial/respopagesextod2011to2020.csv")
mpsz <- st_read(dsn = "../data/geospatial",
                layer = "MP14_SUBZONE_WEB_PL")

2.3 Data Preparation

The code chunk below is used for:

  • Extracting the columns “REGION_N”, “PLN_AREA_N” and “geometry” into object mpsz_pa_sf
  • Setting the CRS of mpsz_pa_sf to EPSG:3414
  • Checking whether the data contains NA values
  • Repairing any invalid geometries in mpsz_pa_sf
mpsz_pa_sf <- st_as_sf(mpsz[c("REGION_N", "PLN_AREA_N")])
mpsz_pa_sf <- st_set_crs(mpsz_pa_sf, 3414)

mpsz_pa_sf[rowSums(is.na(mpsz_pa_sf))!=0,]
## Simple feature collection with 0 features and 2 fields
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS:  SVY21 / Singapore TM
## [1] REGION_N   PLN_AREA_N geometry  
## <0 rows> (or 0-length row.names)
mpsz_pa_sf <- st_make_valid(mpsz_pa_sf)

The code chunk below is used for:

  • Filtering the records for 2019 and summing the population by planning area, subzone and age group.

  • Using the spread function to turn the age groups into columns, with the population counts as the values.

  • Creating new columns YOUNG, ECONOMY ACTIVE, AGED and TOTAL by summing up the values in specific columns.

    Category | Age Group
    Young | 0 - 24 years old
    Economy Active | 25 - 64 years old
    Aged | 65 years old and above
  • Creating a new column DEPENDENCY as the sum of the young and aged populations divided by the economy active population.

  • Creating a new column DEPENDENCY_R by rounding the values in the DEPENDENCY column to 2 decimal places.

  • Saving the result to popdata2019.

popdata2019 <- pop %>%
  filter(Time == 2019) %>%
  group_by(PA, SZ, AG) %>%
  summarise(POP = sum(Pop)) %>%
  ungroup() %>%
  spread(AG, POP) %>%
  # Columns 1-2 are PA and SZ; columns 3-21 are the age groups in the
  # sorted order described in section 1.2 ("5_to_9" lands in column 12)
  mutate(YOUNG = rowSums(.[3:6]) + rowSums(.[12])) %>%
  mutate(`ECONOMY ACTIVE` = rowSums(.[7:11]) + rowSums(.[13:15])) %>%
  mutate(AGED = rowSums(.[16:21])) %>%
  mutate(TOTAL = rowSums(.[3:21])) %>%
  # Dependents (young and aged) per economy active resident
  mutate(DEPENDENCY = (YOUNG + AGED) / `ECONOMY ACTIVE`) %>%
  select(PA, SZ, YOUNG, `ECONOMY ACTIVE`, AGED, TOTAL, DEPENDENCY)

# Rounded copy of the ratio for display in the popups
popdata2019 <- popdata2019 %>%
  mutate(DEPENDENCY_R = round(DEPENDENCY, digits = 2))
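
For illustration, the same grouping can be written against column names rather than column positions, so that it does not depend on how the age-group columns happen to be sorted. This is a sketch in base R on made-up numbers, not the code used to build popdata2019. The data frame toy, its values and the column name ACTIVE are hypothetical, and only a few age groups are included.

```r
# Hypothetical wide table: one row per subzone, one column per age group.
toy <- data.frame(
  SZ = c("A", "B"),
  `0_to_4` = c(10, 0),
  `5_to_9` = c(20, 5),
  `25_to_29` = c(100, 10),
  `50_to_54` = c(50, 10),
  `65_to_69` = c(30, 40),
  `90_and_over` = c(5, 20),
  check.names = FALSE
)

# Assign each age-group column to a category using its lower age bound.
age_cols <- setdiff(names(toy), "SZ")
lower <- as.integer(sub("_.*", "", age_cols))
young_cols <- age_cols[lower < 25]
active_cols <- age_cols[lower >= 25 & lower < 65]
aged_cols <- age_cols[lower >= 65]

toy$YOUNG <- rowSums(toy[young_cols])
toy$ACTIVE <- rowSums(toy[active_cols])
toy$AGED <- rowSums(toy[aged_cols])

# Dependents (young and aged) per economy active resident.
toy$DEPENDENCY <- (toy$YOUNG + toy$AGED) / toy$ACTIVE

print(toy[c("SZ", "YOUNG", "ACTIVE", "AGED", "DEPENDENCY")])
```

In the toy data, subzone B has 5 young, 60 aged and 20 economy active residents, so its ratio is (5 + 60) / 20 = 3.25: dependents outnumber the economy active population by more than three to one.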

The code chunk below is used for:

  • Converting the values in columns PA and SZ of popdata2019 to upper case, so that the subzone names can be matched against those in mpsz
  • Removing subzones with no economy active population, for which the dependency rate is undefined
  • Creating object mpsz_sub from object mpsz by selecting only SUBZONE_N, PLN_AREA_N and REGION_N
  • Creating object mpszpop2019 by joining mpsz_sub and popdata2019 on the subzone name
popdata2019 <- popdata2019 %>%
  mutate_at(.vars = vars(PA, SZ), 
            .funs = list(toupper)) %>%
  filter(`ECONOMY ACTIVE` > 0)

mpsz_sub <- mpsz %>%
  select(SUBZONE_N, PLN_AREA_N, REGION_N)

mpszpop2019 <- merge(mpsz_sub, popdata2019, 
                      by.x = "SUBZONE_N", by.y = "SZ")
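
By default, merge() keeps only rows whose key appears in both tables. Any subzone that was filtered out above, or whose name does not match exactly, therefore disappears from the joined result. The sketch below shows one way to list such subzones. The two small data frames stand in for mpsz_sub and popdata2019, and their contents are invented.

```r
# Hypothetical stand-ins for the geospatial and population tables.
zones <- data.frame(SUBZONE_N = c("ALPHA", "BETA", "GAMMA"))
people <- data.frame(SZ = c("ALPHA", "GAMMA"), TOTAL = c(1200, 300))

# Default merge() is an inner join: "BETA" has no population row and is dropped.
joined <- merge(zones, people, by.x = "SUBZONE_N", by.y = "SZ")

# Subzones present in the geometry table but absent from the joined result.
unmatched <- setdiff(zones$SUBZONE_N, joined$SUBZONE_N)

print(joined)
print(unmatched)
```

Passing all.x = TRUE to merge() would instead keep the unmatched subzones, with NA in the population columns.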

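Create copies of object mpszpop2019 under different names. By default, tmap uses the name of the object passed to tm_shape() as the layer name, so these object names appear as the layer names in the leaflet map.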

Total_Population <- mpszpop2019
Dependency_Rate <- mpszpop2019
Aged_pop <- mpszpop2019
Young_pop<- mpszpop2019
Economy_Active_pop <- mpszpop2019

3 Create Map

3.1 Create Tmap

The map consists of 5 layers:

  • Total_Population
  • Dependency_Rate
  • Aged_pop
  • Young_pop
  • Economy_Active_pop

The code chunk below is used for:

  • Adding the total population to the map as polygons
  • Adding the remaining layers to the map as bubbles
  • Setting the titles of the legends
  • Giving each layer its own color palette
  • Configuring the popup window details by setting popup.vars
  • Setting the bubble size scale to 2, which makes the bubbles bigger
  • Setting the minimum zoom level to 11 and the maximum zoom level to 14
tmap_mode("view")
tm <- 
  tm_shape(Total_Population) + 
  tm_polygons("TOTAL", 
          palette = "Blues",
          popup.vars=c(
                  "Planning Area: "="PLN_AREA_N",
                  "Dependency Rate: " = "DEPENDENCY_R",
                  "Young Population" = "YOUNG",
                  "Economy Active Population" = "ECONOMY ACTIVE",
                  "Aged Population" = "AGED"
                  ),
          title = "Population",
          alpha = 1,
          n = 6
          )+
  tm_shape(Dependency_Rate) + 
  tm_bubbles(size = "DEPENDENCY",
           col = "REGION_N",
           title.col="Region",           # bubble color encodes REGION_N
           title.size="Dependency rate", # DEPENDENCY is a ratio, not a percentage
           scale = 2,
           border.alpha = .5,
           popup.vars = c(
                  "Planning Area: "="PLN_AREA_N",
                  "Dependency Rate: " = "DEPENDENCY_R"
                  )
            )+
  tm_shape(Aged_pop) + 
  tm_bubbles(size = "AGED",
           col = "AGED",
           palette = "OrRd",
           title.col="Aged Population",
           scale = 2,
           border.alpha = .5,
           popup.vars = c(
                  "Planning Area: "="PLN_AREA_N",
                  "Aged Population" = "AGED"
                  )
            )+
  tm_shape(Young_pop) + 
  tm_bubbles(size = "YOUNG",
           col = "YOUNG",
           palette = "PuRd",
           title.col="Young Population",
           scale = 2,
           border.alpha = .5,
           popup.vars = c(
                  "Planning Area: "="PLN_AREA_N",
                  "Young Population" = "YOUNG"
                  )
            )+
  tm_shape(Economy_Active_pop) + 
  tm_bubbles(size = "ECONOMY ACTIVE",
           col = "ECONOMY ACTIVE",
           palette = "BuGn",
           title.col="Economy Active Population",
           scale = 2,
           border.alpha = .5,
           popup.vars = c(
                  "Planning Area: "="PLN_AREA_N",
                  "Economy Active Population" = "ECONOMY ACTIVE"
                  )
            )+
  tm_borders(alpha = 0.5)+
  tm_view(set.zoom.limits = c(11,14))

3.2 Create Leaflet Map

Convert the tmap object to a leaflet map using the tmap_leaflet() function, then hide the Economy_Active_pop, Young_pop and Aged_pop layers so that only Total_Population and Dependency_Rate are shown initially.

lf <- tmap_leaflet(tm) %>%
  leaflet::hideGroup("Economy_Active_pop") %>%
  leaflet::hideGroup("Young_pop") %>%
  leaflet::hideGroup("Aged_pop")
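
How group names relate to the layer checkboxes can be seen in a small stand-alone leaflet map. This sketch does not use the tmap object above. The data frame pts and its coordinates are made up for illustration, and the group names are reused only to mirror the layers in this document.

```r
library(leaflet)

# Hypothetical points; the coordinates are only roughly within Singapore.
pts <- data.frame(lng = c(103.82, 103.95), lat = c(1.35, 1.36))

m <- leaflet(pts) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lng, lat = ~lat, group = "Aged_pop") %>%
  addCircleMarkers(lng = ~lng, lat = ~lat, color = "red", group = "Young_pop") %>%
  # The layer control lists each overlay group with a checkbox.
  addLayersControl(overlayGroups = c("Aged_pop", "Young_pop")) %>%
  # Start with one group unchecked; the user can switch it back on.
  hideGroup("Young_pop")
```

Printing m in an interactive session displays the map. A hidden group stays listed in the layer control, so hiding only changes the initial view.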

4 Final Plot and Findings

Use the interactive map below to explore the findings further.

lf

In this map, the population is divided into three age groups: Young, Economy Active and Aged. The color intensity of the choropleth layer represents the total population of each subzone. The size of the bubbles represents the dependency rate of each subzone, and their color shows which of the 5 regions (Central, East, North, North-East and West) the subzone belongs to. Checking the checkboxes at the left side of the map shows the other layers, such as the aged population. These layers are also drawn as bubbles, whose color and size both change with the population count.

The 2 main insights gathered from the interactive map:

  1. Tampines East subzone, in the Tampines planning area, has the largest population of all subzones in Singapore.
  2. Loyang West subzone, in the Pasir Ris planning area, has the highest dependency rate. The subzone has only 10 economy active residents but 180 aged residents.