1. Overview

This visualization aims to reveal the demographic structure of Singapore’s population specifically by age group and planning area in the year 2019.

1.1 Data Source

“Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling, June 2011-2019”, data set by Singapore Department of Statistics.

1.2 Data and Design Challenges

Type Challenge Solution
Data AG 5_to_9 is placed at the wrong position Rename it to 05_to_9 so that it comes immediately after “0_to_4”
Design The large number of age groups can show a more representative population distribution but also makes it challenging to interpret so many age groups Use histogram to show the shape of the distribution and a ternary plot to visualize the population in terms of 3 main categories: young dependents, economically active and old dependents
Data Getting the right rows and columns for the plots Prepare 2 different data sets from the original data set to fulfill the respective requirements for each plot
Design Difficult to interpret and differentiate the points on a ternary plot because of the 3 axes and the identical dots on the plot Scale the size of the markers by Old Age Support Ratio and use colors to differentiate the Planning Areas
Design Labeling x axis tick marks would make it very cluttered due to the large number of age groups Hide the tick marks and use a colored legend to distinguish the age groups instead

1.3 Proposed Sketched Design

2. Data Visualization Step-by-Step

2.1 Install and Load R Packages

packages = c('plotly', 'readr', 'dplyr', 'tidyr')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2 Load Data

pop_data <- read_csv('population-data/respopagesextod2011to2020.csv')

2.3 Extract Data for Year 2019 only

pop_filtered_data <- pop_data %>% 
  filter(Time==2019)

2.4 Prepare Data for Histogram

2.4.1 Place AG 5_to_9 at the Right Position

pop_filtered_data$AG <- as.character(pop_filtered_data$AG)
pop_filtered_data$AG[pop_filtered_data$AG == "5_to_9"] <-"05_to_9"
pop_filtered_data
## # A tibble: 98,192 x 7
##    PA        SZ               AG    Sex    TOD                         Pop  Time
##    <chr>     <chr>            <chr> <chr>  <chr>                     <dbl> <dbl>
##  1 Ang Mo K… Ang Mo Kio Town… 0_to… Males  HDB 1- and 2-Room Flats       0  2019
##  2 Ang Mo K… Ang Mo Kio Town… 0_to… Males  HDB 3-Room Flats             10  2019
##  3 Ang Mo K… Ang Mo Kio Town… 0_to… Males  HDB 4-Room Flats             10  2019
##  4 Ang Mo K… Ang Mo Kio Town… 0_to… Males  HDB 5-Room and Executive…    20  2019
##  5 Ang Mo K… Ang Mo Kio Town… 0_to… Males  HUDC Flats (excluding th…     0  2019
##  6 Ang Mo K… Ang Mo Kio Town… 0_to… Males  Landed Properties             0  2019
##  7 Ang Mo K… Ang Mo Kio Town… 0_to… Males  Condominiums and Other A…    50  2019
##  8 Ang Mo K… Ang Mo Kio Town… 0_to… Males  Others                        0  2019
##  9 Ang Mo K… Ang Mo Kio Town… 0_to… Femal… HDB 1- and 2-Room Flats       0  2019
## 10 Ang Mo K… Ang Mo Kio Town… 0_to… Femal… HDB 3-Room Flats             10  2019
## # … with 98,182 more rows

2.4.2 Get Population by Age Group by Planning Area Data

pop_hist_data <- pop_filtered_data %>% 
  group_by(PA,AG) %>% 
  summarise(Pop = sum(Pop)) %>%
  filter(Pop > 0)
pop_hist_data
## # A tibble: 757 x 3
## # Groups:   PA [42]
##    PA         AG         Pop
##    <chr>      <chr>    <dbl>
##  1 Ang Mo Kio 0_to_4    5420
##  2 Ang Mo Kio 05_to_9   6230
##  3 Ang Mo Kio 10_to_14  7380
##  4 Ang Mo Kio 15_to_19  7930
##  5 Ang Mo Kio 20_to_24  8920
##  6 Ang Mo Kio 25_to_29 10620
##  7 Ang Mo Kio 30_to_34 10510
##  8 Ang Mo Kio 35_to_39 10940
##  9 Ang Mo Kio 40_to_44 11760
## 10 Ang Mo Kio 45_to_49 12570
## # … with 747 more rows

2.4.3 Visualize Population Distribution by Age Group by Planning Area

hist <- pop_hist_data %>% 
  ggplot(aes(x=AG,y=Pop))+
    geom_bar(
    aes(fill = AG), stat = "identity", color = "white",
    position = position_dodge(0.9)
    )+
  facet_wrap(~PA)+
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        strip.text.x = element_text(size=6),
        axis.text.y=element_text(size=5)
        )+
  ggtitle("Population Distribution by Age Group by Planning Area 2019")+
  xlab("Age Group")+
  ylab("Population")

hist

2.5 Prepare Data for Ternary Plot

2.5.1 Pivot Data Table such that each Age Group becomes a Column

pop_pivoted_data <- pop_filtered_data %>%
  group_by(PA,AG) %>%
  summarise(Pop = sum(Pop), .groups='drop') %>%
  pivot_wider(names_from = AG, values_from = Pop)
pop_pivoted_data
## # A tibble: 55 x 20
##    PA             `0_to_4` `05_to_9` `10_to_14` `15_to_19` `20_to_24` `25_to_29`
##    <chr>             <dbl>     <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
##  1 Ang Mo Kio         5420      6230       7380       7930       8920      10620
##  2 Bedok             10020     11640      13300      14640      16660      19530
##  3 Bishan             2850      3850       4430       4740       5570       7090
##  4 Boon Lay              0         0          0          0          0          0
##  5 Bukit Batok        7130      6640       7800       8800       9850      12510
##  6 Bukit Merah        6100      6650       6640       6380       6850       9140
##  7 Bukit Panjang      6700      7230       7680       8500       9570      10560
##  8 Bukit Timah        3160      4820       5000       4810       4690       4710
##  9 Central Water…        0         0          0          0          0          0
## 10 Changi              130       130        140        110         80        100
## # … with 45 more rows, and 13 more variables: 30_to_34 <dbl>, 35_to_39 <dbl>,
## #   40_to_44 <dbl>, 45_to_49 <dbl>, 50_to_54 <dbl>, 55_to_59 <dbl>,
## #   60_to_64 <dbl>, 65_to_69 <dbl>, 70_to_74 <dbl>, 75_to_79 <dbl>,
## #   80_to_84 <dbl>, 85_to_89 <dbl>, 90_and_over <dbl>

2.5.2 Group Age Groups into Young, Active, Old and Calculate Old Age Support Ratio

pop_mutated <- pop_pivoted_data %>%
  mutate(YOUNG = rowSums(.[3:7]))%>%
  mutate(ACTIVE = rowSums(.[8:15]))  %>%
  mutate(OLD = rowSums(.[16:20])) %>%
  mutate(TOTAL = rowSums(.[3:20])) %>%
  mutate(Old_age_ratio = round(.$ACTIVE/.$OLD,2)) %>%
  filter(TOTAL > 0)
df <- data.frame(pop_mutated)

2.5.3 Visualize Ternary Plot of Subzone

axis <- function(txt) {
  list(
    title = txt, tickformat = "%", tickfont = list(size = 10)
  )
}
ternaryAxes = list(
  aaxis = axis("ACTIVE(A)"),
  baxis = axis("OLD(B)"),
  caxis = axis("YOUNG(C)")
)
ternary <- plot_ly(
  df, a = ~ACTIVE, b = ~`OLD`, c = ~YOUNG, color = ~PA, type = "scatterternary",text=~paste("<br>Planning Area:",df$PA,'<br>Old Age Support Ratio:',Old_age_ratio) ,mode="marker",marker=list(size = ~Old_age_ratio*1.2, opacity=1),colors = "Set1"
) %>%
  layout(title="Old Age Support Ratio (A/B) by Planning Area 2019 ",
    ternary = ternaryAxes,margin = 0.05,showlegend = TRUE
  ) 

ternary

3. Final Data Visualization

hist

ternary

3.1 Purpose of Visualizations

The top visualization reveals the population count as well as the spread across the age groups for each planning area.

The bottom visualization complements the top by grouping the age groups into Young, Active and Old categories and providing the Old Age Support Ratio across each planning area.

3.2 Derived Insights

  1. Bedok, Hougang, Jurong West, Sengkang, Tampines, Woodlands, Yishun are the more populated areas as seen from the area covered by the histograms. These areas are known to be affordable residential estates which could explain the observation.
  2. Contrasted against the above, Changi, Downtown Core, Lim Chu Kang, Mandai, Museum, Newton, Orchard, Outram, River Valley, Rochor, Seletar, Singapore River, Southern Islands, Sungei Kedut, Tanglin, Western Water Catchment are very lowly populated. These areas mostly fall into these few categories: outskirts of Singapore, very small land area, for non-residential purposes, and very expensive land.
  3. Punggol and Sengkang has a significantly younger population as seen from the sharp decrease in numbers for the older age groups. This could be due to the development of many BTO flats in those areas in the past decade resulting in many young couples living there.
  4. Bishan, Bukit Merah, Clementi, Geylang, Kallang, Queenstown, Toa Payoh are areas with more evenly distributed population across different age groups. These areas are known to be mature estates which could mean more multi-generational families living together.
  5. All planning areas fall within the top right of the ternary plot suggesting that the bulk of Singapore’s population fall within the economically active class.
  6. Mature estates like Bukit Merah, Outram, Queenstown, Toa Payoh have lower old age support ratio while non-mature estates like Choa Chu Kang, Punggol, Sembawang, Sengkang have a much higher old age support ratio. Additionally, the proportion of the economically active population within these areas are consistently around 60% meaning the difference in old age support ratio lies in the proportion of elderly. A likely reason for this could be that the older generation mostly live in their decades-old homes which are probably in mature estates.