Introduction

This data visualization aims to draw insights on the demographic structure of Singapore’s population by age cohort and by planning area in 2019.

(Data used will be retrieved from https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data)

1. Data and Design Challenges

  1. Number of categories

The data is categorized into 55 planning areas and 19 age groups. While this level of detail gives flexibility, it is difficult to show every category distinctly without cluttering the visualization.

To simplify the data, the 55 planning areas will be condensed into 5 regions (Central, East, North, North-East, West).

  2. Incomplete data

Some planning areas, such as Pioneer, have a population of 0.

To reduce clutter, the data for such areas will be omitted.
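One way this omission could be done is sketched below. It is a minimal illustration on a toy data frame, not the code used in this post: `toy_pop` and `toy_nonzero` are hypothetical names, and the figures are made up. Only the column names `PA` and `Pop` follow the data set.

```r
library(dplyr)

# Toy stand-in for the 2019 data (made-up numbers, assumption for illustration)
toy_pop <- data.frame(
  PA  = c("Bedok", "Bedok", "Pioneer", "Pioneer"),
  AG  = c("0_to_4", "5_to_9", "0_to_4", "5_to_9"),
  Pop = c(100, 120, 0, 0),
  stringsAsFactors = FALSE
)

# Keep only planning areas whose total population is greater than zero
toy_nonzero <- toy_pop %>%
  group_by(PA) %>%
  filter(sum(Pop) > 0) %>%
  ungroup()

unique(toy_nonzero$PA)
```

Since the charts in this post sum Pop by region, rows with a population of 0 add nothing to the bars whether or not they are filtered out.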

  3. Inaccurate order of data

As the data is sorted in alphabetical order, the age group “5_to_9” comes after “45_to_49”.

The age groups will be reordered so that they are in numerical order.

Proposed Sketch Design

[Figure: proposed sketch design]

2. Step-by-Step Description

  1. Install packages
# Packages needed for this analysis
packages <- c('tidyverse')

# Install any package that is missing, then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
## Loading required package: tidyverse
## ── Attaching packages ───────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
  2. Read and filter data
# Read the full 2011 to 2020 data set
pop_data <- read.csv("respopagesextod2011to2020.csv")

# Keep only the 2019 records
pop_2019 <- subset(pop_data, Time == '2019')
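  3. Reorder Age Groups
# Convert to character first so that any existing factor levels are discarded
pop_2019$AG <- as.character(pop_2019$AG)

# unique() keeps values in order of first appearance, so the levels follow the row order of the file
pop_2019$AG <- factor(pop_2019$AG, levels = unique(pop_2019$AG))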
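Setting the levels with `unique()` works when the rows already appear in ascending age order. As a hedged alternative that does not depend on row order, the levels could be derived from the lower bound of each label. The sketch below assumes every label starts with digits (for example "0_to_4" or "90_and_over"). The vector `ag` is a hypothetical sample, not the full list of age groups.

```r
# Hypothetical sample of age-group labels, in alphabetical order
ag <- c("0_to_4", "10_to_14", "45_to_49", "5_to_9", "90_and_over")

# Lower bound of each group: the digits at the start of the label
lower <- as.numeric(sub("^([0-9]+).*$", "\\1", ag))

# Sort the labels by lower bound and use them as the factor levels
ag_levels <- unique(ag[order(lower)])
ag_factor <- factor(ag, levels = ag_levels)

levels(ag_factor)
```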
  4. Categorize and group planning areas
central <- c('Bishan','Bukit Merah', 'Bukit Timah','Downtown Core', 'Geylang', 'Kallang', 'Marina East','Marina South','Marine Parade','Museum','Newton','Novena','Orchard','Outram','Queenstown','River Valley','Rochor','Singapore River','Southern Islands','Straits View','Tanglin','Toa Payoh')

east <- c('Bedok', 'Changi','Changi Bay','Pasir Ris','Paya Lebar', 'Tampines')

north <- c('Central Water Catchment', 'Lim Chu Kang', 'Mandai', 'Sembawang', 'Simpang', 'Sungei Kadut', 'Woodlands', 'Yishun')

northeast <- c('Ang Mo Kio', 'Hougang', 'North-Eastern Islands', 'Punggol', 'Seletar', 'Sengkang', 'Serangoon')

west <- c('Boon Lay', 'Bukit Batok', 'Bukit Panjang', 'Choa Chu Kang', 'Clementi', 'Jurong East', 'Jurong West', 'Pioneer', 'Tengah', 'Tuas', 'Western Islands', 'Western Water Catchment')
  5. Update data set to reflect regions
pop_region_2019 <- pop_2019 %>%
  mutate(Region = case_when(
    PA %in% central ~ "Central",
    PA %in% east ~ "East",
    PA %in% north ~ "North",
    PA %in% northeast ~ "North East",
    PA %in% west ~ "West",
    TRUE ~ ""))
  6. Check the data to ensure that the region is assigned correctly
DT::datatable(head(pop_region_2019), class = 'cell-border stripe')
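Viewing the first few rows only confirms the mapping for the planning areas that happen to appear there. A fuller check, sketched here on toy data, is to list any planning area that fell through to the empty-string default. The shortened `central` and `east` vectors, the `toy` data frame and the name "Atlantis" are made up for illustration.

```r
library(dplyr)

# Shortened region vectors and toy data (assumptions for illustration)
central <- c("Bishan", "Toa Payoh")
east    <- c("Bedok", "Tampines")

toy <- data.frame(
  PA  = c("Bishan", "Bedok", "Tampines", "Atlantis"),
  Pop = c(10, 20, 30, 5),
  stringsAsFactors = FALSE
)

toy_region <- toy %>%
  mutate(Region = case_when(
    PA %in% central ~ "Central",
    PA %in% east ~ "East",
    TRUE ~ ""))

# Planning areas that matched none of the region vectors
unmatched <- toy_region %>%
  filter(Region == "") %>%
  distinct(PA)

unmatched$PA
```

An empty result on the real data would indicate that every planning area was assigned to a region.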

  7. Create a facet grid. For each facet, the population (Pop) is summed for every age group.

ggplot(data = pop_region_2019, aes(x = AG, y = Pop)) +
  stat_summary(fun = sum, geom = "bar", fill = "grey", size = 1) +
  facet_grid(Region ~ .) +
  labs(title = "Demographic Structure of Singapore", 
    subtitle = "by age group and region", 
    y = "Population Count", 
    x = "Age Group") +
  theme(axis.text.x = element_text(angle=50, hjust=1),strip.background = element_rect(fill="white"))
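The summing that `stat_summary(fun = sum, ...)` performs inside the plot can also be done beforehand, which makes the plotted totals available for inspection. The following is a minimal sketch of that alternative on toy data, not a replacement for the code above. `toy_region`, `region_totals` and `p` are hypothetical names, and the numbers are made up.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for pop_region_2019 (made-up numbers)
toy_region <- data.frame(
  Region = c("East", "East", "East", "West", "West"),
  AG     = factor(c("0_to_4", "0_to_4", "5_to_9", "0_to_4", "5_to_9"),
                  levels = c("0_to_4", "5_to_9")),
  Pop    = c(10, 20, 5, 7, 8),
  stringsAsFactors = FALSE
)

# Sum the population for each region and age group
region_totals <- toy_region %>%
  group_by(Region, AG) %>%
  summarise(Pop = sum(Pop), .groups = "drop")

# Plot the pre-computed totals; geom_col() draws bar heights as given
p <- ggplot(region_totals, aes(x = AG, y = Pop)) +
  geom_col(fill = "grey") +
  facet_grid(Region ~ .)
```

Printing `region_totals` gives the exact counts behind each bar, which can help when reading approximate values off the chart.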

3. Final Data Visualization

ggplot(data = pop_region_2019, aes(x = AG, y = Pop)) +
  stat_summary(fun = sum, geom = "bar", fill = "grey", size = 1) +
  facet_grid(Region ~ .) +
  labs(title = "Demographic Structure of Singapore", 
    subtitle = "by age group and region", 
    y = "Population Count", 
    x = "Age Group") +
  theme(axis.text.x = element_text(angle=50, hjust=1),strip.background = element_rect(fill="white"))

The visualization shows the demographic structure of each Singapore region across the different age groups.

  1. General trends

For every region, there is a larger concentration of adults than of youths and the elderly. The population count is higher for the age groups from 25 to 64, and lower for those below 25 and for those aged 65 and above.

Across the regions, the overall population count (across all age groups) is higher in the Central, North East, and West regions than in the East and North.

  2. Age-specific trends

There is a higher population count of children and youths in the North East and West (approximately 40 000 to 50 000 per age group) across all age groups from 0 to 19.

There is a higher population count of young adults in the West (around 60 000 to 70 000 per age group) across all age groups from 20 to 34.

There is a higher population count of adults in the Central, North East, and West (approximately 60 000 to 80 000 per age group) across all age groups from 35 to 59.

While this suggests that the population count in the Central, North East, and West remains high, the different concentrations of age groups in each region might suggest a slow movement of younger adults away from the Central region to the North East.
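Because the regions differ in total size, comparing raw counts can blur differences in age composition. One possible follow-up, sketched here on toy data, is to compute each age group's share of its region's population. `toy` and `shares` are hypothetical names and the numbers are made up, so the output says nothing about the actual regions.

```r
library(dplyr)

# Toy data with two regions and two age groups (made-up numbers)
toy <- data.frame(
  Region = c("Central", "Central", "North East", "North East"),
  AG     = c("25_to_29", "60_to_64", "25_to_29", "60_to_64"),
  Pop    = c(40, 60, 30, 10),
  stringsAsFactors = FALSE
)

# Total per region and age group, then each group's share within its region
shares <- toy %>%
  group_by(Region, AG) %>%
  summarise(Pop = sum(Pop), .groups = "drop") %>%
  group_by(Region) %>%
  mutate(Share = Pop / sum(Pop)) %>%
  ungroup()

shares
```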