1) Overview

Data Source: https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data

This project visualises the demographic structure of Singapore, giving an overview of how the population is distributed across age groups and planning areas.

There are 2 visualisations used:

  • A datatable visualisation to show Total Population across Planning Areas.

  • A facets visualisation consisting of bar plots to compare the Number of Males and Females in each Age Group.

2) Major data and design challenges faced and the plan to overcome these challenges

2.1) Challenges and Solutions

  • The source data has multiple rows for each Age Group and for each Planning Area. I summarise and aggregate the data so that each visualisation has one value per group.

  • R sorts character values alphabetically, so the Age Group “5_to_9” does not come directly after “0_to_4” (for example, “10_to_14” is placed before it), and the facets would appear out of age order. To solve this, I define the Age Group column as a factor and specify the order of the levels explicitly.

  • The source data covers many years, but we are only interested in 2019. Hence, I keep only the rows where the ‘Time’ column equals 2019.

2.2) Sketch of Proposed Design

3) Step by Step Description of how the visualisations were prepared

3.1) Load the required R Packages

library(ggplot2)
library(tidyverse)
library(formattable)

3.2) Read the data and preview the data

df <- read_csv("respopagesextod2011to2020.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )
head(df, 5)
## # A tibble: 5 x 7
##   PA        SZ               AG    Sex   TOD                           Pop  Time
##   <chr>     <chr>            <chr> <chr> <chr>                       <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats         0  2011
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats               10  2011
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats               30  2011
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive F~    50  2011
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding thos~     0  2011
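
The column specification message above can also be declared explicitly. The following is a minimal sketch, assuming the seven columns printed above. The two sample rows and the subzone label are made up so that the snippet runs on its own, and reading Time as an integer is a choice of this sketch, not something the original code does.

```r
library(readr)

# Write a tiny sample with the same seven columns to a temporary file.
# The rows are illustrative only; the real file is respopagesextod2011to2020.csv.
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "PA,SZ,AG,Sex,TOD,Pop,Time",
  "Ang Mo Kio,Subzone A,0_to_4,Males,HDB 3-Room Flats,10,2011",
  "Ang Mo Kio,Subzone A,0_to_4,Males,HDB 4-Room Flats,30,2011"
), sample_path)

# Declaring the types up front silences the message and makes readr report
# a parsing problem if a column stops matching the expected type.
df_sample <- read_csv(
  sample_path,
  col_types = cols(
    PA = col_character(),
    SZ = col_character(),
    AG = col_character(),
    Sex = col_character(),
    TOD = col_character(),
    Pop = col_double(),
    Time = col_integer()
  )
)
```

With the real file, only the path would change.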

3.3) Data Wrangling

We are only interested in 2019 data, so we keep only the rows where the ‘Time’ column equals 2019.

df <- df[df$Time == 2019,]
head(df,5)
## # A tibble: 5 x 7
##   PA        SZ               AG    Sex   TOD                           Pop  Time
##   <chr>     <chr>            <chr> <chr> <chr>                       <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats         0  2019
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats               10  2019
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats               10  2019
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive F~    20  2019
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding thos~     0  2019
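
One caveat with base subsetting is that a missing value in the condition produces an all-NA row in the result. Whether the real file has any missing Time values is not stated, so the sketch below uses a made-up data frame to show the difference. filter() from dplyr is shown as a possible alternative, not as what the code above uses.

```r
library(dplyr)

# Made-up rows: one of them has a missing Time.
toy <- data.frame(
  PA = c("Bedok", "Bedok", "Bishan"),
  Pop = c(10, 20, 30),
  Time = c(2019, NA, 2011)
)

# Base subsetting: the NA comparison yields an extra all-NA row.
base_result <- toy[toy$Time == 2019, ]

# filter() keeps only rows where the condition is TRUE.
dplyr_result <- filter(toy, Time == 2019)

nrow(base_result)   # 2 (one real row plus one all-NA row)
nrow(dplyr_result)  # 1
```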

Rename the columns.

names(df)[1] <- "Planning_Area"

names(df)[3] <- "Age_Group"

names(df)[6] <- "Population"
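
Renaming by position works as long as the column order never changes. As a sketch of an alternative, rename() from dplyr refers to the old names instead. The single row below is made up for illustration.

```r
library(dplyr)

# Made-up single row with the original column names.
toy <- tibble(
  PA = "Bedok", SZ = "Subzone A", AG = "0_to_4", Sex = "Males",
  TOD = "HDB 3-Room Flats", Pop = 10, Time = 2019
)

# rename() takes new_name = old_name, so it does not depend on column order.
renamed <- rename(toy, Planning_Area = PA, Age_Group = AG, Population = Pop)

names(renamed)
```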

Specify order of the Age Groups.

df$Age_Group <- factor(df$Age_Group, levels =c('0_to_4',
'5_to_9',
'10_to_14',
'15_to_19',
'20_to_24',
'25_to_29',
'30_to_34',
'35_to_39',
'40_to_44',
'45_to_49',
'50_to_54',
'55_to_59',
'60_to_64',
'65_to_69',
'70_to_74',
'75_to_79',
'80_to_84',
'85_to_89',
'90_and_over'
))
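
The ordering problem and the fix can be reproduced on a few labels. The sketch below also shows one possible way to derive the level order from the leading number in each label instead of typing all 19 levels. This is an alternative I am suggesting, and it assumes every label starts with digits, as the labels above do.

```r
# A few of the age-group labels, deliberately out of order.
age <- c("5_to_9", "90_and_over", "0_to_4", "10_to_14")

# As plain character strings, "10_to_14" sorts before "5_to_9".
sort(age)

# Order the distinct labels by their leading number
# ("90_and_over" becomes 90, "5_to_9" becomes 5, and so on).
age_labels <- unique(age)
lower_bound <- as.numeric(sub("_.*", "", age_labels))
age_levels <- age_labels[order(lower_bound)]

age_factor <- factor(age, levels = age_levels)
levels(age_factor)  # "0_to_4" "5_to_9" "10_to_14" "90_and_over"
```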

We want to see the demographic structure by age cohort and by planning area. We use group_by() and summarise() to aggregate the population into the two data frames needed for the visualisations.

df_age_and_gender <- df %>%  select(Age_Group, Sex, Population) %>% group_by(Age_Group, Sex) %>% summarise(Total_Population = sum(Population))

head(df_age_and_gender, 5)
## # A tibble: 5 x 3
## # Groups:   Age_Group [3]
##   Age_Group Sex     Total_Population
##   <fct>     <chr>              <dbl>
## 1 0_to_4    Females            90850
## 2 0_to_4    Males              94730
## 3 5_to_9    Females            97040
## 4 5_to_9    Males             101290
## 5 10_to_14  Females           102550
df_planning_area <- df %>%  select(Planning_Area, Population) %>% group_by(Planning_Area) %>% summarise(Total_Population = sum(Population))

head(df_planning_area, 5)
## # A tibble: 5 x 2
##   Planning_Area Total_Population
##   <chr>                    <dbl>
## 1 Ang Mo Kio              164430
## 2 Bedok                   279970
## 3 Bishan                   88230
## 4 Boon Lay                     0
## 5 Bukit Batok             154140
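
The "Groups: Age_Group [3]" line in the first preview shows that the result of summarise() is still grouped by Age_Group. If an ungrouped result is preferred, one option is the .groups argument. This sketch assumes dplyr 1.0.0 or later and uses made-up rows.

```r
library(dplyr)

# Made-up rows in the same shape as the renamed data.
toy <- tibble(
  Age_Group = c("0_to_4", "0_to_4", "0_to_4", "5_to_9"),
  Sex = c("Males", "Males", "Females", "Males"),
  Population = c(10, 20, 5, 40)
)

totals <- toy %>%
  group_by(Age_Group, Sex) %>%
  # .groups = "drop" returns an ungrouped tibble.
  summarise(Total_Population = sum(Population), .groups = "drop")

is_grouped_df(totals)  # FALSE
```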

3.4) Create datatable visualisation to show Total Population across Planning Areas.

df_planning_area <- df_planning_area[order(-df_planning_area$Total_Population),]
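# Left-align the area names, right-align the numeric column, and draw a
# colour bar behind each Total_Population value. The data frame has no
# `Indicator Name` column, so no formatter is defined for it.
formattable(df_planning_area,
            align = c("l", rep("r", NCOL(df_planning_area) - 1)),
            list(`Total_Population` = color_bar("#FA614B")))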
Planning_Area            Total_Population
Bedok                              279970
Jurong West                        265010
Tampines                           257020
Woodlands                          255000
Sengkang                           244910
Hougang                            227110
Yishun                             220810
Choa Chu Kang                      191100
Punggol                            170920
Ang Mo Kio                         164430
Bukit Batok                        154140
Bukit Merah                        152600
Pasir Ris                          148210
Bukit Panjang                      139700
Toa Payoh                          121060
Serangoon                          116490
Geylang                            110520
Kallang                            101940
Queenstown                          96470
Sembawang                           96070
Clementi                            92910
Bishan                              88230
Jurong East                         79230
Bukit Timah                         77720
Novena                              49390
Marine Parade                       46450
Tanglin                             21710
Outram                              19050
Rochor                              12860
River Valley                        10180
Newton                               8000
Singapore River                      2940
Downtown Core                        2500
Mandai                               2060
Southern Islands                     1880
Changi                               1790
Orchard                               900
Sungei Kadut                          700
Western Water Catchment               680
Museum                                430
Seletar                               260
Lim Chu Kang                           70
Boon Lay                                0
Central Water Catchment                 0
Changi Bay                              0
Marina East                             0
Marina South                            0
North-Eastern Islands                   0
Paya Lebar                              0
Pioneer                                 0
Simpang                                 0
Straits View                            0
Tengah                                  0
Tuas                                    0
Western Islands                         0
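
Thirteen planning areas in the table have a population of 0. If they are not needed, they could be dropped before the table is formatted. This is only a sketch of that option together with a dplyr version of the sort; the four rows are copied from the table above.

```r
library(dplyr)

# Four rows copied from the table above.
toy <- tibble(
  Planning_Area = c("Boon Lay", "Bishan", "Bedok", "Tuas"),
  Total_Population = c(0, 88230, 279970, 0)
)

ranked <- toy %>%
  filter(Total_Population > 0) %>%   # optional: hide unpopulated areas
  arrange(desc(Total_Population))    # same ordering as order(-x)

ranked$Planning_Area  # "Bedok" "Bishan"
```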

3.5) Create facets visualisation containing bar plots to compare Number of Males and Females in each Age Group.

ggplot(df_age_and_gender) + geom_bar(stat="identity", aes(x=Sex,y=Total_Population,fill=Sex)) + facet_wrap(~Age_Group)
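
The same plot can be written with geom_col(), which is equivalent to geom_bar(stat = "identity"). The sketch below also adds readable axis labels. The axis formatting and the hidden legend are my own suggestions, and the four totals are copied from the preview in 3.3 so that the snippet runs without the full data.

```r
library(ggplot2)

# Totals for the two youngest age groups; the factor keeps them in age order.
toy <- data.frame(
  Age_Group = factor(rep(c("0_to_4", "5_to_9"), each = 2),
                     levels = c("0_to_4", "5_to_9")),
  Sex = rep(c("Females", "Males"), times = 2),
  Total_Population = c(90850, 94730, 97040, 101290)
)

p <- ggplot(toy, aes(x = Sex, y = Total_Population, fill = Sex)) +
  geom_col() +                                  # same as geom_bar(stat = "identity")
  facet_wrap(~Age_Group) +
  scale_y_continuous(labels = scales::comma) +  # thousands separators on the y axis
  labs(x = NULL, y = "Total population") +
  theme(legend.position = "none")               # the x axis already names each sex

p
```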

4) Final Visualisation and Insights

Data Table visualisation showing Total Population across Planning Areas

Insights

  1. The Data Table shows us the total population size for each planning area in Singapore.

  2. We can see that Bedok is the Planning Area with the largest population size.

  3. The distribution of the Singapore population across the different Planning Areas is uneven, with large differences in population size across different Planning Areas.
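
The unevenness in insight 3 could be quantified as each area's share of the total. The sketch below uses a hypothetical helper, add_share(), on three rows copied from the planning-area table, so the shares are relative to those three rows only and not to the national total.

```r
# Hypothetical helper: add each area's share of the summed population.
add_share <- function(areas) {
  areas$Share <- areas$Total_Population / sum(areas$Total_Population)
  areas
}

# Three rows copied from the planning-area table.
toy <- data.frame(
  Planning_Area = c("Bedok", "Tanglin", "Lim Chu Kang"),
  Total_Population = c(279970, 21710, 70)
)

shares <- add_share(toy)
round(100 * shares$Share, 1)
```

Applied to the full df_planning_area, the same helper would give the share of every planning area.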

Facets containing bar plots which compare the Number of Males and Females in each Age Group.

Insights

  1. This visualisation allows us to compare the population size of males and females across different age groups.

  2. From this, we can see that the numbers of females and males in each age group are roughly equal.

  3. From the height of the bars, we can also see that the population is concentrated in the middle-aged groups: the bars are tallest between “25 to 29” and “55 to 59”.

  4. Population size increases with age group up to the middle-aged groups. After the “55 to 59” age group, the population numbers begin to decrease for both males and females.

  5. Since the population is concentrated in the middle-aged groups, Singapore is likely to face an ageing population issue in the future, when today's middle-aged Singaporeans grow older. In addition, since the total population in the young age groups is currently lower, there will not be enough people in these groups to replace the middle-aged Singaporeans as they age.
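
The comparison in insight 2 can be put into numbers as a sex ratio (males per 100 females) for each age group. The following is a minimal sketch using only the four totals shown in the preview in 3.3; the column name Males_per_100_Females is my own.

```r
library(dplyr)

# Totals for the two youngest age groups, copied from the preview in 3.3.
toy <- tibble(
  Age_Group = c("0_to_4", "0_to_4", "5_to_9", "5_to_9"),
  Sex = c("Females", "Males", "Females", "Males"),
  Total_Population = c(90850, 94730, 97040, 101290)
)

sex_ratio <- toy %>%
  group_by(Age_Group) %>%
  summarise(
    Males_per_100_Females =
      100 * Total_Population[Sex == "Males"] / Total_Population[Sex == "Females"],
    .groups = "drop"
  )

round(sex_ratio$Males_per_100_Females, 1)  # 104.3 104.4
```

A ratio close to 100 means the two bars in a facet are nearly the same height. Running this on the full df_age_and_gender would show whether that holds in the older age groups as well.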