Demographic structure of Singapore’s population by age cohort and planning area

This dataset Singapore Residents by Subzone and Type of Dwelling, 2011 - 2019 is retrieved from Data.gov.sg. This publication aims to reveal the demographic structure of Singapore’s population by age cohort (e.g., 0-4, 5-9, ……) and identify trends in residence in non-mature estates in Singapore.

1. Major Data and Design challenges

Major challenge Description
Data The age group is recorded as x_to_y, and as a result the age group 5_to_9 will not appear in the correct order when sorted ascending.
Design Generating a population pyramid is not straightforward with ggplot. This involves searching for codes for parameters like coord_flip.
Design Creating a pie chart with ggplot is involves creating a bar chart first and preventing overlaps of percentage labels.
Design Due to the multiple lines present in the line graph, fig.height and fig.width must be specified larger, and expand the x axis if needed.

Proposed Sketch Design

sketch 2 sketch 1

2. Step-by-step description of how data visualisation was prepared using ggplot2

Load libraries and dataset

Steps:
1. Set working directory & load essential libraries
2. Store the dataset into df
3. View the dimensions of the dataset, along with the structure of variables and an overview of each column and its records
4. Check for NA values

knitr::opts_chunk$set(warning = FALSE, error = TRUE, message = FALSE, results = 'asis')
setwd("C:/Users/X Lin/Desktop/Notes/4.1/VA/Assignment 4")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
df <- read.csv("planning-area-subzone-age-group-sex-and-type-of-dwelling-june-2011-2019.csv")
dim(df) #check no. of observations
## [1] 883728      7
head(df, n = 20) # view first 20 rows
##    planning_area                subzone age_group     sex
## 1     Ang Mo Kio Ang Mo Kio Town Centre    0_to_4   Males
## 2     Ang Mo Kio Ang Mo Kio Town Centre    0_to_4   Males
## 3     Ang Mo Kio Ang Mo Kio Town Centre    0_to_4   Males
## 4     Ang Mo Kio Ang Mo Kio Town Centre    0_to_4   Males
## 5     Ang Mo Kio Ang Mo Kio Town Centre    0_to_4   Males
## 6     Ang Mo Kio Ang Mo Kio Town Centre    0_to_4   Males
## 7     Ang Mo Kio Ang Mo Kio Town Centre    0_to_4   Males
## 8     Ang Mo Kio Ang Mo Kio Town Centre    0_to_4   Males
## 9     Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Females
## 10    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Females
## 11    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Females
## 12    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Females
## 13    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Females
## 14    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Females
## 15    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Females
## 16    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Females
## 17    Ang Mo Kio Ang Mo Kio Town Centre    5_to_9   Males
## 18    Ang Mo Kio Ang Mo Kio Town Centre    5_to_9   Males
## 19    Ang Mo Kio Ang Mo Kio Town Centre    5_to_9   Males
## 20    Ang Mo Kio Ang Mo Kio Town Centre    5_to_9   Males
##                           type_of_dwelling resident_count year
## 1                  HDB 1- and 2-Room Flats              0 2011
## 2                         HDB 3-Room Flats             10 2011
## 3                         HDB 4-Room Flats             30 2011
## 4           HDB 5-Room and Executive Flats             50 2011
## 5  HUDC Flats (excluding those privatised)              0 2011
## 6                        Landed Properties              0 2011
## 7        Condominiums and Other Apartments             40 2011
## 8                                   Others              0 2011
## 9                  HDB 1- and 2-Room Flats              0 2011
## 10                        HDB 3-Room Flats             10 2011
## 11                        HDB 4-Room Flats             30 2011
## 12          HDB 5-Room and Executive Flats             60 2011
## 13 HUDC Flats (excluding those privatised)              0 2011
## 14                       Landed Properties              0 2011
## 15       Condominiums and Other Apartments             40 2011
## 16                                  Others              0 2011
## 17                 HDB 1- and 2-Room Flats              0 2011
## 18                        HDB 3-Room Flats             10 2011
## 19                        HDB 4-Room Flats             30 2011
## 20          HDB 5-Room and Executive Flats             60 2011
table(df$planning_area) # count of unique values of planning area
## 
##              Ang Mo Kio                   Bedok                  Bishan 
##                   32832                   21888                    8208 
##                Boon Lay             Bukit Batok             Bukit Merah 
##                   10944                   24624                   46512 
##           Bukit Panjang             Bukit Timah Central Water Catchment 
##                   19152                   21888                    2736 
##                  Changi              Changi Bay           Choa Chu Kang 
##                    8208                    2736                   16416 
##                Clementi           Downtown Core                 Geylang 
##                   24624                   32832                   13680 
##                 Hougang             Jurong East             Jurong West 
##                   27360                   27360                   24624 
##                 Kallang            Lim Chu Kang                  Mandai 
##                   24624                    2736                    8208 
##             Marina East            Marina South           Marine Parade 
##                    2736                    2736                   13680 
##                  Museum                  Newton   North-Eastern Islands 
##                    8208                   16416                    2736 
##                  Novena                 Orchard                  Outram 
##                   13680                    8208                   10944 
##               Pasir Ris              Paya Lebar                 Pioneer 
##                   21888                   13680                   13680 
##                 Punggol              Queenstown            River Valley 
##                   19152                   41040                   13680 
##                  Rochor                 Seletar               Sembawang 
##                   27360                   10944                   24624 
##                Sengkang               Serangoon                 Simpang 
##                   19152                   19152                   10944 
##         Singapore River        Southern Islands            Straits View 
##                    8208                    5472                    2736 
##            Sungei Kadut                Tampines                 Tanglin 
##                   13680                   13680                   10944 
##                  Tengah               Toa Payoh                    Tuas 
##                    2736                   32832                   16416 
##         Western Islands Western Water Catchment               Woodlands 
##                    8208                    2736                   24624 
##                  Yishun 
##                   24624
table(df$age_group) # count of unique values of age group
## 
##      0_to_4    10_to_14    15_to_19    20_to_24    25_to_29    30_to_34 
##       46512       46512       46512       46512       46512       46512 
##    35_to_39    40_to_44    45_to_49      5_to_9    50_to_54    55_to_59 
##       46512       46512       46512       46512       46512       46512 
##    60_to_64    65_to_69    70_to_74    75_to_79    80_to_84    85_to_89 
##       46512       46512       46512       46512       46512       46512 
## 90_and_over 
##       46512
table(df$type_of_dwelling) # count of unique values for type of dwelling 
## 
##       Condominiums and Other Apartments                 HDB 1- and 2-Room Flats 
##                                  110466                                  110466 
##                        HDB 3-Room Flats                        HDB 4-Room Flats 
##                                  110466                                  110466 
##          HDB 5-Room and Executive Flats HUDC Flats (excluding those privatised) 
##                                  110466                                  110466 
##                       Landed Properties                                  Others 
##                                  110466                                  110466
colSums(is.na(df)) # check for NA values
##    planning_area          subzone        age_group              sex 
##                0                0                0                0 
## type_of_dwelling   resident_count             year 
##                0                0                0
is.numeric(df$resident_count) # check if resident_count is numeric
## [1] TRUE

Code to generate population pyramid by gender and age in Singapore, 2019

  1. Split the dataset into 2 separate datasets for male and females
  2. Group the respective datasets according to age group
  3. Inner join the 2 datasets using age group as the key column
  4. Create a pyramid chart using geom_bar and coord_flip
  5. The code for the pyramid chart is saved into pyramid
#split the data set to females and males by resident count
males <- filter(df, sex == "Males" & year == 2019) %>% select(age_group, resident_count) #filter df to males and year 2019
males_agg <- males %>% group_by(age_group) %>% summarise(Male = sum(resident_count)) #sum resident counts by age group
males_agg[10,1] <- "05_to_9"
males_agg[1,1] <- "00_to_4"

#repeat for female 
females <- filter(df, sex == "Females" & year == 2019) %>% select(age_group, resident_count) 
females_agg <- females %>% group_by(age_group) %>% summarise( Female = sum(resident_count)) 
females_agg[10,1] <- "05_to_9" 
females_agg[1,1] <- "00_to_4"

library(sqldf)
agg <- sqldf("SELECT age_group, Female, Male from males_agg inner join females_agg using(age_group) order by age_group asc") #inner join using SQL
agg # age group 5-9 would be in the wrong order, thus I renamed it to 05-9 and 0-4 to 00-4
 age_group Female   Male

1 00_to_4 90850 94730 2 05_to_9 97040 101290 3 10_to_14 102550 105830 4 15_to_19 108910 113730 5 20_to_24 122480 127040 6 25_to_29 145960 142640 7 30_to_34 153460 140360 8 35_to_39 158850 142310 9 40_to_44 157120 144130 10 45_to_49 160230 151800 11 50_to_54 152750 149360 12 55_to_59 153590 153850 13 60_to_64 140770 138490 14 65_to_69 112900 108920 15 70_to_74 79190 71450 16 75_to_79 52680 42460 17 80_to_84 36230 26230 18 85_to_89 21430 12490 19 90_and_over 13730 5590

final_agg <- agg %>% gather(key = sex,residents, -age_group)
final_agg$age_group <- gsub("_to_","-", final_agg$age_group) #remove underscores in age group
final_agg$age_group

[1] “00-4” “05-9” “10-14” “15-19” “20-24”
[6] “25-29” “30-34” “35-39” “40-44” “45-49”
[11] “50-54” “55-59” “60-64” “65-69” “70-74”
[16] “75-79” “80-84” “85-89” “90_and_over” “00-4”
[21] “05-9” “10-14” “15-19” “20-24” “25-29”
[26] “30-34” “35-39” “40-44” “45-49” “50-54”
[31] “55-59” “60-64” “65-69” “70-74” “75-79”
[36] “80-84” “85-89” “90_and_over”

library(RColorBrewer)
as.factor(final_agg$sex)

[1] Female Female Female Female Female Female Female Female Female Female [11] Female Female Female Female Female Female Female Female Female Male
[21] Male Male Male Male Male Male Male Male Male Male
[31] Male Male Male Male Male Male Male Male
Levels: Female Male

options(scipen = 999)
pyramid<-ggplot(final_agg, aes(x = as.factor(age_group), y = residents, fill = sex)) + 
         geom_bar(data = subset(final_agg, sex == "Female"), stat = "identity") + 
        geom_bar(data = subset(final_agg, sex == "Male"), stat = "identity",
                 mapping = aes(y = -residents)) + 
                scale_colour_manual(values = c("pink", "steelblue"),
                      aesthetics = c("colour", "fill")) +
        ggtitle("Population pyramid of Singapore in 2019") +
        ylab("Number of residents") + 
  scale_y_continuous(labels = abs) +
        xlab("Age group") + 
  coord_flip()

Resident growth in non-mature estates between 2011 to 2019

Over the years, the Singapore government has been implementing ways to develop non-mature housing estates in a bid to attract potential homeowners to buy homes in these non-mature areas (which are not as favorable compared to mature estates). Non mature estates are not as attractive as they are either considered remote. More information on mature vs. non-mature estates can be found here non mature estates by Redbrick, a mortgage advisor. Here, we want to see how resident counts have changed between 2011 to 2019, and possibly identify which non-mature housing estates are gaining popularity amongst home buyers.

In 2019, the list of non-mature estates identified are:
1. Bukit Batok
2. Bukit Panjang
3. Choa Chu Kang
4. Hougang
5. Jurong East
6. Jurong West
7. Punggol
8. Sembawang
9. Sengkang
10. Woodlands
11. Yishun

Steps:
1. Narrow the data df to only those where planning area is a non-mature estatem, saved as df_nonmature 2. Group df_nonmature by planning area and year 3. Use geom_line to plot a time series graph of resident count
3. The code for the line graph is saved to line.

df_nonmature <- df %>% filter(planning_area %in% c("Bukit Batok", "Bukit Panjang", "Choa Chu Kang", "Hougang", "Jurong East",
                                                   "Jurong West", "Punggol", "Sembawang","Sengkang", "Woodlands","Yishun"))
df_nonmature2 <- sqldf("select sum(resident_count) as total_residents, planning_area, year from df_nonmature group by planning_area, year")
options(scipen = 999)
line <- df_nonmature2 %>% ggplot(aes( x = year, y = total_residents, group = planning_area)) + geom_line(aes(color = planning_area, alpha = 1, show.legend = FALSE), size = 3) + geom_text(data = df_nonmature2 %>% filter(year == 2019), aes(label = planning_area), color = "white", hjust = -.01, size = 7)  +
  ggtitle("Line graph of resident count in non mature estates") + ylab("Total number of residents")   + theme_dark(base_size = 25) + theme(legend.position="none")  + scale_x_continuous(expand = c(.125, 0), breaks = seq(2011, 2019,1))

Dominant age group and type of dwelling in non-mature estates in 2019

We are now concerned with two characteristics of households residing in non-matures estates.

1. Age group
Non-mature estates are mostly home to young couples and families, thus we aim to find out if this is the case by plotting a bar plot of the distribution of age groups for each non-mature estate.

Steps:
1. Filter df_nonmature to only records with year 2019 and group the resulting data set by age group and planning area
2. Create a bar chart using geom_bar
3. The code for the bar chart is saved as barchart

df_nonmature_2019 <- df_nonmature %>% filter(year == 2019)
df_nonmature_20192 <- sqldf("select sum(resident_count) as total_residents, planning_area, age_group from df_nonmature_2019 group by planning_area, age_group")
df_nonmature_20192$age_group <- gsub("5_to_9", "05_to_9", df_nonmature_20192$age_group)
df_nonmature_20192$age_group <- gsub("_to_", "-", df_nonmature_20192$age_group)
df_nonmature_20192$age_group <- gsub("_", " ", df_nonmature_20192$age_group)
# bar plot of each estate
library(RColorBrewer)
barchart <-ggplot(df_nonmature_20192, aes(x = as.factor(age_group), y = total_residents, fill = -total_residents)) + geom_bar(stat = "identity") +
 facet_grid(planning_area~.) + theme_bw(base_size = 16) + ggtitle("Bar plot of residents in non-mature estates by age group") + 
  ylab("Number of residents") + xlab("Age group")  + scale_fill_continuous(type = "gradient") + theme_gray(base_size = 22) + guides(fill=guide_legend(title="Total Residents"))

2. Type of dwelling
We are also interested to see which house (i.e. HDB/Landed house/Condominium) is the predominant dwelling in non-mature estates in 2019.

Steps:
1. Using the dataset filtered by year 2019 and for non mature estates df_nonmature_dwelling2019, group the dataset by type of dwelling
2. Create a new column perc, which calculates the percentage of each type of dwelling in 2019
3. Create a stacked bar chart bp using geom_bar
4. The geom_text_repel as part of the ggrepel package prevents any overlapping of labels. This is used in place of geom_text for a neater pie chart
5. Using the bar chart bp created in step 3, apply coord_polar to turn the bar chart into a circular pie chart
6. The code for the pie chart is saved as pie

df_nonmature_dwelling2019 <- df_nonmature_2019 %>% group_by(type_of_dwelling) %>% summarise(total_residents = sum(resident_count))

df_nonmature_dwelling2019$perc <- round((df_nonmature_dwelling2019$total_residents/ sum(df_nonmature_dwelling2019$total_residents))*100,1)

df_nonmature_dwelling20192 <- df_nonmature_dwelling2019[-c(6,8),] # remove HUDC and other apartments which are not large figures

bp <- df_nonmature_dwelling20192 %>% ggplot(aes(x = "", y = total_residents, fill = type_of_dwelling)) + geom_bar(width = 1, stat = "identity", color = "black") #create a stacked bar chart before making a pie chart

library(ggrepel)
pie <- bp + coord_polar("y", start=0) + geom_text_repel(aes(label = paste(perc, "%"), x = 1.6),position = position_stack(vjust = 0.3),size=3) + scale_x_discrete(expand = c(0,0)) + guides(fill=guide_legend(title="Type of Dwelling")) +
  theme_void() + scale_fill_brewer(palette="Pastel2") #make pie chart using coord_polar

3. Final data visualisation

Population pyramid by age group in Singapore, 2019

pyramid

Number of residents in non-mature estates (2011-2019)

The graph shows that as of 2019, the top 3 non mature estate by resident count are (in order): Jurong West, Woodlands, and Sengkang.

Notably, the non mature estates Sengkang, Punggol, and Sembawang have seen a significant increase in the number of residents residing in these locations (seen by the steeper slopes in recent years). This is expected, as the governmnent has been developing these areas extensively. For example, connectivity within Punggol has improved tremendously over the years, with multiple Light Rail Transit (LRT) stations built to provide greater accessibility to condominiums and public housing located further from Punggol MRT station. Other major projects in Punggol include the Punggol Digital District, which is envisioned to be a vibrant learning hub with shared spaces for industry and academia; the Punggol Town Hub, which brings together services and programmes to serve its residents. More amenities and lifestyle destinations are expected to be developed in Sengkang and Sembawang, including recreational and green spaces for residents.

line

Age distribution of residents in non-mature estates (2019)

From the bar graph, we see that younger households predominate in Sengkang and Punggol - most residents living in Sengkang and Punggol are aged between 35 to 39. We also do see more children (i.e. those aged below 10) in Sengkang and Punggol in comparison to the other non-mature estates. This suggests that Sengkang and Punggol are popular non-mature estate choices amongst younger couples and families. In areas such as Yishun, Hougang, and Woodlands, majority of the residents belong to older age groups (i.e. 50-60).

barchart

Type of dwelling in non-mature estates (2019)

From the pie chart, HDB flats is the dominant type of housing in non-mature estates. In particular, most residents live in HDB 4-room flats (39.7%) or HDB 5-room flats (33.1%). The number of residents living in Condominiums, landed properties, and 1 or 2 room HDB flats are considerably lower. This is aligned with the government’s plan to increase the supply public housing (HDB) in non-mature estates.

pie